Home » SQL & PL/SQL » SQL & PL/SQL » Best way to refresh data
Best way to refresh data [message #288600] Tue, 18 December 2007 02:57 Go to next message
kinwho
Messages: 6
Registered: April 2007
Location: Bay area
Junior Member
I have a table that needs to refresh daily from external data file. The data will need to be cleaned up before it is available, so I can't use sql*loader to refresh the table directly. The table have about 1 million rows. Any good method to refresh the data!?

Here are some of my option now:
- load data into staging table, and run a merge statement to update data into target table
- create a single partition table, and switch partition from staging table daily.
- Use create or replace view to switch the table that contains the latest data


Any suggestion is welcome.

Thank you
Re: Best way to refresh data [message #288604 is a reply to message #288600] Tue, 18 December 2007 03:05 Go to previous messageGo to next message
Michel Cadot
Messages: 64111
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Quote:

The data will need to be cleaned up before it is available, I can't use sql*loader to refresh the table directly

Why? Just use the REPLACE keyword

Quote:

Any good method to refresh the data!?

Use an external table.

Quote:

merge statement to update data into target table

Do you want to merge or to replace? Your statements are not consistent or clear.

Regards
Michel

Re: Best way to refresh data [message #288616 is a reply to message #288604] Tue, 18 December 2007 03:33 Go to previous messageGo to next message
kinwho
Messages: 6
Registered: April 2007
Location: Bay area
Junior Member
<quote>
Why? Just use the REPLACE keyword</quote>

I don't want to use SQL*loader to directly replace my table, since it will lock my table. Besides, I will need to generate some new records used to link with other table.


Can I use the merge statement to replace the data?

MERGE INTO target_table trgt
USING (SELECT id, col1, ... FROM staging_table) stg
ON trgt.id = stg.id
WHEN MATCHED THEN UPDATE SET trgt.col1 = stg.col1, ....
DELETE WHERE NOT EXISTS(SELECT 1 FROM staging_table
WHERE tmp.id = trgt.id)
WHEN NOT MATCHED THEN INSERT (trgt.id, trgt.col1, ...)
VALUES (stg.id, stg.col1, ...)
Re: Best way to refresh data [message #288624 is a reply to message #288616] Tue, 18 December 2007 03:45 Go to previous messageGo to next message
Michel Cadot
Messages: 64111
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Still don't understand your requirement.
Why are you talking about locking?

Regards
Michel
Re: Best way to refresh data [message #288633 is a reply to message #288624] Tue, 18 December 2007 04:04 Go to previous messageGo to next message
kinwho
Messages: 6
Registered: April 2007
Location: Bay area
Junior Member
If you use sql*Loader to load the table directly, other user may not be able to access your data. I believe the SQL*Loader will use a table lock.

Any workaround for this!?
Re: Best way to refresh data [message #288637 is a reply to message #288624] Tue, 18 December 2007 04:08 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I think the OP wants high-availability. If he SQL*Loads with REPLACE, there will be a period of time where the table is empty and users will not see any data.

Partition-exchange might be a problem. If a cursor is open against the table you may not be able to perform DDL. Someone else in the forum may be able to confirm or disprove this.

I wouldn't use MERGE. Updates are too expensive.

I like the idea of a VIEW that would switch between tables. I think that open cursors will only lock the underlying table against DDL - not the view. Not sure though - you need to check. Be aware though that if a cursor stays open all day, the "old" table will still be locked and you wont be able to truncate it.

You could also use SYNONYMS instead of VIEWs.

If all this fails, another possibility would be to create a list-partitioned table with an extra column that would be the partition key. Every row in a load would have the same partition key, but each load would use a different partition-key. The table could then hold potentially many loads.

To avoid users seeing data for 2 or more loads, you would have to use a view with a WHERE clause such as
WHERE partition_key = current_key()

The current_key() function would get the value of the current partition key from a single-row table. You could drop partitions at your leisure when there were no queries running. The optimiser would know that only a single partition would be read - so there is no danger of a full scan of all partitions.

Ross Leishman
Re: Best way to refresh data [message #288642 is a reply to message #288633] Tue, 18 December 2007 04:19 Go to previous messageGo to next message
Michel Cadot
Messages: 64111
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
I don't know what OP wants, if he wants the table is locked or not, if he does not care about an intermediate state because he does his load at quiesce time, if...

Regards
Michel
Re: Best way to refresh data [message #288774 is a reply to message #288642] Tue, 18 December 2007 11:23 Go to previous messageGo to next message
kinwho
Messages: 6
Registered: April 2007
Location: Bay area
Junior Member
Sorry for the confusion! The OP is trying to get a high level solution to solve a problem on refreshing a table daily that was used 24*7.

Here is more detail: The source data will be come from multiple data files in different format. After the data is being loaded to the staging table, some data cleansing processing is needed. Besides, we will need to generate sme extra records to link with other tables.

Let me know if you have a better answer besides SQL*Loader.
Re: Best way to refresh data [message #288777 is a reply to message #288774] Tue, 18 December 2007 11:29 Go to previous messageGo to next message
Michel Cadot
Messages: 64111
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
The answer depends on those many non speficied and fuzzy parts.
But I'm pretty sure that external table is the best way to do it.

Regards
Michel
Re: Best way to refresh data [message #288804 is a reply to message #288600] Tue, 18 December 2007 14:50 Go to previous messageGo to next message
Kevin Meade
Messages: 2101
Registered: December 1999
Location: Connecticut USA
Senior Member
Sounds like a typical ETL job. I do a lot of ETL these days. Most of my ETL jobs have common characteristics:

1) pulls from already loaded tables where possible
2) pulls from external tables where possible
3) uses sql*loader to temp tables as last resort
4) is usually writen as one basic PL/SQL package with PL/SQL doing job control and SQL doing the actual data massaging and migration etc.
5) avoids plsql looping (row by row aka. slow by slow processing) like the plague
6) always uses a common logging package
7) never commits till the end so the system is never down because of "empty" tables

These are just my comments, not anything special, but I get asked a lot about how I do it at work so figured I'd share.

Ross, I am curious, what is it about update during merge that makes you think this is expensive? Compared to what? I wonder what I don't know now.

Do you mean expensive compared to the alternative INSERT/UPDATE combo that people would write without merge?
Or do you mean updates are in general expensive compared to the alternative of trucnate/insert?

Can you please provide some detail so I am not worried about a big hole in my understanding of costs. I have always had mixed feelings about merge but not its performance in general. Merge does not play well with other oracle features (instead of triggers for example), but it is often convenient over a more traditional coding of insert/update.

Thanks, Kevin.
Re: Best way to refresh data [message #288829 is a reply to message #288804] Tue, 18 December 2007 21:08 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Kevin Meade wrote on Wed, 19 December 2007 07:50


Ross, I am curious, what is it about update during merge that makes you think this is expensive? Compared to what? I wonder what I don't know now.


Fair enough, it was a bit of an unqualified statement.

It depends on the profile of the ETL. If you are going to be updating a significant percentage of rows, it is cheaper to truncate and replace them because of the advantages of direct-path inserts. The last time I bothered to do a benchmark, the break-even point came out at a bit lower than 5%. That of course will vary with indexing, constraints, partitioning, hardware, and lots of other stuff. I have examples as low as 1%.

The way the OP described the problem led me to believe that this was a replacement feed where large proportions of the data would be changed. If true, MERGE would be a poor solution if a direct-path alternative is possible.

Ross Leishman
Re: Best way to refresh data [message #289073 is a reply to message #288829] Wed, 19 December 2007 07:57 Go to previous messageGo to next message
Kevin Meade
Messages: 2101
Registered: December 1999
Location: Connecticut USA
Senior Member
Thanks Ross, this is what I thought you meant, makes me feel better.

Kevin
Re: Best way to refresh data [message #293738 is a reply to message #288600] Mon, 14 January 2008 17:04 Go to previous message
navkrish
Messages: 189
Registered: May 2006
Location: NJ,USA
Senior Member

Here are other two ways of refresh

solution 1 :
Have two script in place and run it alternatively -

script 1 :

CREATE TABLE tbl_1
AS SELECT * FROM EXTERNAL_TBL@DB_LINK;

CREATE OR REPLACE SYNONYM tbl FOR tbl_1;

DROP tbl_2;



For NEXT Refresh run Script 2:


CREATE TABLE tbl_2
AS SELECT * FROM EXTERNAL_TBL@DB_LINK;

CREATE OR REPLACE SYNONYM tbl FOR tbl_2;

DROP tbl_1;



Refer table synonym 'tbl' in all the scripts.

solution 2 :

drop table TBL_ORIG
/
WHENEVER SQLERROR EXIT SQL.SQLCODE

create table TBL_TEMP 
AS SELECT * FROM EXTERNAL_TBL@DB_LINK;

-- Now table is created, ignore errors and keep going
WHENEVER SQLERROR CONTINUE

CREATE INDEX TBL_IDX_TEMP ON TBL_TEMP; -- create indexes...

ALTER TABLE TBL RENAME TO TBL_ORIG
/

ALTER TABLE TBL_TEMP RENAME TO TBL
/

ALTER INDEX TBL_IDX RENAME TO TBL_IDX_ORIG
/

ALTER INDEX TBL_IDX_TEMP RENAME TO TBL_IDX
/




Regards,
Naveen
Previous Topic: Outer Join with Wildcard
Next Topic: RPAD
Goto Forum:
  


Current Time: Mon Dec 05 15:06:43 CST 2016

Total time taken to generate the page: 0.08114 seconds