Re: Best practice for loading data into relational tables

From: joel garry <joel-garry_at_home.com>
Date: Mon, 17 Mar 2008 16:37:58 -0700 (PDT)
Message-ID: <3c69025e-f793-453c-97e8-6cf405f71dd2@d21g2000prf.googlegroups.com>


On Mar 15, 11:19 pm, Anoop <anoopkum..._at_gmail.com> wrote:
> I am not sure if I can word this right, but I am trying to explore
> the best way of loading data into about 40-50 Oracle tables. We have
> collected the data through various means and in various formats -
> some in text files, some in Excel, some that we know need to be
> rearranged, etc.
>
> The way we are doing it now is to write INSERT statements for every
> line of data, so the actual arrangement of the data and the writing
> of the INSERTs happen simultaneously. But there are many issues with
> writing INSERTs by hand. The primary one is debugging - once I run
> the file containing all the inserts, I just get "1 row created." over
> and over, which doesn't really tell me which statement ran. Each line
> runs independently of the rows above and below it; the file just lets
> me run them all together, that is all.
> The other issue (related to the first) is that if the tables are
> related, I need to order my inserts so that the primary key rows get
> inserted before the child/foreign-key rows. This is a very manual
> process, and I do not think it can be automated much.
>
> So I am thinking my approach is probably wrong - maybe I should
> handle the data load in a different manner. Is there any best
> practice I could use? How have you handled data loads of medium
> size, ~10,000 rows going into 40 tables that are related to one
> another?
>
> Thanks.

I'm on unix, so I tend to use unix tools to deal with this. For the case of inserts, I'll often generate an extra line for each line inserted - a prompt or a select from dual - that winds up telling me in the log file which row I was on. Since I'm often processing a CSV-type file with awk anyway, this is a trivial amount of extra work.
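For instance, a throwaway awk script along these lines does it (the table and column names here are made up, and it assumes a clean two-column csv with no embedded commas or quotes in the data):

  # gen_inserts.awk - emit a prompt line before each insert so the
  # spooled log shows which input line any error belongs to
  BEGIN { FS = "," }
  {
      printf "prompt line %d: %s\n", NR, $0
      printf "insert into t (col1, col2) values ('%s', '%s');\n", $1, $2
  }

Then awk -f gen_inserts.awk data.csv > load.sql, run load.sql in sqlplus with spool on, and every "1 row created." or ORA- error in the log sits right under the prompt for the input line that produced it. A select from dual with the line number in it works the same way, if you'd rather have the marker show up as query output.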

Not the best practice on its face, but it copes with the crappy data, implicit business rules, and not-thinking-in-sets that often come with people entering data manually into spreadsheets, and I have to run repetitive updates against a test db to winnow out all the problems. Often each line explodes into many as I discover they've put part of the primary key into the filename or whatever; the spreadsheet does different things with quoting and formatting; and on and on anon. Even from the same person, I rarely seem to get the same format twice.

jg

--
@home.com is bogus.
Those darn young'uns:  http://www.baselinemag.com/c/a/Security/Beware-a-Generation-of-Risk-Takers/