Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> RE: Guys - loading comma-delimited - dynamically generate table

RE: Guys - loading comma-delimited - dynamically generate table

From: Lord, David - CSG <david.lord_at_hays.com>
Date: Fri, 16 May 2003 00:17:09 -0800
Message-ID: <F001.0059A63E.20030516001709@fatcity.com>


This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible.

------_=_NextPart_001_01C31B7A.B8FDD9B0
Content-Type: text/plain; charset="iso-8859-1"

Paula  

I've just recently written a little awk script to do what I think you're after... I might get shot for using awk ;) but at least I can guarantee that it will be installed on any unix box I come to.  

Regards
David Lord  

#!/usr/bin/nawk -f
############################################################################
####
# NAME
#
# loadcsv.awk - generate scripts to create a table and load a CSV file into
it
#
# SYNOPSIS
#
# loadcsv.awk <filename>.csv
#
# DESCRIPTION
#
# Generates two output files: -
#
# 1) A SQL script named <filename>.sql (where <filename> is the basename of
the
# input file) containing statements to create a table suitable for loading
the
# data into. The table is named <filename>. All columns are varchar2's
large
# enough to hold the maximum length of the column. The names of the columns
# are taken from the first line of the CSV file, tidied up to remove leading
# and trailing blanks and replace spaces with underscores. If any column in
the
# CSV file is empty, the column is not added to the table.
#
# 2) A SQL*Loader control file named <filename>.ctl for loading the data
into
# the table. Fields which are empty are specified as FILLER fields.
#
# OPTIONS
#
# filename.csv - The name of a csv file. It must have the column names
# as the first line.
#
# EXAMPLE
#
# First, run the script: -
#
# % loadcsv.awk myfile.csv
#
# Then, login to sqlplus and run the SQL script to create the table: -
#
# SQL> @myfile.sql
#
# Finally, use sqlldr to load the data into the table: -
#
# % sqlldr scott/tiger myfile.ctl data = myfile.csv
#
# KNOWN ISSUES
#
# This script does not check that the length of the columns is sensible.
So,
# if any cell contains more than 4000 characters, the create table command
will
# fail.
#
# AUTHOR
#
# David Lord, April 2003
############################################################################
####
 

BEGIN {
  # Field separator is a comma
  FS = ",";  

  # Get the name of the input file and strip the .csv   if(ARGC > 1) {
    filename = ARGV[1];
    gsub(/\.csv/, "", filename);
  } else {
    filename = FILENAME;
  }  

  # The name of the 'create table' file
  sqlfilename = filename ".sql";  

  # The name of the sqlloader control file   ctlfilename = filename ".ctl";
}  

# The first line contains the column names
NR == 1 {
  num_cols = NF;
  for(i = 1; i <= num_cols; i++) {
    # Tidy up the column name

    gsub(/ /, "_", $i);
    gsub(/_*$/, "", $i);
    gsub(/^_*/, "", $i);
 

    # Add to an array of column names
    col_names[i] = tolower($i);  

    # Initialise the column length
    col_lengths[i] = 0;
  }
}  

# Find the maximum length of each column
NR > 1 {
  i = 1;
  j = 1;
  while(i <= NF) {
    col_length = 0;
    if($i ~ /^ *\"/) {

      gsub(/^ *\"/, "", $i);
      while(i <= NF && $i !~ /\" *$/) {
        col_length += length($i) + 1;
        i++;
      }
      gsub(/\" *$/, "", $i);

    }
    col_length += length($i);
    if(col_length > col_lengths[j]) {
      col_lengths[j] = col_length;
    }
    i++;
    j++;
  }
}  

# Print the scripts

END {
  # Begin the create table statement
  printf("create table " filename "(") > sqlfilename;   firstcol = 1;  

  # Loop over the columns
  for(i = 1; i <= num_cols; i++) {
    col_name = col_names[i];
    col_length = col_lengths[i];  

    # Ignore empty columns
    if(col_length > 0) {

      # Put a comma on the end of every line but the first
      if(firstcol == 1) {
        firstcol = 0;
      } else {
        printf(",") >> sqlfilename;
      }
 
      # Print the column definition
      printf("\n\t%-30s %s", col_name, "varchar2(" col_length ")") >>
sqlfilename;

    }
  }  

  # Finish the create table statement
  printf("\n);\n") >> sqlfilename;  

  # Begin the control file

  printf("options (skip=1)") > ctlfilename;
  printf("\nload data") >> ctlfilename;
  printf("\nappend") >> ctlfilename;
  printf("\ninto table %s", filename) >> ctlfilename;
  printf("\nfields terminated by \",\" optionally enclosed by '\"'") >>
ctlfilename;
  printf("\ntrailing nullcols") >> ctlfilename;   printf("\n(") >> ctlfilename;  

  # Loop over the columns
  firstcol = 1;
  for(i = 1; i <= num_cols; i++) {
    col_name = col_names[i];
    col_length = col_lengths[i];     

    # Put a comma on the end of every line but the first     if(firstcol == 1) {
      firstcol = 0;
    } else {
      printf(",") >> ctlfilename;
    }     

    # Print the column definition
    printf("\n\t%s", col_name) >> ctlfilename;  

    # If the column is empty use a filler field     if(col_length == 0) {
      printf(" filler") >> ctlfilename;
    }
  }  

  # Finish the control file
  printf("\n)\n") >> ctlfilename;
}

-----Original Message-----
Sent: 15 May 2003 15:42
To: Multiple recipients of list ORACLE-L

Am I dreaming, hoping? Wasn't it possible to take a comma-dlt. text file and from the column list dynamically generate a table then load the data. Hmmmm. I could write that pretty easily but problem is wouldn't necessarily know datatypes and sizes. Okay - I guess I am dreaming.

Data Warehouse builder versus data migration tool:

-when would you use one or the other?
-is the juice worth the squeeze or am I better off doing myself as I have last 6 years
-extra purchase - which one?

Thanks,
Paula



This message (including any attachments) is confidential and may be legally privileged. If you are not the intended recipient, you should not disclose, copy or use any part of it - please delete all copies immediately and notify the Hays Group Email Helpdesk at email.helpdesk_at_hays.com
Any information, statements or opinions contained in this message (including any attachments) are given by the author. They are not given on behalf of Hays unless subsequently confirmed by an individual other than the author who is duly authorised to represent Hays.  

A member of the Hays plc group of companies. Hays plc is registered in England and Wales number 2150950. Registered Office Hays House Millmead Guildford Surrey GU2 4HJ.


------_=_NextPart_001_01C31B7A.B8FDD9B0
Content-Type: text/html; charset="iso-8859-1"

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>RE: Guys - loading comma-delimited - dynamically generate table</TITLE>

<META content="MSHTML 6.00.2800.1106" name=GENERATOR></HEAD>
<BODY>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>Paula</FONT></SPAN></DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>I've just recently written a little awk script to do what I think you're after...&nbsp; I might get shot for using awk ;)&nbsp;but at least I can guarantee that it will be installed on any unix box I come to.</FONT></SPAN></DIV> <DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>Regards</FONT></SPAN></DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>David Lord</FONT></SPAN></DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>#!/usr/bin/nawk
-f<BR>################################################################################<BR># 
NAME<BR>#<BR># loadcsv.awk - generate scripts to create a table and load a CSV file into it<BR>#<BR># SYNOPSIS<BR>#<BR># loadcsv.awk &lt;filename&gt;.csv<BR>#<BR># DESCRIPTION<BR>#<BR># Generates two output files: -<BR>#<BR># 1) A SQL script named &lt;filename&gt;.sql (where &lt;filename&gt; is the basename of the<BR># input file) containing statements to create a table suitable for loading the<BR># data into.&nbsp; The table is named &lt;filename&gt;.&nbsp; All columns are varchar2's large<BR># enough to hold the maximum length of the column.&nbsp; The names of the columns<BR># are taken from the first line of the CSV file, tidied up to remove leading<BR># and trailing blanks and replace spaces with underscores. If any column in the<BR># CSV file is empty, the column is not added to the table.<BR>#<BR># 2) A SQL*Loader control file named &lt;filename&gt;.ctl for loading the data into <BR># the table.&nbsp; Fields which are empty are specified as FILLER fields.<BR>#<BR># OPTIONS<BR>#<BR># filename.csv - The name of a csv file.&nbsp; It must have the column
names<BR>#&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; as the first line.<BR>#<BR># EXAMPLE<BR>#<BR># First, run the script: -<BR>#<BR>#&nbsp;&nbsp; % loadcsv.awk myfile.csv<BR>#<BR># Then, login to sqlplus and run the SQL script to create the table: -<BR>#<BR>#&nbsp;&nbsp; SQL&gt; @myfile.sql<BR>#<BR># Finally, use sqlldr to load the data into the table: -<BR>#<BR>#&nbsp;&nbsp; % sqlldr scott/tiger myfile.ctl data = myfile.csv<BR>#<BR># KNOWN ISSUES<BR>#<BR># This script does not check that the length of the columns is sensible.&nbsp; So,<BR># if any cell contains more than 4000 characters, the create table command will<BR># fail.<BR>#<BR># AUTHOR<BR>#<BR># David Lord, April
2003<BR>################################################################################</FONT></SPAN></DIV>
<DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>BEGIN 
{<BR>&nbsp; # Field separator is a comma<BR>&nbsp; FS = ",";</FONT></SPAN></DIV>
<DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>&nbsp; # Get the name of the input file and strip the .csv<BR>&nbsp; if(ARGC &gt; 1) {<BR>&nbsp;&nbsp;&nbsp; filename = ARGV[1];<BR>&nbsp;&nbsp;&nbsp; gsub(/\.csv/, "", filename);<BR>&nbsp; } else {<BR>&nbsp;&nbsp;&nbsp; filename = FILENAME;<BR>&nbsp; }</FONT></SPAN></DIV> <DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>&nbsp; # The name of the 'create table' file<BR>&nbsp; sqlfilename = filename ".sql";</FONT></SPAN></DIV>
<DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>&nbsp; # The name of the sqlloader control file<BR>&nbsp; ctlfilename = filename ".ctl";<BR>}</FONT></SPAN></DIV>
<DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2># The first line contains the column names<BR>NR == 1 {<BR>&nbsp; num_cols = NF;<BR>&nbsp; for(i = 1; i &lt;= num_cols; i++) {<BR>&nbsp;&nbsp;&nbsp; # Tidy up the column name<BR>&nbsp;&nbsp;&nbsp; gsub(/ /, "_", $i);<BR>&nbsp;&nbsp;&nbsp; gsub(/_*$/, "", $i);<BR>&nbsp;&nbsp;&nbsp; gsub(/^_*/, "", $i);</FONT></SPAN></DIV> <DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>&nbsp;&nbsp;&nbsp; # Add to an array of column names<BR>&nbsp;&nbsp;&nbsp; col_names[i] = tolower($i);</FONT></SPAN></DIV> <DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>&nbsp;&nbsp;&nbsp; # Initialise the column length<BR>&nbsp;&nbsp;&nbsp; col_lengths[i] = 0;<BR>&nbsp; }<BR>}</FONT></SPAN></DIV> <DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2># Find the maximum length of each column<BR>NR &gt; 1 {<BR>&nbsp; i = 1;<BR>&nbsp; j = 1;<BR>&nbsp; while(i &lt;= NF) {<BR>&nbsp;&nbsp;&nbsp; col_length = 0;<BR>&nbsp;&nbsp;&nbsp; if($i ~ /^ *\"/) {<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; gsub(/^ *\"/, "", $i);<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; while(i &lt;= NF &amp;&amp; $i !~ /\" *$/) {<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; col_length += length($i) + 1;<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; i++;<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; gsub(/\" *$/, "", $i);<BR>&nbsp;&nbsp;&nbsp; }<BR>&nbsp;&nbsp;&nbsp; col_length += length($i);<BR>&nbsp;&nbsp;&nbsp; if(col_length &gt; col_lengths[j]) {<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; col_lengths[j] = col_length;<BR>&nbsp;&nbsp;&nbsp; }<BR>&nbsp;&nbsp;&nbsp; i++;<BR>&nbsp;&nbsp;&nbsp; j++;<BR>&nbsp; }<BR>}</FONT></SPAN></DIV> <DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2># Print the scripts<BR>END {<BR>&nbsp; # Begin the create table statement<BR>&nbsp; printf("create table " filename "(") &gt; sqlfilename;<BR>&nbsp; firstcol = 1;</FONT></SPAN></DIV>
<DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>&nbsp; # Loop over the columns<BR>&nbsp; for(i = 1; i &lt;= num_cols; i++) {<BR>&nbsp;&nbsp;&nbsp; col_name = col_names[i];<BR>&nbsp;&nbsp;&nbsp; col_length = col_lengths[i];</FONT></SPAN></DIV> <DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>&nbsp;&nbsp;&nbsp; # Ignore empty columns<BR>&nbsp;&nbsp;&nbsp; if(col_length &gt; 0) {<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # Put a comma on the end of every line but the first<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if(firstcol == 1) {<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; firstcol = 0;<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; } else {<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; printf(",") &gt;&gt; sqlfilename;<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</FONT></SPAN></DIV> <DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # Print the column definition<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; printf("\n\t%-30s %s", col_name, "varchar2(" col_length ")") &gt;&gt; sqlfilename;<BR>&nbsp;&nbsp;&nbsp;
}<BR>&nbsp; }</FONT></SPAN></DIV>
<DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>&nbsp; # 
Finish the create table statement<BR>&nbsp; printf("\n);\n") &gt;&gt; sqlfilename;</FONT></SPAN></DIV>
<DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>&nbsp; # Begin the control file<BR>&nbsp; printf("options (skip=1)") &gt; ctlfilename;<BR>&nbsp; printf("\nload data") &gt;&gt; ctlfilename;<BR>&nbsp; printf("\nappend") &gt;&gt; ctlfilename;<BR>&nbsp; printf("\ninto table %s", filename) &gt;&gt; ctlfilename;<BR>&nbsp; printf("\nfields terminated by \",\" optionally enclosed by '\"'") &gt;&gt; ctlfilename;<BR>&nbsp; printf("\ntrailing nullcols") &gt;&gt; ctlfilename;<BR>&nbsp; printf("\n(") &gt;&gt; ctlfilename;</FONT></SPAN></DIV>
<DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>&nbsp; # Loop over the columns<BR>&nbsp; firstcol = 1;<BR>&nbsp; for(i = 1; i &lt;=
num_cols; i++) {<BR>&nbsp;&nbsp;&nbsp; col_name = 
col_names[i];<BR>&nbsp;&nbsp;&nbsp; col_length = 
col_lengths[i];<BR>&nbsp;&nbsp;&nbsp; <BR>&nbsp;&nbsp;&nbsp; # Put a comma on 
the end of every line but the first<BR>&nbsp;&nbsp;&nbsp; if(firstcol == 1) {<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; firstcol = 0;<BR>&nbsp;&nbsp;&nbsp; } else {<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; printf(",") &gt;&gt; ctlfilename;<BR>&nbsp;&nbsp;&nbsp; }<BR>&nbsp;&nbsp;&nbsp; <BR>&nbsp;&nbsp;&nbsp; # Print the column definition<BR>&nbsp;&nbsp;&nbsp; printf("\n\t%s", col_name) &gt;&gt; ctlfilename;</FONT></SPAN></DIV> <DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>&nbsp;&nbsp;&nbsp; # If the column is empty use a filler field<BR>&nbsp;&nbsp;&nbsp; if(col_length == 0) {<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; printf(" filler") &gt;&gt; ctlfilename;<BR>&nbsp;&nbsp;&nbsp; }<BR>&nbsp; }</FONT></SPAN></DIV> <DIV>&nbsp;</DIV>
<DIV><SPAN class=154191307-16052003><FONT face="Courier New" size=2>&nbsp; # Finish the control file<BR>&nbsp; printf("\n)\n") &gt;&gt; ctlfilename;<BR>}<BR></FONT></SPAN></DIV> <DIV><FONT face=Tahoma size=2>-----Original Message-----<BR><B>From:</B> Paula_Stankus_at_doh.state.fl.us
[mailto:Paula_Stankus_at_doh.state.fl.us]<BR><B>Sent:</B> 15 May 2003 15:42<BR><B>To:</B> Multiple recipients of list ORACLE-L<BR><B>Subject:</B> RE: Guys - loading comma-delimited - dynamically generate table<BR><BR></FONT></DIV> <BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">   <P><FONT size=2>Am I dreaming, hoping?&nbsp; Wasn't it possible to take a   comma-dlt. text file and from the column list dynamically generate a table   then load the data.&nbsp; Hmmmm.&nbsp; I could write that pretty easily but   problem is wouldn't necessarily know datatypes and sizes.&nbsp; Okay - I guess   I am dreaming.&nbsp; </FONT></P>
  <P><FONT size=2>Data Warehouse builder versus data migration tool:</FONT> </P>   <P><FONT size=2>-when would you use one or the other?</FONT> <BR><FONT   size=2>-is the juice worth the squeeze or am I better off doing myself as I   have last 6 years</FONT> <BR><FONT size=2>-extra purchase - which one?   </FONT></P>
  <P><FONT size=2>Thanks,</FONT> <BR><FONT size=2>Paula</FONT> </P></BLOCKQUOTE><CODE><FONT SIZE=3><BR> <BR>
**********************************************************************<BR>
This message (including any attachments) is confidential and may be <BR> legally privileged. If you are not the intended recipient, you should <BR> not disclose, copy or use any part of it - please delete all copies <BR> immediately and notify the Hays Group Email Helpdesk at<BR> email.helpdesk_at_hays.com<BR>
Any information, statements or opinions contained in this message<BR> (including any attachments) are given by the author. They are not <BR> given on behalf of Hays unless subsequently confirmed by an individual<BR> other than the author who is duly authorised to represent Hays.<BR>  <BR>
A member of the Hays plc group of companies.<BR> Hays plc is registered in England and Wales number 2150950.<BR> Registered Office Hays House Millmead Guildford Surrey GU2 4HJ.<BR>
**********************************************************************<BR>
</FONT></CODE>
</BODY></HTML>

------_=_NextPart_001_01C31B7A.B8FDD9B0--

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Lord, David - CSG
  INET: david.lord_at_hays.com

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).
Received on Fri May 16 2003 - 03:17:09 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US