Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Extracting XML

Re: Extracting XML

From: Michael O'Shea <michael.oshea_at_tessella.com>
Date: 15 Mar 2007 04:56:17 -0700
Message-ID: <1173959777.258156.99080@e65g2000hsc.googlegroups.com>


On Mar 15, 8:51 am, Markus Schmidt <schmidt..._at_gmx.net> wrote:
> Hi,
>
> I have ~1100 XML files which should be stored in the 10gR1EE.
>
> The structure of a XML file is like this:
>
> <a val="1167919973">
> <b val="12345">
> <c val1="1167922800" val2="12.2" val3="515.3"/>
> </b>
> <b> ... </b>
> ...
> </a>
>
> (always 1 <a>, ~20 <b> and ~120 <c>)
>
> I created a table:
> CREATE TABLE myimport OF XMLTYPE;
> Then the files were imported with sqlldr.
> Until this point all works fine.
>
> But now I want to extract the information and copy it into another table
> (insert as select).
> I want to select all "rows" in the form
> (b.val, a.val, c.val1, c.val2, c.val3)
> At the end, there schould be ~2.6mio rows.
>
> I tried this SELECT statement:
>
> select extractvalue(value(t1), '/b/@val') b_val,
> extractvalue(value(p), '/a/@val') a_val,
> extractvalue(value(t2), '/c/@val1') c_val1,
> extractvalue(value(t2), '/c/@val2') c_val2,
> extractvalue(value(t2), '/c/@val3') c_val3
> from myimport p,
> table(xmlsequence(extract(value(p), '/a/b'))) t1,
> table(xmlsequence(extract(value(t1), '/b/c'))) t2
>
> This statement is veeeeeeery slow... and I don't know if it is correct...
>
> Questions:
> 1. I think there is no need for a join between p, t1 and t2 but I am not
> sure. Do I need it?
> 2. Is there a better way to select all values?
> 3. Should I create the table as a relational table, not as XMLType? How
> to import this?
>
> Thanks,
>
> Markus

Hi Markus, the SQL satisfies the request to "flatten" your XML (see script below).

> Questions:
> 1. I think there is no need for a join between p, t1 and
> t2 but I am not sure. Do I need it?

No

> 2. Is there a better way to select all values?

In SQL, no

> 3. Should I create the table as a relational table, not
> as XMLType?

If your source data is XML, you can flatten it within the database using SQL constructs not unlike that shown below. Alternatively you can flatten it prior to import into the database. For XML trees more complex than the one shown, I suggest you look at using a tool outside the database to effect this task as a) your "SQL" is likely to become quite messy with more complicated XML structures, and b) XML manipulation is likely to be slow and impose considerable load of the DBMS server.

> 3. How to import this?

Should you choose to flatten the XML outside the database, you have two mainstream options for its import - "sqlldr" or "external tables". You appear to already familiar with sqlldr. Information on the syntax and structure of external tables can be gleaned from the primary docs at tahiti.oracle.com. Daniel has also put together a simple usage example at http://www.psoug.org/reference/externaltab.html

Regards

Mike

TESSELLA Michael.OShea_at_tessella.com

__/__/__/  Tessella Support Services plc
__/__/__/  3 Vineyard Chambers, ABINGDON, OX14 3PX, England
__/__/__/  Tel: (44)(0)1235-555511  Fax: (44)(0)1235-553301
www.tessella.com Registered in England No. 1466429
SQL>
SQL>
SQL>

CREATE TABLE tblTest(v XMLTYPE);
SQL>
Table created.

SQL>
INSERT INTO tblTest(v) VALUES(XMLTYPE('
<a val="1167919973">

   <b val="12345">
     <c val1="x1.1val1" val2="x2.1val2" val3="x3.1val3"/>    </b>
   <b val="12346">
     <c val1="x1.2val1" val2="x2.2val2" val3="x3.2val3"/>    </b>
   <b val="12347">
     <c val1="x1.3val1" val2="x2.3val2" val3="x3.3val3"/>    </b>
</a>'));SQL> 2 3 4 5 6 7 8 9 10 11 12

1 row created.

SQL>
INSERT INTO tblTest(v) VALUES(XMLTYPE('
<a val="2234234">

   <b val="12348">
     <c val1="y1.1val1" val2="y2.1val2" val3="y3.1val3"/>    </b>
   <b val="12349">
     <c val1="y1.2val1" val2="y2.2val2" val3="y3.2val3"/>    </b>
</a>'));SQL> 2 3 4 5 6 7 8 9

1 row created.

SQL>
INSERT INTO tblTest(v) VALUES(XMLTYPE('
<a val="12332222">

   <b val="12310">
     <c val1="z1.1val1" val2="z2.1val2" val3="z3.1val3"/>    </b>
   <b val="12311">

     <c val1="z1.2val1a" val2="z2.2val2a" val3="z3.2val3a"/>
     <c val1="z1.2val1b" val2="z2.2val2b" val3="z3.2val3b"/>
     <c val1="z1.2val1c" val2="z2.2val2c" val3="z3.2val3c"/>
   </b>
   <b val="12312">
     <c val1="z1.3val1a" val2="z2.3val2a" val3="z3.3val3a"/>
     <c val1="z1.3val1b" val2="z2.3val2b" val3="z3.3val3b"/>
   </b>
   <b val="12313">
     <c val1="z1.4val1" val2="z2.4val2" val3="z3.4val3"/>    </b>
</a>'));
SQL> 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 row created.

SQL>
COLUMN vala FORMAT a10
COLUMN valb FORMAT a10
COLUMN valc1 FORMAT a10
COLUMN valc2 FORMAT a10
COLUMN valc3 FORMAT a10SQL> SQL> SQL> SQL> SQL> SQL>

SELECT EXTRACTVALUE(t0.v,'/a/@val') vala,
       EXTRACTVALUE(t1.COLUMN_VALUE,'/b/@val') valb,
       EXTRACTVALUE(t2.COLUMN_VALUE,'/c/@val1') valc1,
       EXTRACTVALUE(t2.COLUMN_VALUE,'/c/@val2') valc2,
       EXTRACTVALUE(t2.COLUMN_VALUE,'/c/@val3') valc3
   FROM tblTest t0,
        TABLE(XMLSEQUENCE(EXTRACT(t0.v, '/a/b'))) t1,
        TABLE(XMLSEQUENCE(EXTRACT(t1.COLUMN_VALUE, '/b/c'))) t2;
SQL>   2    3    4    5    6    7    8
VALA       VALB       VALC1      VALC2      VALC3
---------- ---------- ---------- ---------- ----------
1167919973 12345      x1.1val1   x2.1val2   x3.1val3
1167919973 12346      x1.2val1   x2.2val2   x3.2val3
1167919973 12347      x1.3val1   x2.3val2   x3.3val3
2234234    12348      y1.1val1   y2.1val2   y3.1val3
2234234    12349      y1.2val1   y2.2val2   y3.2val3
12332222   12310      z1.1val1   z2.1val2   z3.1val3
12332222   12311      z1.2val1a  z2.2val2a  z3.2val3a
12332222   12311      z1.2val1b  z2.2val2b  z3.2val3b
12332222   12311      z1.2val1c  z2.2val2c  z3.2val3c
12332222   12312      z1.3val1a  z2.3val2a  z3.3val3a
12332222   12312      z1.3val1b  z2.3val2b  z3.3val3b
12332222   12313      z1.4val1   z2.4val2   z3.4val3

12 rows selected.

SQL> SELECT *
 FROM V$VERSION;
  2
BANNER



Oracle Database 10g Express Edition Release 10.2.0.1.0 - Product PL/SQL Release 10.2.0.1.0 - Production
CORE 10.2.0.1.0 Production
TNS for Linux: Version 10.2.0.1.0 - Production NLSRTL Version 10.2.0.1.0 - Production

SQL> Received on Thu Mar 15 2007 - 06:56:17 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US