Home » Developer & Programmer » JDeveloper, Java & XML » Clob csv to xml performance issue (11g2 win32)
Clob csv to xml performance issue [message #476826] Sun, 26 September 2010 09:54 Go to next message
cikic
Messages: 12
Registered: September 2006
Location: Austria
Junior Member
Hi!

In my application the user is anabled to upload csv Files into a clob field. For further processing I want to use an existing XDB xml interface. And therefore I have to translate csv clob data into valid xml. I have written a PL/SQL Package working pretty fine on really small data, but usually useres upload files bigger than 1MB. The following package needs near an hour to do the job. Are there any Ideas to do the job better, faster, easier?

Thanks
Christian
CREATE OR REPLACE PACKAGE CSV2XML AS
  procedure translate(i_csv in clob, o_xml out clob);
  procedure test_translate;
  procedure setDelimiter(i_delim char);
  procedure setWindowsNl(i_wnl boolean);
  procedure setEscapeChar(i_char char);
  procedure setHasHeadline(i_hl boolean);
  procedure setEncloseChar(i_char char);
end CSV2XML;
/
CREATE OR REPLACE PACKAGE BODY CSV2XML AS
  delimiter char(1) := ',';
  is_windows_newline  boolean := true;
  escape_char         char(1) := '\';
  has_head_line       boolean := true;
  enclosed_char       char(1) := '"';
      
  procedure translate(i_csv in clob, o_xml out clob)
  is
    l_csv               clob;  
    actual              char(1);
    is_open_enclosed    boolean := false;
    is_escaped          boolean := false;
    is_first_line       boolean := true;
  begin
    -- use only unix newline
    if is_windows_newline then
      l_csv := replace(i_csv, chr(10)||chr(13),chr(10));
    else
      l_csv := i_csv;
    end if;
    
    -- init xml
    o_xml := '<table>';
    if has_head_line then 
      o_xml := o_xml || '<head>';
    else
      o_xml := o_xml || '<body>';
    end if;
    
    -- loop all characters in  csv file
    for i in 1 .. length(i_csv) loop
      actual := substr(i_csv,i,1);
      
      if i = 1 then
        o_xml := o_xml || '<tr><td>';
      end if;
      
      -- escape seqence
      if is_escaped then
        o_xml := o_xml || actual;
        is_escaped := false;
      elsif actual = escape_char then
        is_escaped := true;
      else
          -- Field enclose sequence
          if is_open_enclosed and actual != enclosed_char then
            o_xml := o_xml || actual;
          elsif actual = enclosed_char then
            if is_open_enclosed then
              is_open_enclosed := false;
            else
              is_open_enclosed := true;
            end if;
          else
              -- normal carachter processing
              if actual = delimiter then
                o_xml := o_xml || '</td><td>';
              elsif actual = chr(10) and is_first_line then
                is_first_line := false;
                if has_head_line then
                  o_xml := o_xml || '</td></tr></head><body><tr><td>';
                else
                  o_xml := o_xml || '<tr><td>';
                end if;
              elsif actual = chr(10) and i < length(i_csv) and not is_first_line then
                o_xml := o_xml || '</td></tr><tr><td>';
              elsif actual = chr(10) or i = length(i_csv) then
                if actual != chr(10) then
                  o_xml := o_xml || actual || '</td></tr>';
                else 
                  o_xml := o_xml || '</td></tr>';
                end if;
              else
                o_xml := o_xml || actual;
              end if;
          end if;
      end if;
    end loop;
    
    -- close tags
    o_xml := o_xml || '</body></table>';    
    
  end translate;

  procedure setDelimiter(i_delim char)
  is
  begin
    delimiter := i_delim;
  end setDelimiter;
  
  procedure setWindowsNl(i_wnl boolean)
  is
  begin
    is_windows_newline := i_wnl;
  end setwindowsnl;
  
  procedure setEscapeChar(i_char char)
  is
  begin
    escape_char := i_char;
  end setEscapeChar;
  
  procedure setHasHeadline(i_hl boolean)
  is
  begin
    has_head_line := i_hl;
  end setHasHeadline;
  
  procedure setEncloseChar(i_char char)
  is
  begin
    enclosed_char := i_char;
  end setEncloseChar;
  
  procedure test_translate
  is
    o_xml clob;
  begin
    csv2xml.translate(
       'a,b,c' || chr(10) || '1,2,"hello wolrd"' || chr(10) || '3,4,da'
      ,o_xml
    );
    dbms_output.put_line(o_xml);
  end test_translate;
end CSV2XML;
/



Re: Clob csv to xml performance issue [message #476828 is a reply to message #476826] Sun, 26 September 2010 10:02 Go to previous messageGo to next message
BlackSwan
Messages: 22790
Registered: January 2009
Senior Member
Excel can also generate XML data file (in addition to CSV file) which would simplify your task.
Re: Clob csv to xml performance issue [message #476865 is a reply to message #476828] Mon, 27 September 2010 00:40 Go to previous messageGo to next message
_jum
Messages: 490
Registered: February 2008
Senior Member
May be you can increase the perfomance by loading the data as external table and then access them via ORACLE XML interface.
Re: Clob csv to xml performance issue [message #477015 is a reply to message #476865] Mon, 27 September 2010 15:47 Go to previous messageGo to next message
cikic
Messages: 12
Registered: September 2006
Location: Austria
Junior Member
Yes I thougt about external tables, but my problem was/is, I do not know how the table will look like. Because I do not know anything of the csv file in the clob. But I have found out, that splitting the clob into lines of varchar2 and parsing the lines (char by char) to xml data is much more faster then parsing the whole clob in one piece. I got down to times between 1 min and 15 secs. This is good enough for the moment.

Thanks
Chris
Re: Clob csv to xml performance issue [message #477035 is a reply to message #477015] Tue, 28 September 2010 01:47 Go to previous messageGo to next message
Michel Cadot
Messages: 59142
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Quote:
But I have found out, that splitting the clob into lines of varchar2 and parsing the lines (char by char) to xml data is much more faster then parsing the whole clob in one piece

With external you don't have to split anything as each row already matches with a line of your file.

Regards
Michel
Re: Clob csv to xml performance issue [message #477059 is a reply to message #477015] Tue, 28 September 2010 03:47 Go to previous messageGo to next message
Michel Cadot
Messages: 59142
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Here's a starting point:
SQL> create table t (val varchar2(4000))
  2  organization external (
  3    type ORACLE_LOADER 
  4    default directory TMP
  5    access parameters (
  6      records delimited by newline
  7      fields terminated by x'0A'
  8    )
  9    location ('t.csv')
 10  )
 11  reject limit unlimited
 12  /

Table created.

SQL> select * from t;
VAL
-----------------------------------------------------------------------------
a,b,c
1,2,"hello wolrd"
3,4,da

3 rows selected.

SQL> col ord noprint
SQL> select 1 ord, '<body><table>' val from dual
  2  union all
  3  select 2, 
  4         '<tr><td>' || 
  5         trim('"' from regexp_replace(val,'"{0,1},"{0,1}','</td><td>')) ||
  6         '</td></tr>' val
  7  from t
  8  union all
  9  select 3 ord, '</table></body>' val from dual
 10  order by 1
 11  /
VAL
-----------------------------------------------------------------------------
<body><table>
<tr><td>a</td><td>b</td><td>c</td></tr>
<tr><td>1</td><td>2</td><td>hello wolrd</td></tr>
<tr><td>3</td><td>4</td><td>da</td></tr>
</table></body>

Regards
Michel
Re: Clob csv to xml performance issue [message #478001 is a reply to message #477059] Tue, 05 October 2010 10:34 Go to previous messageGo to next message
cikic
Messages: 12
Registered: September 2006
Location: Austria
Junior Member
Thanks! But as you can see, you have to know the stricture of the table in the create ddl. I do not know anything about the csv file. after parsing to xml some allready existing xslt interfaces are processing the fields. Anyway it is working now and the performance is ok.

Thanks
Chris
Re: Clob csv to xml performance issue [message #478006 is a reply to message #478001] Tue, 05 October 2010 10:48 Go to previous message
Michel Cadot
Messages: 59142
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Quote:
But as you can see, you have to know the stricture of the table in the create ddl.

Where do you see I used the structure of the table?
I used only 3 informations:
- each line of the file is a row
- cell/column values are separated by comma
- value may be enclosed between quotes

Regards
Michel

[Updated on: Tue, 05 October 2010 10:49]

Report message to a moderator

Previous Topic: error in simple hello world
Next Topic: Network Adaptyer Error
Goto Forum:
  


Current Time: Fri Sep 19 11:41:27 CDT 2014

Total time taken to generate the page: 0.09096 seconds