Home » SQL & PL/SQL » SQL & PL/SQL » REG_EXP csv and double quotes (Oracle 11g)
REG_EXP csv and double quotes [message #620035] Mon, 28 July 2014 03:45 Go to next message
wmgonzalbo
Messages: 98
Registered: November 2008
Member
Hi OraFAQ,

I'm having issues with REGEXPs functions.

Basically I need to parse a CSV - easy. But I must not consider the comma/s (,) enclosed in a double quote.

Ex: 'a,"b, bb",,c,d,"e,e",f'

This should be parsed as:
a
b, bb <-- treated as one field
<null>
c
d
e,e <-- treated as one field
f

Currently this is what I got:
With T As
  (SELECT 'a,"b, bb",,c,d,"e,e",f' str From Dual)
SELECT REGEXP_SUBSTR (str, '(([^,^\"])*(\".*\")*([^,^\"])*)(,|$)', 1, LEVEL) str1
From T
CONNECT BY LEVEL <= (select REGEXP_COUNT (str, '(([^,^\"])*(\".*\")*([^,^\"])*)(,|$)') - 1 from t);


-OR-

Using regular expressions, parsing a CSV, how can I replace comma instances inside the double quotes to a specific character?

Ex: 'a,"b, bb",,c,d,"e,e",f'

This should be parsed as:
a
b@ bb <-- (,) was replaced with @
<null>
c
d
e@e <-- (,) was replaced with @
f


Would really appreciate inputs. TIA.

Regards,
W
Re: REG_EXP csv and double quotes [message #620036 is a reply to message #620035] Mon, 28 July 2014 04:08 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
SQL> column newstr format A30;
SQL> 
SQL> WITH t AS
  2   (SELECT 'a,"b, bb",,c,d,"e,e",f,","' str FROM dual)
  3  SELECT str, listagg(rep, '') within GROUP(ORDER BY n) newstr
  4    FROM (SELECT str,
  5                 n,
  6                 CASE
  7                    WHEN (MOD(n, 2) = q) THEN
  8                     '"' || REPLACE(sub, ',', '@') || '"'
  9                    ELSE
 10                     sub
 11                 END rep
 12            FROM (SELECT str,
 13                         m.column_value n,
 14                         CASE
 15                            WHEN regexp_like(str, '^"') THEN
 16                             1
 17                            ELSE
 18                             0
 19                         END q,
 20                         regexp_substr(str, '[^"]+', 1, m.column_value) sub
 21                    FROM t,
 22                         TABLE(CAST(MULTISET
 23                                    (SELECT LEVEL
 24                                       FROM dual
 25                                     CONNECT BY LEVEL <= regexp_count(str, '[^"]+')) AS
 26                                    sys.odcinumberlist)) m))
 27   GROUP BY str
 28  /
STR                        NEWSTR
-------------------------- ------------------------------
a,"b, bb",,c,d,"e,e",f,"," a,"b@ bb",,c,d,"e@e",f,"@"


More examples using MODEL clause, Recursion are in this OTN thread https://community.oracle.com/thread/2372168 You would need to replace blank with @ in the link.

[Updated on: Mon, 28 July 2014 04:09]

Report message to a moderator

Re: REG_EXP csv and double quotes [message #620041 is a reply to message #620035] Mon, 28 July 2014 04:42 Go to previous messageGo to next message
Littlefoot
Messages: 21808
Registered: June 2005
Location: Croatia, Europe
Senior Member
Account Moderator
This probably won't help much, but - my first thought was a simple SQL*Loader option. Here's a demonstration.
SQL> CREATE TABLE test
  2  (
  3     col1   VARCHAR2 (20),
  4     col2   VARCHAR2 (20),
  5     col3   VARCHAR2 (20),
  6     col4   VARCHAR2 (20),
  7     col5   VARCHAR2 (20),
  8     col6   VARCHAR2 (20),
  9     col7   VARCHAR2 (20)
 10  );

Table created.

SQL>

A control file (along with sample record you posted):
load data
infile *
replace
into table test
fields terminated by ',' optionally enclosed by '"' trailing nullcols
(col1,
 col2,
 col3,
 col4,
 col5,
 col6,
 col7
 )
 
begindata
a,"b, bb",,c,d,"e,e",f

Loading session & checking what we've done:
PS M:\a1_maknuto> sqlldr scott/tiger@ora10 control=test15.ctl log=test15.log

SQL*Loader: Release 11.2.0.1.0 - Production on Pon Srp 28 11:39:14 2014

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

Commit point reached - logical record count 1
PS M:\a1_maknuto> sqlplus scott/tiger@ora10

SQL*Plus: Release 11.2.0.3.0 Production on Mon Jul 28 11:39:27 2014

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> select * from test;

COL1  COL2  COL3  COL4  COL5  COL6  COL7
----- ----- ----- ----- ----- ----- -----
a     b, bb       c     d     e,e   f

SQL>


See? Every value is nicely stored into its own column. The whole "magic" is in this simple
fields terminated by ',' optionally enclosed by '"'
Re: REG_EXP csv and double quotes [message #620133 is a reply to message #620035] Tue, 29 July 2014 06:38 Go to previous messageGo to next message
Michel Cadot
Messages: 68645
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

SQL> with
  2    data    as (select 'a,"b, m, bb",,c,d,"e,e",f' str from dual),
  3    nb      as (select level nb from data connect by level <= regexp_count(str,'[^"]+')),
  4    vals    as (
  5      select decode(mod(nb,2),
  6                    1, regexp_substr(str, '[^"]+', 1, nb),
  7                    '"'||replace(regexp_substr(str, '[^"]+', 1, nb), ',', '@')||'"'
  8                   ) val,
  9             nb
 10      from data, nb
 11    )
 12  select listagg(val, '') within group (order by nb) newstr from vals
 13  /
NEWSTR
-------------------------------------------------------------------------------------
a,"b@ m@ bb",,c,d,"e@e",f

Re: REG_EXP csv and double quotes [message #620191 is a reply to message #620133] Tue, 29 July 2014 20:29 Go to previous messageGo to next message
wmgonzalbo
Messages: 98
Registered: November 2008
Member
Thank you for the inputs Lalit Kumar B, Littlefoot and Michel Cadot! They were very helpful.
Re: REG_EXP csv and double quotes [message #620201 is a reply to message #620191] Wed, 30 July 2014 00:59 Go to previous messageGo to next message
_jum
Messages: 577
Registered: February 2008
Senior Member
For the given string also works a simple regexp_replace:

WITH data AS (SELECT 'a,"b, m, bb",,c,d,"e,e",f' str FROM DUAL)
SELECT REGEXP_REPLACE (str, '(\"(.+?),(.*?)\")', '"\2@\3"') newstr
  FROM data;

NEWSTR                   
-------------------------
a,"b@ m, bb",,c,d,"e@e",f

[Updated on: Wed, 30 July 2014 01:01]

Report message to a moderator

Re: REG_EXP csv and double quotes [message #620206 is a reply to message #620201] Wed, 30 July 2014 01:11 Go to previous messageGo to next message
Michel Cadot
Messages: 68645
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

No, it does work: "b@ m, bb" should be "b@ m@ bb"

Re: REG_EXP csv and double quotes [message #620215 is a reply to message #620206] Wed, 30 July 2014 03:22 Go to previous messageGo to next message
_jum
Messages: 577
Registered: February 2008
Senior Member
You have sharp eyes indeed - I overlooked this little differenc - Thanks.
Re: REG_EXP csv and double quotes [message #620228 is a reply to message #620133] Wed, 30 July 2014 06:54 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
This will work with three assumptions:

1. We can identify a character that can't appear in CSV string (@ in your example)
2. Empty element is never enclosed in double quotes
3. CSV string doesn't contain double quote character, since then, by CSV rules, double quote has to be escaped by another double quote

So your code will not work, for example, if string has empty element enclosed in double quotes

'a,"b, m, bb",,c,"",d,"e,e",f'

with data as (
               select 'a,"b, m, bb",,c,"",d,"e,e",f' str from dual
              ),
  nb      as (select level nb from data connect by level <= regexp_count(str,'[^"]+')),
  vals    as (
    select decode(mod(nb,2),
                  1, regexp_substr(str, '[^"]+', 1, nb),
                  '"'||replace(regexp_substr(str, '[^"]+', 1, nb), ',', '@')||'"'
                 ) val,
           nb
     from data, nb
   )
select listagg(val, '') within group (order by nb) newstr from vals
/

NEWSTR
----------------------------
a,"b@ m@ bb",,c,"@d@"e,e"@f"

SQL> 


Or CSV contains double quote:

'a,"b, m, ""bb""",,c,,d,"e,e",f'

with data as (
               select 'a,"b, m, ""bb""",,c,,d,"e,e",f' str from dual
              ),
  nb      as (select level nb from data connect by level <= regexp_count(str,'[^"]+')),
  vals    as (
    select decode(mod(nb,2),
                  1, regexp_substr(str, '[^"]+', 1, nb),
                  '"'||replace(regexp_substr(str, '[^"]+', 1, nb), ',', '@')||'"'
                 ) val,
           nb
     from data, nb
   )
select listagg(val, '') within group (order by nb) newstr from vals
/

NEWSTR
----------------------------
a,"b@ m@ "bb"@@c@@d@"e,e"@f"

SQL> 


Anyway, below is my suggestion:

with data as (
               select 1 id,'a,"b, m, ""bb""",,c,"",d,"e,e",f' str from dual
              ),
         t as (
               select  id,
                       str,
                       element,
                       rn,
                       case mod(
                                sum(
                                    case
                                      when element = '""' then 0
                                      when element like '%"%' then 1
                                      else 0
                                    end
                                   )
                                  over(
                                       partition by id
                                       order by rn
                                       rows between unbounded preceding and 1 preceding
                                      ),
                                2
                               )
                         when 0 then rn
                         else last_value(
                                         case
                                           when element like '"%' then rn
                                         end
                                         ignore nulls
                                        )
                                over(partition by id order by rn)
                       end grp
                 from  data,
                       xmltable(
                                'ora:tokenize($str,",")'
                                passing str as "str"
                                columns
                                  element varchar2(4000) path '.',
                                  rn for ordinality
                               )
             )
select  id,
        str,
        row_number() over(partition by id order by grp nulls first) element_number,
        replace(
                regexp_replace(
                               listagg(element,',') within group(order by rn),
                               '(^")|("$)'
                              ),
                '""',
                '"'
               ) element
  from  t
  group by id,
           str,
           grp
/

        ID STR                              ELEMENT_NUMBER ELEMENT
---------- -------------------------------- -------------- --------------------
         1 a,"b, m, ""bb""",,c,"",d,"e,e",f              1 a
         1 a,"b, m, ""bb""",,c,"",d,"e,e",f              2 b, m, "bb"
         1 a,"b, m, ""bb""",,c,"",d,"e,e",f              3
         1 a,"b, m, ""bb""",,c,"",d,"e,e",f              4 c
         1 a,"b, m, ""bb""",,c,"",d,"e,e",f              5
         1 a,"b, m, ""bb""",,c,"",d,"e,e",f              6 d
         1 a,"b, m, ""bb""",,c,"",d,"e,e",f              7 e,e
         1 a,"b, m, ""bb""",,c,"",d,"e,e",f              8 f

8 rows selected.

SQL> 


SY.
Re: REG_EXP csv and double quotes [message #620230 is a reply to message #620228] Wed, 30 July 2014 07:06 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
Even this http://www.orafaq.com/forum/mv/msg/193807/620036/#msg_620036 should work for all the three cases. Not tested which one is better in performance though.
Re: REG_EXP csv and double quotes [message #620233 is a reply to message #620230] Wed, 30 July 2014 07:50 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
Lalit Kumar B wrote on Wed, 30 July 2014 08:06
Even this http://www.orafaq.com/forum/mv/msg/193807/620036/#msg_620036 should work for all the three cases.



No, it doesn't:

with t as (
               select 1 id,'a,"b, m, ""bb""",,c,"",d,"e,e",f' str from dual
              )
SELECT str, listagg(rep, '') within GROUP(ORDER BY n) newstr
  FROM (SELECT str,
              n,
               CASE
                 WHEN (MOD(n, 2) = q) THEN
                   '"' || REPLACE(sub, ',', '@') || '"'
                  ELSE
                   sub
               END rep
          FROM (SELECT str,
                       m.column_value n,
                       CASE
                          WHEN regexp_like(str, '^"') THEN
                           1
                          ELSE
                           0
                       END q,
                       regexp_substr(str, '[^"]+', 1, m.column_value) sub
                  FROM t,
                       TABLE(CAST(MULTISET
                                  (SELECT LEVEL
                                     FROM dual
                                   CONNECT BY LEVEL <= regexp_count(str, '[^"]+')) AS
                                  sys.odcinumberlist)) m))
 GROUP BY str
/


The above returns

a,"b@ m@ "bb"@@c@",d,"e@e",f

while it should be

a,"b@ m@ "bb"",,c,"",d,"e@e",f

And then you still have to split it and remove enclosing double quotes.

SY.
Re: REG_EXP csv and double quotes [message #620236 is a reply to message #620233] Wed, 30 July 2014 08:15 Go to previous message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
Solomon Yakobson wrote on Wed, 30 July 2014 18:20

And then you still have to split it and remove enclosing double quotes.


Agreed. Split, check and remove. More workaround.
Previous Topic: Using REGEXP_SUBSTR
Next Topic: Oracle server address
Goto Forum:
  


Current Time: Fri Apr 26 12:32:54 CDT 2024