Home » SQL & PL/SQL » SQL & PL/SQL » REG_EXP csv and double quotes (Oracle 11g)
REG_EXP csv and double quotes [message #620035] |
Mon, 28 July 2014 03:45 |
wmgonzalbo
Messages: 98 Registered: November 2008
|
Member |
|
|
Hi OraFAQ,
I'm having issues with REGEXPs functions.
Basically I need to parse a CSV - easy. But I must not consider the comma/s (,) enclosed in a double quote.
Ex: 'a,"b, bb",,c,d,"e,e",f'
This should be parsed as:
a
b, bb <-- treated as one field
<null>
c
d
e,e <-- treated as one field
f
Currently this is what I got:
With T As
(SELECT 'a,"b, bb",,c,d,"e,e",f' str From Dual)
SELECT REGEXP_SUBSTR (str, '(([^,^\"])*(\".*\")*([^,^\"])*)(,|$)', 1, LEVEL) str1
From T
CONNECT BY LEVEL <= (select REGEXP_COUNT (str, '(([^,^\"])*(\".*\")*([^,^\"])*)(,|$)') - 1 from t);
-OR-
Using regular expressions, parsing a CSV, how can I replace comma instances inside the double quotes to a specific character?
Ex: 'a,"b, bb",,c,d,"e,e",f'
This should be parsed as:
a
b@ bb <-- (,) was replaced with @
<null>
c
d
e@e <-- (,) was replaced with @
f
Would really appreciate inputs. TIA.
Regards,
W
|
|
|
Re: REG_EXP csv and double quotes [message #620036 is a reply to message #620035] |
Mon, 28 July 2014 04:08 |
Lalit Kumar B
Messages: 3174 Registered: May 2013 Location: World Wide on the Web
|
Senior Member |
|
|
SQL> column newstr format A30;
SQL>
SQL> WITH t AS
2 (SELECT 'a,"b, bb",,c,d,"e,e",f,","' str FROM dual)
3 SELECT str, listagg(rep, '') within GROUP(ORDER BY n) newstr
4 FROM (SELECT str,
5 n,
6 CASE
7 WHEN (MOD(n, 2) = q) THEN
8 '"' || REPLACE(sub, ',', '@') || '"'
9 ELSE
10 sub
11 END rep
12 FROM (SELECT str,
13 m.column_value n,
14 CASE
15 WHEN regexp_like(str, '^"') THEN
16 1
17 ELSE
18 0
19 END q,
20 regexp_substr(str, '[^"]+', 1, m.column_value) sub
21 FROM t,
22 TABLE(CAST(MULTISET
23 (SELECT LEVEL
24 FROM dual
25 CONNECT BY LEVEL <= regexp_count(str, '[^"]+')) AS
26 sys.odcinumberlist)) m))
27 GROUP BY str
28 /
STR NEWSTR
-------------------------- ------------------------------
a,"b, bb",,c,d,"e,e",f,"," a,"b@ bb",,c,d,"e@e",f,"@"
More examples using MODEL clause, Recursion are in this OTN thread https://community.oracle.com/thread/2372168 You would need to replace blank with @ in the link.
[Updated on: Mon, 28 July 2014 04:09] Report message to a moderator
|
|
|
Re: REG_EXP csv and double quotes [message #620041 is a reply to message #620035] |
Mon, 28 July 2014 04:42 |
|
Littlefoot
Messages: 21808 Registered: June 2005 Location: Croatia, Europe
|
Senior Member Account Moderator |
|
|
This probably won't help much, but - my first thought was a simple SQL*Loader option. Here's a demonstration.
SQL> CREATE TABLE test
2 (
3 col1 VARCHAR2 (20),
4 col2 VARCHAR2 (20),
5 col3 VARCHAR2 (20),
6 col4 VARCHAR2 (20),
7 col5 VARCHAR2 (20),
8 col6 VARCHAR2 (20),
9 col7 VARCHAR2 (20)
10 );
Table created.
SQL>
A control file (along with sample record you posted):
load data
infile *
replace
into table test
fields terminated by ',' optionally enclosed by '"' trailing nullcols
(col1,
col2,
col3,
col4,
col5,
col6,
col7
)
begindata
a,"b, bb",,c,d,"e,e",f
Loading session & checking what we've done:
PS M:\a1_maknuto> sqlldr scott/tiger@ora10 control=test15.ctl log=test15.log
SQL*Loader: Release 11.2.0.1.0 - Production on Pon Srp 28 11:39:14 2014
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
Commit point reached - logical record count 1
PS M:\a1_maknuto> sqlplus scott/tiger@ora10
SQL*Plus: Release 11.2.0.3.0 Production on Mon Jul 28 11:39:27 2014
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> select * from test;
COL1 COL2 COL3 COL4 COL5 COL6 COL7
----- ----- ----- ----- ----- ----- -----
a b, bb c d e,e f
SQL>
See? Every value is nicely stored into its own column. The whole "magic" is in this simplefields terminated by ',' optionally enclosed by '"'
|
|
|
Re: REG_EXP csv and double quotes [message #620133 is a reply to message #620035] |
Tue, 29 July 2014 06:38 |
|
Michel Cadot
Messages: 68645 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
SQL> with
2 data as (select 'a,"b, m, bb",,c,d,"e,e",f' str from dual),
3 nb as (select level nb from data connect by level <= regexp_count(str,'[^"]+')),
4 vals as (
5 select decode(mod(nb,2),
6 1, regexp_substr(str, '[^"]+', 1, nb),
7 '"'||replace(regexp_substr(str, '[^"]+', 1, nb), ',', '@')||'"'
8 ) val,
9 nb
10 from data, nb
11 )
12 select listagg(val, '') within group (order by nb) newstr from vals
13 /
NEWSTR
-------------------------------------------------------------------------------------
a,"b@ m@ bb",,c,d,"e@e",f
|
|
|
|
Re: REG_EXP csv and double quotes [message #620201 is a reply to message #620191] |
Wed, 30 July 2014 00:59 |
_jum
Messages: 577 Registered: February 2008
|
Senior Member |
|
|
For the given string also works a simple regexp_replace:
WITH data AS (SELECT 'a,"b, m, bb",,c,d,"e,e",f' str FROM DUAL)
SELECT REGEXP_REPLACE (str, '(\"(.+?),(.*?)\")', '"\2@\3"') newstr
FROM data;
NEWSTR
-------------------------
a,"b@ m, bb",,c,d,"e@e",f
[Updated on: Wed, 30 July 2014 01:01] Report message to a moderator
|
|
|
|
|
Re: REG_EXP csv and double quotes [message #620228 is a reply to message #620133] |
Wed, 30 July 2014 06:54 |
Solomon Yakobson
Messages: 3273 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
This will work with three assumptions:
1. We can identify a character that can't appear in CSV string (@ in your example)
2. Empty element is never enclosed in double quotes
3. CSV string doesn't contain double quote character, since then, by CSV rules, double quote has to be escaped by another double quote
So your code will not work, for example, if string has empty element enclosed in double quotes
'a,"b, m, bb",,c,"",d,"e,e",f'
with data as (
select 'a,"b, m, bb",,c,"",d,"e,e",f' str from dual
),
nb as (select level nb from data connect by level <= regexp_count(str,'[^"]+')),
vals as (
select decode(mod(nb,2),
1, regexp_substr(str, '[^"]+', 1, nb),
'"'||replace(regexp_substr(str, '[^"]+', 1, nb), ',', '@')||'"'
) val,
nb
from data, nb
)
select listagg(val, '') within group (order by nb) newstr from vals
/
NEWSTR
----------------------------
a,"b@ m@ bb",,c,"@d@"e,e"@f"
SQL>
Or CSV contains double quote:
'a,"b, m, ""bb""",,c,,d,"e,e",f'
with data as (
select 'a,"b, m, ""bb""",,c,,d,"e,e",f' str from dual
),
nb as (select level nb from data connect by level <= regexp_count(str,'[^"]+')),
vals as (
select decode(mod(nb,2),
1, regexp_substr(str, '[^"]+', 1, nb),
'"'||replace(regexp_substr(str, '[^"]+', 1, nb), ',', '@')||'"'
) val,
nb
from data, nb
)
select listagg(val, '') within group (order by nb) newstr from vals
/
NEWSTR
----------------------------
a,"b@ m@ "bb"@@c@@d@"e,e"@f"
SQL>
Anyway, below is my suggestion:
with data as (
select 1 id,'a,"b, m, ""bb""",,c,"",d,"e,e",f' str from dual
),
t as (
select id,
str,
element,
rn,
case mod(
sum(
case
when element = '""' then 0
when element like '%"%' then 1
else 0
end
)
over(
partition by id
order by rn
rows between unbounded preceding and 1 preceding
),
2
)
when 0 then rn
else last_value(
case
when element like '"%' then rn
end
ignore nulls
)
over(partition by id order by rn)
end grp
from data,
xmltable(
'ora:tokenize($str,",")'
passing str as "str"
columns
element varchar2(4000) path '.',
rn for ordinality
)
)
select id,
str,
row_number() over(partition by id order by grp nulls first) element_number,
replace(
regexp_replace(
listagg(element,',') within group(order by rn),
'(^")|("$)'
),
'""',
'"'
) element
from t
group by id,
str,
grp
/
ID STR ELEMENT_NUMBER ELEMENT
---------- -------------------------------- -------------- --------------------
1 a,"b, m, ""bb""",,c,"",d,"e,e",f 1 a
1 a,"b, m, ""bb""",,c,"",d,"e,e",f 2 b, m, "bb"
1 a,"b, m, ""bb""",,c,"",d,"e,e",f 3
1 a,"b, m, ""bb""",,c,"",d,"e,e",f 4 c
1 a,"b, m, ""bb""",,c,"",d,"e,e",f 5
1 a,"b, m, ""bb""",,c,"",d,"e,e",f 6 d
1 a,"b, m, ""bb""",,c,"",d,"e,e",f 7 e,e
1 a,"b, m, ""bb""",,c,"",d,"e,e",f 8 f
8 rows selected.
SQL>
SY.
|
|
|
|
Re: REG_EXP csv and double quotes [message #620233 is a reply to message #620230] |
Wed, 30 July 2014 07:50 |
Solomon Yakobson
Messages: 3273 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
Lalit Kumar B wrote on Wed, 30 July 2014 08:06Even this http://www.orafaq.com/forum/mv/msg/193807/620036/#msg_620036 should work for all the three cases.
No, it doesn't:
with t as (
select 1 id,'a,"b, m, ""bb""",,c,"",d,"e,e",f' str from dual
)
SELECT str, listagg(rep, '') within GROUP(ORDER BY n) newstr
FROM (SELECT str,
n,
CASE
WHEN (MOD(n, 2) = q) THEN
'"' || REPLACE(sub, ',', '@') || '"'
ELSE
sub
END rep
FROM (SELECT str,
m.column_value n,
CASE
WHEN regexp_like(str, '^"') THEN
1
ELSE
0
END q,
regexp_substr(str, '[^"]+', 1, m.column_value) sub
FROM t,
TABLE(CAST(MULTISET
(SELECT LEVEL
FROM dual
CONNECT BY LEVEL <= regexp_count(str, '[^"]+')) AS
sys.odcinumberlist)) m))
GROUP BY str
/
The above returns
a,"b@ m@ "bb"@@c@",d,"e@e",f
while it should be
a,"b@ m@ "bb"",,c,"",d,"e@e",f
And then you still have to split it and remove enclosing double quotes.
SY.
|
|
|
|
Goto Forum:
Current Time: Fri Apr 26 12:32:54 CDT 2024
|