Home » SQL & PL/SQL » SQL & PL/SQL » REGEXP_SUBSTR matching question (Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit)
REGEXP_SUBSTR matching question [message #574764] Tue, 15 January 2013 12:36 Go to next message
lgiguere
Messages: 6
Registered: January 2013
Location: Montreal, Quebec
Junior Member
Hi,

I've been using SQL since 3 months or so. I try to match different parts of a string delimited by '$$' followed by a letter, but can't get the result I'm looking for.

Is there away with the REGEXP_SUBSTR function to get, with different patterns:

$$aSingapore ;$$aHackensack, N.J. :
or
$$bWorld Scientific Pub. Co.,
or
$$cc2011.


Example of a string
$$aSingapore ;$$aHackensack, N.J. :$$bWorld Scientific Pub. Co.,$$cc2011.


Examples of regexp on the strings I tried
-1-
SELECT REGEXP_SUBSTR(z00r_text, '(\$\$a.*\$\$a)*') from z00r where substr(z00r_field_code,1,3) = '260' and rownum < 11;

REGEXP_SUBSTR(Z00R_TEXT,'(\$\$A.*\$\$A)*')
--------------------------------------------------------------------------------
$$aSingapore ;$$a


-2-
SELECT REGEXP_SUBSTR(z00r_text,'(\$\$a(.*)\$)*(\$\$)*') || ' ' || REGEXP_SUBSTR(z00r_text,'(\$\$c(.*)\$)*(\$\$)*') from z00r where substr(z00r_field_code,1,3) = '260' and rownum < 11;

REGEXP_SUBSTR(Z00R_TEXT,'(\$\$A(.*)\$)*(\$\$)*')||''||REGEXP_SUBSTR(Z00R_TEXT,'(
--------------------------------------------------------------------------------
$$aSingapore ;$$aHackensack, N.J. :$$bWorld Scientific Pub. Co.,$$ $$


-3-
SELECT REGEXP_SUBSTR(z00r_text,'(\$\$a.*\$\$[a^b-z]{1})',1,1,'i') from z00r where substr(z00r_field_code,1,3) = '260' and rownum < 11;

REGEXP_SUBSTR(Z00R_TEXT,'(\$\$A.*\$\$[B-Z]{1})',1,1,'I')
--------------------------------------------------------------------------------
$$aSingapore ;$$aHackensack, N.J. :$$bWorld Scientific Pub. Co.,$$c


Thanks a lot,
Luc Giguere
Re: REGEXP_SUBSTR matching question [message #574765 is a reply to message #574764] Tue, 15 January 2013 12:38 Go to previous messageGo to next message
BlackSwan
Messages: 22905
Registered: January 2009
Senior Member
Please read and follow the forum guidelines, to enable us to help you:

http://www.orafaq.com/forum/t/88153/0/
Re: REGEXP_SUBSTR matching question [message #574768 is a reply to message #574764] Tue, 15 January 2013 13:01 Go to previous messageGo to next message
Michel Cadot
Messages: 59421
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Here's a first step:
SQL> with 
  2    data as (
  3      select '$$aSingapore ;$$aHackensack, N.J. :$$bWorld Scientific Pub. Co.,$$cc2011.' val from dual
  4    ),
  5    lines as (
  6      select level line from data connect by level <= (length(val)-length(replace(val,'$')))/2
  7    )
  8  select ltrim(regexp_substr(val,'\$\$[^$]*', 1, line), '$') val
  9  from data, lines
 10  order by line
 11  /
VAL
-------------------------------------------------------------------------
aSingapore ;
aHackensack, N.J. :
bWorld Scientific Pub. Co.,
cc2011.

4 rows selected.


Please read OraFAQ Forum Guide and How to use [code] tags and make your code easier to read.
Make sure that lines of code do not exceed 80 characters when you format.
Indent the code, use code tags and align the columns in result.
Use the "Preview Message" or "Preview Quick Reply" button to verify.

If you post a working Test case: create table and insert statements along with the result you want with these data then we will work with your table and data.

Regards
Michel
Re: REGEXP_SUBSTR matching question [message #575062 is a reply to message #574768] Fri, 18 January 2013 13:25 Go to previous messageGo to next message
lgiguere
Messages: 6
Registered: January 2013
Location: Montreal, Quebec
Junior Member
Thank you Michel,

I read the Forum guide and I'll try to write a more complete and detailed post.
I was able to understand your WITH query by searching on the web and was able to edit it so it works with our
database. But I can't figured out 'from data connect by level'.



udm01@ALEPH20> SELECT * FROM V$VERSION;
**** Hit return to continue ****

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
PL/SQL Release 11.1.0.7.0 - Production
CORE 11.1.0.7.0 Production
TNS for Linux: Version 11.1.0.7.0 - Production
NLSRTL Version 11.1.0.7.0 - Production

5 rows selected.


udm01@ALEPH20> with
data as (
select z00r_text val from z00r where z00r_doc_number = '002057669' and substr(z00r_field_code,1,3) = '260'
),
lines as (
select level line from data connect by level <= (length(val)-length(replace(val,'$')))/2
)
select ltrim(regexp_substr(val,'\$\$[^$]*', 1, line), '$') val
from data, lines
order by line
/
**** Hit return to continue ****

VAL
--------------------------------------------------------------------------------
aSingapore ;
aHackensack, N.J. :
bWorld Scientific Pub. Co.,
cc2011.

4 rows selected.


The 'z00r' file contains 2100204 bibliographic records detailed in 55468141 rows.

udm01@ALEPH20> descr z00r;
Name Null? Type
----------------------------------------- -------- ----------------------------
Z00R_SEQUENCE NOT NULL CHAR(6)
Z00R_DOC_NUMBER NOT NULL CHAR(9)
Z00R_FIELD_CODE CHAR(5)
Z00R_ALPHA CHAR(1)
Z00R_TEXT VARCHAR2(2000)



udm01@ALEPH20> select count(distinct z00r_doc_number) from z00r;
**** Hit return to continue ****

COUNT(DISTINCTZ00R_DOC_NUMBER)
------------------------------
2100204

1 row selected.

udm01@ALEPH20> select count(*) from z00r;
**** Hit return to continue ****

COUNT(*)
----------
55468141

1 row selected.

Here is a sample bibliographic record.

select * from z00r where z00r_doc_number = '001116717';

udm01@ALEPH20> select * from z00r where z00r_doc_number = '001116717';
**** Hit return to continue ****

Z00R_S Z00R_DOC_ Z00R_ Z
------ --------- ----- -
Z00R_TEXT
--------------------------------------------------------------------------------
000001 001116717 FMT L
BK

000002 001116717 LDR L
^^^^^nam^^2200373^a^45U0

000003 001116717 001 L
001116717

000004 001116717 003 L
CaQMU

000005 001116717 005 L
20080804160251.0

000006 001116717 006 L
m^^^^^^^^d^^^^^^^^

000007 001116717 007 L
cr^cnu

000008 001116717 008 L
070703s2005^^^^caua^^^^sb^^^^001^0^eng^d

000009 001116717 020 L
$$z0123693799 (pbk. : acid-free paper)

000010 001116717 020 L
$$z9780123693792

000011 001116717 0359 L
$$aCaQMUb11767246

000012 001116717 035 L
$$a(CtWfDGI)bks00017820

000013 001116717 040 L
$$aCtWfDGI$$bfre$$cCtWfDGI$$dCaQMU

000014 001116717 05014 L
$$aQA76.73.S67$$bC44 2005eb

000015 001116717 08204 L
$$a005.13/3$$222

000016 001116717 090 L
$$aRessource électronique$$bW3$$sMEL

000017 001116717 1001 L
$$aCelko, Joe.

000018 001116717 24010 L
$$aSQL for smarties

000019 001116717 24510 L
$$aJoe Celko's SQL for smarties$$h[ressource électronique] :$$badvanced SQL prog
ramming, third edition /$$cJoe Celko.

000020 001116717 24630 L
$$aSQL for smarties

000021 001116717 250 L
$$a3rd ed.

000022 001116717 260 L
$$aSan Francisco :$$bMorgan Kaufmann,$$cc2005.

000023 001116717 533 L
$$aReproduction électronique.$$b[Norwood, Mass.] :$$cBooks24x7,$$d[2005-2006?].

000024 001116717 506 L
$$aRéservé UdeM.

000025 001116717 500 L
$$aIn IT Pro / ITPro Collection.

000026 001116717 538 L
$$aMode d'accès : disponible sur Internet.

000027 001116717 650 0 L
$$aSQL (Computer program language)

000028 001116717 650 6 L
$$aSQL (Langage de programmation)

000029 001116717 7102 L
$$aBooks24x7, Inc.

000030 001116717 85640 L
$$uwww.books24x7.com/marc.asp?bookid=17820$$zAccès réservé UdeM. Accès il
limité par adresse IP et mot de passe individuel. Inscription nécessaire avec vo
tre adresse de courriel de l'Université (umontreal.ca), un login et un mot de pa
sse vous seront ensuite acheminés par courriel.

000031 001116717 830 0 L
$$aIT Pro

000032 001116717 039 L
$$fCG

000033 001116717 CAT L
$$aCONV$$b20$$c20111219$$lUDM01$$h1210


33 rows selected.


There's a row for each bibliographic field (z00r_field_code) of a bibliographic record.

What I would like to be able to do is to find row of records that match and to display the subfield ($$ parts of z00r_text) information.

For example, finding records
with the string 'SQL' in the title ( the row where substr(z00r_field_code,1,3)=245)
and with the string 'ITPro Collection' in the notes (any of the rows where substr(z00r_field_code,1,1) = 5 )
and with the string 'MEL' in the bibliographic subfield $$s of the row where substr(z00r_field_code,1,3)=090)

A display result could be the following and taken from these rows:


000016 001116717 090 L
$$aRessource électronique$$bW3$$sMEL

000019 001116717 24510 L
$$aJoe Celko's SQL for smarties$$h[ressource électronique] :$$badvanced SQL prog
ramming, third edition /$$cJoe Celko.

000022 001116717 260 L
$$aSan Francisco :$$bMorgan Kaufmann,$$cc2005.

000023 001116717 533 L
$$aReproduction électronique.$$b[Norwood, Mass.] :$$cBooks24x7,$$d[2005-2006?].

000024 001116717 506 L
$$aRéservé UdeM.

000025 001116717 500 L
$$aIn IT Pro / ITPro Collection.

000026 001116717 538 L
$$aMode d'accès : disponible sur Internet.


Expected result:

Z00R_DOC_N TITLE FORMAT CONDITIONS PUBLISHER
---------- ------------------------------------------------------------------------------- ------- ----------------- -------------------------------
001116717 Joe Celko's SQL for smarties : advanced SQL programming, third edition MEL $$aRéservé UdeM. Morgan Kaufmann : San Francisco

TITLE is 245$$a $$b
FORMAT is 090$$s
CONDITIONS is 506$$a
PUBLISHER is 260$$b $$a

Could this be possible?
Can the diacritics be converted?
'Réservé UdeM' to 'Réservé UdeM.'


I tried this but it strips the diacritics.


udm01@ALEPH20> select convert(z00r_text, 'US7ASCII','UTF8') from z00r where z00r_doc_number = '001116717' and substr(z00r_field_code,1,3) = '506';
**** Hit return to continue ****

CONVERT(Z00R_TEXT,'US7ASCII','UTF8')
--------------------------------------------------------------------------------
$$aReserve UdeM.

1 row selected.


Thank you very much,
Luc
Re: REGEXP_SUBSTR matching question [message #575065 is a reply to message #575062] Fri, 18 January 2013 14:30 Go to previous messageGo to next message
Michel Cadot
Messages: 59421
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Quote:
I read the Forum guide


You didn't read neither read both links nor the lines I posted about formatting.
Please do so, don't you see a difference in reading my posted code and yours?

Regards
Michel
Re: REGEXP_SUBSTR matching question [message #575066 is a reply to message #575065] Fri, 18 January 2013 14:50 Go to previous messageGo to next message
lgiguere
Messages: 6
Registered: January 2013
Location: Montreal, Quebec
Junior Member
Thank you Michel,

I hope this is formatted OK.
I was able to understand your WITH query by searching on the web
and was able to edit it so it works with our
database. But I can't figured out 'from data connect by level'.


udm01@ALEPH20> with 
data as (
select z00r_text val from z00r where z00r_doc_number = '002057669' and substr(z00r_field_code,1,3) = '260'
),
lines as (
select level line from data connect by level <= (length(val)-length(replace(val,'$')))/2
)
select ltrim(regexp_substr(val,'\$\$[^$]*', 1, line), '$') val
from data, lines
order by line
/  
**** Hit return to continue ****

VAL
--------------------------------------------------------------------------------
aSingapore ;
aHackensack, N.J. :
bWorld Scientific Pub. Co.,
cc2011.

4 rows selected.


The 'z00r' file contains 2100204 bibliographic records detailed in 55468141 rows.

udm01@ALEPH20> descr z00r;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 Z00R_SEQUENCE                             NOT NULL CHAR(6)
 Z00R_DOC_NUMBER                           NOT NULL CHAR(9)
 Z00R_FIELD_CODE                                    CHAR(5)
 Z00R_ALPHA                                         CHAR(1)
 Z00R_TEXT                                          VARCHAR2(2000)



udm01@ALEPH20> select count(distinct z00r_doc_number) from z00r;
**** Hit return to continue ****

COUNT(DISTINCTZ00R_DOC_NUMBER)
------------------------------
                       2100204

1 row selected.



udm01@ALEPH20> select count(*) from z00r;
**** Hit return to continue ****

  COUNT(*)
----------
  55468141

1 row selected.


Here is a sample bibliographic record.

udm01@ALEPH20> select * from z00r where z00r_doc_number = '001116717';
**** Hit return to continue ****

Z00R_S  Z00R_DOC_  Z00R_  Z
------  ---------  -----  -
Z00R_TEXT
--------------------------------------------------------------------------------
000001  001116717  FMT    L
BK

000002  001116717  LDR    L
^^^^^nam^^2200373^a^45U0

000003  001116717  001    L
001116717

000004  001116717  003    L
CaQMU

000005  001116717  005    L
20080804160251.0

000006  001116717  006    L
m^^^^^^^^d^^^^^^^^

000007  001116717  007    L
cr^cnu

000008  001116717  008    L
070703s2005^^^^caua^^^^sb^^^^001^0^eng^d

000009  001116717  020    L
$$z0123693799 (pbk. : acid-free paper)

000010  001116717  020    L
$$z9780123693792

000011  001116717  0359   L
$$aCaQMUb11767246

000012  001116717  035    L
$$a(CtWfDGI)bks00017820

000013  001116717  040    L
$$aCtWfDGI$$bfre$$cCtWfDGI$$dCaQMU

000014  001116717  05014  L
$$aQA76.73.S67$$bC44 2005eb

000015  001116717  08204  L
$$a005.13/3$$222

000016  001116717  090    L
$$aRessource électronique$$bW3$$sMEL

000017  001116717  1001   L
$$aCelko, Joe.

000018  001116717  24010  L
$$aSQL for smarties

000019  001116717  24510  L
$$aJoe Celko's SQL for smarties$$h[ressource électronique] :$$badvanced SQL prog
ramming, third edition /$$cJoe Celko.

000020  001116717  24630  L
$$aSQL for smarties

000021  001116717  250    L
$$a3rd ed.

000022  001116717  260    L
$$aSan Francisco :$$bMorgan Kaufmann,$$cc2005.

000023  001116717  533    L
$$aReproduction électronique.$$b[Norwood, Mass.] :$$cBooks24x7,$$d[2005-2006?].

000024  001116717  506    L
$$aRéservé UdeM.

000025  001116717  500    L
$$aIn IT Pro / ITPro Collection.

000026  001116717  538    L
$$aMode d'accès : disponible sur Internet.

000027  001116717  650 0  L
$$aSQL (Computer program language)

000028  001116717  650 6  L
$$aSQL (Langage de programmation)

000029  001116717  7102   L
$$aBooks24x7, Inc.

000030  001116717  85640  L
$$uwww.books24x7.com/marc.asp?bookid=17820$$zAccès réservé UdeM. Accès il
limité par adresse IP et mot de passe individuel. Inscription nécessaire avec vo
tre adresse de courriel de l'Université (umontreal.ca), un login et un mot de pa
sse vous seront ensuite acheminés par courriel.

000031  001116717  830 0  L
$$aIT Pro

000032  001116717  039    L
$$fCG

000033  001116717  CAT    L
$$aCONV$$b20$$c20111219$$lUDM01$$h1210


33 rows selected.



There's a row for each bibliographic field (z00r_field_code) of a bibliographic record.

What I would like to be able to do is to find row of records that match and to display the subfield ($$ parts of z00r_text) information.

For example, finding records
with the string 'SQL' in the title ( the row where substr(z00r_field_code,1,3)=245)
and with the string 'ITPro Collection' in the notes (any of the rows where substr(z00r_field_code,1,1) = 5 )
and with the string 'MEL' in the bibliographic subfield $$s of the row where substr(z00r_field_code,1,3)=090)

A display result could be the following and taken from these rows:

000016  001116717  090    L
$$aRessource électronique$$bW3$$sMEL

000019  001116717  24510  L
$$aJoe Celko's SQL for smarties$$h[ressource électronique] :$$badvanced SQL prog
ramming, third edition /$$cJoe Celko.

000022  001116717  260    L
$$aSan Francisco :$$bMorgan Kaufmann,$$cc2005.

000023  001116717  533    L
$$aReproduction électronique.$$b[Norwood, Mass.] :$$cBooks24x7,$$d[2005-2006?].

000024  001116717  506    L
$$aRéservé UdeM.

000025  001116717  500    L
$$aIn IT Pro / ITPro Collection.

000026  001116717  538    L
$$aMode d'accès : disponible sur Internet.



Result:

Z00R_DOC_N TITLE                                                                           FORMAT  CONDITIONS        PUBLISHER
---------- ------------------------------------------------------------------------------- ------- ----------------- -------------------------------
001116717  Joe Celko's SQL for smarties : advanced SQL programming, third edition          MEL     $$aRéservé UdeM.  Morgan Kaufmann : San Francisco


TITLE is 245$$a $$b
FORMAT is 090$$s
CONDITIONS is 506$$a
PUBLISHER is 260$$b $$a

Could this be possible?


Thank you very much,
Luc
Re: REGEXP_SUBSTR matching question [message #575067 is a reply to message #575066] Fri, 18 January 2013 15:27 Go to previous messageGo to next message
Michel Cadot
Messages: 59421
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Michel Cadot wrote on Tue, 15 January 2013 20:01
...If you post a working Test case: create table and insert statements along with the result you want with these data then we will work with your table and data.

Regards
Michel


Re: REGEXP_SUBSTR matching question [message #576421 is a reply to message #575067] Mon, 04 February 2013 15:47 Go to previous messageGo to next message
lgiguere
Messages: 6
Registered: January 2013
Location: Montreal, Quebec
Junior Member
Hi,

I tried to simplify as much as I could my problem to this test case.

DROP TABLE test_z00r;
CREATE TABLE test_z00r( DOC_NUMBER CHAR(2), DOC_FIELD CHAR(3), DOC_TEXT VARCHAR2(120) );
INSERT INTO test_z00r VALUES ('1','260','$$aMontreal :$$bDunod;$$aLondon :$$bMacmillan,$$c2005.');
INSERT INTO test_z00r VALUES ('1','650','$$aMicro computers$$zCanada$$xHistory$$vHandbooks.');
INSERT INTO test_z00r VALUES ('2','260','$$aMontreal :$$bDunod;$$aLondon :$$bMacmillan,$$c2005.');
INSERT INTO test_z00r VALUES ('2','650','$$aComputers$$xDatabases.');
INSERT INTO test_z00r VALUES ('2','650','$$aComputers$$zUnited States$$xHistory.');
INSERT INTO test_z00r VALUES ('3','260','$$aLondon :$$bO''reilly,$$c2005.');
INSERT INTO test_z00r VALUES ('3','650','$$aComputers$$xDatabases.');
INSERT INTO test_z00r VALUES ('3','650','$$aInternet$$xHistory.');
INSERT INTO test_z00r VALUES ('4','260','$$aMontreal :$$bDunod;$$aLondon :$$bMacmillan,$$c2004.');
INSERT INTO test_z00r VALUES ('4','650','$$aComputers$$zCanada$$xHistory.');
INSERT INTO test_z00r VALUES ('4','650','$$aMicrocomputers$$xDatabases.');
INSERT INTO test_z00r VALUES ('5','260','$$aParis :$$bLumex,$$c2005.');
INSERT INTO test_z00r VALUES ('5','650','$$aComputers$$zFrance$$xHistory.');
INSERT INTO test_z00r VALUES ('6','260','$$aNew London :$$bFides,$$c2006.');
INSERT INTO test_z00r VALUES ('6','650','$$aComputers$$zCanada$$xHistory.');
INSERT INTO test_z00r VALUES ('7','260','$$aMontreal :$$bDunod;$$aLondon :$$bMacmillan,$$c2001.');
INSERT INTO test_z00r VALUES ('7','650','$$aComputers$$zCanada$$xHistory.');
INSERT INTO test_z00r VALUES ('7','650','$$aComputers$$xDatabases.');
/



To help me build other reports, I'd like to learn two things:
- a way to do is a query that would extract the 'doc_number' that respond to the criterias of the query below
- and, for the selected doc_numbers, a way to display specific 'subfields' of a field code (doc_field)

(In the doc_text column, a subfield begins with '$$' followed by a letter or a digit, it ends with the beginning of a subfield or the end of line)

Example of a query
'London' is at the beginning of subfield $$a in doc_field 260
AND
'Computers' (case insensitive) is a substring of subfield $$a AND subfield $$x is equal to 'History' that in the same row where doc_field begins with '650'.


There are 4 doc_numbers that respond to these criterias : 1,2,4 and 7.

udm01@ALEPH20> select * from test_z00r;
**** Hit return to continue ****

DO  DOC  DOC_TEXT
--  ---  ------------------------------------------------------------------------------------------------------------------------
1   260  $$aMontreal :$$bDunod;$$a[color=red]London[/color] :$$bMacmillan,$$c2005.
1   650  $$aMicro [color=red]computers[/color]$$zCanada$$x[color=red]History[/color]$$vHandbooks.
2   260  $$aMontreal :$$bDunod;$$a[color=red]London[/color] :$$bMacmillan,$$c2005.
2   650  $$aComputers$$xDatabases.
2   650  $$a[color=red]Computers[/color]$$zUnited States$$x[color=red]History[/color].
3   260  $$aLondon :$$bO'reilly,$$c2005.
3   650  $$aComputers$$xDatabases.
3   650  $$aInternet$$xHistory.
4   260  $$aMontreal :$$bDunod;$$a[color=red]London[/color] :$$bMacmillan,$$c2004.
4   650  $$a[color=red]Computers[/color]$$zCanada$$x[color=red]History[/color].
4   650  $$aMicrocomputers$$xDatabases.
5   260  $$aParis :$$bLumex,$$c2005.
5   650  $$aComputers$$zFrance$$xHistory.
6   260  $$aNew London :$$bFides,$$c2006.
6   650  $$aComputers$$zCanada$$xHistory.
7   260  $$aMontreal :$$bDunod;$$a[color=red]London[/color] :$$bMacmillan,$$c2001.
7   650  $$a[color=red]Computers[/color]$$zCanada$$x[color=red]History[/color].
7   650  $$aComputers$$xDatabases.




For results display, I'd like to get this:

DOC_NUMBER  260$$b              650$$a$$x$$v$$z
----------  ----------------    -------------------------------------------------------
1           Dunod||Macmillan    Micro computers--Canada--History--Handbooks
2           Dunod||Macmillan    Computers--Databases||Computers--United States--History
4           Dunod||Macmillan    Computers--Canada--History||Microcomputers--Databases
7           Dunod||Macmillan    Computers--Canada--History||Computers--Databases



Thanks,
Luc Giguere
Re: REGEXP_SUBSTR matching question [message #576434 is a reply to message #576421] Tue, 05 February 2013 00:16 Go to previous messageGo to next message
Michel Cadot
Messages: 59421
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
There is no SQL query that can answer any of your user query.
You should better normalize your model, at least in Normal Form 1.

Regards
Michel
Re: REGEXP_SUBSTR matching question [message #576495 is a reply to message #576434] Tue, 05 February 2013 10:39 Go to previous messageGo to next message
lgiguere
Messages: 6
Registered: January 2013
Location: Montreal, Quebec
Junior Member
Hi,

We bought a proprietary Integrated library management and we can't modify the database.

Do I understand correctly if I write that the queries can't be done independently? If possible, we could use a temporary file of selected doc_numbers of the first query as input for the second one.

First query
'London' is at the beginning of subfield $$a in rows doc_field 260

Second query
'Computers' (case insensitive) is a substring of subfield $$a AND subfield $$x is equal to 'History' in the same row where doc_field begins with '650'.

Also, is there a way to get the results displayed in one of these ways?
DOC_NUMBER  260$$b              650$$a$$x$$v$$z
----------  ----------------    -------------------------------------------------------
1           Dunod||Macmillan    Micro computers--Canada--History--Handbooks
2           Dunod||Macmillan    Computers--Databases||Computers--United States--History
4           Dunod||Macmillan    Computers--Canada--History||Microcomputers--Databases
7           Dunod||Macmillan    Computers--Canada--History||Computers--Databases


OR

DOC_NUMBER  260$$b              650$$a$$x$$v$$z
----------  ----------------    -------------------------------------------------------
1           Dunod||Macmillan    Micro computers--Canada--History--Handbooks
2           Dunod||Macmillan    Computers--Databases
2           Dunod||Macmillan    Computers--United States--History
4           Dunod||Macmillan    Computers--Canada--History
4           Dunod||Macmillan    Microcomputers--Databases
7           Dunod||Macmillan    Computers--Canada--History
7           Dunod||Macmillan    Computers--Databases


Thanks,
Luc Giguere
Re: REGEXP_SUBSTR matching question [message #578603 is a reply to message #576421] Sat, 02 March 2013 00:36 Go to previous messageGo to next message
Barbara Boehmer
Messages: 7995
Registered: November 2002
Location: California, USA
Senior Member
-- test data:
SCOTT@orcl_11gR2> SELECT * FROM test_z00r ORDER BY doc_number, doc_field, doc_text
  2  /

DOC_NUMBER DOC DOC_TEXT
---------- --- ------------------------------------------------------------
1          260 $$aMontreal :$$bDunod;$$aLondon :$$bMacmillan,$$c2005.
1          650 $$aMicro computers$$zCanada$$xHistory$$vHandbooks.
2          260 $$aMontreal :$$bDunod;$$aLondon :$$bMacmillan,$$c2005.
2          650 $$aComputers$$xDatabases.
2          650 $$aComputers$$zUnited States$$xHistory.
3          260 $$aLondon :$$bO'reilly,$$c2005.
3          650 $$aComputers$$xDatabases.
3          650 $$aInternet$$xHistory.
4          260 $$aMontreal :$$bDunod;$$aLondon :$$bMacmillan,$$c2004.
4          650 $$aComputers$$zCanada$$xHistory.
4          650 $$aMicrocomputers$$xDatabases.
5          260 $$aParis :$$bLumex,$$c2005.
5          650 $$aComputers$$zFrance$$xHistory.
6          260 $$aNew London :$$bFides,$$c2006.
6          650 $$aComputers$$zCanada$$xHistory.
7          260 $$aMontreal :$$bDunod;$$aLondon :$$bMacmillan,$$c2001.
7          650 $$aComputers$$xDatabases.
7          650 $$aComputers$$zCanada$$xHistory.

18 rows selected.


-- query:
SCOTT@orcl_11gR2> column doc_number	   format a10
SCOTT@orcl_11gR2> column "260$$b"	   format a16
SCOTT@orcl_11gR2> column "650$$a$$x$$v$$z" format a57
SCOTT@orcl_11gR2> select b.doc_number, b."260$$b",
  2  	    listagg (a."650$$a$$x$$v$$z", '||') within group (order by rn) "650$$a$$x$$v$$z"
  3  from   (select doc_number,
  4  		    listagg (doc_text, '||') within group (order by line) "260$$b"
  5  	     from   (select doc_number, column_value line,
  6  			    rtrim
  7  			      (ltrim
  8  				(regexp_substr (doc_text, '\$\$b[^$]*', 1, column_value),
  9  				 '$b'),
 10  			       ';,.') doc_text
 11  		     from   test_z00r,
 12  			    table
 13  			      (cast
 14  				(multiset
 15  				  (select level
 16  				   from   dual
 17  				   connect by level <=
 18  				     (length (doc_text) - length (replace (doc_text, '$$b'))) + 1)
 19  			       as sys.odcinumberlist))
 20  		     where  doc_field = 260
 21  		     and    instr (doc_text, '$$aLondon') > 0)
 22  	     where  doc_text is not null
 23  	     group  by doc_number) b,
 24  	    (select doc_number, rn,
 25  		    listagg (doc_text, '--') within group (order by line) "650$$a$$x$$v$$z"
 26  	     from   (select doc_number, column_value line, rn,
 27  			    rtrim
 28  			      (ltrim
 29  				(regexp_substr (doc_text, '\$\$[axvz][^$]*', 1, column_value),
 30  				 '$axvz'),
 31  			       ';,.') doc_text
 32  		     from   (select t.*, rownum rn
 33  			     from   test_z00r t) t1,
 34  			    table
 35  			      (cast
 36  				(multiset
 37  				  (select level
 38  				   from   dual
 39  				   connect by level <=
 40  				     (length (doc_text) - length (replace (doc_text, '$$'))) + 1)
 41  			       as sys.odcinumberlist))
 42  		     where  doc_field = 650
 43  		     and    exists
 44  			      (select *
 45  			       from   test_z00r t2
 46  			       where  t2.doc_number = t1.doc_number
 47  			       and    instr (lower (doc_text), 'computers') > 0
 48  			       and    instr (lower (doc_text), 'history') > 0))
 49  	     where  doc_text is not null
 50  	     group  by doc_number, rn) a
 51  where  b.doc_number = a.doc_number
 52  group  by b.doc_number, b."260$$b"
 53  order  by doc_number
 54  /

DOC_NUMBER 260$$b           650$$a$$x$$v$$z
---------- ---------------- ---------------------------------------------------------
1          Dunod||Macmillan Micro computers--Canada--History--Handbooks
2          Dunod||Macmillan Computers--Databases||Computers--United States--History
4          Dunod||Macmillan Computers--Canada--History||Microcomputers--Databases
7          Dunod||Macmillan Computers--Canada--History||Computers--Databases

4 rows selected.

[Updated on: Sat, 02 March 2013 00:42]

Report message to a moderator

Re: REGEXP_SUBSTR matching question [message #579366 is a reply to message #578603] Mon, 11 March 2013 13:58 Go to previous message
lgiguere
Messages: 6
Registered: January 2013
Location: Montreal, Quebec
Junior Member
Thanks a lot Barbara!

I'll look at the code you sent and try to understand what it does and try to adapt it to our database.

Thanks,
Luc Giguere
Previous Topic: Performance Tuning for Oracle tables
Next Topic: update previous row based on next row & update duplicate record (merged)
Goto Forum:
  


Current Time: Wed Oct 22 23:47:33 CDT 2014

Total time taken to generate the page: 0.05404 seconds