Home » SQL & PL/SQL » SQL & PL/SQL » SQL: Substring folder name based on numeric value (Oracle DB 11.2.0.4)
icon5.gif  SQL: Substring folder name based on numeric value [message #670182] Thu, 14 June 2018 19:55 Go to next message
nickz
Messages: 34
Registered: September 2013
Location: US
Member
I would need some help in how to substring a couple of folder names using only SQL.

REQUIREMENT: I have folder names like the following stored in a table. What I want to do is remove any character values towards the end. So, based on the fact that the last few characters are mostly preceded by a numeric value, I was thinking maybe we can SUBSTRing the folder name based on that cut-off rule, utilizing USE-CASE.

Couple of things here on the input folder names:
1. The length of the names is not fixed. It varies.
2. It doesn't necessarily have an underscore in between.
3. Name ALWAYS starts with a character, but it might be any character.
4. Name ALWAYS ends with a character.
5. The last few characters are MOST OF THE TIMES preceded by a few numeric values.
6. Occasionally, the name will be all characters (no numeric values) and in that case, I only want to remove the last character. This can be one of the USE-CASEs where if foldername doesn't contain any numeric value, substr last character.

INPUT
PR01_1234A
JN01_1234AB
PR3234M
PR04773BC
PC07_52435CA
PRJBBCA
JB09_803B

FINAL INTENDED OUTPUT
PR01_1234
JN01_1234
PR3234
PR04773
PC07_52435
PRJBBC
JB09_803

Request your help.
Re: SQL: Substring folder name based on numeric value [message #670184 is a reply to message #670182] Fri, 15 June 2018 00:52 Go to previous messageGo to next message
Michel Cadot
Messages: 65545
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

There are many ways, here's one:
SQL> with
  2    data as (
  3      select 'PR01_1234A' val from dual union all
  4      select 'JN01_1234AB' val from dual union all
  5      select 'PR3234M' val from dual union all
  6      select 'PR04773BC' val from dual union all
  7      select 'PC07_52435CA' val from dual union all
  8      select 'PRJBBCA' val from dual union all
  9      select 'JB09_803B' val from dual
 10    )
 11  select val original_val,
 12         case
 13           when regexp_like(val,'\d') then regexp_replace(val,'[^[:digit:]]+$','')
 14           else regexp_replace(val,'.$','')
 15         end new_val
 16  from data
 17  /

ORIGINAL_VAL         NEW_VAL
-------------------- --------------------
PR01_1234A           PR01_1234
JN01_1234AB          JN01_1234
PR3234M              PR3234
PR04773BC            PR04773
PC07_52435CA         PC07_52435
PRJBBCA              PRJBBC
JB09_803B            JB09_803
Re: SQL: Substring folder name based on numeric value [message #670218 is a reply to message #670184] Mon, 18 June 2018 20:04 Go to previous messageGo to next message
nickz
Messages: 34
Registered: September 2013
Location: US
Member
Thanks so much Michel.. I tweaked it a little bit as per other scenarios. Somehow, I always find using the Regular Expressions a bit complex. It's powerful with all the usage of patterns, but putting in the correct pattern is where I stumble upon. Maybe, I use the regular functions too much like SUBSTR, REPLACE etc. I guess I need to read up more. Apart from the pattern match features, does it also provide performance benefits? If so, how many records by the time I see performance benefits over the regular expressions? I am guessing for smaller tables, it doesn't matter much whether I use REPLACE or REGEXP_REPLACE.
Re: SQL: Substring folder name based on numeric value [message #670219 is a reply to message #670218] Tue, 19 June 2018 00:11 Go to previous messageGo to next message
Michel Cadot
Messages: 65545
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

In 10g, I used to say use regexp functions as less as possible as performances were awful. In 11g, these functions have been rewritten and performances tremendously improved, so I now say don't hesitate to use them unless original functions give the same thing in an easy way.
For instance, I hesitate to write "regexp_replace(val,'.$','')" over the old "replace(val,1,length(val-1))". If in 10g I was sure the performances will be better with the later, I'm no more in 11g and up.
(A double) "translate" can be used instead of "regexp_like(val,'\d')" but once again I won't bet on their respective performances.

Also, it is very possible that a more complex regexp can be used to merge both cases and avoid the CASE expression.

[Updated on: Tue, 19 June 2018 00:14]

Report message to a moderator

Re: SQL: Substring folder name based on numeric value [message #670223 is a reply to message #670219] Tue, 19 June 2018 14:08 Go to previous messageGo to next message
Michel Cadot
Messages: 65545
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Here's an example of the later solution I mentioned:
SQL>    with
  2    data as (
  3      select 'PR01_1234A' val from dual union all
  4      select 'JN01_1234AB' val from dual union all
  5      select 'PR3234M' val from dual union all
  6      select 'PR04773BC' val from dual union all
  7      select 'PC07_52435CA' val from dual union all
  8      select 'PRJBBCA' val from dual union all
  9      select 'JB09_803B' val from dual
 10    )
 11  select val original_val,
 12         regexp_replace(val, '((\d)[^[:digit:]]+|.)$', '\2') new_val
 13  from data
 14  /
ORIGINAL_VAL NEW_VAL
------------ --------------------
PR01_1234A   PR01_1234
JN01_1234AB  JN01_1234
PR3234M      PR3234
PR04773BC    PR04773
PC07_52435CA PC07_52435
PRJBBCA      PRJBBC
JB09_803B    JB09_803
Re: SQL: Substring folder name based on numeric value [message #670224 is a reply to message #670223] Tue, 19 June 2018 14:09 Go to previous messageGo to next message
Solomon Yakobson
Messages: 2756
Registered: January 2010
Location: Connecticut, USA
Senior Member
To save some keystrokes: [^[:digit:]] = \D

SY.
Re: SQL: Substring folder name based on numeric value [message #670226 is a reply to message #670224] Tue, 19 June 2018 14:18 Go to previous messageGo to next message
Michel Cadot
Messages: 65545
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Yes, new formula:
SQL> with
  2    data as (
  3      select 'PR01_1234A' val from dual union all
  4      select 'JN01_1234AB' val from dual union all
  5      select 'PR3234M' val from dual union all
  6      select 'PR04773BC' val from dual union all
  7      select 'PC07_52435CA' val from dual union all
  8      select 'PRJBBCA' val from dual union all
  9      select 'JB09_803B' val from dual
 10    )
 11  select val original_val,
 12         regexp_replace(val, '((\d)\D+|.)$', '\2') new_val
 13  from data
 14  /
ORIGINAL_VAL NEW_VAL
------------ --------------------
PR01_1234A   PR01_1234
JN01_1234AB  JN01_1234
PR3234M      PR3234
PR04773BC    PR04773
PC07_52435CA PC07_52435
PRJBBCA      PRJBBC
JB09_803B    JB09_803
Re: SQL: Substring folder name based on numeric value [message #670233 is a reply to message #670226] Tue, 19 June 2018 15:50 Go to previous messageGo to next message
nickz
Messages: 34
Registered: September 2013
Location: US
Member
I feel so dumb right now.. I can understand what we are trying to do here by looking at you query and then googling the usage of patterns. But, I don't think I will ever be able to make use of the patterns and combinations off the top of my head. So easily confused. Laughing
Re: SQL: Substring folder name based on numeric value [message #670241 is a reply to message #670233] Tue, 19 June 2018 23:57 Go to previous messageGo to next message
Michel Cadot
Messages: 65545
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

There is one case that is not handle: when a name ends with a digit.
What is the specification in this case?

Re: SQL: Substring folder name based on numeric value [message #670244 is a reply to message #670241] Wed, 20 June 2018 01:28 Go to previous messageGo to next message
nickz
Messages: 34
Registered: September 2013
Location: US
Member
Actually, that scenario ending with digits doesn't exist in my case. Names always end with characters. The query you provided earlier worked for me well, except for one scenario.

See the last two name formats I added in the query (jip=job in progress)): JN03_1234.Ajip and PRJBBC.Bjip. Outputs should be JN03_1234 and PRJBBC respectively. The first one got covered with the script you already provided, but the second one was not showing up as expected (it was showing as "PRJBBC.Bji"), so I added a second CASE condition.
SELECT val original_val
     , CASE
          WHEN REGEXP_LIKE ( val, '\d' ) THEN REGEXP_REPLACE ( val, '[^[:digit:]]+$', '' )
          WHEN REGEXP_LIKE ( val, '\.' ) THEN SUBSTR ( val, 1, INSTR ( val, '.' ) - 1 )
          ELSE REGEXP_REPLACE ( val, '.$', '' )
       END
          new_val
FROM (  SELECT TRIM(regexp_substr('PR01_1234A, JN01_1234AB, PR3234M, PR04773BC, PC07_52435CA, PRJBBCA, JB09_803B, JN03_1234.Ajip, PRJBBC.Bjip', '[^,]+', 1, LEVEL)) val
        FROM sys.dual
        CONNECT BY TRIM(regexp_substr('PR01_1234A, JN01_1234AB, PR3234M, PR04773BC, PC07_52435CA, PRJBBCA, JB09_803B, JN03_1234.Ajip, PRJBBC.Bjip', '[^,]+', 1, LEVEL)) IS NOT NULL
     );
Interested to see how all these conditions can be merged into a single REGEXP_REPLACE statement, including the above two examples I mentioned.
Re: SQL: Substring folder name based on numeric value [message #670246 is a reply to message #670244] Wed, 20 June 2018 01:44 Go to previous messageGo to next message
Michel Cadot
Messages: 65545
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

I also added the case where name ends with a digit assuming in this case name is not changed; fr the case with a dot I removed everything including and after it:
SQL> with
  2    data as (
  3      select 'PR01_1234A' val from dual union all
  4      select 'JN01_1234AB' val from dual union all
  5      select 'PR3234M' val from dual union all
  6      select 'PR04773BC' val from dual union all
  7      select 'PC07_52435CA' val from dual union all
  8      select 'PRJBBCA' val from dual union all
  9      select 'JB09_803B' val from dual union all
 10      select 'JN03_1234.Ajip' val from dual union all
 11      select 'PRJBBC.Bjip' val from dual union all
 12      select 'JN03_1234' val from dual
 13    )
 14  select val original_val,
 15         regexp_replace(val, '((\d)\D*|\..*|.)$', '\2') new_val
 16  from data
 17  /
ORIGINAL_VAL   NEW_VAL
-------------- --------------------
PR01_1234A     PR01_1234
JN01_1234AB    JN01_1234
PR3234M        PR3234
PR04773BC      PR04773
PC07_52435CA   PC07_52435
PRJBBCA        PRJBBC
JB09_803B      JB09_803
JN03_1234.Ajip JN03_1234
PRJBBC.Bjip    PRJBBC
JN03_1234      JN03_1234

[Updated on: Wed, 20 June 2018 01:45]

Report message to a moderator

Re: SQL: Substring folder name based on numeric value [message #670248 is a reply to message #670246] Wed, 20 June 2018 02:06 Go to previous message
nickz
Messages: 34
Registered: September 2013
Location: US
Member
Thanks so much Michel. I would have never known so many scenarios can be done with a single small REGEXP_REPLACE function. Smile
Previous Topic: Help Grouping by
Next Topic: How to replace the new line and line feed in a specific manner
Goto Forum:
  


Current Time: Mon Jun 25 07:43:18 CDT 2018