Home » SQL & PL/SQL » SQL & PL/SQL » Help with parsing a full name field (Oracle 10g)
Help with parsing a full name field [message #408556] Tue, 16 June 2009 18:05 Go to next message
RiverX
Messages: 9
Registered: March 2008
Junior Member
Hi all,

I was wondering if anyone could help me with the parsing a full name field.
I would like to seperate it into lastname, firstname, middle initial, suffix

Here are some inputs for fullname followed by how i would like for them to be parsed.
        name                  | parsed lastname  | parsed firstname        |  parsed middle initial   | parsed suffix |
    ----------------------------------------------------------------------------------------
    PUBLIC, JOHN              | PUBLIC    | JOHN             |  NULL             | NULL
    PUBLIC, CHUN CH KIM       | PUBLIC    | CHUN CH KIM      |  NULL             | NULL
    PUBLIC, MARY L            | PUBLIC    | MARY             |  L                | NULL
    PUBLIC, FRED J JR         | PUBLIC    | FRED             |  J                | JR
    PUBLIC, SUE ELLEN J SR    | PUBLIC    | SUE ELLEN        |  J                | SR

I have a list of all the suffix values that one is able to enter i.e.

JR, SR, I,II,III,IV,V,VI


I've gotten to a point where i split up the lastname and the rest of the name
but i can't quite figure out how to do the rest.


Here's what i currently have
     select id,
            name,
            substr(name,1, instr(name,',')-1) as lname,
            substr(name,(instr(name,',')+1),length(name)) as rest_of_the_name
     from    my_table
     where status='A';

Thanks for any help

[Updated on: Tue, 16 June 2009 23:07] by Moderator

Report message to a moderator

Re: Help with parsing a full name field [message #408636 is a reply to message #408556] Wed, 17 June 2009 03:05 Go to previous messageGo to next message
c_stenersen
Messages: 255
Registered: August 2007
Senior Member
You could use regular expressions to find the initial and suffix.
regexp_instr and regexp_substr might help you.

http://www.regular-expressions.info/reference.html
http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/rischert_regexp_pt1.html

Although I have a question. How would you distinguish between your one-letter suffixes and initials? John V would be ambiguous in that case.
Re: Help with parsing a full name field [message #408726 is a reply to message #408556] Wed, 17 June 2009 07:24 Go to previous messageGo to next message
Michel Cadot
Messages: 64152
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
SQL> with 
  2    data as (
  3      select 'PUBLIC, JOHN' val from dual
  4      union all
  5      select 'PUBLIC, CHUN CH KIM' from dual
  6      union all
  7      select 'PUBLIC, CHUN CH C' from dual
  8      union all
  9      select 'PUBLIC, MARY L' from dual
 10      union all
 11      select 'PUBLIC, FRED J JR' from dual
 12      union all
 13      select 'PUBLIC, SUE ELLEN J SR' from dual
 14    ),
 15    step1 as (
 16      select val,
 17             substr(val, 1, instr(val,',')-1) part1,
 18             substr(val, instr(val,',')+2)||' ' part2
 19      from data
 20    ),
 21    step2 as (
 22      select val, trim(part1) lastname, part2,
 23             instr(part2,' ') i1, 
 24             instr(part2,' ',-1,3) i2,
 25             instr(part2,' ',-1,2) i3,
 26             trim(substr(part2, 
 27                         instr(part2,' ',-1,3)+1, 
 28                         instr(part2,' ',-1,2)-instr(part2,' ',-1,3))) 
 29                part2b,
 30             trim(substr(part2, instr(part2,' ',-1,2)+1)) part2c
 31      from step1
 32    )
 33  select val, lastname, 
 34         trim(substr(part2, 1, 
 35                     case
 36                       when i3 = 0 then length(part2)-1
 37                       when     i2 = 0 
 38                            and (  part2c in ('JR','SR','I','II','III','IV','V','VI')
 39                                or length(part2c) = 1 )
 40                         then i3-1
 41                       when     i2 != 0
 42                            and part2c in ('JR','SR','I','II','III','IV','V','VI')
 43                            and length(part2b) = 1
 44                         then i2-1
 45                       when     i2 != 0 
 46                            and part2c not in ('JR','SR','I','II','III','IV','V','VI')
 47                            and length(part2c) = 1
 48                         then i3-1
 49                       else length(part2)-1
 50                     end)) firstname,
 51         case 
 52           when i3 = 0 then null -- no initial nor suffix
 53           when    (    i2 = 0 -- suffix not present
 54                   and part2c not in ('JR','SR','I','II','III','IV','V','VI')
 55                   and length(part2c) = 1
 56                   )
 57             then part2c
 58           when    (    i2 != 0 -- suffix present
 59                   and part2c in ('JR','SR','I','II','III','IV','V','VI')
 60                   and length(part2b) = 1
 61                   )
 62             then part2b
 63           when    (    i2 != 0 -- wrong suffix present
 64                   and part2c not in ('JR','SR','I','II','III','IV','V','VI')
 65                   and length(part2c) = 1
 66                   )
 67             then part2c
 68        end middle,
 69        case 
 70          when part2c in ('JR','SR','I','II','III','IV','V','VI')
 71            then part2c
 72        end suffix
 73  from step2
 74  /
VAL                       LASTNAME               FIRSTNAME               MIDDLE                  SUFFIX
------------------------- ---------------------- ----------------------- ----------------------- -----------------------
PUBLIC, JOHN              PUBLIC                 JOHN
PUBLIC, CHUN CH KIM       PUBLIC                 CHUN CH KIM
PUBLIC, CHUN CH C         PUBLIC                 CHUN CH                 C
PUBLIC, MARY L            PUBLIC                 MARY                    L
PUBLIC, FRED J JR         PUBLIC                 FRED                    J                       JR
PUBLIC, SUE ELLEN J SR    PUBLIC                 SUE ELLEN               J                       SR

6 rows selected.

Regards
Michel
Re: Help with parsing a full name field [message #408755 is a reply to message #408726] Wed, 17 June 2009 11:12 Go to previous messageGo to next message
RiverX
Messages: 9
Registered: March 2008
Junior Member
Thanks Michel!
That worked perfectly
Re: Help with parsing a full name field [message #408981 is a reply to message #408755] Thu, 18 June 2009 10:19 Go to previous messageGo to next message
ebrian
Messages: 2794
Registered: April 2006
Senior Member
Different spin using regex:

SQL> select name
  2  , regexp_substr(name, '[^,]+') lastname
  3  , case when sf not in ('JR','SR','I','II','III','IV','V','VI') and lt is null
  4     then trim(regexp_substr(name, '[^,]*$'))
  5     when sf not in ('JR','SR','I','II','III','IV','V','VI') and lt is not null
  6     then regexp_replace(name, '(.*, )(.*)( .$)', '\2')
  7     when sf in ('JR','SR','I','II','III','IV','V','VI') and lt is not null
  8     then regexp_replace(name, '(.*, )(.*)( [^ ] [^ ]*$)', '\2')
  9     when sf in ('JR','SR','I','II','III','IV','V','VI') and lt is null
 10     then regexp_replace(name, '(.*, )(.*)( [^ ]*$)', '\2')
 11    end firstname
 12  , case when sf not in ('JR','SR','I','II','III','IV','V','VI') and lt is not null
 13     then regexp_substr(name, '.$')
 14     when sf in ('JR','SR','I','II','III','IV','V','VI') and lt is not null
 15     then lt
 16    end middle
 17  , case when sf in ('JR','SR','I','II','III','IV','V','VI')
 18     then sf
 19    end suffix
 20  from (
 21	select name
 22		, regexp_substr(name, ' .( |$)') lt
 23	        , regexp_substr(name, '[^ ]+$') sf
 24	from full_name
 25    	);

NAME                      LASTNAME               FIRSTNAME               MIDDLE                  SUFFIX
------------------------- ---------------------- ----------------------- ----------------------- ------------
PUBLIC, FRED J JR         PUBLIC                 FRED                    J                       JR
PUBLIC, SUE ELLEN J SR    PUBLIC                 SUE ELLEN               J                       SR
PUBLIC, JIM I I           PUBLIC                 JIM			 I			 I
PUBLIC, MARY L            PUBLIC                 MARY                    L
PUBLIC, JOHN              PUBLIC                 JOHN
PUBLIC, CHUN CH KIM       PUBLIC                 CHUN CH KIM
PUBLIC, CHUN CH C         PUBLIC                 CHUN CH                 C
PUBLIC, BOB SR            PUBLIC                 BOB						 SR

8 rows selected.
Re: Help with parsing a full name field [message #408982 is a reply to message #408981] Thu, 18 June 2009 10:24 Go to previous messageGo to next message
Michel Cadot
Messages: 64152
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Question

Hey! I posted another solution a couple of hours ago, where did it go?

Question

Re: Help with parsing a full name field [message #408985 is a reply to message #408982] Thu, 18 June 2009 10:42 Go to previous messageGo to next message
Frank
Messages: 7880
Registered: March 2000
Senior Member
Michel Cadot wrote on Thu, 18 June 2009 17:24

Question

Hey! I posted another solution a couple of hours ago, where did it go?

Question



Probably got rejected by the site because "we don't give full solutions here, only hints"

(and no, I did NOT remove it Wink )
Re: Help with parsing a full name field [message #408987 is a reply to message #408985] Thu, 18 June 2009 10:46 Go to previous messageGo to next message
ebrian
Messages: 2794
Registered: April 2006
Senior Member
Frank wrote on Thu, 18 June 2009 11:42

Probably got rejected by the site because "we don't give full solutions here, only hints"

Now that is funny !
Re: Help with parsing a full name field [message #409053 is a reply to message #408982] Fri, 19 June 2009 01:45 Go to previous messageGo to next message
Michel Cadot
Messages: 64152
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Here's (about) the query I posted.
The solution is based on splitting the line into words, characterizing each word then regrouping the words into each type and finally line.
SQL> with 
  2    raw_data as (
  3      select 'PUBLIC, JOHN' val from dual
  4      union all
  5      select 'PUBLIC, CHUN CH KIM' from dual
  6      union all
  7      select 'PUBLIC, CHUN CH C' from dual
  8      union all
  9      select 'PUBLIC, MARY L' from dual
 10      union all
 11      select 'PUBLIC, FRED J JR' from dual
 12      union all
 13      select 'PUBLIC, SUE ELLEN J SR' from dual
 14      union all
 15      select 'PUBLIC, BOB SR' from dual
 16    ),
 17    data as (        -- add space and number the rows
 18      select ' '||val||' ' val, rownum rn
 19      from raw_data
 20    ),
 21    lines as ( select level line from dual connect by level < 10 ),
 22    splitted as (    -- split rows into words
 23      select rn, val, line word_no, 
 24             count(*) over (partition by rn) nb_word,
 25             substr(val, 
 26                    instr(val,' ',1,line)+1,
 27                    instr(val,' ',1,line+1)-instr(val,' ',1,line)-1
 28                      - decode(line,1,1,0)
 29                   ) word
 30      from data, lines
 31      where line < length(val)-length(replace(val,' ',''))
 32    ),
 33    typed as (      -- type and number each word
 34      select rn, val, nb_word, word,
 35             case 
 36               when word_no = 1 then 'LastName'         -- last name
 37               when     word_no = nb_word               -- Suffix
 38                    and word in ('JR','SR','I','II','III','IV','V','VI')
 39                 then 'Suffix'
 40               when     word_no = nb_word               -- Initial
 41                    and length(word) = 1
 42                 then 'Initial'
 43               when     word_no = nb_word-1             -- Other initial case
 44                    and lead(word) over(partition by rn order by word_no)
 45                          in ('JR','SR','I','II','III','IV','V','VI')
 46                    and length(word) = 1
 47                 then 'Initial'
 48               else 'FirstName'
 49             end type,
 50             case 
 51               when word_no = 1 then 1                  -- LastName
 52               when     word_no = nb_word               -- Suffix
 53                    and word in ('JR','SR','I','II','III','IV','V','VI')
 54                 then 1
 55               when     word_no = nb_word               -- Initial
 56                    and length(word) = 1
 57                 then 1
 58               when     word_no = nb_word-1             -- Other initial case
 59                    and lead(word) over(partition by rn order by word_no)
 60                          in ('JR','SR','I','II','III','IV','V','VI')
 61                    and length(word) = 1
 62                 then 1
 63               else word_no-1
 64             end new_word_no
 65       from splitted
 66    ),
 67    regrouped as (   -- concatenate each word per type
 68      select rn, val, type, sys_connect_by_path(word,' ') value
 69      from typed
 70      connect by prior rn = rn and prior type = type and prior new_word_no = new_word_no-1
 71      start with new_word_no = 1
 72    )
 73  select val,
 74         trim(max(decode(type,'LastName',value))) lastname,
 75         trim(max(decode(type,'FirstName',value))) firstname,
 76         trim(max(decode(type,'Initial',value))) middle,
 77         trim(max(decode(type,'Suffix',value))) Suffix
 78  from regrouped
 79  group by rn, val
 80  order by rn
 81  /
VAL                      LASTNAME FIRSTNAME    MIDDLE SUFFIX
------------------------ -------- ------------ ------ ------
 PUBLIC, JOHN            PUBLIC   JOHN
 PUBLIC, CHUN CH KIM     PUBLIC   CHUN CH KIM
 PUBLIC, CHUN CH C       PUBLIC   CHUN CH      C
 PUBLIC, MARY L          PUBLIC   MARY         L
 PUBLIC, FRED J JR       PUBLIC   FRED         J      JR
 PUBLIC, SUE ELLEN J SR  PUBLIC   SUE ELLEN    J      SR
 PUBLIC, BOB SR          PUBLIC   BOB                 SR

7 rows selected.

Regards
Michel
Re: Help with parsing a full name field [message #409057 is a reply to message #408556] Fri, 19 June 2009 02:03 Go to previous messageGo to next message
Frank
Messages: 7880
Registered: March 2000
Senior Member
How can you ever tell whether "I" is a middle initial or a suffix?
Formatting free-format fields afterwards always result in ambiguous results.
Re: Help with parsing a full name field [message #409097 is a reply to message #409057] Fri, 19 June 2009 06:26 Go to previous messageGo to next message
ebrian
Messages: 2794
Registered: April 2006
Senior Member
Frank wrote on Fri, 19 June 2009 03:03
How can you ever tell whether "I" is a middle initial or a suffix?

Yep...that's the one ambiguous part that no code could really decipher.
Re: Help with parsing a full name field [message #409101 is a reply to message #409057] Fri, 19 June 2009 07:14 Go to previous messageGo to next message
Michel Cadot
Messages: 64152
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Frank wrote on Fri, 19 June 2009 09:03
How can you ever tell whether "I" is a middle initial or a suffix?
Formatting free-format fields afterwards always result in ambiguous results.

This has realy been asked by c_stenersen in the very first answer.
For myself, I give priority to suffix in the queries I posted (in short, test in suffix list, before test length=1). Of course, this is totally arbitrary but my purpose was to show how to build a solution (several of the subqueries in my replies can be merged for a shorter code but intentionally left to show the steps).

Regards
Michel

Re: Help with parsing a full name field [message #409103 is a reply to message #409101] Fri, 19 June 2009 07:17 Go to previous messageGo to next message
Frank
Messages: 7880
Registered: March 2000
Senior Member
Michel Cadot wrote on Fri, 19 June 2009 14:14
Frank wrote on Fri, 19 June 2009 09:03
How can you ever tell whether "I" is a middle initial or a suffix?
Formatting free-format fields afterwards always result in ambiguous results.

This has realy been asked by c_stenersen in the very first answer.
For myself, I give priority to suffix in the queries I posted (in short, test in suffix list, before test length=1). Of course, this is totally arbitrary but my purpose was to show how to build a solution (several of the subqueries in my replies can be merged for a shorter code but intentionally left to show the steps).

Regards
Michel



Ah, but I was not referring to your code. Just more in general pointing to the trouble with these approaches.
Re: Help with parsing a full name field [message #409134 is a reply to message #409103] Fri, 19 June 2009 10:40 Go to previous message
Kevin Meade
Messages: 2101
Registered: December 1999
Location: Connecticut USA
Senior Member
I seem to recall about 10 years ago, someone wrote a name parsing routine for oracle. It was supposed to be "smart" somehow. But... my memory fails me as to how to find it. Anyone else remember this?

Kevin
Previous Topic: select - various count
Next Topic: Help in Creating a Text File using UTL_FILE
Goto Forum:
  


Current Time: Sat Dec 10 07:11:22 CST 2016

Total time taken to generate the page: 0.12428 seconds