Re: Crazy data query

From: Michael D O'Shea <michael.oshea_at_tessella.com>
Date: Wed, 24 Sep 2008 01:11:05 -0700 (PDT)
Message-ID: <b6cc8051-8449-4bff-b1cf-1a3900a575b9@e39g2000hsf.googlegroups.com>


On Sep 22, 10:45 pm, JAW <jwill..._at_aglresources.com> wrote:
>  have an database column that is like the below and not normalized. So
> the entire address in in on field.
>
> Below is a sample of the data
>
> There is a space between each item.
>
> 4918 HULK PKY *LOT 34C BUGTUSEL GA
> 4932 HULK PL BUGTUSEL GA 30000
> 205 SPIDER ST SAINT JOHN GA 00000
> 4027 LEE-OAKS PL *LOT 61C, SAVOY CHASE
> 4027 LEE-OAKS PL *LOT 61C, SAVOY CHASE
> 4033 LEE-OAKS PL *LOT 60C, SVOY CHASE
> 4033 LEE-OAKS PL *LOT 60C, SVOY CHASE
> 4085 LEE-OAKS PL, SAVOY CHASE CROSSING
>
> Since the data is not consistent I would like to extract each item
> into
>
> I am trying to do something outside of PL/SQL with SQL
>
> The second record is the ideal record for this column with a zip code.
>
> Note that LOT XXX should be part of the STREET in this senario
>
> I am trying to utilize INSTR, STR and 10g REGEXP to strip each record
> into separate columns.

Hi, an example of the type of SQL you could write to lex out content from this semistructured address data is included inline below. Modify/ update as appropriate to represent fully all your data.

Regards
Mike

TESSELLA Michael.O'Shea_at_tessella.com

__/__/__/  Tessella Support Services plc
__/__/__/  3 Vineyard Chambers, ABINGDON, OX14 3PX, England
__/__/__/  Tel: (44)(0)1235-555511  Fax: (44)(0)1235-553301
www.tessella.com Registered in England No. 1466429
SQL>
SQL>
SQL> SET LINESIZE 132
SQL> SET WRAP OFF
SQL>
SQL> COLUMN field1 FORMAT 9999
SQL> COLUMN field2 FORMAT A20
SQL>
SQL>
SQL> WITH yourTable AS
  2   (
  3     SELECT '4918 HULK PKY *LOT 34C BUGTUSEL GA' address
  4      FROM DUAL
  5       UNION
  6     SELECT '4932 HULK PL BUGTUSEL GA 30000'
  7      FROM DUAL
  8       UNION
  9     SELECT '205 SPIDER ST SAINT JOHN GA 00000'
 10      FROM DUAL
 11      UNION
 12     SELECT '4027 LEE-OAKS PL *LOT 61C, SAVOY CHASE'
 13      FROM DUAL

 14 )
 15 SELECT REGEXP_SUBSTR(p.address,'^[0-9]+',1) field1,  16 REGEXP_SUBSTR(p.address,'[A-Z ]+(PL|ST|PKY)+',1) field2  17 FROM yourTable p;
FIELD1                                 FIELD2
-------------------------------------- --------------------
205                                     SPIDER ST
4027                                    OAKS PL
4918                                    HULK PKY
4932                                    HULK PL

SQL>

SQL>
SQL> SELECT *
  2 FROM V$VERSION; BANNER

Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - Prod PL/SQL Release 10.1.0.4.0 - Production
CORE 10.1.0.4.0 Production
TNS for 32-bit Windows: Version 10.1.0.4.0 - Production NLSRTL Version 10.1.0.4.0 - Production

SQL>
SQL> Received on Wed Sep 24 2008 - 03:11:05 CDT

Original text of this message