Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: ORACLE SUBSTR FUNCTION HELP

Re: ORACLE SUBSTR FUNCTION HELP

From: John Roberts <jroberts_at_bogus.sprintmail.com>
Date: Tue, 01 Oct 2002 22:20:52 GMT
Message-ID: <8dpm9.3361$OB5.298897@newsread2.prod.itd.earthlink.net>

"Dave" <dchou1108_at_hotmail.com> wrote in message news:8f19196e.0209290611.38cc9b2_at_posting.google.com...
> I am trying to get the city name from an address column of type
VARCHAR2(80)
>
> The address looks as follow:
> ADDRESS
> --------------------------------------------------------------------------



> 2800 Montrose Ave. Lacresenta, CA 92832
> 1231 Fullerton Ave. Fullerton, CA 92898
> 3301 Huston Ave. Garden Grove, CA 92123
> 1234 Placentia Rd. Placentia, CA 92871
> 34 Starbucks Rd. Seattle, WA 43249
> 1214 Ford Ave. LA, CA 98649
> 1317 Bond Ave. LA, CA 98649
>
> How do I get the city for the substring. I am trying to group my query by
the city.
>
> Thanks
>

It all depends on whether this is a one-shot deal (like a data conversion) or a continuing application requirement.

If the latter, then table redesign is appropriate, splitting your ADDRESS column into STREET_1, STREET_2, CITY, STATE, COUNTRY, POSTAL_CD etc.

For a one-shot requirement, or for a non-critical requirement (such as some reports), then some sort of
parsing algorithm implemented as a custom oracle function might work. Sometimes you have to work
 with what you're dealt (garbage in = garbage out). Your function should begin by breaking up the text into tokens and delimiters. An attempt should be made to classify each token by inference from length, numeric vs non-numeric, mixed case versus upper case, etc. You could apply heuristics such as assuming the natural order of things (street, then city, then state/province, then zip/postcode). You could also do lookups for known state codes, known city names. Once you nail down the city and state, you could assume that anything preceeding is the street info, anything following is the zip/postcode. In the end you should be able to accurately identfy 95% of the correct city/state combos. Whether a 5% error rate is acceptable depends on your application. I you are using the addresses for sending junk mail, then I suspect a 5% error rate is OK. If on the otherhand you are mailing legal notices, I suspect you need 100% accuracy.

Hope this suggestion is worthwhile.

John Roberts Received on Tue Oct 01 2002 - 17:20:52 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US