Path: news.easynews.com!easynews!news.he.net!news-hog.berkeley.edu!ucberkeley!nntp-relay.ihug.net!ihug.co.nz!telocity-west!TELOCITY!newsrump.sjc.telocity.net!not-for-mail
From: "contrapositive" <nosp@m.com>
Newsgroups: comp.databases.oracle,comp.databases.oracle.misc,comp.databases.oracle.server,comp.databases.oracle.tools
References: <e6ff1e18.0110091050.44538712@posting.google.com>
Subject: Re: How to parse StreetAddresses String into individual fields
Lines: 76
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 5.50.4522.1200
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200
Message-ID: <HSNw7.170$cu1.171295@newsrump.sjc.telocity.net>
X-Trace: NzYgTm9BdXRoVXNlciBURUxPQ0lUWalSRUGlRVJTIDY0LjE51S4yMzIuMTk3ICBUdWUsIDA5IE9j!dCAyMDAxIDE4OjUzOrAzIFBEVA==
X-Abuse-Info: Please forward ALL headers when reporting abuse.
X-Complaints-To: abuse@bb.directv.com
NNTP-Posting-Date: Tue, 09 Oct 2001 18:53:43 PDT
Date: Tue, 9 Oct 2001 21:53:16 -0400
Xref: easynews comp.databases.oracle:10801 comp.databases.oracle.misc:70096 comp.databases.oracle.server:120752 comp.databases.oracle.tools:45361
X-Received-Date: Tue, 09 Oct 2001 20:57:28 MST (news.easynews.com)

Yuck! There is nothing easy about this. On the surface it's a LOT of string
manipulation, and I would start by writing some utility functions: functions
that will separate "words" (as delimited by white space), return the numeric
part of a word (similar to Fix() in VB), strip punctuation (commas and
periods will just get in the way), etc.

But that's overlooking a bigger part of the problem. I don't know the source
of your addresses, but chances are they haven't been validated against the
USPS standards for correction. Even if they have, there are no simple, hard
fast rules for formatting addresses, so you would be incorrect to assume
that addresses always follow a predefined format. For example, these are all
valid addresses:
ONE ALEXANDRIA PLACE
215 WEST SOUTH TEMPLE
100 12TH AVE WEST
504 EAST ST JOHN ST (East Saint John Street)

I've seen people in apartments do things like 100-28 ELM ST, instead of the
correct, 100 ELM ST APT 28, most likely because that's how the landlord or
previous owner told them. Given all of this, you'd probably want to identify
some keywords, not by their position but by their content. You can identify
directionals rather easily (N, SOUTHWEST, NORTH EAST, etc.). You can also do
this with street designators, but there are a LOT of them (ST, AVE, BLVD and
more obscure things like ARC (Arcade), RADL (Radial) and VIA (Viaduct)). And
often times they are spelled or abbreviated in more than one way, or just
incorrectly (PKY vs. PKWY, RIDGE vs. RDGE vs. RDG, etc.). You probably don't
want to hardcode hundreds of such items.

I believe that there are software and database packages that can assist or
even manage this kind of functionality for you, but I'm sure they're
expensive and that's probably not an option for you. If you're careful, you
could probably write something that will work fairly well, but even the most
well-written program will only be as good as the data -- and even then,
there will always be errors.

-jk

"Raj Cherukuri" <raj_strobe@yahoo.com> wrote in message
news:e6ff1e18.0110091050.44538712@posting.google.com...
> I have to take a string (from a data entry form) that contains Street
> Addresses such as (17824 NE 12TH ST APT 10-301) and have it parsed
> field by field and pass it into a PL/SQL stored procedured, format the
> string so that the individual fields are set to a specific column. For
> example
>
> FullAddress = '17824 NE 12TH ST APT 10-301'
> HouseNumber = 17824
> Prefix = NE
> Name   = 12TH
> Type   = ST
> UnitType = APT
> Unit   = 10-301
>
> Once the fields are parsed column by column, have to reassemble them
> (by concatenation) and check against an existing string in the
> database. That is:
>
> streetnm   varchar2(75);
> concatendated_address varchar2(75);
> concatenated_address := HouseNumber || ' ' || Prefix || ' ' || Name ||
> ' ' || Type || ' ' || UnitType || ' ' || unit;
> select fulladdress into streetnm from lisaddress where fulladdress
> like 'concatenated_address';
> if rowcount > 0 then
>   'execute another procedure that takes concatenated_address as a
> parameter
> endif
>
>
> Could you Show me the steps you do to accomplish this.
>
> Thanks for your time,
>
> Raj


