Re: How to parse StreetAddresses String into individual fields

From: contrapositive <nosp_at_m.com>
Date: Tue, 9 Oct 2001 21:53:16 -0400
Message-ID: <HSNw7.170$cu1.171295_at_newsrump.sjc.telocity.net>


Yuck! There is nothing easy about this. On the surface it's a LOT of string manipulation, and I would start by writing some utility functions: functions that will separate "words" (as delimited by white space), return the numeric part of a word (similar to Fix() in VB), strip punctuation (commas and periods will just get in the way), etc.

But that's overlooking a bigger part of the problem. I don't know the source of your addresses, but chances are they haven't been validated against the USPS standards for correction. Even if they have, there are no simple, hard fast rules for formatting addresses, so you would be incorrect to assume that addresses always follow a predefined format. For example, these are all valid addresses:
ONE ALEXANDRIA PLACE
215 WEST SOUTH TEMPLE
100 12TH AVE WEST
504 EAST ST JOHN ST (East Saint John Street)

I've seen people in apartments do things like 100-28 ELM ST, instead of the correct, 100 ELM ST APT 28, most likely because that's how the landlord or previous owner told them. Given all of this, you'd probably want to identify some keywords, not by their position but by their content. You can identify directionals rather easily (N, SOUTHWEST, NORTH EAST, etc.). You can also do this with street designators, but there are a LOT of them (ST, AVE, BLVD and more obscure things like ARC (Arcade), RADL (Radial) and VIA (Viaduct)). And often times they are spelled or abbreviated in more than one way, or just incorrectly (PKY vs. PKWY, RIDGE vs. RDGE vs. RDG, etc.). You probably don't want to hardcode hundreds of such items.

I believe that there are software and database packages that can assist or even manage this kind of functionality for you, but I'm sure they're expensive and that's probably not an option for you. If you're careful, you could probably write something that will work fairly well, but even the most well-written program will only be as good as the data -- and even then, there will always be errors.

-jk

"Raj Cherukuri" <raj_strobe_at_yahoo.com> wrote in message news:e6ff1e18.0110091050.44538712_at_posting.google.com...
> I have to take a string (from a data entry form) that contains Street
> Addresses such as (17824 NE 12TH ST APT 10-301) and have it parsed
> field by field and pass it into a PL/SQL stored procedured, format the
> string so that the individual fields are set to a specific column. For
> example
>
> FullAddress = '17824 NE 12TH ST APT 10-301'
> HouseNumber = 17824
> Prefix = NE
> Name = 12TH
> Type = ST
> UnitType = APT
> Unit = 10-301
>
> Once the fields are parsed column by column, have to reassemble them
> (by concatenation) and check against an existing string in the
> database. That is:
>
> streetnm varchar2(75);
> concatendated_address varchar2(75);
> concatenated_address := HouseNumber || ' ' || Prefix || ' ' || Name ||
> ' ' || Type || ' ' || UnitType || ' ' || unit;
> select fulladdress into streetnm from lisaddress where fulladdress
> like 'concatenated_address';
> if rowcount > 0 then
> 'execute another procedure that takes concatenated_address as a
> parameter
> endif
>
>
> Could you Show me the steps you do to accomplish this.
>
> Thanks for your time,
>
> Raj
Received on Wed Oct 10 2001 - 03:53:16 CEST

Original text of this message