Managing large physical mailings ...

From: stv <stvsmth_at_gmail.com>
Date: Mon, 2 Oct 2006 16:49:18 -0600
Message-ID: <9493d0340610021549k2a48b520y6741a390ba94029d@mail.gmail.com>

Hi all,

We're a non-profit that does quite a bit of physical mailings: newsletter, catalogs, etc. There are three data sources and there's quite a bit of overlap. It'll take some time to clean that mess up ... meanwhile we'll have to work with what we have.

I'm new here and I have a few weeks before the next "big" mailing goes out (only 20,000). I've never dealt with bulk snail mailing before, but it seems to me the problem of isolating duplicates from multiple data sources was solved many, many, many years ago.

Do y'all do this much yourself? Or do you outsource it? If you do it yourself, what advice would you have for a newbie in this area?

Are there generic queries or regex that I can start from and expand? We are dealing solely with US addresses and our databases use the basic first, last, addr1, addr2, city, st, zip columns.

It should be noted that we'd prefer to over-filter on folks and save on postage; if we miss a few folks because they cannot type in a proper two-letter state code, then oh well. We're not mailing out 1099s are anything.

I would think the obvious steps are to

Normalize the data: all upper, strip some punctuation, check for valid state (2 char) & zips (5nums and/or ???) maybe clean up the street/st Ave/ave situations. Maybe. However, this seems endless: 4th vs Fourth, all the combinations of apartment number ids, etc. Surely there are some existing queries for this
Remove obvious duplicates
Create some compound "keys" addr1 || addr2 || zip ... possibly a number of others in order to find less obvious duplicates.
Probably need to do some selects & see if the proposed dupes are actually dupes; refine the queries and/or keys; repeat.
Remove less obvious duplicates.

The previous DBA did something like the above, but he would strip all rows with same first/last name :( Not the end of the world, as I said, but I think we can be a bit more sophisticated than that. Also, no normalization of data before creating compound keys, aside from UPPER. We had a bunch of people complain about multiple mailings from the last round, so I'm not too high on re-using his stuff.

--Steve Smith
--

http://www.freelists.org/webpage/oracle-l Received on Mon Oct 02 2006 - 17:49:18 CDT