Better search algorithm?

From: Jason Chancellor <newsgroups_at_makethematch.com>
Date: 23 Jul 2001 12:55:17 -0700
Message-ID: <a19c9a2e.0107231155.7f3c4dd8_at_posting.google.com>


Hello, I am working on a project which is beginning to show some performance issues, and I have been trying some different techniques with no better results....so I thought I would ask around for some advice.

The project is a matchmaker which basically pairs up people based on a questionnaire and their distance from each other. IE, I would like to find all people who are girls and live 30 miles from me. The database is MS Sql Server 7

The tables in question are as follows :

USER_INFO



user_id numeric(10,0)
zipcode int
sex        int
age        int

etc......

ZIP_CODE



zipcode int
longitude float
latitude float

SEARCH



search_user_id numeric(10,0)
found_user_id numeric(10,0)
distance float

OK, heres what I was trying. I was selecting a group of users from the USER_INFO table based on their answers to questions and inserting them into the SEARCH table. The search_user_id is who requested the search, and the found_user_id is who the search produced (depending on how broad the search was, this insert could be 1000's of rows. After the insert, I had a stored proc which went through the SEARCH table and calculated the distance the two people were from each other based on the longitude and latitude of their two zipcodes, and then this distance was used to determine if these two users were close enough together. This distance calculation is what is killing my performance, 1st because its a complicated algorithm, and two because I have to basically go through 1000s of rows and do 1 update at a time.

I would like to be able to calculate the distance at the same time I am selecting the users based on their questions. My initial thought was to create a table which stored the precalculated distances between all zipcodes in America and then join in this table when doing my search. However, after a few calculations I realized that this would require this distance table to have over 640 million rows (thats a couple too many)

So here I am, totally out of ideas... Can anyone think of a better way to perform the search I am trying to do, or can anyone point me to a book or a URL of some info that might be of help?

Thanks for your time.
Jason Chancellor
jason_chancellor_at__NOSPAMHERE_hotmail.com Received on Mon Jul 23 2001 - 21:55:17 CEST

Original text of this message