Q: calculating a hash value in SQL (data integrity) [long]

From: Jason Steffler <jagwar_at_my-deja.com>
Date: 1999/12/14
Message-ID: <835u0v$bnj$1_at_nnrp1.deja.com>#1/1


Hi All,

I'm working with a system that has multiple shared dbs across different components that are in various dialects. There is currently little to no data validation going on, which naturally leads to many problems with data corruption, and interface difficulties between components.

I need something other than the usual database validation mechanisms (referential integrity, column typing, etc.), for a number of reasons. An obvious solution to this problem is generating a hash value of a shared data entity across systems, then use the hashed value as a checksum. For flexibility, it would be useful to be able to calculate the hashed value in each system independently, including the databases.

One of the many problems I'm running into, is how can I calculate a meaningful hash value across the various languages/systems, and in particular, SQL? For example, say we use the algorithm of:

hash('text') = ((ASCII('t')*X + ASCII('e'))*X + ASCII('x'))*X + ASCII('t')
where X is the base of the polynomial

in C, I'd calculate this with:
public int hashValue( String data, int baseNumber) {

	int hashVal = 0;
	for (int i = 0; i< data.length(); i++)
		hashVal = hashVal * baseNumber + data.charAt(i);
	return hashVal;

}

Now to the point of the question: how do I implement something like this in SQL?

Being a SQL novice, the closest query I could come up with is (using 3 as the base):

SELECT
  (ASCII(SUBSTR(TABLE.COL, 1, 1))*3 +
  ASCII(SUBSTR(TABLE.COL, (LENGTH(TABLE.COL)), (LENGTH(TABLE.COL))) FROM TABLE This query has a lot lacking. I can't use proprietary functions, like getchecksum in SQL Server, can't use the value returned from a SOUNDEX, or SIMILAR, as this algorithm should be implementable in COBAL, Java, C, & Smalltalk (unless anybody knows the algoritm used for these functs, so they can be reimplemented in other languages). This should be runnable in NonStop SQL, so I want to try and stick to basic SQL (as opposed to using something like PL/SQL)

Any suggestions? Any better algorithms/ways to approach this problem?

Cheers

---
Jason Steffler
 % Unix is user friendly; it's just selective about who its friends are
 (take out .removeMe.org in address for email)


Sent via Deja.com http://www.deja.com/
Before you buy.
Received on Tue Dec 14 1999 - 00:00:00 CET

Original text of this message