RE: Hash Code Based Identity of a Relational Table

From: Jaromir D.B.Nemec <jaromir_at_db-nemec.com>
Date: Wed, 6 Jun 2018 00:44:19 +0200
Message-ID: <092e01d3fd1e$bec31ae0$3c4950a0$_at_db-nemec.com>



Hello Andy,  

thanks very much for the link, I must confess I never heard of this feature and it could save me a lot of work.  

But there is a subtle point in the described feature – the proposed calculation is order independent. This is achieved with the combination of the row hashes using the XOR operation, that is commutative, so any ordering leads to the same result.  

This small difference has a big performance impact as for the hash code calculation simple full table scan is sufficient (no need of sort).  

Anyway this is a very good news because to achieve the described functionality with a native Oracle implementation would be sufficient to add new digest_types to existing functionality of DBMS_SQLHASH that represents a calculation of say MD5 with XOR row hash combination.  

Kind Regards,  

Jaromir Nemec

 <http://www.db-nemec.com> http://www.db-nemec.com

Tel +436764039288  

From: Andy Sayer [mailto:andysayer_at_gmail.com] Sent: Dienstag, 05. Juni 2018 23:49
To: jaromir_at_db-nemec.com
Cc: ORACLE-L <oracle-l_at_freelists.org> Subject: Re: Hash Code Based Identity of a Relational Table  

Hi Jaromir,  

There is certainly a benefit to being able to compare things in two different locations without sending the entire thing over the wire again. You might be very annoyed though, Oracle has already written something for this, it's not very well documented but it's there: dbms_sqlhash

https://docs.oracle.com/cd/B19306_01/network.102/b14266/appendixb.htm#CHDGHCAE  

You just give it a query which can just be a select * statement. It computes the hash of each row (by concatenating the columns together as a string with a space between each column) and then combines that for a hash of the complete row set at the very end. You need to be careful about row ordering and any NLS style parameters that will change the implicit conversion of any data type you have in your table.  

Thanks,

Andrew  

On 5 June 2018 at 22:34, Jaromir D.B.Nemec <jaromir_at_db-nemec.com <mailto:jaromir_at_db-nemec.com> > wrote:

Hi All,  

I worked once on a project where a file had to be spooled from a database table and transferred on a remote host. I wanted to introduce a possibility of validating that the produced file has the same identical content as the source table and came across the idea of using hash codes (MD5) to achieve it.

On a next occasion when there were two tables to be verified being identical I came back to this idea and I wrote a simple user defined aggregate function based on a MD5 hash code for that task.

So basically the aggregate function MD5_XOR represented a state of the table that could be compared with other table.  

Example

select MD5_XOR(to_char(ID)||COL_TXT|| to_char(COL_DATE,'dd.mm.yyyy hh24:mi:ss')) md5 from tab1

MD5_XOR


173f1f8f85f1a154044b7629a23e949c  

The details can be found in http://www.db-nemec.com/MD5/CompareTablesUsingMD5Hash.html  

I found that approach interesting and shared the idea with some colleagues, but the response was ranging from “may I ask what is it good for?” to “this is very unwise idea” which may be trace of the Oracle “Database Ideas” page under https://community.oracle.com/ideas/20275

This motivated me to write a white paper, where I tried to explain the idea in more detail and present some possible use cases:

http://www.db-nemec.com/MD5/HashCodeBasedIndentityintheDatabase.html

So my main point is to receive this way a more technical grounded feedback about this approach of using a hash code for representing a state of a whole table or a part of it.

Any response is highly appreciated.  

Kind Regards,  

Jaromir D.B. Nemec    

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jun 06 2018 - 00:44:19 CEST

Original text of this message