Re: RDBMS using hashtables

From: Derek Asirvadem <derek.asirvadem_at_gmail.com>
Date: Mon, 25 Aug 2014 17:51:34 -0700 (PDT)
Message-ID: <5cfed530-4368-49b0-8326-02e9a2514064_at_googlegroups.com>


> On Friday, 4 October 2013 23:09:08 UTC+10, Alexander Langer wrote:
> Hi all,
>
> do you know read-optimized RDBMS software using hashtables and order
>
> preserving hash functions instead of trees ?

Alexander

> do you know read-optimized RDBMS software

Most commercial RDBMS is heavily read-optimised already. They have a number of functions to optimise reads, such as (one example only) keeping data and index pages in memory, and managing data vs index pages separately; reading Large Buffers (eg. 1MB instead of one 2KB oage) from disk for queries that tranverse large portions of a table; etc; etc; etc.

Non-commercial RDBMS are not RDBMS, or even DBMS, they are merely freeware that has a few file handling capabilities, and none of the serious grunt or server architecture of commercial RDBMS. Further, they have very little *R*elational capability. Some of them fraudulently claim to be SQL, it is a fraud, because they do not comply with, or provide the facilities of, SQL. Some of them have various features that are highly touted, such as Deferred Constraint Checking and MVCC, which are not relevant in commercial RDBMS. That is to say, they have some irrelevant features, but they do not have the essential features.

> using hashtables *and* order-preserving hash functions instead of trees ?

Well that is a contradiction, so you won't find that in a commercial platform (you might find it in freeware).

The contradiction is this. "order-preserving" requires trees, an *order* implies a tree. "hash function" has no order. Therefore you can have one, XOR the other, but not both in the same table.

> order-preserving hash functions

I am not sure what you mean by that, I will try to answer, but it may not apply. All HASH() functions that I am aware of (ie. commercial RDBMS) provide a hash using established algorithms, MD5, SHA, etc. They are repeatable, given the same data content, they will produce the same hash value. Therefore if you give that result an *non-tree order*, that order can be maintained. And if you give that result an *tree order*, that order can be maintained.

--

The RDBMS does not need to implement hash-tables, in order for you to have hashed access.  What the RDBMS does, under the covers, is physical, and you should divorce yourself from it.  If you don't, you will be married to the physical, which is a massive limitation, and something that we do not do in commercial database implementations, at least not since 1970 when Dr E F Codd wrote the Relational Model, and not since 1987, since we had genuine Relational DBMS platforms.  Your code, and the entire database definition, should be logical only.  If and when you get to optimising performance at the machine level, then and only then, you should enter into the physical, and the facilities that the particular commercial RDBMS platform provides.

There is no problem at all in implementing hashed-access tables in a RDBMS platform (that does not provide hashed-access tables).  First, model and define your database using the RM, and IDEF1X, such that it is Relational.  That already has Order, and Trees, built-in.

Second, use a hash function on the Primary Key (ie. the genuine Relational Key), and form a Surrogate from the hashed PK.  The original PK would now be an Alternate Key, and the Surrogate will be the new PK.

Therefore you do not actually need the RDBMS to *implement* hash-tables, hash-tables are a simple matter to implement if the RDBMS has a HASH() function.  If you implement the sequence that I suggest, you will have a truly Relational table, plus hashed access at the physical level.

If your data store is made up of Surrogates only, then it is not a Relational Database, it is in fact a Record Filing system, and you have none of the power or speed of a Relational Database.  In that case, you do not need an RDBMS, or even a commercial one, any of the freeware will handle filing systems quite adequately.  They provide some of the SQL features, so you might get some of the ease-of-access.

That said, some commercial RDBMS do provide hash-tables, either directly (not necessary, if you understand the above), or indirectly.  Sybase provides it indirectly.  It even provides tables that are partitioned (the low-level physical storage) by the hash value.  This improves Insert performance, it does nothing for read performance (the read performance is massively improved by a number of *other* features and functions).
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc36271.1570/html/blocks/blocks139.htm

Cheers
Derek
Received on Tue Aug 26 2014 - 02:51:34 CEST

Original text of this message