Re: index

From: David BL <davidbl_at_iinet.net.au>
Date: Wed, 30 Jul 2008 08:06:55 -0700 (PDT)
Message-ID: <c87f64b2-ceb3-4134-b3ca-93fa7c117359_at_h17g2000prg.googlegroups.com>


On Jul 30, 9:56 pm, "Brian Selzer" <br..._at_selzer-software.com> wrote:
> "David BL" <davi..._at_iinet.net.au> wrote in message
>
> news:fda99e61-ebc0-4bf5-b6c3-348e54d3f66e_at_27g2000hsf.googlegroups.com...
>
>
>
>
>
> > On Jul 30, 11:13 am, "Brian Selzer" <br..._at_selzer-software.com> wrote:
> > > "David BL" <davi..._at_iinet.net.au> wrote in message
>
> > >news:281568e5-bca8-4107-bab9-f8cb304f7b41_at_j1g2000prb.googlegroups.com...>
> > > On Jul 29, 9:08 pm, "Brian Selzer" <br..._at_selzer-software.com> wrote:
>
> > > [snip]
>
> > > > > Yes it would. The additional read and an exclusive lock /would/ be
> > > > > required
> > > > > whenever there is an update.
>
> > > > > You appear to be equating an index that covers
> > > > > the heading with a clustered index. Covering the heading--that is,
> > > > > containing all of the columns in the heading--is not what makes an
> > > > > index
> > > > > a
> > > > > clustered index.
>
> > > > Yes, I would define a clustered index as what you call a covering
> > > > index. Evidently you want to distinguish a "primary" copy of the data
> > > > - based on your argument concerning write locks for updates. However,
> > > > it seems odd to me that the definition of a clustered index would be
> > > > concerned with where locks are located. In fact a lock manager will
> > > > invariably dynamically allocate locks (only) when they are needed,
> > > > indexed in transient data structures (such as hash tables) in system
> > > > memory. Where is the need to distinguish a primary copy of the data?
>
> > > It's not so much about where locks are located: it's about whether index
> > > keys can be considered to be part of the content of the database. To be
> > > clear: the content of the database is characterized by the set of
> > > queries
> > > that can be answered given the data in the database. With the exception
> > > of
> > > clustered indexes, which are not just indexes but also tables, every
> > > index
> > > can be eliminated without reducing or restructuring the content of the
> > > database. A clustered index, on the other hand, can only be eliminated
> > > by
> > > being transformed into a heap--otherwise there would be a reduction in
> > > the
> > > content of the database. Even so, every query that can be answered with
> > > indexes can be answered without indexes (at least theoretically. Some
> > > queries may become impractical to the point of being unusable without
> > > indexes.). So doesn't it follow that there already is a primary copy of
> > > the
> > > data? Except for clustered index keys--each of which may be the only
> > > copy
> > > of a particular datum in the database--can index keys really be
> > > considered
> > > part of the content of the database? Wouldn't that therefore relegate
> > > them
> > > to secondary status?
>
> > It may be useful to look at an analogy in C++. Consider a bijection
> > between two integer attributes named x and y coded as follows:
>
> > class Bijection
> > {
> > public:
> > void insert(int x,int y) {xtoy[x]=y; ytox[y]=x;}
> > int getx(int y) {return ytox[y];}
> > int gety(int x) {return xtoy[x];}
> > private:
> > map<int,int> xtoy;
> > map<int,int> ytox;
> > };
>
> > As a systems programmer I have used this technique in various forms on
> > a number of occasions. I see a class instance as analogous to a
> > single “table”, and the two maps as “clustered indexes”. Each is
> > redundant with respect to the other. Both are needed for performance.
>
> I think I see what you're driving at, even though your example is broken:
>
> Bijection b; int i, j, k, l;
> b.insert(4, 3); b.insert(4, 5); b.insert(4, 7);
> i = b.getx(3); // returns 4
> j = b.getx(5); // returns 4
> k = b.getx(7); // returns 4
> l = b.gety(4); // returns 7

Yes of course I didn't bother worrying about repeats. However whether one needs to test for that explicitly depends on the application. For example in the past I have needed a transient bijection between GUID and memory pointer and I could *assume* uniqueness of both in the design.

> Your example is illuminating--aside from the fact that it doesn't operate as
> its name suggests. I don't think it's thread safe: x == b.getx[b.gety[x]]
> could sometimes return false. That highlights one of the pitfalls of
> duplicating content.

Most classes in most OO programs are not threadsafe because they do not need to be threadsafe. Imagine the performance hit if every variable of every data type was protected with a mutex!

Don't scoff at that Bijection class. It is like a Lamborghini when used in an appropriate context. Why use a general purpose off road vehicle when you know you're only driving on the highway?

> But the bottom line is that what is a clustered index in the implementation
> implied by the original poster's question differs from what you consider to
> be a clustered index.

Perhaps. Received on Wed Jul 30 2008 - 17:06:55 CEST

Original text of this message