Re: Dreaming About Redesigning SQL

From: Patrick K. O'Brien <pobrien_at_orbtech.com>
Date: 10 Oct 2003 11:22:45 -0500
Message-ID: <m265ixf4fe.fsf_at_orbtech.com>


lauri.pietarinen_at_atbusiness.com (Lauri Pietarinen) writes:

Please allow me to take exception to a few of your points.

> If you take a standard SQL-query with sorting and grouping and some
> joins and compare it with hand written "navigational" code you will
> notice that what can be expressed in SQL in 5-10 lines of code will
> require several pages of hand written code for equivalent result.

That really depends on two things: the specific SQL query, the "navigational" programming language, and the capabilities of the ODBMS. In general, I would agree that a declarative language is going to be able to express an operation more concisely than a procedural or object-oriented language. But a language like Python is very powerful and expressive. I can do the equivalent of a simple SQL query in about as many lines of code in Python using the PyPerSyst ODBMS. It would take an incredibly complex SQL query before I'd get to "several pages of hand written code". PyPerSyst itself is only several pages of hand written code. ;-)

> The third issue is that the SQL query can be optimised to a much
> higher degree than procedural code. Say you add a new index. The
> DBMS will/can immediately start using it without any user
> intervention (if it decides to). In the procedural alternative you
> would have to recode your query.

You wouldn't necessarily *have to*. PyPerSyst does not yet have a declarative query capability. But if you add an index, the same procedural code will now make use of that index (or rather, the procedural code is calling methods of objects that are smart enough to look for indexes to optimize themselves without requiring any changes in application code).

I only point this out becase critics of object databases really need to improve their understanding of the capabilities of object languages and object databases. They just aren't as inflexible as some pundits would like to claim.

> The optimiser can also take into account the cardinality of tables
> (=number of rows) so that it will produce a different plan for a
> small database and a big database.

PyPerSyst doesn't do this, but it is completely within the realm of possibilities. All we are really talking about is levels of abstraction. Nobody should be writing procedural code at such a low level of abstraction that their code can't be optimized at runtime based on the presence or absence of indexes, or the size of the data involved. There is nothing that precludes an ODBMS, or procedural code, from being able to be optimized at runtime without changing code.

> An optimiser can even be sensitive to input from users at run time,
> say you have
>
> select name, salary
> from nurses
> where sex = ?
>
> If the parameter given at run time is 'M' then it could be
> advantageous to use an index, but not if parameter is 'F'. This
> decision could be made by the optimiser "on the fly". I hope you get
> the picture...

I hope you get the picture that every example you have given can be true about procedural code. Here is a very simple example of the source code for the "find" method of the PyPerSyst Extent class:

    def find(self, **criteria):

        """Return list of instances exactly matching all criteria."""
        instances = []
        names = criteria.keys()
        spec = self._makekeyspec(names)
        if spec in self._altkeys:
            # Use alternate keys as a shortcut.
            key = self._makekey(criteria, spec)
            d = self._altkeys[spec]
            if key in d:
                instance = d[key]
                instances.append(instance)
        else:
            # Scan all instances for a match.
            for instance in self._instances.values():
                match = True
                for name, value in criteria.items():
                    if getattr(instance, name) != value:
                        match = False
                        break
                if match:
                    instances.append(instance)
        return instances

This find method returns a list of instances matching some criteria supplied by the application code. A specific example of its use would look like this:

    people = db.root['Person'].find(name='Lauri Pietarinen')

If there is an alternate key index on the name attribute, the find method will use that as an internal optimization. If one does not exist, then it will scan all instance of the Person class looking for a match. Let's assume that today there is no index, and all instances are scanned. If tomorrow the schema is changed to declare that there should be an alternate key enforced on the name attribute, then PyPerSyst will build and maintain an index that will be used to enforce that constraint. As a side benefit, the find method will also use that index to optimize its performance. The application code will not have changed.

> > Other than that, what's wrong with the network (or similar)
> > models? I have not found a good link to such a discussion though
> > I hear it repeated that there are certain classical arguments
> > against them.
>
> In 1974 there was the BIG DEBATE where Codd defended the RM against
> network guys. I wonder if there is a transcript available?
>
> I think one of Codds arguments was that the number of manipulative
> operators needed in the RM (i.e. insert, update, delete) was much
> lower than in CODASYL (connect, disconnect, etc etc...)

Any arguments that are true about CODASYL are not necessarily true about current ODBMSs, in spite of the fact that they both could be described as implementing a network model. I have also tried to find information about the weaknesses of the network model and have not found much of value. Some of the obvious weaknesses of network products, such as the difficulty in changing your schema because pointers were static and tied to the physical location of information on disk, are easily avoided in modern object databases. For example, PyPerSyst uses references, but has no problem evolving schemas and migrating instances from the old schema to the new schema. IMO, references in modern OO languages aren't as evil as pointers in old CODASYL products. But I'm still looking for information to confirm or dispute that opinion.

Thanks for listening. :-)

-- 
Patrick K. O'Brien
Orbtech      http://www.orbtech.com/web/pobrien
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------
Received on Fri Oct 10 2003 - 18:22:45 CEST

Original text of this message