Re: Dreaming About Redesigning SQL

From: Patrick K. O'Brien <pobrien_at_orbtech.com>
Date: 13 Oct 2003 12:51:45 -0500
Message-ID: <m2fzhx6n66.fsf_at_orbtech.com>


Lauri Pietarinen <lauri.pietarinen_at_atbusiness.com> writes:

> Patrick K. O'Brien wrote:
>
> >lauri.pietarinen_at_atbusiness.com (Lauri Pietarinen) writes:
> >
> >Please allow me to take exception to a few of your points.
> >
> Sure!

Great!

> >>If you take a standard SQL-query with sorting and grouping and some
> >>joins and compare it with hand written "navigational" code you will
> >>notice that what can be expressed in SQL in 5-10 lines of code will
> >>require several pages of hand written code for equivalent result.
> >>
> >That really depends on two things: the specific SQL query, the
> >"navigational" programming language, and the capabilities of the
> >ODBMS. In general, I would agree that a declarative language is going
> >to be able to express an operation more concisely than a procedural or
> >object-oriented language. But a language like Python is very powerful
> >and expressive. I can do the equivalent of a simple SQL query in
> >about as many lines of code in Python using the PyPerSyst ODBMS. It
> >would take an incredibly complex SQL query before I'd get to "several
> >pages of hand written code". PyPerSyst itself is only several pages
> >of hand written code. ;-)
> >
> Well, just to make things a bit more concrete, how would you write
> the following query
>
> SELECT c.custname, p.prodname, sum(od.order_qty*p.prod_price) as
> part_cust_total
> from customer c,
> order o,
> order_detail od,
> part p
> where c.custid= o.custid and
> o.orderid = od.orderid and
> od.partid = p.partid
> group by c.custname, p.partname
> order by part_cust_total
>
> with obvious tables:
> customer(custid, custname)
> order(custid, orderid, orderdate, etc...)
> order_detail(orderid,partid,order_qty)
> product(prodid, prodname, prodprice)

First, I'll assume that the typos in your example are simply that. Just for the record, I've included SQL below without the typos:



SELECT c.custname, p.prodname, sum(od.orderqty*p.prodprice) as prod_cust_total
  from customer c,
     order o,
      order_detail od,
      product p

where c.custid = o.custid and
o.orderid = od.orderid and
od.prodid = p.prodid
group by c.custname, p.prodname
order by prod_cust_total

with obvious tables:
customer(custid, custname)
order(custid, orderid, orderdate, etc...) order_detail(orderid, prodid, orderqty)
product(prodid, prodname, prodprice)


To create the equivalent result using PyPerSyst, I first created a schema that closely matched your table definitions. I include that schema below. Then I create a small program to populate the database with sample data. Then I coded a function that generates similar results. I then ran the function in an interactive Python session. (In addition to executing programs, Python can also be used interactively.) Here is the code for the function:

def prod_cust_totals():

    """Return order summary information."""     db = _database
    group = {}
    for od in db.root['OrderDetail']:

        key = (od.order.customer.name, od.product.name)
        subtotal = od.quantity * od.product.price
        group[key] = group.get(key, 0) + subtotal
    totals = [(t, c, p) for ((c, p), t) in group.items()]     totals.sort()
    pprint.pprint(totals)

I think you'll agree that my code is roughly the same amount of code as your SQL. And I think the average Python programmer would find this to be a simple, straightforward example of Python code. There is really nothing unusual or tricky about this code.

Here is the result of running this function interactively against the sample database:

>>> import database
>>> db = database.database()
>>> database.populate()
>>> database.prod_cust_totals()
[(300, 'Customer 107', 'Product 100'),
 (300, 'Customer 190', 'Product 100'),
 (300, 'Customer 228', 'Product 100'),
 (300, 'Customer 280', 'Product 100'),
 (300, 'Customer 340', 'Product 100'),
 (300, 'Customer 348', 'Product 100'),
 (300, 'Customer 399', 'Product 100'),
 (300, 'Customer 414', 'Product 100'),
 (300, 'Customer 476', 'Product 100'),
 (300, 'Customer 479', 'Product 100'),
 (300, 'Customer 513', 'Product 100'),
 (300, 'Customer 533', 'Product 100'),
 (303, 'Customer 169', 'Product 101'),
 (303, 'Customer 171', 'Product 101'),
 (303, 'Customer 236', 'Product 101'),
 (303, 'Customer 308', 'Product 101'),
 (303, 'Customer 318', 'Product 101'),
 (303, 'Customer 431', 'Product 101'),
 (303, 'Customer 445', 'Product 101'),
 (303, 'Customer 492', 'Product 101'),
 (306, 'Customer 290', 'Product 102'),
 (306, 'Customer 316', 'Product 102'),
 (306, 'Customer 317', 'Product 102'),
 (306, 'Customer 321', 'Product 102'),
 (306, 'Customer 414', 'Product 102'),
 (306, 'Customer 495', 'Product 102'),
 (306, 'Customer 510', 'Product 102'),
 (306, 'Customer 519', 'Product 102'),
 (306, 'Customer 531', 'Product 102'),
 (306, 'Customer 579', 'Product 102'),
 (309, 'Customer 330', 'Product 103'),
 (309, 'Customer 358', 'Product 103'),
 (309, 'Customer 514', 'Product 103'),
 (312, 'Customer 117', 'Product 104'),
 (312, 'Customer 126', 'Product 104'),
 (312, 'Customer 236', 'Product 104'),
 (312, 'Customer 250', 'Product 104'),
 (312, 'Customer 267', 'Product 104'),
 (312, 'Customer 274', 'Product 104'),
 (312, 'Customer 286', 'Product 104'),
 (312, 'Customer 294', 'Product 104'),
 (312, 'Customer 448', 'Product 104'),
 (312, 'Customer 549', 'Product 104'),
 (315, 'Customer 191', 'Product 105'),
 (315, 'Customer 239', 'Product 105'),
 (315, 'Customer 335', 'Product 105'),
 (315, 'Customer 340', 'Product 105'),
 (315, 'Customer 346', 'Product 105'),
 (315, 'Customer 414', 'Product 105'),
 (315, 'Customer 473', 'Product 105'),
 (315, 'Customer 482', 'Product 105'),
 (315, 'Customer 499', 'Product 105'),
 (315, 'Customer 585', 'Product 105'),
 (315, 'Customer 587', 'Product 105'),
 (315, 'Customer 591', 'Product 105'),
 (318, 'Customer 249', 'Product 106'),
 (318, 'Customer 266', 'Product 106'),
 (318, 'Customer 421', 'Product 106'),
 (318, 'Customer 435', 'Product 106'),
 (318, 'Customer 442', 'Product 106'),
 (318, 'Customer 517', 'Product 106'),
 (318, 'Customer 561', 'Product 106'),
 (321, 'Customer 115', 'Product 107'),
 (321, 'Customer 131', 'Product 107'),
 (321, 'Customer 167', 'Product 107'),
 (321, 'Customer 250', 'Product 107'),
 (321, 'Customer 254', 'Product 107'),
 (321, 'Customer 313', 'Product 107'),
 (321, 'Customer 375', 'Product 107'),
 (321, 'Customer 383', 'Product 107'),
 (321, 'Customer 450', 'Product 107'),
 (321, 'Customer 496', 'Product 107'),
 (321, 'Customer 548', 'Product 107'),
 (324, 'Customer 103', 'Product 108'),
 (324, 'Customer 215', 'Product 108'),
 (324, 'Customer 374', 'Product 108'),
 (324, 'Customer 487', 'Product 108'),
 (324, 'Customer 576', 'Product 108'),
 (327, 'Customer 220', 'Product 109'),
 (327, 'Customer 267', 'Product 109'),

 (327, 'Customer 343', 'Product 109'),
 (327, 'Customer 441', 'Product 109'),
 ... [lots more results snipped]
 (40950, 'Customer 458', 'Product 175'),
 (41175, 'Customer 263', 'Product 183'),
 (42120, 'Customer 262', 'Product 195'),
 (42846, 'Customer 357', 'Product 193'),
 (43920, 'Customer 568', 'Product 183'),
 (44319, 'Customer 458', 'Product 187'),
 (45045, 'Customer 336', 'Product 195'),
 (45600, 'Customer 270', 'Product 190'),
 (46080, 'Customer 336', 'Product 192'),
 (47040, 'Customer 336', 'Product 160'),
 (48096, 'Customer 276', 'Product 167'),
 (49350, 'Customer 260', 'Product 175'),
 (50235, 'Customer 568', 'Product 197'),
 (56496, 'Customer 260', 'Product 176')]

I chose to print a list of tuples (Python tuples, not to be confused with Relational tuples), to illustrate the results of the function.

Here is the schema (system.py) for this database:

from pypersyst.entity import root
from pypersyst.twisted.entity import Entity

class Customer(Entity):

    _attrSpec = [

        'id',
        'name',
        ]

    _altkeySpec = [
        ('id',),
        ('name',),
        ]


class Order(Entity):

    _attrSpec = [

        'customer',
        'date',
        'id',
        ]

    _altkeySpec = [
        ('id',),
        ]


class OrderDetail(Entity):

    _attrSpec = [

        'order',
        'product',
        'quantity',
        ]

    _altkeySpec = [
        ('order', 'product'),
        ]


class Product(Entity):

    _attrSpec = [

        'id',
        'name',
        'price',
        ]

    _altkeySpec = [
        ('id',),
        ('name',),
        ]


class Root(root.Root):

    """Root class."""

    _EntityClasses = [

        Customer,
        Order,
        OrderDetail,
        Product,
        ]


And here is the database module (database.py), containing all the rest of the code:

import os
import pprint

# PyPerSyst provides pluggable components.

from pypersyst.database import Database
from pypersyst.engine.engine import Engine
from pypersyst.storage.storage import Storage

# Import the system and transaction schemas. from pobrien.order.schema import system
from pobrien.order.schema import transaction as tx

# The store directory is where our data files will reside. import pobrien.order.store
storeDir = os.path.dirname(pobrien.order.store.__file__)

# These next two variables are global semaphores that keep us from # opening the database more than once in the same process. _database = None

def database():

    """Return a PyPerSyst database."""
    global _database
    if _database is None:

        # We haven't opened this database yet, so let's do that.
        name = 'SampleOrderDatabase'
        db = _open(name)
        # Set module attr so we only create this db once per process.
        _database = db

    return _database

def _open(name):

    storage = Storage(storeDir, name, protocol=0, python=True)     engine = Engine(storage, system.Root)     db = Database(engine)
    return db

def populate():

    """Populate the database with sample data."""     db = _database
    # This is how we reset a database back to nothing. Evil!     db._engine._root = db._engine._root.__class__()     # Dump it to disk.
    db.dump()
    import calendar
    import datetime
    import random
    # Create 500 customers.
    customerids = range(100, 600)
    for n in customerids:

        db.execute(tx.Create('Customer', id=n, name='Customer %s'%n))     # Create 100 products.
    productids = range(100, 200)
    for n in productids:

        db.execute(tx.Create('Product', id=n,
                             name='Product %s'%n, price=n*3))
    # Create 2000 orders.
    for n in range(100, 2100):
        year = random.randint(1990, 2003)
        month = random.randint(1, 12)
        days = calendar.monthrange(year, month)[1]
        day = random.randint(1, days)
        date = datetime.date(year, month, day)
        custid = random.choice(customerids)
        cust = db.root['Customer'].find(id=custid)[0]
        order = db.execute(tx.Create('Order', date=date,
                                     id=n, customer=cust))
        # Create order details for up to 50 products.
        for prodid in random.sample(productids,
                                    random.randint(1, 50)):
            prod = db.root['Product'].find(id=prodid)[0]
            qty = random.randint(1, 20)
            db.execute(tx.Create('OrderDetail', order=order,
                                 product=prod, quantity=qty))
    # Compact the database.
    db.dump()

def prod_cust_totals():

    """Return order summary information."""     db = _database
    group = {}
    for od in db.root['OrderDetail']:

        key = (od.order.customer.name, od.product.name)
        subtotal = od.quantity * od.product.price
        group[key] = group.get(key, 0) + subtotal
    totals = [(t, c, p) for ((c, p), t) in group.items()]     totals.sort()
    pprint.pprint(totals)

Here is some more information from within the Python shell, just to show that the instances were in fact created for the sample database, and to show how many order details were generated:

>>> for extentname in db.root:

...     print db.root[extentname]
...     
<Extent object at 0x417f8d4c with 500 instances of Customer>
<Extent object at 0x417f8dec with 50938 instances of OrderDetail>
<Extent object at 0x417f8ccc with 100 instances of Product> <Extent object at 0x417f8dcc with 2000 instances of Order> >>>

I look forward to hearing your thoughts on this example.

Best regards,

Pat

-- 
Patrick K. O'Brien
Orbtech      http://www.orbtech.com/web/pobrien
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------
Received on Mon Oct 13 2003 - 19:51:45 CEST

Original text of this message