Skip navigation.

Other

Attack of the Frankenschemas

DBMS2 - Mon, 2013-01-28 08:04

In typical debates, the extremists on both sides are wrong. “SQL vs. NoSQL” is an example of that rule. For many traditional categories of database or application, it is reasonable to say:

  • Relational databases are usually still a good default assumption …
  • … but increasingly often, the default should be overridden with a more useful alternative.

Reasons to abandon SQL in any given area usually start:

  • Creating a traditional relational schema is possible …
  • … but it’s tedious or difficult …
  • … especially since schema design is supposed to be done before you start coding.

Some would further say that NoSQL is cheaper, scales better, is cooler or whatever, but given the range of NewSQL alternatives, those claims are often overstated.

Sectors where these reasons kick in include but are not limited to:

  • Retailing, especially online. Different kinds of products have different kinds of attributes, making a Grand Cosmic Schema rather complex. Examples I’ve blogged about include:
  • Human resources. Employee-centric applications are naturally full of hierarchies, which can be annoying to flatten. Non-relational approaches I’ve blogged about include Workday’s object model and Neo4j’s graph-based contribution.
  • Web log analysis. Web logs can be particularly hard to flatten, as per my post on (that sense of) nested data structures.
  • More generally, marketing and other applications that maintain detailed profiles of customers or prospects. The information in these profiles is often based on a large variety of marketing campaigns, third-party databases, and analytic exercises. As the inputs pile up, the schemas get ever hairier.
  • Electronic medical records. Medical records are one area where non-relational approaches may actually have majority share. I blogged about one example in 2008.

Or to quote a 2008 post,

Conor O’Mahony, marketing manager for IBM’s DB2 pureXML, talks a lot about one of my favorite hobbyhorses — schema flexibility* — as a reason to use an XML data model. In a number of industries he sees use cases based around ongoing change in the information being managed:

  • Tax authorities change their rules and forms every year, but don’t want to do total rewrites of their electronic submission and processing software.
  • The financial services industry keeps inventing new products, which don’t just have different terms and conditions, but may also have different kinds of terms and conditions.
  • The same, to some extent, goes for the travel industry, which also keeps adding different kinds of offers and destinations.
  • The energy industry keeps adding new kinds of highly complex equipment it has to manage.

Conor also thinks market evidence shows that XML’s schema flexibility is important for data interchange. For example, hospitals (especially in the US) have disparate medical records and billing systems, which can make information interchange a chore.

*I now call that dynamic schemas.

So, for fear of Frankenschemas, should we flee from RDBMS altogether? Hardly. For social proof, please note:

  • Every application area I’ve cited can be and often is handled via relational techniques.
  • Some of the non-relational alternatives I’ve mentioned, such as XML or object-oriented DBMS, haven’t enjoyed a lot of traction.
  • Even the most successful NoSQL vendors are tiny when compared to the relational behemoths.

More conceptually, I’d say that the advantages of a relational DBMS start:

  • In theory and practice alike, the advantages of normalization and joins.
  • In theory and practice alike, the advantages of loose coupling between your database design and your application. (I think that’s a cleaner way of saying it than to focus on “reusing” the database, but it amounts to the same thing.)
  • In practice, performance and functionality in anything using indexes, even if joins aren’t involved.
  • In practice, maturity and functionality in general.

Those aren’t chopped liver.

Categories: Other