RDBMS Server Evaluation Matrix v 1.0.2 - FAQ

From: Daniel Druker <ddruker_at_agsm.ucla.edu>
Date: 15 Jun 93 15:11:37 PDT
Message-ID: <1993Jun15.151138.24754_at_news.mic.ucla.edu>


Attached is the first production draft of the RDBMS Server Comparison Matrix. As a change to this version, I have included a detailed description of each of the categories at the bottom of the matrix. I have also included numerous corrections, changes, and suggestions from the folks listed at the end of this document.

Please forward any corrections, comments, or additions to me at the address below.

When using the matrix, please remember that this is shareware and no warranties are given or implied. I have made every effort to include correct information, and have had this document reviewed by active usenet contributors and users of all the products. Interestingly, I sent copies of this to active internet participants from each of the vendors, but didn't get a ton of feedback.

I considered including information for Sybase System 10, which would make that product look a lot better. However, since the next versions of Oracle7, Informix, and Ingres also have good new stuff in them, I don't want to play the beta game. I've included some information on a few beta offerings, such as ODBC support and precompilers, but I hesitate to do this for the engines themselves.

I suggest that you use this document as a starting point for your database search. Figure out what your application needs are, then what features are most important to you, and use this to get a rough idea of what might be the best fit. Then put together an RFP for the vendors. I've found that an evaluation matrix, weighting the functionality you deem most important and ranking each vendor's implementation of that functionality based on their responses, is very helpful in selecting the right product for your application.

I hope this helps people out.

Dan Druker
2nd Year MBA Student
The Anderson Graduate School of Management at UCLA (Formerly with Oracle Corp, now Independent) ddruker_at_agsm.ucla.edu

RDBMS Server Feature Comparison Matrix: Version 1.0.2

Copyright 1993 by Daniel Druker
All Rights Reserved

Product

O = Oracle7 		Version 7.0
S = Sybase  		Version 4.9
A = Ask/Ingres 		Version 6.4
I = Informix On-Line	Version 5.0


NOTE: The 4th column is Informix, not Ingres. This confuses some people...

Y = Yes, the feature exists in the current production product
N = No, the feature does not exist in the production product
* = See the note in the detailed description section
B = The feature exists in a useable, late beta product. 


Product						O S A I 

I. PERFORMANCE


  1. Row Level Locking Y N N Y
  2. Page Level Locking N Y Y Y
  3. Table Level Locking Y Y Y Y
  4. Database Level Locking Y Y Y Y
  5. Programmable Read Isolation Levels N N N Y
  6. System Doesn't Escalate Locks Y N N Y
  7. Programmer May Escalate Locks Y Y Y Y
  8. Versioning / No Read Locks Y * * *
  9. Dirty Read Mode N N Y Y
  10. Multi Server Architecture Y Y Y N
  11. Symmetric Multi-Processor Support Y Y Y Y
  12. Support for Loosely Coupled Systems Y N Y N
  13. Array Interface Y N N N
  14. Shared Commits Y Y Y Y
  15. Asynchronous I/O Y Y Y Y
  16. Raw I/O Y Y N Y
  17. Shared Log Files Y Y Y Y
  18. Non-Blocking Queries Y * * *
  19. Clustered Tables Y N Y N
  20. Clustered Indexes Y Y Y Y
  21. Hash Indexes Y N N N
  22. ANSI Standard Cursor Support Y N Y Y
  23. Backward Scrolling Cursors N N N N
  24. Non-Blocking Sequence Generator Y N N Y
  25. Unique-Key Generator Y N Y Y
  26. Cost Based Optimizer Y Y Y Y
  27. Optimizer uses Data Content Heuristics N N Y N
  28. Optimizer Scans Entire Table to get Stats Y Y Y Y
  29. Optimizer can estimate stats to save time Y N Y N
  30. Optimizer uses network costs N N N N
  31. Optimizer uses hardware costs N N N N
  32. Optimizer knows real-time load factors N N N N
  33. Optimizer moves join location Y N Y Y
  34. Stored Procedures in Database Y Y Y Y
  35. Cached Procedures are Shared Y Y Y N
  36. Stored Procedures can return Sets Y N N Y
  37. Stored Procedures Return All Datatypes Y N N Y
  38. Stored Functions in Database Y N N Y
  39. Stored Procedure Cache Y Y Y Y
  40. Shared SQL Cache Y N N N
  41. Parallel Data Query N N N N
  42. Parallel Index Build N N N N
  43. Parallel Sort N N N N
  44. Performance Monitoring Tools Y Y Y Y

II. INTEGRITY					O S A I



45. Adherence to Industry Standards
	SQL Interface				Y Y Y Y
	FIPS 127-1 Validated 			Y * Y Y
46. Declarative Integrity Model			Y N N Y
47. Cascading Updates				N N N N
48. Cascading Deletes 				Y N N N 
49. Loosely Typed Datatypes, Ala ADA		Y N N N
50. Null Support				Y Y Y Y
51. Database Triggers				Y Y Y N
52. PRE- and POST- Triggers			Y N Y N
53. Triggers fire once per SQL Statement	Y Y Y N
54. Triggers fire for each row affected 	Y N N N
55. Programmer can control trigger fire order	Y Y N N 
56. Automatic Procedure Recompilation		Y Y Y Y
57. Programmer Defined Datatypes		N Y Y N
58. Event Alerters				Y N Y Y
59. Database Events Call 3GL Program 		Y * * N
60. Stored Procedures Call 3GL Program 		Y * * N 


III. DATABASE					O S A I
=============			
61. Portable					Y Y Y Y
62. Automatic Database Recovery 		Y Y Y Y
63. Multiplexed Log Files 			Y Y Y Y
64. Database Mirroring				N Y N Y
65. On-Line Database Backup			Y Y Y Y
66. Automatic Log Backup 			Y Y Y Y
67. On-Line Software Diagnostics		Y Y Y Y
68. Hot Standby Server				Y Y Y N
69. Full Cluster Utilization			Y N Y N
70. On Line Recovery				Y Y N N
71. Parallel Backup of multiple files		Y N N N 
72. Parallel Recovery of multiple files		Y N N N 
73. Multiple Log Files				Y N Y N
74. Resource Limiter/Governor			Y N Y N
75. DBA Utilities 				Y Y Y Y
76. Database Auditing 				Y Y Y N


IV. DISTRIBUTED RDBMS				O S A I
======================
77. Distributed Join				Y N Y Y
78. Synchronous Table Replication		Y N Y N
79. Asynchronous Table Replication		Y N N N
80. Connections to other Databases		Y Y Y N
81. Programmatic Two Phase Commit		Y Y Y N
82. Transparent Two Phase Commit		Y N Y Y
83. Two Phase Commit to Heterogeneous RDBMS	Y N N N
84. Movable Transaction Coordinator		Y N N N
85. Remote Procedure Calls			Y Y Y Y
86. RPC's Protected by Two Phase Commit		Y N Y Y
87. Heterogeneous Data Support			Y Y Y N
88. Procedural Database Gateway			Y Y N N
89. SQL Based Database Gateway			Y Y Y N
90. XA TP Monitor Compliant			Y N N Y
91. Top End TP Monitor Support			Y Y Y Y
92. CICS TP Monitor Support			Y Y N N
93. Trans. Can Span Multiple Net Protocols	Y N Y N
94. ODBC Support				B B B B
95. SQL Access Group Support			B B B B


V. DATABASE SECURITY				O S A I
======================
96. OS Security Integration			Y Y Y Y
97. User Group Privileges/Roles			Y Y Y N
98. Mandatory Access Control			* * * *
99. Undergoing Evaluation by NCSC		Y N N Y
100. Standard Tools with Trusted Product	Y N N Y
101. Non-Dedicated Trusted Server		Y N Y Y
102. Product being evaluated at C2 Level	Y N N Y
103. Product being evaluated at B1 Level	Y N * Y
104. Product Certified (Europe) at B1 Equiv	N N Y N


VI. LANGUAGES & TOOLS				O S A I
======================
105. SQL Procedural Language			Y Y Y Y
106. Text/Image Datatypes (BLOBS)		Y Y N Y
107. Union Operator				Y Y Y Y
108. Select for Update				Y N Y Y
109. Outer Join Operator			Y Y Y Y
110. Dynamic SQL				Y Y Y Y
111. Static SQL					Y Y Y Y
112. EXPLAIN Command				Y Y Y Y
113. Transaction Savepoints			Y Y Y Y
114. Aliases/Synonyms				Y Y Y Y
115. NIST Verified Conformance to Standards	Y * Y Y
116. 3rd Party CASE Tools - Upper and Lower	Y Y Y Y
117. Internal CASE Tools - Upper and Lower	Y Y N N
118. Graphics Tools				Y Y Y Y
119. Internal Email System Integration		Y N N N
120. 3rd Party Email System Integration		Y Y Y N
121. Trigger/Stored Proc Sends Email		Y Y Y N
122. Office Automation Tools			Y Y Y Y
123. Bit-Mapped Graphics			Y Y Y Y
124. National Language Support
	European				Y Y Y Y
	Semitic					Y N Y N
	Asian					Y N Y Y
125. Precompiler Support
	C					Y Y Y Y
	COBOL					Y Y Y Y
	Pascal					Y N Y Y
	PL/I					Y N N N
	SAME/DL					N N N N
	ADA					Y N Y Y
	Fortran					Y N Y Y
	C++					B N N N
126. Function Call Interface
	C					Y Y Y Y
	COBOL					Y Y Y Y
	ADA					Y N Y Y
	Pascal					Y N Y Y
	PL/I					Y N N N
	SAME/DL					N N N N
	Fortran					Y N Y Y
	C++					B N N N
127. Module Language				Y N Y N
128. Bind Variable Support			Y N Y Y


PART 2: Other Important Stuff

1. Theoretical Limits
2. Practical Limits/References
3. Datatypes and Precisions

DETAILED DESCRIPTION OF CATEGORIES


1) Row Level Locking

First, a little about granularity. The smallest level of granularity in commercial databases is the Row level. Next comes Page level. Then Table Level. Then Database Level. Typically Database pages are 2048 bytes, so if your rows each contain 100 bytes of data you might squeeze 20 rows into a page.

Row level locking means that the database can use a lock granularity of a single row (tuple for the coneheads out there). This means that multiple users (up to 20 in the example above) can simultaneously update different rows on the same page. Each user, when performing an operation on a row, locks only the row in question and does not interfere with other users in the same page. Arguments for Row Level Locking are that it permits the highest degree of concurrency - users tend not to lock each other out. Arguments against claim that row level locking causes a lot of overhead. My belief is that users are the slowest thing in my system, so I'd rather not wait for them. Thus I prefer row level locking.

2) Page Level Locking



See also Row Level Locking. In page level locking, locks occur at (surprise) the page level. Thus when one user updates a row, the entire page is locked and other users are blocked from updating (sometimes reading too) rows in that page. Sometimes users get around page level blocking by forcing what would be small rows to take up an entire page. This trick effectively simulates row level locking for critical tables that are getting hit by updates a lot, and are having problems with contention. The arguments for and against page level locking are mentioned above.

3) Table Level Locking



The database supports the LOCK TABLE command to lock an entire table at a time. Useful for locking a table for batch updates, locking out users for maintenance or reporting, etc.
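For example, in Oracle's dialect (the exact syntax varies a bit from vendor to vendor):

	LOCK TABLE emp IN EXCLUSIVE MODE;
	UPDATE emp SET sal = sal * 1.05;	-- batch update, no lock surprises
	COMMIT;					-- the commit releases the table lock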

4) Database Level Locking



The entire database can be locked with a single command. As far as I know, the products all support this through a single-user or DBA only mode, but there is no LOCK DATABASE command.

5) Programmable Read Isolation Levels



The programmer can indicate which type of read isolation to use for each SELECT statement. Examples are Dirty Read (uncommitted data), Cursor Stability, Repeatable Read, etc. These terms are given in order of growing strength.
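Informix, for example, lets the session pick its level (a sketch of Informix syntax; per-statement control differs by product):

	SET ISOLATION TO DIRTY READ;		-- weakest: may see uncommitted data
	SET ISOLATION TO CURSOR STABILITY;	-- the current row stays stable
	SET ISOLATION TO REPEATABLE READ;	-- strongest: rereads are stable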

6) System Doesn't Escalate Locks



Some databases have a maximum number of locks that can be supported, and automatically escalate row or page level locks to table level locks when that number has been exceeded. I believe that system generated lock escalation causes problems by creating artificial deadlocks.

7) Programmer May Escalate Locks



The programmer may escalate locks at his/her discretion. For example, if a program writes to a certain percentage of the pages in a table, it may want to go ahead and upgrade to a table lock to avoid running into locked pages and having to wait or to roll back work.

8) Versioning / No Read Locks



The database keeps track of several versions of committed data so queries see only those rows that were committed and valid at the instant the query started. The database allows consistent reads (committed data) without taking out read locks. This is a good thing since readers do not interfere with writers, and accurate reports can be written since only committed data is read.
  • Sybase has Browse Mode, which does not use read locks. Informix and Ingres have dirty read mode, which does not use read locks.

9) Dirty Read Mode
The database has a read mode which scans the data as it currently exists on disk or in memory, regardless of whether it has been committed or not. In general dirty reads are very fast, but since the data is not necessarily valid some applications can't use them. Good for reports where accuracy doesn't matter.
10) Multi Server Architecture

Multiple server processes exist on the server machine. These usually take two forms: a two-task architecture where each client user gets a corresponding OS process on the server, or a multi-threaded architecture where each of the server processes can manage multiple clients. Multi-server architecture helps balance system load and is mandatory for taking advantage of symmetric multi-processors. The two-task architecture is considered resource intensive on operating systems that don't deal well with lots of processes.
11) Symmetric Multi-Processor Support

Can the server take advantage of multiple CPU's in a symmetric multi-processor environment? Things to make sure a product can do: Make sure each client can read, write and commit transactions in parallel on any available CPU. Make sure, as you add CPU's, that transaction throughput increases in a near linear fashion. Some products and operating systems may scale well to 4 CPU's, others to 8, and others to 30, hundreds, or more. Some may not scale at all (OS/2, for example).
12) Support for Loosely Coupled Systems

Pioneered by DEC in their VAX cluster, loosely coupled processors are usually independent computers connected by a fast communications bus that can share and coordinate resources. Databases that work in this environment can run simultaneously on several nodes in a cluster. This increases power since processing is going on at each node, and reduces complexity since no two-phase commit or distributed database work is needed. Also helps for fault tolerance, since if one of the machines dies the remaining nodes keep running. Today several unix machines (Pyramid, Sequent, IBM RS/6000) offer clustering in addition to DEC. Performance concerns may exist if OLTP goes against the same data on more than one node, since database blocks get sent pinging around the cluster if this happens too much. Other than that I think this is pretty cool. Also this is the basis for how Oracle works on Massively Parallel machines like Ncube, KSR, Maspar, etc.
13) Array Interface

Can the client request that multiple rows be sent in a single batch, as opposed to a network request for every row to be returned? Reduces network traffic. Example: Send me 1000 rows, thanks. vs. send a row, thanks, send a row, thanks, 1000 times. Only an issue for cursor based processing; for streams based processing this is not an issue.
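A sketch of what the array interface looks like from embedded SQL (Oracle Pro*C style; the host array name is made up):

	EXEC SQL BEGIN DECLARE SECTION;
		char enames[1000][31];		/* host array: 1000 names per fetch */
	EXEC SQL END DECLARE SECTION;

	EXEC SQL DECLARE c1 CURSOR FOR SELECT ename FROM emp;
	EXEC SQL OPEN c1;
	EXEC SQL FETCH c1 INTO :enames;		/* one round trip, up to 1000 rows */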
14) Shared Commits

Shared or Group commits write multiple transactions into the redo structure in a single physical I/O. This is an advantage over database systems which require a physical I/O for each individual transaction. Some products may not support shared commits on SMP hardware.
15) Asynchronous I/O

Most operating systems do provide Asynchronous I/O, including VMS, MVS, VM, and unix from Pyramid and Sequent to name a few. On systems that do not provide asynchronous I/O, the need to wait for disk writes to complete may become a performance bottleneck. The server should be able to take advantage of Asynchronous I/O where available, and have a strategy to deal with the lack of asynchronous I/O on systems where it is not available.
16) Raw I/O

On Unix, it is much faster to be able to access raw disk devices, which are not controlled by the unix file and buffer managers, than to use normal or "cooked" files.
17) Shared Log Files

Every change to the database structure is automatically written in a "piggybacked" manner to the Redo file structure. Transactions are piggybacked in a way that one physical write can actually commit several transactions. This is a very efficient way to commit transactions because it requires only a sequential write to an o/s file. Multiple users can share the same redo file structure.
18) Non-Blocking Queries

For systems where multiple users are reading and writing to the same table at the same time, it may be critical that readers doing reports do not interfere with ongoing transaction processing. When readers need to obtain locks to get information, they are said to "block" writers, since writers are blocked from getting the locks they need to perform their update. This is a great feature if you have a transaction processing system on which you want to be able to run reports against live data during operating hours.
  • Ingres and Informix can accomplish this by using Dirty Read mode. Sybase does it by using browse mode. See the above sections for limitations and applications of these features.
19) Clustered Tables

Clustered tables store multiple tables close to each other on disk, based on a commonly used join key. This helps speed up access if two tables are always accessed by the same join key. Takes longer to insert and update than the normal storage method.
20) Clustered Indexes

Clustered indexes have several definitions: either an index on the cluster key mentioned above, or on some systems a pre-sorted index cluster that greatly speeds data retrieval for ordered operations, at the expense of insert and update speed. Oracle uses the first type, Sybase the second.
21) Hash Indexes

Most products use binary tree indexes, which are flexible and good for objects that grow, shrink, change, etc. Hash indexes offer faster performance but are appropriate only for relatively static tables.
22) ANSI Standard Cursor Support

Cursors are memory structures which store the SQL statement, a parsed/compiled version of the SQL statement, a buffer for the result data, and pointers to the result set of the query, among other things. Cursors allow applications to process multiple SQL statements in a single database connection. This is crucial for window-based environments which require multiple SQL statements to be processed at one time with different information being displayed in different windows. This allows multiple windows to be created via a single session and reduces memory utilization requirements. In addition, multiple SQL statements can be encompassed in a single transaction with all internal locking controlled by the database engine.

Cursors are also required to meet ANSI standards, both for SQL compliance and for Pre-Compiler Support.

23) Backward Scrolling Cursors



All cursors permit forward scrolling of the result set, some products also allow backward scrolling. Good for those of you that like to think backwards.

24) Non-Blocking Sequence Generator



Many applications require that unique keys be generated, for example for purchase order numbers. A nice feature is a multithreaded non-blocking sequence generator which can be used to generate sequence numbers for rows in tables. Some databases require applications needing unique primary keys to single thread all transactions through a single sequence table, which can become a major application bottleneck.
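Oracle's version, for instance, looks like this (sequence and table names are illustrative):

	CREATE SEQUENCE po_seq START WITH 1000;

	INSERT INTO purchase_orders (po_num, vendor)
	VALUES (po_seq.NEXTVAL, 'ACME');	-- no lock on a sequence table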

25) Unique Key Generator



The database contains a mechanism for generating unique keys.

26) Cost/Statistics Based Optimizer



The database will gather and store comprehensive statistics about database structures. The statistics based optimizer will choose the most efficient access paths to data based on the information gathered. Some products do a much better job than others. Nice things to have: The ability for the system to "Guess" or estimate the statistics on a table based on a sample of the data. This is a lifesaver when it takes 4 hours to analyze statistics for a 30 gigabyte table. Also, some of the optimizers take into account low/high data values, and data distribution heuristics.

27) Optimizer uses Data Content Heuristics



The optimizer scans the data to determine the minimum value in the table, the maximum value, and the average value. It also keeps track of the distribution of data within the rows.

28) Optimizer Scans Entire Table to get Stats



To gather statistics, the optimizer must scan the entire table. This can take a long long long time for multi-gigabyte tables.

29) Optimizer can estimate stats to save time



The Optimizer can estimate statistics by scanning only a random sample of the rows in the table. Very useful for collecting statistics on large tables where it is impractical to scan the whole table for statistics.
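In Oracle7, for example:

	ANALYZE TABLE big_table COMPUTE STATISTICS;	-- exact, full scan
	ANALYZE TABLE big_table ESTIMATE STATISTICS;	-- random sample, much faster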

30) Optimizer uses network costs



The optimizer is aware of the existence and make-up of the network, and some costs have been assigned to each link. These may be either costs in terms of link speed or in actual dollar expense for using the link. The optimizer will determine the lowest cost path in terms of time or expense, which may involve shipping data to different nodes for joins or through intermediate nodes depending upon the network.

31) Optimizer uses hardware costs



The optimizer is aware of the relative processing power, CPU Cycle cost, I/O speed and cost, etc. It decides how to implement queries based on this cost information, and the user's resource profile.

32) Optimizer knows real-time load factors



The optimizer can take into account real time system load information in determining the optimal path for queries. Thus this path may change from time to time depending on system load.

33) Optimizer moves join location



Classic example is a local table with 10 rows being joined to a remote table with 1 Million rows. The optimizer is smart enough to send the local table to the remote node for the join rather than the other way around.

34) Stored Procedures in the Database



Stored procedures are bits of procedural code, grouped together and stored in the DBMS engine. Stored procedures are usually stored in shared, compiled format. Stored procedures should be callable by applications, by other stored procedures, or by database triggers. There should also be full dependency tracking that will automatically recompile stored procedures when objects upon which they depend change. Stored procedures are good because they typically improve performance and reduce network traffic. They also are great for encapsulating allowable operations on data - most systems let you grant access to a stored procedure without granting access on the underlying tables. The major downside is that there is no standard for stored procedures, so any code you write that uses them is non-standard and non-portable.
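A minimal sketch in Oracle's PL/SQL dialect (each vendor's procedural language is different; the names here are made up):

	CREATE PROCEDURE give_raise (p_empno NUMBER, p_pct NUMBER) AS
	BEGIN
		UPDATE emp SET sal = sal * (1 + p_pct / 100)
		WHERE empno = p_empno;
	END;

	GRANT EXECUTE ON give_raise TO clerk;	-- no direct UPDATE right on EMP needed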

35) Cached Procedures are Shared



Stored Procedures are cached in memory on the server. A single copy of the stored procedure can be used by multiple users. Saves memory and execution time.

36) Stored Procedures can Return Sets



Stored procedures can return an array or table of data. Not limited to returning only a single row.

37) Stored Procedures return All Datatypes



Stored procedures are capable of returning all available server datatypes, not just a limited subset.

38) Stored Functions in Database



Similar to Stored Procedures, but instead stored function calls can be defined by the user. User defined functions are useful for many reasons, similar to the idea of having both procedures and functions in a 3GL language.

39) Stored Procedure Cache



Stored Procedures are cached in memory on the server. They do not need to be read in from disk each time they are called. There is a special, DBA configurable area of memory which holds the procedure cache, so it does not get flushed by users doing large queries, etc.

40) Shared SQL Cache



The ability to store ad-hoc SQL statements in shared cache memory on the server. The server recognizes when a client requests the execution of a SQL statement already in the shared cache, and can use the already parsed and compiled representation already in memory. This can provide large reductions in server memory usage where lots of users are using the same application (and thus the same SQL statements.)

41) Parallel Data Query



On a symmetric multi-processor, a single query can be decomposed into several parts that can each be dispatched to a separate processor for execution in parallel. This same functionality will eventually extend to clustered systems and networks, but no one can do it yet today.

42) Parallel Index Build



Same idea as parallel query, but for building indexes. Dramatically speeds index builds for large tables.

43) Parallel Sort



Whenever sorted data is needed, the sort operation can be decomposed and processed by multiple CPU's simultaneously.

44) Performance Monitoring Tools



The Vendor provides tools to monitor system performance and to diagnose problems. Tools are available to monitor both individual SQL statements and overall system performance.

II. INTEGRITY


45) Adherence to Industry Standards



There are lots of RDBMS standards. Most useful is FIPS 127-1 since the government actually tests the products to ensure their compliance.

There are also SQL2 and SQL3 standards out there or under development, but no official test suites being proctored. All are complex standards that define a common SQL dialect at an incredibly detailed level. The advantage of this is that if you write FIPS 127-1 code, you'll be able to run your application against all adhering databases with little re-writing of code.

It is not sufficient for a vendor to claim "we use ANSI SQL." This is like saying all unix's are the same. If you want to have a chance to be able to port your code to another DBMS without a rewrite, you must only use 100% ANSI standard statements and a database that has been CERTIFIED by NIST.

From alanb_at_vnet.ibm.com:
You should say FIPS 127-1, not FIPS 127 since there was the original FIPS 127, then the current FIPS 127-1, and soon FIPS 127-2. They are all based on different versions of the ANSI standard. NIST publishes the Validated Products list, and Oracle7, Informix OnLine 5.0, and Sybase 10 show up on the list as of Jan 93. SQL2 compliance is incorrect since there are no test suites yet (FIPS 127-2 will provide the tests). Also, SQL2 has three levels, entry, intermediate, and full; which level are you talking about? When it comes to conformance, do not take a vendor's word for it without them providing a certification of compliance. Also, note that there is a difference between conformance and compliance; I believe compliance means it has been formally tested and verified.

On the subject of conformance/compliance to the SQL standards, the following applies. ANSI X3.135-1989 SQL was adopted by NIST in FIPS 127-1. NIST developed test suites to test compliance with the SQL standard. Only if the DBMS was tested against version 3.0 of the test suites can a vendor claim compliance with the standard. And compliance can only be claimed for a specific hardware/operating system combination, i.e. just because one is compliant on Sun does not mean one is compliant on HP. The results of the tests are published every quarter in the Validated Products List. NIST is working on FIPS 127-2, which adopts ANSI X3.135-1992 SQL, or SQL2. To claim compliance is jumping the gun at this point because there are no test suites available. And besides, there are three levels to the SQL2 standard - entry, intermediate, and full. Which of these are vendors claiming compliance to?

  • Sybase System 10 has been FIPS 127-1 Validated.
46) Declarative Integrity Model

This includes full support for declarative referential integrity, default values and domain support. The ANSI declarative approach greatly simplifies the process of providing database enforced integrity. It allows the programmer to define Primary and Foreign Keys, Default Values, Unique Keys, and so on when creating database tables. The database engine automatically enforces these rules to protect system data. I think this is vastly superior to databases that make you program referential and entity integrity using stored procedures and triggers. Certainly more portable, standard, and lower maintenance to use the Declarative Method. Probably better performance as well.
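As a sketch of the declarative style (ANSI-flavored; the DEFAULT expression shown is Oracle's):

	CREATE TABLE emp (
		empno	NUMBER PRIMARY KEY,
		ename	CHAR(30) NOT NULL,
		hired	DATE DEFAULT SYSDATE,
		deptno	NUMBER REFERENCES dept (deptno)
	);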
47) Cascading Update

Defined for this matrix as the ability, within the Declarative model, to propagate updates to a primary key that is a foreign key to another table. For example, if I update a purchase order number, all line items that are associated with that purchase order must have their primary key updated or they will become disassociated records. All of the products with triggers should be able to do this, but I've never tried it and the declarative constraints may block updates to foreign keys.
48) Cascading Delete

Defined specifically for this matrix as supporting the ON DELETE CASCADE definition of the declarative referential integrity model. Delete cascade deletes all the corresponding children if you delete the parent record. All of the products that support triggers can achieve this programmatically, but I prefer the Declarative implementation for the reasons mentioned in the above Declarative Integrity Model Section.
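Declaratively, the cascade is a single clause (a sketch; the table names are made up):

	CREATE TABLE line_items (
		item_no	NUMBER,
		po_num	NUMBER REFERENCES purchase_orders (po_num)
			ON DELETE CASCADE
	);

Deleting a purchase order then deletes its line items in the same statement.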
49) Loosely Typed Datatypes, Ala ADA

The %TYPE variable can be used to declare a variable that has the same structure as the columns or rows in a table or a view. If the datatype of any of the columns in the table changes, any variables declared as %TYPE will automatically change as well. If the variables are in a stored procedure or trigger, they will be automatically recompiled by the database engine, transparently to the user.
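In PL/SQL terms, for example:

	DECLARE
		v_name	emp.ename%TYPE;		-- tracks the column's datatype
		v_rec	emp%ROWTYPE;		-- tracks the whole row structure
	BEGIN
		SELECT ename INTO v_name FROM emp WHERE empno = 7839;
	END;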
50) Null Support

The programmer can create a table and specify whether or not NULL values can exist in each column. The SQL implementation should provide a function that will determine whether the value in the column is null or not. This feature is useful when performing arithmetic operations and outer joins. The database should also correctly evaluate NULL = anything to false.
51) Database Triggers

Database Triggers are pieces of procedural code associated with tables and are fired implicitly when data is modified in their table. They are used for access control, data validation, referential integrity, synchronous table replication, and much more.
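A minimal sketch in Oracle's dialect (the audit table is made up):

	CREATE TRIGGER emp_sal_audit
	AFTER UPDATE OF sal ON emp
	FOR EACH ROW
	BEGIN
		INSERT INTO sal_audit (empno, old_sal, new_sal)
		VALUES (:old.empno, :old.sal, :new.sal);
	END;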
52) PRE- and POST- Triggers

Event triggers can be defined to execute before and/or after database events such as inserts, updates, etc. This is useful as a BEFORE trigger can prevent an illegal million row update from happening at all; some products only provide triggers to invalidate and roll back such a transaction after it has already happened. AFTER triggers are better suited to performing transaction auditing, distributed database, etc.
53) Triggers fire once per SQL Statement

Database triggers automatically fire once, and only once, per SQL statement. These are useful for security, for example, when the trigger might check that the date is not a Sunday, and fail the SQL statement if it is. This type of trigger need fire only once per SQL statement.
54) Triggers fire once for every row affected by an SQL stmt

Database triggers can automatically fire once for every row affected by a statement. For value based auditing, a developer would use post-insert, update, and delete triggers that fired for each row affected by the statement, and wrote the new and old values of each row into an audit table.

The ability for triggers to fire on a per row affected basis has many applications besides value based auditing. Synchronous table replication, recursive value checking, and multi-row transaction management are only a few possibilities.

55) Programmer can control Trigger Fire Order



Triggers fire in a pre-determined, known order, as opposed to a random order that may differ for each SQL statement.

56) Automatic Procedure Recompilation



The database maintains extensive dependency tracking information which allows it to automatically recompile all objects in a dependency chain whenever code is changed. The database kernel will recompile all procedures, functions, triggers, and packages necessary when dependent structures are modified. The database may also be capable of resolving dependencies of modified procedures on remote nodes.

57) Programmer Defined Datatypes



The programmer can define custom datatypes to the Engine. Useful if you have weird datatypes, like longitude/latitude etc.

58) Event Alerters



Events can be defined in the database, which the database will watch for. When the event occurs the database engine will take some pre-determined action. Example: When the inventory drops below a certain level in the inventory table, an event alerter notices, sends a message to the purchasing clerk and automatically enters a purchase order for the product needed.
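Ingres, for instance, exposes these as database events; roughly like the following (a sketch from memory, so treat the exact syntax as an assumption):

	CREATE DBEVENT low_inventory;
	RAISE DBEVENT low_inventory;		-- e.g. from a rule on the inventory table
	REGISTER DBEVENT low_inventory;		-- a monitoring application listens here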

59) Database Events call 3GL program



Event Alerters can call a 3GL program, for example to turn the operator's screen red if the reactor is about to melt down.
  • Through add on products (Sybase Openserver, Ingres Knowledge Manager) this is possible.

60) Stored Procedures call 3GL program
A stored procedure can call a 3GL program, for example to update rows in the employee table and then send email to the employee's manager confirming the changes were made.
  • Through add on products (Sybase Openserver, Ingres Knowledge Manager) this is possible.

III. DATABASE


61) Portable



The server should run on many different types of hardware and operating systems. 3-tier portability includes installations on microcomputers, minicomputers and mainframes. In addition, some databases run on super computers and massively parallel computers. Hardware portability is important so you can always select the best performing hardware platform and be sure your database applications will still work.

Other aspects of portability include network protocol portability, and Graphical User Interface portability.

62) Automatic Database Recovery



Database failures are usually grouped into several loose categories. Instance failure occurs when the machine running the database server crashes, or software fails, or a dumb operator kills the server, with no losses on the database disks. Media failure occurs when a disk containing database information fails.

Database recovery in case of instance failure should be performed automatically by simply restarting the database instance. The engine should roll back any transactions that were pending but not committed at the time of failure, and ensure the integrity of all committed transactions. The time to recover from instance failure should be configurable by the DBA. Recovery from media failure should be able to be performed in automatic, semi-automatic, or manual modes. In all cases, recovery from media failure requires that a backup of the lost files at some point in time is available, along with all database redo files since that point in time.

63) Multiplexed Log Files



The database can maintain multiple transaction redo files on different disks and write to all of them at the same time. This provides added protection in case of a loss of the disk upon which the log files reside, which otherwise would render the database non-recoverable for up-to-the-minute transactions.

64) Database Mirroring



The RDBMS can perform software disk mirroring of the database files regardless of whether the mirroring is supported at the Hardware or Operating system level. Mirroring means keeping multiple copies of all database information to protect from disk failure.

All of the products can take advantage of hardware and operating system disk mirroring, which are preferable to RDBMS software mirroring if they are available.

65) On-Line Database Backup



DBA's can make an online backup of the entire database while the database is up, all tables are online and users are active. This should not require locking, and should have a minimal effect on system performance, other than the I/O's required to perform the backup.

66) Automatic Log Backup



Filled database logs are automatically backed up to secondary storage on disk or tape. This archiving feature as well as online backup and recovery are crucial elements for implementing mission critical applications requiring 24 hour availability.

67) On-Line Software Diagnostics



This is a wide and vague category, but basically what I'm getting at is that tools are available on-line to diagnose problems, understand performance, monitor users, etc. You should also be able to configure the database on line. All the products can do this to some extent, but since I'm not providing ratings of how well features are implemented I suggest you ask users for how they feel about this area. These tools are typically very weak in comparison to mainframe operating system diagnostics.

68) Hot Standby Server



The capability to restart the database on a backup machine in the case of hardware failure. This capability is provided on hardware platforms that support shared disks between multiple computers, such as the DEC VaxCluster. Upon sensing that the primary computer has gone down, a utility on the second machine can then automatically start the backup engine. The engine will recover any outstanding transactions from the failed machine, and then will go into standard operation. 3GL applications can be programmed to notice the failure of the primary server, and automatically reconnect to the backup machine. I think that full cluster utilization mentioned below is a much better solution, but it is more expensive.

69) Full Cluster Utilization



The capability to run a single database on multiple machines in a shared disk environment. This is an advantage over other database systems that only have 'hot standby' or 'companion server' capabilities. Applications can take advantage of the processing power of all machines, not just a single computer at a time. The engine provides automatic fail over reconciliation when nodes do fail, and applications can be written to automatically reconnect to the database if the node to which they are attached fails.

70) On-Line Recovery



This is critical for OLTP, and 24 hour a day mission critical systems. On line recovery is the ability to recover subsets of the database while the rest of the database is up and on-line. The database does not need to be taken offline during recovery of a data file. Users that do not require access to failed areas of the database will remain unaware that failure ever happened. Users that do attempt to access damaged files will receive an appropriate error message, and will be able to access the file as soon as recovery is complete. Some databases may require that the entire database be taken off line for recovery from media failure.

71) Parallel Backup of Multiple Database Files



The Database files can be backed up in parallel. That is, multiple database file backup commands can be issued simultaneously, and on an SMP system can run across multiple processors.

72) Parallel Recovery of Multiple Database Files



Multiple database file recovery commands can be issued simultaneously. On an SMP machine, the recovery processes will be spread across available processors. Critical for recovering large databases on SMP machines. Why? Consider running a transaction processing application for 1 hour on a 14 CPU machine: you have accumulated 14 CPU hours of transaction processing. A disk drive blows up. If you are limited to a single process for recovery, it now takes you 14 hours to recover your database.

73) Multiple Log Files



The ability to provide multiple Redo file structures. Some databases have only a single log file, which is a problem since if it ever fills, the entire database hangs. Other databases use multiple redo files, which provides a level of insurance if your operator goes to sleep without mounting the proper backup tape. Also, some databases with a single log file get into trouble when long running transactions are going on and the log fills.

74) Resource Limiter/Governor



A resource limiter (Governor) that evaluates the impact of database requests and restricts them according to limits for each of the users that access the database. The resource limiter follows user resource profiles associated with the privilege role of the user. Users can be limited to CPU usage, direct I/O, logical I/O, and wall time per statement or cumulatively per session. The Governor can also log users off the database automatically after specified idle periods. Some of the governors can stop "Queries from Hell" at parse time, by using optimizer statistics. Others only stop the query once a resource limit is reached.

75) DBA Utilities:



Again a weaselly category, since I don't want to give relative rankings. In general, DBA utilities should be able to start and stop an engine, perform real-time monitoring of database use and performance, assist in backup and recovery of database logs and data, and provide for execution of SQL statements. Utilities may also help in managing users, user groups, application privileges and more. I would ask users of systems what they think of each product's DBA tools to get an idea of what is available. The DBA utilities should be able to run in client/server mode to facilitate centralized management.

76) Database Auditing



Database Auditing covers areas like: Successful and/or unsuccessful attempts to connect/disconnect from the database, Granting or Revoking of privileges, Enabling or disabling writing to the audit trail table, setting default auditing options for any database tables, and more. Auditing should be definable on a per-user or per-system basis. Database Triggers can typically be used to provide journaling of changed data and other specific auditing functions. The databases targeted for C2 level security compliance all provide a great deal of auditing support, more than the average site will ever use.

IV. DISTRIBUTED RDBMS


77) Distributed Join



The capability of joining two tables in a select statement that are on different machines. Some systems can perform this transparently to the user and to the application program. The query is then exactly the same as if the tables resided on the same machine. Example:
	SELECT ENAME, DEPTNAME FROM EMP, DEPT
	WHERE EMP.DEPTNO=DEPT.DEPTNO;

Where the EMP table resides on the local server, and the DEPT table resides on a remote server.

78) Synchronous Table Replication



Table replication allows the distribution of updates from a single master to one or more slave nodes. Synchronous replication implies that updates are propagated to slave nodes in real time, protected by two-phase commit. Classic use is for banking transactions, when balances must be synchronized on all machines at all times. The danger of synchronous replication is that if one node fails, the transaction doesn't go through on any of the nodes.

79) Asynchronous Table Replication



See Synchronous Replication. Asynchronous replication differs in that updates are not propagated in real time. This is useful for information that doesn't always need to be perfectly in synch. Also, it does not have the problem of all nodes needing to be alive for the initial transaction to complete. Updates get propagated around the network at programmer-defined intervals.

80) Connections to other Databases



Gateways are available that facilitate the incorporation of data from foreign databases. Some gateways are read/write, some are read only. Some are procedural, others are SQL based.

81) Programmatic Two Phase Commit



Two phase commit is a mechanism for managing distributed update capabilities in a distributed database. Two Phase Commit (TPC) ensures that an update either completes or rolls back on all nodes in a transaction. Programming this is very complex, and gets exponentially more difficult as the number of nodes in a transaction grows. I have never heard of a successful application built on one of these products using programmatic two phase commit.

82) Transparent Two Phase Commit



Transparent TPC eliminates programming and lets the servers handle the complexities behind the TPC algorithms. An example of this is below:
	UPDATE EMP...; 	(EMP is on Node A)
	UPDATE DEPT...;	(DEPT is on Node B)
	COMMIT;

THAT'S IT !!

83) Commit to Foreign Database



The gateway products allow single site updates and commits to heterogeneous databases. In addition, a single heterogeneous database can participate in a Two-Phase Commit transaction with multiple native databases.

84) Movable Transaction Coordinator



In a Two-Phase Commit distributed update, one of the databases acts as the coordinator of each transaction. The coordinator does not need to go through the Prepare stage of the TPC algorithm, and thus locks resources for a minimal amount of time. All of the products that support programmatic TPC allow the programmer to specify the coordination site. This category specifically refers to the Transparent TPC facilities, where available.

The coordination task can be assigned according to network resources and machine capabilities as required for each transaction. This protects resources on your most critical database servers. The server automatically chooses the transaction coordinator, also known as the commit point, based on pre-defined server criticalities. The commit point need not prepare for a two-phase commit transaction, so if a failure occurs no resources on the transaction coordinator are held for in-doubt recovery.

85) Remote Procedure Calls



Stored procedures may be called from remote nodes without restriction. Translation of character sets and datatypes occurs automatically.

86) RPC's Protected by Two Phase Commit



Stored Procedures can perform remote updates or call other stored procedures on remote nodes. Any distributed transactions are automatically protected by TPC.

87) Heterogeneous Data Support



Gateways exist which allow applications to access data in foreign databases. Typically the application cannot tell that this data is not coming from its normal server. There are lots of issues with database gateways, including read/write capability, performance, static vs dynamic SQL to DB2, and on and on.

88) Procedural Database Gateway



Gateways exist which allow remote stored procedure calls to be executed against foreign databases. The application believes that it is calling a native stored procedure, though the call is actually being executed against a foreign database.

89) SQL Based Database Gateway



Ad Hoc SQL statements can be sent to foreign databases for processing. The application thinks that it is sending a SQL statement to the local server. Issues include SQL dialect translation, functionality augmentation, static vs dynamic SQL, performance, and more.

90) XA TP Monitor Compliant



The Server is compatible with the XA Transaction Processing Monitor Specifications.

91) Top End TP Monitor Support



The server works with the Top End TP Monitor Product.

92) CICS TP Monitor Support



On IBM Mainframes running MVS, the server or gateway is able to work with the CICS TP monitor.

93) Transaction Can Span Multiple Net Protocols



A single transaction can be routed across multiple network protocols. For example, a PC on a Novell LAN sends a query across IPX that talks to the LAN RDBMS Server. From there, the query goes out across TCP/IP to a unix mainframe. Important because it sucks to have to buy TCP/IP stacks and NIC's for every PC just to talk to the enterprise server if you want to do client/server on your Novell LAN. Appletalk has similar issues.

94) ODBC Support



The product supports (or has announced support for) Microsoft's ODBC standard.

95) SQL Access Group Support



The product supports or has announced support for the SQL Access Group's connectivity standard.

V. DATABASE SECURITY


96) O/S Security Integration



The ability to specify that database logins will be validated by the operating system. That is, each account maps on a one to one basis with operating system login identifiers. On IBM systems, database security is integrated with the host security management system (RACF, ACF2 and others).

97) User Group Privileges/Roles



The system supports simple user groups, like unix, where access rights can be granted to group names rather than individuals.

Roles are collections of privileges like user groups. Roles differ from simple user groups in terms of additional functionality. Users may be granted several roles, and can switch back and forth between roles in a session. Users can also enable and disable specific privileges within the role they are in. Roles are additive in nature, thus a manager role can have the privileges of clerk plus additional rights.
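Oracle7's roles, for example, work like this (names are illustrative):

	CREATE ROLE clerk;
	GRANT SELECT, INSERT ON orders TO clerk;
	CREATE ROLE manager;
	GRANT clerk TO manager;			-- roles are additive
	GRANT manager TO scott;
	SET ROLE clerk;				-- switch roles within a session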

98) Trusted Database/ Mandatory Access Control/ B1 Security



A high security version of the database, where every row is labelled with a security classification (Top Secret, Public, etc.). Lots of other security and auditing features as well. Most applicable for Department of Defense applications, or corporate applications where lots of security is needed. Available as an option for all products. May not be based on the latest version of the server engine, and may not have all functionality of the normal engine.
  • All are separate, extra cost products from the normal engine.
  • Sybase System 10 has been targeted for this level, but it is not a current production product.

99) Undergoing Evaluation by the NCSC
The Database server has been selected for the Trusted Product Evaluation Program (TPEP) of the U.S. Government's National Computer Security Center (NCSC). The products were submitted for official government evaluation, targeted at Class C2 and Class B1, respectively, of the Trusted Database Interpretation (TDI) of the Department of Defense's Trusted Computer System Evaluation Criteria (TCSEC or "Orange Book").

Participation in this program is an involved process in which the functionality and security of the product is evaluated against computer security criteria published by the government. A product must meet the functional requirements of the target evaluation class, and the implementation of security features is tested to assure correct design and functions. There are seven classes of evaluation ratings, ranging from A1 (most secure) through B3, B2, B1, C2, C1 to D (minimal security). Once a product passes evaluation at the target rating class, it is placed on the NCSC's Evaluated Products List.

As of this date, no product has been certified in the United States. Ingres has been certified at the B1 equivalent level in Europe.

From Alanb_at_vnet.ibm.com:
The only RDBMSs undergoing evaluation by NCSC for C2/B1 security are Oracle and Informix. They are still in the evaluation process and have not been certified. Ingres was evaluated by a European security group at a level equivalent to B1.

100) Standard Tools with Trusted (B1) Product

The Trusted (B1) Engine uses the standard development tool set. No additional tools training is required for developers to switch from normal to Trusted applications. Secure databases do, however, require a different database and application design to accommodate the demands of multi-level secure applications. Some vendors require specialized tools to work with the high security database.
101) Non-Dedicated Trusted Server

The Trusted (B1) DBMS engine does not compromise the security of other trusted applications on the host machine on which it is running. It works with the trusted operating systems to control access to secure data. Some implementations of Secure Databases require that the database server is the only program running on the server computer.
102) Product being evaluated at C2 Level

The standard version of the engine is undergoing evaluation by the NCSC at the C2 target security level. As of this writing, no products have been certified as C2 compliant.

  • From Alanb_at_vnet.ibm.com: The only RDBMSs undergoing evaluation by NCSC for C2/B1 security are Oracle and Informix. They are still in the evaluation process and have not been certified. Ingres was evaluated by a European security group at a level equivalent to B1.
103) Product being evaluated at B1 Level

An enhanced security version of the product is undergoing evaluation by the NCSC at the B1 security level. As of this writing, no products have been certified B1 compliant.

  • From Alanb_at_vnet.ibm.com: The only RDBMSs undergoing evaluation by NCSC for C2/B1 security are Oracle and Informix. They are still in the evaluation process and have not been certified. Ingres was evaluated by a European security group at a level equivalent to B1.
104) Product Certified (Europe) at B1 Equivalent

The High Security Version of Ingres has been certified by a European Agency in Europe at a security level similar to the U.S. B1 Level.

VI. LANGUAGES & TOOLS


105) SQL Procedural Language

Because native SQL sucks for algorithmic processing, most vendors have devised procedural languages. These are the languages used to write stored procedures and triggers, and often can be used as stand-alone programs as well.
106) Text/Image Datatypes (BLOBS)

Database fields can contain Binary and Text data up to 2 gigabytes per field per row. Useful for storing images, sound, video, telemetry data, etc.
107) Union Operator

Standard UNION combines the results of multiple queries and eliminates duplicate rows.

The ANSI standard also includes the UNION ALL operator, which does not eliminate duplicates and is useful for horizontal partitioning of data.
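For example, reporting across two horizontally partitioned order tables (table names are illustrative):

	SELECT order_no, amount FROM orders_1992
	UNION ALL
	SELECT order_no, amount FROM orders_1993;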

108) Select for Update

SELECT... FOR UPDATE is useful for obtaining locks on desired structures and can be used if REPEATABLE READ functionality is required. It can be used to test if a structure is already locked by another user, like the TSET instruction or executing a spinlock. It is required for ANSI compliance.
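For example:

	SELECT sal FROM emp
	WHERE empno = 7839
	FOR UPDATE;			-- the row stays locked until commit/rollback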
109) Outer Join Operator

Outer Join lets you perform a join where you want all the rows from one of your tables, even if no matching rows exist in the second table. Very useful for modelling real world problems. Example: you have no employees in the HR Department, but you still want it to show up on the list of departments and their employees.
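The notation is vendor specific; Oracle's (+) operator, for instance:

	SELECT d.deptname, e.ename
	FROM dept d, emp e
	WHERE d.deptno = e.deptno (+);	-- keeps departments with no employees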
110) Dynamic SQL

SQL statements can be generated on an ad-hoc basis and sent to the database engine for processing. They do not need to be separately compiled and have plans generated before they can be executed.
111) Static SQL

SQL statements can be pre-parsed, compiled, and have plans generated. Thus they execute faster since this doesn't have to be done at run time. Improves speed at the expense of flexibility and maintenance. This section explicitly does NOT include stored procedures and triggers, which are typically stored compiled.
112) EXPLAIN Command

The EXPLAIN Command lets you print out a report that shows the Query Execution path chosen by the optimizer, and statistics about how the actual query was done. It shows what indexes were used, what type of join was done, how many rows were processed, etc.
113) Transaction Savepoints

Savepoints are markers used within transactions to allow partial rollbacks, coupled with the ROLLBACK TO SAVEPOINT command. This prevents an entire multi-part transaction from being rolled back by the failure of one of the individual statements within it.
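For example:

	UPDATE accounts SET bal = bal - 100 WHERE acct = 1;
	SAVEPOINT debit_done;
	UPDATE accounts SET bal = bal + 100 WHERE acct = 2;
	ROLLBACK TO SAVEPOINT debit_done;	-- undoes only the second update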
114) Aliases/Synonyms

Aliases and Synonyms are used as substitute names for tables, views, etc. Used to hide the fact that a table is actually owned by a different schema: CREATE SYNONYM FOO FOR DAN.FOO. Also used to implement location transparency for distributed databases: CREATE SYNONYM FOO FOR DAN.FOO@LOS_ANGELES
115) NIST Verified Conformance to Standards

The product has been tested against the NIST test suite for ANSI FIPS 127-1 SQL and passed with 100% compatibility. Important for government bids, as non FIPS 127-1 products are usually noncompliant, and important if you want to be able to write portable code.

  • Sybase System 10, and not the current product, has been verified.
116) 3rd Party CASE Tools

3rd Party CASE tools are available for Computer Aided Software/Systems Engineering. Some people prefer 3rd party CASE tools because they can generate code for several different types of DBMS's, and are considered to be less of a vendor "lock in."
117) Internal CASE Tools

The vendor provides CASE tools for Computer Aided Software/Systems Engineering. Some people prefer vendor provided CASE since it tends to be tightly linked to the database and can take advantage of all of the most up to date features.
118) Graphics Tools

The vendor provides tools that allow graphical applications to be built that use database data. All of the vendors have 3rd party graphics tools available.
119) Internal E-Mail System Integration

Mail systems are provided by the vendor which are written specifically to run on top of and integrate with the database system. In general this facilitates the combining of database data and reports into the mail system. Database events can trigger mail messages, and vice-versa.
120) 3rd Party E-Mail System Integration

3rd Party Mail systems exist which are written specifically to run on top of and integrate with the database system. In general this facilitates the combining of database data and reports into the mail system. Database events can trigger mail messages, and vice-versa.

121) Trigger/Stored Procedure Sends E-Mail

A trigger or stored procedure can trigger the creation of an e-mail message. Ask the vendor to show how easy this is to do in the various systems. Some come with built-in stored procedures to accomplish this; with others you need to write your own C code. I think this is a pretty cool feature to have.

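A rough sketch only: the trigger syntax here is generic and SEND_MAIL is a hypothetical helper procedure, since every product does this differently:

        CREATE TRIGGER NEW_HIRE_MAIL
        AFTER INSERT ON EMP
        FOR EACH ROW
        EXECUTE PROCEDURE SEND_MAIL('hr_manager', 'A new employee was added');
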
122) Office Automation Tools

Office Automation tools are available, which integrate RDBMS data into E-Mail, Word Processor, Calendar-Scheduler, Spreadsheet, Spell Checking, Document Management, etc...

123) Bit-Mapped Graphics

The 4GL supports Bit-Mapped Graphics. How well? Ask some users.

124) National Language Support

Which of the following languages are supported? Language support includes translated manuals, error messages, commands, etc. Number formats and dates also differ between languages, and 16-bit characters can be stored where appropriate.

125) Precompiler Support

Precompilers allow you to embed SQL statements (Embedded SQL) in 3GL programs. These programs are run through a pre-compiler to produce pure 3GL code, with low level function calls replacing the SQL statements.

Precompilers should also provide both syntactic and semantic checking for increased programmer productivity. Precompiler support is a key component of the ANSI SQL standard as well as the Federal Government's NIST SQL compliance test.

Precompilers are important because they let you write standard code that can be re-precompiled and re-compiled and will then run against other database engines. If you write your code to your product's function call interface, you are screwed if you ever want to change engines.

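A small embedded-SQL-in-C fragment (the host variables are illustrative, and the SQLCA error check shown is typical but vendor-specific); the precompiler rewrites the EXEC SQL lines into the vendor's low level calls:

        EXEC SQL BEGIN DECLARE SECTION;
            int  empno;
            char ename[21];
        EXEC SQL END DECLARE SECTION;

        EXEC SQL SELECT ENAME INTO :ename
                 FROM EMP
                 WHERE EMPNO = :empno;
        if (sqlca.sqlcode != 0)
            ; /* no such employee, or some other error */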

126) Function Call Interface

A low level function call interface is available. The argument for using it is that, if you know what you are doing, it provides better performance than using embedded SQL and precompilers. Arguments against are that precompilers are getting better and get you close to the speed of writing directly in the call interface, and that if you write to the vendor's call interface you are locked into non-portable code.

127) Module Language

Modelled after ADA, the idea is that you modularize all of your embedded SQL in an external file, and put only native 3GL statements in your actual program. The Module Language helps you integrate the two.

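Roughly, an ANSI-style module looks like the sketch below; the exact grammar has shifted between versions of the standard, and the names here are illustrative, so treat this as a sketch only:

        MODULE EMPMOD LANGUAGE C AUTHORIZATION DAN

        PROCEDURE GET_NAME (:empno INTEGER, :ename CHAR(20), SQLCODE);
            SELECT ENAME INTO :ename FROM EMP WHERE EMPNO = :empno;

The 3GL program then calls GET_NAME like any other external routine.
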
128) Bind Variable Support

Bind variables allow applications to execute a SQL statement multiple times with different values for the bind variables without reparsing the SQL statement. This leads to performance improvements in applications requiring repetition of SQL statements. Bind variables are usually associated with database cursor support.

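Combined with dynamic SQL, a '?' parameter marker becomes a bind variable: the statement is prepared once and re-executed with new values, with no re-parse (host variable names are illustrative):

        EXEC SQL PREPARE S2 FROM 'INSERT INTO EMP (EMPNO, ENAME) VALUES (?, ?)';
        EXEC SQL EXECUTE S2 USING :empno, :ename;
        EXEC SQL EXECUTE S2 USING :empno, :ename;  /* new values, same parsed plan */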

PART 2



Other Stuff You Should Consider

This section has some things suggested by people on the net that either don't lend themselves well to the checklist format, or that I didn't have time to investigate as thoroughly as I'd like. These are all great questions to make the vendors sweat and do their due diligence.

1) Theoretical Limits

Databases Per Computer
Databases Per Organization
Total Single Database Size
Tables Per Database

Rows Per Table
Columns Per Table
Indexes Per Table
BLOBS Per Table

Columns Per Index
Rows Per Index
Levels Per Index

Locks Per Transaction
Locks Per Database Page
Locks Per Table
Locks Per Database

Columns Per SELECT Statement
Tables Per SELECT Statement
Databases per SELECT Statement

Time to Back Up 1 Gigabyte
Time to Load 1 Gigabyte Unsorted Data
Time to Load 1 Gigabyte Sorted Data
Time to Recover 1,000,000 Transactions

2) Practical Limits/References

For the categories above, what are the limits for actual existing production databases out there today, on any hardware? Same question, but for the target machine or hardware class for this application.

3) Datatypes and Precisions

What datatypes are available, and at what precision levels? What is the maximum length of each?

Suggested Types Include

Fixed Length Character
Variable Length Character
Date
Time
Smallint
Integer
Packed Decimal
Float/Double
Currency
Binary Large Object (BLOB)
Text Large Object

Thanks to the following people for their input:

Alan Goodman		alan_at_dssmktg.com
Bill Coffin 		billc_at_ingres.com
RW Salnick		salnick_at_dejavu.spk.wa.us
? ? 				?_at_pivot-sts.sbi.com
Graeme Sargent		graeme_at_pyra.co.uk
Matt Kosatur		matt_at_meaddata.com
Paul Turner		turner_at_telstar.kodak.com
Snorri Bergmann		snorri_at_strengur.is
Bill H 			billh_at_greed.sbil.co.uk
mark saindon		msaindon_at_oracle.com
Ben				ben_at_sybase.com
Alan Beal			alanb_at_vnet.ibm.com
Curt Leaist 		curtl_at_zed.com
Lou Kates			louk_at_research.teleride.on.ca 
  26. Cost Based Optimizer Y Y Y Y
  27. Optimizer uses Data Content Heuristics N N Y N
  28. Optimizer Scans Entire Table to get Stats Y Y Y Y
  29. Optimizer can estimate stats to save time Y N Y N
  30. Optimizer uses network costs N N N N
  31. Optimizer uses hardware costs N N N N
  32. Optimizer knows real-time load factors N N N N
  33. Optimizer moves join location Y N Y Y
  34. Stored Procedures in Database Y Y Y Y
  35. Cached Procedures are Shared Y Y Y N
  36. Stored Procedures can return Sets Y N N Y
  37. Stored Procedures Return All Datatypes Y N N Y
  38. Stored Functions in Database Y N N Y
  39. Stored Procedure Cache Y Y Y Y
  40. Shared SQL Cache Y N N N
  41. Parallel Data Query N N N N
  42. Parallel Index Build N N N N
  43. Parallel Sort N N N N
  44. Performance Monitoring Tools Y Y Y Y

II.INTEGRITY O S A I



45. Adherence to Industry Standards
	SQL Interface				Y Y Y Y
	FIPS 127-1 Validated 			Y * Y Y
46. Declarative Integrity Model			Y N N Y
47. Cascading Updates				N N N N
48. Cascading Deletes 				Y N N N 
49. Loosely Typed Datatypes, Ala ADA		Y N N N
50. Null Support				Y Y Y Y
51. Database Triggers				Y Y Y N
52. PRE- and POST- Triggers			Y N Y N
53. Triggers fire once per SQL Statement	Y Y Y N
54. Triggers fire for each row affected 	Y N N N
55. Programmer can control trigger fire order	Y Y N N 
56. Automatic Procedure Recompilation		Y Y Y Y
57. Programmer Defined Datatypes		N Y Y N
58. Event Alerters				Y N Y Y
59. Database Events Call 3GL Program 		Y * * N
60. Stored Procedures Call 3GL Program 		Y * * N 


III. DATABASE					O S A I
=============			
61. Portable					Y Y Y Y
62. Automatic Database Recovery 		Y Y Y Y
63. Multiplexed Log Files 			Y Y Y Y
64. Database Mirroring				N Y N Y
65. On-Line Database Backup			Y Y Y Y
66. Automatic Log Backup 			Y Y Y Y
67. On-Line Software Diagnostics		Y Y Y Y
68. Hot Standby Server				Y Y Y N
69. Full Cluster Utilization			Y N Y N
70. On Line Recovery				Y Y N N
71. Parallel Backup of multiple files		Y N N N 
72. Parallel Recovery of multiple files		Y N N N 
73. Multiple Log Files				Y N Y N
74. Resource Limiter/Governor			Y N Y N
75. DBA Utilities 				Y Y Y Y
76. Database Auditing 				Y Y Y N


IV.DISTRIBUTED RDBMS				O S A I 

======================
77. Distributed Join Y N Y Y 78. Synchronous Table Replication Y N Y N 79. Asynchronous Table Replication Y N N N 80. Connections to other Databases Y Y Y N 81. Programmatic Two Phase Commit Y Y Y N 82. Transparent Two Phase Commit Y N Y Y 83. Two Phase Commit to Heterogeneous RDBMS Y N N N 84. Movable Transaction Coordinator Y N N N 85. Remote Procedure Calls Y Y Y Y 86. RPC's Protected by Two Phase Commit Y N Y Y 87. Heterogeneous Data Support Y Y Y N 88. Procedural Database Gateway Y Y N N 89. SQL Based Database Gateway Y Y Y N 90. XA TP Monitor Compliant Y N N Y 91. Top End TP Monitor Support Y Y Y Y 92. CICS TP Monitor Support Y Y N N 93. Trans. Can Span Multiple Net Protocols Y N Y N 94. ODBC Support B B B B 95. SQL Access Group Support B B B B V. DATABASE SECURITY O S A I
======================
96. OS Security Integration Y Y Y Y 97. User Group Privileges/Roles Y Y Y N 98. Mandatory Access Control * * * * 99. Undergoing for Evaluation by NCSC Y N N Y 100. Standard Tools with Trusted Product Y N N Y 101. Non-Dedicated Trusted Server Y N Y Y 102. Product being evaluated at C2 Level Y N N Y 103. Product being evaluated at B1 Level Y N * Y 104. Product Certified (Europe) at B1 Equiv N N Y N VI.LANGUAGES & TOOLS O S A I
======================
105. SQL Procedural Language Y Y Y Y 106. Text/Image Datatypes (BLOBS) Y Y N Y 107. Union Operator Y Y Y Y 108. Select for Update Y N Y Y 109. Outer Join Operator Y Y Y Y 110. Dynamic SQL Y Y Y Y 111 Static SQL Y Y Y Y 112. EXPLAIN Command Y Y Y Y 113. Transaction Savepoints Y Y Y Y 114. Aliases/Synonyms Y Y Y Y 115. NIST Verified Conformance to Standards Y * Y Y 116. 3rd Party CASE Tools - Upper and Lower Y Y Y Y 117. Internal CASE Tools - Upper and Lower Y Y N N 118. Graphics Tools Y Y Y Y 119. Internal Email System Integration Y N N N 120. 3rd Party Email System Integration Y Y Y N 121. Trigger/Stored Proc Sends Email Y Y Y N 122. Office Automation Tools Y Y Y Y 123. Bit-Mapped Graphics Y Y Y Y 124. National Language Support European Y Y Y Y Semitic Y N Y N Asian Y N Y Y 125. Precompiler Support C Y Y Y Y COBOL Y Y Y Y Pascal Y N Y Y PL/I Y N N N SAME/DL N N N N ADA Y N Y Y Fortran Y N Y Y C++ B N N N 126. Function Call Interface C Y Y Y Y COBOL Y Y Y Y ADA Y N Y Y Pascal Y N Y Y PL/I Y N N N SAME/DL N N N N Fortran Y N Y Y C++ B N N N 127. Module Language Y N Y N 128. Bind Variable Support Y N Y Y PART 2: Other Important Stuff 1. Theoretical Limits 2. Practical Limits/References 3. Datatypes and Precisions

DETAILED DESCRIPTION OF CATEGORIES


  1. Row Level Locking
    First, a little about granularity. The smallest level of granularity in commercial databases is the Row level. Next comes Page level. Then Table Level. Then Database Level. Typically Database pages are 2048 bytes, so if your rows each contain 100 bytes of data you might squeeze 20 rows into a page.

Row level locking means that the database can use a lock granularity of a single row (tuple for the coneheads out there.) This means that multiple users (Up to 20 in the example above, can simultaneously update different rows on the same page. Each user, when performing an operation on the row, only locks that row in question and does not interfere with other users in the same page. Arguments for Row Level Locking are that it permits the highest degree of concurrency - Users tend not to lock each other out. Arguments against claim that row level locking causes a lot of overhead. My belief is that users are the slowest thing in my system, so I'd rather not wait for them. Thus I prefer row level locking.

2) Page Level Locking



See Also Row Level Locking. In Page Level, Locks occur at (surprise) the page level. Thus when one user updates a row, the entire page is locked and other users are blocked from updating (sometimes reading too) rows in that page. Sometimes users get around page level blocking by forcing what would be small rows to take up an entire page. This trick effectively simulates row level locking for critical tables that are getting hit by updates a lot, and are having problems with contention. The arguments for and against page level locking are mentioned above.

3) Table Level Locking



The database supports the LOCK TABLE command to lock an entire table at a time. Useful for locking a table for batch updates, locking out users for maintenance or reporting, etc.

4) Database Level Locking



The entire database can be locked with a single command. As far as I know, the products all support this through a single-user or DBA only mode, but there is no LOCK DATABASE command.

5) Programmable Read Isolation Levels



The programmar can indicate which type of read isolation he wishes to use for each SELECT statement. Examples are Dirty Read (Uncommitted Data), Cursor Stability, Repeatable Read, etc. These terms are given in order of growing strength.

6) System Doesn't Escalate Locks



Some databases have a maximum number of locks that can be supported, and automatically escalate row or page level locks to table level locks when that number has been exceeded. I believe that system generated lock escalation causes problems by creating artificial deadlocks.

7) Programmer May Escalate Locks



The programmer may escalate locks at his/her discression. For example, if the program writes to a certain percentage of the pages in a table, it may want to go ahead and upgrade to a table lock to avoid running into locked pages and having to wait or to rollback work.

8) Versioning / No Read Locks



The database keeps track of several versions of committed data so queries see only those rows that were committed and valid at the instant the query started. The database allow consistent reads (committed data) without taking out read locks. This is a good thing since readers do not interere with writers, and accurate reports can be written since only committed data is read.
  • Sybase has Browse Mode which does not use read locks. Informix and Ingres have dirty read mode which do not use read locks. 9) Dirty Read Mode
    The database has a read mode which scans the data as it currently exists on disk or in memory, regardless of whether it has been committed or not. In general dirty reads are very fast, but since the data is not necessarily valid some applications can't use them. Good for reports where accuracy doesn't matter.
    1. Multi Server Architecture
      Multiple Server processes exist on the server machine. These usually take two forms, a two-task architecture where each client user gets a corresponding OS process on the server, or a multi-threaded architecture where each of the server processes can manage multiple clients. Multi-Server architecture helps balance system load and is mandatory for taking advantage of symmetric multi-processors. The two-task architecture is considered resource intensive on operating systems that don't deal well with lots of processes well.
    2. Symmetric Multi-Processor Support
      Can the server take advantage of multiple CPU's in a symmetric multi-processor environment. Things to make sure a product can do: Make sure each client can read, write and commit transactions in parallel on any available CPU. Make sure, as you add CPU's, that transaction throughput increases in a near linear fashion. Some products and operating systems may scale well to 4 CPU's, others to 8, and others to 30, hundreds, or more. Some may not scale at all. (OS2, for example)
    3. Support for Loosely Coupled System
      Pioneered by DEC in their VAX cluster, loosely coupled processors are usually independent computers connected by a fast communications bus that can share and coordinate resources. Databases that work in this environment can run simultaneously on several nodes in a cluster. This increases power since processing is going on at each node, and reduces complexity since no two-phase commit or distributed database work is needed. Also helps for fault tolerance since if one of the machines dies the remaining nodes keep running. Today several unix machines (Pyramid, Sequent, IBM RS/6000) offer clustering in addition to DEC. Performance concerns may exist if OLTP goes against the same data on more than one node since database blocks get sent pinging around the cluster if this happens to much. Other than that I think this is pretty cool. Also this is the basis for how Oracle works on Massively Parallel machines like Ncube, KSR, Maspar, etc.
    4. Array Interface
      Can the client request that multiple rows be sent in a single batch, as opposed to a network request for every row to be returned. Reduces network traffic. Example: Send me 1000 rows, thanks. vs. send a row, thanks, send a row, thanks, 1000 times. Only an issue for cursor based processing: for streams based this is not an issue.
    5. Shared Commits
      Shared or Group commits write multiple transactions into the redo structure in a single physical I/O. This is an advantage over database systems which require a physical I/O for each individual transaction. Some products may not support shared commits on SMP hardware.
    6. Asynchronous I/O
      Most operating systems do provide Asynchronous I/O, including VMS, MVS, VM, and unix from Pyramid, and Sequent to name a few. On systems that do not provide asynchronous I/O, the need to wait for disk writes to complete may become a performance bottleneck. The server should be able to take advantage of Asynchronous I/O where available, and have a strategy to deal with the lack of asynchronous I/O on systems where it is not available.
    7. Raw I/O
      On Unix, it is much faster to be able to access raw filesystem devices, which are not controlled by the unix file and buffer managers, than to use normal or "cooked" files.
    8. Shared Log Files
      Every change to the database structure is automatically written in a "piggybacked" manner to the Redo file structure. Transactions are piggybacked in a way that one physical write can actually commit several transactions. This is a very efficient way to commit transactions because it requires only a sequential write to an o/s file. Multiple users can share the same redo file structure.
    9. Non-Blocking Queries
      For systems where multiple users are reading and writing to the same table at the same time, it may be critical that readers doing reports do not interfere with ongoing transaction processing. When readers need to obtain locks to get information, they are said to "block" writers since writers are blocked from getting the locks they need to perform their update. This is a great feature if you have a transaction processing system on which you want to be able to run reports against live data during operating hours.
  • Ingres and Informix can accomplish this by using Dirty Read mode. Sybase does it by using browse mode. See the above sections for limitations and applications of these features.
    1. Clustered Tables
      Clustered tables: Store multiple tables close to each other on disk based on a commonly used join key. This helps speed up access if two tables are always accessed by the same join key. Takes longer to insert and update than normal storage method.
    2. Clustered Indexes
      Clustered Indexes: Several Definitions. Either an index on the cluster key mentioned above, or on some systems a pre-sorted index cluster that greatly speeds data retrieval for ordered operations, at the expense of insert and update speed. Oracle uses the first type, Sybase the second.
    3. Hash Indexes
      Most products use binary tree indexes, which are flexible and good for objects that grow, shrink, change, etc. Hash indexes offer faster performance but are appropriate only for relatively static tables.
    4. ANSI Standard Cursor Support
      Cursors are memory structures which store the SQL statement, a parsed/compiled version of the SQL statement, a buffer for the result data, and pointers to the result set of the query, among other things. Cursors allow applications to process multiple SQL statements in a single database connection. This is crucial for window-based environments which require multiple SQL statements to be processed at one time with different information being displayed in different windows. This allows multiple windows to be created via a single session and reduces memory utilization requirements. In addition, multiple SQL statements can be encompassed in a single transaction with all internal locking controlled by the database engine.

Cursors are also required to meet ANSI standards, both for SQL compliance and for Pre-Compiler Support.

23) Backward Scrolling Cursors



All cursors permit forward scrolling of the result set, some products also allow backward scrolling. Good for those of you that like to think backwards.

24) Non-Blocking Sequence Generator



Many applications require that unique keys be generated, for example for purchase order numbers. A nice feature is a multithreaded  non-blocking sequence generator which can be used to generate sequence numbers for rows in tables. Some databases require applications needing unique primary keys to single thread all transactions through a single sequence table, which can become a major application bottleneck.

25) Unique Key Generator



The database contains a mechanism for generating unique keys.

26) Cost/Statistics Based Optimizer:



The database will gather and store comprehensive statistics about database structures. The statistics based optimizer will choose the most efficient access paths to data based on the information gathered. Some products do a much better job than others. Nice things to have: The ability for the system to "Guess" or estimate the statistics on a table based on a sample of the data. This is a lifesaver when it takes 4 hours to analyze statistics for a 30 gigabyte table. Also, some of the optimizers take into account low/high data values, and data distribution heuristics.

27) Optimizer uses Data Content Heuristics



The optimizer scans the data to determine the minimum value in the table, the maximum value, and the average value. It also keeps track of the distribution of data within the rows.

28) Optimizer Scans Entire Table to get Stats



To gather statistics, the optimizer must scan the entire table. This can take a long long long time for multi-gigabyte tables.

29) Optimizer can estimate stats to save time



The Optimizer can estimate statistics by scanning only a random sample of the rows in the table. Very useful for collecting statistics on large tables where it is impractical to scan the whole table for statistics.

30) Optimizer uses network costs



The optimizer is aware of the existance and make-up of the network, and some costs have been assigned to each link. These may be either costs in terms of link speed or in actual dollar expense for using the link. The optimizer will determine the lowest cost path in terms of time or expense, which may involve shipping data to different nodes for joins or through intermediate nodes depending upon the network.

31) Optimizer uses hardware costs



The optimizer is aware of the relative processing power, CPU Cycle cost, I/O speed and cost, etc. It decides how to implement queries based on this cost information, and the user's resource profile.

32) Optimizer knows real-time load factors



The optimizer can take into account real time system load information in determining the optimal path for queries. Thus this path may change from time to time depending on system load.

33) Optimizer moves join location



Classic example is a local table with 10 rows being joined to a remote table with 1 Million rows. The optimizer is smart enough to send the local table to the remote node for the join rather than the other way around.

34) Stored Procedures in the Database



Stored procedures are bits of procedural code, grouped together and stored in the DBMS engine. Stored procedures are usually stored in shared, compiled format. Stored procedures should be callable by applications, by other stored procedures, or by database triggers. There should also be full dependence tracking that will automatically recompile stored procedures when objects upon which they depend change. Stored procedures are good because they typically improve performance and reduce network traffic. They also are great for encapsulating allowable operations on data - most systems let you grant access to a stored procedure without granting access on the underlying tables. The major downside is that there is no standard for stored procedures, so any code you write that uses them is non-standard and non-portable.

35) Cached Procedures are Shared



Stored Procedures are cached in memory on the server. A single copy of the stored prodedure can be used by multiple users. Saves memory and execution time.

36) Stored Procedures can Return Sets



Stored procedures can return an array or table of data. Not limited to returning only a single row.

37) Stored Procedures return All Datatypes



Stored procedure are capable of return all available server datatypes, not just a limited subset.

38) Stored Functions in Database



Similar to Stored Procedures, but instead stored function calls can be defined by the user. User defined functions are useful for many reasons, similar to the idea of having both procedures and functions in a 3GL language

39) Stored Procedure Cache



Stored Procedures are cached in memory on the server. They do not need to be read in from disk each time they are called. There is a special, DBA configurable area of memory which holds the procedure cache, so it does not get flushed by users doing large queries, etc.

40) Shared SQL Cache



The ability to store ad-hoc SQL statements in shared cache memory on the server. The server recognizes when a client requests the execution of a SQL statement already in the shared cache, and can use the already parsed and compiled representation already in memory. This can provide large reductions in server memory usage where lots of users are using the same application (and thus the same SQL statements.)

41) Parallel Data Query



On a symmetric multi-processor, a single query can be decomposed into several parts that can each be dispatched to a separate processor for execution in parallel. This same functionality will eventually extend to clustered systems and networks, but no one can do it yet today.

42) Parallel Index Build



Same idea as parallel query, but for building indexes. Dramatically speeds index builds for large tables.

43) Parallel Sort



Whenever sorted data is needed, the sort operation can be decomposed and processed by multiple CPU's simultaneously.

44) Performance Monitoring Tools



The Vendor provides tools to monitor system performance and to diagnose problems. Tools are available to monitor both individual SQL statements and overall system performance.

II. INTEGRITY


45) Adherence to Industry Standards



There are lots of RDBMS standards. Most useful is FIPS 127-1 since the government actually tests the products to ensure their compliance.

There are also SQL2 and SQL3 standards out there or under development, but no official test suites being proctored. All are complex standards that define a common SQL dialect at an incredibly detailed level. The advantage of this is that if you write FIPS 127-1 code, you'll be able to run your application against all adhering databases with little re-writing of code.

It is not sufficient for a vendor to claim "we use ANSI-  SQL" This is like saying all unix's are the same. If you want to have a chance to be able to port your code to another DBMS without a rewrite, you must only use 100% ANSI standard statements and a database that has been CERTIFIED by NIST. From alanb_at_vnet.ibm.com
You should say FIPS 127-1, not FIPS 127 since there was the original FIPS 127, then the current FIPS 127-1, and soon FIPS 127-2. They are all based on different versions of the ANSI standard. NIST publishes the Validated Products list, and Oracle7, Informix OnLine 5.0, and Sybase 10 show up on the list as of Jan 93. SQL2 compliance is incorrect since there are no test suites yet (FIPS 127-2 will provide the tests). Also, SQL2 has three levels, entry, intermediate, and full; which level are you talking about? When it comes to conformance, do not take a vendor's word for it without them providing a certification of compliance. Also, note that there is a difference between conformance and compliance; I believe compliance means it has been formally tested and verified.

On the subject of conformance/compliance to the SQL standards, the following applies. ANSI X3.135-1989 SQL was adopted by NIST in FIPS 127-1. NIST developed test suites to test compliance with the SQL standard. Only if the DBMS was tested against version 3.0 of the test suites can a vendor claim compliance with the standard. And compliance can only be claimed for a specific hardware/operating system combination, ie. just because one is compliant on Sun does not mean one is compliant on HP. The results of the tests are published every quarter in the Validated Products List. NIST is working on FIPS 127-2 which adopts ANSI X3.135-1992

SQL or SQL2. To claim compliance is jumping the gun at this point because there are no test suites available. And besides, there are three levels to the SQL2 standard - entry, intermediate, and full. Which of these are vendors claiming compliance to?

  • Sybase System 10 has been FIPS 127-1 Validated. 46) Declarative Integrity Model
    This includes full support for declarative referential integrity, default values and domain support. The ANSI declarative approach greatly simplifies the process of providing database enforced integrity. It allows the programmer to define Primary and Foreign Keys, Default Values, Unique Keys, and so on when creating database tables. The database engine automatically enforces these rules to protect system data. I think this is vastly superior to databases that make you program referential and entity integrity using stored procedures and triggers. Certainly more portable, standard, and lower maintenance to use the Declarative Method. Probably better performance as well. 47) Cascading Update
    Defined for this matrix as the ability, within the Declarative model, to propegate updates to the primary key that is a foreign key to another table. For example, if I update a purchase order number, all line items that are associated with that purchase order must have their primary key updated or they will become disasociated records. All of the products with triggers should be able to do this, but I've never tried it and the declarative constraints may block updates to foreign keys. 48) Cascading Delete
    Defined specifically for this matrix as supporting the ON DELETE CASCADE definition of the declarative referential integrity model. Delete cascade deletes all the corresponding children if you delete the parent record. All of the products that support triggers can achieve this programatically, but I prefer the Declarative implementation for the reasons mentioned in the above Declarative Integrity Model Section. 49) Loosely Typed Datatypes, Ala ADA
    The %TYPE variable can be used to declare a variable that has the same structure as the columns or rows in a table or a view. If the datatype of any of the columns in the table changes, any variables declared as %TYPE will automatically change as well. If the variables are in a stored procedure or trigger, they will be automatically recompiled by the database engine, transparently to the user. 50) Null Support
    The programmer can create a table and specify whether or not NULL values can exist in each column. The SQL implementation should provide a function that will determine whether the value in the column is null or not. This feature is useful when performing arithmetic operations and outer joins. The database should also correctly evaluate to false NULL = Anything. 51) Database Triggers
    Database Triggers are pieces of procedural code associated with tables and are fired implicitly when data is modified in their table. They are used for access control, data validation, referential integrity, synchronous table replication, and many more uses. 52) PRE- and POST- Triggers
    Event triggers can be defined to execute before and/or after database events such as inserts, updates, etc. This is useful as a BEFORE trigger can prevent an illegal million row update from happening at all; some products only provide triggers to invalidate and roll back such a transaction after it has already happened. AFTER triggers are better suited to performing transaction auditing, distributed database, etc. 53) Triggers fire once per SQL Statement
    Database triggers automatically fire once, and only once, per SQL statement. These are useful for security, for example, when the trigger might check that the date is not a Sunday, and if it is fail the SQL statement. This type of trigger need fire only once per SQL statement. 54) Triggers fire once for every row affected by an SQL stmt
    Database triggers can automatically fire once for every row affected by a statement. For value based auditing, a developer would use post-insert, update, and delete triggers that fired for each row affected by the statement, and wrote the new and old values of each row into an audit table.

The ability for triggers to fire on a per row affected basis has many applications besides value based auditing. Synchronous table replication, recursive value checking, and multi-row transaction management are only a few possibilities.

55) Programmer can control Trigger Fire Order



Triggers fire in a pre-determined, known order, as opposed to a random order that may differ for each SQL statement.

56) Automatic Procedure Recompilation



The database maintains extensive dependency tracking information which allows it to automatically recompile all objects in a dependency chain whenever code is changed. The database kernel will recompile all procedures, functions, triggers, and packages necessary when dependant structures are modified. The database may also be capable of resolving dependencies of modified procedures on remote nodes.

57) Programmer Defined Datatypes



The programmer can define custom datatypes to the Engine. Useful if you have wierd datatypes, like longitude/lattitude etc.

58) Event Alerters



Events can be defined in the database, which the database will watch for. When the event occurs the database engine will take some pre-determined action. Example: When the inventory drops below a certain level in the inventory table an event alerter notices, send a message to the purchasing clerk and automatically enters a purchase order for the product needed.

59) Database Events call 3GL program



Event Alerters can call a 3GL program, for example to turn the operator's screen red if the reactor is about to melt down.
  • Through add on products (Sybase Openserver, Ingres Knowledge Manager) this is possible. 60) Stored Procedures call 3GL program
    A stored procedure can call a 3GL program, for example to update rows in the employee table and then send email to the employees manager confirming the changes were made.
  • Through add on products (Sybase Openserver, Ingres Knowledge Manager) this is possible.

III. DATABASE


61) Portable



The server should run on many different types of hardware and operating systems. 3-tier portability includes installations on microcomputers, minicomputers and mainframes. In addition, some databases run on super computers and massively parallel computers. Hardware portability is important so you can always select the best performing hardware platform and be sure your database applications will still work.

Other aspects of portability include network protocol portability, and Graphical User Interface portability.

62) Automatic Database Recovery



Database failures are usually grouped into several loose categories. Instance failure, occurs when the machine running the database server crashes, or software fails, or a dumb operator kills the server, with no losses on the database disks. Media failure occurs when a disk containing database information fails.

Database recovery in case of instance failure should be performed automatically by simply restarting the database instance. The engine should roll back any transactions that were pending but not committed at the time of failure, and ensures the integrity of all committed transactions. The time to recover from instance failure should be configurable by the DBA. Recovery from media failure should be able to be performed in automatic, semi-automatic, or manual modes. In all cases, recovery from media failure requires that a backup of the lost files at some point in time is available, along with all database redo files since that point in time.

63) Multiplexed Log Files



The database can maintain multiple transaction redo files on different disks and write to all of them at the same time. This provides added protection in case of a loss of the disk upon which the log files reside, which otherwise would render the database non-recoverable for up to the minute transactions.

64) Database Mirroring



The RDBMS can perform software disk mirroring of the database files regardless of whether the mirroring is supported at the Hardware or Operating system level. Mirroring means keeping multiple copies of all database information to protect from disk failure.

All of the products can take advantage of hardware and operating system disk mirroring, which are preferable to RDBMS software mirroring if they are available.

65) On-Line Database Backup



DBA's can make an online backup of the entire database while the database is up, all tables are online and users are active. This should not require locking, and should have a minimal effect on system performance, other than the I/O's required to perform the backup.

66) Automatic Log Backup



Filled database logs are automatically backed up to secondary storage on disk or tape. This archiving feature as well as online backup and recovery are crucial elements for implementing mission critical applications requiring 24 hour availability.

67) On-Line Software Diagnostics



This is a wide and vague category, but basically what I'm getting at is that tools are available on-line to diagnose problems, understand performance, monitor users, etc. You should also be able to configure the database on line. All the products can do this to some extent, but since I'm not providing ratings of how well features are implemented I suggest you ask users for how they feel about this area. These tools are typically very weak in comparison to mainframe operating system diagnostics.

68) Hot Standby Server



The capability to restart the database on a backup machine in the case of hardware failure. This capability is provided on hardware platforms that support shared disks between multiple computers, such as the DEC VaxCluster. Upon sensing that the primary computer has gone down, a utility on the second machine can then automatically start the backup engine. The engine will recover any outstanding transactions from the failed machine, and then will go into standard operation. 3GL applications can be programmed to notice the failure of the primary server, and automatically reconnect to the backup machine. I think that full cluster utilization mentioned below is a much better solution, but it is more expensive.

69) Full Cluster Utilization



The capability to run a single database on multiple machines in a shared disk environment. This is an advantage over other database systems that only have 'hot standby' or 'companion server' capabilities. Applications can take advantage of the processing power of all machines, not just a single computer at a time. The engine provides automatic fail over reconciliation when nodes do fail, and applications can be written to automatically reconnect to the database if the node to which they are attached fails.

70) On-Line Recovery



This is critical for OLTP, and 24 hour a day mission critical systems. On line recovery is the ability to recover subsets of the database while the rest of the database is up and on-line. The database does not need to be taken offline during recovery of a data file. Users that do not require access to failed areas of the database will remain unaware that failure ever happened. Users that do attempt to access damaged files will receive an appropriate error message, and will be able to access the file as soon as recovery is complete. Some databases may require that the entire database be taken off line for recovery from media failure.

71) Parallel Backup of Multiple Database Files



The Database files can be backed up in parallel. That is, multiple database file backup commands can be issued simultaneously, and on an SMP system can run across multiple processors.

72) Parallel Recovery of Multiple Database Files



Multiple recover database files can be issued simultaneously. On an SMP machine, the recover processes will be spread across available processors. Critical for recovering large databases on SMP machines. Why ? Examine running a transaction processing application for 1 hour on a 14 CPU machine. Thus you have accumulated 14 CPU hours of transaction processing. A disk drive blows up. If you don't have parallel recovery, it now takes you 14 hours to recover your database if you are limited to a single process for recovery.

73) Multiple Log Files



The ability to provide multiple Redo file structures.Some databases have only a single log file, which is a problem since if it ever fills, the entire database hangs. Other database use multiple redo files, which provides a level of insurance if your operator goes to sleep without mounting the proper backup tape. Also, some databases with a single log file get into trouble when long running transactions are going on and the log fills.

74) Resource Limiter/Governor



A resource limiter (Governor) that evaluates the impact of database requests and restricts them according to limits for each of the users that access the database. The resource limiter follows user resource profiles associated with the privilege role of the user. Users can be limited to CPU usage, direct I/O, logical I/O, and wall time per statement or cumulatively per session. The Governor can also log users off the database automatically after specified idle periods. Some of the governors can stop "Queries from Hell" at parse time, by using optimizer statistics. Others only stop the query once a resource limit is reached.

75) DBA Utilities:



Again a weaselly category, since I don't want to give relative rankings. In general, DBA utilities should be able to start and stop an engine, perform real-time monitoring of database use and performance, assist in backup and recovery of database logs and data, and provide for execution of SQL statements. Utilities may also help in managing users, user groups, application privileges and more. I would ask users of systems what they think of each product's DBA tools to get an idea of what is available. The DBA utilities should be able to run in client/Server mode to facilitate centralized management.

76) Database Auditing



Database Auditing covers areas like: Successful and/or unsuccessful attempts to connect/disconnect from the database, Granting or Revoking of privileges, Enabling or disabling writing to the audit trail table, setting default auditing options for any database tables, and more. Auditing should be definable on a per-user or per-system basis. Database Triggers can typically be used to provide journaling of changed data and other specific auditing functions. The databases targeted for C2 level security compliance all provide a great deal of auditing support, more than the average site will ever use.

IV. DISTRIBUTED RDBMS


77) Distributed Join



The capability of joining two tables in a select statement that are on different machines. Some systems can perform this transparently to the user and to the application program. The query is then exactly the same as if the tables resided on the same machine. Example:
	SELECT ENAME, DEPTNAME FROM EMP, DEPT
	WHERE EMP.DEPTNO=DEPT.DEPTNO;

Where the EMP table resides on the local server, and the DEPT table resides on a remote server.

78) Synchronous Table Replication



Table replication allow the distribution of updates from a single master to one or more slave nodes. Synchronous replication implies that updates are propogated to slave nodes in real time, protected by two-phase commit. Classic use is for banking transactions, when balances must be synchronized on all machines at all times. The danger of Synchronous replication is this if one node fails, the transaction doesn't go through on any of the nodes.

79) Asynchronous Table Replication



See Synchronous Replication. Asynchronous replication differs in that updates are not propegated in real time. This is useful for information that doesn't always need to be perfectly in synch. Also, does not have the problem with all nodes needing to be alive for the initial transaction to complete. Updates get propegated around the network on at programmer defined intervals.

80) Connections to other Databases



Gateways are available that facilite the incorporation of data from foreign databases. Some gateways are read/write, some are read only. Some are procedural, others are SQL based.

81) Programmatic Two Phase Commit



Two phase commit is a mechanism for managing distributed update capabilities in a distributed database.Two Phase Commit (TPC) ensures that an update either completes or rolls back on all nodes in a transaction. Programming this is very complex, and gets exponentially more difficult as the number of nodes in a transaction grows. I have never heard of an application that has been built on one of these products using programmatic two phase commit that was successful.

82) Transparent Two Phase Commit



Transparent TPC eliminates programming and lets the servers handle the complexities behind the TPC algorithms. An example of this is below:
	UPDATE EMP...; 	(EMP is on Node A)
	UPDATE DEPT...;	(DEPT is on Node B)
	COMMIT;

THAT'S IT !! 83) Commit to Foreign Database



The gateway products allow single site updates and commits to heterogeneous databases. In addition, a single heterogeneous database can participate in a Two-Phase Commit transaction with multiple native databases.

84) Movable Transaction Coordinator



In a Two-Phase Commit distributed update, one of the databases acts as the coordinator of each transaction. The coordinator does not need to go through the Prepare stage of the TPC algorithm, and thus locks resources for a minimal amount of time. All of the products that support programmatic TPC allow the programmer to specify the coordination site. This category specifically refers to the Transparent TPC facilities, where available.

The coordination task can be assigned according to network resources and machine capabilities as required for each transaction. This protects resources on your most critical database servers. The server automatically chooses the transaction coordinator, also known as the commit point, based on pre-defined server criticalities. The commit point need not prepare for a two-phase commit transaction, so if a failure occurs no resources on the transaction coordinator are held for in-doubt recovery.

85) Remote Procedure Calls



Stored procedures may be called from remote nodes without restriction. Translation of character sets and datatypes automatically occures.

86. RPC's Protected by Two Phase Commit



Stored Procedures can perform remote updates or call other stored procedures on remote nodes. Any distributed transactions are automatically protected by TPC

87) Heterogeneous Data Support



Gateways exist which allow applications to access data in foreign databases. Typically the application cannot tell that this data is not coming from its normal server. There are lots of issues with database gateways, including read/write capability, performance, static vs dynamic SQL to DB2, and on and on.

88) Procedural Database Gateway



Gateways exist which allow remote stored procedure calls to be executed against foreign databases. The application believes that it is calling a native stored procedure, though the call is actually being executed against a foreign database.

89) SQL Based Database Gateway



Ad Hoc SQL statements can be sent to foreign databases for processing. The application thinks that it is sending a SQL statement to the local server. Issues include SQL dialect translation, functionality augmentation, static vs dynamic SQL, performance, and more.

90) XA TP Monitor Compliant



The Server is compatible with the XA Transaction Processing Monitor Specifications.

91) Top End TP Monitor Support



The server works with the Top End TP Monitor Product.

92) CICS TP Monitor Support



On IBM Mainframes running MVS, the server or gateway is able to work with the CIC TP monitor.

93) Transaction Can Span Multiple Net Protocols



A single transaction can be routed across multiple network protocols. For example, a PC on a Novell LAN send a query across IPX that talks to the LAN RDBMS Server. From there, the Query goes out across TCP/IP to a unix mainframe. Important becuase it sucks to have to buy TCP/IP stacks and NIC's for every PC just to talk to the enterprise server if you want to do client/server to your Novell Lan. Appletalk has similar issues.

94) ODBC Support



The product supports (or has announced support for) Microsoft's ODBC standard.

95) SQL Access Group Support



The product supports or has announced support for the SQL Access Group's connectivity standard.

V. DATABASE SECURITY


96) O/S Security Integration



The ability to specify that database logins will be validated by the operating system. That is, each account maps on a one to one basis with operating system login identifiers. On IBM systems, database security is integrated with the host security management system (RACF, ACF2 and others)

97) User Group Privileges/Roles



The system supports simple user groups, like unix, where access rights can be grants to group names rather than individuals.

Roles are collections of privileges like user groups. Roles differ from simple user groups in terms of additional functionality. Users may be granted several rolls, and can switch back and forth between roles in a session. Users can also enable and disable specific privileges within the role they are in. Rolls are additive in nature, thus a manager role can have the privileges of clerk plus additional rights.

98) Trusted Database/ Mandatory Access Control/ B1 Security



A high security version of the database, where every row is labelled with a security classification (Top Secret, Public, etc) Lots of other security and auditing features as well. Most applicable for Department of Defense applications, or corporate applications where lots of security is needed. Available as an option for all products. May not be based on the lastest version of the server engine, and may not have all functionality of the normal engine.
  • All are seperate, extra cost products from the normal engine.
  • Sybase System 10 has been targeted for this level, but it is not a current production product. 99) Undergoing for Evaluation by the NCSC
    The Database server has been selected for the Trusted Product Evaluation Program (TPEP) of the U.S. Government's National Computer Security Center (NCSC). The products were submitted for official government evaluation, targeted at Class C2 and Class B1, respectively, of the Trusted Database Interpretation (TDI) of the Department of Defense's Trusted Computer System Evaluation Criteria (TCSEC or "Orange Book").

Participation in this program is an involved process in which the functionality and security of the product is evaluated against computer security criteria published by the government. A product must meet the functional requirements of the target evaluation class, and the implementation of security features is tested to assure correct design and functions. There are seven classes of evaluation ratings, ranging from A1 (most secure) through B3, B2, B1, C2, C1 to D (minimal security). Once a product passes evaluation at the target rating class, it is placed on the NCSC's Evaluated Products List.

As of this date, no product has been certified in the United States. Ingres has been certified at the B1 equivalent level in Europe.

From Alanb_at_vnet.ibm.com
The only RDBMSs undergoing evaluation by NCSC for C2/B1 security are Oracle and Informix. They are still in the evaluation process and have not been certified. Ingres was evaluated by a European security group at a level equivalent to B1.

  1. Standard Tools with Trusted (B1) Product
    The Trusted (B1) Engine, uses the standard development tool set. No additional tools training is required for developers to switch from normal to Trusted applications. Secure databases do, however, require a different database and application design to accommodate the demands of multi-level secure applications. Some vendors have require specialized tools to work with the high security database.
  2. Non-Dedicated Server
    The Trusted (B1) DBMS engine does not compromise the security of other trusted applications on the host machine on which it is running. It works with the trusted operating systems to control access to secure data. Some implementations of Secure Databases require that the database server is the only program running on the server computer.
102) Product being evaluated at C2 Level

The standard version of the engine is undergoing evaluation by the NCSC at the C2 target security level. As of this writing, no products have been certified as C2 compliant. See the note from alanb_at_vnet.ibm.com under 99) above: only Oracle and Informix are under NCSC evaluation, and neither has been certified.

103) Product being evaluated at B1 Level

An enhanced security version of the product is undergoing evaluation by the NCSC at the B1 security level. As of this writing, no products have been certified B1 compliant. Again, see the note from alanb_at_vnet.ibm.com under 99) above.

104) Product Certified (Europe) at B1 Equivalent

The high security version of Ingres has been certified by a European agency at a security level similar to the U.S. B1 level.

VI. LANGUAGES & TOOLS


105) SQL Procedural Language

Because native SQL sucks for algorithmic processing, most vendors have devised procedural languages. These are the languages used to write stored procedures and triggers, and often they can be used for stand-alone programs as well.

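To give a flavor of these languages, here is a minimal sketch in Oracle7's PL/SQL; the procedure, table, and column names are hypothetical, and the other vendors' dialects (e.g. Transact-SQL) differ:

    CREATE PROCEDURE give_raise (p_empno NUMBER, p_pct NUMBER) AS
    BEGIN
        -- procedural flow control wrapped around ordinary SQL
        UPDATE emp
           SET sal = sal * (1 + p_pct / 100)
         WHERE empno = p_empno;
        IF SQL%ROWCOUNT = 0 THEN
            RAISE_APPLICATION_ERROR(-20001, 'no such employee');
        END IF;
    END give_raise;
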
106) Text/Image Datatypes (BLOBS)

Database fields can contain binary and text data up to 2 gigabytes per field per row. Useful for storing images, sound, video, telemetry data, etc.

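A hedged sketch of declaring such a column; the type names vary by product (LONG RAW in Oracle7, IMAGE in Sybase, BYTE in Informix), and the table is hypothetical:

    CREATE TABLE part_photos (
        partno  INTEGER,
        photo   LONG RAW    -- Oracle7 spelling; IMAGE or BYTE elsewhere
    );
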
107) Union Operator

Standard UNION combines the result sets of multiple SELECT statements and eliminates duplicate rows.

The ANSI standard also includes the UNION ALL operator, which does not eliminate duplicates and is useful for recombining horizontally partitioned data.
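
For example, assuming hypothetical regional tables that hold disjoint slices of the same logical data:

    SELECT orderno, custid FROM orders_east
    UNION ALL
    SELECT orderno, custid FROM orders_west;

Because the partitions are disjoint, UNION ALL returns the same answer as UNION but skips the useless duplicate-elimination sort.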

108) Select for Update

SELECT... FOR UPDATE is useful for obtaining locks on desired structures and can be used if REPEATABLE READ functionality is required. It can be used to test whether a structure is already locked by another user, much like a test-and-set (TSET) instruction or a spinlock. It is required for ANSI compliance.

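A minimal sketch of a read-modify-write protected by FOR UPDATE (table and column names are hypothetical):

    SELECT qty_on_hand
      FROM inventory
     WHERE partno = 1234
       FOR UPDATE;          -- row is now locked for this transaction

    UPDATE inventory
       SET qty_on_hand = qty_on_hand - 10
     WHERE partno = 1234;

    COMMIT;                 -- releases the lock
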
109) Outer Join Operator

Outer Join lets you perform a join where you want all the rows from one of your tables, even if no matching rows exist in the second table. Very useful for modelling real world problems. Example: the HR department has no employees, but you still want it to show up on the list of departments and their employees.

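A sketch in Oracle7's (+) notation; the spelling varies by product, and the ANSI OUTER JOIN syntax was not yet widely implemented when this was written:

    SELECT d.deptname, e.ename
      FROM dept d, emp e
     WHERE d.deptno = e.deptno(+);  -- (+) marks the side that may be missing

Every department is returned, with a NULL employee name for departments that have no employees.
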
110) Dynamic SQL

SQL statements can be generated on an ad-hoc basis and sent to the database engine for processing. They do not need to be separately compiled and have plans generated before they can be executed.

111) Static SQL

SQL statements can be pre-parsed, compiled, and have plans generated, so they execute faster since this work doesn't have to be done at run time. Improves speed at the expense of flexibility and maintenance. This category explicitly does NOT include stored procedures and triggers, which are typically stored compiled.

112) EXPLAIN Command

The EXPLAIN command lets you print out a report that shows the query execution path chosen by the optimizer, and statistics about how the actual query was done. It shows which indexes were used, what type of join was done, how many rows were processed, etc.

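The exact spelling varies by product. Oracle7, for instance, writes the plan into a plan table (the statement ID and query here are hypothetical, and the standard plan table is assumed to exist):

    EXPLAIN PLAN
        SET STATEMENT_ID = 'demo'
        FOR SELECT ename FROM emp WHERE deptno = 10;
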
113) Transaction Savepoints

Savepoints are markers set within a transaction and are used together with the ROLLBACK TO SAVEPOINT command. They prevent an entire multi-part transaction from being rolled back by the failure of one of the individual statements within it.

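A short sketch with hypothetical order tables; if the line-item insert fails, only the work after the savepoint is undone:

    INSERT INTO orders (orderno, custid) VALUES (1001, 42);

    SAVEPOINT after_header;

    INSERT INTO order_lines (orderno, partno, qty) VALUES (1001, 7, 3);

    -- if the line-item insert fails:
    ROLLBACK TO SAVEPOINT after_header;

    COMMIT;    -- the order header still commits
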
114) Aliases/Synonyms

Aliases and synonyms are used as substitute names for tables, views, etc. They can hide the fact that a table is actually owned by a different schema: CREATE SYNONYM FOO FOR DAN.FOO. They are also used to implement location transparency for distributed databases: CREATE SYNONYM FOO FOR DAN.FOO@LOS_ANGELES

115) NIST Verified Conformance to Standards

The product has been tested against the NIST test suite for ANSI FIPS 127-1 SQL and passed with 100% compatibility. Important for government bids, which generally require FIPS 127-1 validated products, and important if you want to be able to write portable code.
  • Sybase System 10, and not the current product, has been verified.

116) 3rd Party CASE Tools

3rd party CASE tools are available for Computer Aided Software/Systems Engineering. Some people prefer 3rd party CASE tools because they can generate code for several different types of DBMS's, and are considered to be less of a vendor "lock in."

117) Internal CASE Tools

The vendor provides CASE tools for Computer Aided Software/Systems Engineering. Some people prefer vendor-provided CASE since it tends to be tightly linked to the database and can take advantage of all of the most up to date features.

118) Graphics Tools

The vendor provides tools that allow graphical applications to be built that use database data. All of the vendors have 3rd party graphics tools available.

119) Internal E-Mail System Integration

Mail systems are provided by the vendor which are written specifically to run on top of and integrate with the database system. In general this facilitates combining database data and reports into the mail system. Database events can trigger mail messages, and vice-versa.

120) 3rd Party E-Mail System Integration

3rd party mail systems exist which are written specifically to run on top of and integrate with the database system. In general this facilitates combining database data and reports into the mail system. Database events can trigger mail messages, and vice-versa.

121) Trigger/Stored Procedure Sends E-Mail

A trigger or stored procedure can trigger the creation of an E-Mail message. Ask the vendor to show how easy this is to do in the various systems. Some come with built-in stored procedures to accomplish this; with others you need to write your own C code. I think this is a pretty cool feature to have.

122) Office Automation Tools

Office automation tools are available which integrate RDBMS data into E-Mail, word processing, calendar-scheduler, spreadsheet, spell checking, document management, etc.

123) Bit-Mapped Graphics

The 4GL supports bit-mapped graphics. How well ? Ask some users.

124) National Language Support

Which of the language groups listed in the matrix (European, Semitic, Asian) are supported? Language support includes translated manuals, error messages, commands, etc. Number formats and dates also differ across languages, and 16-bit characters can be stored where appropriate.

125) Precompiler Support

Precompilers allow you to embed SQL statements (Embedded SQL) in 3GL programs. These programs are run through a precompiler to produce pure 3GL code, with low level function calls replacing the SQL statements.

Precompilers should also provide both syntactic and semantic checking for increased programmer productivity. Precompiler support is a key component of the ANSI SQL standard as well as the Federal Government's NIST SQL compliance test.

Precompilers are important because they let you write standard code that can be re-precompiled and re-compiled and will then run against other database engines. If you write your code to your product's function call interface, you are screwed if you ever want to change engines.
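
As a flavor of the embedded style, here is a hedged fragment in roughly ANSI embedded-SQL form (the :name tokens are host-language variables; table and variable names are hypothetical):

    EXEC SQL SELECT ename, sal
               INTO :ename, :sal
               FROM emp
              WHERE empno = :empno;

The precompiler replaces each EXEC SQL statement with the vendor's low level function calls.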

126) Function Call Interface

A low level function call interface is available. The argument for using it is that it provides better performance than embedded SQL and precompilers, if you know what you are doing. The arguments against are that precompilers are getting better and get you close to the speed of writing directly to the call interface, and that code written to the vendor's call interface is locked in and non-portable.

127) Module Language

Modelled after ADA, the idea is that you modularize all of your embedded SQL in an external file, and put only native 3GL statements in your actual program. The Module Language helps you integrate the two.

128) Bind Variable Support

Bind variables allow applications to execute a SQL statement multiple times with different values for the bind variables without reparsing the SQL statement. This leads to performance improvements in applications that repeat SQL statements. Bind variables are usually associated with database cursor support.
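
For instance, the statement below can be parsed once and then re-executed with a new value bound to :dept each time (hypothetical names):

    SELECT ename
      FROM emp
     WHERE deptno = :dept;

Only the value bound to :dept changes between executions; the parse and the execution plan are reused.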

PART 2



Other Stuff You Should Consider

This section has some things suggested by people on the net that either don't lend themselves well to the checklist format, or that I didn't have time to investigate as thoroughly as I'd like. These are all great questions to make the vendors sweat, and to do your due diligence.

1) Theoretical Limits

Databases per Computer
Databases per Organization
Total Single Database Size
Tables Per Database

Rows Per Table
Columns Per Table
Indexes Per Table
BLOBS Per Table

Columns Per Index
Rows Per Index
Levels Per Index

Locks Per Transaction
Locks Per Database Page
Locks Per Table
Locks Per Database

Columns Per SELECT Statement
Tables Per SELECT Statement
Databases per SELECT Statement

Time to Back Up 1 Gigabyte
Time to Load 1 Gigabyte Unsorted Data
Time to Load 1 Gigabyte Sorted Data
Time to Recover 1,000,000 Transactions

2) Practical Limits/References



For the categories above, what are the limits for actual existing production databases out there today on any hardware ? Same question, but for the target machine or hardware class for this application.

3) Datatypes and Precisions



What datatypes are available, and at what precision levels ? What is the maximum length of each ?

Suggested Types Include

Fixed Length Character
Variable Length Character
Date
Time
Smallint
Integer
Packed Decimal
Float/Double
Currency
Binary Large Object (BLOB)
Text Large Object

Thanks to the following people for their input:

Alan Goodman		alan_at_dssmktg.com
Bill Coffin 		billc_at_ingres.com
RW Salnick		salnick_at_dejavu.spk.wa.us
? ? 				?_at_pivot-sts.sbi.com
Graeme Sargent		graeme_at_pyra.co.uk
Matt Kosatur		matt_at_meaddata.com
Paul Turner		turner_at_telstar.kodak.com
Snorri Bergmann		snorri_at_strengur.is
Bill H 			billh_at_greed.sbil.co.uk
mark saindon		msaindon_at_oracle.com
Ben				ben_at_sybase.com
Alan Beal			alanb_at_vnet.ibm.com
Curt Leaist 		curtl_at_zed.com
Lou Kates			louk_at_research.teleride.on.ca 
Newsgroups: comp.databases
Subject: RDBMS Feature Compatrison Matrix, v 1.0.2 - FAQ
Summary: 
Expires: 

Sender:
Followup-To:
Distribution: world
Organization: Anderson Graduate School of Management, UCLA Keywords:

Attached is the first production draft of the RDBMS Server Comparison Matrix. As a change to this version, I have included a detailed description of each of the categories at the bottom of the matrix. I have also included numerous corrections, changes, and suggestions from the folks listed at the end of this document.

Please forward any corrections, comments, or additions to me at the address below.

When using the matrix, please remember that this is shareware and no warranties are given or implied. I have made every effort to include correct information, and have had this document reviewed by active usenet contributors and users of all the products. Interestingly, I send copies of this to active internet participants from each of the vendors, but didn't get a ton of feedback.

I considered including information for Sybase System 10, which make that product look a lot better. However, Since the next versions of Oracle7, Informix, and Ingres also have good new stuff in them I don't want to play the beta game. I've included some information on a few beta offerings such as for ODBC and precompilers,  but I hesitate to do this for the engines themselves.

I suggest that you use this document as a starting point for your database search. Figure out what you application needs are, then what features are most important to you, and use this to get a rough idea of what might be the best fit. Then put together an RFP for the vendors. I've found that an evaluation matrix weighting the functionality you deem most important, and ranking each vendor's implementation of that functionality based on their responses is very helpful in selecting the right product for your application.

I hope this helps people out.

Dan Druker
2nd Year MBA Student
The Anderson Graduate School of Management at UCLA (Formerly with Oracle Corp, now Independent) ddruker_at_agsm.ucla.edu

RDBMS Server Feature Comparison Matrix: Version 1.0.2

Copyright 1993 by Daniel Druker
All Rights Reserved

Product

O = Oracle7 		Version 7.0
S = Sybase  		Version 4.9
A = Ask/Ingres 		Version 6.4
I = Informix On-Line	Version 5.0


NOTE: The 4th column is Informix, not Ingres. This confuses some people...

Y = Yes, the feature exists in the current production product
N = No, the feature does not exist in the production product
* = See the note in the detailed description section
B = The feature exists in a useable, late beta product. 


Product						O S A I 

I. PERFORMANCE


  1. Row Level Locking Y N N Y
  2. Page Level Locking N Y Y Y
  3. Table Level Locking Y Y Y Y
  4. Database Level Locking Y Y Y Y
  5. Programmable Read Isolation Levels N N N Y
  6. System Doesn't Escalate Locks Y N N Y
  7. Programmer May Escalate Locks Y Y Y Y
  8. Versioning / No Read Locks Y * * *
  9. Dirty Read Mode N N Y Y
  10. Multi Server Architecture Y Y Y N
  11. Symmetric Multi-Processor Support Y Y Y Y
  12. Support for Loosely Coupled Systems Y N Y N
  13. Array Interface Y N N N
  14. Shared Commits Y Y Y Y
  15. Asynchronous I/O Y Y Y Y
  16. Raw I/O Y Y N Y
  17. Shared Log Files Y Y Y Y
  18. Non-Blocking Queries Y * * *
  19. Clustered Tables Y N Y N
  20. Clustered Indexes Y Y Y Y
  21. Hash Indexes Y N N N
  22. ANSI Standard Cursor Support Y N Y Y
  23. Backward Scrolling Cursors N N N N
  24. Non-Blocking Sequence Generator Y N N Y
  25. Unique-Key Generator Y N Y Y
  26. Cost Based Optimizer Y Y Y Y
  27. Optimizer uses Data Content Heuristics N N Y N
  28. Optimizer Scans Entire Table to get Stats Y Y Y Y
  29. Optimizer can estimate stats to save time Y N Y N
  30. Optimizer uses network costs N N N N
  31. Optimizer uses hardware costs N N N N
  32. Optimizer knows real-time load factors N N N N
  33. Optimizer moves join location Y N Y Y
  34. Stored Procedures in Database Y Y Y Y
  35. Cached Procedures are Shared Y Y Y N
  36. Stored Procedures can return Sets Y N N Y
  37. Stored Procedures Return All Datatypes Y N N Y
  38. Stored Functions in Database Y N N Y
  39. Stored Procedure Cache Y Y Y Y
  40. Shared SQL Cache Y N N N
  41. Parallel Data Query N N N N
  42. Parallel Index Build N N N N
  43. Parallel Sort N N N N
  44. Performance Monitoring Tools Y Y Y Y

II.INTEGRITY O S A I



45. Adherence to Industry Standards
	SQL Interface				Y Y Y Y
	FIPS 127-1 Validated 			Y * Y Y
46. Declarative Integrity Model			Y N N Y
47. Cascading Updates				N N N N
48. Cascading Deletes 				Y N N N 
49. Loosely Typed Datatypes, Ala ADA		Y N N N
50. Null Support				Y Y Y Y
51. Database Triggers				Y Y Y N
52. PRE- and POST- Triggers			Y N Y N
53. Triggers fire once per SQL Statement	Y Y Y N
54. Triggers fire for each row affected 	Y N N N
55. Programmer can control trigger fire order	Y Y N N 
56. Automatic Procedure Recompilation		Y Y Y Y
57. Programmer Defined Datatypes		N Y Y N
58. Event Alerters				Y N Y Y
59. Database Events Call 3GL Program 		Y * * N
60. Stored Procedures Call 3GL Program 		Y * * N 


III. DATABASE					O S A I
=============			
61. Portable					Y Y Y Y
62. Automatic Database Recovery 		Y Y Y Y
63. Multiplexed Log Files 			Y Y Y Y
64. Database Mirroring				N Y N Y
65. On-Line Database Backup			Y Y Y Y
66. Automatic Log Backup 			Y Y Y Y
67. On-Line Software Diagnostics		Y Y Y Y
68. Hot Standby Server				Y Y Y N
69. Full Cluster Utilization			Y N Y N
70. On Line Recovery				Y Y N N
71. Parallel Backup of multiple files		Y N N N 
72. Parallel Recovery of multiple files		Y N N N 
73. Multiple Log Files				Y N Y N
74. Resource Limiter/Governor			Y N Y N
75. DBA Utilities 				Y Y Y Y
76. Database Auditing 				Y Y Y N


IV.DISTRIBUTED RDBMS				O S A I 

======================
77. Distributed Join Y N Y Y 78. Synchronous Table Replication Y N Y N 79. Asynchronous Table Replication Y N N N 80. Connections to other Databases Y Y Y N 81. Programmatic Two Phase Commit Y Y Y N 82. Transparent Two Phase Commit Y N Y Y 83. Two Phase Commit to Heterogeneous RDBMS Y N N N 84. Movable Transaction Coordinator Y N N N 85. Remote Procedure Calls Y Y Y Y 86. RPC's Protected by Two Phase Commit Y N Y Y 87. Heterogeneous Data Support Y Y Y N 88. Procedural Database Gateway Y Y N N 89. SQL Based Database Gateway Y Y Y N 90. XA TP Monitor Compliant Y N N Y 91. Top End TP Monitor Support Y Y Y Y 92. CICS TP Monitor Support Y Y N N 93. Trans. Can Span Multiple Net Protocols Y N Y N 94. ODBC Support B B B B 95. SQL Access Group Support B B B B V. DATABASE SECURITY O S A I
======================
96. OS Security Integration Y Y Y Y 97. User Group Privileges/Roles Y Y Y N 98. Mandatory Access Control * * * * 99. Undergoing for Evaluation by NCSC Y N N Y 100. Standard Tools with Trusted Product Y N N Y 101. Non-Dedicated Trusted Server Y N Y Y 102. Product being evaluated at C2 Level Y N N Y 103. Product being evaluated at B1 Level Y N * Y 104. Product Certified (Europe) at B1 Equiv N N Y N VI.LANGUAGES & TOOLS O S A I
======================
105. SQL Procedural Language Y Y Y Y 106. Text/Image Datatypes (BLOBS) Y Y N Y 107. Union Operator Y Y Y Y 108. Select for Update Y N Y Y 109. Outer Join Operator Y Y Y Y 110. Dynamic SQL Y Y Y Y 111 Static SQL Y Y Y Y 112. EXPLAIN Command Y Y Y Y 113. Transaction Savepoints Y Y Y Y 114. Aliases/Synonyms Y Y Y Y 115. NIST Verified Conformance to Standards Y * Y Y 116. 3rd Party CASE Tools - Upper and Lower Y Y Y Y 117. Internal CASE Tools - Upper and Lower Y Y N N 118. Graphics Tools Y Y Y Y 119. Internal Email System Integration Y N N N 120. 3rd Party Email System Integration Y Y Y N 121. Trigger/Stored Proc Sends Email Y Y Y N 122. Office Automation Tools Y Y Y Y 123. Bit-Mapped Graphics Y Y Y Y 124. National Language Support European Y Y Y Y Semitic Y N Y N Asian Y N Y Y 125. Precompiler Support C Y Y Y Y COBOL Y Y Y Y Pascal Y N Y Y PL/I Y N N N SAME/DL N N N N ADA Y N Y Y Fortran Y N Y Y C++ B N N N 126. Function Call Interface C Y Y Y Y COBOL Y Y Y Y ADA Y N Y Y Pascal Y N Y Y PL/I Y N N N SAME/DL N N N N Fortran Y N Y Y C++ B N N N 127. Module Language Y N Y N 128. Bind Variable Support Y N Y Y PART 2: Other Important Stuff 1. Theoretical Limits 2. Practical Limits/References 3. Datatypes and Precisions

DETAILED DESCRIPTION OF CATEGORIES


  1. Row Level Locking
    First, a little about granularity. The smallest level of granularity in commercial databases is the Row level. Next comes Page level. Then Table Level. Then Database Level. Typically Database pages are 2048 bytes, so if your rows each contain 100 bytes of data you might squeeze 20 rows into a page.

Row level locking means that the database can use a lock granularity of a single row (tuple for the coneheads out there.) This means that multiple users (Up to 20 in the example above, can simultaneously update different rows on the same page. Each user, when performing an operation on the row, only locks that row in question and does not interfere with other users in the same page. Arguments for Row Level Locking are that it permits the highest degree of concurrency - Users tend not to lock each other out. Arguments against claim that row level locking causes a lot of overhead. My belief is that users are the slowest thing in my system, so I'd rather not wait for them. Thus I prefer row level locking.

2) Page Level Locking



See Also Row Level Locking. In Page Level, Locks occur at (surprise) the page level. Thus when one user updates a row, the entire page is locked and other users are blocked from updating (sometimes reading too) rows in that page. Sometimes users get around page level blocking by forcing what would be small rows to take up an entire page. This trick effectively simulates row level locking for critical tables that are getting hit by updates a lot, and are having problems with contention. The arguments for and against page level locking are mentioned above.

3) Table Level Locking



The database supports the LOCK TABLE command to lock an entire table at a time. Useful for locking a table for batch updates, locking out users for maintenance or reporting, etc.

4) Database Level Locking



The entire database can be locked with a single command. As far as I know, the products all support this through a single-user or DBA only mode, but there is no LOCK DATABASE command.

5) Programmable Read Isolation Levels



The programmar can indicate which type of read isolation he wishes to use for each SELECT statement. Examples are Dirty Read (Uncommitted Data), Cursor Stability, Repeatable Read, etc. These terms are given in order of growing strength.

6) System Doesn't Escalate Locks



Some databases have a maximum number of locks that can be supported, and automatically escalate row or page level locks to table level locks when that number has been exceeded. I believe that system generated lock escalation causes problems by creating artificial deadlocks.

7) Programmer May Escalate Locks



The programmer may escalate locks at his/her discression. For example, if the program writes to a certain percentage of the pages in a table, it may want to go ahead and upgrade to a table lock to avoid running into locked pages and having to wait or to rollback work.

8) Versioning / No Read Locks



The database keeps track of several versions of committed data so queries see only those rows that were committed and valid at the instant the query started. The database allow consistent reads (committed data) without taking out read locks. This is a good thing since readers do not interere with writers, and accurate reports can be written since only committed data is read.
  • Sybase has Browse Mode which does not use read locks. Informix and Ingres have dirty read mode which do not use read locks. 9) Dirty Read Mode
    The database has a read mode which scans the data as it currently exists on disk or in memory, regardless of whether it has been committed or not. In general dirty reads are very fast, but since the data is not necessarily valid some applications can't use them. Good for reports where accuracy doesn't matter.
    1. Multi Server Architecture
      Multiple Server processes exist on the server machine. These usually take two forms, a two-task architecture where each client user gets a corresponding OS process on the server, or a multi-threaded architecture where each of the server processes can manage multiple clients. Multi-Server architecture helps balance system load and is mandatory for taking advantage of symmetric multi-processors. The two-task architecture is considered resource intensive on operating systems that don't deal well with lots of processes well.
    2. Symmetric Multi-Processor Support
      Can the server take advantage of multiple CPU's in a symmetric multi-processor environment. Things to make sure a product can do: Make sure each client can read, write and commit transactions in parallel on any available CPU. Make sure, as you add CPU's, that transaction throughput increases in a near linear fashion. Some products and operating systems may scale well to 4 CPU's, others to 8, and others to 30, hundreds, or more. Some may not scale at all. (OS2, for example)
    3. Support for Loosely Coupled System
      Pioneered by DEC in their VAX cluster, loosely coupled processors are usually independent computers connected by a fast communications bus that can share and coordinate resources. Databases that work in this environment can run simultaneously on several nodes in a cluster. This increases power since processing is going on at each node, and reduces complexity since no two-phase commit or distributed database work is needed. Also helps for fault tolerance since if one of the machines dies the remaining nodes keep running. Today several unix machines (Pyramid, Sequent, IBM RS/6000) offer clustering in addition to DEC. Performance concerns may exist if OLTP goes against the same data on more than one node since database blocks get sent pinging around the cluster if this happens to much. Other than that I think this is pretty cool. Also this is the basis for how Oracle works on Massively Parallel machines like Ncube, KSR, Maspar, etc.
    4. Array Interface
      Can the client request that multiple rows be sent in a single batch, as opposed to a network request for every row to be returned. Reduces network traffic. Example: Send me 1000 rows, thanks. vs. send a row, thanks, send a row, thanks, 1000 times. Only an issue for cursor based processing: for streams based this is not an issue.
    5. Shared Commits
      Shared or Group commits write multiple transactions into the redo structure in a single physical I/O. This is an advantage over database systems which require a physical I/O for each individual transaction. Some products may not support shared commits on SMP hardware.
    6. Asynchronous I/O
      Most operating systems do provide Asynchronous I/O, including VMS, MVS, VM, and unix from Pyramid, and Sequent to name a few. On systems that do not provide asynchronous I/O, the need to wait for disk writes to complete may become a performance bottleneck. The server should be able to take advantage of Asynchronous I/O where available, and have a strategy to deal with the lack of asynchronous I/O on systems where it is not available.
    7. Raw I/O
      On Unix, it is much faster to be able to access raw filesystem devices, which are not controlled by the unix file and buffer managers, than to use normal or "cooked" files.
    8. Shared Log Files
      Every change to the database structure is automatically written in a "piggybacked" manner to the Redo file structure. Transactions are piggybacked in a way that one physical write can actually commit several transactions. This is a very efficient way to commit transactions because it requires only a sequential write to an o/s file. Multiple users can share the same redo file structure.
    9. Non-Blocking Queries
      For systems where multiple users are reading and writing to the same table at the same time, it may be critical that readers doing reports do not interfere with ongoing transaction processing. When readers need to obtain locks to get information, they are said to "block" writers since writers are blocked from getting the locks they need to perform their update. This is a great feature if you have a transaction processing system on which you want to be able to run reports against live data during operating hours.
  • Ingres and Informix can accomplish this by using Dirty Read mode. Sybase does it by using browse mode. See the above sections for limitations and applications of these features.
    1. Clustered Tables
      Clustered tables: Store multiple tables close to each other on disk based on a commonly used join key. This helps speed up access if two tables are always accessed by the same join key. Takes longer to insert and update than normal storage method.
    2. Clustered Indexes
      Clustered Indexes: Several Definitions. Either an index on the cluster key mentioned above, or on some systems a pre-sorted index cluster that greatly speeds data retrieval for ordered operations, at the expense of insert and update speed. Oracle uses the first type, Sybase the second.
    3. Hash Indexes
      Most products use binary tree indexes, which are flexible and good for objects that grow, shrink, change, etc. Hash indexes offer faster performance but are appropriate only for relatively static tables.
    4. ANSI Standard Cursor Support
      Cursors are memory structures which store the SQL statement, a parsed/compiled version of the SQL statement, a buffer for the result data, and pointers to the result set of the query, among other things. Cursors allow applications to process multiple SQL statements in a single database connection. This is crucial for window-based environments which require multiple SQL statements to be processed at one time with different information being displayed in different windows. This allows multiple windows to be created via a single session and reduces memory utilization requirements. In addition, multiple SQL statements can be encompassed in a single transaction with all internal locking controlled by the database engine.

Cursors are also required to meet ANSI standards, both for SQL compliance and for Pre-Compiler Support.

23) Backward Scrolling Cursors



All cursors permit forward scrolling of the result set, some products also allow backward scrolling. Good for those of you that like to think backwards.

24) Non-Blocking Sequence Generator



Many applications require that unique keys be generated, for example for purchase order numbers. A nice feature is a multithreaded  non-blocking sequence generator which can be used to generate sequence numbers for rows in tables. Some databases require applications needing unique primary keys to single thread all transactions through a single sequence table, which can become a major application bottleneck.

25) Unique Key Generator



The database contains a mechanism for generating unique keys.

26) Cost/Statistics Based Optimizer:



The database will gather and store comprehensive statistics about database structures. The statistics based optimizer will choose the most efficient access paths to data based on the information gathered. Some products do a much better job than others. Nice things to have: The ability for the system to "Guess" or estimate the statistics on a table based on a sample of the data. This is a lifesaver when it takes 4 hours to analyze statistics for a 30 gigabyte table. Also, some of the optimizers take into account low/high data values, and data distribution heuristics.

27) Optimizer uses Data Content Heuristics



The optimizer scans the data to determine the minimum value in the table, the maximum value, and the average value. It also keeps track of the distribution of data within the rows.

28) Optimizer Scans Entire Table to get Stats



To gather statistics, the optimizer must scan the entire table. This can take a long long long time for multi-gigabyte tables.

29) Optimizer can estimate stats to save time



The Optimizer can estimate statistics by scanning only a random sample of the rows in the table. Very useful for collecting statistics on large tables where it is impractical to scan the whole table for statistics.

30) Optimizer uses network costs



The optimizer is aware of the existance and make-up of the network, and some costs have been assigned to each link. These may be either costs in terms of link speed or in actual dollar expense for using the link. The optimizer will determine the lowest cost path in terms of time or expense, which may involve shipping data to different nodes for joins or through intermediate nodes depending upon the network.

31) Optimizer uses hardware costs



The optimizer is aware of the relative processing power, CPU Cycle cost, I/O speed and cost, etc. It decides how to implement queries based on this cost information, and the user's resource profile.

32) Optimizer knows real-time load factors



The optimizer can take into account real time system load information in determining the optimal path for queries. Thus this path may change from time to time depending on system load.

33) Optimizer moves join location



Classic example is a local table with 10 rows being joined to a remote table with 1 Million rows. The optimizer is smart enough to send the local table to the remote node for the join rather than the other way around.

34) Stored Procedures in the Database



Stored procedures are bits of procedural code, grouped together and stored in the DBMS engine. Stored procedures are usually stored in shared, compiled format. Stored procedures should be callable by applications, by other stored procedures, or by database triggers. There should also be full dependence tracking that will automatically recompile stored procedures when objects upon which they depend change. Stored procedures are good because they typically improve performance and reduce network traffic. They also are great for encapsulating allowable operations on data - most systems let you grant access to a stored procedure without granting access on the underlying tables. The major downside is that there is no standard for stored procedures, so any code you write that uses them is non-standard and non-portable.

35) Cached Procedures are Shared



Stored Procedures are cached in memory on the server. A single copy of the stored prodedure can be used by multiple users. Saves memory and execution time.

36) Stored Procedures can Return Sets



Stored procedures can return an array or table of data. Not limited to returning only a single row.

37) Stored Procedures return All Datatypes



Stored procedure are capable of return all available server datatypes, not just a limited subset.

38) Stored Functions in Database



Similar to Stored Procedures, but instead stored function calls can be defined by the user. User defined functions are useful for many reasons, similar to the idea of having both procedures and functions in a 3GL language

39) Stored Procedure Cache



Stored Procedures are cached in memory on the server. They do not need to be read in from disk each time they are called. There is a special, DBA configurable area of memory which holds the procedure cache, so it does not get flushed by users doing large queries, etc.

40) Shared SQL Cache



The ability to store ad-hoc SQL statements in shared cache memory on the server. The server recognizes when a client requests the execution of a SQL statement already in the shared cache, and can use the already parsed and compiled representation already in memory. This can provide large reductions in server memory usage where lots of users are using the same application (and thus the same SQL statements.)

41) Parallel Data Query



On a symmetric multi-processor, a single query can be decomposed into several parts that can each be dispatched to a separate processor for execution in parallel. This same functionality will eventually extend to clustered systems and networks, but no one can do it yet today.

42) Parallel Index Build



Same idea as parallel query, but for building indexes. Dramatically speeds index builds for large tables.

43) Parallel Sort



Whenever sorted data is needed, the sort operation can be decomposed and processed by multiple CPU's simultaneously.

44) Performance Monitoring Tools



The Vendor provides tools to monitor system performance and to diagnose problems. Tools are available to monitor both individual SQL statements and overall system performance.

II. INTEGRITY


45) Adherence to Industry Standards



There are lots of RDBMS standards. Most useful is FIPS 127-1 since the government actually tests the products to ensure their compliance.

There are also SQL2 and SQL3 standards out there or under development, but no official test suites being proctored. All are complex standards that define a common SQL dialect at an incredibly detailed level. The advantage of this is that if you write FIPS 127-1 code, you'll be able to run your application against all adhering databases with little re-writing of code.

It is not sufficient for a vendor to claim "we use ANSI-  SQL" This is like saying all unix's are the same. If you want to have a chance to be able to port your code to another DBMS without a rewrite, you must only use 100% ANSI standard statements and a database that has been CERTIFIED by NIST. From alanb_at_vnet.ibm.com
You should say FIPS 127-1, not FIPS 127 since there was the original FIPS 127, then the current FIPS 127-1, and soon FIPS 127-2. They are all based on different versions of the ANSI standard. NIST publishes the Validated Products list, and Oracle7, Informix OnLine 5.0, and Sybase 10 show up on the list as of Jan 93. SQL2 compliance is incorrect since there are no test suites yet (FIPS 127-2 will provide the tests). Also, SQL2 has three levels, entry, intermediate, and full; which level are you talking about? When it comes to conformance, do not take a vendor's word for it without them providing a certification of compliance. Also, note that there is a difference between conformance and compliance; I believe compliance means it has been formally tested and verified.

On the subject of conformance/compliance to the SQL standards, the following applies. ANSI X3.135-1989 SQL was adopted by NIST in FIPS 127-1. NIST developed test suites to test compliance with the SQL standard. Only if the DBMS was tested against version 3.0 of the test suites can a vendor claim compliance with the standard. And compliance can only be claimed for a specific hardware/operating system combination, ie. just because one is compliant on Sun does not mean one is compliant on HP. The results of the tests are published every quarter in the Validated Products List. NIST is working on FIPS 127-2 which adopts ANSI X3.135-1992

SQL or SQL2. To claim compliance is jumping the gun at this point because there are no test suites available. And besides, there are three levels to the SQL2 standard - entry, intermediate, and full. Which of these are vendors claiming compliance to?

  • Sybase System 10 has been FIPS 127-1 Validated. 46) Declarative Integrity Model
    This includes full support for declarative referential integrity, default values and domain support. The ANSI declarative approach greatly simplifies the process of providing database enforced integrity. It allows the programmer to define Primary and Foreign Keys, Default Values, Unique Keys, and so on when creating database tables. The database engine automatically enforces these rules to protect system data. I think this is vastly superior to databases that make you program referential and entity integrity using stored procedures and triggers. Certainly more portable, standard, and lower maintenance to use the Declarative Method. Probably better performance as well. 47) Cascading Update
    Defined for this matrix as the ability, within the Declarative model, to propegate updates to the primary key that is a foreign key to another table. For example, if I update a purchase order number, all line items that are associated with that purchase order must have their primary key updated or they will become disasociated records. All of the products with triggers should be able to do this, but I've never tried it and the declarative constraints may block updates to foreign keys. 48) Cascading Delete
    Defined specifically for this matrix as supporting the ON DELETE CASCADE definition of the declarative referential integrity model. Delete cascade deletes all the corresponding children if you delete the parent record. All of the products that support triggers can achieve this programatically, but I prefer the Declarative implementation for the reasons mentioned in the above Declarative Integrity Model Section. 49) Loosely Typed Datatypes, Ala ADA
    The %TYPE variable can be used to declare a variable that has the same structure as the columns or rows in a table or a view. If the datatype of any of the columns in the table changes, any variables declared as %TYPE will automatically change as well. If the variables are in a stored procedure or trigger, they will be automatically recompiled by the database engine, transparently to the user. 50) Null Support
    The programmer can create a table and specify whether or not NULL values can exist in each column. The SQL implementation should provide a function that will determine whether the value in the column is null or not. This feature is useful when performing arithmetic operations and outer joins. The database should also correctly evaluate to false NULL = Anything. 51) Database Triggers
    Database Triggers are pieces of procedural code associated with tables and are fired implicitly when data is modified in their table. They are used for access control, data validation, referential integrity, synchronous table replication, and many more uses. 52) PRE- and POST- Triggers
    Event triggers can be defined to execute before and/or after database events such as inserts, updates, etc. This is useful as a BEFORE trigger can prevent an illegal million row update from happening at all; some products only provide triggers to invalidate and roll back such a transaction after it has already happened. AFTER triggers are better suited to performing transaction auditing, distributed database, etc. 53) Triggers fire once per SQL Statement
    Database triggers automatically fire once, and only once, per SQL statement. These are useful for security, for example, when the trigger might check that the date is not a Sunday, and if it is fail the SQL statement. This type of trigger need fire only once per SQL statement. 54) Triggers fire once for every row affected by an SQL stmt
    Database triggers can automatically fire once for every row affected by a statement. For value based auditing, a developer would use post-insert, update, and delete triggers that fired for each row affected by the statement, and wrote the new and old values of each row into an audit table.

The ability for triggers to fire on a per row affected basis has many applications besides value based auditing. Synchronous table replication, recursive value checking, and multi-row transaction management are only a few possibilities.

55) Programmer can control Trigger Fire Order



Triggers fire in a pre-determined, known order, as opposed to a random order that may differ for each SQL statement.

56) Automatic Procedure Recompilation



The database maintains extensive dependency tracking information which allows it to automatically recompile all objects in a dependency chain whenever code is changed. The database kernel will recompile all procedures, functions, triggers, and packages necessary when dependant structures are modified. The database may also be capable of resolving dependencies of modified procedures on remote nodes.

57) Programmer Defined Datatypes



The programmer can define custom datatypes to the Engine. Useful if you have wierd datatypes, like longitude/lattitude etc.

58) Event Alerters



Events can be defined in the database, which the database will watch for. When the event occurs the database engine will take some pre-determined action. Example: When the inventory drops below a certain level in the inventory table an event alerter notices, send a message to the purchasing clerk and automatically enters a purchase order for the product needed.

59) Database Events call 3GL program



Event Alerters can call a 3GL program, for example to turn the operator's screen red if the reactor is about to melt down.
  • Through add on products (Sybase Openserver, Ingres Knowledge Manager) this is possible. 60) Stored Procedures call 3GL program
    A stored procedure can call a 3GL program, for example to update rows in the employee table and then send email to the employees manager confirming the changes were made.
  • Through add on products (Sybase Openserver, Ingres Knowledge Manager) this is possible.

III. DATABASE


61) Portable



The server should run on many different types of hardware and operating systems. 3-tier portability includes installations on microcomputers, minicomputers and mainframes. In addition, some databases run on super computers and massively parallel computers. Hardware portability is important so you can always select the best performing hardware platform and be sure your database applications will still work.

Other aspects of portability include network protocol portability, and Graphical User Interface portability.

62) Automatic Database Recovery



Database failures are usually grouped into several loose categories. Instance failure, occurs when the machine running the database server crashes, or software fails, or a dumb operator kills the server, with no losses on the database disks. Media failure occurs when a disk containing database information fails.

Database recovery in case of instance failure should be performed automatically by simply restarting the database instance. The engine should roll back any transactions that were pending but not committed at the time of failure, and ensures the integrity of all committed transactions. The time to recover from instance failure should be configurable by the DBA. Recovery from media failure should be able to be performed in automatic, semi-automatic, or manual modes. In all cases, recovery from media failure requires that a backup of the lost files at some point in time is available, along with all database redo files since that point in time.

63) Multiplexed Log Files



The database can maintain multiple transaction redo files on different disks and write to all of them at the same time. This provides added protection in case of a loss of the disk upon which the log files reside, which otherwise would render the database non-recoverable for up to the minute transactions.

64) Database Mirroring



The RDBMS can perform software disk mirroring of the database files regardless of whether the mirroring is supported at the Hardware or Operating system level. Mirroring means keeping multiple copies of all database information to protect from disk failure.

All of the products can take advantage of hardware and operating system disk mirroring, which are preferable to RDBMS software mirroring if they are available.

65) On-Line Database Backup



DBA's can make an online backup of the entire database while the database is up, all tables are online and users are active. This should not require locking, and should have a minimal effect on system performance, other than the I/O's required to perform the backup.

66) Automatic Log Backup



Filled database logs are automatically backed up to secondary storage on disk or tape. This archiving feature as well as online backup and recovery are crucial elements for implementing mission critical applications requiring 24 hour availability.

67) On-Line Software Diagnostics



This is a wide and vague category, but basically what I'm getting at is that tools are available on-line to diagnose problems, understand performance, monitor users, etc. You should also be able to configure the database on line. All the products can do this to some extent, but since I'm not providing ratings of how well features are implemented I suggest you ask users for how they feel about this area. These tools are typically very weak in comparison to mainframe operating system diagnostics.

68) Hot Standby Server



The capability to restart the database on a backup machine in the case of hardware failure. This capability is provided on hardware platforms that support shared disks between multiple computers, such as the DEC VaxCluster. Upon sensing that the primary computer has gone down, a utility on the second machine can then automatically start the backup engine. The engine will recover any outstanding transactions from the failed machine, and then will go into standard operation. 3GL applications can be programmed to notice the failure of the primary server, and automatically reconnect to the backup machine. I think that full cluster utilization mentioned below is a much better solution, but it is more expensive.

69) Full Cluster Utilization



The capability to run a single database on multiple machines in a shared disk environment. This is an advantage over other database systems that only have 'hot standby' or 'companion server' capabilities. Applications can take advantage of the processing power of all machines, not just a single computer at a time. The engine provides automatic fail over reconciliation when nodes do fail, and applications can be written to automatically reconnect to the database if the node to which they are attached fails.

70) On-Line Recovery



This is critical for OLTP, and 24 hour a day mission critical systems. On line recovery is the ability to recover subsets of the database while the rest of the database is up and on-line. The database does not need to be taken offline during recovery of a data file. Users that do not require access to failed areas of the database will remain unaware that failure ever happened. Users that do attempt to access damaged files will receive an appropriate error message, and will be able to access the file as soon as recovery is complete. Some databases may require that the entire database be taken off line for recovery from media failure.

71) Parallel Backup of Multiple Database Files



The Database files can be backed up in parallel. That is, multiple database file backup commands can be issued simultaneously, and on an SMP system can run across multiple processors.

72) Parallel Recovery of Multiple Database Files



Multiple recover database files can be issued simultaneously. On an SMP machine, the recover processes will be spread across available processors. Critical for recovering large databases on SMP machines. Why ? Examine running a transaction processing application for 1 hour on a 14 CPU machine. Thus you have accumulated 14 CPU hours of transaction processing. A disk drive blows up. If you don't have parallel recovery, it now takes you 14 hours to recover your database if you are limited to a single process for recovery.

73) Multiple Log Files



The ability to provide multiple Redo file structures.Some databases have only a single log file, which is a problem since if it ever fills, the entire database hangs. Other database use multiple redo files, which provides a level of insurance if your operator goes to sleep without mounting the proper backup tape. Also, some databases with a single log file get into trouble when long running transactions are going on and the log fills.

74) Resource Limiter/Governor



A resource limiter (Governor) that evaluates the impact of database requests and restricts them according to limits for each of the users that access the database. The resource limiter follows user resource profiles associated with the privilege role of the user. Users can be limited to CPU usage, direct I/O, logical I/O, and wall time per statement or cumulatively per session. The Governor can also log users off the database automatically after specified idle periods. Some of the governors can stop "Queries from Hell" at parse time, by using optimizer statistics. Others only stop the query once a resource limit is reached.

75) DBA Utilities:



Again a weaselly category, since I don't want to give relative rankings. In general, DBA utilities should be able to start and stop an engine, perform real-time monitoring of database use and performance, assist in backup and recovery of database logs and data, and provide for execution of SQL statements. Utilities may also help in managing users, user groups, application privileges and more. I would ask users of systems what they think of each product's DBA tools to get an idea of what is available. The DBA utilities should be able to run in client/Server mode to facilitate centralized management.

76) Database Auditing



Database Auditing covers areas like: Successful and/or unsuccessful attempts to connect/disconnect from the database, Granting or Revoking of privileges, Enabling or disabling writing to the audit trail table, setting default auditing options for any database tables, and more. Auditing should be definable on a per-user or per-system basis. Database Triggers can typically be used to provide journaling of changed data and other specific auditing functions. The databases targeted for C2 level security compliance all provide a great deal of auditing support, more than the average site will ever use.

IV. DISTRIBUTED RDBMS


77) Distributed Join



The capability of joining two tables in a select statement that are on different machines. Some systems can perform this transparently to the user and to the application program. The query is then exactly the same as if the tables resided on the same machine. Example:
	SELECT ENAME, DEPTNAME FROM EMP, DEPT
	WHERE EMP.DEPTNO=DEPT.DEPTNO;

Where the EMP table resides on the local server, and the DEPT table resides on a remote server.

78) Synchronous Table Replication



Table replication allow the distribution of updates from a single master to one or more slave nodes. Synchronous replication implies that updates are propogated to slave nodes in real time, protected by two-phase commit. Classic use is for banking transactions, when balances must be synchronized on all machines at all times. The danger of Synchronous replication is this if one node fails, the transaction doesn't go through on any of the nodes.

79) Asynchronous Table Replication



See Synchronous Replication. Asynchronous replication differs in that updates are not propegated in real time. This is useful for information that doesn't always need to be perfectly in synch. Also, does not have the problem with all nodes needing to be alive for the initial transaction to complete. Updates get propegated around the network on at programmer defined intervals.

80) Connections to other Databases



Gateways are available that facilite the incorporation of data from foreign databases. Some gateways are read/write, some are read only. Some are procedural, others are SQL based.

81) Programmatic Two Phase Commit



Two phase commit is a mechanism for managing distributed update capabilities in a distributed database.Two Phase Commit (TPC) ensures that an update either completes or rolls back on all nodes in a transaction. Programming this is very complex, and gets exponentially more difficult as the number of nodes in a transaction grows. I have never heard of an application that has been built on one of these products using programmatic two phase commit that was successful.

82) Transparent Two Phase Commit



Transparent TPC eliminates programming and lets the servers handle the complexities behind the TPC algorithms. An example of this is below:
	UPDATE EMP...; 	(EMP is on Node A)
	UPDATE DEPT...;	(DEPT is on Node B)
	COMMIT;

THAT'S IT !! 83) Commit to Foreign Database



The gateway products allow single site updates and commits to heterogeneous databases. In addition, a single heterogeneous database can participate in a Two-Phase Commit transaction with multiple native databases.

84) Movable Transaction Coordinator



In a Two-Phase Commit distributed update, one of the databases acts as the coordinator of each transaction. The coordinator does not need to go through the Prepare stage of the TPC algorithm, and thus locks resources for a minimal amount of time. All of the products that support programmatic TPC allow the programmer to specify the coordination site. This category specifically refers to the Transparent TPC facilities, where available.

The coordination task can be assigned according to network resources and machine capabilities as required for each transaction. This protects resources on your most critical database servers. The server automatically chooses the transaction coordinator, also known as the commit point, based on pre-defined server criticalities. The commit point need not prepare for a two-phase commit transaction, so if a failure occurs no resources on the transaction coordinator are held for in-doubt recovery.

85) Remote Procedure Calls



Stored procedures may be called from remote nodes without restriction. Translation of character sets and datatypes automatically occures.

86. RPC's Protected by Two Phase Commit



Stored Procedures can perform remote updates or call other stored procedures on remote nodes. Any distributed transactions are automatically protected by TPC

87) Heterogeneous Data Support



Gateways exist which allow applications to access data in foreign databases. Typically the application cannot tell that this data is not coming from its normal server. There are lots of issues with database gateways, including read/write capability, performance, static vs dynamic SQL to DB2, and on and on.

88) Procedural Database Gateway



Gateways exist which allow remote stored procedure calls to be executed against foreign databases. The application believes that it is calling a native stored procedure, though the call is actually being executed against a foreign database.

89) SQL Based Database Gateway



Ad Hoc SQL statements can be sent to foreign databases for processing. The application thinks that it is sending a SQL statement to the local server. Issues include SQL dialect translation, functionality augmentation, static vs dynamic SQL, performance, and more.
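As a hedged sketch (Oracle-style synonym and database link syntax; the gateway and table names are hypothetical), an application might query a DB2 table as if it were local:

	CREATE SYNONYM INVOICES FOR INVOICES@DB2_GATEWAY;
	SELECT INV_NO, AMOUNT FROM INVOICES WHERE AMOUNT > 10000;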

90) XA TP Monitor Compliant



The server is compatible with the X/Open XA specification, the standard interface between database resource managers and transaction processing monitors.

91) Top End TP Monitor Support



The server works with the Top End TP Monitor Product.

92) CICS TP Monitor Support



On IBM mainframes running MVS, the server or gateway is able to work with the CICS TP monitor.

93) Transaction Can Span Multiple Net Protocols



A single transaction can be routed across multiple network protocols. For example, a PC on a Novell LAN sends a query across IPX to the LAN RDBMS server; from there, the query goes out across TCP/IP to a unix mainframe. Important because it sucks to have to buy TCP/IP stacks and NICs for every PC just to talk to the enterprise server if you want to do client/server on your Novell LAN. Appletalk has similar issues.

94) ODBC Support



The product supports (or has announced support for) Microsoft's ODBC standard.

95) SQL Access Group Support



The product supports or has announced support for the SQL Access Group's connectivity standard.

V. DATABASE SECURITY


96) O/S Security Integration



The ability to specify that database logins will be validated by the operating system. That is, each database account maps on a one-to-one basis to an operating system login identifier. On IBM systems, database security is integrated with the host security management system (RACF, ACF2 and others).

97) User Group Privileges/Roles



The system supports simple user groups, as in unix, where access rights can be granted to group names rather than to individuals.

Roles are collections of privileges, like user groups. Roles differ from simple user groups in terms of additional functionality. Users may be granted several roles, and can switch back and forth between roles in a session. Users can also enable and disable specific privileges within the role they are in. Roles are additive in nature, thus a manager role can have the privileges of clerk plus additional rights.
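A minimal sketch of how this might look, assuming Oracle7-style syntax (all names are made up):

	CREATE ROLE CLERK;
	GRANT SELECT, INSERT ON EMP TO CLERK;
	CREATE ROLE MANAGER;
	GRANT CLERK TO MANAGER;		-- roles are additive
	GRANT MANAGER TO DAN;
	SET ROLE MANAGER;		-- switch roles within a session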

98) Trusted Database/ Mandatory Access Control/ B1 Security



A high security version of the database, where every row is labelled with a security classification (Top Secret, Public, etc.). Lots of other security and auditing features as well. Most applicable for Department of Defense applications, or corporate applications where lots of security is needed. Available as an option for all products. May not be based on the latest version of the server engine, and may not have all the functionality of the normal engine.
  • All are separate, extra cost products from the normal engine.
  • Sybase System 10 has been targeted for this level, but it is not a current production product.

99) Undergoing Evaluation by the NCSC

The database server has been selected for the Trusted Product Evaluation Program (TPEP) of the U.S. Government's National Computer Security Center (NCSC). The products were submitted for official government evaluation, targeted at Class C2 and Class B1, respectively, of the Trusted Database Interpretation (TDI) of the Department of Defense's Trusted Computer System Evaluation Criteria (TCSEC or "Orange Book").

Participation in this program is an involved process in which the functionality and security of the product are evaluated against computer security criteria published by the government. A product must meet the functional requirements of the target evaluation class, and the implementation of security features is tested to assure correct design and function. There are seven classes of evaluation ratings, ranging from A1 (most secure) through B3, B2, B1, C2 and C1 to D (minimal security). Once a product passes evaluation at the target rating class, it is placed on the NCSC's Evaluated Products List.

As of this date, no product has been certified in the United States. Ingres has been certified at the B1 equivalent level in Europe.

From Alanb_at_vnet.ibm.com
The only RDBMSs undergoing evaluation by NCSC for C2/B1 security are Oracle and Informix. They are still in the evaluation process and have not been certified. Ingres was evaluated by a European security group at a level equivalent to B1.

100) Standard Tools with Trusted (B1) Product

The Trusted (B1) engine uses the standard development tool set. No additional tools training is required for developers to switch from normal to Trusted applications. Secure databases do, however, require a different database and application design to accommodate the demands of multi-level secure applications. Some vendors require specialized tools to work with the high security database.

101) Non-Dedicated Server

The Trusted (B1) DBMS engine does not compromise the security of other trusted applications on the host machine on which it is running. It works with the trusted operating system to control access to secure data. Some implementations of secure databases require that the database server be the only program running on the server computer.

102) Product being evaluated at C2 Level

The standard version of the engine is undergoing evaluation by the NCSC at the C2 target security level. As of this writing, no products have been certified as C2 compliant. (See the note from alanb_at_vnet.ibm.com above.)

103) Product being evaluated at B1 Level

An enhanced security version of the product is undergoing evaluation by the NCSC at the B1 security level. As of this writing, no products have been certified B1 compliant. (See the note from alanb_at_vnet.ibm.com above.)

104) Product Certified (Europe) at B1 Equivalent

The High Security Version of Ingres has been certified by a European agency at a security level similar to the U.S. B1 level.

VI. LANGUAGES & TOOLS


105) SQL Procedural Language

Because native SQL sucks for algorithmic processing, most vendors have devised procedural languages. These are the languages used to write stored procedures and triggers, and they can often be used for stand-alone programs as well.
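For flavor, a hedged sketch in an Oracle PL/SQL-like dialect (each vendor's procedural language differs; all names are hypothetical):

	CREATE PROCEDURE give_raise (p_empno IN NUMBER, p_amount IN NUMBER) AS
	BEGIN
	    UPDATE EMP SET SAL = SAL + p_amount
	     WHERE EMPNO = p_empno;
	    IF SQL%ROWCOUNT = 0 THEN
	        RAISE_APPLICATION_ERROR(-20001, 'No such employee');
	    END IF;
	END give_raise;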
106) Text/Image Datatypes (BLOBS)

Database fields can contain binary and text data up to 2 gigabytes per field per row. Useful for storing images, sound, video, telemetry data, etc.
107) Union Operator

Standard UNION combines the results of multiple queries and eliminates duplicate rows.

The ANSI standard also includes the UNION ALL operator, which does not eliminate duplicate rows and is useful for horizontal partitioning of data.
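For example (hypothetical tables), horizontally partitioned sales data can be recombined without the overhead of duplicate elimination:

	SELECT CUSTNO, AMOUNT FROM SALES_EAST
	UNION ALL
	SELECT CUSTNO, AMOUNT FROM SALES_WEST;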

108) Select for Update

SELECT... FOR UPDATE is useful for obtaining locks on desired structures and can be used if REPEATABLE READ functionality is required. It can be used to test whether a structure is already locked by another user, much like a test-and-set (TSET) instruction or a spinlock. It is required for ANSI compliance.
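A typical pattern (hypothetical table; dialects vary slightly):

	SELECT SAL FROM EMP WHERE EMPNO = 7499 FOR UPDATE;
	-- the row is now locked against other writers
	UPDATE EMP SET SAL = SAL + 500 WHERE EMPNO = 7499;
	COMMIT;		-- releases the lock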
109) Outer Join Operator

Outer Join lets you perform a join where you want all the rows from one of your tables, even if no matching rows exist in the second table. Very useful for modelling real world problems. For example, you have no employees in the HR department, but you still want it to show up on the list of departments and their employees.
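To make that concrete, a hedged sketch using the Oracle-style (+) notation (outer join syntax varies widely among these products):

	SELECT D.DEPTNAME, E.ENAME
	FROM DEPT D, EMP E
	WHERE D.DEPTNO = E.DEPTNO (+);	-- HR still appears, with a NULL ENAME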
110) Dynamic SQL

SQL statements can be generated on an ad hoc basis and sent to the database engine for processing. They do not need to be separately compiled and have plans generated before they can be executed.
111) Static SQL

SQL statements can be pre-parsed, compiled, and have plans generated. Thus they execute faster since this doesn't have to be done at run time. Improves speed at the expense of flexibility and maintenance. This category explicitly does NOT include stored procedures and triggers, which are typically stored compiled.
112) EXPLAIN Command

The EXPLAIN command lets you print out a report that shows the query execution path chosen by the optimizer, and statistics about how the query was actually executed. It shows what indexes were used, what type of join was done, how many rows were processed, etc.
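The syntax differs per product (for example, EXPLAIN PLAN in Oracle, SET SHOWPLAN in Sybase, SET EXPLAIN in Informix); a hedged Oracle7-style sketch:

	EXPLAIN PLAN SET STATEMENT_ID = 'Q1' FOR
	SELECT ENAME FROM EMP WHERE DEPTNO = 10;
	-- the chosen access path is then read back out of the plan table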
113) Transaction Savepoints

Savepoints are markers placed within a transaction and are used together with the ROLLBACK TO SAVEPOINT command. They allow part of a transaction to be undone, which prevents an entire multi-part transaction from being rolled back by the failure of one of the individual statements within it.
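For example (hypothetical tables):

	INSERT INTO ORDERS VALUES (1001, '15-JUN-93');
	SAVEPOINT LINES;
	INSERT INTO ORDER_LINES VALUES (1001, 1, 'WIDGET');
	-- if the line item insert fails, undo only that part:
	ROLLBACK TO SAVEPOINT LINES;
	COMMIT;		-- the order header still commits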
114) Aliases/Synonyms

Aliases and synonyms are used as substitute names for tables, views, etc. Used to hide the fact that a table is actually owned by a different schema: CREATE SYNONYM FOO FOR DAN.FOO. Also used to implement location transparency for distributed databases: CREATE SYNONYM FOO FOR DAN.FOO_at_LOS_ANGELES.
115) NIST Verified Conformance to Standards

The product has been tested against the NIST test suite for ANSI FIPS 127-1 SQL and passed with 100% compatibility. Important for government bids, since products that have not been validated are usually noncompliant, and important if you want to be able to write portable code.
  • Sybase System 10, and not the current product, has been verified.
116) 3rd Party CASE Tools

3rd party CASE tools are available for Computer Aided Software/Systems Engineering. Some people prefer 3rd party CASE tools because they can generate code for several different types of DBMS's, and they are considered to be less of a vendor "lock in."
117) Internal CASE Tools

The vendor provides CASE tools for Computer Aided Software/Systems Engineering. Some people prefer vendor provided CASE since it tends to be tightly linked to the database and can take advantage of all of the most up to date features.
118) Graphics Tools

The vendor provides tools that allow graphical applications to be built that use database data. All of the vendors have 3rd party graphics tools available.
119) Internal E-Mail System Integration

Mail systems are provided by the vendor which are written specifically to run on top of and integrate with the database system. In general this facilitates combining database data and reports into the mail system. Database events can trigger mail messages, and vice-versa.
120) 3rd Party E-Mail System Integration

3rd party mail systems exist which are written specifically to run on top of and integrate with the database system. In general this facilitates combining database data and reports into the mail system. Database events can trigger mail messages, and vice-versa.
121) Trigger/Stored Procedure Sends E-Mail

A trigger or stored procedure can trigger the creation of an e-mail message. Ask the vendor to show how easy this is to do in the various systems. Some come with built-in stored procedures to accomplish this; with others you need to write your own C code. I think this is a pretty cool feature to have.
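A hedged sketch (Oracle-style trigger syntax; SEND_MAIL is a hypothetical helper that on some systems you would have to write yourself in C):

	CREATE TRIGGER NOTIFY_PAYROLL
	AFTER UPDATE OF SAL ON EMP
	FOR EACH ROW
	BEGIN
	    SEND_MAIL('payroll', 'Salary changed for ' || :new.ENAME);
	END;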
122) Office Automation Tools

Office automation tools are available which integrate RDBMS data with e-mail, word processing, calendar-scheduler, spreadsheet, spell checking, document management, etc...
123) Bit-Mapped Graphics

The 4GL supports bit-mapped graphics. How well? Ask some users.
124) National Language Support

Which of the following languages are supported? Language support includes translated manuals, error messages, commands, etc. Number formats and dates also differ between languages. 16-bit characters can be stored where appropriate.
125) Precompiler Support

Precompilers allow you to embed SQL statements (Embedded SQL) in 3GL programs. These programs are run through a precompiler to produce pure 3GL code with low level function calls replacing the SQL statements.

Precompilers should also provide both syntactic and semantic checking for increased programmer productivity. Precompiler support is a key component of the ANSI SQL standard as well as the Federal Government's NIST SQL compliance test.

Precompilers are important because they let you write standard code that can be re-precompiled and re-compiled and will then run against other database engines. If you write your code to your product's function call interface, you are screwed if you ever want to change engines.
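A minimal embedded SQL fragment as it might appear in a C program before precompilation (host variable names are made up; the precompiler replaces the EXEC SQL lines with low level calls):

	EXEC SQL BEGIN DECLARE SECTION;
	    int  empno;
	    char ename[11];
	EXEC SQL END DECLARE SECTION;

	EXEC SQL SELECT ENAME INTO :ename
	         FROM EMP WHERE EMPNO = :empno;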

126) Function Call Interface

A low level function call interface is available. Arguments for using this are that it provides better performance than using embedded SQL and precompilers, if you know what you are doing. Arguments against include that precompilers are getting better and get you close to the speed of writing directly in the call interface, and that if you write in the vendor's call interface you are locked into non-portable code.
127) Module Language

Modelled after ADA, the idea is that you modularize all of your embedded SQL in an external file, and only put native 3GL statements in your actual program. The Module Language helps you integrate the two.
128) Bind Variable Support

Bind variables allow applications to execute a SQL statement multiple times with different values for the bind variables without reparsing the SQL statement. This leads to performance improvements in applications requiring repetition of SQL statements. Bind variables are usually associated with database cursor support.
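The idea, sketched with a generic ':' placeholder style (the exact mechanics differ across the call interfaces and precompilers):

	SELECT ENAME FROM EMP WHERE DEPTNO = :dept
	-- parsed once, then executed repeatedly with :dept = 10, 20, 30, ...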

PART 2



Other Stuff You Should Consider

This section has some things suggested by people on the net that either don't lend themselves well to the checklist format, or that I didn't have time to investigate as thoroughly as I'd like. These are all great questions to make the vendors sweat and do their due diligence.

1) Theoretical Limits

Databases per Computer
Databases per Organization
Total Single Database Size
Tables Per Database

Rows Per Table
Columns Per Table
Indexes Per Table
BLOBS Per Table

Columns Per Index
Rows Per Index
Levels Per Index

Locks Per Transaction
Locks Per Database Page
Locks Per Table
Locks Per Database

Columns Per SELECT Statement
Tables Per SELECT Statement
Databases per SELECT Statement

Time to Back Up 1 Gigabyte
Time to Load 1 Gigabyte Unsorted Data
Time to Load 1 Gigabyte Sorted Data
Time to Recover 1,000,000 Transactions

2) Practical Limits/References



For the categories above, what are the limits for actual existing production databases out there today, on any hardware? Same question, but for the target machine or hardware class for this application.

3) Datatypes and Precisions



What datatypes are available, and at what precision levels? What is the maximum length of each?

Suggested types include:

Fixed Length Character
Variable Length Character
Date
Time
Smallint
Integer
Packed Decimal
Float/Double
Currency
Binary Large Object (BLOB)
Text Large Object

Thanks to the following people for their input:

Alan Goodman		alan_at_dssmktg.com
Bill Coffin 		billc_at_ingres.com
RW Salnick		salnick_at_dejavu.spk.wa.us
? ? 				?_at_pivot-sts.sbi.com
Graeme Sargent		graeme_at_pyra.co.uk
Matt Kosatur		matt_at_meaddata.com
Paul Turner		turner_at_telstar.kodak.com
Snorri Bergmann		snorri_at_strengur.is
Bill H 			billh_at_greed.sbil.co.uk
mark saindon		msaindon_at_oracle.com
Ben				ben_at_sybase.com
Alan Beal			alanb_at_vnet.ibm.com
Curt Leaist 		curtl_at_zed.com
Lou Kates			louk_at_research.teleride.on.ca 
Attached is the first production draft of the RDBMS Server Comparison Matrix. As a change to this version, I have included a detailed description of each of the categories at the bottom of the matrix. I have also included numerous corrections, changes, and suggestions from the folks listed at the end of this document.

Please forward any corrections, comments, or additions to me at the address below.

When using the matrix, please remember that this is shareware and no warranties are given or implied. I have made every effort to include correct information, and have had this document reviewed by active usenet contributors and users of all the products. Interestingly, I send copies of this to active internet participants from each of the vendors, but didn't get a ton of feedback.

I considered including information for Sybase System 10, which make that product look a lot better. However, Since the next versions of Oracle7, Informix, and Ingres also have good new stuff in them I don't want to play the beta game. I've included some information on a few beta offerings such as for ODBC and precompilers,  but I hesitate to do this for the engines themselves.

I suggest that you use this document as a starting point for your database search. Figure out what you application needs are, then what features are most important to you, and use this to get a rough idea of what might be the best fit. Then put together an RFP for the vendors. I've found that an evaluation matrix weighting the functionality you deem most important, and ranking each vendor's implementation of that functionality based on their responses is very helpful in selecting the right product for your application.

I hope this helps people out.

Dan Druker
2nd Year MBA Student
The Anderson Graduate School of Management at UCLA (Formerly with Oracle Corp, now Independent) ddruker_at_agsm.ucla.edu

RDBMS Server Feature Comparison Matrix: Version 1.0.2

Copyright 1993 by Daniel Druker
All Rights Reserved

Product

O = Oracle7 		Version 7.0
S = Sybase  		Version 4.9
A = Ask/Ingres 		Version 6.4
I = Informix On-Line	Version 5.0


NOTE: The 4th column is Informix, not Ingres. This confuses some people...

Y = Yes, the feature exists in the current production product
N = No, the feature does not exist in the production product
* = See the note in the detailed description section
B = The feature exists in a useable, late beta product. 


Product						O S A I 

I. PERFORMANCE


  1. Row Level Locking Y N N Y
  2. Page Level Locking N Y Y Y
  3. Table Level Locking Y Y Y Y
  4. Database Level Locking Y Y Y Y
  5. Programmable Read Isolation Levels N N N Y
  6. System Doesn't Escalate Locks Y N N Y
  7. Programmer May Escalate Locks Y Y Y Y
  8. Versioning / No Read Locks Y * * *
  9. Dirty Read Mode N N Y Y
  10. Multi Server Architecture Y Y Y N
  11. Symmetric Multi-Processor Support Y Y Y Y
  12. Support for Loosely Coupled Systems Y N Y N
  13. Array Interface Y N N N
  14. Shared Commits Y Y Y Y
  15. Asynchronous I/O Y Y Y Y
  16. Raw I/O Y Y N Y
  17. Shared Log Files Y Y Y Y
  18. Non-Blocking Queries Y * * *
  19. Clustered Tables Y N Y N
  20. Clustered Indexes Y Y Y Y
  21. Hash Indexes Y N N N
  22. ANSI Standard Cursor Support Y N Y Y
  23. Backward Scrolling Cursors N N N N
  24. Non-Blocking Sequence Generator Y N N Y
  25. Unique-Key Generator Y N Y Y
  26. Cost Based Optimizer Y Y Y Y
  27. Optimizer uses Data Content Heuristics N N Y N
  28. Optimizer Scans Entire Table to get Stats Y Y Y Y
  29. Optimizer can estimate stats to save time Y N Y N
  30. Optimizer uses network costs N N N N
  31. Optimizer uses hardware costs N N N N
  32. Optimizer knows real-time load factors N N N N
  33. Optimizer moves join location Y N Y Y
  34. Stored Procedures in Database Y Y Y Y
  35. Cached Procedures are Shared Y Y Y N
  36. Stored Procedures can return Sets Y N N Y
  37. Stored Procedures Return All Datatypes Y N N Y
  38. Stored Functions in Database Y N N Y
  39. Stored Procedure Cache Y Y Y Y
  40. Shared SQL Cache Y N N N
  41. Parallel Data Query N N N N
  42. Parallel Index Build N N N N
  43. Parallel Sort N N N N
  44. Performance Monitoring Tools Y Y Y Y

II.INTEGRITY O S A I



45. Adherence to Industry Standards
	SQL Interface				Y Y Y Y
	FIPS 127-1 Validated 			Y * Y Y
46. Declarative Integrity Model			Y N N Y
47. Cascading Updates				N N N N
48. Cascading Deletes 				Y N N N 
49. Loosely Typed Datatypes, Ala ADA		Y N N N
50. Null Support				Y Y Y Y
51. Database Triggers				Y Y Y N
52. PRE- and POST- Triggers			Y N Y N
53. Triggers fire once per SQL Statement	Y Y Y N
54. Triggers fire for each row affected 	Y N N N
55. Programmer can control trigger fire order	Y Y N N 
56. Automatic Procedure Recompilation		Y Y Y Y
57. Programmer Defined Datatypes		N Y Y N
58. Event Alerters				Y N Y Y
59. Database Events Call 3GL Program 		Y * * N
60. Stored Procedures Call 3GL Program 		Y * * N 


III. DATABASE					O S A I
=============			
61. Portable					Y Y Y Y
62. Automatic Database Recovery 		Y Y Y Y
63. Multiplexed Log Files 			Y Y Y Y
64. Database Mirroring				N Y N Y
65. On-Line Database Backup			Y Y Y Y
66. Automatic Log Backup 			Y Y Y Y
67. On-Line Software Diagnostics		Y Y Y Y
68. Hot Standby Server				Y Y Y N
69. Full Cluster Utilization			Y N Y N
70. On Line Recovery				Y Y N N
71. Parallel Backup of multiple files		Y N N N 
72. Parallel Recovery of multiple files		Y N N N 
73. Multiple Log Files				Y N Y N
74. Resource Limiter/Governor			Y N Y N
75. DBA Utilities 				Y Y Y Y
76. Database Auditing 				Y Y Y N


IV.DISTRIBUTED RDBMS				O S A I 

======================
77. Distributed Join Y N Y Y 78. Synchronous Table Replication Y N Y N 79. Asynchronous Table Replication Y N N N 80. Connections to other Databases Y Y Y N 81. Programmatic Two Phase Commit Y Y Y N 82. Transparent Two Phase Commit Y N Y Y 83. Two Phase Commit to Heterogeneous RDBMS Y N N N 84. Movable Transaction Coordinator Y N N N 85. Remote Procedure Calls Y Y Y Y 86. RPC's Protected by Two Phase Commit Y N Y Y 87. Heterogeneous Data Support Y Y Y N 88. Procedural Database Gateway Y Y N N 89. SQL Based Database Gateway Y Y Y N 90. XA TP Monitor Compliant Y N N Y 91. Top End TP Monitor Support Y Y Y Y 92. CICS TP Monitor Support Y Y N N 93. Trans. Can Span Multiple Net Protocols Y N Y N 94. ODBC Support B B B B 95. SQL Access Group Support B B B B V. DATABASE SECURITY O S A I
======================
96. OS Security Integration Y Y Y Y 97. User Group Privileges/Roles Y Y Y N 98. Mandatory Access Control * * * * 99. Undergoing for Evaluation by NCSC Y N N Y 100. Standard Tools with Trusted Product Y N N Y 101. Non-Dedicated Trusted Server Y N Y Y 102. Product being evaluated at C2 Level Y N N Y 103. Product being evaluated at B1 Level Y N * Y 104. Product Certified (Europe) at B1 Equiv N N Y N VI.LANGUAGES & TOOLS O S A I
======================
105. SQL Procedural Language Y Y Y Y 106. Text/Image Datatypes (BLOBS) Y Y N Y 107. Union Operator Y Y Y Y 108. Select for Update Y N Y Y 109. Outer Join Operator Y Y Y Y 110. Dynamic SQL Y Y Y Y 111 Static SQL Y Y Y Y 112. EXPLAIN Command Y Y Y Y 113. Transaction Savepoints Y Y Y Y 114. Aliases/Synonyms Y Y Y Y 115. NIST Verified Conformance to Standards Y * Y Y 116. 3rd Party CASE Tools - Upper and Lower Y Y Y Y 117. Internal CASE Tools - Upper and Lower Y Y N N 118. Graphics Tools Y Y Y Y 119. Internal Email System Integration Y N N N 120. 3rd Party Email System Integration Y Y Y N 121. Trigger/Stored Proc Sends Email Y Y Y N 122. Office Automation Tools Y Y Y Y 123. Bit-Mapped Graphics Y Y Y Y 124. National Language Support European Y Y Y Y Semitic Y N Y N Asian Y N Y Y 125. Precompiler Support C Y Y Y Y COBOL Y Y Y Y Pascal Y N Y Y PL/I Y N N N SAME/DL N N N N ADA Y N Y Y Fortran Y N Y Y C++ B N N N 126. Function Call Interface C Y Y Y Y COBOL Y Y Y Y ADA Y N Y Y Pascal Y N Y Y PL/I Y N N N SAME/DL N N N N Fortran Y N Y Y C++ B N N N 127. Module Language Y N Y N 128. Bind Variable Support Y N Y Y PART 2: Other Important Stuff 1. Theoretical Limits 2. Practical Limits/References 3. Datatypes and Precisions

DETAILED DESCRIPTION OF CATEGORIES


  1. Row Level Locking
    First, a little about granularity. The smallest level of granularity in commercial databases is the Row level. Next comes Page level. Then Table Level. Then Database Level. Typically Database pages are 2048 bytes, so if your rows each contain 100 bytes of data you might squeeze 20 rows into a page.

Row level locking means that the database can use a lock granularity of a single row (tuple for the coneheads out there.) This means that multiple users (Up to 20 in the example above, can simultaneously update different rows on the same page. Each user, when performing an operation on the row, only locks that row in question and does not interfere with other users in the same page. Arguments for Row Level Locking are that it permits the highest degree of concurrency - Users tend not to lock each other out. Arguments against claim that row level locking causes a lot of overhead. My belief is that users are the slowest thing in my system, so I'd rather not wait for them. Thus I prefer row level locking.

2) Page Level Locking



See Also Row Level Locking. In Page Level, Locks occur at (surprise) the page level. Thus when one user updates a row, the entire page is locked and other users are blocked from updating (sometimes reading too) rows in that page. Sometimes users get around page level blocking by forcing what would be small rows to take up an entire page. This trick effectively simulates row level locking for critical tables that are getting hit by updates a lot, and are having problems with contention. The arguments for and against page level locking are mentioned above.

3) Table Level Locking



The database supports the LOCK TABLE command to lock an entire table at a time. Useful for locking a table for batch updates, locking out users for maintenance or reporting, etc.

4) Database Level Locking



The entire database can be locked with a single command. As far as I know, the products all support this through a single-user or DBA only mode, but there is no LOCK DATABASE command.

5) Programmable Read Isolation Levels



The programmar can indicate which type of read isolation he wishes to use for each SELECT statement. Examples are Dirty Read (Uncommitted Data), Cursor Stability, Repeatable Read, etc. These terms are given in order of growing strength.

6) System Doesn't Escalate Locks



Some databases have a maximum number of locks that can be supported, and automatically escalate row or page level locks to table level locks when that number has been exceeded. I believe that system generated lock escalation causes problems by creating artificial deadlocks.

7) Programmer May Escalate Locks



The programmer may escalate locks at his/her discression. For example, if the program writes to a certain percentage of the pages in a table, it may want to go ahead and upgrade to a table lock to avoid running into locked pages and having to wait or to rollback work.

8) Versioning / No Read Locks



The database keeps track of several versions of committed data so queries see only those rows that were committed and valid at the instant the query started. The database allow consistent reads (committed data) without taking out read locks. This is a good thing since readers do not interere with writers, and accurate reports can be written since only committed data is read.
  • Sybase has Browse Mode which does not use read locks. Informix and Ingres have dirty read mode which do not use read locks. 9) Dirty Read Mode
    The database has a read mode which scans the data as it currently exists on disk or in memory, regardless of whether it has been committed or not. In general dirty reads are very fast, but since the data is not necessarily valid some applications can't use them. Good for reports where accuracy doesn't matter.
    1. Multi Server Architecture
      Multiple Server processes exist on the server machine. These usually take two forms, a two-task architecture where each client user gets a corresponding OS process on the server, or a multi-threaded architecture where each of the server processes can manage multiple clients. Multi-Server architecture helps balance system load and is mandatory for taking advantage of symmetric multi-processors. The two-task architecture is considered resource intensive on operating systems that don't deal well with lots of processes well.
    2. Symmetric Multi-Processor Support
      Can the server take advantage of multiple CPU's in a symmetric multi-processor environment. Things to make sure a product can do: Make sure each client can read, write and commit transactions in parallel on any available CPU. Make sure, as you add CPU's, that transaction throughput increases in a near linear fashion. Some products and operating systems may scale well to 4 CPU's, others to 8, and others to 30, hundreds, or more. Some may not scale at all. (OS2, for example)
    3. Support for Loosely Coupled System
      Pioneered by DEC in their VAX cluster, loosely coupled processors are usually independent computers connected by a fast communications bus that can share and coordinate resources. Databases that work in this environment can run simultaneously on several nodes in a cluster. This increases power since processing is going on at each node, and reduces complexity since no two-phase commit or distributed database work is needed. Also helps for fault tolerance since if one of the machines dies the remaining nodes keep running. Today several unix machines (Pyramid, Sequent, IBM RS/6000) offer clustering in addition to DEC. Performance concerns may exist if OLTP goes against the same data on more than one node since database blocks get sent pinging around the cluster if this happens to much. Other than that I think this is pretty cool. Also this is the basis for how Oracle works on Massively Parallel machines like Ncube, KSR, Maspar, etc.
    4. Array Interface
      Can the client request that multiple rows be sent in a single batch, as opposed to a network request for every row to be returned. Reduces network traffic. Example: Send me 1000 rows, thanks. vs. send a row, thanks, send a row, thanks, 1000 times. Only an issue for cursor based processing: for streams based this is not an issue.
    5. Shared Commits
      Shared or Group commits write multiple transactions into the redo structure in a single physical I/O. This is an advantage over database systems which require a physical I/O for each individual transaction. Some products may not support shared commits on SMP hardware.
    6. Asynchronous I/O
      Most operating systems do provide Asynchronous I/O, including VMS, MVS, VM, and unix from Pyramid, and Sequent to name a few. On systems that do not provide asynchronous I/O, the need to wait for disk writes to complete may become a performance bottleneck. The server should be able to take advantage of Asynchronous I/O where available, and have a strategy to deal with the lack of asynchronous I/O on systems where it is not available.
    7. Raw I/O
      On Unix, it is much faster to be able to access raw filesystem devices, which are not controlled by the unix file and buffer managers, than to use normal or "cooked" files.
    8. Shared Log Files
      Every change to the database structure is automatically written in a "piggybacked" manner to the Redo file structure. Transactions are piggybacked in a way that one physical write can actually commit several transactions. This is a very efficient way to commit transactions because it requires only a sequential write to an o/s file. Multiple users can share the same redo file structure.
    9. Non-Blocking Queries
      For systems where multiple users are reading and writing to the same table at the same time, it may be critical that readers doing reports do not interfere with ongoing transaction processing. When readers need to obtain locks to get information, they are said to "block" writers since writers are blocked from getting the locks they need to perform their update. This is a great feature if you have a transaction processing system on which you want to be able to run reports against live data during operating hours.
  • Ingres and Informix can accomplish this by using Dirty Read mode. Sybase does it by using browse mode. See the above sections for limitations and applications of these features.
    1. Clustered Tables
      Clustered tables: Store multiple tables close to each other on disk based on a commonly used join key. This helps speed up access if two tables are always accessed by the same join key. Takes longer to insert and update than normal storage method.
    2. Clustered Indexes
      Clustered Indexes: Several Definitions. Either an index on the cluster key mentioned above, or on some systems a pre-sorted index cluster that greatly speeds data retrieval for ordered operations, at the expense of insert and update speed. Oracle uses the first type, Sybase the second.
    3. Hash Indexes
      Most products use binary tree indexes, which are flexible and good for objects that grow, shrink, change, etc. Hash indexes offer faster performance but are appropriate only for relatively static tables.
    4. ANSI Standard Cursor Support
      Cursors are memory structures which store the SQL statement, a parsed/compiled version of the SQL statement, a buffer for the result data, and pointers to the result set of the query, among other things. Cursors allow applications to process multiple SQL statements in a single database connection. This is crucial for window-based environments which require multiple SQL statements to be processed at one time with different information being displayed in different windows. This allows multiple windows to be created via a single session and reduces memory utilization requirements. In addition, multiple SQL statements can be encompassed in a single transaction with all internal locking controlled by the database engine.

Cursors are also required to meet ANSI standards, both for SQL compliance and for Pre-Compiler Support.

23) Backward Scrolling Cursors



All cursors permit forward scrolling of the result set, some products also allow backward scrolling. Good for those of you that like to think backwards.

24) Non-Blocking Sequence Generator



Many applications require that unique keys be generated, for example for purchase order numbers. A nice feature is a multithreaded  non-blocking sequence generator which can be used to generate sequence numbers for rows in tables. Some databases require applications needing unique primary keys to single thread all transactions through a single sequence table, which can become a major application bottleneck.

25) Unique Key Generator



The database contains a mechanism for generating unique keys.

26) Cost/Statistics Based Optimizer:



The database will gather and store comprehensive statistics about database structures. The statistics based optimizer will choose the most efficient access paths to data based on the information gathered. Some products do a much better job than others. Nice things to have: The ability for the system to "Guess" or estimate the statistics on a table based on a sample of the data. This is a lifesaver when it takes 4 hours to analyze statistics for a 30 gigabyte table. Also, some of the optimizers take into account low/high data values, and data distribution heuristics.

27) Optimizer uses Data Content Heuristics



The optimizer scans the data to determine the minimum value in the table, the maximum value, and the average value. It also keeps track of the distribution of data within the rows.

28) Optimizer Scans Entire Table to get Stats



To gather statistics, the optimizer must scan the entire table. This can take a long long long time for multi-gigabyte tables.

29) Optimizer can estimate stats to save time



The Optimizer can estimate statistics by scanning only a random sample of the rows in the table. Very useful for collecting statistics on large tables where it is impractical to scan the whole table for statistics.

30) Optimizer uses network costs



The optimizer is aware of the existance and make-up of the network, and some costs have been assigned to each link. These may be either costs in terms of link speed or in actual dollar expense for using the link. The optimizer will determine the lowest cost path in terms of time or expense, which may involve shipping data to different nodes for joins or through intermediate nodes depending upon the network.

31) Optimizer uses hardware costs



The optimizer is aware of the relative processing power, CPU Cycle cost, I/O speed and cost, etc. It decides how to implement queries based on this cost information, and the user's resource profile.

32) Optimizer knows real-time load factors



The optimizer can take into account real time system load information in determining the optimal path for queries. Thus this path may change from time to time depending on system load.

33) Optimizer moves join location



Classic example is a local table with 10 rows being joined to a remote table with 1 Million rows. The optimizer is smart enough to send the local table to the remote node for the join rather than the other way around.

34) Stored Procedures in the Database



Stored procedures are bits of procedural code, grouped together and stored in the DBMS engine. Stored procedures are usually stored in shared, compiled format. Stored procedures should be callable by applications, by other stored procedures, or by database triggers. There should also be full dependence tracking that will automatically recompile stored procedures when objects upon which they depend change. Stored procedures are good because they typically improve performance and reduce network traffic. They also are great for encapsulating allowable operations on data - most systems let you grant access to a stored procedure without granting access on the underlying tables. The major downside is that there is no standard for stored procedures, so any code you write that uses them is non-standard and non-portable.

35) Cached Procedures are Shared



Stored Procedures are cached in memory on the server. A single copy of the stored prodedure can be used by multiple users. Saves memory and execution time.

36) Stored Procedures can Return Sets



Stored procedures can return an array or table of data. Not limited to returning only a single row.

37) Stored Procedures return All Datatypes



Stored procedure are capable of return all available server datatypes, not just a limited subset.

38) Stored Functions in Database



Similar to Stored Procedures, but instead stored function calls can be defined by the user. User defined functions are useful for many reasons, similar to the idea of having both procedures and functions in a 3GL language

39) Stored Procedure Cache



Stored Procedures are cached in memory on the server. They do not need to be read in from disk each time they are called. There is a special, DBA configurable area of memory which holds the procedure cache, so it does not get flushed by users doing large queries, etc.

40) Shared SQL Cache



The ability to store ad-hoc SQL statements in shared cache memory on the server. The server recognizes when a client requests the execution of a SQL statement already in the shared cache, and can use the already parsed and compiled representation already in memory. This can provide large reductions in server memory usage where lots of users are using the same application (and thus the same SQL statements.)

41) Parallel Data Query



On a symmetric multi-processor, a single query can be decomposed into several parts that can each be dispatched to a separate processor for execution in parallel. This same functionality will eventually extend to clustered systems and networks, but no one can do it yet today.

42) Parallel Index Build



Same idea as parallel query, but for building indexes. Dramatically speeds index builds for large tables.

43) Parallel Sort



Whenever sorted data is needed, the sort operation can be decomposed and processed by multiple CPU's simultaneously.

44) Performance Monitoring Tools



The Vendor provides tools to monitor system performance and to diagnose problems. Tools are available to monitor both individual SQL statements and overall system performance.

II. INTEGRITY


45) Adherence to Industry Standards



There are lots of RDBMS standards. Most useful is FIPS 127-1 since the government actually tests the products to ensure their compliance.

There are also SQL2 and SQL3 standards out there or under development, but no official test suites being proctored. All are complex standards that define a common SQL dialect at an incredibly detailed level. The advantage of this is that if you write FIPS 127-1 code, you'll be able to run your application against all adhering databases with little re-writing of code.

It is not sufficient for a vendor to claim "we use ANSI-  SQL" This is like saying all unix's are the same. If you want to have a chance to be able to port your code to another DBMS without a rewrite, you must only use 100% ANSI standard statements and a database that has been CERTIFIED by NIST. From alanb_at_vnet.ibm.com
You should say FIPS 127-1, not FIPS 127 since there was the original FIPS 127, then the current FIPS 127-1, and soon FIPS 127-2. They are all based on different versions of the ANSI standard. NIST publishes the Validated Products list, and Oracle7, Informix OnLine 5.0, and Sybase 10 show up on the list as of Jan 93. SQL2 compliance is incorrect since there are no test suites yet (FIPS 127-2 will provide the tests). Also, SQL2 has three levels, entry, intermediate, and full; which level are you talking about? When it comes to conformance, do not take a vendor's word for it without them providing a certification of compliance. Also, note that there is a difference between conformance and compliance; I believe compliance means it has been formally tested and verified.

On the subject of conformance/compliance to the SQL standards, the following applies. ANSI X3.135-1989 SQL was adopted by NIST in FIPS 127-1. NIST developed test suites to test compliance with the SQL standard. Only if the DBMS was tested against version 3.0 of the test suites can a vendor claim compliance with the standard. And compliance can only be claimed for a specific hardware/operating system combination, ie. just because one is compliant on Sun does not mean one is compliant on HP. The results of the tests are published every quarter in the Validated Products List. NIST is working on FIPS 127-2 which adopts ANSI X3.135-1992

SQL or SQL2. To claim compliance is jumping the gun at this point because there are no test suites available. And besides, there are three levels to the SQL2 standard - entry, intermediate, and full. Which of these are vendors claiming compliance to?

  • Sybase System 10 has been FIPS 127-1 Validated. 46) Declarative Integrity Model
    This includes full support for declarative referential integrity, default values and domain support. The ANSI declarative approach greatly simplifies the process of providing database enforced integrity. It allows the programmer to define Primary and Foreign Keys, Default Values, Unique Keys, and so on when creating database tables. The database engine automatically enforces these rules to protect system data. I think this is vastly superior to databases that make you program referential and entity integrity using stored procedures and triggers. Certainly more portable, standard, and lower maintenance to use the Declarative Method. Probably better performance as well. 47) Cascading Update
    Defined for this matrix as the ability, within the Declarative model, to propegate updates to the primary key that is a foreign key to another table. For example, if I update a purchase order number, all line items that are associated with that purchase order must have their primary key updated or they will become disasociated records. All of the products with triggers should be able to do this, but I've never tried it and the declarative constraints may block updates to foreign keys. 48) Cascading Delete
    Defined specifically for this matrix as supporting the ON DELETE CASCADE definition of the declarative referential integrity model. Delete cascade deletes all the corresponding children if you delete the parent record. All of the products that support triggers can achieve this programatically, but I prefer the Declarative implementation for the reasons mentioned in the above Declarative Integrity Model Section. 49) Loosely Typed Datatypes, Ala ADA
    The %TYPE variable can be used to declare a variable that has the same structure as the columns or rows in a table or a view. If the datatype of any of the columns in the table changes, any variables declared as %TYPE will automatically change as well. If the variables are in a stored procedure or trigger, they will be automatically recompiled by the database engine, transparently to the user. 50) Null Support
    The programmer can create a table and specify whether or not NULL values can exist in each column. The SQL implementation should provide a function that will determine whether the value in the column is null or not. This feature is useful when performing arithmetic operations and outer joins. The database should also correctly evaluate to false NULL = Anything. 51) Database Triggers
    Database Triggers are pieces of procedural code associated with tables and are fired implicitly when data is modified in their table. They are used for access control, data validation, referential integrity, synchronous table replication, and many more uses. 52) PRE- and POST- Triggers
    Event triggers can be defined to execute before and/or after database events such as inserts, updates, etc. This is useful as a BEFORE trigger can prevent an illegal million row update from happening at all; some products only provide triggers to invalidate and roll back such a transaction after it has already happened. AFTER triggers are better suited to performing transaction auditing, distributed database, etc. 53) Triggers fire once per SQL Statement
    Database triggers automatically fire once, and only once, per SQL statement. These are useful for security, for example, when the trigger might check that the date is not a Sunday, and if it is fail the SQL statement. This type of trigger need fire only once per SQL statement. 54) Triggers fire once for every row affected by an SQL stmt
    Database triggers can automatically fire once for every row affected by a statement. For value based auditing, a developer would use post-insert, update, and delete triggers that fired for each row affected by the statement, and wrote the new and old values of each row into an audit table.

The ability for triggers to fire on a per row affected basis has many applications besides value based auditing. Synchronous table replication, recursive value checking, and multi-row transaction management are only a few possibilities.

55) Programmer can control Trigger Fire Order



Triggers fire in a pre-determined, known order, as opposed to a random order that may differ for each SQL statement.

56) Automatic Procedure Recompilation



The database maintains extensive dependency tracking information which allows it to automatically recompile all objects in a dependency chain whenever code is changed. The database kernel will recompile all procedures, functions, triggers, and packages necessary when dependant structures are modified. The database may also be capable of resolving dependencies of modified procedures on remote nodes.

57) Programmer Defined Datatypes



The programmer can define custom datatypes to the Engine. Useful if you have wierd datatypes, like longitude/lattitude etc.

58) Event Alerters



Events can be defined in the database, which the database will watch for. When the event occurs the database engine will take some pre-determined action. Example: When the inventory drops below a certain level in the inventory table an event alerter notices, send a message to the purchasing clerk and automatically enters a purchase order for the product needed.

59) Database Events call 3GL program



Event Alerters can call a 3GL program, for example to turn the operator's screen red if the reactor is about to melt down.
  • Through add on products (Sybase Openserver, Ingres Knowledge Manager) this is possible. 60) Stored Procedures call 3GL program
    A stored procedure can call a 3GL program, for example to update rows in the employee table and then send email to the employees manager confirming the changes were made.
  • Through add on products (Sybase Openserver, Ingres Knowledge Manager) this is possible.

III. DATABASE


61) Portable



The server should run on many different types of hardware and operating systems. 3-tier portability includes installations on microcomputers, minicomputers and mainframes. In addition, some databases run on super computers and massively parallel computers. Hardware portability is important so you can always select the best performing hardware platform and be sure your database applications will still work.

Other aspects of portability include network protocol portability, and Graphical User Interface portability.

62) Automatic Database Recovery



Database failures are usually grouped into several loose categories. Instance failure, occurs when the machine running the database server crashes, or software fails, or a dumb operator kills the server, with no losses on the database disks. Media failure occurs when a disk containing database information fails.

Database recovery in case of instance failure should be performed automatically by simply restarting the database instance. The engine should roll back any transactions that were pending but not committed at the time of failure, and ensures the integrity of all committed transactions. The time to recover from instance failure should be configurable by the DBA. Recovery from media failure should be able to be performed in automatic, semi-automatic, or manual modes. In all cases, recovery from media failure requires that a backup of the lost files at some point in time is available, along with all database redo files since that point in time.

63) Multiplexed Log Files



The database can maintain multiple transaction redo files on different disks and write to all of them at the same time. This provides added protection in case of a loss of the disk upon which the log files reside, which otherwise would render the database non-recoverable for up to the minute transactions.

64) Database Mirroring



The RDBMS can perform software disk mirroring of the database files regardless of whether the mirroring is supported at the Hardware or Operating system level. Mirroring means keeping multiple copies of all database information to protect from disk failure.

All of the products can take advantage of hardware and operating system disk mirroring, which are preferable to RDBMS software mirroring if they are available.

65) On-Line Database Backup



DBA's can make an online backup of the entire database while the database is up, all tables are online and users are active. This should not require locking, and should have a minimal effect on system performance, other than the I/O's required to perform the backup.

66) Automatic Log Backup



Filled database logs are automatically backed up to secondary storage on disk or tape. This archiving feature as well as online backup and recovery are crucial elements for implementing mission critical applications requiring 24 hour availability.

67) On-Line Software Diagnostics



This is a wide and vague category, but basically what I'm getting at is that tools are available on-line to diagnose problems, understand performance, monitor users, etc. You should also be able to configure the database on line. All the products can do this to some extent, but since I'm not providing ratings of how well features are implemented I suggest you ask users for how they feel about this area. These tools are typically very weak in comparison to mainframe operating system diagnostics.

68) Hot Standby Server



The capability to restart the database on a backup machine in the case of hardware failure. This capability is provided on hardware platforms that support shared disks between multiple computers, such as the DEC VaxCluster. Upon sensing that the primary computer has gone down, a utility on the second machine can then automatically start the backup engine. The engine will recover any outstanding transactions from the failed machine, and then will go into standard operation. 3GL applications can be programmed to notice the failure of the primary server, and automatically reconnect to the backup machine. I think that full cluster utilization mentioned below is a much better solution, but it is more expensive.

69) Full Cluster Utilization



The capability to run a single database on multiple machines in a shared disk environment. This is an advantage over other database systems that only have 'hot standby' or 'companion server' capabilities. Applications can take advantage of the processing power of all machines, not just a single computer at a time. The engine provides automatic fail over reconciliation when nodes do fail, and applications can be written to automatically reconnect to the database if the node to which they are attached fails.

70) On-Line Recovery



This is critical for OLTP, and 24 hour a day mission critical systems. On line recovery is the ability to recover subsets of the database while the rest of the database is up and on-line. The database does not need to be taken offline during recovery of a data file. Users that do not require access to failed areas of the database will remain unaware that failure ever happened. Users that do attempt to access damaged files will receive an appropriate error message, and will be able to access the file as soon as recovery is complete. Some databases may require that the entire database be taken off line for recovery from media failure.

71) Parallel Backup of Multiple Database Files



The Database files can be backed up in parallel. That is, multiple database file backup commands can be issued simultaneously, and on an SMP system can run across multiple processors.

72) Parallel Recovery of Multiple Database Files



Multiple recover database files can be issued simultaneously. On an SMP machine, the recover processes will be spread across available processors. Critical for recovering large databases on SMP machines. Why ? Examine running a transaction processing application for 1 hour on a 14 CPU machine. Thus you have accumulated 14 CPU hours of transaction processing. A disk drive blows up. If you don't have parallel recovery, it now takes you 14 hours to recover your database if you are limited to a single process for recovery.

73) Multiple Log Files



The ability to provide multiple Redo file structures.Some databases have only a single log file, which is a problem since if it ever fills, the entire database hangs. Other database use multiple redo files, which provides a level of insurance if your operator goes to sleep without mounting the proper backup tape. Also, some databases with a single log file get into trouble when long running transactions are going on and the log fills.

74) Resource Limiter/Governor



A resource limiter (Governor) that evaluates the impact of database requests and restricts them according to limits for each of the users that access the database. The resource limiter follows user resource profiles associated with the privilege role of the user. Users can be limited to CPU usage, direct I/O, logical I/O, and wall time per statement or cumulatively per session. The Governor can also log users off the database automatically after specified idle periods. Some of the governors can stop "Queries from Hell" at parse time, by using optimizer statistics. Others only stop the query once a resource limit is reached.

75) DBA Utilities:



Again a weaselly category, since I don't want to give relative rankings. In general, DBA utilities should be able to start and stop an engine, perform real-time monitoring of database use and performance, assist in backup and recovery of database logs and data, and provide for execution of SQL statements. Utilities may also help in managing users, user groups, application privileges and more. I would ask users of systems what they think of each product's DBA tools to get an idea of what is available. The DBA utilities should be able to run in client/Server mode to facilitate centralized management.

76) Database Auditing



Database auditing covers areas like: successful and/or unsuccessful attempts to connect to or disconnect from the database, granting or revoking of privileges, enabling or disabling writing to the audit trail table, setting default auditing options for any database tables, and more. Auditing should be definable on a per-user or per-system basis. Database triggers can typically be used to provide journaling of changed data and other specific auditing functions. The databases targeted for C2 level security compliance all provide a great deal of auditing support, more than the average site will ever use.
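
For example, in Oracle7-style syntax (the table name is invented; other products use their own auditing commands):

	AUDIT SESSION;				(log connect/disconnect attempts)
	AUDIT INSERT, UPDATE ON SCOTT.EMP;	(log changes to one table)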

IV. DISTRIBUTED RDBMS


77) Distributed Join



The capability of joining two tables that reside on different machines in a single select statement. Some systems can perform this transparently to the user and to the application program; the query is then exactly the same as if the tables resided on the same machine. Example:
	SELECT ENAME, DEPTNAME FROM EMP, DEPT
	WHERE EMP.DEPTNO=DEPT.DEPTNO;

Here the EMP table resides on the local server, and the DEPT table resides on a remote server.

78) Synchronous Table Replication



Table replication allows the distribution of updates from a single master to one or more slave nodes. Synchronous replication implies that updates are propagated to slave nodes in real time, protected by two-phase commit. The classic use is for banking transactions, where balances must be synchronized on all machines at all times. The danger of synchronous replication is that if one node fails, the transaction doesn't go through on any of the nodes.

79) Asynchronous Table Replication



See Synchronous Replication. Asynchronous replication differs in that updates are not propagated in real time. This is useful for information that doesn't always need to be perfectly in sync. It also avoids the problem of all nodes needing to be alive for the initial transaction to complete. Updates get propagated around the network at programmer-defined intervals.

80) Connections to other Databases



Gateways are available that facilitate the incorporation of data from foreign databases. Some gateways are read/write, some are read only. Some are procedural, others are SQL based.

81) Programmatic Two Phase Commit



Two-phase commit is a mechanism for managing distributed update capabilities in a distributed database. Two-Phase Commit (TPC) ensures that an update either completes or rolls back on all nodes in a transaction. Programming this is very complex, and gets exponentially more difficult as the number of nodes in a transaction grows. I have never heard of a successful application built on one of these products using programmatic two-phase commit.

82) Transparent Two Phase Commit



Transparent TPC eliminates programming and lets the servers handle the complexities behind the TPC algorithms. An example of this is below:
	UPDATE EMP...; 	(EMP is on Node A)
	UPDATE DEPT...;	(DEPT is on Node B)
	COMMIT;

THAT'S IT !!

83) Commit to Foreign Database



The gateway products allow single site updates and commits to heterogeneous databases. In addition, a single heterogeneous database can participate in a Two-Phase Commit transaction with multiple native databases.

84) Movable Transaction Coordinator



In a Two-Phase Commit distributed update, one of the databases acts as the coordinator of each transaction. The coordinator does not need to go through the Prepare stage of the TPC algorithm, and thus locks resources for a minimal amount of time. All of the products that support programmatic TPC allow the programmer to specify the coordination site. This category specifically refers to the Transparent TPC facilities, where available.

The coordination task can be assigned according to network resources and machine capabilities, as required for each transaction, which protects resources on your most critical database servers. Where the server chooses the transaction coordinator (also known as the commit point) automatically, it does so based on pre-defined server criticalities. Since the commit point need not prepare, a failure leaves no resources on the transaction coordinator held for in-doubt recovery.

85) Remote Procedure Calls



Stored procedures may be called from remote nodes without restriction. Translation of character sets and datatypes occurs automatically.
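
A minimal sketch in Oracle7-style syntax, assuming a database link named NY and a remote procedure RAISE_SALARY (both invented for the example):

	EXECUTE RAISE_SALARY@NY(7499, 500);	(runs on the NY node)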

86) RPCs Protected by Two Phase Commit



Stored procedures can perform remote updates or call other stored procedures on remote nodes. Any distributed transactions are automatically protected by TPC.

87) Heterogeneous Data Support



Gateways exist which allow applications to access data in foreign databases. Typically the application cannot tell that this data is not coming from its normal server. There are lots of issues with database gateways, including read/write capability, performance, static vs dynamic SQL to DB2, and on and on.

88) Procedural Database Gateway



Gateways exist which allow remote stored procedure calls to be executed against foreign databases. The application believes that it is calling a native stored procedure, though the call is actually being executed against a foreign database.

89) SQL Based Database Gateway



Ad Hoc SQL statements can be sent to foreign databases for processing. The application thinks that it is sending a SQL statement to the local server. Issues include SQL dialect translation, functionality augmentation, static vs dynamic SQL, performance, and more.

90) XA TP Monitor Compliant



The server is compatible with the X/Open XA transaction processing interface specification, so it can participate in transactions managed by XA-compliant TP monitors.

91) Top End TP Monitor Support



The server works with NCR's Top End TP monitor product.

92) CICS TP Monitor Support



On IBM mainframes running MVS, the server or gateway is able to work with the CICS TP monitor.

93) Transaction Can Span Multiple Net Protocols



A single transaction can be routed across multiple network protocols. For example, a PC on a Novell LAN sends a query across IPX to the LAN RDBMS server; from there, the query goes out across TCP/IP to a Unix mainframe. Important because it sucks to have to buy TCP/IP stacks and NICs for every PC just to talk to the enterprise server if you want to do client/server on your Novell LAN. AppleTalk has similar issues.

94) ODBC Support



The product supports (or has announced support for) Microsoft's ODBC standard.

95) SQL Access Group Support



The product supports or has announced support for the SQL Access Group's connectivity standard.

V. DATABASE SECURITY


96) O/S Security Integration



The ability to specify that database logins will be validated by the operating system. That is, each account maps one-to-one to an operating system login identifier. On IBM systems, database security is integrated with the host security management system (RACF, ACF2, and others).
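
For example, in Oracle7-style syntax, where the OPS$ prefix maps the database account to the host login SCOTT (account name invented for the example):

	CREATE USER OPS$SCOTT IDENTIFIED EXTERNALLY;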

97) User Group Privileges/Roles



The system supports simple user groups, as in Unix, where access rights can be granted to group names rather than to individuals.

Roles are collections of privileges, like user groups. Roles differ from simple user groups in terms of additional functionality: users may be granted several roles, and can switch back and forth between roles in a session. Users can also enable and disable specific privileges within the role they are in. Roles are additive in nature, thus a manager role can have the privileges of clerk plus additional rights.
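
A sketch in Oracle7-style syntax (role, table, and user names invented for the example):

	CREATE ROLE CLERK;
	GRANT SELECT, INSERT ON ORDERS TO CLERK;
	GRANT CLERK TO SCOTT;
	SET ROLE CLERK;			(SCOTT enables the role within his session)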

98) Trusted Database/ Mandatory Access Control/ B1 Security



A high security version of the database, where every row is labelled with a security classification (Top Secret, Public, etc.). Lots of other security and auditing features as well. Most applicable for Department of Defense applications, or corporate applications where lots of security is needed. Available as an option for all products. May not be based on the latest version of the server engine, and may not have all the functionality of the normal engine.
  • All are separate, extra cost products from the normal engine.
  • Sybase System 10 has been targeted for this level, but it is not a current production product.

99) Undergoing Evaluation by the NCSC

The database server has been selected for the Trusted Product Evaluation Program (TPEP) of the U.S. Government's National Computer Security Center (NCSC). The products were submitted for official government evaluation, targeted at Class C2 (standard engines) and Class B1 (enhanced security versions) of the Trusted Database Interpretation (TDI) of the Department of Defense's Trusted Computer System Evaluation Criteria (TCSEC, or "Orange Book").

Participation in this program is an involved process in which the functionality and security of the product is evaluated against computer security criteria published by the government. A product must meet the functional requirements of the target evaluation class, and the implementation of security features is tested to assure correct design and functions. There are seven classes of evaluation ratings, ranging from A1 (most secure) through B3, B2, B1, C2, C1 to D (minimal security). Once a product passes evaluation at the target rating class, it is placed on the NCSC's Evaluated Products List.

As of this date, no product has been certified in the United States. Ingres has been certified at the B1 equivalent level in Europe.

From Alanb_at_vnet.ibm.com
The only RDBMSs undergoing evaluation by NCSC for C2/B1 security are Oracle and Informix. They are still in the evaluation process and have not been certified. Ingres was evaluated by a European security group at a level equivalent to B1.

100) Standard Tools with Trusted (B1) Product

The Trusted (B1) engine uses the standard development tool set, so no additional tools training is required for developers to switch from normal to Trusted applications. Secure databases do, however, require a different database and application design to accommodate the demands of multi-level secure applications. Some vendors require specialized tools to work with the high security database.

101) Non-Dedicated Server

The Trusted (B1) DBMS engine does not compromise the security of other trusted applications on the host machine on which it is running. It works with the trusted operating systems to control access to secure data. Some implementations of secure databases require that the database server be the only program running on the server computer.

102) Product Being Evaluated at C2 Level

The standard version of the engine is undergoing evaluation by the NCSC at the C2 target security level. As of this writing, no products have been certified as C2 compliant.

103) Product Being Evaluated at B1 Level

An enhanced security version of the product is undergoing evaluation by the NCSC at the B1 security level. As of this writing, no products have been certified B1 compliant.

104) Product Certified (Europe) at B1 Equivalent

The high security version of Ingres has been certified in Europe at a security level similar to the U.S. B1 level.

VI. LANGUAGES & TOOLS


105) SQL Procedural Language

Because native SQL sucks for algorithmic processing, most vendors have devised procedural languages. These are the languages used to write stored procedures and triggers, and they can often be used to write stand-alone programs as well.
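
The flavor varies by vendor; here is a PL/SQL-style sketch (table, column, and procedure names invented for the example):

	CREATE PROCEDURE RAISE_SALARY (ID NUMBER, AMT NUMBER) AS
	BEGIN
		UPDATE EMP SET SAL = SAL + AMT WHERE EMPNO = ID;
		IF SQL%ROWCOUNT = 0 THEN
			RAISE_APPLICATION_ERROR(-20001, 'No such employee');
		END IF;
	END;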

106) Text/Image Datatypes (BLOBS)

Database fields can contain binary and text data up to 2 gigabytes per field per row. Useful for storing images, sound, video, telemetry data, etc.
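
For example, in Informix-style syntax (table and column names invented; Oracle7 would use a LONG RAW column here):

	CREATE TABLE PHOTOS
	(	PHOTO_ID	INTEGER,
		CAPTION		CHAR(80),
		IMG		BYTE );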

107) Union Operator

The standard UNION operator combines the results of multiple queries and eliminates duplicate rows.

The ANSI standard also includes the UNION ALL operator, which does not eliminate duplicate rows and is useful for horizontal partitioning of data.
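
For example, with employee data split across two invented regional tables:

	SELECT ENAME FROM EMP_EAST
	UNION ALL
	SELECT ENAME FROM EMP_WEST;	(duplicates are preserved)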

108) Select for Update

SELECT... FOR UPDATE is useful for obtaining locks on desired rows, and can be used if REPEATABLE READ functionality is required. It can also be used to test whether a structure is already locked by another user, much like the TSET (test-and-set) instruction or executing a spinlock. It is required for ANSI compliance.
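
A minimal sketch (table and column names invented for the example):

	SELECT BALANCE FROM ACCT
	WHERE ACCTNO = 101
	FOR UPDATE;			(row stays locked until commit or rollback)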

109) Outer Join Operator

Outer join lets you perform a join where you want all the rows from one of your tables, even if no matching rows exist in the second table. Very useful for modelling real world problems. For example, you have no employees in the HR department, but you still want it to show up on the list of departments and their employees.
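
Syntax varies widely by vendor; this sketch uses Oracle-style (+) notation, where the (+) marks the side allowed to come up empty. It lists every department, including HR with no employees:

	SELECT DEPT.DEPTNAME, EMP.ENAME
	FROM DEPT, EMP
	WHERE DEPT.DEPTNO = EMP.DEPTNO (+);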

110) Dynamic SQL

SQL statements can be generated on an ad-hoc basis and sent to the database engine for processing. They do not need to be separately compiled and have plans generated before they can be executed.
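
In embedded SQL this typically looks something like the following sketch (statement and buffer names invented):

	EXEC SQL PREPARE S1 FROM :STMT_BUF;	(STMT_BUF holds SQL text built at run time)
	EXEC SQL EXECUTE S1;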

111) Static SQL

SQL statements can be pre-parsed, compiled, and have plans generated, so they execute faster since this work doesn't have to be done at run time. Improves speed at the expense of flexibility and maintenance. This category explicitly does NOT include stored procedures and triggers, which are typically stored compiled.

112) EXPLAIN Command

The EXPLAIN command lets you print out a report that shows the query execution path chosen by the optimizer, and statistics about how the query was actually performed. It shows which indexes were used, what type of join was done, how many rows were processed, etc.
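
The command differs by product. A sketch in Informix-style syntax (Oracle7 uses EXPLAIN PLAN FOR ... instead):

	SET EXPLAIN ON;
	SELECT ENAME, DEPTNAME FROM EMP, DEPT
	WHERE EMP.DEPTNO = DEPT.DEPTNO;	(the plan is written to an output file)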

113) Transaction Savepoints

Savepoints are markers placed within a transaction that, coupled with the ROLLBACK TO SAVEPOINT command, allow partial rollback. This prevents an entire multi-part transaction from being rolled back by the failure of one of the individual statements within it.
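
A sketch (table names invented for the example):

	INSERT INTO ORDERS ...;
	SAVEPOINT ORDER_DONE;
	INSERT INTO ORDER_LINES ...;		(suppose this statement fails)
	ROLLBACK TO SAVEPOINT ORDER_DONE;	(only the second insert is undone)
	COMMIT;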

114) Aliases/Synonyms

Aliases and synonyms are used as substitute names for tables, views, etc. Used to hide the fact that a table is actually owned by a different schema: CREATE SYNONYM FOO FOR DAN.FOO. Also used to implement location transparency for distributed databases: CREATE SYNONYM FOO FOR DAN.FOO@LOS_ANGELES.

115) NIST Verified Conformance to Standards

The product has been tested against the NIST test suite for ANSI FIPS 127-1 SQL and passed with 100% compatibility. Important for government bids, which usually require FIPS 127-1 compliance, and important if you want to be able to write portable code.
  • Sybase System 10, and not the current product, has been verified.

116) 3rd Party CASE Tools

3rd party CASE tools are available for Computer Aided Software/Systems Engineering. Some people prefer 3rd party CASE tools because they can generate code for several different types of DBMSs, and are considered to be less of a vendor "lock in."

117) Internal CASE Tools

The vendor provides CASE tools for Computer Aided Software/Systems Engineering. Some people prefer vendor-provided CASE since it tends to be tightly linked to the database and can take advantage of all of the most up to date features.

118) Graphics Tools

The vendor provides tools that allow graphical applications to be built that use database data. All of the vendors have 3rd party graphics tools available.

119) Internal E-Mail System Integration

Mail systems are provided by the vendor which are written specifically to run on top of and integrate with the database system. In general this facilitates the combining of database data and reports into the mail system. Database events can trigger mail messages, and vice-versa.

120) 3rd Party E-Mail System Integration

3rd party mail systems exist which are written specifically to run on top of and integrate with the database system. In general this facilitates the combining of database data and reports into the mail system. Database events can trigger mail messages, and vice-versa.

121) Trigger/Stored Procedure Sends E-Mail

A trigger or stored procedure can trigger the creation of an E-Mail message. Ask the vendor to show how easy this is to do in the various systems. Some come with built-in stored procedures to accomplish this; with others you need to write your own C code. I think this is a pretty cool feature to have.
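
An Informix-flavored sketch; SEND_MAIL is a hypothetical user-written procedure, not a built-in:

	CREATE TRIGGER NEW_HIRE_NOTE
	INSERT ON EMP
	FOR EACH ROW
		( EXECUTE PROCEDURE SEND_MAIL('hr', 'New employee added') );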

122) Office Automation Tools

Office Automation tools are available which integrate RDBMS data into E-Mail, Word Processing, Calendar/Scheduler, Spreadsheet, Spell Checking, Document Management, etc...

123) Bit-Mapped Graphics

The 4GL supports bit-mapped graphics. How well ? Ask some users.

124) National Language Support

Which of the languages listed in the matrix are supported ? Language support includes translated manuals, error messages, commands, etc. Number formats and dates also differ between languages. 16-bit characters can be stored where appropriate.

125) Precompiler Support

Precompilers allow you to embed SQL statements (Embedded SQL) in 3GL programs. These programs are run through a pre-compiler to produce pure 3GL code, with low level function calls replacing the SQL statements.

Precompilers should also provide both syntactic and semantic checking for increased programmer productivity. Precompiler support is a key component of the ANSI SQL standard as well as the Federal Government's NIST SQL compliance test.

Precompilers are important because they let you write standard code that can be re-precompiled and re-compiled and then will run against other database engines. If you write your code to your product's function call interface, you are screwed if you ever want to change engines.
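
Here is what embedded SQL in a C program looks like before precompilation (a sketch; host variable declaration details vary slightly by product):

	EXEC SQL BEGIN DECLARE SECTION;
		char	name[21];
		int	empno;
	EXEC SQL END DECLARE SECTION;
	...
	EXEC SQL SELECT ENAME INTO :name
		FROM EMP WHERE EMPNO = :empno;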

126) Function Call Interface

A low level function call interface is available. Arguments for using it are that it provides better performance than using embedded SQL and precompilers, if you know what you are doing. Arguments against include that precompilers are getting better and get you close to the speed of writing directly to the call interface, and that if you write to the vendor's call interface you are locked into non-portable code.

127) Module Language

Modelled after Ada, the idea is that you modularize all of your embedded SQL in an external file, and put only native 3GL statements in your actual program. The module language helps you integrate the two.
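
Roughly the flavor, as a sketch (not exact syntax for any particular product; module, procedure, and table names invented):

	MODULE PAYROLL LANGUAGE C AUTHORIZATION DAN

	PROCEDURE GET_NAME (:EMPNAME CHAR(20), SQLCODE);
		SELECT ENAME INTO :EMPNAME
		FROM EMP WHERE EMPNO = 7499;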

128) Bind Variable Support

Bind variables allow applications to execute a SQL statement multiple times with different values for the bind variables, without reparsing the SQL statement. This leads to performance improvements in applications requiring repetition of SQL statements. Bind variables are usually associated with database cursor support.
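
For example, the statement below can be parsed once and then executed repeatedly with new values bound to :EMPNO and :ENAME (a sketch; placeholder syntax varies by product):

	INSERT INTO EMP (EMPNO, ENAME) VALUES (:EMPNO, :ENAME);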

PART 2



Other Stuff You Should Consider

This section has some things suggested by people on the net that either don't lend themselves well to the checklist format, or that I didn't have time to investigate as thoroughly as I'd like. These are all great questions to make the vendors sweat while you do your due diligence.

1) Theoretical Limits

Databases per Computer
Databases per Organization
Total Single Database Size
Tables Per Database

Rows Per Table
Columns Per Table
Indexes Per Table
BLOBS Per Table

Columns Per Index
Rows Per Index
Levels Per Index

Locks Per Transaction
Locks Per Database Page
Locks Per Table
Locks Per Database

Columns Per SELECT Statement
Tables Per SELECT Statement
Databases per SELECT Statement

Time to Back Up 1 Gigabyte
Time to Load 1 Gigabyte Unsorted Data
Time to Load 1 Gigabyte Sorted Data
Time to Recover 1,000,000 Transactions

2) Practical Limits/References



For the categories above, what are the limits for actual production databases in existence today, on any hardware ? Same question, but for the target machine or hardware class for this application.

3) Datatypes and Precisions



What datatypes are available, and at what precision levels ? What is the maximum length of each ?

Suggested Types Include

Fixed Length Character
Variable Length Character
Date
Time
Smallint
Integer
Packed Decimal
Float/Double
Currency
Binary Large Object (BLOB)
Text Large Object

Thanks to the following people for their input:

Alan Goodman		alan_at_dssmktg.com
Bill Coffin 		billc_at_ingres.com
RW Salnick		salnick_at_dejavu.spk.wa.us
? ? 			?_at_pivot-sts.sbi.com
Graeme Sargent		graeme_at_pyra.co.uk
Matt Kosatur		matt_at_meaddata.com
Paul Turner		turner_at_telstar.kodak.com
Snorri Bergmann		snorri_at_strengur.is
Bill H 			billh_at_greed.sbil.co.uk
mark saindon		msaindon_at_oracle.com
Ben			ben_at_sybase.com
Alan Beal		alanb_at_vnet.ibm.com
Curt Leaist 		curtl_at_zed.com
Lou Kates		louk_at_research.teleride.on.ca 