Re: dynamic database concept

From: --CELKO-- <71062.1056_at_compuserve.com>
Date: 12 Jul 2003 14:24:07 -0700
Message-ID: <c0d87ec0.0307121324.625f7622_at_posting.google.com>


>> The problem is that the cadaster should be dynamic, I mean, users
have the possibility to create new fields [sic] through the interface, besides those that are already default of the program. <<

No, no, no! There are important differences between a file system and a database, a table and a file, a row and a record, and a column and a field. If you do not have a good conceptual model, you hit a ceiling and cannot get past a certain level of competency.

In 25 words or less it is "Logical versus Physical", but it goes beyond that. A file system is a loose collection of files, which have a lot of redundant data in them. A database system is a single unit which models the entire enterprise as tables, constraints, etc.

Files are independent of each other, while the tables in a database are interrelated. You open an entire database as a unit, not individual tables; you open files one at a time.

The original idea of a database was to collect data in a way that avoided keeping redundant copies of it in many separate files, and that did not tie the data to any particular programming language.

A file is made up of records, and records are made up of fields. A file is ordered and can be accessed by a PHYSICAL location, while a table is not. Saying "first record", "last record", and "next n records" makes sense in a file, but there is no concept of a "first row", "last row", and "next row" in a table.
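To make that concrete, here is a rough sketch (the Personnel table and its columns are invented for illustration). A bare SELECT promises nothing about which row comes back "first"; ordering is a property of a result you explicitly ask for, never of the table itself.

   -- no ordering promised; the engine returns the rows in whatever order it likes
   SELECT emp_nbr, emp_name
     FROM Personnel;

   -- order exists only because the query asks for it
   SELECT emp_nbr, emp_name
     FROM Personnel
    ORDER BY emp_name;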

A file is usually associated with a particular language -- ever try to read a Fortran file with a Cobol program? A database is language independent; the internal SQL datatypes are converted into host language datatypes.

A field exists only because of the program reading it; a column exists because it is in a table in a database. A column is independent of any host language application program that might use it.

In a procedural language, "READ a, b, c FROM FileX;" does not give the same results as "READ b, c, a FROM FileX;", and some languages will even let you write "READ a, a, a FROM FileX;". In SQL, "SELECT a, b, c FROM TableX" returns the same data as "SELECT b, c, a FROM TableX" because things are located by name, not by position.

A field can be fixed or variable length, can be repeated with an OCCURS clause in Cobol, etc. A field can even change datatypes (a union in 'C', a variant record in Pascal, REDEFINES in Cobol).

A column is a scalar value, drawn from a single domain (datatype + constraints + relationships) and represented in one and only one datatype. You have no idea whatsoever how a column is represented internally; you never see it. Look at the temporal datatypes: in SQL Server, DATETIME is internally a binary number (a UNIX-style system clock representation), while in DB2, TIMESTAMP is a string of digits (a Cobol-style representation). When you have a field, you have to worry about that physical representation. Do not worry about the bits; think of data in the abstract.

Rows and columns have constraints. Records and fields can have anything in them and often do!! Talk to anyone who has tried to build a data warehouse about that <g>. My favorite is an inventory system with part number "I hate my job!" appearing in several places, undetected for over ten years.
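As a sketch of what that buys you (the table, the columns and the part number format are all invented for illustration), a CHECK constraint is what keeps junk like that out of a part number column in the first place:

   CREATE TABLE Inventory
   (part_nbr CHAR(9) NOT NULL PRIMARY KEY
       CHECK (part_nbr LIKE 'PN-______'),  -- made-up format: 'PN-' followed by six characters
    part_name VARCHAR(35) NOT NULL,
    qty_on_hand INTEGER NOT NULL
       CHECK (qty_on_hand >= 0));

   -- this fails the CHECK constraint immediately instead of sitting in the data for ten years
   INSERT INTO Inventory (part_nbr, part_name, qty_on_hand)
   VALUES ('I hate my', 'job!', 1);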

We were so used to the concept of sequential processing that we thought of it as the nature of the universe. Even Dr. Codd started with the concept of a PRIMARY KEY in his first writings; it was the old sort key from sequential file processing (magnetic tape storage) in disguise.

It was a while before Dr. Codd changed his mind and said that all keys are equally keys, and that we do not need a special one in a relational database the way we did in a tape system.

But look at how many new SQL programmers use IDENTITY, GUID, ROWID, or other auto-numbering vendor extensions to get a key that can be used for locating a given row -- they are imitating a magnetic tape's sequential access. It lets them know the order in which a row was added to the table -- just like individual records went onto the end of the tape!
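A quick sketch of the difference (names invented; the IDENTITY syntax shown is the SQL Server dialect, other products spell it differently):

   -- sequential-file thinking: the "key" only records the order of arrival
   CREATE TABLE Personnel_bad
   (row_nbr INTEGER IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ssn CHAR(9) NOT NULL,                 -- the real identifier, left unprotected
    emp_name VARCHAR(35) NOT NULL);

   -- relational thinking: the key is an attribute of the thing being modeled
   CREATE TABLE Personnel
   (ssn CHAR(9) NOT NULL PRIMARY KEY,
    emp_name VARCHAR(35) NOT NULL);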

Dr. Codd defined a row as a representation of a single simple fact. A record is usually a combination of a lot of facts. That is, we do not normalize a file; we stuff data into it and hope that we have everything we need for an application. When the system needs new data, we add fields to the end of the record. That is how we got records that were measured in Kbytes.
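Here is a sketch of that difference, with invented names:

   -- file thinking: one record holds several facts and a repeating group of fields
   CREATE TABLE CustomerFile
   (cust_nbr INTEGER NOT NULL PRIMARY KEY,
    cust_name VARCHAR(35) NOT NULL,
    phone_1 VARCHAR(20),
    phone_2 VARCHAR(20),
    phone_3 VARCHAR(20));   -- and phone_4 gets bolted on next year

   -- relational thinking: each row states one simple fact
   CREATE TABLE Customers
   (cust_nbr INTEGER NOT NULL PRIMARY KEY,
    cust_name VARCHAR(35) NOT NULL);

   CREATE TABLE CustomerPhones
   (cust_nbr INTEGER NOT NULL
       REFERENCES Customers (cust_nbr),
    phone_nbr VARCHAR(20) NOT NULL,
    PRIMARY KEY (cust_nbr, phone_nbr));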

The fallacy you are trying to commit is storing general META-data as data in the database. It flat out does not work. The whole thing falls apart for the reasons you are now discovering, and because any typo becomes a new data type; the queries are nightmares that run like glue, etc.
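If you want to see why, here is a sketch of the kind of schema this leads to (all names invented for illustration) and what even a trivial query against it looks like:

   -- the "dynamic" design: one pile of name/value pairs, no real datatypes, no constraints
   CREATE TABLE UserDefinedFields
   (entity_id INTEGER NOT NULL,
    attribute_name VARCHAR(35) NOT NULL,   -- 'purchase_date', 'purchse_date', 'date_purchased', ...
    attribute_value VARCHAR(255),           -- everything is a string
    PRIMARY KEY (entity_id, attribute_name));

   -- reconstructing one logical row takes a self-join per "field" and a CAST on every value;
   -- the CAST blows up the first time somebody stored 'N/A' as a date
   SELECT D.entity_id,
          CAST(D.attribute_value AS DATE)          AS purchase_date,
          CAST(A.attribute_value AS DECIMAL(10,2)) AS amount
     FROM UserDefinedFields AS D,
          UserDefinedFields AS A
    WHERE D.entity_id = A.entity_id
      AND D.attribute_name = 'purchase_date'
      AND A.attribute_name = 'amount';

Ten user-defined "fields" in a report means a ten-way self-join, and a typo in attribute_name just silently drops the row.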

Find a data modeler to help and do this right.