Re: What is this model technique called

From: --CELKO-- <71062.1056_at_compuserve.com>
Date: 10 Jun 2003 09:45:55 -0700
Message-ID: <c0d87ec0.0306100845.63f45eb1_at_posting.google.com>


>> [And columns are not fields] why not? <<

There are important differences between a file system and a database, a table and file, a row and record, and column and field. If you do not have a good conceptual model, you hit a ceiling and cannot get past a certain level of competency.

In 25 words or less it is "Logical versus Physical", but it goes beyond that. A file system is a loose collection of files, which have a lot of redundant data in them. A database schema is a single unit which models the entire enterprise as tables, constraints, etc.

Files are independent of each other, while tables in a database are interrelated. You open an entire database, not tables; you open individual files. Files cannot change each other; tables can.

The original idea of a database was to collect data in a way that avoided redundant data in too many files and not have it dependent on a particular programming language.

A file is made up of records, and records are made up of fields. A file is ordered and can be accessed by a PHYSICAL location, while a table is not. Saying "first record", "last record", and "next n records" makes sense in a file, but there is no concept of a "first row", "last row", and "next row" in a table.

A file is usually associated with a particular language -- ever try to read a Fortran file with a Cobol program? A database is language independent; the internal SQL datatypes are converted into host language datatypes.

A field exists *only* because of the program reading it; a column exists because it is in a table in a database. A column is independent of any host language application program that might use it.

In a procedural language, "READ a, b, c FROM FileX;" does not give the same results as "READ b, c, a FROM FileX;" and some languages will let you write "READ a, a, a FROM FileX;" In SQL, "SELECT a, b, c FROM TableX" returns the same data as "SELECT b, c, a FROM TableX" because things are located by name, not position.

A field is fixed or variable length, can repeated with an OCCURS in Cobol, etc. A field can change datatypes (union in 'C', VARIANT in Pascal, REDEFINES in Cobol).

A column is a scalar value, drawn from a single domain (datatype + constraints + relationships) and represented in one and only one datatype. You have no idea whatsoever how a column is represented internally; you never see it. Look at temporal datatypes; in SQL server, DATETIME is a binary number internally (UNIX style system clock representation), but TIMESTAMP is a string of digits in DB2 (Cobol style time representation). When you have a field, you have to worry about that physical representation. Do not worry about the bits; think of data in the abstract.

Rows and columns have constraints; files do not check themselves. Records and fields can have anything in them and often do!! Talk to anyone who has tried to build a data warehouse about that <g>. My favorite is an inventory system with part number "I hate my job!" appearing in several places, undetected for over ten years.

We were so used to the concept of sequential processing, that we thought of it as the nature of the universe. Even Dr. Codd started with the concept of a PRIMARY KEY in his first writing; it was the old sort key from sequential file processing (magnetic tape storage) in disguise.

It was awhile before Dr. Codd changed his mind and said that all keys are equally keys, and we don't need a special one in a relational database, like you did in a tape system.

But look at how many new SQL programmers use IDENTITY, GUID, ROWID, or other auto-numbering vendor extensions to get a key that can be used for locating a given row -- they are imitating a magnetic tape's sequential access. It lets them know the order in which a row was added to the table -- just like individual records went onto the end of the tape!

Dr. Codd defined a row as a representation of a single simple fact. A record is usually a combination of a lot of facts. That is, we don't normalize a file; you stuff data into it and hope that you have everything you need for an application. When the system needs new data, you add fields to the end of the record. That is how we got records that were measured in Kbytes.

Indexes are part of the PHYSICAL access methods and not all SQL products use them. SQL has Uniqueness constraints (PRIMARY KEY and UNIQUE) and Declarative Referential integrity (DRI -- FOREIGN KEY .. REFERENCES..). Received on Tue Jun 10 2003 - 18:45:55 CEST

Original text of this message