Re: Database Naming Conventions

From: --CELKO-- <71062.1056_at_compuserve.com>
Date: 26 Dec 2001 13:55:41 -0800
Message-ID: <c0d87ec0.0112261355.14e71ea8_at_posting.google.com>


>> I would like having some input on the topic of Database Naming Conventions (=naming database objects). Are there any "de-facto" standards ... <<

Here is a short summary of the NCITS L8 Metadata Standards Committee rules for data elements:

The 11179 Standard is broken down into six sections:

  11179-1: Framework for the Specification and Standardization of Data Elements

  11179-2: Classification for Data Elements

  11179-3: Basic Attributes of Data Elements

  11179-4: Rules and Guidelines for the Formulation of Data Definitions

  11179-5: Naming and Identification Principles for Data Elements

  11179-6: Registration of Data Elements

Since I cannot reprint the standard, let me remark on the highlights of some of these sections.

Naming Data Elements

Section 11179-4 has a good simple set of rules for defining a data element. A data definition shall:

  1. be unique (within any data dictionary in which it appears)
  2. be stated in the singular
  3. state what the concept is, not only what it is not
  4. be stated as a descriptive phrase or sentence(s)
  5. contain only commonly understood abbreviations
  6. be expressed without embedding definitions of other data elements or underlying concepts

The document then goes on to explain how to apply these rules with illustrations. There are three kinds of rules that form a complete naming convention:

  • Semantic rules based on the components of data elements.
  • Syntax rules for arranging the components within a name.
  • Lexical rules for the language-related aspects of names.

While the following naming convention is oriented to the development of application-level names, the rule set may be adapted to the development of names at any level.

Annex A of ISO 11179-5 gives an example of all three of these rules.

Levels of rules progress from the most general (conceptual) and become more and more specific (physical). The objects at each level are called data element components and they are assembled, in part or whole, into names. The idea is that the final names will be both as discrete and complete as possible.

While this formalism is nice in theory, names are subject to constraints imposed by software limitations in the real world. Another problem is that one data element may have many names depending on the context in which it is used. It might be called something in a report and something else in an EDI file. Provision for identification of synonymous names is made through sets of name-context pairs in the element description. Since many names may be associated with a single data element, it is important to also use a unique identifier, usually in the form of a number, to distinguish each data element from any other. ISO 11179-5 discusses assigning this identifier at the International registry level. Both the identifier and at least one name are considered necessary to comply with ISO 11179-5. Each organization should decide the form of identifier best suited to its individual requirements.
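To make the name-context idea concrete, here is a minimal Python sketch of a registry entry. It is not taken from the standard: the class names `DataElement` and `Registry`, the identifier 1001, and the sample names are all assumptions made up for illustration. It only shows one unique identifier carrying several names, each tied to the context in which it is used.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """One registered data element: a unique identifier plus name-context pairs."""
    identifier: int  # hypothetical numeric identifier chosen by the organization
    definition: str
    # Maps a context (where the name is used) to the name used in that context.
    names_by_context: dict = field(default_factory=dict)

    def add_name(self, context: str, name: str) -> None:
        self.names_by_context[context] = name

    def synonyms(self) -> set:
        """All names this element is known by, regardless of context."""
        return set(self.names_by_context.values())


class Registry:
    """Hypothetical in-memory registry keyed by identifier."""

    def __init__(self) -> None:
        self._elements = {}

    def register(self, element: DataElement) -> None:
        # Both an identifier and at least one name are required.
        if element.identifier in self._elements:
            raise ValueError(f"identifier {element.identifier} already registered")
        if not element.names_by_context:
            raise ValueError("a data element needs at least one name")
        self._elements[element.identifier] = element

    def find_by_name(self, name: str) -> list:
        """Return every element that uses this name in some context."""
        return [e for e in self._elements.values() if name in e.synonyms()]


registry = Registry()
country_name = DataElement(identifier=1001, definition="The name of a country.")
country_name.add_name("report", "Country Name")
country_name.add_name("EDI file", "CNTRY_NM")
registry.register(country_name)
```

Looking up either "Country Name" or "CNTRY_NM" leads back to the same identifier, which is the point of keeping the identifier separate from the names.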

Levels of Abstraction

Name development begins at the conceptual level. An object class represents an idea, abstraction or thing in the real world, such as tree or country. A property is something that describes all objects in the class, such as height or identifier. This lets us form terms such as "tree height" or "country identifier" from the combination of the class and the property.

The next level in the process is the logical level. A complete logical data element must include a form of representation for the values in its data value domain (the set of possible valid values of a data element). The representation term describes the data element's representation class. The representation class is equivalent to the class word of the prime/class naming convention many data administrators are familiar with. This gets us to "tree height measure", "country identifier name" and "country identifier code" as possible data elements.

There is a subtle difference between "identifier name" and "identifier code" and it might be so subtle that we do not want to model it. But we would need a rule to drop the property term in this case. The property would still exist as part of the inheritance structure of the data element, but it would not be part of the data element name.
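As a rough illustration of how the object class, property and representation terms combine, here is a small Python sketch. The `LogicalElement` type and its `drop_property` flag are my own invention for this example, not something the standard defines. The property term stays in the structure even when it is left out of the name.

```python
from typing import NamedTuple


class LogicalElement(NamedTuple):
    """A logical data element built from its three components."""
    object_class: str    # e.g. "country"
    property_term: str   # e.g. "identifier"
    representation: str  # e.g. "code"

    def name(self, drop_property: bool = False) -> str:
        """Join the terms into a name, optionally omitting the property term."""
        if drop_property:
            terms = [self.object_class, self.representation]
        else:
            terms = [self.object_class, self.property_term, self.representation]
        return " ".join(terms)


tree_height = LogicalElement("tree", "height", "measure")
country_code = LogicalElement("country", "identifier", "code")

print(tree_height.name())                     # tree height measure
print(country_code.name())                    # country identifier code
print(country_code.name(drop_property=True))  # country code
```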

Some logical data elements can be considered generic elements if they are well-defined and are shared across organizations. Country names and country codes are well-defined in ISO Standard 3166, Codes for the Representation of Names of Countries, and you might simply reference this document.

Note that this is the highest level at which true data elements, by the definition of ISO 11179, appear: they have an object class, a property, and a representation.

The next level is the application level. A name at this level is usually formed with a qualifier that applies to the particular application. The qualifier will either subset the data value domain or add more restrictions to the definition so that we work with only those values needed in the application.

For example, assume that we are using ISO 3166 country codes, but we are only interested in Europe. This would be a simple subset of the standard, and one that will not change over time. By contrast, the subset of countries with more than 20 cm of rain this year will vary greatly over time.

Changes in the name to reflect this are made by adding qualifier terms to the logical name. For example, if an application of "country name" were to list all the countries a certain organization had trading agreements with, the application data element would be called "trading partner country name". The data value domain would consist of a subset of the countries listed in ISO 3166. Note that the qualifier term "trading partner" is itself an object class. This relationship could be expressed as a hierarchy in the data model.
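The qualifier step can be sketched in a few lines of Python. This is only an illustration under assumptions: the function `qualify`, the four-country sample domain, and the choice of trading partners are all made up for the example. The real ISO 3166 list is far longer.

```python
# A tiny illustrative slice of ISO 3166 alpha-2 codes, not the full standard.
COUNTRY_NAME_DOMAIN = {"CA": "Canada", "DE": "Germany", "FR": "France", "JP": "Japan"}


def qualify(logical_name, domain, qualifier, allowed_codes):
    """Build an application-level element: a qualified name plus a subset of the domain."""
    unknown = set(allowed_codes) - set(domain)
    if unknown:
        # A qualifier may only narrow the logical value domain, never extend it.
        raise ValueError(f"codes not in the logical value domain: {sorted(unknown)}")
    application_name = f"{qualifier} {logical_name}"
    subset = {code: domain[code] for code in allowed_codes}
    return application_name, subset


# Hypothetical organization that trades with Germany and Japan only.
name, partners = qualify("country name", COUNTRY_NAME_DOMAIN, "trading partner", ["DE", "JP"])
print(name)      # trading partner country name
print(partners)  # {'DE': 'Germany', 'JP': 'Japan'}
```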

The physical name is the lowest level. These are the names which actually appear in the database table column headers, file descriptions, EDI transaction file layouts, and so forth. They may be abbreviations or use a limited character set because of software restrictions. However, they might also add information about their origin or format.
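Here is one way the step from an application name to a physical name might look in Python. Everything specific in it is an assumption: the abbreviation table, the underscore separator, and the default length limit of 30 characters are placeholders, and a real shop would take them from its own data dictionary and its own software's limits.

```python
# Hypothetical abbreviation table; not part of ISO 11179.
ABBREVIATIONS = {
    "country": "cntry",
    "name": "nm",
    "trading": "trd",
    "partner": "ptnr",
    "identifier": "id",
    "code": "cd",
}


def physical_name(logical_name: str, max_length: int = 30) -> str:
    """Derive a column-style name, abbreviating only if the full form is too long."""
    words = logical_name.lower().split()
    candidate = "_".join(words)
    if len(candidate) <= max_length:
        return candidate
    # Fall back to the abbreviation table; unknown words are kept whole.
    candidate = "_".join(ABBREVIATIONS.get(word, word) for word in words)
    if len(candidate) > max_length:
        raise ValueError(f"cannot fit {logical_name!r} into {max_length} characters")
    return candidate


print(physical_name("Trading partner country name"))                 # trading_partner_country_name
print(physical_name("Trading partner country name", max_length=18))  # trd_ptnr_cntry_nm
```

Abbreviating only when forced keeps the physical name as close to the logical name as the software allows.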

In a registry, each data element name and name component will always be paired with its context, so that we know the source or usage of the name or name component. The goal is to be able to trace each data element from its source to wherever it is used, regardless of the name it appears under.

Registering Standards

Section 11179-6 is an attempt to build a list of universal data elements and specify their meaning and format. This includes codes for sex, currency, country names and many other things.

Received on Wed Dec 26 2001 - 22:55:41 CET
