Re: "thou shalt not conflate meta-data with data"

From: Dawn M. Wolthuis <dwolt_at_tincat-group.comREMOVE>
Date: Wed, 2 Mar 2005 20:05:42 -0600
Message-ID: <d05re4$nti$1_at_news.netins.net>


"Paul" <paul_at_test.com> wrote in message news:42264735$0$23601$ed2e19e4_at_ptn-nntp-reader04.plus.net...
> Neo wrote:
>> When modeling "John is a person & John is male & male is a gender" as
>> shown below, which data is meta data? Which data "step[s] outside the
>> language to talk about the language itself" ?
>>
>> T_Gender
>> ID Name
>> 1 Male
>> 2 Female
>>
>> T_Person
>> ID Name Gender
>> 3 John Male(1)
>> 4 Mary Female(2)
>
> the data are the propositions:
> "John is male"
> "Mary is female"
> "Male is a gender"
> "Female is a gender".
>
> these propositions are grouped into predicates:
> "person P has gender G"
> "G is a gender"
>
> the meta-data would be things like "gender G is of varchar(10) type" or
> "the table representing the predicate 'person P has gender G' is called
> T_Person". Because if you're inside the model, these things are
> irrelevant. It's like the people inside the Matrix wondering what variable
> names are used to refer to them or something.
>
> This is from a relational database point of view, I guess you could have a
> system where both the things above are considered just as "data". As I
> said before, the terms "data" and "meta-data" are defined only within the
> context of a system, so you can't give an absolute answer.

I agree that one cannot tell metadata from data by looking at words in isolation. This means that some data modeling decisions depend on the software applications we know of to date, rather than being application-independent.

As for a process for identifying metadata, one step could be to turn a set of propositions into a corresponding predicate. The new words selected along the way (the ones that become column names, for example) are metadata. These same words could surely be data in another "system" of propositions, and even within this system a particular word could be both data and metadata.
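A rough sketch of that step, in Python purely for illustration. The tuple layout, the variable names, and the catalog table are assumptions of mine and not part of anyone's actual schema; only John, Mary, Gender and T_Person come from the example quoted above.

```python
# Propositions from the quoted example, written as (subject, object) pairs.
# The tuple layout is an assumption made for illustration.
propositions = [("John", "Male"), ("Mary", "Female")]

# Generalizing to the predicate "person P has gender G" forces us to pick
# new words for the placeholders. Those words become column names.
columns = ("Person", "Gender")   # metadata in this model
rows = list(propositions)        # data in this model

metadata_words = set(columns)
data_words = {value for row in rows for value in row}

# In another "system" of propositions the same word can be data, e.g. a
# hypothetical catalog stating "table T_Person has column Gender".
catalog_columns = ("TableName", "ColumnName")
catalog_rows = [("T_Person", "Gender")]
catalog_data_words = {value for row in catalog_rows for value in row}

# Words that are metadata in one model and data in the other.
print(sorted(metadata_words & catalog_data_words))
```

Here "Gender" is metadata in the first model and data in the catalog, which is the sense in which the same word can play both roles.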

It doesn't sit well with me when someone says not to confuse data with metadata, however. One can model a business situation with different sets of propositions, and these sets can show the same thing as "metadata" in one model and "data" in another. Choosing between two such alternatives is not a simple matter.

Any time there is a boolean column, the column name could instead be data in a column. For example, ActiveFlag (or maybe just Active -- Celko can tell me if that meets industry naming standards since I always forget where to find those) might have values of Y or N. We could instead have a Status column with "Active" or "Inactive" as values. A lot of the choices we make in data modeling come down to what to make data and what to make metadata. One person's data is another person's metadata.
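To make the boolean-column point concrete, here is a small sketch in Python. The row contents and the function names (`to_status_model`, `to_flag_model`) are hypothetical; the only things taken from the text above are the Active column with Y/N values and the Status column with "Active"/"Inactive" values.

```python
# Model A: the word "Active" is a column name (metadata).
flag_model = [
    {"Name": "John", "Active": "Y"},
    {"Name": "Mary", "Active": "N"},
]

def to_status_model(rows):
    """Move the word 'Active' out of the column name and into the values."""
    return [
        {"Name": r["Name"],
         "Status": "Active" if r["Active"] == "Y" else "Inactive"}
        for r in rows
    ]

def to_flag_model(rows):
    """Inverse direction: the value 'Active' becomes a column name again."""
    return [
        {"Name": r["Name"],
         "Active": "Y" if r["Status"] == "Active" else "N"}
        for r in rows
    ]

# Model B: the word "Active" is now data in the Status column.
status_model = to_status_model(flag_model)

# The two models carry the same information; only the data/metadata
# split differs, so the round trip returns the original rows.
assert to_flag_model(status_model) == flag_model
```

Neither form is the "right" one here. The round trip only shows that the choice does not change what is being asserted about John and Mary.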

--dawn

Received on Thu Mar 03 2005 - 03:05:42 CET
