Re: DB schema -> graph, good strategies
Date: Mon, 28 Oct 2002 20:09:01 +0100
Message-ID: <87smyqwfci.fsf_at_gvdnet.dk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
>>>>> "Jan" == Jan Hidders <hidders_at_REMOVE.THIS.uia.ua.ac.be> writes: Jan> I see. And the basic idea is rather simple, isn't it; if you are Jan> looking for a set of key words then they don't all have to be in Jan> the same column or the same tuple, but they can also be in Jan> related tuples where relatedness is measured in the number of Jan> foreign-key jumps. Looks very much like taking proximity into Jan> account, only generalized to more than two dimensions. I wonder Jan> if you could use the same techniques.
I would be shooting myself in the food if didn't, now wouldn't I? :-) Considering only single-tuple search results is next to useless in most production environments.
[Directionality of schema graphs]
Jan> Now that I know that I would say that in general the direction Jan> doesn't matter. If two tuples are related then it wouldn't make Jan> sense to say that the first is related to the second but the Jan> second not to the first. Having said that I could imagine that Jan> for certain relationships where you know what they mean Jan> (e.g. "contains", or "links to") there could be some asymmetry, Jan> but that would be a semantic issue and not something you could Jan> tell from looking at the schema and certainly not the direction Jan> of the foreign key reference.
This is exactly the core issue I'd like to explore further, although I haven't been able to find an obviously good place to start. Given the directionality fk -> pk, a graph
A <- B -> C
with a schema like A: (foo, bar, id1), B: (id1, id2), C: (baz, quux, id2) would obviously very likely be a plain many-to-many relationship between A and C. In most cases it would make more sense for the relationship between A and C in this case to be closer than if the graph had been A -> B -> C. How can this knowledge be exploited? And in which areas? Interesting stuff.
Jan> The same would probably be true for the importance of certain Jan> relationships (some might be "small jumps" and others could be Jan> "big jumps") but also that is a semantic issue.
Exactly. This way the schema graph could be made weighted. They take a primitive approach to this in the BANKS paper I mentioned the other day; that is to say their consideration of directionality is somewhat on the simple side, but am not evaluating the other mechanisms they use for weighting their graphs.
>> Imagine we have some ER model that represents our data as we prefer
>> to think of it. Anyone with basic knowledge of databases know that
>> when transfering this model to a relational database we need to do
>> data normalisation.
Jan> I'm not so sure of that. It is my experience that if you make a
Jan> good ORM / ER diagram you hardly ever need to normalize.
That depends on who makes them. If you get a DBA to make them, then they'll likely need less normalisation than if you get non-techie business folks to do it. My main point, however, remains the same: that most of the people who need to query databases will think of data in rather more abstract terms than what is actually implemented. And they don't find schema, left inner joins and arcane text-based interfaces as fascinating as we do. :-)
Jan> To the extent that it can be done that's actually quite Jan> simple. What you want is to find out whether a table corresponds Jan> to an entity type, a relationship type or (if you allow those) a Jan> mixed type. You can scratch possibilities using the rules that Jan> hold for the ER model you use, until no more possibilities can be Jan> scratched. For example it is true in almost all models that Jan> foreign keys only arrive in entity types and mixed types but Jan> never in relationship types.
This assumes that we have the original model, which is an assumption I'm not overly fond of.
Jan> This gets more difficult/ambiguous if you have rules that say Jan> something like you cannot have cycles with at least on entity Jan> type in them.
I'm afraid I don't understand what you mean.
Martin
- -- Homepage: http://www.cs.auc.dk/~factotum/ GPG public key: http://www.cs.auc.dk/~factotum/gpgkey.txt -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: Using Mailcrypt+GnuPG <http://www.gnupg.org>
iEYEARECAAYFAj29i00ACgkQYu1fMmOQldU0FQCeNzgkNDW+nuMYQS94axBiU0kq
Cj4AoOVgUC6BoOG02XUZ5OKd54anskoh
=IoSe
-----END PGP SIGNATURE-----
Received on Mon Oct 28 2002 - 20:09:01 CET