Re: Relational/hierarchical data problem

From: Bob Badour <>
Date: Tue, 18 Feb 2003 17:39:44 -0500
Message-ID: <YZy4a.33$>


I think I would probably change my assumptions slightly and include a 'self' record in the ancestry table where Generation=0, Consang=1 and Decendant=Ancestor.

Constraint C1 would change accordingly.

If you try it out, I would like to hear the results.


"Bob Badour" <> wrote in message news:T4y4a.23$
> Will,
> The elegant and theory-supported solution to your problem uses a
> closure. SQL doesn't have transitive closure; although, some products have
> proprietary extensions like CONNECT BY to handle this.
> I am not sure how many births you have to deal with. It might be possible
> pre-compute the transitive closure in another table then use that to
> evaluate the shared ancestors.
> Assuming your animal id's are INTEGER:
> CREATE TABLE Ancestry (
> Descendant INTEGER NOT NULL,
> Generation INTEGER NOT NULL,
> )
> PRIMARY KEY (Descendant, Ancestor, Generation)
> FOREIGN KEY (Descendant) REFERENCES Animal
> CONSTRAINT C1 CHECK (Generation >= 1)
> CONSTRAINT C2 CHECK (Consang >= 1)
> Whenever you insert a new Animal or change its parentage, you would need
> maintain the entries in Ancestry as far back as known.
> My assumptions:
> A parent would be Generation=1, a grandparent would be Generation=2, etc.
> If parents are half-siblings, one of the grandparents would have
> If a sire breeds a daughter, the sire would appear in both Generation=1
> Generation=2 with Consang=1 in both places. Of course, I expect you will
> amend my suggestion and assumptions as needed for your specific
> requirements.
> It seems to me that most of the ancestry table would be stable with
> occasional additions. I assume the insert performance should be
> Of course, I don't know whether you are breeding cattle, rabbits or
> so my assumption might not be valid.
> Unfortunately, I suspect that queries to maximize distance for all animals
> will still take quite a lot of processing time.
> Regards,
> Bob
> "Will" <> wrote in message
> > I'm working with data similar to that found in a family tree. In this
> > case it applies to animals and particularly breeding. The data is held
> > within a single table in a relational database and takes the form of a
> > single record for each animal, which includes two fields to reference
> > the unique ID of both the mother and the father.
> >
> > I need to determine how many shared ancestors exist between the
> > subject and any other animal and which ancestors are shared.
> >
> > I have already created a graphical "family tree" for any given animal.
> > This recursively retrieves a single record from the table, gets the id
> > for the mother and father and repeats the cycle for the required
> > number of generations, building a collection of ancestor records as it
> > goes. Performance is acceptable for this task.
> >
> > At present I am using the "family tree" approach to trace shared
> > ancestors between animals. This is not a problem for two animals, but
> > if the process is repeated for an entire herd performance begins to
> > take large hits. The problem is that shared ancestry must be checked
> > for the entire herd in order to determine which are the best breeding
> > candidates for any given animal, so I don't have any way to avoid this
> > task.
> >
> > I have played around with a few alternative approaches including:
> >
> > 1) temporary database table to hold ancestor records for every animal
> > in the herd. There are constant maintenance issues here to ensure that
> > the data stays current.
> > 2) XML/hierarchical storage for the data. Cross breeding across
> > generations makes this a nightmare.
> >
> > So far I have found nothing that will improve on performance. Is this
> > a common problem with a "best practice" solution, or can anyone
> > suggest how I could improve efficiency here?
> >
> > Regards
> >
> > Will
Received on Tue Feb 18 2003 - 23:39:44 CET

Original text of this message