Re: Relational/hierarchical data problem

From: Bob Badour <bbadour_at_golden.net>
Date: Tue, 18 Feb 2003 16:38:52 -0500
Message-ID: <T4y4a.23$Ct3.2747515_at_mantis.golden.net>


Will,

The elegant and theory-supported solution to your problem uses a transitive closure. SQL doesn't have transitive closure; although, some products have proprietary extensions like CONNECT BY to handle this.

I am not sure how many births you have to deal with. It might be possible to pre-compute the transitive closure in another table then use that to evaluate the shared ancestors.

Assuming your animal id's are INTEGER:

CREATE TABLE Ancestry (
  Descendant INTEGER NOT NULL,
  Ancestor INTEGER NOT NULL,
  Generation INTEGER NOT NULL,
  Consang INTEGER NOT NULL
)

PRIMARY KEY (Descendant, Ancestor, Generation)
FOREIGN KEY (Descendant) REFERENCES Animal
FOREIGH KEY (Ancestory) REFERENCES Animal
CONSTRAINT C1 CHECK (Generation >= 1)
CONSTRAINT C2 CHECK (Consang >= 1)

Whenever you insert a new Animal or change its parentage, you would need to maintain the entries in Ancestry as far back as known.

My assumptions:

A parent would be Generation=1, a grandparent would be Generation=2, etc. If parents are half-siblings, one of the grandparents would have Consang=2. If a sire breeds a daughter, the sire would appear in both Generation=1 and Generation=2 with Consang=1 in both places. Of course, I expect you will amend my suggestion and assumptions as needed for your specific requirements.

It seems to me that most of the ancestry table would be stable with occasional additions. I assume the insert performance should be acceptable. Of course, I don't know whether you are breeding cattle, rabbits or salmon, so my assumption might not be valid.

Unfortunately, I suspect that queries to maximize distance for all animals will still take quite a lot of processing time.

Regards,
Bob

"Will" <mrbrown_1998_at_yahoo.com> wrote in message news:8792b468.0302180552.7816cbe3_at_posting.google.com...
> I'm working with data similar to that found in a family tree. In this
> case it applies to animals and particularly breeding. The data is held
> within a single table in a relational database and takes the form of a
> single record for each animal, which includes two fields to reference
> the unique ID of both the mother and the father.
>
> I need to determine how many shared ancestors exist between the
> subject and any other animal and which ancestors are shared.
>
> I have already created a graphical "family tree" for any given animal.
> This recursively retrieves a single record from the table, gets the id
> for the mother and father and repeats the cycle for the required
> number of generations, building a collection of ancestor records as it
> goes. Performance is acceptable for this task.
>
> At present I am using the "family tree" approach to trace shared
> ancestors between animals. This is not a problem for two animals, but
> if the process is repeated for an entire herd performance begins to
> take large hits. The problem is that shared ancestry must be checked
> for the entire herd in order to determine which are the best breeding
> candidates for any given animal, so I don't have any way to avoid this
> task.
>
> I have played around with a few alternative approaches including:
>
> 1) temporary database table to hold ancestor records for every animal
> in the herd. There are constant maintenance issues here to ensure that
> the data stays current.
> 2) XML/hierarchical storage for the data. Cross breeding across
> generations makes this a nightmare.
>
> So far I have found nothing that will improve on performance. Is this
> a common problem with a "best practice" solution, or can anyone
> suggest how I could improve efficiency here?
>
> Regards
>
> Will
Received on Tue Feb 18 2003 - 22:38:52 CET

Original text of this message