Re: Relational/hierarchical data problem

From: Will <>
Date: 25 Feb 2003 07:03:05 -0800
Message-ID: <>

Hi Neo,

In answer to your questions:  

> 1. Should the Consag Report take the animal's sex(M/F) into
> consideration? Should g1 and gX be opposite sex? The above report did
> not take the animal's sex into consideration.

Establishing consanguinity for two individuals does not require that gender be known. I will require gneder to be recorded/calculable for other purposes in this case (e.g. bloodline diagram in "family tree" format with colour coding).

> 2. Should the Consag Report take an animal's ancestor level into
> consideration. When the distance between g1 and gX is equal to g1 and
> gY, would gY be preferred if its generation-level was closest to that
> of g1? The above report does not show generation-levels directly.

I think generations should be calculated from the data rather than stored explicitly. This is particularly true for goats, because line breeding factors mean that the generations are usually nowhere near as distinctly defined as would be the case with humans.

> 3. Is the provided report usable or missing needed info?

It should be usable, provided a link can be provided for each record to retrieve things like gender, name and so on.

> Ideally, if you can email ( your data as a text
> file that would provide the best comparison. Otherwise, how many pairs
> of goats did you start with, approx how many ancestor levels do you
> have, approx how many goats total, etc.

I think this was covered earlier in the thread. We have 41 animals, 208 animal records including current and ancestors, and 5 generations is the furthest back that our information goes.

Recently I found a web site outlining the "Knot System" (, a proposal for a scientific notation for genealogy. It dates back to 1998 and does not seem to have been widely adopted, but it does make interesting reading and I think it could be useful in this case.

The Knot System is based on the "Sequential System" (, which advocates storing genealogy data as sequential integers. For example, proband = 1, father = 2, mother = 3, father's father = 4, father's mother = 5, mother's father = 6, mother's mother = 7 and so on. From what I gather the Sequential System is used by a number of currently available genealogy programs.

Going back to my original problem, this system would certainly speed things up as Bob suggested. I would have two database tables, one containing details about each animal such as birth date and weight, gender and so on. The second table would basically build an index for ancestry of all the animals in the first table, storing the id of the proband, the id of the ancestor and the relevant integer value according to the Sequential System. In this manner it would be possible to derive the generation and gender of each ancestor without even referring to the second table.

This approach does have a high maintenance cost, but the improvement in performance for things like bloodlines and breeding comparison should be huge. Taking a 5 generation bloodline as an example, the recursive approach would require 30 database queries (proband's father, proband's mother, proband's father's father, etc). Using the Sequential System this could be cut to a single query such as "SELECT * FROM IndexTable WHERE probandID = 1 ORDER BY ancestorID". Adding a join to the query could provide access to additional data about each animal for reporting purposes.

To determine shared ancestors for 2 individuals over 5 generations, the recursive system would require 60 queries and additional code for ordering, etc. With the Sequential System, I could achieve the same result by doing something like

"SELECT g1.ancestorID FROM
((SELECT ancestorID FROM IndexTable WHERE probandID = [Proband1]) as g1
(SELECT ancestorID FROM IndexTable WHERE probandID = [Proband2]) as g2 ON g1.ancestorID = g2.ancestorID)
ORDER BY g1.ancestorID"


Will Received on Tue Feb 25 2003 - 16:03:05 CET

Original text of this message