Re: HELP with data model

From: Lawrence V. Rohrer <lrohrer_at_earthlink.net>
Date: 1995/12/28
Message-ID: <4bttap$52s_at_bolivia.it.earthlink.net>


one1000eye_at_aol.com (One1000eye) wrote:
>We are in the process of developing 'Skills system'. We have a
>many-to-many recursive Skill entity. That is, Skill can consist of set of
>skills, which ,in turn, can have many components. For example:
>PROGRAMMING is a skill and further classifications are C, C++, FORTRAN,
>SQL etc. SQL can belong to ORACLE, SYBASE etc.
>
>I am sure some of you might have figured this out and found a suitable
>relational representation. This is very simillar to Bill of Materials
>problem. Can any one share his/her experience? All suggestions are
>welcome. Thanks in advance.
>
>Alagu Kannayiram

Merry Christmas! (or other suitable holiday greeting!)

My first response to your problem is 'Why bother maintaining a recursive skills set -- It is likely the real business requirements c= an be achieved with just three levels: Occupation(s), business experience and a set of specific skills/buzzwords' So My personal pro= file might read:

Occupations:

	    Software architect
     database programmer
					technical manager
     Florist

Business skills: 
	     Entertainment rights licensing (distribution) contracts, warehouse management software, accounting software, project tracking,=
 banking MIS, technical management, floral industry (wholesale and retail), paperboy

Techinical Skills:

database design, data modeling, database performance tuning, Performance benchmarks, Sybase to Oracle conversions, Pl*Sql, Transac-S=
ql, SQL, Pro*C, Pro*Cobol, sqllib (Sybase C),  Oracle, Sybase, Sqlforms, Oracle forms, Accell programming, Dos .bat files, Model 204=
, CICS, Wang VS, Vax VMS, Unix, Macintosh, Sun, HP, Dec, IBM, Apple IIe, Radio Shack Model I (with 48K memory expansion box and a 8"=
 floppy holding 88K!) ( I occansionally think in hypertext << or sidebars >>)

The relationship of the above attributes to each other does not matter when real questions are asked of the database:

Find me the best programmer (in the world;-) who can convert my Model 204 database on a IBM mainframe into Oracle on a Solaris 2.4 S=
parc Center 1000.  This database tracks a retail/wholesale florists inventory, accounting and distribution contracts. Applicant must=
 have retail/wholesale florist experience, warehouse experience, communicate well with both programmers, warehouse management and fl=
orists, is available with two weeks notice and is willing to move to Hong Kong with a starting salary of $10,000(US) a month. (plus =
free room and board.:-)

Since this query has one unique individual (myself); put the check in the mail and don't bother writing any (more) code.

Now that the practical problem is solved what are the more esotaric (sp) problems:

First the above textual job requirement theoretically could be translated by an advance AI application into an attribute list of my = skills. For now I am going to dismiss this since (today) a human operator can best translate the textual desciption into a more usab= le form:

Position:

     Software Architect

Business environment:

            warehouse management, floral industry, contract management,

Required Skills:

     databse design, database programming, Oracle, SQL, Pl*SqL, oracle forms, Model 204, Unix, database conversion (cross bandwagon)= , Good business skills, Good communicator

Desired Skills

     Sun Solaris 2.4, IBM mainframe, ability to travel, CICS, willing to accept low $10,000/ month salary, sense of humor, managemen= t skills, ability to leap tall building in a single bound...

And with the help of some clever interface work create data that the computer will understand... but in what kind of data model you = still ask?

Well from my experience this problem is really a comparison of one entity with a SET of skills to another entity with a corralary(sp=
) SET of requirements. Data modeling wise one individual has many skills, one skill can be possesed by many people,  A job request c=
an have many  required skills (and optional skills). The programming problem is that the number of skills possesed by the applicant =
or required for the job is unlimited. Yes, practially speaking if the system handles 3, 6 or event a dozen skills 98% of the world w=
ould be satisfied.

I have come with several solutions over the years and none that is perfect (yet!).

First, some of this problem can be solved via good interface design. For instance if the applicant has "strong Oracle skill" the sys= tem could generate these records: "Oracle, SQL, Pl*SqL, oracle forms". The idea is to reduce the complexity of your "recursive skill= s" set down the lowest attributes required of the system.

Several years ago I was the principle programmer for (GABA's) Entertainment Rights Licensing System. This project had the high hopes=
 of standardizing how the entertainment industry managed distribution contracts. One of the problems was the the salesman (while on =
the floor at the Caan Film Festival :-) wanted to enter into his portable computer "Sell all TV and video distribution rights to the=
 new Terry Gillam movie for exclusive world wide distribution...". The lawyers had something a little different in mind: "...(Today)=
 "World Wide distribution" will mean these specific countries (x, y, z, ...). On January 15 "World Wide Distribution will mean these=
 countries (t, u, v, x, y, z) << They knew well in advance the outcome of various civil revolutions >> LOL

The solution was to have the user (salesman) enter codes that meant "World Wide distribution" and then the screen exploded/expanded = the codes out to the actual list of countries that were to be covered. The contract would then state the specific legal meaning(s). =

Based upon my experience filling out "skills sets" and reading a LOT of resumes trying to find good database programmers your skill = tracking application could do with this type of simplification.

simplify, simplify, simplify

(Henry David Thoreu)

Now that I have vented my initial emotional response...I will (try to) explain some of the other solutions I have tried. Keep in min= d that the issue is not the data model. That is reasonably straight forward. The problem is how to design the database to assist in = the search algorythm.

This problem is really not just a many to many problem. Retional algebra deals fairly well with those (and performance is usaullyy v= eryy good). Here we one entity with many attributes compared to another entity with many attributes. Compared AT THE ATTRIBUTE level=  Perhaps we could call this many to many to many to many. I will invent this symbology: M(m) to M(m). Implementing these in SQL, within accetable performance criteria, has been a major challege.

My experience with M(m) to M(m) relationships has been in the warehouse management business. The requirements were (are) (for instan=
ce) to find an empty location in the warehouse in the forward pick zone of the right branch of the warehouse, with the same or lower=
 velocity class as the product, that can handle the unit of measure of the product, that can hold all of the product and potentially=
 deal with security requirements, handling requirements and other issues. The smallest warehouses have 10,000 locations with the lar=
gest having close to 400,000 locations (skattered over 17 acres!).

In the first couple of attempts we were able to limit the customers expectations to 3 to 6 attributes that they really cared about a= nd that they felt were required. So the above business rule looked somenthing like this in code:

select location
  from locations
 where "location is empty"
   and "velocity class is equal or less than required"    and exists (select 1

         from location_zone lz
        where location = l.location
          and exists (select 1
                from zone
               where zone = lz.zone
                 and zone_function = 'forward pick'
              )
      )
   and exists (select 1
         from location_zone lz
        where location = l.location
          and exists (select 1
                from zone
               where zone = lz.zone
                 and zone_function = "required Branch"
              )
      )

   and volume >= required volume.

The data model is a mixture of fully normalized attributes (such as zone functionality: forward pick and Branch) and attributes whic= h are tracked directly on the locations table (volume, velocity class, empty)

The above statement can be reasonably fast. It works. This style of coding could be expanded beyond what most programmers reasonbly = want to maintain. (which is about 8 attributes for me) Depending on the size of the tables involved the performance would suffer in = geometric proportions though.

The next step is to tune the statement to take advantage of known distribution of table data. -- Find the attributes that are most s= elective. Remember in the opening job requirements list, Model 204 is much more obscure than Oracle or Florists. Thus as long as the=  above code can take advantage of such peculiarities, performance does not suffer.

In our case the customer's requirements did not have enough limiting factors. Some queries retreived thousands of rows. Thus, for pe= rformance reasons, we had to come with another approach.

(For the database modeling purest, It is time to put your peril sensitive glasses back on...)

So based upon very specific customer requirements we denormalized the data structure and placed in a seperate table the attributes we cared about:

create table location_functions (

        location,
 empty_flag,
 velocityy_class,
 forward_or_reserve_zone_type,
 branch,
 volume

        )

This allowed us to reduce the geometric searching times of some queries. Further by using SQL programming tricks such as "rownum =1"=  (and a few others) we were able to reduce search time closer to linear performance times. This is the data model and code that went=  into production.

Since I too try to be a modeling purest and it is getting harder to type with out being able to see, I have other ideas to toss abou= t.

Is SQL the best language to use for M(m) to M(m) problems?

I don't think so. The best SQL can do is to condense a large number of iterative loops into single statement (like the one above). F= urther, It still does not buy us unlimited number of attribute comparisons (Yes one could argue dynamic SQL -- done right -- could c= ut short some of the problems but not all of them)

Will object oreinted programming better solve this problem?

No. In fact having new and varied types of sets of objects only makes it worse. Though I don't consider myself and object programmer= , the extreem example is where different classes of employables each have differnt methods of determining "Yes I meet the critiria"

BTW-- Oracle version 8 would seemingly not solve the problem.

One thing I have experimented with is using an expert system. Cosmos or Cosmic? is an expert system developed by NASA and maintained=
 as a public domain "product". I tried loading up all of my locations and putaway calculation rules into this "database". I got resu=
lts equal to the results I got in Sybase/Oracleon the first call. The difference is that once I got the initial result subsequent re=
sults became very fast. Think of this system as pre-compiling the results (potential data records retrieved) of all of the business =
rules (SQL statements) in the system and subsequently additonal requests/queries are just a mater of activating the results set. -- =
sending the output to the calling program.

As usual I got called off onto another project before installing this in production or even simulation environment...

It is my best suggestion for your problem to first try and simplify the design to a set of attributes compared to another set of att= ributes. Navigating a hiarchial chain for this application does not make sense.

In the long run OLTP performance requirements may not be able to be met with standard SQL database engines IHO. There are just too m= any records to have to deal with. Perhaps we have to program call to third party expert system or find some way to duplicate the fas= ter than linear performance algorythms (precompiling/preconditioning the data for expected queries << denormalize >>)

It's 3:00AM. Time to send this out. Forgive the spelling mistakes...

Spread Peace on Earth,
Lawrence V. Rohrer
1500 Adams #105
Costa Mesa, California 92626
lrohrer_at_earthlink.net
http://www.earthlink.net/~lrohrer/index.html Received on Thu Dec 28 1995 - 00:00:00 CET

Original text of this message