Re: Theoretical Design Question
Date: 2000/01/31
Message-ID: <87468r$k4b$1_at_news.netvision.net.il>
Hey,
I was actually planning to use MS-SQL Server 7, and probably upgrade to MS-SQL Server 2000 once it's released. I only have experience with MS-SQL Server, so I asked around about Oracle. I didn't get the idea that Oracle performs much better than MSSQL, and since MSSQL would integrate smoothly into our Microsoft-based development environment, and would be much easier to code and maintain than Oracle, I decided to use it. However, if I'll be convinced that Oracle 8 is much more suitable for the task at hand (or maybe even the only choice), I'll consider it instead.
The database holds more information than I have described - my intentions are to separate all information, including items' attributes such as name and description from the thousands of "0..9" properties I previously described.
I think that the best way to think of this database is as a matchmaker's database. Given a list of 1 billion people, each with 3,000 different attributes, each consists of a number between 0 and 9, and given 1 billion profiles to look for in the database every day - calculate the best match.
A single person's attributes' values will rarely change, however - for some people the eye color of their mate is of importance 8, and for others it's of importance 2 - and I want to allow the importance rate to count in the matchmaking process.
The number of properties itself is also subject to change - but this is rather rare, and may also be done manually.
Basically the people's properties will never be browsed. They will only be compared in one method or another to a "wanted profile" to calculate a compatibility rate.
Note that despite the matchmaker analogue, the comparisons are not performed between one person to another, but rather between one person and a profile, according to the importance defined for each attribute in the profile.
Are you sure that the best way to maintain a huge amount of the presented data is a SQL Server? Wouldn't I gain faster data retrieval rates by creating my own not SQL-based, not relational data server relying on a flat file of some kind?
Thanks for your help,
Oz
- Original Message ----- From: "Jerry Gitomer" <jgitomer_at_erols.com> Newsgroups: comp.databases.theory Sent: Monday, January 31, 2000 8:01 AM Subject: Re: Theoretical Design Question
> Oz Ben Eliezer wrote:
> >
> > Hey!
> >
> > My question is as follows...
> >
> > I need to store items in my database. I want the database to be able to
> > handle up to 1,000,000,000 items. I'll explain what I mean by "handle"
in a
> > moment.
> >
> > Each item has its own properties. Each property is represented by a 0-9
> > number. I want each item to have several thousands of different
properties.
> >
> > Now I'll return to the "handle" thing. I want the system to perform
> > comparisons and calculations on the items' properties, according to
> > hardcoded rules defined by me, in the mission of finding the best
suitable
> > items for a specific requirement.
> >
> > For instance, suppose I have a "book" item which has 3 properties:
weight
> > (3), difficulty of reading (6) and prize (2). I'm trying to locate best
> > suitable itemsb for weight(2), difficulty(6) and prize(3). So the system
> > basically needs to go over all the items and calculate a value of
> > "compatibility" for each one, choosing the most appropriate one (or
several)
> > at the end.
> >
> > I hope I was clear enough...
> >
> > How would you implement that? I forgot to mention that such queries are
> > common enough to occur every millisecond. Would you use an SQL Server to
> > store the data structure just described? Should I create a database
system
> > of my own to handle these data manipulation requirements? If you would
> > implement that using an SQL Server - how would you do that? If
otherwise -
> > do you have some suggestions about how to design this data structure
outside
> > of the database? Maybe references to commercial software handling with
those
> > things?
> >
> > Thanks for your help!
> > Oz
>
> Given the size of your database and the frequency of queries you should
> use an RDBMS such as (in alphabetical order) DB2, Informix, or Oracle.
> I
> mention these because each can point to installations with billion row
> tables.
>
> You haven't provided sufficient information to allow your question
> to be answered without a lot of assumptions being made.
>
> For example when the database is queried is the user looking at 1
> billion items or some identifiable subset or a specific item?
>
> Are the properties for each item in a subset fixed or are they
> subject to frequent change?
>
> If, for example, the user is querying one specific item and wants
> to know if it is suitable I would design a two table structure, one for
> items and one for properties and calculate the suitability on the fly.
>
> If the user is querying all of the items in the database and
> the properties are relatively static I would look into a three table
> structure with one table for items, one table for properties, and one
> table for precalculated suitability matches (even though the suitability
> table could have several rows for each item).
>
> hth
>
> --
> Jerry Gitomer
> Once I learned how to spell DBA, I became one.
Received on Mon Jan 31 2000 - 00:00:00 CET