Re: Advice on how to query data

From: Monty <mmontreaux_at_hotmail.com>
Date: 26 Mar 2005 07:38:03 -0800
Message-ID: <1111851483.560395.144730@o13g2000cwo.googlegroups.com>

> "Sybrand Bakker" <postbus_at_sybrandb.removeSPAM.demon.nl> wrote in
> message news:j5qa41po7ruq6ivkn3i8fa44urt58dr95g_at_4ax.com...

> 2 I'm not sure why you are using a series of 'ANDs' with the
> activities table joined multiple time, where you could do with only
> one occurence of this table in the from list, and an OR statement in
> the where clause. Despite the verbosity of your post, the table

Thank you for your post Sybrand. The reason I am adding the self joins is that I want the organisation name returned (from table "contacts") to only be returned when all rows in the child table "contactActs" are satisfied to contain all the "ContactActID" values (ie 17 and 51), not just one of them.

Take the following data for example.
These use the same table definitions as my original post.

CONTACTS TABLE:

ID ORGNAME
100 Company A Ltd
101 Organisation B plc

CONTACTACTS TABLE:

ID CONTACTACTID
100 17
100 51
100 23
101 17

If I just added an "OR" statement such and mentioned the table once, eg

WHERE contactActID=17 OR contactActID=51

I would get both "Company A Ltd" and "Organisation B plc" returned.
>From this very small bit of data, I would only want 100 - "Company A
Ltd" as this is the only company with BOTH rows present. Thus the self joins, 2 in the above example but up to 6 in my original post.

Hope this makes it more clear.

Thank you

Monty

Sybrand Bakker wrote:
> On 26 Mar 2005 04:07:14 -0800, "Monty" <mmontreaux_at_hotmail.com>
wrote:
>
> >Hi, I'm posting to the newsgroup in an attempt to formulate ideas on
> >how to tackle the poor performance of an SQL query I have written. I
> >already have an SQL solution but what I am interested in is one that
is
> >far better. I will elaborate more below but here is the project
> >background first.
> >
> >The organisation I am contracting for is using a 3rd party customer
> >relationship management (CRM) system. This is how it is used.
Contact
> >can be made with a customer or prospect, for example customer 100
> >"Customer A Ltd" (see code below). There are a number of different
> >kinds on contact - first contact may be made (ContactActID=1), the
> >customer may be phoned (ContactActID=5), the customer may phone the
> >organisation (ContactActID=17), a saleman may call them
> >(ContactActID=51), a contract may be issued (ContactActID=400), a
non
> >disclosure agreement may be signed (ContactActID=105), they may
phone
> >to ask to be taken off a mailing list (ContactActID=43), mail may be
> >returned "Recipient Unknown" (ContactActID=26) etc. There are
currently
> >46 different ContactActID's but the system has provision for a total
> >possible number 600. The CRM system is built around Lotus notes but
> >replication has been configured into an Oracle database. The
> >applications that are being built on top of this information use the
> >data in Oracle database (and not Lotus Notes).
> >
> >Every time contact is made, this is recorded through Lotus Notes. In
> >the Oracle database, the basics of what is replicated is a new row
in
> >table "contactActs". Table "contactActs" has a mandatory m:1
> >participation condition with table "contacts" and implemented
through a
> >NOT NULL FK constraint. Although things are far more complex than
this,
> >for the purposes of a testbed and explanation here, this is the
story.
> >Here is some typical code to to create the two necessary tables and
> >some dummy data for testing (this has been produced on Solaris SPARC
> >running 10g [Enterprise Edition] (EE) but the production system
> >currently uses 9i EE/same hardware).
> >
> >
> >CREATE TABLE contacts(ID NUMBER PRIMARY KEY, orgName VARCHAR2(40));
> >CREATE TABLE contactActs(ID NUMBER
> > CONSTRAINT FK_Contacts REFERENCES contacts(ID),
> > ContactActID NUMBER NOT NULL);
> >
> >INSERT INTO contacts
> > VALUES(100,'Company A Ltd');
> >INSERT INTO contacts
> > VALUES(101,'Organisation B plc');
> >INSERT INTO contacts
> > VALUES(102,'Organisation C plc');
> >INSERT INTO contacts
> > VALUES(103,'Organisation D partnership');
> >
> >--create a whole lot of dummy data
> >--to test query efficiency
> >DECLARE
> > randActivity NUMBER;
> > randContact NUMBER;
> >BEGIN
> > DBMS_RANDOM.SEED(TO_CHAR(SYSDATE,'sshhmm'));
> > FOR noActivities IN 1..100000 LOOP
> > randContact:=TRUNC(DBMS_RANDOM.VALUE(100, 103.5));
> > randActivity:=ROUND(DBMS_RANDOM.VALUE(1,600));
> >
> > IF NOT (randContact=103 AND randActivity=51) THEN
> > INSERT INTO contactActs
> > VALUES(randContact,randActivity);
> > END IF;
> >
> > END LOOP;
> >END;
> >/
> >
> >CREATE INDEX ixcontactActs
> > ON contactActs(id,ContactActID);
> >
> >
> >Although I do not have the sourcecode of the CRM system that
populates
> >the Oracle database, I am (because I am one of the developers) in
> >control of the system that queries the information replicated within
> >the Oracle database and can therefore have all the sourcecode to
> >optimize these SQL queries in this part of the system. Part of the
> >enhanced system is the ability to search this contact activity
> >information on an ad-hoc basis over a corporate intranet (ie someone
is
> >performing a search on the CRM system data in Oracle using the
> >components we are writing to find out who has made customer contact
&
> >when, get the full story of contact activity prior to picking up the
> >phone and so on). There are options in this component of the system
to
> >enable the person to filter down the resultset. Without going into
the
> >detail, there may be a requirement to (which demonstrates this
> >filtering) :-
> >
> >Give me the organisation name for all customers where the customer
has
> >phoned us (ContactActID=17) and also a salesman has called
> >(ContactActID=51). In the real system we would retrieve other
details
> >such as dates, salesman notes etc & display this in a browser but
this
> >is outside the scope of this NNTP posting.
> >
> >In terms of SQL, the way this could be coded is (where 17, and 51
were
> >appropriate bind variables for some otherwise static SQL).
> >
> > SELECT orgName
> > FROM contactActs acts1,contactActs acts2,contacts con1
> > WHERE con1.ID=acts1.ID
> > AND con1.ID=acts2.ID
> > AND acts1.contactActID=17
> > AND acts2.contactActID=51
> > GROUP BY orgName;
> >
> >When run against the dummy data created above, this query gives the
> >required result (there will almost certainly be other ContactActID's
> >other than 17 and 51 and this is okay as long as the query returns
hits
> >with both). I could use the same statement for ContactActID=43 and
26
> >or any other two valid ContactActID's without Oracle having to
reparse
> >a different bit of SQL. The use of bind variables is critically
> >important because of the often slow query time we are observing and
> >heavy overhead on our database server in addition to just good
design.
> >This is also documented many times in the Oracle docs and very
clearly
> >in Kyte/Expert One to One Oracle where he writes (p436) "We have
seen
> >for example that by not using bind variables, you might spend 90
> >percent of your execution time parsing queries instead of actually
> >running them....".
> >
> >The problem I have is that the user of the system may not select 2
(in
> >the above example 17 and 51) different ContactActID to query, but 3,
or
> >4, etc. Here I would have to construct a new SQL query to accomodate
3
> >ContactActID, or 4, or 5, or 6 etc thereby having to reparse
different
> >SQL statements all the time.
> >
> >For example, if the query were performed for customer 100 where
there
> >were ContactActID's 17, 51, 24, 54, 43, and 44 (where all these
numbers
> >were bind variables). I could use the following similar SQL. Of
course
> >it would need to be reparsed on the database server.
> >
> >
> > SELECT orgName
> > FROM contacts con,contactActs acts1,contactActs acts2,
> > contactActs acts3,contactActs acts4,
> > contactActs acts5,contactActs acts6
> > WHERE con.ID=100
> > AND con.ID=acts1.ID
> > AND con.ID=acts2.ID
> > AND con.ID=acts3.ID
> > AND con.ID=acts4.ID
> > AND con.ID=acts5.ID
> > AND con.ID=acts6.ID
> > AND acts1.contactActID=17
> > AND acts2.contactActID=51
> > AND acts3.contactActID=24
> > AND acts4.contactActID=54
> > AND acts5.contactActID=43
> > AND acts6.contactActID=44
> > GROUP BY orgName;
> >
> >If a query was performed with a different 6 (not 17, 51, 24 etc)
> >contactActID's, the server wouldn't have to parse the SQL statement
> >again and I would get the performance benefit from this. Not
> >unsurprisingly however, queries like the one above are VERY SLOW and
> >not suitable for a production environment (often taking tens of
seconds
> >using test data typical of what I have created above). Further, if
the
> >user decides to query 30 fields and not 6 as shown above, things
have
> >clearly gotton out of hand and would start to effect the performance
on
> >other systems. Having explained this in this much detail, I write to
> >ask is a better approach to querying information structured like
this.
> >Although I do not have control of the structures of tables
> >"contactActs" and "contacts", I have control over most everything
else
> >so could structure that for maximum efficiency. I just need advice
on
> >alternate solutions.
> >
> >Other solutions that has been suggested in the organisation, are:
> >
> >1. Put a trigger on table "contactActs" to populate a third table
> >containing the contact activity information with 600 columns, with
600
> >indexes, one for each contactActID, and using a huge SQL query to
query
> >each column in turn populating the values via bind variables etc.
Not
> >elegant. There is no objection to restructuring the data to make the
> >query efficient. The information would have to be sourced from the
> >tables "contactActs" and "contacts" (via triggers etc) but the
> >structure of these two tables cannot change.
> >2. A suggestion was made to use the Analytic Functions. We do not
> >currently have significant expertise with Analysic Functions but
after
> >a brief investigation concluded these would not aid us here anyway.
> >
> >Further, over the coming 3 years, we wish to move away from the
Lotus
> >Notes system and do not want to currently further enhance
integration
> >with it (that is I do not want to query the Lotus Notes database
> >directly to retrieve the customer contact history).
> >
> >Ideas for other or more performant options anyone? And thank you
for
> >reading this far.
> >Thank you
> >Monty
>
> 1 Post should always contain the Oracle version number, usually
> consisting of only 5 digits and three dots, so 8 keystrokes. Most
> answer are version dependent.
> 2 I'm not sure why you are using a series of 'ANDs' with the
> activities table joined multiple time, where you could do with only
> one occurence of this table in the from list, and an OR statement in
> the where clause. Despite the verbosity of your post, the table
> definitions lack, so I can't see why this shouldn't be possible.
>
>
> --
> Sybrand Bakker, Senior Oracle DBA
Received on Sat Mar 26 2005 - 09:38:03 CST