Re: Advice on how to query data

From: Sybrand Bakker <postbus_at_sybrandb.demon.nl>
Date: Sat, 26 Mar 2005 14:56:03 +0100
Message-ID: <j5qa41po7ruq6ivkn3i8fa44urt58dr95g@4ax.com>

On 26 Mar 2005 04:07:14 -0800, "Monty" <mmontreaux_at_hotmail.com> wrote:

>Hi, I'm posting to the newsgroup in an attempt to formulate ideas on
>how to tackle the poor performance of an SQL query I have written. I
>already have an SQL solution but what I am interested in is one that is
>far better. I will elaborate more below but here is the project
>background first.
>
>The organisation I am contracting for is using a 3rd party customer
>relationship management (CRM) system. This is how it is used. Contact
>can be made with a customer or prospect, for example customer 100
>"Customer A Ltd" (see code below). There are a number of different
>kinds on contact - first contact may be made (ContactActID=1), the
>customer may be phoned (ContactActID=5), the customer may phone the
>organisation (ContactActID=17), a saleman may call them
>(ContactActID=51), a contract may be issued (ContactActID=400), a non
>disclosure agreement may be signed (ContactActID=105), they may phone
>to ask to be taken off a mailing list (ContactActID=43), mail may be
>returned "Recipient Unknown" (ContactActID=26) etc. There are currently
>46 different ContactActID's but the system has provision for a total
>possible number 600. The CRM system is built around Lotus notes but
>replication has been configured into an Oracle database. The
>applications that are being built on top of this information use the
>data in Oracle database (and not Lotus Notes).
>
>Every time contact is made, this is recorded through Lotus Notes. In
>the Oracle database, the basics of what is replicated is a new row in
>table "contactActs". Table "contactActs" has a mandatory m:1
>participation condition with table "contacts" and implemented through a
>NOT NULL FK constraint. Although things are far more complex than this,
>for the purposes of a testbed and explanation here, this is the story.
>Here is some typical code to to create the two necessary tables and
>some dummy data for testing (this has been produced on Solaris SPARC
>running 10g [Enterprise Edition] (EE) but the production system
>currently uses 9i EE/same hardware).
>
>
>CREATE TABLE contacts(ID NUMBER PRIMARY KEY, orgName VARCHAR2(40));
>CREATE TABLE contactActs(ID NUMBER
> CONSTRAINT FK_Contacts REFERENCES contacts(ID),
> ContactActID NUMBER NOT NULL);
>
>INSERT INTO contacts
> VALUES(100,'Company A Ltd');
>INSERT INTO contacts
> VALUES(101,'Organisation B plc');
>INSERT INTO contacts
> VALUES(102,'Organisation C plc');
>INSERT INTO contacts
> VALUES(103,'Organisation D partnership');
>
>--create a whole lot of dummy data
>--to test query efficiency
>DECLARE
> randActivity NUMBER;
> randContact NUMBER;
>BEGIN
> DBMS_RANDOM.SEED(TO_CHAR(SYSDATE,'sshhmm'));
> FOR noActivities IN 1..100000 LOOP
> randContact:=TRUNC(DBMS_RANDOM.VALUE(100, 103.5));
> randActivity:=ROUND(DBMS_RANDOM.VALUE(1,600));
>
> IF NOT (randContact=103 AND randActivity=51) THEN
> INSERT INTO contactActs
> VALUES(randContact,randActivity);
> END IF;
>
> END LOOP;
>END;
>/
>
>CREATE INDEX ixcontactActs
> ON contactActs(id,ContactActID);
>
>
>Although I do not have the sourcecode of the CRM system that populates
>the Oracle database, I am (because I am one of the developers) in
>control of the system that queries the information replicated within
>the Oracle database and can therefore have all the sourcecode to
>optimize these SQL queries in this part of the system. Part of the
>enhanced system is the ability to search this contact activity
>information on an ad-hoc basis over a corporate intranet (ie someone is
>performing a search on the CRM system data in Oracle using the
>components we are writing to find out who has made customer contact &
>when, get the full story of contact activity prior to picking up the
>phone and so on). There are options in this component of the system to
>enable the person to filter down the resultset. Without going into the
>detail, there may be a requirement to (which demonstrates this
>filtering) :-
>
>Give me the organisation name for all customers where the customer has
>phoned us (ContactActID=17) and also a salesman has called
>(ContactActID=51). In the real system we would retrieve other details
>such as dates, salesman notes etc & display this in a browser but this
>is outside the scope of this NNTP posting.
>
>In terms of SQL, the way this could be coded is (where 17, and 51 were
>appropriate bind variables for some otherwise static SQL).
>
> SELECT orgName
> FROM contactActs acts1,contactActs acts2,contacts con1
> WHERE con1.ID=acts1.ID
> AND con1.ID=acts2.ID
> AND acts1.contactActID=17
> AND acts2.contactActID=51
> GROUP BY orgName;
>
>When run against the dummy data created above, this query gives the
>required result (there will almost certainly be other ContactActID's
>other than 17 and 51 and this is okay as long as the query returns hits
>with both). I could use the same statement for ContactActID=43 and 26
>or any other two valid ContactActID's without Oracle having to reparse
>a different bit of SQL. The use of bind variables is critically
>important because of the often slow query time we are observing and
>heavy overhead on our database server in addition to just good design.
>This is also documented many times in the Oracle docs and very clearly
>in Kyte/Expert One to One Oracle where he writes (p436) "We have seen
>for example that by not using bind variables, you might spend 90
>percent of your execution time parsing queries instead of actually
>running them....".
>
>The problem I have is that the user of the system may not select 2 (in
>the above example 17 and 51) different ContactActID to query, but 3, or
>4, etc. Here I would have to construct a new SQL query to accomodate 3
>ContactActID, or 4, or 5, or 6 etc thereby having to reparse different
>SQL statements all the time.
>
>For example, if the query were performed for customer 100 where there
>were ContactActID's 17, 51, 24, 54, 43, and 44 (where all these numbers
>were bind variables). I could use the following similar SQL. Of course
>it would need to be reparsed on the database server.
>
>
> SELECT orgName
> FROM contacts con,contactActs acts1,contactActs acts2,
> contactActs acts3,contactActs acts4,
> contactActs acts5,contactActs acts6
> WHERE con.ID=100
> AND con.ID=acts1.ID
> AND con.ID=acts2.ID
> AND con.ID=acts3.ID
> AND con.ID=acts4.ID
> AND con.ID=acts5.ID
> AND con.ID=acts6.ID
> AND acts1.contactActID=17
> AND acts2.contactActID=51
> AND acts3.contactActID=24
> AND acts4.contactActID=54
> AND acts5.contactActID=43
> AND acts6.contactActID=44
> GROUP BY orgName;
>
>If a query was performed with a different 6 (not 17, 51, 24 etc)
>contactActID's, the server wouldn't have to parse the SQL statement
>again and I would get the performance benefit from this. Not
>unsurprisingly however, queries like the one above are VERY SLOW and
>not suitable for a production environment (often taking tens of seconds
>using test data typical of what I have created above). Further, if the
>user decides to query 30 fields and not 6 as shown above, things have
>clearly gotton out of hand and would start to effect the performance on
>other systems. Having explained this in this much detail, I write to
>ask is a better approach to querying information structured like this.
>Although I do not have control of the structures of tables
>"contactActs" and "contacts", I have control over most everything else
>so could structure that for maximum efficiency. I just need advice on
>alternate solutions.
>
>Other solutions that has been suggested in the organisation, are:
>
>1. Put a trigger on table "contactActs" to populate a third table
>containing the contact activity information with 600 columns, with 600
>indexes, one for each contactActID, and using a huge SQL query to query
>each column in turn populating the values via bind variables etc. Not
>elegant. There is no objection to restructuring the data to make the
>query efficient. The information would have to be sourced from the
>tables "contactActs" and "contacts" (via triggers etc) but the
>structure of these two tables cannot change.
>2. A suggestion was made to use the Analytic Functions. We do not
>currently have significant expertise with Analysic Functions but after
>a brief investigation concluded these would not aid us here anyway.
>
>Further, over the coming 3 years, we wish to move away from the Lotus
>Notes system and do not want to currently further enhance integration
>with it (that is I do not want to query the Lotus Notes database
>directly to retrieve the customer contact history).
>
>Ideas for other or more performant options anyone? And thank you for
>reading this far.
>Thank you
>Monty

1 Post should always contain the Oracle version number, usually consisting of only 5 digits and three dots, so 8 keystrokes. Most answer are version dependent.
2 I'm not sure why you are using a series of 'ANDs' with the activities table joined multiple time, where you could do with only one occurence of this table in the from list, and an OR statement in the where clause. Despite the verbosity of your post, the table definitions lack, so I can't see why this shouldn't be possible.

--
Sybrand Bakker, Senior Oracle DBA

Received on Sat Mar 26 2005 - 07:56:03 CST