Re: fastest method for reading tables

From: Keith Boulton <boulke_at_globalnet.co.uk>
Date: 1997/11/18
Message-ID: <347203ae.6006426@read.news.global.net.uk>

On 18 Nov 1997 00:17:51 GMT, hasan_at_DB.Stanford.EDU (Waqar Hasan) wrote:

>
>I have an application program that needs to read data
>by the fastest method possible.
>
>Is OCI the fastest available interface?
>What parameters need to be tuned to get the application
>to go faster?
>Is there any other interface that might go faster than OCI?
>
>The application makes queries that retrieve *all* rows of a table
>(but only a subset of the columns).
>
>The application does not care if the data is slightly out of date.
>
>thanks,
>
>-Waqar Hasan
> hasan_at_db.stanford.edu

I would be surprised if the OCI was significantly faster than a Pro* embedded SQL program. I don't have a suitable compiler at the moment to experiment.

The work going on in a query can be split into two seperately optimised stages:

Read data from the database to a server process.
Communiciate the results from the server process to a user process.

To optimise step 1 involves ensuring that the data blocks are available in the sga i.e. the

db_block_buffers initialisation parameter should be as high as possible without inducing paging

on the server and the server process should rarely be shutdown so that the cache hit ratio is

as high as possible.

If your application primarily accesses tables by full table scan, then issuing ALTER TABLE

<your table> CACHE will favour your table at the expense of others though this may not have the

desired effect.
If you are accessing the table via a full-table scan and only accessing a small proportion of

each row ( e.g. 3 of 20 columns, or only 30 bytes out of an average record length of 2000 ) it

may be worth splitting the table into two with a one-to-one relationship between them although

the same effect can be achieved by using an index fast full scan (index_ffs hint ) - see the

Oracle Server Tuning manual.
Alternatively, it may be worth defining one or more composite indexes on the columns required.

Oracle can use this index to order the rows, and if all the columns required to be retrieved

are in the index no data block accesses will be required. Even without ordering, Oracle

(version 8 at least) can do a fast full scan of an index in place of a full table scan where

all the columns in the select list are in the index so reducing the number of data blocks to be

read.

Examples:
create table bill ( a varchar2(20) not null, b varchar2(2000) not null, c varchar2(2000), d

number not null )
storage( initial 2m next 2m pctincrease 0 );

After inserting 32768 records. Note in all the following tests, the ratio real to elapsed was

similar.

select avg(a) from bill; almost entirely reflects the cost of the full table scan on the server

giving real: 7811ms
repeating the same query gives real: 3500 ms reflecting the impact of caching in the sga and in

this case, if file buffers. Analyze table has no impact on these results except that the

analyze command itself causes some data to be cached. Issuing alter table bill cache; followed by select avg(a) from bill; gives real: approx 3500ms

but subsequent executions give real: approx 550 ms, nearly seven times faster. I would not

expected the cache option to have much effect here, because the sga was already plenty big

enough to hold all of the table.

Creating another table john which holds only column a of Bill gives an initial time after

database restart for select avg(a) from john of real: 900ms, subsequent executions give real:

600ms, after alter table john cache; subsequence executions give real: 460ms.

Creating an index on bill also improves results: create index bill_i1 on bill(a) storage( initial 1m next 1m pctincrease 0 );
analyze table bill estimate statistics;
perform shutdown and restart to clear the sga select avg(a) from bill; real: 731
select avg(a) from bill; real: 541

A slight improvement on subsequent executions results from caching of index blocks. The access

path shows that the new index is being used in place of the table: SELECT STATEMENT Cost = 365
SORT AGGREGATE
INDEX FULL SCAN BILL_I1 The same access path minus the sort aggregate is used both for select a from bill and select

avg(a) from bill, confirming the earlier hypothesis that select avg(a) is the same as select a

in terms of database processing.

In general, data is required to be sorted and we do not wish to distort sga caching too much,

so use indexes rather than caching, unless you are referencing most of the columns in the

table. Bear in mind, however, the impact of large numbers of indexes on record

insertion/update/delete.

Optimising communication between the server process and the user process is done primarily by

passing more than one row of data using array fetches. This can be seen in the performance of

the export command with different buffer sizes:

Buffer	Time(s)
0	20
10k	15
100k	7
1000k	7

Notice that at some point there is not further gain.

The fastest method possible would be to use a direct path read of the database blocks. I believe there is enough information available in the documentation to allow you to decode the database blocks directly (should you wish to be so foolish). Received on Tue Nov 18 1997 - 00:00:00 CST