Large table query with cpu throttling
Date: Mon, 18 Feb 2008 14:01:34 +0800
- I have a large table, in the range of 1 billion records.
- But the machine is a relatively slow one. Therefore I would not like to clog the machine when querying this large table.
- Supposed I need to extract two fields of ALL the record - but the full table scan can afford to take its time. Therefore, how do I do a full table scan, with cpu throttling (eg, sleeping or slowing down once in a while) so as to let other jobs continue? (any SQL hints available?)
Ideally I thought it should not involved any indexes, as that will incur additional lookups, plus possibly some sorting etc, because the target is a FTS anyway - is my analysis correct?
B. Supposed there are no solution to the above, so alternatively is to do slicing. My method is find the min and max of the rowid, and then do a direct rowid access:
select * from table where rowid = 'xxxx'.
where the xxxx correspond to the rowid automatically generated. This statement does not generate any temporary sorting, or depending on any prior indexes.
To do this, I did a simple experiment:
select rowid from a_large_table;
and the character range of the rowid seemed to be (in this order): [A-Za-z0-9+/] and that's all, and a total of 17 characters. This is verified in Ora10gR2, and Ora9iR2 (both Linux). Are my analysis correct? Any other documentation have u seen with these info?
Therefore my plan is to construct the SQL with the rowid following the above format, slowly increasing, so as directly retrieve the data with minimal performance hits on the server. Is this the best way of slicing the table so as to do a full table scan while still meeting the requirements above?
C. Currently, I found that select rowid from a_large_table where rownum < 100 and rowid > 'xxxx' return almost immediately, whereas a "select count(*)" can take about 20mins or more to return just a number.
This therefore is the 3rd way to get data - 100 records at a time. Similar in design to (2) above. But because the last row's rowid is always the largest of all the 100 rows, therefore, i will use that value as the input to the next query, to get the next 100 rows, and so on. So this way seem not to incur any performance hits at all, but it has the assumption that "select" statement ALWAYS RETURN THE ROWID in an incremental manner, if just rownum and rowid are used in the where clause. Is this assumption reasonable?
Thanks.Received on Mon Feb 18 2008 - 00:01:34 CST