Performance of my algorithms & system

From: <gungor_o_at_kocfinans.com.tr>
Date: Fri, 26 May 2000 12:14:24 +0300
Message-Id: <10509.106837@fatcity.com>

Hi all,
I would like to compare my performance results with yours and hear your opinions. I need to compare a subset of our customers (about 3,000 records daily) with the whole customer set (currently 2,000,000 records). I use 6 fields to find similar records. I can't use indexes because the algorithm compares substrings of columns. I would welcome any advice on (1) comparing substrings of two CHAR columns and (2) optimizing loops over a full table. Based on the tests below, I am considering running a C version of the algorithm on table data extracted into flat files. Currently this version needs about 50 minutes to extract the data (159 MB) and 30 minutes of computation. I would also like to hear suggestions (if possible, with actual or example code) about matching ("catch") algorithms, for example computing the similarity of two personal records using only some of the columns, such as name, surname, address and father's name.

Here are my tests. I used Oracle tkprof, Precise/SQL and SQL*Plus (timing). There is no I/O wait in these figures: I reran the code below many times, and I created "mytbl" with 100,000 rows from the original file so that it fits entirely in memory. I observed that computation time is linearly proportional to the row count, i.e. doubling the row count doubles the duration, and ten times the rows takes ten times as long. I would like to see similar (or exact) figures from your systems, and any comments on my system and tests.

Our system: Oracle 7.3.4.5 on an 8-processor Sun E10000 with 2560 MB of memory, running Solaris 2.6.

Some of my tests:

1. The first test measures the time needed for a cached full scan of mytbl (i.e. no disk I/O and no wait events other than CPU):

begin
   for i in (select name from mytbl) loop
      null;
   end loop;
end;

This returns after more than 0.5 s. The average size of the name column (select avg(vsize(name)) from mytbl) is 6 bytes.

2. Next I tried a null loop without any block reads. Here is the code and an excerpt from the tkprof output of its trace file:

begin
   for i in 1..300000 loop
      for j in 1..2000 loop
         null;
      end loop;
   end loop;
end;

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1    881.21     925.48          0          0          0           1
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2    881.21     925.48          0          0          0           1

3. Finally, I tried a C version, which takes 4 seconds. Here it is:

>>>>>>>>>>>>>>>>>>>>>>>

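#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format the current wall-clock time as HH:MM:SS into s (needs >= 9 bytes). */
void ReturnTime(char *s)
{
	time_t t;
	struct tm *tm;

	t = time(NULL);
	tm = localtime(&t);
	sprintf(s, "%02d:%02d:%02d",
		tm->tm_hour, tm->tm_min, tm->tm_sec);
}

/* Print the current time to stderr (1-second resolution). */
void PrintTime(void)
{
	char s[10];

	ReturnTime(s);
	fprintf(stderr, "%s\n", s);
}

int main(void)
{
	int i, k;
	double m = 0;

	fprintf(stderr, "Press Enter to start...");
	getchar();
	fprintf(stderr, "Start: ");
	PrintTime();

	/* Same 300,000 x 2,000 iteration count as the PL/SQL null loop. */
	for (i = 0; i < 300000; i++) {
		for (k = 0; k < 2000; k++) {
			m++;
		}
	}

	fprintf(stderr, "Finish: ");
	PrintTime();

	/* m is printed, so the loop's result is used and is not dead code.
	   (The original also printed an uninitialized j; it is removed here.) */
	fprintf(stderr, "m=%f i=%d k=%d\n", m, i, k);

	return 0;
}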

<<<<<<<<<<<<<<<<<<<<<<<

/>cc -o myc -xO5 myc.c
/>myc
Press Enter to start...

Received on Fri May 26 2000 - 04:14:24 CDT