fast operation (10g) [message #410726] Mon, 29 June 2009 23:50
maheshmhs
Messages: 93
Registered: October 2008
Location: Nepal
Member
Hello all
INSERT INTO zzz_emp_similarity
(
sn,
empgroupname,
companyname,
similarity,
field1
)
SELECT
ROWNUM AS sn,
a.EmpGroupName,
b.companyname,
fn_string_similarity(a.EmpGroupName,b.companyname) AS similarity,
b.field1
FROM zzz_emp_nomatch a,
hr_company_fields b
WHERE  fn_string_similarity(a.EmpGroupName,b.companyname) > 90;

I have this code. fn_string_similarity is a function I created that measures the similarity between two strings based on the Levenshtein distance.
zzz_emp_nomatch has around 500 records, and hr_company_fields
has 13 million records. This query has already been running for 4 days. Is there a better way of writing this query (other than using a parallel hint) so that it takes less time?

Thank you
Re: fast operation [message #410731 is a reply to message #410726] Tue, 30 June 2009 00:29
Michel Cadot
Messages: 68734
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator
The problem is not the query, it is the function, which is executed for each row of the cartesian product, that is 500 * 13 million = 6.5 billion times.
Even if the function takes only 0.1 ms per row, the total is 0.0001 * (500 * 13,000,000) / 86,400 > 7.5 days.

Regards
Michel

[Updated on: Tue, 30 June 2009 00:32]


Re: fast operation [message #410741 is a reply to message #410731] Tue, 30 June 2009 01:25
maheshmhs
Messages: 93
Registered: October 2008
Location: Nepal
Member
Thank you, Michel.

Is there any other way of doing it? Maybe without taking the cartesian product, or something like that?



Re: fast operation [message #410742 is a reply to message #410741] Tue, 30 June 2009 01:27
Michel Cadot
Messages: 68734
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator
Given what you posted, no.

Regards
Michel
Re: fast operation [message #410762 is a reply to message #410726] Tue, 30 June 2009 02:57
ashoka_bl
Messages: 398
Registered: November 2006
Location: Bangalore
Senior Member

What is the relationship between table a and table b? Do you need to join them, or what?

Share your function as well.
Re: fast operation [message #416203 is a reply to message #410726] Fri, 31 July 2009 02:47
easymatica
Messages: 5
Registered: July 2009
Junior Member
maheshmhs wrote on Tue, 30 June 2009 07:50
This query has already been running for 4 days. Is there a better way of writing this query (other than using a parallel hint) so that it takes less time?


Could there be a better way of organizing the data structure so that it takes less time? Even fast code on a huge amount of data gives slow performance.
Re: fast operation [message #416887 is a reply to message #416203] Wed, 05 August 2009 00:30
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
The most obvious solution is to use some simple heuristics in your WHERE clause to reduce the number of comparisons the join has to do.

If you can come up with some simple rules that help you avoid certain combinations of rows, then you can save a lot of time.

For example, maybe based on the length of the text strings you can skip comparisons that obviously won't exceed your > 90 threshold, without having to make your function call; see the sketch below.
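
A minimal sketch of such a length pre-filter, assuming (this is an assumption, not something stated in the thread) that fn_string_similarity returns 100 * (1 - Levenshtein distance / length of the longer string). Since the Levenshtein distance is at least the difference of the two lengths, a similarity above 90 is impossible when the lengths differ by more than 10% of the longer string:

-- Hypothetical pre-filter: the ABS(LENGTH(...)) predicate is cheap and
-- eliminates pairs before the expensive function call is ever made.
SELECT ROWNUM AS sn,
       a.EmpGroupName,
       b.companyname,
       fn_string_similarity(a.EmpGroupName, b.companyname) AS similarity,
       b.field1
FROM   zzz_emp_nomatch a,
       hr_company_fields b
WHERE  ABS(LENGTH(a.EmpGroupName) - LENGTH(b.companyname)) <=
         0.1 * GREATEST(LENGTH(a.EmpGroupName), LENGTH(b.companyname))
AND    fn_string_similarity(a.EmpGroupName, b.companyname) > 90;

Adjust the 0.1 bound to match whatever formula your function actually uses.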

Also, some distance algorithms are matrix based. Maybe you can find some combination of Oracle built-in functions that does the math for you, and thus avoid calling your home-grown PL/SQL function to do the job. I believe some versions of Oracle now offer built-in support for this (later releases ship the UTL_MATCH package, whose EDIT_DISTANCE_SIMILARITY is a built-in alternative to a home-grown function).

Lastly, you may want to think about changing your logic. For example, there is no need to actually compute the distance, only to know whether the distance exceeds some value. If your distance calculation occurs in a loop or in steps, as most algorithms do, then you can stop the process once it exceeds your limit. Modify your function to accept the limit and short-circuit once the limit is reached; a sketch follows.
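
A sketch of the short-circuit idea, assuming a standard two-row Levenshtein implementation and non-NULL inputs (fn_edit_distance_limited is an illustrative name, not the poster's actual function):

-- Returns the edit distance, or NULL as soon as the distance is
-- guaranteed to exceed p_limit, skipping the rest of the computation.
CREATE OR REPLACE FUNCTION fn_edit_distance_limited (
    p_a     IN VARCHAR2,
    p_b     IN VARCHAR2,
    p_limit IN PLS_INTEGER
) RETURN PLS_INTEGER
IS
    TYPE t_row IS TABLE OF PLS_INTEGER INDEX BY PLS_INTEGER;
    v_prev t_row;
    v_curr t_row;
    v_min  PLS_INTEGER;
    v_cost PLS_INTEGER;
BEGIN
    -- Cheap pre-check: the distance is at least the length difference.
    IF ABS(LENGTH(p_a) - LENGTH(p_b)) > p_limit THEN
        RETURN NULL;
    END IF;

    FOR j IN 0 .. LENGTH(p_b) LOOP
        v_prev(j) := j;
    END LOOP;

    FOR i IN 1 .. LENGTH(p_a) LOOP
        v_curr(0) := i;
        v_min     := i;
        FOR j IN 1 .. LENGTH(p_b) LOOP
            v_cost := CASE WHEN SUBSTR(p_a, i, 1) = SUBSTR(p_b, j, 1)
                           THEN 0 ELSE 1 END;
            v_curr(j) := LEAST(v_curr(j - 1) + 1,        -- insertion
                               v_prev(j) + 1,            -- deletion
                               v_prev(j - 1) + v_cost);  -- substitution
            v_min := LEAST(v_min, v_curr(j));
        END LOOP;
        -- Short circuit: no entry in a later row can drop below the
        -- minimum of this row, so the final distance must exceed p_limit.
        IF v_min > p_limit THEN
            RETURN NULL;
        END IF;
        v_prev := v_curr;
    END LOOP;

    RETURN v_prev(LENGTH(p_b));
END fn_edit_distance_limited;
/

With a > 90 similarity threshold, the caller would translate 90 into the corresponding maximum distance for the pair of lengths and pass that as p_limit.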

Good luck, Kevin

[Updated on: Wed, 05 August 2009 00:39]


Re: fast operation [message #416926 is a reply to message #410726] Wed, 05 August 2009 03:44
elaiyavel
Messages: 114
Registered: April 2008
Senior Member
Create a function-based index for your string comparison utility.

A function-based index would provide improved performance.

Thx.
Re: fast operation [message #416929 is a reply to message #416926] Wed, 05 August 2009 03:57
cookiemonster
Messages: 13963
Registered: September 2008
Location: Rainy Manchester
Senior Member
elaiyavel wrote on Wed, 05 August 2009 09:44
Create a function-based index for your string comparison utility.

A function-based index would provide improved performance.

Thx.


Since the inputs for the function come from different tables, that's not possible; the restriction is sketched below.
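
For illustration (the index name and the 'ACME' literal are made up): a function-based index can only reference columns of the one table being indexed, and the function must be declared DETERMINISTIC.

-- Possible: all column inputs come from hr_company_fields.
CREATE INDEX hr_company_sim_ix
    ON hr_company_fields (fn_string_similarity('ACME', companyname));

-- Not possible: an index cannot span two tables, so there is no way to
-- index fn_string_similarity(a.EmpGroupName, b.companyname).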
Re: fast operation [message #416992 is a reply to message #416929] Wed, 05 August 2009 09:40
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
You really need to find out where the cost is coming from, then fix that area.

1) If the cost is due to context switching from SQL to PL/SQL, then your solution may be to rewrite your code without the function call, using only SQL.

2) If the cost is accessing the data, then your solution will likely be tuning the underlying SQL.

3) If the cost is the algorithm of the function, then you need a better algorithm.

Construct various tests to figure out where the cost is, then proceed based on what you learn. You will be more or less successful depending on your talents for detective work. Consider an approach like this:

1) Run the SQL to get the full cost of the query.

2) Run the SQL without the function call; the difference from step 1 approximates the cost of the function call (see the sketch after this list).

3) Remove the data from the tables used by the function by truncating them, then make your call to understand the cost of the SQL.

4) Put the data back into the tables and run the query, but call the function with a constant so that it always pulls the same data. This will help you understand the cost of the algorithm.

5) Examine your data and see if you can rank it by difficulty level based on your function: low/medium/high (use more groups if you want to fine-tune). Then get the average cost for each ranking. This will help you understand whether the data exhibits one or two difficult cases or whether all the data is expensive to process.
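
A minimal sketch of steps 1 and 2, reusing the table and function names from the original post (SET TIMING ON is SQL*Plus; the SAMPLE clause is used here only to keep the test short on a 13-million-row table):

SET TIMING ON

-- Step 1: full cost, with the function call.
SELECT COUNT(*)
FROM   zzz_emp_nomatch a,
       hr_company_fields SAMPLE (0.1) b
WHERE  fn_string_similarity(a.EmpGroupName, b.companyname) > 90;

-- Step 2: same join without the function call; the LENGTH predicate
-- still forces both columns to be read. The difference between the two
-- elapsed times approximates the cost of the function calls alone.
SELECT COUNT(*)
FROM   zzz_emp_nomatch a,
       hr_company_fields SAMPLE (0.1) b
WHERE  LENGTH(a.EmpGroupName) + LENGTH(b.companyname) > 0;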

Good luck, Kevin