Home » SQL & PL/SQL » SQL & PL/SQL » Oracle PL/SQL String Function Overhead (Oracle 9.2 Unix)
Oracle PL/SQL String Function Overhead [message #330008] Fri, 27 June 2008 05:15 Go to next message
harrysmall3
Messages: 109
Registered: April 2008
Location: Massachusetts
Senior Member
Good Mornings,

I was recently surprised about the processing time I encountered performing string data comparissons on data stored in table arrays retrieved from a bulk collection and am looking to get pointed in the right research direction to decide if a process I own can be better optimized.

Below is the background of the process situation; the topic questions I am looking for direction in study are

(1) Do Oracle built in functions differ in noticable efficiency for string comparisson and editing functions such as
INSTR, SUBSTR, REPLACE, TRANSLATE for the number of machine instructions they execute? How much overhead is in a FOR loop?
For eg, if two 10 character strings were to be compared for the existence of any four characters of one in the other what is the tradeoff between:
perform an outer loop for each 4 substr() chars of var1 and check if = to any 4 substr() chars of var2 in an inner loop VS
one loop where every 4 chars are checked in var2 via INSTR()?

(2) Is there any difference in process efficieny between the CASE statement and an IF/ELSE construct?

**(3) Is it worth considering using Java or C+ as compiled stored procedures for string function efficiency?

-----background:

We take health insurance claims and try to match the data on the claim with the correct practitioner record on our provider database so that the claim can be repriced using contract information on the provider record.

The inherent problem with claim matching is that data can be missing, or incorrect on the claim, and the practitioner can have multiple entries/contracts on the lookup database.

When a claim cannot be automatically matched it falls thru to a secondary process which is the focus of my attention. This process tries to make "fuzzy" matches to retrieve a group of "candidate" possible matches on the provider db and either determine if the provider is out-of-network, if (1) no fuzzy match can produce candidate hits, or (2) for each candidate record the contract is not active.

If we find at least one potential provider record with an active contract we send the claim to a que for manual intervention. I have been working on improving the intelligence to reduce the amount of records requiring manual intervention but we had to back out our last enhancement due to performance degredation.

The stored procedure had a series of cursors based on data combinations to join the claim record to the provider record - different variations of name and location data. The criteria grew to incoporating string functions in the lookup to do a variety of character combination scans for the claim provider names against the provider db names.

My approach was to remove all name parsing functions and other data lookup functions out of the cursors and consolidate them into one query on only direct field matches taking full advantage of all indexing.

That was a godsend. The database IO time was lightening quick.
Without the name portion of join criteria, at worst case, if zip code was not present I would be gathering every provider record in a city and state. The database IO was still phenomenal. Even for NY,NY I could get 30000+ recs in a bulk collection in a few seconds.

Now comes the bottleneck - sweeping my table array to employ the fuzzy logic of the string compares formely employed in the sql was enormous. After much reworking to insure minimal work was employed in the iterations I'm still losing the performance improvement on the outlier scenarios as above. In the case of the 30,000 record result set from NY, it takes about 15 seconds to do an instr() compare of a substr() of every four characters in a string to the value stored in a variable.

My next idea from the DB aspect would be to make a parallel pipelined table function to pipe each row of the cursor collection to the routine doing the string work to gain a little time back.

Any ideas, or tips on direction of research- and sharing of personal experiences in this situation - much appreciated!

Thank you
Harry

[Updated on: Fri, 27 June 2008 05:23] by Moderator

Report message to a moderator

Re: Oracle PL/SQL String Function Overhead [message #330014 is a reply to message #330008] Fri, 27 June 2008 05:53 Go to previous messageGo to next message
S.Rajaram
Messages: 1027
Registered: October 2006
Location: United Kingdom
Senior Member
I am not 100% sure whether this will be of any help to you but for what it is worth it's worth giving it a try.

http://download.oracle.com/docs/cd/B19306_01/text.102/b14218/toc.htm

Also I hope you are aware that oracle is written in C. So I would really doubt reimplementing the string functions in C is going to be of any help to you.

Also it's worth checking this link for your fuzzy logic algorithm.

http://www.psoug.org/reference/utl_match.html

Hope that helps.

Regards

Raj
Re: Oracle PL/SQL String Function Overhead [message #333446 is a reply to message #330014] Fri, 11 July 2008 12:24 Go to previous messageGo to next message
harrysmall3
Messages: 109
Registered: April 2008
Location: Massachusetts
Senior Member
Thank you Raj
Re: Oracle PL/SQL String Function Overhead [message #333455 is a reply to message #330008] Fri, 11 July 2008 12:52 Go to previous messageGo to next message
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
Depending upon your database version, Oracle does regular expressions. You might find these will do your searching for you much much faster that writing the comparaable plsql on your own.

It means you will have to learn regular expressions, but if you don't have the time for it, try to find a UNIX guy in your area who already knows it.

I bet this does your string matching way better that plsql routines.

Good luck, Kevin
Re: Oracle PL/SQL String Function Overhead [message #333487 is a reply to message #333455] Fri, 11 July 2008 21:50 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Another consideration is Domain Indexes

I've never done it, but I understand that you can create an index and matching function that would improve this.

The idea is that by using a Domain Index, it stores all of the possible four-character substrings in a secondary table and indexes them. Then when you pass in your string to match, it splits it into 4 character substrings and performs lookups on the secondary table.

The good bit is that the Domain Index will automatically maintain the secondary table for you.

There's a fair bit of effort in this, you can start learning with the Data Cartridge Guide - http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14289/toc.htm

Ross Leishman
Re: Oracle PL/SQL String Function Overhead [message #333631 is a reply to message #330008] Sun, 13 July 2008 16:51 Go to previous messageGo to next message
coleing
Messages: 213
Registered: February 2008
Senior Member
Pattern Matching is not Oracle's fort'e im afraid.

We used some software called Trillium Parsing engine for pattern matching - way faster than Oracle.

It was even faster exporting the data to flat files, processing in Trillium to get matching scores for all possible candidates and then loading back up. (millions of rows - talking parsing half a million applications to match against a 60,000,000 person database).

Regular Expressions would be great if Oracle was good at them but Oracle is slower at parsing regular expressions than it is with user defined function doing the same job. Infact, we converted a regular expression in raocle to a user defined function, and it ran 12 times faster. THe regular expression was to find a postcode in a string - so nothing too over the top.
Re: Oracle PL/SQL String Function Overhead [message #333849 is a reply to message #330008] Mon, 14 July 2008 10:37 Go to previous messageGo to next message
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
Wow, good to know this, thanks. Kevin
Re: Oracle PL/SQL String Function Overhead [message #333860 is a reply to message #333631] Mon, 14 July 2008 11:24 Go to previous messageGo to next message
Michel Cadot
Messages: 68707
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
This is true in 10g, regexp functions have been rewriten in 11g, I think you should retest them.

Regards
Michel
Re: Oracle PL/SQL String Function Overhead [message #334023 is a reply to message #333631] Tue, 15 July 2008 03:58 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
If you're prepared to export the data and re-import, you should give serious consideration to using Perl to do the file processing. Perl really does do string processing well, and fast.
Re: Oracle PL/SQL String Function Overhead [message #340012 is a reply to message #330008] Mon, 11 August 2008 00:22 Go to previous message
harrysmall3
Messages: 109
Registered: April 2008
Location: Massachusetts
Senior Member
I think I found something promising for a solution.

Well its been almost a month since I ran into this string function overhead issue, which we finally got to acceptable performance by compensating with additional boolean variable fields to track the matching status of each record so we could skip some extraneous checks, which of course has twisted code into a tangled maze of horror Smile

I decided that when I get to the full system rewrite their had to be a better approach all around. Regardless of performance, the extent of procedural logic required to implement fuzzy pattern matching would make maintenance, modification and "learnability" of the process too much an expense to be an acceptable solution.

I looked for another "way" towards parsing besides a procedural algorithm. Here is what I am putting together for proof of concept to develop a protoype for the new process, which once tested I will submit here.

#1 - Circumvent using pl/sql to parse data and make the strings query-able with sql. I hadnt explored the object oriented aspects of Oracle, and had limited experience in this area because i never had a necessity for it. I was quite enthused to find Oracle's OO implementation was through functions and procedures and object reference was virtually transperent to the structure of usage of regular datatypes.

I had coded with oracle objects as required in table function usage but never thought about them in any other light than as containers to pipe back bulk collections. And I never thought about table functions serving any other purpose than processing sql - accepting cursors and returning records.

To make single record and field/variable data available for SQL processing, I reversed that model. Data can be piped back from any object source.

The data fields will be passed to a TYPE object who's constructor functions, with minimal substring execution, split it into parts and stored into an array, and then passed to a table function.

Our "Instring SubString" moving pattern scan which iterated through the substring of string1 for existence in any instr position of string2 could now become a a straight SELECT join.

#2 - Parsing by creating new datatypes to replace procedural character algorithms. The splitting of datainto components could happen in levels where they could be cast into new types that inherit from it. I had ideas like "ALPHABETIC" - only letters, "ALPHANUM" only characters and numbers, "VARIANT" as in VB - chars,numbers,symbols... then the combinations of like "SNUM", "SCHAR", for per se "Number Symbol" with further delineation of number-symbol-mixed, number-symbol-leading, trailing,etc.

This could be priceless for what i have to do but the mystery will once again be about performance.

I think their would be an advantage to the fact that memory would be released as each object can be destroyed when the next record is to be processed?

I'll post my performance analysis when completed.

Harry
Previous Topic: schemas
Next Topic: Flag this message Is 'select' a part of Oracle DML language
Goto Forum:
  


Current Time: Thu Nov 07 16:12:26 CST 2024