Home » SQL & PL/SQL » SQL & PL/SQL » Finding code changes from source table
Finding code changes from source table [message #309767] 
Fri, 28 March 2008 13:52 
annagel
Messages: 220 Registered: April 2006

Senior Member 


We have an online documentation system we built to help keep track of customizations we make to the Oracle Application, like eTRM but just for our stuff and with some extras here and there like for example, we snapshot changes to custom packages each night and you can go back at time and look at old versions just like any version tracking system just built in the database so we can access db data along the way.
Anyway I had been bouncing around the idea of, since we are already tracking package versions anyway, expanding on that and figure out a way to display what's changed and what has remained the same, user can always just pull the two versions down and rum them through a compare tool, but it would be cool to be able to put it right up there with the rest of the autodocs. I have figured out most of what I want but I am running into one problem.
Here is my setup code which is basically a simple table just version, line, and code columns and changes from on version to the next are the types of changes I want to pick up on.
1>2 Change contents of 1 line
2>3 Insert a line so recognize that not every line has actually changed but rather that the new document is just shifted down
3>4 Delete a line, same idea as 2>3 but in reverse
4>5 My bug, because of the way I handle 2>3 and 3>4 when I have two lines which contain the same text I get incorrect results. This is what I need to fix. I know why I am getting the results incorrect, but I am trying to figure out a way to either prevent the 4>5 problem
CREATE TABLE samp_src (ver NUMBER, line NUMBER, src VARCHAR2(1000));
INSERT INTO samp_src
VALUES (1, 1, 'Line 1');
INSERT INTO samp_src
VALUES (1, 2, 'Line 2');
INSERT INTO samp_src
VALUES (1, 3, 'Line 3');
INSERT INTO samp_src
VALUES (1, 4, 'Line 4');
INSERT INTO samp_src
VALUES (1, 5, 'Line 5');
INSERT INTO samp_src
SELECT 2, line, src
FROM samp_src
WHERE ver = 1;
UPDATE samp_src
SET src = 'EDIT LINE 1'
WHERE line = 1
AND ver = 2;
INSERT INTO samp_src
SELECT 3, line + 1, src
FROM samp_src
WHERE ver = 2;
INSERT INTO samp_src
VALUES (3, 1, 'Line 0');
INSERT INTO samp_src
SELECT 4, line  1, src
FROM samp_src
WHERE ver = 3;
DELETE FROM samp_src
WHERE ver = 4
AND line = 0;
INSERT INTO samp_src
SELECT 5, line, src
FROM samp_src
WHERE ver = 4;
INSERT INTO samp_src
VALUES (5, 6, 'Line 4');
COMMIT ;
and my current query
SELECT NULLIF (SUM (NVL (st_line, 0)), 0) st_line,
NULLIF (SUM (NVL (en_line, 0)), 0) en_line, SUM (chg), src
FROM (SELECT st.line st_line, en.line en_line, 0 chg, en.src
FROM samp_src st, samp_src en
WHERE st.line = en.line
AND st.ver = :st
AND en.ver = :en
AND st.src = en.src
UNION ALL
SELECT st.line, NULL, 1 chg, st.src
FROM samp_src st, samp_src en
WHERE st.line = en.line
AND st.ver = :st
AND en.ver = :en
AND st.src <> en.src
UNION ALL
SELECT NULL, en.line, 1 chg, en.src
FROM samp_src st, samp_src en
WHERE st.line = en.line
AND st.ver = :st
AND en.ver = :en
AND st.src <> en.src
UNION ALL
SELECT st.line, NULL, 1, st.src
FROM samp_src st
WHERE st.ver = :st
AND NOT EXISTS (SELECT NULL
FROM samp_src en
WHERE en.ver = :en
AND en.line = st.line)
UNION ALL
SELECT NULL, en.line, 1, en.src
FROM samp_src en
WHERE en.ver = :en
AND NOT EXISTS (SELECT NULL
FROM samp_src st
WHERE st.ver = :st
AND st.line = en.line))
GROUP BY src
ORDER BY NVL (st_line, en_line), NVL (en_line, st_line)
So basic idea of the query, first find all the lines, this is the 5 unioned queries, first find anything that is good out of the bat (line and source match) and tie them together, then find lines with different sources is query 2 and 3, and then 4 and 5 are line numbers in one and not the other, def could make 25 better but for the time being I went for easy here.
Once all that is together I just simply sum using the source line as my grouper, I am using 1, 0, and 1 to indicate equality so that works out to 0 for matching lines and then just an nvl to take care of the joins.
The problem really is that there is too little logic in the join of rows, only 1 start and 1 end line should be joined, and you don't want to join lines if there are matched between them, both of these rules get broken by my query.
Sorry this got long but I wanted to give as much detail as I could, does it seem like I am on the right track here? Or am I going about this all wrong?



Re: Finding code changes from source table [message #309780 is a reply to message #309767] 
Fri, 28 March 2008 14:48 
annagel
Messages: 220 Registered: April 2006

Senior Member 


After a bit more research I have discovered that a common diff method is to find the longest comment sequence looking at the wikipedia entry () I am starting to think PL/SQL is going to be the only way to go to completely solve the problem effectively...




Re: Finding code changes from source table [message #313371 is a reply to message #310267] 
Fri, 11 April 2008 09:33 
annagel
Messages: 220 Registered: April 2006

Senior Member 


That would not have been a bad idea, but this was more an ontheside project anyway and I was interested in figuring out how to get it done in the end more than anything.
I did end up with a solution that works well for small changes...and not so well for larger ones. Thought I would post it here in case anyone was interested in the results.
First the setup we define a couple indexby tables to hold onto the stuff we need:
TYPE src_array IS TABLE OF VARCHAR2 (4000)
INDEX BY BINARY_INTEGER;
TYPE num_array IS TABLE OF NUMBER
INDEX BY BINARY_INTEGER;
Second a check sum (found googling) so we can eliminate the need for long string to long string comparisons. I did not actually do a time comparison using the two methods, but the setup time for the checksum is quite low so I figure it is a can't hurt type situation
FUNCTION checksum (p_buff IN VARCHAR2)
RETURN NUMBER
IS
l_sum NUMBER DEFAULT 0;
l_n NUMBER;
BEGIN
FOR i IN 1 .. TRUNC (LENGTH (p_buff  'x') / 2)
LOOP
l_n :=
ASCII (SUBSTR (p_buff  'x', 1 + (i  1) * 2, 1)) * 256
+ ASCII (SUBSTR (p_buff  'x', 2 + (i  1) * 2, 1));
l_sum := MOD (l_sum + l_n, 4294967296);
END LOOP;
WHILE (l_sum > 65536)
LOOP
l_sum := BITAND (l_sum, 65535) + TRUNC (l_sum / 65536);
END LOOP;
RETURN l_sum;
END checksum;
So now the setup for the actual processing. We start off by finding the first source line that is different between the two starting at the front and then the first difference starting at the ends of the two files. Everything between these two is what gets compared. We load the source lines from and to into a src_array indexby table, we load the checksums into a num_array indexby table.
PROCEDURE fwd_diff (
p1 IN NUMBER,
p2 IN NUMBER,
q1 IN NUMBER,
q2 IN NUMBER,
p_matrix IN OUT NOCOPY num_array,
p_chk_from IN num_array,
p_chk_to IN num_array
)
IS
i NUMBER;
j NUMBER;
diag NUMBER;
BEGIN
p_matrix (MOD (p1, 2) * q2 + q1) := 0;
FOR j IN q1 + 1 .. q2
LOOP
p_matrix (MOD (p1, 2) * q2 + j) := p_matrix (MOD (p1, 2) * q2 + j  1)
+ 1;
END LOOP;
FOR i IN p1 + 1 .. p2
LOOP
p_matrix (MOD (i, 2) * q2 + q1) := p_matrix (MOD (i  1, 2) * q2 + q1)
+ 1;
FOR j IN q1 + 1 .. q2
LOOP
diag := p_matrix (MOD (i  1, 2) * q2 + j  1);
IF p_chk_from (i  1) <> p_chk_to (j  1)
THEN
diag := diag + 1;
END IF;
p_matrix (MOD (i, 2) * q2 + j) :=
LEAST (diag,
LEAST (p_matrix (MOD (i  1, 2) * q2 + j) + 1,
p_matrix (MOD (i, 2) * q2 + j  1) + 1
)
);
END LOOP;
END LOOP;
END;
PROCEDURE rev_diff (
p1 IN NUMBER,
p2 IN NUMBER,
q1 IN NUMBER,
q2 IN NUMBER,
p_matrix IN OUT NOCOPY num_array,
p_chk_from IN num_array,
p_chk_to IN num_array
)
IS
i NUMBER;
j NUMBER;
diag NUMBER;
BEGIN
p_matrix (MOD (p2, 2) * q2 + q2) := 0;
FOR j IN REVERSE q1 .. q2  1
LOOP
p_matrix (MOD (p2, 2) * q2 + j) := p_matrix (MOD (p2, 2) * q2 + j + 1)
+ 1;
END LOOP;
FOR i IN REVERSE p1 .. p2  1
LOOP
p_matrix (MOD (i, 2) * q2 + q2) := p_matrix (MOD (i + 1, 2) * q2 + q2)
+ 1;
FOR j IN REVERSE q1 .. q2  1
LOOP
diag := p_matrix (MOD (i + 1, 2) * q2 + j + 1);
IF p_chk_from (i) <> p_chk_to (j)
THEN
diag := diag + 1;
END IF;
p_matrix (MOD (i, 2) * q2 + j) :=
LEAST (diag,
LEAST (p_matrix (MOD (i + 1, 2) * q2 + j) + 1,
p_matrix (MOD (i, 2) * q2 + j + 1) + 1
)
);
END LOOP;
END LOOP;
END;
PROCEDURE diff (
p1 IN NUMBER,
p2 IN NUMBER,
q1 IN NUMBER,
q2 IN NUMBER,
p_fwd_matrix IN OUT NOCOPY num_array,
p_rev_matrix IN OUT NOCOPY num_array,
p_src_from IN src_array,
p_src_to IN src_array,
p_chk_from IN num_array,
p_chk_to IN num_array,
fle IN UTL_FILE.file_type
)
IS
i NUMBER;
j NUMBER;
mid NUMBER;
s2mid NUMBER;
best NUMBER;
l_sum NUMBER;
l_used BOOLEAN;
BEGIN
IF p2 <= p1
THEN
FOR j IN q1 .. q2  1
LOOP
UTL_FILE.put_line (fle,
'+ '  REPLACE (html (p_src_to (j)), CHR (10)));
END LOOP;
ELSIF q2 <= q1
THEN
FOR i IN p1 .. p2  1
LOOP
UTL_FILE.put_line (fle,
'<font color="#FF0000"> '
 REPLACE (html (p_src_from (i)), CHR (10))
 '</font>'
);
END LOOP;
ELSIF p1 = p2  1
THEN
l_used := FALSE;
FOR j IN q1 .. q2  1
LOOP
IF NOT (l_used)
AND p_chk_from (p1) = p_chk_to (j)
THEN
l_used := TRUE;
UTL_FILE.put_line (fle,
' '  REPLACE (html (p_src_to (j)), CHR (10))
);
ELSE
UTL_FILE.put_line (fle,
'<font color="#0000FF">+ '
 REPLACE (html (p_src_to (j)), CHR (10))
 '</font>'
);
END IF;
END LOOP;
ELSE
there is more than 1 row left in from and to so we need to split and process
mid := FLOOR ((p1 + p2) / 2);
fwd_diff (p1, mid, q1, q2, p_fwd_matrix, p_chk_from, p_chk_to);
rev_diff (mid, p2, q1, q2, p_rev_matrix, p_chk_from, p_chk_to);
s2mid := q1;
best := NULL;
FOR i IN q1 .. q2
LOOP
l_sum :=
p_fwd_matrix (MOD (mid, 2) * q2 + i)
+ p_rev_matrix (MOD (mid, 2) * q2 + i);
IF l_sum < NVL (best, l_sum + 1)
THEN
best := l_sum;
s2mid := i;
END IF;
END LOOP;
diff (p1,
mid,
q1,
s2mid,
p_fwd_matrix,
p_rev_matrix,
p_src_from,
p_src_to,
p_chk_from,
p_chk_to,
fle
);
diff (mid,
p2,
s2mid,
q2,
p_fwd_matrix,
p_rev_matrix,
p_src_from,
p_src_to,
p_chk_from,
p_chk_to,
fle
);
END IF;
END;
The diff function gets called passing in 1 for p1 and q1, and then the length of src set 1 and 2 plus 1 for our from and to sets. The fwd and rev matrix params are just empty num_arary's, src and chk values are the source and checksum arrays, and fle is just an output file pointer.
The code is based very heavily off of this example here:
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Hirsch/
Not exact obviously as the lack of a real array type in PL/SQL does add some complications that needed to be accounted for, but a very close approximation of that method.
This algorithm is wasteful with the processor, but it does so to save itself from the large memory requirements that crop up when you try to preserve the result matrix rather than tossing it out as you go along as this algorithm does.
Andrew




Goto Forum:
Current Time: Mon Mar 27 23:24:43 CDT 2017
Total time taken to generate the page: 0.18371 seconds
