Home » SQL & PL/SQL » SQL & PL/SQL » Finding code changes from source table
Finding code changes from source table [message #309767] Fri, 28 March 2008 13:52 Go to next message
annagel
Messages: 220
Registered: April 2006
Senior Member
We have an online documentation system we built to help keep track of customizations we make to the Oracle Application, like eTRM but just for our stuff and with some extras here and there like for example, we snapshot changes to custom packages each night and you can go back at time and look at old versions just like any version tracking system just built in the database so we can access db data along the way.

Anyway I had been bouncing around the idea of, since we are already tracking package versions anyway, expanding on that and figure out a way to display what's changed and what has remained the same, user can always just pull the two versions down and rum them through a compare tool, but it would be cool to be able to put it right up there with the rest of the auto-docs. I have figured out most of what I want but I am running into one problem.

Here is my set-up code which is basically a simple table just version, line, and code columns and changes from on version to the next are the types of changes I want to pick up on.

1-->2 Change contents of 1 line

2-->3 Insert a line so recognize that not every line has actually changed but rather that the new document is just shifted down

3-->4 Delete a line, same idea as 2-->3 but in reverse

4-->5 My bug, because of the way I handle 2-->3 and 3-->4 when I have two lines which contain the same text I get incorrect results. This is what I need to fix. I know why I am getting the results incorrect, but I am trying to figure out a way to either prevent the 4-->5 problem

CREATE TABLE samp_src (ver NUMBER, line NUMBER, src VARCHAR2(1000));

INSERT INTO samp_src
     VALUES (1, 1, 'Line 1');

INSERT INTO samp_src
     VALUES (1, 2, 'Line 2');

INSERT INTO samp_src
     VALUES (1, 3, 'Line 3');

INSERT INTO samp_src
     VALUES (1, 4, 'Line 4');

INSERT INTO samp_src
     VALUES (1, 5, 'Line 5');


INSERT INTO samp_src
  SELECT 2, line, src
    FROM samp_src
   WHERE ver = 1;


UPDATE samp_src
   SET src = 'EDIT LINE 1'
 WHERE line = 1
   AND ver = 2;


INSERT INTO samp_src
  SELECT 3, line + 1, src
    FROM samp_src
   WHERE ver = 2;


INSERT INTO samp_src
     VALUES (3, 1, 'Line 0');


INSERT INTO samp_src
  SELECT 4, line - 1, src
    FROM samp_src
   WHERE ver = 3;


DELETE FROM samp_src
      WHERE ver = 4
        AND line = 0;


INSERT INTO samp_src
  SELECT 5, line, src
    FROM samp_src
   WHERE ver = 4;

INSERT INTO samp_src
     VALUES (5, 6, 'Line 4');

COMMIT ;


and my current query

SELECT   NULLIF (SUM (NVL (st_line, 0)), 0) st_line,
         NULLIF (SUM (NVL (en_line, 0)), 0) en_line, SUM (chg), src
    FROM (SELECT st.line st_line, en.line en_line, 0 chg, en.src
            FROM samp_src st, samp_src en
           WHERE st.line = en.line
             AND st.ver = :st
             AND en.ver = :en
             AND st.src = en.src
          UNION ALL
          SELECT st.line, NULL, -1 chg, st.src
            FROM samp_src st, samp_src en
           WHERE st.line = en.line
             AND st.ver = :st
             AND en.ver = :en
             AND st.src <> en.src
          UNION ALL
          SELECT NULL, en.line, 1 chg, en.src
            FROM samp_src st, samp_src en
           WHERE st.line = en.line
             AND st.ver = :st
             AND en.ver = :en
             AND st.src <> en.src
          UNION ALL
          SELECT st.line, NULL, -1, st.src
            FROM samp_src st
           WHERE st.ver = :st
             AND NOT EXISTS (SELECT NULL
                               FROM samp_src en
                              WHERE en.ver = :en
                                AND en.line = st.line)
          UNION ALL
          SELECT NULL, en.line, 1, en.src
            FROM samp_src en
           WHERE en.ver = :en
             AND NOT EXISTS (SELECT NULL
                               FROM samp_src st
                              WHERE st.ver = :st
                                AND st.line = en.line))
GROUP BY src
ORDER BY NVL (st_line, en_line), NVL (en_line, st_line)


So basic idea of the query, first find all the lines, this is the 5 unioned queries, first find anything that is good out of the bat (line and source match) and tie them together, then find lines with different sources is query 2 and 3, and then 4 and 5 are line numbers in one and not the other, def could make 2-5 better but for the time being I went for easy here.

Once all that is together I just simply sum using the source line as my grouper, I am using -1, 0, and 1 to indicate equality so that works out to 0 for matching lines and then just an nvl to take care of the joins.

The problem really is that there is too little logic in the join of rows, only 1 start and 1 end line should be joined, and you don't want to join lines if there are matched between them, both of these rules get broken by my query.

Sorry this got long but I wanted to give as much detail as I could, does it seem like I am on the right track here? Or am I going about this all wrong?
Re: Finding code changes from source table [message #309780 is a reply to message #309767] Fri, 28 March 2008 14:48 Go to previous messageGo to next message
annagel
Messages: 220
Registered: April 2006
Senior Member
After a bit more research I have discovered that a common diff method is to find the longest comment sequence looking at the wikipedia entry () I am starting to think PL/SQL is going to be the only way to go to completely solve the problem effectively...
Re: Finding code changes from source table [message #310267 is a reply to message #309767] Mon, 31 March 2008 15:26 Go to previous messageGo to next message
Kevin Meade
Messages: 2101
Registered: December 1999
Location: Connecticut USA
Senior Member
Are you using one of the common source control utilities? something like: PVCS, or MICROSOFT SOURCE SAFE? If so, you should look into if the specific tool you are using has a programming API. If it does then you can in theory anyway, write some java/plsq/view code that will construct a view in your database to show you what every you want.

It would be an effort in integration more than coding. This is what I would do anyway. It is jus a thought.

Good luck, Kevin
Re: Finding code changes from source table [message #313371 is a reply to message #310267] Fri, 11 April 2008 09:33 Go to previous messageGo to next message
annagel
Messages: 220
Registered: April 2006
Senior Member
That would not have been a bad idea, but this was more an on-the-side project anyway and I was interested in figuring out how to get it done in the end more than anything.

I did end up with a solution that works well for small changes...and not so well for larger ones. Thought I would post it here in case anyone was interested in the results.

First the set-up we define a couple indexby tables to hold onto the stuff we need:

  TYPE src_array IS TABLE OF VARCHAR2 (4000)
    INDEX BY BINARY_INTEGER;

  TYPE num_array IS TABLE OF NUMBER
    INDEX BY BINARY_INTEGER;


Second a check sum (found googling) so we can eliminate the need for long string to long string comparisons. I did not actually do a time comparison using the two methods, but the set-up time for the check-sum is quite low so I figure it is a can't hurt type situation

  FUNCTION checksum (p_buff IN VARCHAR2)
    RETURN NUMBER
  IS
    l_sum   NUMBER DEFAULT 0;
    l_n     NUMBER;
  BEGIN
    FOR i IN 1 .. TRUNC (LENGTH (p_buff || 'x') / 2)
    LOOP
      l_n :=
          ASCII (SUBSTR (p_buff || 'x', 1 + (i - 1) * 2, 1)) * 256
        + ASCII (SUBSTR (p_buff || 'x', 2 + (i - 1) * 2, 1));
      l_sum := MOD (l_sum + l_n, 4294967296);
    END LOOP;

    WHILE (l_sum > 65536)
    LOOP
      l_sum := BITAND (l_sum, 65535) + TRUNC (l_sum / 65536);
    END LOOP;

    RETURN l_sum;
  END checksum;


So now the set-up for the actual processing. We start off by finding the first source line that is different between the two starting at the front and then the first difference starting at the ends of the two files. Everything between these two is what gets compared. We load the source lines from and to into a src_array indexby table, we load the check-sums into a num_array indexby table.

PROCEDURE fwd_diff (
    p1           IN              NUMBER,
    p2           IN              NUMBER,
    q1           IN              NUMBER,
    q2           IN              NUMBER,
    p_matrix     IN OUT NOCOPY   num_array,
    p_chk_from   IN              num_array,
    p_chk_to     IN              num_array
  )
  IS
    i      NUMBER;
    j      NUMBER;
    diag   NUMBER;
  BEGIN
    p_matrix (MOD (p1, 2) * q2 + q1) := 0;

    FOR j IN q1 + 1 .. q2
    LOOP
      p_matrix (MOD (p1, 2) * q2 + j) := p_matrix (MOD (p1, 2) * q2 + j - 1)
                                         + 1;
    END LOOP;

    FOR i IN p1 + 1 .. p2
    LOOP
      p_matrix (MOD (i, 2) * q2 + q1) := p_matrix (MOD (i - 1, 2) * q2 + q1)
                                         + 1;

      FOR j IN q1 + 1 .. q2
      LOOP
        diag := p_matrix (MOD (i - 1, 2) * q2 + j - 1);

        IF p_chk_from (i - 1) <> p_chk_to (j - 1)
        THEN
          diag := diag + 1;
        END IF;

        p_matrix (MOD (i, 2) * q2 + j) :=
          LEAST (diag,
                 LEAST (p_matrix (MOD (i - 1, 2) * q2 + j) + 1,
                        p_matrix (MOD (i, 2) * q2 + j - 1) + 1
                       )
                );
      END LOOP;
    END LOOP;
  END;

  PROCEDURE rev_diff (
    p1           IN              NUMBER,
    p2           IN              NUMBER,
    q1           IN              NUMBER,
    q2           IN              NUMBER,
    p_matrix     IN OUT NOCOPY   num_array,
    p_chk_from   IN              num_array,
    p_chk_to     IN              num_array
  )
  IS
    i      NUMBER;
    j      NUMBER;
    diag   NUMBER;
  BEGIN
    p_matrix (MOD (p2, 2) * q2 + q2) := 0;

    FOR j IN REVERSE q1 .. q2 - 1
    LOOP
      p_matrix (MOD (p2, 2) * q2 + j) := p_matrix (MOD (p2, 2) * q2 + j + 1)
                                         + 1;
    END LOOP;

    FOR i IN REVERSE p1 .. p2 - 1
    LOOP
      p_matrix (MOD (i, 2) * q2 + q2) := p_matrix (MOD (i + 1, 2) * q2 + q2)
                                         + 1;

      FOR j IN REVERSE q1 .. q2 - 1
      LOOP
        diag := p_matrix (MOD (i + 1, 2) * q2 + j + 1);

        IF p_chk_from (i) <> p_chk_to (j)
        THEN
          diag := diag + 1;
        END IF;

        p_matrix (MOD (i, 2) * q2 + j) :=
          LEAST (diag,
                 LEAST (p_matrix (MOD (i + 1, 2) * q2 + j) + 1,
                        p_matrix (MOD (i, 2) * q2 + j + 1) + 1
                       )
                );
      END LOOP;
    END LOOP;
  END;

  PROCEDURE diff (
    p1             IN              NUMBER,
    p2             IN              NUMBER,
    q1             IN              NUMBER,
    q2             IN              NUMBER,
    p_fwd_matrix   IN OUT NOCOPY   num_array,
    p_rev_matrix   IN OUT NOCOPY   num_array,
    p_src_from     IN              src_array,
    p_src_to       IN              src_array,
    p_chk_from     IN              num_array,
    p_chk_to       IN              num_array,
    fle            IN              UTL_FILE.file_type
  )
  IS
    i        NUMBER;
    j        NUMBER;
    mid      NUMBER;
    s2mid    NUMBER;
    best     NUMBER;
    l_sum    NUMBER;
    l_used   BOOLEAN;
  BEGIN
    IF p2 <= p1
    THEN
      FOR j IN q1 .. q2 - 1
      LOOP
        UTL_FILE.put_line (fle,
                           '+ ' || REPLACE (html (p_src_to (j)), CHR (10)));
      END LOOP;
    ELSIF q2 <= q1
    THEN
      FOR i IN p1 .. p2 - 1
      LOOP
        UTL_FILE.put_line (fle,
                              '<font color="#FF0000">- '
                           || REPLACE (html (p_src_from (i)), CHR (10))
                           || '</font>'
                          );
      END LOOP;
    ELSIF p1 = p2 - 1
    THEN
      l_used := FALSE;

      FOR j IN q1 .. q2 - 1
      LOOP
        IF     NOT (l_used)
           AND p_chk_from (p1) = p_chk_to (j)
        THEN
          l_used := TRUE;
          UTL_FILE.put_line (fle,
                             '  ' || REPLACE (html (p_src_to (j)), CHR (10))
                            );
        ELSE
          UTL_FILE.put_line (fle,
                                '<font color="#0000FF">+  '
                             || REPLACE (html (p_src_to (j)), CHR (10))
                             || '</font>'
                            );
        END IF;
      END LOOP;
    ELSE
      --there is more than 1 row left in from and to so we need to split and process
      mid := FLOOR ((p1 + p2) / 2);
      fwd_diff (p1, mid, q1, q2, p_fwd_matrix, p_chk_from, p_chk_to);
      rev_diff (mid, p2, q1, q2, p_rev_matrix, p_chk_from, p_chk_to);
      s2mid := q1;
      best := NULL;

      FOR i IN q1 .. q2
      LOOP
        l_sum :=
            p_fwd_matrix (MOD (mid, 2) * q2 + i)
          + p_rev_matrix (MOD (mid, 2) * q2 + i);

        IF l_sum < NVL (best, l_sum + 1)
        THEN
          best := l_sum;
          s2mid := i;
        END IF;
      END LOOP;

      diff (p1,
            mid,
            q1,
            s2mid,
            p_fwd_matrix,
            p_rev_matrix,
            p_src_from,
            p_src_to,
            p_chk_from,
            p_chk_to,
            fle
           );
      diff (mid,
            p2,
            s2mid,
            q2,
            p_fwd_matrix,
            p_rev_matrix,
            p_src_from,
            p_src_to,
            p_chk_from,
            p_chk_to,
            fle
           );
    END IF;
  END;


The diff function gets called passing in 1 for p1 and q1, and then the length of src set 1 and 2 plus 1 for our from and to sets. The fwd and rev matrix params are just empty num_arary's, src and chk values are the source and check-sum arrays, and fle is just an out-put file pointer.

The code is based very heavily off of this example here:

http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Hirsch/

Not exact obviously as the lack of a real array type in PL/SQL does add some complications that needed to be accounted for, but a very close approximation of that method.

This algorithm is wasteful with the processor, but it does so to save itself from the large memory requirements that crop up when you try to preserve the result matrix rather than tossing it out as you go along as this algorithm does.

Andrew
Re: Finding code changes from source table [message #314789 is a reply to message #313371] Thu, 17 April 2008 15:23 Go to previous message
andrew again
Messages: 2577
Registered: March 2000
Senior Member
Personally, I'd much rather use a tried & tested diff available in *nix or windows. Doing a code beautification on some code is likely to raise tons of "false alarms" with a regular diff even though formatting is all that was done.

Having said that, I regularly use diff, sdiff on HP-UX.

word-diff may be useful?
http://www.softpanorama.org/Utilities/diff_tools.shtml

Also see this useful link...
http://technology.amis.nl/blog/?p=846


Previous Topic: Tracking which procedure updated the record
Next Topic: Help Needed with SQL
Goto Forum:
  


Current Time: Thu Dec 08 23:49:44 CST 2016

Total time taken to generate the page: 0.10913 seconds