Home » SQL & PL/SQL » SQL & PL/SQL » Check for missing records in 75 million records per day (Oracle 11g R2 (11.2.0.3.0), OS: Red Hat Enterprice Linux )
Check for missing records in 75 million records per day [message #630792] Mon, 05 January 2015 07:39 Go to next message
bgraupe
Messages: 25
Registered: July 2009
Location: Windhoek Namibia
Junior Member
Good day

I need to create a procedure that checks for missing records throughout the day. An average of 75 million records are inserted into the table daily.

My aim is to execute the procedure every hour for the records inserted into a table until that hour, but the procedure is not done before the next execution. The records are generated with a sequential number. As soon as the highest number is reached, the number will be initialized to the value of 0 again. This is a process that gets repeated a few times throughout a day. My unique key on the table is the timestamp combined with the sequential number.

I also tried to create a temporary table once a day with the records of the previous day and repeat the process on this temporary table not to affect the production table, but still the job is running between 6 and 7 hours to check through all records of the day.

Any recommendations to speed up this process will help.

Attached find a shortened copy of the procedure.

If you want to forward a mail, please mail to bgraupe@mtc.com.na

Regards
Bonita

[Updated on: Mon, 05 January 2015 07:41]

Report message to a moderator

Re: Check for missing records in 75 million records per day [message #630793 is a reply to message #630792] Mon, 05 January 2015 07:44 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
Please read and follow the forum guidelines, to enable us to help you:

http://www.orafaq.com/forum/t/88153/0/ and read http://www.orafaq.com/forum/t/174502/
Re: Check for missing records in 75 million records per day [message #630794 is a reply to message #630793] Mon, 05 January 2015 07:57 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
@OP, I don't see any attachment. I insist copy and paste the code using code tags rather than attaching it. And sharing your email id publicly is not a good idea.

Coming back to your question, how is the sequence maintained? And what's the reason for the missing numbers?
Re: Check for missing records in 75 million records per day [message #630795 is a reply to message #630794] Mon, 05 January 2015 08:09 Go to previous messageGo to next message
bgraupe
Messages: 25
Registered: July 2009
Location: Windhoek Namibia
Junior Member
Thank you for the advise. Will keep that in mind for future posts. (new on this)

The sequence numbers are generated at the source system. I don't have any control over this. I just need to load the records and ensure no missing records. There can be many reasons for missing numbers. Mostly it was an error on our switch that generated the records and my application is the only source that loads these records and as a audit measure I need to ensure no missing records. Reason being that these records are used for law enforcement support.



CREATE OR replace PROCEDURE Recordcheck_sp (g_start_date IN DATE, 
                                            g_end_date   IN DATE) 
IS 
  /************************************* Variable declaration **************************************/ 
  v_startdate_c       VARCHAR2(18) := To_char(Trunc(g_start_date), 'dd-mon-rrrr' 
                                      ) 
                                || ' 000000'; 
  v_enddate_c         VARCHAR2(18) := To_char(Trunc(g_end_date), 'dd-mon-rrrr') 
                              || ' 235959'; 
  v_start_date        TIMESTAMP := To_date(v_startdate_c, 'dd-mon-rrrr hh24miss' 
                                   ); 
  v_end_date          TIMESTAMP := To_date(v_enddate_c, 'dd-mon-rrrr hh24miss'); 
  v_min_rec           NUMBER(9) := 0; 
  v_max_rec           NUMBER(9) := 0; 
  v_err_count         NUMBER (2); 
  v_trans_date        TIMESTAMP; 
  v_record_nr         cdruser.cdrgsm_prosp_tb.record_nr%TYPE; 
  v_find_record_nr    cdruser.cdrgsm_prosp_tb.record_nr%TYPE; 
  v_file_name         cdruser.cdrgsm_prosp_tb.file_name%TYPE; 
  v_recording_entiry  cdruser.cdrgsm_prosp_tb.recording_entity%TYPE; 
  --v_start_date           TIMESTAMP := g_start_date; 
  --v_end_date             TIMESTAMP := g_end_date; 
  v_search_date       TIMESTAMP; 
  v_search_date_start TIMESTAMP; 
  v_search_date_end   TIMESTAMP; 
  a                   VARCHAR2 (20) := ''; 
  b                   VARCHAR2 (120) := ''; 
  c                   VARCHAR2 (4000) := ''; 
  d                   VARCHAR2 (250) := ''; 
  v_missing           VARCHAR2 (1) := 'F'; 
  v_find_count        NUMBER := 0; 
  v_first             VARCHAR2 (1) := 'F'; 
  v_count             NUMBER := 0; 
  x                   INT; 
BEGIN 
    /************************************************************************************************ 
          get min record nr of the day from table gsm_missing_rec_copy_tb 
    ************************************************************************************************/ 
    v_search_date_end := v_start_date + 15 / 1440; -- + 30/1440 
    SELECT /*+ PARALLEL(gsm_missing_rec_copy_tb,12) */ 
    /*+ index(gsm_missing_rec_copy_tb D_GSM_PROSP_NRECORDNR_IDX)*/ trans_date, 
                                                                   record_nr 
    INTO   v_start_date, v_min_rec 
    FROM   gsm_missing_rec_copy_tb 
    WHERE  record_nr = (SELECT Min(record_nr) 
                        FROM   ((SELECT 
                       /*+ PARALLEL(gsm_missing_rec_copy_tb,12) */ 
                  /*+ index(gsm_missing_rec_copy_tb D_GSM_PROSP_NRECORDNR_IDX)*/ 
                               trans_date, 
                               record_nr 
                                 FROM   gsm_missing_rec_copy_tb 
                                 WHERE  trans_date >= v_start_date 
                                        AND trans_date <= v_search_date_end 
                                 ORDER  BY record_nr))) 
           AND trans_date >= v_start_date 
           AND trans_date <= v_search_date_end; 

    /************************************************************************************************ 
          get max record nr from table huaweigsm_daily_recnr_tmp_tb 
    ************************************************************************************************/ 
    v_search_date_end := v_end_date - 30 / 1440; 

    SELECT /*+ PARALLEL(gsm_missing_rec_copy_tb,12) */ 
    /*+ index(gsm_missing_rec_copy_tb D_GSM_PROSP_NRECORDNR_IDX)*/ trans_date, 
                                                                   record_nr 
    INTO   v_end_date, v_max_rec 
    FROM   gsm_missing_rec_copy_tb 
    WHERE  record_nr = (SELECT Max(record_nr) 
                        FROM   ((SELECT 
                       /*+ PARALLEL(gsm_missing_rec_copy_tb,12) */ 
                  /*+ index(gsm_missing_rec_copy_tb D_GSM_PROSP_NRECORDNR_IDX)*/ 
                               trans_date, 
                               record_nr 
                                 FROM   gsm_missing_rec_copy_tb 
                                 WHERE  trans_date >= v_search_date_end 
                                        AND trans_date <= v_end_date 
                                 ORDER  BY record_nr DESC))) 
           AND trans_date >= v_search_date_end 
           AND trans_date <= v_end_date; 

    v_record_nr := v_min_rec; 

    v_trans_date := v_start_date; 

    /************************************************************************************************ 
          Perform while loop to check for each record number in the date range specified.   
          If a record number is not found, notify via mail and SMS that missing records exists.       
    ************************************************************************************************/ 
    WHILE ( v_err_count <= 30 
             OR v_trans_date <= v_end_date ) LOOP 
        INSERT INTO cdrgsm_omit 
                    (record_nr, 
                     trans_date, 
                     file_name, 
                     reason, 
                     switch_id) 
        VALUES      (v_record_nr, 
                     v_trans_date, 
                     v_file_name, 
                     ' ', 
                     ' '); 

        COMMIT; 

        v_find_record_nr := 0; 

        v_search_date_start := v_trans_date - 50 / 1440; 

        v_search_date_end := v_trans_date + 45 / 1440; 

        -- check if the next record can be found in the time range of the previous record timestamp  
        -- less 50 minutes and previous record timestamp plus 1 hour. 
        v_count := 0; 

        SELECT /*+ PARALLEL(gsm_missing_rec_copy_tb,12) */ 
        /*+ index(gsm_missing_rec_copy_tb D_GSM_PROSP_NRECORDNR_IDX)*/ Count(*) 
        INTO   v_count 
        FROM   gsm_missing_rec_copy_tb 
        WHERE  record_nr = v_record_nr 
               AND trans_date >= v_search_date_start 
               AND trans_date <= v_search_date_end; 

        IF v_count <> 0 THEN -- record found 
          SELECT /*+ PARALLEL(gsm_missing_rec_copy_tb,12) */ 
          /*+ index(gsm_missing_rec_copy_tb D_GSM_PROSP_NRECORDNR_IDX)*/ 
          trans_date, 
          record_nr, 
          file_name 
          INTO   v_trans_date, v_find_record_nr, v_file_name 
          FROM   gsm_missing_rec_copy_tb 
          WHERE  record_nr = v_record_nr 
                 AND trans_date >= v_search_date_start 
                 AND trans_date <= v_search_date_end; 
        ELSE -- record not found, create error mail     
          IF v_first = 'F' THEN 
            b := ''; 

            d := ( To_char('Missing Record Nrs from ' 
                           || v_record_nr 
                           || ' ' 
                           || v_trans_date 
                           || ' ' 
                           || 'on file ' 
                           || v_file_name) ); 

            v_first := 'T'; 

            v_err_count := v_err_count + 1; 

            v_missing := 'T'; 
          ELSE 
            b := ( To_char('Record Nr ' 
                           || v_record_nr 
                           || ' missing on file ' 
                           || v_file_name) ); 

            c := ( To_char (c 
                            || Chr (10) 
                            || b 
                            || Chr (10)) ); 

            v_err_count := v_err_count + 1; 

            v_missing := 'T'; 
          END IF; 
        END IF; -- end of v_count <> 0  
        v_record_nr := v_record_nr + 1; 

        IF v_record_nr > 9999999 THEN 
          v_record_nr := 1; 
        END IF; 
    END LOOP; 

    IF v_missing = 'T' THEN 
      a := To_char (SYSDATE, 'YYYY-MM-DD HH24:MI:SS'); 

      c := ( To_char (a 
                      || Chr (10) 
                      || c) ); 

      -- only until I am sure procedure is working 100% then I will include 
      -- automatic mail again. 
      Quickmail ('xxxxx@xxx.com.na', 'Missing CDRs', c); 
    END IF; 
END recordcheck_sp;  


Re: Check for missing records in 75 million records per day [message #630796 is a reply to message #630794] Mon, 05 January 2015 08:10 Go to previous messageGo to next message
Bill B
Messages: 1971
Registered: December 2004
Senior Member
and are you looking for missing sequence numbers or missing records. Also why are you using a timestamp and sequence for your primary key. An oracle sequence can be 26 digits and it would never run out. Recycling the sequence makes no sense. And if you are using a sequence you will be guaranteed to have holes in the key

[Updated on: Mon, 05 January 2015 08:13]

Report message to a moderator

Re: Check for missing records in 75 million records per day [message #630797 is a reply to message #630795] Mon, 05 January 2015 08:11 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
row by row is SLOW by SLOW

Never do in PL/SQL that which can be done in plain SQL.
Re: Check for missing records in 75 million records per day [message #630800 is a reply to message #630796] Mon, 05 January 2015 08:22 Go to previous messageGo to next message
bgraupe
Messages: 25
Registered: July 2009
Location: Windhoek Namibia
Junior Member
The sequence numbers are the nubmers generated on the source system and not oracle sequences.

The combination of timestamp (record date and time as generated in the source system) and sequence is the way the source system are generating the records. This is the only two columns that make the records unique.

If a sequence number is missing in the date range before the sequence number get initialized to 0 again, then I know that the record is missing and I need to inform the source system via mail and sms.
Re: Check for missing records in 75 million records per day [message #630803 is a reply to message #630800] Mon, 05 January 2015 08:26 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
Why isn't it taken care in the source system itself?
Re: Check for missing records in 75 million records per day [message #630804 is a reply to message #630803] Mon, 05 January 2015 08:28 Go to previous messageGo to next message
bgraupe
Messages: 25
Registered: July 2009
Location: Windhoek Namibia
Junior Member
Cannot be done on the source. It is a telecommunications switch that generates the records. These records are loaded into a table on my side and I need to check for missing records.
Re: Check for missing records in 75 million records per day [message #630805 is a reply to message #630804] Mon, 05 January 2015 08:36 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
I suggest, taking care of data before loading into DB is better than verifying it after being inserted into DB. You are rectifying the due to source system. It is going to be inefficient and resource consuming.
Re: Check for missing records in 75 million records per day [message #630806 is a reply to message #630805] Mon, 05 January 2015 08:38 Go to previous messageGo to next message
bgraupe
Messages: 25
Registered: July 2009
Location: Windhoek Namibia
Junior Member
Thank you for the input, however the responsibility still remains on my side and I am trying to find an optimized method to do the process.
Re: Check for missing records in 75 million records per day [message #630808 is a reply to message #630806] Mon, 05 January 2015 08:46 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
With so many restrictions, there might be many approaches. Are you blindly loading the source data?
Re: Check for missing records in 75 million records per day [message #630809 is a reply to message #630808] Mon, 05 January 2015 08:47 Go to previous messageGo to next message
bgraupe
Messages: 25
Registered: July 2009
Location: Windhoek Namibia
Junior Member
The files are ftp'd to my server every 15 minutes (about 12 files for each cycle of 15 minuts) where I load them with sqlldr into the database.
Re: Check for missing records in 75 million records per day [message #630810 is a reply to message #630808] Mon, 05 January 2015 08:48 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
where is actual time being spent within post procedure?
If you don't know where it is slow, it is a real challenge to know what to change to make it faster.

ATER SESSION SET SQL_TRACE=TRUE;
Re: Check for missing records in 75 million records per day [message #630812 is a reply to message #630809] Mon, 05 January 2015 08:51 Go to previous messageGo to next message
bgraupe
Messages: 25
Registered: July 2009
Location: Windhoek Namibia
Junior Member
Before loading the files into the database, I need to decode the records with C++ from HEX to ASCII/Character code.
Re: Check for missing records in 75 million records per day [message #630815 is a reply to message #630812] Mon, 05 January 2015 08:53 Go to previous messageGo to next message
bgraupe
Messages: 25
Registered: July 2009
Location: Windhoek Namibia
Junior Member
The procedure mostly exists of a loop that checks for each record (based on record timestamp and sequence number generated at the source system). This is what is taking hours to complete.
Re: Check for missing records in 75 million records per day [message #630819 is a reply to message #630815] Mon, 05 January 2015 09:00 Go to previous messageGo to next message
Roachcoach
Messages: 1576
Registered: May 2010
Location: UK
Senior Member
Can you shoehorn it into minus operation if you have a "good" and "potentially bad" set?
Re: Check for missing records in 75 million records per day [message #630829 is a reply to message #630806] Mon, 05 January 2015 09:25 Go to previous messageGo to next message
John Watson
Messages: 8931
Registered: January 2010
Location: Global Village
Senior Member
bgraupe wrote on Mon, 05 January 2015 14:38
Thank you for the input, however the responsibility still remains on my side and I am trying to find an optimized method to do the process.
How about a different approach? Do not use SQL*Loader, but rather an external table. You can then read the external table with a PL/SQL loop, and insert the rows row-by-row into your destination table. As you do that, store the sequential number in a variable and compare it to the value of the next row. Then write out missing numbers to another table every time the sequential value increases by more than 1. So you identify the missing numbers as you insert the rows.

I am of course assuming that the source data is in sequential order.

Re: Check for missing records in 75 million records per day [message #630878 is a reply to message #630829] Tue, 06 January 2015 00:58 Go to previous messageGo to next message
bgraupe
Messages: 25
Registered: July 2009
Location: Windhoek Namibia
Junior Member
The issue is that the records are not generated in sequential order, but in random order. Also I can have a scenario that sequential number 5 is in file no 3, but sequential number 4 is in file no. 2. Therefore I need to load all files before I can check for missing numbers.
Re: Check for missing records in 75 million records per day [message #630879 is a reply to message #630878] Tue, 06 January 2015 01:02 Go to previous messageGo to next message
John Watson
Messages: 8931
Registered: January 2010
Location: Global Village
Senior Member
In that case, do as RC suggested: create a table with 75000000 rows numbered sequentially numbered, and use MINUS.
Re: Check for missing records in 75 million records per day [message #630906 is a reply to message #630879] Tue, 06 January 2015 04:21 Go to previous message
bgraupe
Messages: 25
Registered: July 2009
Location: Windhoek Namibia
Junior Member
Thank you very much. I will try this option.
Previous Topic: Find a column Name using data
Next Topic: Sql querry needed for scenario
Goto Forum:
  


Current Time: Wed Apr 24 16:15:42 CDT 2024