Check for missing records in 75 million records per day [message #630792] |
Mon, 05 January 2015 07:39 |
bgraupe
Messages: 25 Registered: July 2009 Location: Windhoek Namibia
|
Junior Member |
|
|
Good day
I need to create a procedure that checks for missing records throughout the day. An average of 75 million records are inserted into the table daily.
My aim is to execute the procedure every hour for the records inserted into a table until that hour, but the procedure is not done before the next execution. The records are generated with a sequential number. As soon as the highest number is reached, the number will be initialized to the value of 0 again. This is a process that gets repeated a few times throughout a day. My unique key on the table is the timestamp combined with the sequential number.
I also tried to create a temporary table once a day with the records of the previous day and repeat the process on this temporary table not to affect the production table, but still the job is running between 6 and 7 hours to check through all records of the day.
Any recommendations to speed up this process will help.
Attached find a shortened copy of the procedure.
If you want to forward a mail, please mail to bgraupe@mtc.com.na
Regards
Bonita
[Updated on: Mon, 05 January 2015 07:41] Report message to a moderator
|
|
|
|
|
Re: Check for missing records in 75 million records per day [message #630795 is a reply to message #630794] |
Mon, 05 January 2015 08:09 |
bgraupe
Messages: 25 Registered: July 2009 Location: Windhoek Namibia
|
Junior Member |
|
|
Thank you for the advise. Will keep that in mind for future posts. (new on this)
The sequence numbers are generated at the source system. I don't have any control over this. I just need to load the records and ensure no missing records. There can be many reasons for missing numbers. Mostly it was an error on our switch that generated the records and my application is the only source that loads these records and as a audit measure I need to ensure no missing records. Reason being that these records are used for law enforcement support.
CREATE OR replace PROCEDURE Recordcheck_sp (g_start_date IN DATE,
g_end_date IN DATE)
IS
/************************************* Variable declaration **************************************/
v_startdate_c VARCHAR2(18) := To_char(Trunc(g_start_date), 'dd-mon-rrrr'
)
|| ' 000000';
v_enddate_c VARCHAR2(18) := To_char(Trunc(g_end_date), 'dd-mon-rrrr')
|| ' 235959';
v_start_date TIMESTAMP := To_date(v_startdate_c, 'dd-mon-rrrr hh24miss'
);
v_end_date TIMESTAMP := To_date(v_enddate_c, 'dd-mon-rrrr hh24miss');
v_min_rec NUMBER(9) := 0;
v_max_rec NUMBER(9) := 0;
v_err_count NUMBER (2);
v_trans_date TIMESTAMP;
v_record_nr cdruser.cdrgsm_prosp_tb.record_nr%TYPE;
v_find_record_nr cdruser.cdrgsm_prosp_tb.record_nr%TYPE;
v_file_name cdruser.cdrgsm_prosp_tb.file_name%TYPE;
v_recording_entiry cdruser.cdrgsm_prosp_tb.recording_entity%TYPE;
--v_start_date TIMESTAMP := g_start_date;
--v_end_date TIMESTAMP := g_end_date;
v_search_date TIMESTAMP;
v_search_date_start TIMESTAMP;
v_search_date_end TIMESTAMP;
a VARCHAR2 (20) := '';
b VARCHAR2 (120) := '';
c VARCHAR2 (4000) := '';
d VARCHAR2 (250) := '';
v_missing VARCHAR2 (1) := 'F';
v_find_count NUMBER := 0;
v_first VARCHAR2 (1) := 'F';
v_count NUMBER := 0;
x INT;
BEGIN
/************************************************************************************************
get min record nr of the day from table gsm_missing_rec_copy_tb
************************************************************************************************/
v_search_date_end := v_start_date + 15 / 1440; -- + 30/1440
SELECT /*+ PARALLEL(gsm_missing_rec_copy_tb,12) */
/*+ index(gsm_missing_rec_copy_tb D_GSM_PROSP_NRECORDNR_IDX)*/ trans_date,
record_nr
INTO v_start_date, v_min_rec
FROM gsm_missing_rec_copy_tb
WHERE record_nr = (SELECT Min(record_nr)
FROM ((SELECT
/*+ PARALLEL(gsm_missing_rec_copy_tb,12) */
/*+ index(gsm_missing_rec_copy_tb D_GSM_PROSP_NRECORDNR_IDX)*/
trans_date,
record_nr
FROM gsm_missing_rec_copy_tb
WHERE trans_date >= v_start_date
AND trans_date <= v_search_date_end
ORDER BY record_nr)))
AND trans_date >= v_start_date
AND trans_date <= v_search_date_end;
/************************************************************************************************
get max record nr from table huaweigsm_daily_recnr_tmp_tb
************************************************************************************************/
v_search_date_end := v_end_date - 30 / 1440;
SELECT /*+ PARALLEL(gsm_missing_rec_copy_tb,12) */
/*+ index(gsm_missing_rec_copy_tb D_GSM_PROSP_NRECORDNR_IDX)*/ trans_date,
record_nr
INTO v_end_date, v_max_rec
FROM gsm_missing_rec_copy_tb
WHERE record_nr = (SELECT Max(record_nr)
FROM ((SELECT
/*+ PARALLEL(gsm_missing_rec_copy_tb,12) */
/*+ index(gsm_missing_rec_copy_tb D_GSM_PROSP_NRECORDNR_IDX)*/
trans_date,
record_nr
FROM gsm_missing_rec_copy_tb
WHERE trans_date >= v_search_date_end
AND trans_date <= v_end_date
ORDER BY record_nr DESC)))
AND trans_date >= v_search_date_end
AND trans_date <= v_end_date;
v_record_nr := v_min_rec;
v_trans_date := v_start_date;
/************************************************************************************************
Perform while loop to check for each record number in the date range specified.
If a record number is not found, notify via mail and SMS that missing records exists.
************************************************************************************************/
WHILE ( v_err_count <= 30
OR v_trans_date <= v_end_date ) LOOP
INSERT INTO cdrgsm_omit
(record_nr,
trans_date,
file_name,
reason,
switch_id)
VALUES (v_record_nr,
v_trans_date,
v_file_name,
' ',
' ');
COMMIT;
v_find_record_nr := 0;
v_search_date_start := v_trans_date - 50 / 1440;
v_search_date_end := v_trans_date + 45 / 1440;
-- check if the next record can be found in the time range of the previous record timestamp
-- less 50 minutes and previous record timestamp plus 1 hour.
v_count := 0;
SELECT /*+ PARALLEL(gsm_missing_rec_copy_tb,12) */
/*+ index(gsm_missing_rec_copy_tb D_GSM_PROSP_NRECORDNR_IDX)*/ Count(*)
INTO v_count
FROM gsm_missing_rec_copy_tb
WHERE record_nr = v_record_nr
AND trans_date >= v_search_date_start
AND trans_date <= v_search_date_end;
IF v_count <> 0 THEN -- record found
SELECT /*+ PARALLEL(gsm_missing_rec_copy_tb,12) */
/*+ index(gsm_missing_rec_copy_tb D_GSM_PROSP_NRECORDNR_IDX)*/
trans_date,
record_nr,
file_name
INTO v_trans_date, v_find_record_nr, v_file_name
FROM gsm_missing_rec_copy_tb
WHERE record_nr = v_record_nr
AND trans_date >= v_search_date_start
AND trans_date <= v_search_date_end;
ELSE -- record not found, create error mail
IF v_first = 'F' THEN
b := '';
d := ( To_char('Missing Record Nrs from '
|| v_record_nr
|| ' '
|| v_trans_date
|| ' '
|| 'on file '
|| v_file_name) );
v_first := 'T';
v_err_count := v_err_count + 1;
v_missing := 'T';
ELSE
b := ( To_char('Record Nr '
|| v_record_nr
|| ' missing on file '
|| v_file_name) );
c := ( To_char (c
|| Chr (10)
|| b
|| Chr (10)) );
v_err_count := v_err_count + 1;
v_missing := 'T';
END IF;
END IF; -- end of v_count <> 0
v_record_nr := v_record_nr + 1;
IF v_record_nr > 9999999 THEN
v_record_nr := 1;
END IF;
END LOOP;
IF v_missing = 'T' THEN
a := To_char (SYSDATE, 'YYYY-MM-DD HH24:MI:SS');
c := ( To_char (a
|| Chr (10)
|| c) );
-- only until I am sure procedure is working 100% then I will include
-- automatic mail again.
Quickmail ('xxxxx@xxx.com.na', 'Missing CDRs', c);
END IF;
END recordcheck_sp;
|
|
|
|
|
Re: Check for missing records in 75 million records per day [message #630800 is a reply to message #630796] |
Mon, 05 January 2015 08:22 |
bgraupe
Messages: 25 Registered: July 2009 Location: Windhoek Namibia
|
Junior Member |
|
|
The sequence numbers are the nubmers generated on the source system and not oracle sequences.
The combination of timestamp (record date and time as generated in the source system) and sequence is the way the source system are generating the records. This is the only two columns that make the records unique.
If a sequence number is missing in the date range before the sequence number get initialized to 0 again, then I know that the record is missing and I need to inform the source system via mail and sms.
|
|
|
|
|
|
|
|
|
|
|
|
|
Re: Check for missing records in 75 million records per day [message #630829 is a reply to message #630806] |
Mon, 05 January 2015 09:25 |
John Watson
Messages: 8931 Registered: January 2010 Location: Global Village
|
Senior Member |
|
|
bgraupe wrote on Mon, 05 January 2015 14:38Thank you for the input, however the responsibility still remains on my side and I am trying to find an optimized method to do the process. How about a different approach? Do not use SQL*Loader, but rather an external table. You can then read the external table with a PL/SQL loop, and insert the rows row-by-row into your destination table. As you do that, store the sequential number in a variable and compare it to the value of the next row. Then write out missing numbers to another table every time the sequential value increases by more than 1. So you identify the missing numbers as you insert the rows.
I am of course assuming that the source data is in sequential order.
|
|
|
Re: Check for missing records in 75 million records per day [message #630878 is a reply to message #630829] |
Tue, 06 January 2015 00:58 |
bgraupe
Messages: 25 Registered: July 2009 Location: Windhoek Namibia
|
Junior Member |
|
|
The issue is that the records are not generated in sequential order, but in random order. Also I can have a scenario that sequential number 5 is in file no 3, but sequential number 4 is in file no. 2. Therefore I need to load all files before I can check for missing numbers.
|
|
|
|
|