Home » SQL & PL/SQL » SQL & PL/SQL » Design Question (12c )
Design Question [message #675863] Fri, 26 April 2019 10:25 Go to next message
ora9a
Messages: 42
Registered: June 2010
Member
Hi Guys,

I have a requirement to send an extract of data (in xml format) to a 3rd party every day. After the initial extract with all data, subsequent extracts should only be for any data that has changed (from yesterday's extract).

I am trying to decide what is the best way to get only changed data. I will be using a pretty simple query that uses 3 tables for the data.

e.g


SELECT forename, surname, address, course_name
  FROM student s, address a, courses c
 WHERE s.id = a.id
   AND s.id = c.id


So I only want it to return records where forename, surname, address or course_name may have changed from the previous day.

Does anyone have an efficient way of doing this? I was thinking of doing something like>

Above query populates a table each day, then the next day's query compares against the table for any changed data.

Thanks
Re: Design Question [message #675864 is a reply to message #675863] Fri, 26 April 2019 10:45 Go to previous messageGo to next message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

1/ How may rows (total and changing each day)?
2/ First thought: MINUS
3/ Second thought MVIEW LOG
4/ Third thought trigger filling the today table

Re: Design Question [message #675865 is a reply to message #675864] Fri, 26 April 2019 10:51 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
Please read and follow the forum guidelines, to enable us to help you:
OraFAQ Forum Guide
How to use {code} tags and make your code easier to read

FWIW - 12c is a marketing alias & NOT acceptable. Always post Oracle version to 4 decimal places.

>So I only want it to return records where forename, surname, address or course_name may have changed from the previous day.
What should be done for new rows (INSERT) being added?
What should be done when old rows are removed (DELETE)?


post SQL & results that show only changed data.


>So I only want it to return records where forename, surname, address or course_name may have changed from the previous day.
Re: Design Question [message #675866 is a reply to message #675864] Fri, 26 April 2019 10:52 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
Initial will be about 20,000 records. After that daily changes will be few hundred records. But at certain times in year when students enrol they will all need to be sent in the extract, which may be about 10,000

Minus - Do you mean using my approach of creating a table and then querying that using Minus?

Thanks

Re: Design Question [message #675867 is a reply to message #675865] Fri, 26 April 2019 10:59 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
New Rows also need to be included in the extract. There will be no deletes.
Re: Design Question [message #675868 is a reply to message #675867] Fri, 26 April 2019 11:07 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
> There will be no deletes.
You are a naive newbie.
Do you believe that the initial data entry will always be 100% perfect?

When table alias exist in FROM clause, then every column in SELECT clause should be qualified by table alias.

You need to post Test Case if you desire actual SQL solution.
Re: Design Question [message #675869 is a reply to message #675868] Fri, 26 April 2019 11:17 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
> There will be no deletes

Sorry I meant from the student, address, course tables. These student records are never deleted from the db.

I will create some test data.

Thanks
Re: Design Question [message #675870 is a reply to message #675869] Fri, 26 April 2019 11:29 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
BTW, I find single character table alias to be an anathema, because searching for them in long procedure can result in many false positives.
SSS, AAA, or CCC almost NEVER result in false positive.

Consider that if on day 1234 after initial data dump & asked to produce current table state, you have to apply only 1233 incremental restores.

I suggest that you should consider to additionally do periodic (weekly or monthly) full table exports using expdp.

>These student records are never deleted from the db.
Except data entry error occurs or the "rules" change to save disk space.
Re: Design Question [message #675871 is a reply to message #675866] Fri, 26 April 2019 12:04 Go to previous messageGo to next message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
ora9a wrote on Fri, 26 April 2019 17:52
...
Minus - Do you mean using my approach of creating a table and then querying that using Minus?
...
Yes.
Given the tiny number of rows you have this seems to be the best way.

Re: Design Question [message #675872 is a reply to message #675870] Fri, 26 April 2019 12:10 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
CREATE TABLE student (ID NUMBER, FORENAME VARCHAR2(50), SURNAME VARCHAR2(50));

CREATE TABLE address (ID NUMBER, LINE1 VARCHAR2(50), CITY VARCHAR2(50), ZIPCODE VARCHAR2(20), STU_ID NUMBER);

INSERT INTO student VALUES (1, 'Joe','Bloggs');
INSERT INTO student VALUES (2, 'Adam','Smith');

INSERT INTO address VALUES (1,'15 New Avenue','London','SE1XAD',1);
INSERT INTO address VALUES (2,'25 Bond Street','London','SWBAJ',2);

SELECT stu.id AS Student_id, stu.forename, stu.surname, adr.line1,adr.city
  FROM student stu, address adr
 WHERE stu.id = adr.stu_id; 

So on Day 1 this would return 2 rows.
On Day 2, a student's address is updated:

UPDATE address SET LINE1 = '80 Harvey Drive', CITY = 'Leeds', ZIPCODE = 'L1BXU'
 WHERE stu_id = 1;

When the query is executed on day 2, it should only return this student that has been updated, and not both records as in the first day.

Hope that makes sense!

Thanks.

(And the aliases etc. are only for brevity and not according to standards).





Re: Design Question [message #675873 is a reply to message #675872] Fri, 26 April 2019 12:53 Go to previous messageGo to next message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

On day one you fill the stage table with the table, on day 2 you minus the current table with the stage one.

Re: Design Question [message #675915 is a reply to message #675873] Mon, 29 April 2019 05:37 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
Michel,

I am assuming I will have to keep updating the staging table everyday. e.g.

CREATE TABLE Staging AS (
SELECT stu.id AS Student_id, stu.forename, stu.surname, adr.line1,adr.city
  FROM student stu, address adr
 WHERE stu.id = adr.stu_id);

DECLARE

CURSOR Cur1 IS 
SELECT stu.id AS Student_id, stu.forename as forename, stu.surname as surname, adr.line1 as line1,adr.city as city
  FROM student stu, address adr
 WHERE stu.id = adr.stu_id
MINUS 
SELECT Student_id,forename,surname,line1,city
  FROM Staging;

BEGIN
  
 FOR idx IN cur1 LOOP

 UPDATE Staging 
   SET forename = idx.forename,
       surname = idx.surname,
       line1 = idx.line1,
       city = idx.city
  WHERE student_id = idx.student_id;
 END LOOP;
END;
Re: Design Question [message #675917 is a reply to message #675915] Mon, 29 April 2019 07:49 Go to previous messageGo to next message
cookiemonster
Messages: 13917
Registered: September 2008
Location: Rainy Manchester
Senior Member
2 problems:
1) you aren't accounting for new rows - you'll need to do an insert as well. You could replace both update and insert with a single merge.
2) staging will include rows that haven't changed. You could get round this by adding a date column and setting it to trunc(sysdate) whenever a row is inserted or updated.
Then the process that actually creates the xml can just look at rows where date column = trunc(sysdate).
Re: Design Question [message #675920 is a reply to message #675915] Mon, 29 April 2019 11:23 Go to previous messageGo to next message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

You do NOT update staging, after generating your report, the only things you do on it are:
TRUNCATE
INSERT SELECT

Re: Design Question [message #675922 is a reply to message #675920] Mon, 29 April 2019 11:40 Go to previous messageGo to next message
Bill B
Messages: 1971
Registered: December 2004
Senior Member
Include the student ID and a last_modification date in the staging table and then use a merge statement ones that change get updated with the modification_date being set to sysdate (or timestamp. New rows get inserted. You then simply query the staging table where the modification_date is the current day. Push all those through the XML file
Re: Design Question [message #675931 is a reply to message #675920] Tue, 30 April 2019 03:17 Go to previous messageGo to next message
cookiemonster
Messages: 13917
Registered: September 2008
Location: Rainy Manchester
Senior Member
Michel Cadot wrote on Mon, 29 April 2019 17:23

You do NOT update staging, after generating your report, the only things you do on it are:
TRUNCATE
INSERT SELECT

If you truncate it what are you doing a minus against?
Re: Design Question [message #675932 is a reply to message #675931] Tue, 30 April 2019 03:34 Go to previous messageGo to next message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

The steps are:
Generate report using minus
TRUNCATE staging table
INSERT staging table SELECT FROM current tables, then the staging table is ready for the next day.

Re: Design Question [message #675941 is a reply to message #675932] Wed, 01 May 2019 05:47 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
Thanks Michel.
Re: Design Question [message #678822 is a reply to message #675941] Thu, 16 January 2020 10:57 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
Hi Michel,

I've come across an issue. If the original query has a where clause such as:

SELECT forename, surname, address, course_name
  FROM student s, address a, courses c
 WHERE s.id = a.id
   AND s.id = c.id
   AND course_name = 'A'
This works ok, until a student changes course. The query will therefore not return this student's record, and so the MINUS from the staging table will not return anything.

This means i cannot show that a student's course yesterday was 'A', but today is 'B'.

How can I get around this problem?

Thanks
Re: Design Question [message #678824 is a reply to message #678822] Thu, 16 January 2020 11:47 Go to previous messageGo to next message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

There should not be any restriction in your data you need as a result.
I mean if evolution in course names should be in the result then you must not restrict on course name.

Re: Design Question [message #678826 is a reply to message #678824] Thu, 16 January 2020 13:07 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
If the query is scheduled to extract the data that changed within a stipulated time frame like a day ago, then how about using a DATE column which stores details when the DML happened. So you only extract the data which was manipulated within your required time period.

You would need two date columns, each for inserted and updated datetime values e.g. INS_DATE, UPD_DATE respectively. These columns would only be required in the main table, while you only have one day data in staging stables. As Michel suggested, staging table needs to be truncated everyday and then you could MERGE only the delta everyday into main table. Now all that you need to do is add a filter in your query to send only the rows that have been INS/UPD on SYSDATE -1.

[Updated on: Thu, 16 January 2020 13:53]

Report message to a moderator

Re: Design Question [message #678827 is a reply to message #678826] Thu, 16 January 2020 13:38 Go to previous messageGo to next message
Bill B
Messages: 1971
Registered: December 2004
Senior Member
add the column last_mod (date) to your table add a simple trigger

CREATE OR REPLACE TRIGGER student_t1
BEFORE INSERT OR UPDATE
ON student
REFERENCING NEW AS New OLD AS Old
FOR EACH ROW
BEGIN
  :new.last_mod := sysdate;
END student_t1;
Make a simular trigger and column in the address table and you could run a generation report for the previous day (or any other range)

select *
from student
where last_mod > trunc(sysdate-1)
      and last_mod <= trunc(sysdate);

[Updated on: Thu, 16 January 2020 13:44]

Report message to a moderator

Re: Design Question [message #678829 is a reply to message #678827] Thu, 16 January 2020 13:53 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
I always like to have two date columns for DML changes, one specific to INSERTS another for UPD/DEL. It just completes the entire design and it also leverages for a lot of analytical queries.

Also, depending on requirements, if the use case is suitable for an unified auditing policy I would go for it. As it would be an efficient method to manage compared to triggers.
Re: Design Question [message #678843 is a reply to message #678829] Fri, 17 January 2020 05:24 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
Hi Guys,

@Michel Unfortunately that's how they want it. Only data for certain courses, but if student changes from that course then that needs to be captured.
@Lalit Not allowed to add columns to the base table student.
The process will run at a fixed time every night.

Day 1...

Student: John, Smith, 11 Walter St, A

SELECT forename, surname, address, course_name
  FROM student s, address a, courses c
 WHERE s.id = a.id
   AND s.id = c.id
   AND course_name = 'A'
MINUS 
SELECT forename, surname, address, course_name
  FROM stage

TRUNCATE stage

INSERT INTO stage (
SELECT forename, surname, address, course_name
  FROM student s, address a, courses c
 WHERE s.id = a.id
   AND s.id = c.id
   AND course_name = 'A')
-- Day 2

Data changes:
Student: John, Smith, 11 Walter St, B
This will not be picked up in query.

Re: Design Question [message #678849 is a reply to message #678843] Fri, 17 January 2020 07:54 Go to previous messageGo to next message
EdStevens
Messages: 1376
Registered: September 2013
Senior Member
"The process will run at a fixed time every night."

Again, you are very naive. The job will run at a fixed time every night, until something happens that causes it to not run at the fixed time. That's not an "if", it's a "when".

There have been a lot of proposed approaches, but it is still not clear exactly what data is needed at the receiving end. You have been focused on the technique of sending them only daily changes, but that is just a pre-conceived and possibly ill-conceived technical solution. Have you considered - and discussed with the receiving end - the possibility of sending them a full extract every day (just like the initial extract) and on their end truncating the receiving table before re-loading with the full extract? If the only objection is the perception that they are saving time/bandwidth by processing only changes, that is probably a false economy. The 10k row count you mention for a full extract is nothing.
Re: Design Question [message #678850 is a reply to message #678849] Fri, 17 January 2020 08:19 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
Thanks for the response Ed. The 3rd party is not willing to do anything, they just want the initial extract, then changed data every day. If the process fails, it can be rerun because the staging table will still be there with previous day's data so i don't think that will be a problem.

I'm wondering whether to create views, one with full data, and second with the restricted where clauses, then somehow comparing the results..

Thanks
Re: Design Question [message #678852 is a reply to message #678850] Fri, 17 January 2020 09:44 Go to previous messageGo to next message
John Watson
Messages: 8922
Registered: January 2010
Location: Global Village
Senior Member
Quote:
The 3rd party is not willing to do anything, they just want the initial extract, then changed data every day.
Can you not capture the changes as they occur? Use triggers to write out the I/U/D changes to another table and generate your daily change data report from that.
Re: Design Question [message #678853 is a reply to message #678852] Fri, 17 January 2020 10:01 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
Yes, triggers is the other option, but wanted to avoid as there will be quite a lot of tables involved so will have triggers everywhere.
Re: Design Question [message #678854 is a reply to message #678843] Fri, 17 January 2020 10:29 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
ora9a wrote on Fri, 17 January 2020 16:54
Not allowed to add columns to the base table student.
So your entire focus is about WHEN part as Ed also pointed out. Like WHEN a student is enrolled and the WHEN part is even more important with your job scheduling, but you don't have a DATETIME column to address the WHEN part itself? Without it how would you ever know WHEN was the row modified? It will be quite useful even if you go with the trigger approach also if you want to do data validations etc.

With so many design limitations, I would suggest you validate and process the data to get the delta before modifying the base table. It will have an overhead on the ETL but you could process the delta and send it to your 3rd party application. For more help you need to provide more details about the ETL that modifies your base table. It's design, the workflow etc.
Re: Design Question [message #678857 is a reply to message #678843] Fri, 17 January 2020 11:02 Go to previous messageGo to next message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Quote:
Student: John, Smith, 11 Walter St, B
This will not be picked up in query.
It is not clear what you expect in this case.
Do you want a "status", student no more in course A or student is now in course B?
In the later why do you currently restrict the statements to class A if other classes are also important?

Re: Design Question [message #678858 is a reply to message #678857] Fri, 17 January 2020 11:11 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
The statement is restricted because only certain types of students are required in the extract, e.g. course A. BUT if the student changes course to 'B', then the 3rd party needs to know about this.

Unfortunately, this is the requirement from the business.

Cheers
Re: Design Question [message #678860 is a reply to message #678858] Fri, 17 January 2020 11:43 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
ora9a wrote on Fri, 17 January 2020 22:41
The statement is restricted because only certain types of students are required in the extract, e.g. course A. BUT if the student changes course to 'B', then the 3rd party needs to know about this.

Unfortunately, this is the requirement from the business.
Your business unit must understand that design is important than "how to achieve this ASAP" part. Else, they will feel to be unfortunate.

This seems to be a fine grained auditing at row level or action based audit for particular columns. But still, there are lot of questions you need to answer first. It would be better to stick to WHAT part of design than the HOW/WHEN part.

I hope you understand that your question is about the design at first place, however, you are trying to seek suggestions only about how to achieve the end result. Every ETL process has a critical path which could be made efficient only if the design is good. Otherwise, it will affect the performance of entire ETL jobs and downstream systems/applications consuming your services. It's not late to go back to the drawing board to fix the design first and then think about the latter.

[Updated on: Fri, 17 January 2020 11:59]

Report message to a moderator

Re: Design Question [message #678861 is a reply to message #678860] Fri, 17 January 2020 12:03 Go to previous messageGo to next message
Bill B
Messages: 1971
Registered: December 2004
Senior Member
Another possibility would be to do a minus query using a select of the current table and using a flashback query the contents of the table at a time in the past.
Re: Design Question [message #678862 is a reply to message #678861] Fri, 17 January 2020 12:08 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
Bill B wrote on Fri, 17 January 2020 23:33
Another possibility would be to do a minus query using a select of the current table and using a flashback query the contents of the table at a time in the past.
Yeah could be another approach if current suggestions won't work, however, OP needs to confirm whether automatic undo management is configured or not. And how much the undo retention is set to.
Re: Design Question [message #678863 is a reply to message #678858] Fri, 17 January 2020 12:13 Go to previous messageGo to next message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

You can use a double MINUS query with a new "status" field:
SELECT forename, surname, address, course_name, "I/U" status
  FROM student s, address a, courses c
 WHERE s.id = a.id
   AND s.id = c.id
   AND course_name = 'A'
MINUS 
SELECT forename, surname, address, course_name, "I/U" status
  FROM stage
UNION ALL
SELECT forename, surname, address, course_name, "D" status
  FROM stage
MINUS 
SELECT forename, surname, address, course_name, "D" status
  FROM student s, address a, courses c
 WHERE s.id = a.id
   AND s.id = c.id
   AND course_name = 'A'
status "I/U" means inserted into class A or updated inside this class, and status "D" means removed from class A (student dropped or changed class).

Re: Design Question [message #678864 is a reply to message #678858] Fri, 17 January 2020 12:57 Go to previous messageGo to next message
John Watson
Messages: 8922
Registered: January 2010
Location: Global Village
Senior Member
How about a solution that actually uses the database? From release 12, we have Temporal Validity. You said that you cannot add columns to the tables, but the Temporal Validity columns are hidden so only you will know they are there. I think that Temporal Validity is meant for implementing type 6 Slowly Changing Dimensions, but it can possibly be used to track the changes you need. You would still have to write a few triggers. Well, writing code is part of the job.
Re: Design Question [message #678896 is a reply to message #678864] Tue, 21 January 2020 07:15 Go to previous messageGo to next message
Bill B
Messages: 1971
Registered: December 2004
Senior Member
Doesn't Temporal Validity just an easier way to use flashback query?
Re: Design Question [message #678897 is a reply to message #678896] Tue, 21 January 2020 07:23 Go to previous messageGo to next message
John Watson
Messages: 8922
Registered: January 2010
Location: Global Village
Senior Member
I suppose there is a relationship, Bill - it is about tracking multiple versions. But it does it with table and column structures, not with undo. See here:
orclz> select column_name,data_default,hidden_column,virtual_column from user_Tab_cols where table_name='DEPT';

COLUMN_NAME                    DATA_DEFAULT                                                                     HID VIR
------------------------------ -------------------------------------------------------------------------------- --- ---
DEPTNO                                                                                                          NO  NO
DNAME                                                                                                           NO  NO
LOC                                                                                                             NO  NO

orclz> alter table dept add (period for user_time);

Table altered.

orclz> select column_name,data_default,hidden_column,virtual_column from user_Tab_cols where table_name='DEPT';

COLUMN_NAME                    DATA_DEFAULT                                                                     HID VIR
------------------------------ -------------------------------------------------------------------------------- --- ---
DEPTNO                                                                                                          NO  NO
DNAME                                                                                                           NO  NO
LOC                                                                                                             NO  NO
USER_TIME_START                                                                                                 YES NO
USER_TIME_END                                                                                                   YES NO
USER_TIME                      36770601                                                                         YES YES

6 rows selected.

orclz>
and then in your SELECT you have the AS OF PERIOD FOR clause, or use DBMS_FLASHBACK.ENABLE_AT_VALID_TIME
Re: Design Question [message #678976 is a reply to message #678897] Fri, 24 January 2020 10:31 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
Thanks for your responses. I am discussing with DBA's long term solutions as we are having to do this DELTA for a lot of new projects.

For now i am trying Michel's suggestion of double minus. @Michel, this works as it brings the student back, but it shows the previous course, (from the stage table), but not the new course that student has changed to. So i am trying to collect this data, then run update on it with new course.

Thanks

Re: Design Question [message #678977 is a reply to message #678976] Fri, 24 January 2020 10:53 Go to previous messageGo to previous message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Quote:
but not the new course that student has changed to

Of course, you don't want it! (as per your WHERE clause).

[Updated on: Fri, 24 January 2020 10:53]

Report message to a moderator

Previous Topic: Looping through dblinks with a cursor
Next Topic: Writing from Ref cursor to UTIL file
Goto Forum:
  


Current Time: Thu Mar 28 15:38:14 CDT 2024