Datawarehouse cleanup [message #426300] |
Wed, 14 October 2009 16:03  |
murinfa
Messages: 4 Registered: October 2009 Location: IL
|
Junior Member |
|
|
HI
Could anyone give me suggestion on cleaning up the errored data in DW
Client wants to cleanup the historical data. apart from manual fix, what could b the suggestions?
thanks in advance
rev
|
|
|
Re: Datawarehouse cleanup [message #426303 is a reply to message #426300] |
Wed, 14 October 2009 16:20   |
ThomasG
Messages: 3212 Registered: April 2005 Location: Heilbronn, Germany
|
Senior Member |
|
|
Well. Either use "delete" statements on the data you want to delete, or "create table as" statements for the data you want to keep. That would be the vague answer to a vague question.
The thing you should do first is try to figure out if it's some kind of "standard" DWH written by some software company where you might find documentation on the table structure, or something home-grown where you have to track down and contact the person who wrote it to figure out the table structure.
|
|
|
Re: Datawarehouse cleanup [message #426304 is a reply to message #426303] |
Wed, 14 October 2009 16:22   |
murinfa
Messages: 4 Registered: October 2009 Location: IL
|
Junior Member |
|
|
Thankyou Thomas
Sorry for being vague.. Topic should have been 'historical data correction' rather than cleanup.
Report ran on DW pulls data which is incomplete.. ex: Gender column is blank, emp id in One source system doesnt match with peoplesoft etc.
So we should correct these records, so that the reports are perfect. (Though the correction mechanism in sources systems are not yet figured out)
Thanks again
[Updated on: Wed, 14 October 2009 16:24] Report message to a moderator
|
|
|
Re: Datawarehouse cleanup [message #426305 is a reply to message #426304] |
Wed, 14 October 2009 16:29   |
ThomasG
Messages: 3212 Registered: April 2005 Location: Heilbronn, Germany
|
Senior Member |
|
|
Then I would start by analyzing the errors messages in the transfer logs, to figure out what went wrong where during the transfer.
If it is possible to correct the errors (either in the transfer procedures or the source data) and then re-run the existing transfer jobs, that would be most likely the fastest and easiest solution.
When there is data that is "wrong" in the source systems, you might consider including default values for missing data and/or corrections to the data in the transfer procedures.
|
|
|
Re: Datawarehouse cleanup [message #426306 is a reply to message #426305] |
Wed, 14 October 2009 16:35   |
murinfa
Messages: 4 Registered: October 2009 Location: IL
|
Junior Member |
|
|
Thanks again
Errors here business errors. Pulling historical data for reporting, brings some incorrect data (defaulted ones like 0 for staff id, there cant be an emp in the system without emp id). for this old data, source systems are not gonna be touched.
so i suggested to find out all similar errors, here find all the records having '0' as emp id and update it with correct ones!
but this is tedious to go through each and every error and fix manually.
i just wanted to know if there would be any process to handle these kind of situations!
|
|
|
Re: Datawarehouse cleanup [message #426307 is a reply to message #426306] |
Wed, 14 October 2009 16:52   |
ThomasG
Messages: 3212 Registered: April 2005 Location: Heilbronn, Germany
|
Senior Member |
|
|
murinfa wrote on Wed, 14 October 2009 23:35
but this is tedious to go through each and every error and fix manually.
You should be able to figure out update statements for each "type" of error.
Of course that DOES require analysing the table structure, figuring out where the data is coming from.
Then you should be able to do an
update table
set emp_id = (select [the stuff you have to figure out])
where emp_id = 0;
That WILL of course be tedious. Not really a way around it. It might even be impossible, for example if there are historical employees which never had an employee ID. In that case you would have to come up with a scheme to assign them some sort of pseudo-employee ID which doesn't clash with the real employee IDs.
I did some stuff like that, and depending on how big and how much problems you run into this might take from a few days to a few weeks, or even longer, depending on how many types of business errors you run into.
|
|
|
Re: Datawarehouse cleanup [message #426317 is a reply to message #426300] |
Wed, 14 October 2009 22:52   |
 |
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
|
Senior Member |
|
|
There is no easy way to fix data errors. You will just have to buckle down and accept that you will need to do the necessary analysis and write the necessary scripts to correct the data. This kind of exercise depends a lot on your user's desire to actually correct the data. There may be a need to go to source systems and get the "correct" answers. In the end this is a very manual task and it can take months.
Additionally, before you start any fix task, you should ask yourself how did these errors happen in the first place. More specifically, what will keep them from comming back after you have fixed the data. You could spend months fixing your DW data and end up a year from now in the same place because you did not fix the original bugs that cause the data corruption first.
Lastly, you need to make sure your DW is not a "system of record" in some way. If it is, then you may, depending upon your industry and data content, be required to prove you have an auditable data change process in place, that you actually folllowed the process, and to show your records of what changes were made, when, by who, and how. If you don't your company could face legal action later if it is discovered you manipulated data in a "system or record" without keeping the necessary documentation. A lack of auditability when it comes to data changes outside the normal processes of a "system of record" gives the appearance of "cooking the books".
Good luck, Kevin
[Updated on: Wed, 14 October 2009 22:54] Report message to a moderator
|
|
|
Re: Datawarehouse cleanup [message #427225 is a reply to message #426317] |
Wed, 21 October 2009 10:27  |
murinfa
Messages: 4 Registered: October 2009 Location: IL
|
Junior Member |
|
|
Thanks to all of you. I was pulled into someother issue and so had no time to look on to this..
I understand clearly what you all meant. And I also want to thank all of you for your time and efforts to help me..
I have questions adding up,
understanding that this is purely data patch up, what would be real / productive solution to fix it? would it be
1. To use sql update statements to correct it?
2. To have a process do it
if we could do it as a job, could you please guide me how it could be done? thanks
errors are of different types. atleast 5-6 kind of errors we have.
Thanks again
Mura
|
|
|