Re: thoughts on "manufactured data"

From: dawn <dawnwolthuis_at_gmail.com>
Date: 28 Mar 2005 08:56:34 -0800
Message-ID: <1112028994.254514.78430_at_g14g2000cwa.googlegroups.com>


Nolan wrote:
> A client with which I'm working has a dictum that "thou shalt not
> manufacture data"; which means we are forced to pull in bad data
> against all sane advice to the contrary.
>
> Can anyone point me to an author who can show how to go about
> integrating poor data quality sources into a datamart with a defensible
> strategy for data cleansing (which is the client's definition of
> 'manufacturing' data.)

I can't think of a particular author, but you could possibly employ fuzzy sets, with each attribute whose data values are of most concern being assigned a probability of accuracy.

Of course that probability is "new data" so it, too, is manufactured, but might not conflict with the standards set by this client as it does not alter any data passed to them.

This would, obviously, make some analysis more complex if you use the probabilities in the computations, but they can also be shown to the user as data when drilling down on aggregate data.
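To make the idea above a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Reading` record, the `amount_conf` attribute, and the sample numbers are hypothetical, and how the probabilities get assigned is left open. The point is only that the raw value is stored exactly as received and the probability sits beside it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One warehouse row (hypothetical layout)."""
    region: str
    amount: float       # raw value, stored exactly as received
    amount_conf: float  # added attribute: estimated probability (0..1) that amount is accurate


def raw_mean(rows):
    """Plain mean over the untouched source values."""
    return sum(r.amount for r in rows) / len(rows)


def confidence_weighted_mean(rows):
    """Mean of the raw values, each weighted by its probability of accuracy."""
    total_weight = sum(r.amount_conf for r in rows)
    if total_weight == 0:
        raise ValueError("no row has a non-zero confidence")
    return sum(r.amount * r.amount_conf for r in rows) / total_weight


# Hypothetical sample: the third value looks like a keying error,
# so it gets a low probability, but it is not changed or dropped.
rows = [
    Reading("east", 100.0, 0.9),
    Reading("east", 120.0, 0.9),
    Reading("east", 9999.0, 0.05),
]

print(raw_mean(rows))
print(confidence_weighted_mean(rows))
```

The suspicious row still appears when drilling down, with its low probability visible next to it, while the weighted figure is pulled far less by it than the plain mean is.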

It sounds like your professional judgement is that the client should cleanse the data, so perhaps the best reading you could do would be related to negotiating techniques ;-)

I'm guessing you've given it your best shot at explaining the advantages of data cleansing and they simply don't buy it. If so, would they be willing to have new attributes added to the warehouse that are a "best guess" of what the data value of some other attribute should be, so that the analysis can be done on both attributes in order to help support business decisions?

That will depend on what they see as their role in the organization: are they doing their best to support business decision-making with data analysis, or do they define their role more narrowly as analyzing the data they are given, whether it could contribute to good decision-making or not?
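As a rough illustration of the "best guess" attribute idea, here is a small Python sketch. The column names (`amount`, `amount_best_guess`, `amount_was_guessed`), the plausibility range, and the median fallback are all assumptions picked for the example; the client and analyst would have to agree on the real rule. The raw attribute is never overwritten, so analysis can be run on either column.

```python
from statistics import median


def add_best_guess(rows, low, high):
    """Return new rows with a best-guess column added beside the raw 'amount'.

    A raw value counts as plausible if it is present and within [low, high].
    Implausible or missing values get the median of the plausible ones as
    their best guess (an assumed rule, chosen only for illustration).
    """
    def plausible(value):
        return value is not None and low <= value <= high

    good = [r["amount"] for r in rows if plausible(r["amount"])]
    fallback = median(good) if good else None

    result = []
    for r in rows:
        ok = plausible(r["amount"])
        result.append({
            **r,  # raw attributes copied through unchanged
            "amount_best_guess": r["amount"] if ok else fallback,
            "amount_was_guessed": not ok,
        })
    return result


# Hypothetical source rows: one missing value, one negative value.
source = [
    {"id": 1, "amount": 100.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": -5.0},
    {"id": 4, "amount": 120.0},
]

warehouse = add_best_guess(source, low=0.0, high=1000.0)
for row in warehouse:
    print(row)
```

Keeping a flag such as `amount_was_guessed` lets a report show how much of any aggregate rests on guessed values rather than on what the source actually supplied.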

Best wishes. --dawn  

> TIA
> NM
Received on Mon Mar 28 2005 - 18:56:34 CEST
