How to find a column by sampling data [message #410147] |
Thu, 25 June 2009 08:13  |
glanson
Messages: 11 Registered: August 2008
|
Junior Member |
|
|
Can any one help me to identify the columns
which are ssn by sampling the data in the database or to run a query/script against each table so that the result
will tell you that the data in a a particular column seems to be a ssn ?[I have identified the columns in tables
which has ssn as there column names]. Please help...
|
|
|
Re: How to find a column by sampling data [message #410152 is a reply to message #410147] |
Thu, 25 June 2009 08:36   |
JRowbottom
Messages: 5933 Registered: June 2006 Location: Sunny North Yorkshire, ho...
|
Senior Member |
|
|
1) What is a SSN
2) What percentage of the data in a column needs to match the required format before you decide that it holds a SSN
You'd be better off I think trying to reverse engineer your application code to see where it stores and retrieves SSNs from.
|
|
|
Re: How to find a column by sampling data [message #410156 is a reply to message #410152] |
Thu, 25 June 2009 08:49   |
glanson
Messages: 11 Registered: August 2008
|
Junior Member |
|
|
SSN is social security number xxx-xx-xxxx
Ususally there seems to be one column but there are tables which has more than two columns. for eg: id, ssn, dad_ssn, mom_ssn etc
which are ssns.
Therefore I have to check this for each column in each table for the entire database
|
|
|
Re: How to find a column by sampling data [message #410162 is a reply to message #410156] |
Thu, 25 June 2009 09:01   |
JRowbottom
Messages: 5933 Registered: June 2006 Location: Sunny North Yorkshire, ho...
|
Senior Member |
|
|
Do you have any reason to believe that SSNs are being stored in columns that don't have SSN in the column_name?
Really, can you not check the application code?
Notwithstanding that, you only need to check columns of datatype CHAR or VARCHAR2, with a length of >= 11.
I'd do something like (pseudocode)DECLARE
v_cnt pls_integer;
BEGIN
for rec in (select column_name,table_name from user_tab_columns where data_type in (...) and data_length >=11) loop
execute immediate 'SELECT count(*) from '||rec.table_nam||' where '||rec.column_name||' like ''___-__-____''' into v_cnt;
if v_cnt > 0 then
dbms_outupt.put_line(rec.table_name||'-'||rec.column_name||':'||v_cnt);
end if;
end loop;
end;
/
|
|
|
Re: How to find a column by sampling data [message #410214 is a reply to message #410162] |
Thu, 25 June 2009 13:06   |
joy_division
Messages: 4963 Registered: February 2005 Location: East Coast USA
|
Senior Member |
|
|
JRowbottom wrote on Thu, 25 June 2009 10:01 | Do you have any reason to believe that SSNs are being stored in columns that don't have SSN in the column_name?
|
Looks like he is using ID as one of the columns names
Quote: |
Notwithstanding that, you only need to check columns of datatype CHAR or VARCHAR2, with a length of >= 11.
|
Or possibly =11 to cut it down even more.
|
|
|
Re: How to find a column by sampling data [message #410652 is a reply to message #410214] |
Mon, 29 June 2009 09:35  |
JRowbottom
Messages: 5933 Registered: June 2006 Location: Sunny North Yorkshire, ho...
|
Senior Member |
|
|
Quote: | Looks like he is using ID as one of the columns names
|
D'oh!
Quote: | Or possibly =11 to cut it down even more.
|
True - I was looking at a worst case scenario - they could stuff a SSN in that format into any column >= 11
|
|
|