Home » SQL & PL/SQL » SQL & PL/SQL » How to find a column by sampling data (oracle10g, solaris 10)
How to find a column by sampling data [message #410147] Thu, 25 June 2009 08:13 Go to next message
glanson
Messages: 11
Registered: August 2008
Junior Member
Can any one help me to identify the columns
which are ssn by sampling the data in the database or to run a query/script against each table so that the result
will tell you that the data in a a particular column seems to be a ssn ?[I have identified the columns in tables
which has ssn as there column names]. Please help...
Re: How to find a column by sampling data [message #410152 is a reply to message #410147] Thu, 25 June 2009 08:36 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
1) What is a SSN

2) What percentage of the data in a column needs to match the required format before you decide that it holds a SSN

You'd be better off I think trying to reverse engineer your application code to see where it stores and retrieves SSNs from.
Re: How to find a column by sampling data [message #410156 is a reply to message #410152] Thu, 25 June 2009 08:49 Go to previous messageGo to next message
glanson
Messages: 11
Registered: August 2008
Junior Member
SSN is social security number xxx-xx-xxxx

Ususally there seems to be one column but there are tables which has more than two columns. for eg: id, ssn, dad_ssn, mom_ssn etc
which are ssns.

Therefore I have to check this for each column in each table for the entire database
Re: How to find a column by sampling data [message #410162 is a reply to message #410156] Thu, 25 June 2009 09:01 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
Do you have any reason to believe that SSNs are being stored in columns that don't have SSN in the column_name?

Really, can you not check the application code?

Notwithstanding that, you only need to check columns of datatype CHAR or VARCHAR2, with a length of >= 11.

I'd do something like (pseudocode)
DECLARE
  v_cnt  pls_integer;
BEGIN
  for rec in (select column_name,table_name from user_tab_columns where data_type in (...) and data_length >=11) loop
    execute immediate 'SELECT count(*) from '||rec.table_nam||' where '||rec.column_name||' like ''___-__-____''' into v_cnt;
    if v_cnt > 0 then
      dbms_outupt.put_line(rec.table_name||'-'||rec.column_name||':'||v_cnt);
    end if;
  end loop;
end;
/
Re: How to find a column by sampling data [message #410214 is a reply to message #410162] Thu, 25 June 2009 13:06 Go to previous messageGo to next message
joy_division
Messages: 4963
Registered: February 2005
Location: East Coast USA
Senior Member
JRowbottom wrote on Thu, 25 June 2009 10:01
Do you have any reason to believe that SSNs are being stored in columns that don't have SSN in the column_name?

Looks like he is using ID as one of the columns names

Quote:

Notwithstanding that, you only need to check columns of datatype CHAR or VARCHAR2, with a length of >= 11.


Or possibly =11 to cut it down even more.
Re: How to find a column by sampling data [message #410652 is a reply to message #410214] Mon, 29 June 2009 09:35 Go to previous message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
Quote:
Looks like he is using ID as one of the columns names

D'oh!

Quote:
Or possibly =11 to cut it down even more.

True - I was looking at a worst case scenario - they could stuff a SSN in that format into any column >= 11
Previous Topic: ORA-24374: define not done before fetch or execute and fetch
Next Topic: query optimization
Goto Forum:
  


Current Time: Thu Feb 13 12:06:59 CST 2025