Home » SQL & PL/SQL » SQL & PL/SQL » scrambling the data for a column in a tables (oracle 11g version 2 )
scrambling the data for a column in a tables [message #659224] Mon, 09 January 2017 23:09 Go to next message
raj85844
Messages: 26
Registered: November 2016
Location: chennai
Junior Member
Hi Friends,

Wishing you all a very happy new year...

I got a requirement in scrambling the data in the table

The below are the columns with the data length as mention for referrence

DATE_OF_BIRTH - Year(last digit),Month and date
SURNAME - length 30
FIRST_NAME - length 20
MIDDLE_NAMES - length 50
HOME_PHONE_NO - length 25
MOBILE_PHONE_NO - length 25
EMAIL_ADDRESS - length 255
POSTCODE - Total length is 8 but first 4 digits should be scramble.

As the data in the columns need to be scrambled with the logic as mentioned below
Char/Varchar : randomly generate the letter from(A-Z).
Number : For number randomly select the no from 0-9.
Month : For Month the no should be generate randomly using 01-12.
Day : For Day the no should be generate randomly using 01-31.


i tried to create a generic stored procedure for scrambling the data's in the table
-->input parameters
i/p as p_table_name
i/p as p_schema_name

required a help in achieving the logic in sql with better performance in the store procedure
since this change is going to be in a larger table with 10 million records

Thanks
Raj
Re: scrambling the data for a column in a tables [message #659226 is a reply to message #659224] Mon, 09 January 2017 23:38 Go to previous messageGo to next message
Barbara Boehmer
Messages: 8898
Registered: November 2002
Location: California, USA
Senior Member
It is unclear what you mean by scrambling the data. I am guessing that you want to generate random data. If so, then please see the partial example below. I have used only three columns, one date, one character, and one numeric. For the date, I have subtracted a random number from sysdate, instead of using 1 to 31 days, since some months do not have 31 days. I have generated 10 rows, but you could specify any number of rows you like.

SCOTT@orcl_12.1.0.2.0> CREATE TABLE scrambled
  2    (DATE_OF_BIRTH  DATE,
  3  	FIRST_NAME	 VARCHAR2(20),
  4  	POSTCODE	 NUMBER(8))
  5  /

Table created.

SCOTT@orcl_12.1.0.2.0> INSERT INTO scrambled (date_of_birth, first_name, postcode)
  2  SELECT SYSDATE - DBMS_RANDOM.VALUE (1,366),
  3  	    DBMS_RANDOM.STRING ('U', 20),
  4  	    LPAD (CEIL (DBMS_RANDOM.VALUE (1,99999999)), 8, '0')
  5  FROM   DUAL
  6  CONNECT BY LEVEL <= 10
  7  /

10 rows created.

SCOTT@orcl_12.1.0.2.0> SELECT date_of_birth, first_name, postcode
  2  FROM   scrambled
  3  /

DATE_OF_BIRTH   FIRST_NAME             POSTCODE
--------------- -------------------- ----------
Wed 17-Aug-2016 RVRGBSLRWWJRQJBEGCLV   49870953
Thu 18-Feb-2016 GAUERUCEJVWPVBRYOBKO   57185556
Wed 10-Aug-2016 YCYYYLLAHOLLZJBEWVNS   47283027
Sat 22-Oct-2016 SHIPGRRAGLZFZSPDVKSU   29099732
Sat 12-Mar-2016 YXFNUBUXUHSHSFPPLRMH   75059706
Mon 15-Aug-2016 EOMXOPQUOLDYVDRBOLHY   74212704
Mon 02-May-2016 KOIBUHRMFLKLUUDKWIIX   23236438
Fri 01-Jul-2016 KJGDBTPFZDYMPGEGBVGY   86288842
Tue 09-Feb-2016 SYOOJWJOMEBQRKUKNQPE   95729329
Tue 12-Jan-2016 RZAFGDHEVYMZYQZLHBJB   76658230

10 rows selected.
Re: scrambling the data for a column in a tables [message #659228 is a reply to message #659224] Tue, 10 January 2017 01:51 Go to previous messageGo to next message
John Watson
Messages: 7623
Registered: January 2010
Location: Global Village
Senior Member
Have you considered using the Data Masking Pack?

https://www.oracle.com/database/data-masking-subsetting/index.html

you have to licence it, but it is one of the cheaper options. It has some brilliant algorithms for this sort of thing. Doing it manually you have all sorts of issues. For example, you'll be affecting the performance characteristics of the data; breaking your foreign key relationships; messing up indexes; it's non-deterministic; and of course it is going to take a long time.
Re: scrambling the data for a column in a tables [message #659243 is a reply to message #659224] Tue, 10 January 2017 06:45 Go to previous messageGo to next message
EdStevens
Messages: 985
Registered: September 2013
Senior Member
raj85844 wrote on Mon, 09 January 2017 23:09
Hi Friends,

Wishing you all a very happy new year...

I got a requirement in scrambling the data in the table

The below are the columns with the data length as mention for referrence

DATE_OF_BIRTH - Year(last digit),Month and date
WRONG! DATE_OF_BIRTH should be defined as a DATE data type. That is an internal, binary format, so your idea of it being 'Year(last digit),Month and date' is incorrect. And if it is not defined as DATE, then that is a fundamental design failure that will come back to haunt you again, and again, and again. And further, if it is not a DATE, and it is as you say, one single digit for the year, you have nicely replicated the Y2K problem. Only instead of a once-in-a-mellinium issue, you have made it an annual issue. So yet another design fail.





Re: scrambling the data for a column in a tables [message #659248 is a reply to message #659243] Tue, 10 January 2017 08:17 Go to previous messageGo to next message
cookiemonster
Messages: 13292
Registered: September 2008
Location: Rainy Manchester
Senior Member
Are you really trying to update existing data with randomly generated values?
Re: scrambling the data for a column in a tables [message #659249 is a reply to message #659243] Tue, 10 January 2017 08:22 Go to previous messageGo to next message
cookiemonster
Messages: 13292
Registered: September 2008
Location: Rainy Manchester
Senior Member
EdStevens wrote on Tue, 10 January 2017 12:45

And further, if it is not a DATE, and it is as you say, one single digit for the year, you have nicely replicated the Y2K problem. Only instead of a once-in-a-mellinium issue, you have made it an annual issue. So yet another design fail.

Reminds me of a thread I saw back in Jan 2010 on the otn forums. OP had date in DDDY format and couldn't work out why oracle had suddenly decided to interpret 8 as 2018 when the previous December it had treated it as 2008. And you do have to wonder what stupidity with dates caused that German bank to go offline on 1st Jan 2010.
Re: scrambling the data for a column in a tables [message #659250 is a reply to message #659248] Tue, 10 January 2017 08:26 Go to previous messageGo to next message
Roachcoach
Messages: 1571
Registered: May 2010
Location: UK
Senior Member
cookiemonster wrote on Tue, 10 January 2017 14:17
Are you really trying to update existing data with randomly generated values?
Probably, it needs to happen all the time when people want to copy prod volumes or prod data into testing environments.

Knackers performance analysis on those columns, but it doesn't result in you going getting a spanking from the data protection people for having that data outside of production and its associated access controls Smile
Re: scrambling the data for a column in a tables [message #659251 is a reply to message #659250] Tue, 10 January 2017 08:27 Go to previous messageGo to next message
cookiemonster
Messages: 13292
Registered: September 2008
Location: Rainy Manchester
Senior Member
If that's the case fair enough.
Re: scrambling the data for a column in a tables [message #659260 is a reply to message #659251] Tue, 10 January 2017 12:37 Go to previous messageGo to next message
joy_division
Messages: 4903
Registered: February 2005
Location: East Coast USA
Senior Member
We do it, and like John Watson said, we use Oracle's Data Masking and Subsetting Pack. We do it to mask QA and Development, because those are less secure systems.
Re: scrambling the data for a column in a tables [message #659455 is a reply to message #659260] Wed, 18 January 2017 04:29 Go to previous messageGo to next message
raj85844
Messages: 26
Registered: November 2016
Location: chennai
Junior Member
Hi Team,

Thanks for the inputs as it was really help full to program.

One query in updating the email id column, please find the below attempt of trying the expected result through scrambling.Could you help me to archive the expected result *

Table Name : Customers
Column Name : Cust_Email

CUST_EMAIL VARCHAR2(30)




(DATA)Before Scrambling
---------------------------------
Roger.Mastroianni@CREEPER.COM

* (DATA) Expected result through Scrambling
---------------------------------
Oktud.Cfmcxmrkvps@HJSDFGS.ysu


(DATA) After Scrambling by the below update
---------------------------------
oktud.cfmcxmrkvps@.ysu

Error report -
SQL Error: ORA-12899: value too large for column "OE_TEST"."CUSTOMERS"."CUST_EMAIL" (actual: 32, maximum: 30)

UPDATE customers
   SET cust_email =
             DBMS_RANDOM.string ('l',
                                 LENGTH (REGEXP_SUBSTR (cust_email,
                                                        '[A-Z][a-z]+',
                                                        1,
                                                        1)))
          || '.'
          || DBMS_RANDOM.string ('l',
                                LENGTH (REGEXP_SUBSTR (cust_email,
                                                        '[A-Z][a-z]+',
                                                        1,
                                                       2)))
          || '@'
          || DBMS_RANDOM.string ('l',
                                 LENGTH (REGEXP_SUBSTR (cust_email,
                                                      '[A-Z][a-z]+',
                                                       1,
                                                     1)))
          || '.'
          ||DBMS_RANDOM.string('l',3)
 WHERE cust_email IS NOT NULL;
/
Re: scrambling the data for a column in a tables [message #659456 is a reply to message #659455] Wed, 18 January 2017 04:32 Go to previous messageGo to next message
raj85844
Messages: 26
Registered: November 2016
Location: chennai
Junior Member
Note : Not able to update the characters after the @ symbol need to achieve the exact no of characters after @ with case sensitive
Re: scrambling the data for a column in a tables [message #659462 is a reply to message #659456] Wed, 18 January 2017 08:01 Go to previous messageGo to next message
joy_division
Messages: 4903
Registered: February 2005
Location: East Coast USA
Senior Member
First, why does the case have to match? Case is irrelevant in an email address.
Your variable is 30 characters but one of your email addresses is 32.
Re: scrambling the data for a column in a tables [message #659472 is a reply to message #659462] Wed, 18 January 2017 09:32 Go to previous messageGo to next message
cookiemonster
Messages: 13292
Registered: September 2008
Location: Rainy Manchester
Senior Member
source column = target column so if there's a length mismatch it'll be because the code isn't doing a 1-1 replacement.
I would have thought that the occurance parameter for the third regexp_substr call should be set 3 not 1.
Also it'll probably be more efficient to use plain instr to work out the lengths of the different sections.
Re: scrambling the data for a column in a tables [message #659619 is a reply to message #659472] Tue, 24 January 2017 08:13 Go to previous messageGo to next message
Bill B
Messages: 1802
Registered: December 2004
Senior Member
RUN THE FOLLOWING SELECT AND SEE IF THIS IS WHAT YOU WANT


SELECT CUST_EMAIL,
          TRANSLATE(CUST_EMAIL,REPLACE(REPLACE(CUST_EMAIL,'.'),'@'),DBMS_RANDOM.String ('U', LENGTH(CUST_EMAIL)))
           Altered_email
  FROM CUSTOMERS
 WHERE CUST_Email is not null;

[Updated on: Tue, 24 January 2017 12:31]

Report message to a moderator

Re: scrambling the data for a column in a tables [message #659623 is a reply to message #659619] Tue, 24 January 2017 12:57 Go to previous messageGo to next message
joy_division
Messages: 4903
Registered: February 2005
Location: East Coast USA
Senior Member
That's an excellent one Bill, but with the ridiculous requirement by OP that the case must match, it will not do.
Re: scrambling the data for a column in a tables [message #659624 is a reply to message #659623] Tue, 24 January 2017 13:28 Go to previous message
Bill B
Messages: 1802
Registered: December 2004
Senior Member
If you absolutely have to maintain the case (email doesn't require it) then use the following function

The call to use it for emails would be

SELECT CUST_EMAIL,
          SCRAMBLE_STR(CUST_EMAIL,'.@') Altered_email
  FROM CUSTOMERS
 WHERE CUST_Email is not null;

CREATE OR REPLACE FUNCTION Scramble_str (P_str      IN VARCHAR2,
                                         P_filter   IN VARCHAR2 DEFAULT NULL)
    RETURN VARCHAR2
IS
    P_return   VARCHAR2 (1000);
    Flag       VARCHAR2 (1);
BEGIN
    P_return := NULL;

    IF P_str IS NOT NULL
    THEN
        Flag := 'N';

        FOR Pnt IN 1 .. LENGTH (P_str)
        LOOP
            IF P_filter IS NOT NULL
            THEN
                FOR Pnt2 IN 1 .. LENGTH (P_filter)
                LOOP
                    IF SUBSTR (P_str, Pnt, 1) = SUBSTR (P_filter, Pnt2, 1)
                    THEN
                        IF P_return IS NULL
                        THEN
                            P_return := SUBSTR (P_str, Pnt, 1);
                        ELSE
                            P_return := P_return || SUBSTR (P_str, Pnt, 1);
                        END IF;

                        Flag := 'Y';
                        EXIT;
                    END IF;
                END LOOP;
            END IF;

            IF Flag = 'N'
            THEN
                IF ASCII (SUBSTR (P_str, Pnt, 1)) NOT BETWEEN 97 AND 122
                THEN
                    IF P_return IS NULL
                    THEN
                        P_return := DBMS_RANDOM.String ('U', 1);
                    ELSE
                        P_return := P_return || DBMS_RANDOM.String ('U', 1);
                    END IF;
                ELSIF ASCII (SUBSTR (P_str, Pnt, 1)) BETWEEN 97 AND 122
                THEN
                    IF P_return IS NULL
                    THEN
                        P_return := DBMS_RANDOM.String ('L', 1);
                    ELSE
                        P_return := P_return || DBMS_RANDOM.String ('L', 1);
                    END IF;
                ELSE
                    IF P_return IS NULL
                    THEN
                        P_return := SUBSTR (P_str, Pnt, 1);
                    ELSE
                        P_return := P_return || SUBSTR (P_str, Pnt, 1);
                    END IF;
                END IF;
            END IF;

            Flag := 'N';
        END LOOP;
    END IF;

    RETURN P_return;
END Scramble_str;

[Updated on: Tue, 24 January 2017 13:28]

Report message to a moderator

Previous Topic: SQL Execution Plan
Next Topic: Doubt in SYS_CONTEXT
Goto Forum:
  


Current Time: Tue Oct 23 02:51:13 CDT 2018