Re: Migration validation to OCI

From: Tim Gorman <tim.evdbt_at_gmail.com>
Date: Mon, 15 Apr 2024 11:50:34 -0700
Message-ID: <c611f9b1-ac4c-4b4a-8463-0b02b61241fb_at_gmail.com>





List,

Mark makes an excellent point citing the old technique of "control totals"...

Attached are two SQL*Plus scripts that use the equally old technique of SQL-generating-SQL to create what I call "comparison tables": control totals for comparing the contents of two tables that have the same "shape" but are different versions of one another.  Each column in the two tables has several aggregations run against it, depending on the data_type of the column.

From the comment header in the scripts...

           For the entire tables being compared...
                   - name of the table
                   - row count of the table
           For each NUMBER column...
                   - count(distinct column-name)
                   - min(nvl(column-name, 0))
                   - max(nvl(column-name, 0))
                   - avg(nvl(column-name, 0))
                   - sum(nvl(column-name, 0))
                   - stddev(nvl(column-name, 0))
           For each CHAR or VARCHAR2 column...
                   - count(distinct column-name)
                   - min(length(column-name))
                   - max(length(column-name))
                   - avg(length(column-name))
                   - sum(length(column-name))
                   - min(ora_hash(column-name))
                   - max(ora_hash(column-name))
                   - avg(ora_hash(column-name))
                   - sum(ora_hash(column-name))
           For each DATE column...
                   - count(distinct column-name)
                   - min(to_number(to_char(column-name,'YYYYMMDDHH24MISS')))
                   - max(to_number(to_char(column-name,'YYYYMMDDHH24MISS')))
                   - avg(to_number(to_char(column-name,'YYYYMMDDHH24MISS')))
                   - sum(to_number(to_char(column-name,'YYYYMMDDHH24MISS')))
           For other datatypes...
                   - not yet supported by this script

The purpose of these calculations is merely to detect differences, not necessarily to pinpoint where the differences lie or what they are.
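For anyone who hasn't played with SQL-generating-SQL, here is a rough sketch of the idea.  This is not the attached scripts themselves; the "&1" table-name substitution variable and the handful of aggregations shown are only illustrative, and it glosses over details like identifier-length limits...

    -- rough sketch of SQL-generating-SQL: build one set of aggregate
    -- expressions per column, driven by the column's data type as found
    -- in USER_TAB_COLUMNS ("&1" is the table name passed to SQL*Plus)
    select case data_type
             when 'NUMBER'   then
                  'count(distinct '||column_name||') cnt_'||column_name||
                  ', sum(nvl('||column_name||',0)) sum_'||column_name
             when 'VARCHAR2' then
                  'count(distinct '||column_name||') cnt_'||column_name||
                  ', sum(ora_hash('||column_name||')) hsh_'||column_name
             when 'DATE'     then
                  'count(distinct '||column_name||') cnt_'||column_name||
                  ', max(to_number(to_char('||column_name||
                  ',''YYYYMMDDHH24MISS''))) max_'||column_name
             -- other data types fall through to NULL, i.e. not yet supported
           end as agg_exprs
      from user_tab_columns
     where table_name = upper('&1')
     order by column_id;

The generated expressions are then assembled into the query that populates each "comparison table".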

Here is a sample output based on testing after a code change to ETL logic to fix how one column (named LOCATION_KEY) is loaded...

COLUMN_NAME            T1_VALUE                                   T2_VALUE
---------------------- ------------------------------------------ ------------------------------------------
TABLE_NAME             NEW_TABLE_AA                               OLD_TABLE_AA
CNT_LOCATION_KEY       4002                                       4272
AVG_LOCATION_KEY       20303.866432106804794529027401907016245   20153.5951564461457324878614033480490347
STD_LOCATION_KEY       14050.4442319081117478342482185723117511  17204.6251672774281857692695152895980603
SUM_LOCATION_KEY       8200264663                                 8139573551
MIN_DATE_ENTERED       15-APR-2024 10:38:28                       15-APR-2024 04:17:10
AVG_DATE_ENTERED       15-APR-2024 10:38:28                       15-APR-2024 04:17:10
MAX_DATE_ENTERED       15-APR-2024 10:38:28                       15-APR-2024 04:17:10
MIN_DATE_LAST_UPDATED  15-APR-2024 10:38:28                       15-APR-2024 04:17:10
AVG_DATE_LAST_UPDATED  15-APR-2024 10:38:28                       15-APR-2024 04:17:10
MAX_DATE_LAST_UPDATED  15-APR-2024 10:38:28                       15-APR-2024 04:17:10

This output, from the SQL*Plus script generated by the "query_compare.sql" script, shows that only the column LOCATION_KEY has been affected by the code change: the COUNT DISTINCT, AVERAGE, STANDARD DEVIATION, and SUM aggregations of the values in that column are all different.  The overall table row counts are the same, as are the values in the dozens of other columns.

The report also shows differences in DATE columns named DATE_ENTERED and DATE_LAST_UPDATED, but those differences are expected since the old table was loaded early in the morning while the new table values were populated mid-morning, about 6 hours later.  These columns are artifacts of the data loading process, not the results of the data load logic.

This output doesn't necessarily tell us exactly what is different in the LOCATION_KEY column, but it does assure us that there are differences in one or more rows of LOCATION_KEY, and, importantly, it tells us that there are no differences in the total row count of the table, nor in any other significant column of the tables.
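If it helps to see the principle in miniature, here is a tiny self-contained example.  The table names echo the report above, but the five rows of data and the single changed value are made up for illustration; the row counts match, yet the control totals expose a change in LOCATION_KEY...

    -- two same-shaped tables, five rows each; one LOCATION_KEY value differs
    create table old_table_aa as
      select level as location_key, sysdate as date_entered
        from dual connect by level <= 5;

    create table new_table_aa as
      select decode(level, 3, 4, level) as location_key, sysdate as date_entered
        from dual connect by level <= 5;

    -- row counts are both 5, but COUNT DISTINCT (4 vs 5) and SUM (16 vs 15)
    -- of LOCATION_KEY differ, flagging the column without saying which rows changed
    select 'NEW' as tbl, count(*) as row_cnt,
           count(distinct location_key) as cnt_location_key,
           sum(nvl(location_key,0))     as sum_location_key
      from new_table_aa
    union all
    select 'OLD', count(*),
           count(distinct location_key),
           sum(nvl(location_key,0))
      from old_table_aa;

Scaled up to every column and every data type, that is the kind of signal the comparison report is giving.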

Hope this helps?

Thanks!

-Tim

On 4/14/2024 6:29 AM, Mark W. Farnham wrote:
>
> A fairly standard minimum is the number of rows in each user table and
> the sum of the values of numeric columns in each table. The CDP exam
> circa 1982 called these “control totals.”
>
> A more aggressive verification is to also pipe the text fields through
> a hasher and compare the hash values.
>
> A reverse migration of all the data back to the original machine (even
> one user table at a time if space is a problem) with a full row by
> row, column by column comparison is comprehensive, but the cost versus
> the benefit does need to be considered. (This can also be done between
> OCI and the existing machine, but usually the network latency doing
> things row by row makes a back load of the data more performant.)
>
> Good luck,
>
> mwf
>
> PS: I presume your OCI means Oracle Cloud Infrastructure, not Oracle
> Call Interface
>
> *From:*oracle-l-bounce_at_freelists.org
> [mailto:oracle-l-bounce_at_freelists.org] *On Behalf Of *Eriovaldo Andrietta
> *Sent:* Sunday, April 14, 2024 8:28 AM
> *To:* ORACLE-L
> *Subject:* Migration validation to OCI
>
> Hello,
> An on-premises Exadata database server will be migrated to OCI as-is,
> on the same Oracle version (12.x).
>
> I would like to know what else would have to be validated at OCI to
> ensure that the migration was successful.
> I thought I would check:
>
> - dba_users
> - dba_objects (and each object type separately)
> - dba_tab_privs
>
> - dba_sys_privs
>
> - dba_roles
> - dba_synonyms
> - dba_extents (in order to check the volume)
>
> - count the number of rows for each table, or check dba_tables.num_rows
> if the statistics are up to date.
>    I guess that the number of rows migrated is enough to accept the
> content of the table
>
> The goal is to check only in the context of objects and not
> configuration (v$parameter, tablespaces, resource manager and other
> structures).
>
> What else could be validated?
>
> Regards
>
> Eriovaldo
>





--
http://www.freelists.org/webpage/oracle-l



Received on Mon Apr 15 2024 - 20:50:34 CEST