PHYSICAL DATABASE CHECKLIST
CONSIDERATIONS FOR DSS/GIS ENVIRONMENT
Prepared by P. Stankus
Last Update 2/24/2003
REDO LOGGING:
- In a batch-load-only system there is a large amount of Oracle functionality available to avoid writing redo (direct writes, unrecoverable or NOLOGGING options). Use these options.
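For example, a direct-path (APPEND) insert into a NOLOGGING table generates minimal redo; the table and source names below are illustrative only:
ALTER TABLE stg_sales NOLOGGING;
INSERT /*+ APPEND */ INTO stg_sales
SELECT * FROM sales_feed;          -- direct-path write above the high-water mark
COMMIT;
-- (a SQL*Loader direct-path load, DIRECT=TRUE, behaves similarly)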
ARCHIVELOG MODE:
- Run all OLTP production systems in archivelog mode.
- Archivelog mode is not necessary for batch-only loaded systems, since a backup can be scheduled off-hours immediately after the batch load (which should also run off-hours).
DATABASE CREATION ISSUES THAT CANNOT BE CHANGED AFTER CREATION W/O GREAT DIFFICULTY:
- Oracle recommends that the database block size (db_block_size) be equal to or a multiple of the operating system block size. DSS systems usually start at 16K.
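For example, for a 16K block size the parameter must be in the init.ora before CREATE DATABASE is run (the value shown is only illustrative):
# cannot be changed without recreating the database
db_block_size = 16384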
SIZING DATABASE OBJECTS:
- Set INITRANS to the number of simultaneous transactions expected on a block. Multiple freelists avoid contention during simultaneous INSERTs into a table. In a batch-only loaded system this parameter will not be a big issue.
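A rough sketch of where these settings go (table, columns, and values are assumptions, not a standard):
CREATE TABLE stg_claims (
  claim_id   NUMBER,
  claim_date DATE,
  amount     NUMBER(12,2)
)
INITRANS 1                     -- batch-only loads: one transaction per block
STORAGE (FREELISTS 4)          -- only matters with concurrent INSERT sessions
TABLESPACE stage_data;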
UNIX DATA LOADING TIPS:
- Pay attention to how you organize files so that you maximize modular coding and efficiency.
- Use scripts to dynamically generate DDL, SQL*Loader control files, analyze scripts, and all other scripts relating to DSS systems. If you use scripts to dynamically generate these other scripts, you will be able to reuse them for many different databases and only have to debug them a limited number of times.
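A small "SQL generating SQL" sketch along these lines, spooling an analyze script for every table the current schema owns (the spool file name is arbitrary):
SET PAGESIZE 0 FEEDBACK OFF VERIFY OFF TERMOUT OFF
SPOOL analyze_all.sql
SELECT 'ANALYZE TABLE ' || table_name ||
       ' ESTIMATE STATISTICS SAMPLE 10 PERCENT;'
  FROM user_tables;
SPOOL OFF
SET TERMOUT ON
@analyze_all.sql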
- Set up an environment script for automated crontab processes and call it from all other scripts that run SQL commands, as necessary and appropriate.
- Set up common, modular code for error handling, job scheduling, and other common aspects of DSS systems.
CONFIGURATION PARAMETERS:
- In a DSS environment you can start by using automatic parallel tuning and then, based on monitoring, begin to tune the individual parameters.
- For most initialization parameters the best approach is to take an educated guess based on the information available at the time and then monitor and tune in a systematic manner.
DATA WAREHOUSE:
- Take advantage of direct parallel loads.
- Large database blocks (usually 16K) that match or are a multiple of the OS block size are recommended.
- Have all application-related objects owned by one user within the one database. Do not give the database an application-specific or program-area-specific name if it will be used across those boundaries, unless you intend to keep it separate (for example, because it is provided by a third-party vendor).
- Set up partitioning. Create scripts for analyzing the last partition added, purging old partitions, etc. Partitioning is especially useful in systems with large volumes of data.
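For instance, a purge script might simply drop the oldest partition, and an analyze script would target only the newest one (table and partition names are illustrative):
ALTER TABLE fact_sales DROP PARTITION p_1999_q1;

ANALYZE TABLE fact_sales PARTITION (p_2003_q1)
ESTIMATE STATISTICS SAMPLE 10 PERCENT;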
- Store some amount of metadata and automate the population of this data. Also consider reporting tools or processes that are smart enough to incorporate this information into report building. Some of the information stored might include:
-name of the process that loaded the data warehouse (and its source)
-input file name, size, and source
-date and time of job start/end
-job completion status
-number of rows inserted, updated, purged
-number of rows rejected or in error
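One possible shape for such a load-audit table (every name and datatype here is an assumption):
CREATE TABLE dw_load_audit (
  load_process     VARCHAR2(60),
  input_file_name  VARCHAR2(200),
  input_file_size  NUMBER,
  input_source     VARCHAR2(60),
  job_start        DATE,
  job_end          DATE,
  job_status       VARCHAR2(20),
  rows_inserted    NUMBER,
  rows_updated     NUMBER,
  rows_purged      NUMBER,
  rows_rejected    NUMBER
);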
- Roles and grants/revokes should be used when loading a “subject area” of the DSS system. Until all automated data checks are okayed, don’t grant access back to the roles. Often a particular person will perform a second, manual check; consider setting up a process by which that user can “publish” the data.
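A sketch of that publish step, assuming a reporting role and fact table that exist only for illustration:
-- before/during the load and QA checks
REVOKE SELECT ON fact_sales FROM dss_report_role;

-- after the automated checks pass and the manual reviewer signs off
GRANT SELECT ON fact_sales TO dss_report_role;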
- Use metadata to show historical changes. Don’t try to go back and change all of the history in the data warehouse as operational changes occur!
- You will probably create indexes in parallel using the NOLOGGING option (set the degree of parallelism to double the number of processors). However, you will need enough space in your index (not temporary) tablespace to create the extents implied by the degree of parallelism you set.
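For example, on a 4-CPU machine (index, table, and tablespace names are illustrative):
CREATE INDEX fact_sales_dt_idx ON fact_sales (sale_date)
  NOLOGGING
  PARALLEL 8             -- double the 4 CPUs
  TABLESPACE fact_idx;   -- must have room for the extents each parallel slave allocates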
- Indexes (B*tree and bitmapped) can be partitioned even if the underlying table is not. An index that is partitioned identically to its partitioned table is called a LOCAL index. Oracle provides bitmapped indexes and bitmap merge operations that are very efficient, especially for low-cardinality data elements.
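A bitmap index partitioned LOCAL to its (partitioned) table might look like this; all names are assumptions:
CREATE BITMAP INDEX fact_sales_region_bix
  ON fact_sales (region_code)     -- low-cardinality column
  LOCAL                           -- one index partition per table partition
  TABLESPACE fact_idx;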
- Create a table in the data warehouse/ODS that records scripts, their purposes, authors, schedules, etc. This builds in self-documentation as scripts are developed.
- Use uniform extent sizes based on the estimated number of tables and the size of the data. If the data is large enough, use a tablespace per table. Partitioning and tablespace layout need to be considered together.
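For example (file path, sizes, and names are illustrative):
CREATE TABLESPACE fact_data
  DATAFILE '/u02/oradata/dw/fact_data01.dbf' SIZE 4000M
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 64M;   -- one extent size per tablespace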
- Use read-only tablespaces along with RMAN incremental backups to avoid having to back up or recover significant amounts of data.
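A sketch, assuming one tablespace per loaded partition:
ALTER TABLESPACE fact_1999_q1 READ ONLY;

-- RMAN can then skip read-only tablespaces on routine backups, e.g.:
-- RMAN> BACKUP DATABASE SKIP READONLY;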
- Striping data and index segments across disks (especially with RAID 0) is great for performance because Parallel Query and Parallel Load operations can be done without introducing I/O contention. However, temporary segment and rollback segment access is not parallelized, and it is best that these segments do not contend with the data/index segments (this is how we configured the 6500).
- Use NOLOGGING to populate objects (with the least amount of redo) but back them up immediately after creation.
- Set INITRANS to 1 because there won’t be multiple transactions modifying the same block. Leave MAXTRANS at its default.
- Consider using a parallel degree of double the number of CPUs on the machine (this can be set in the query, at the table/index level, at the session level, or for the entire database).
- Horizontal partitioning of large tables (especially FACT tables) is useful. Horizontal means subsets of rows are stored together, typically by a transaction date:
-partitions should be of roughly equal size
-the partition key should be data that does not have to be updated frequently
-the key is usually a date
-the key should match how most queries select data
-administration is easier if each partition is in a separate tablespace and the partition and tablespace share the same name
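A minimal range-partitioned FACT table along those lines (names, dates, and tablespaces are illustrative):
CREATE TABLE fact_sales (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER(12,2)
)
PARTITION BY RANGE (sale_date) (
  PARTITION p_1999_q1 VALUES LESS THAN (TO_DATE('1999-04-01','YYYY-MM-DD'))
    TABLESPACE fact_1999_q1,
  PARTITION p_1999_q2 VALUES LESS THAN (TO_DATE('1999-07-01','YYYY-MM-DD'))
    TABLESPACE fact_1999_q2
);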
- Full-table scans are often chosen by the CBO in data warehousing queries, so make them efficient: avoid setting up the data warehouse with objects that have to be updated. If objects do not have to be updated, set PCTFREE=0 for more efficient sequential scans (fewer blocks to scan because there is less wasted space per block).
- Analyze objects after creating them, unless there is not enough window. With partitioning you can analyze just the new partition using:
ANALYZE TABLE <TABLENAME> PARTITION (<PARTITION_NAME>)
ESTIMATE STATISTICS SAMPLE 10 PERCENT;
A 10 percent sample is usually sufficient.
- Conduct architectural feasibility tests with generated data as early in the process as possible. If you use generated data, you don’t have to have all load and conversion processes done first:
-test query response
-test the time taken to do incremental loads
-test the time to load
-test the time to back up and recover
-test the time to analyze
- Index rebuilds: ensure you have double the amount of space that the existing index segments take. You may decide to use the SINGLEROW option of a direct load to maintain indexes while loading, if the number of rows is small and this is faster than rebuilding, or if there is not enough space to rebuild.
- Check the STATUS column of DBA_INDEXES to make sure indexes are in place (usable) after a load.
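For example (the owner name is an assumption):
SELECT index_name, status
  FROM dba_indexes
 WHERE owner = 'DWOWNER'
   AND status NOT IN ('VALID', 'N/A');
-- STATUS is 'N/A' for partitioned indexes; check DBA_IND_PARTITIONS for those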
- Use the temporary tablespace to do all aggregation/manipulation, then populate the new table only at the end of the process (there are various ways to do this).
- Restrict random use of SQL*Plus through the PRODUCT_USER_PROFILE table.
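One documented pattern is inserting a row that disables a command for a given user; the user name below is only an example:
INSERT INTO system.product_user_profile
  (product, userid, attribute, char_value)
VALUES
  ('SQL*Plus', 'REPORT_USER', 'DELETE', 'DISABLED');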
- Create one user that owns all objects and runs the conversion processes in production.
- Rollback segments are used only by load processes, since the database is read-only at all other times. See the guidelines for number/size above.
- “Lock down” tablespaces after the load by putting them in read-only mode, especially with partitioning, so that backups only have to back up the new partitions/tablespaces.
- Using the SQL*Loader direct path means that rollback segments are not used at all during this phase.
- Large queries can have serious temporary segment requirements, especially if parallel query (PQO) is used, because the requirements are based on the DEGREE of parallelism. Consider a ratio of 5-10% of total database size (size of the data segments) to start, and monitor. Also look at SORT_AREA_SIZE. If the ratio starts to approach 15-20% of total data size, question the validity of the queries and how they are written, and look for opportunities for denormalization and aggregation.
- Logging might still be needed, so start with 3 redo log groups of 100MB apiece.
- Consider how you will populate a development or QA database; you really can’t afford to store a HUGE data warehouse repeatedly. We will start with a development database for testing WebDB, etc.
- Run in archivelog mode and run backups immediately after the database load. Do most loading using the unrecoverable/NOLOGGING option to greatly minimize redo creation (empirically, redo information is still generated, just in a much smaller amount). Using RMAN for incremental backups further minimizes the amount of recovery needed, especially in a DSS environment.
- Place the partitions of partitioned tables into separate tablespaces to reduce the backup and recovery needed at any particular point in time.
- Look for opportunities to preaggregate, prejoin, and partition!
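One way to preaggregate is a summary table built in parallel with NOLOGGING; all names here are illustrative:
CREATE TABLE fact_sales_monthly
  PARALLEL 8 NOLOGGING
AS
SELECT TRUNC(sale_date, 'MM') AS sale_month,
       region_code,
       SUM(amount) AS total_amount,
       COUNT(*)    AS sale_count
  FROM fact_sales
 GROUP BY TRUNC(sale_date, 'MM'), region_code;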
CONSIDER THE MAIN PHASES OF DATA WAREHOUSING:
- Extraction from sources and moving the data to the final host
- Transformation to dimensions, aggregates, etc. Since the final tables will likely be aggregated and denormalized from the source, there will be a separate database (ODS) for this type of activity, apart from the final storage and presentation database. The ODS will be the first database we set up, and its degree of normalization may change depending on the requirements.
- Bulk load into the final database for reporting
- Quality assurance checks: pre-user and user QA
- Release/publishing to the end-user community
- Controlled and scheduled updates (possibly)
- Querying
- Auditing
- Security
- Backup and recovery
BATCH PROCESSING FOR DATA LOADS, REPORTING:
- Batch load processes that run on a continuing cycle are automated, with automated alerts to DBAs and other responsible parties so that follow-up is needed only if there is a problem, and also to signal the completion of the load. Different errors and successful completion of the load each send a specific, distinct code to allow the fastest possible recovery of the process. An e-mail and pager alert are sent to the DBAs and other responsible parties.
- Batch load processes are scheduled with e-mail notification to the DBAs the morning of the loads, so that the DBAs can ensure the necessary resources will be available and nothing will compete with or affect the successful completion of the job.
- DBAs work closely with batch load developers to assist in tuning the batch load processes, minimizing the batch load window so that a routine administration window remains available and the shared resources are used as well as possible. Specifically, there are a number of things the DBAs can do to optimize the loads that can only be done at the system or database level and cannot be done through coding.
- All database creations, creation of objects within a database, and installations involving database components are done by DOH DBA staff and not by vendors of a specific product. This ensures that new systems from various vendors fit within the existing infrastructure and framework for these environments.
- Consider using external tables (9i) as part of a loading process instead of temporary segments. See www.ioug.com.
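A minimal 9i external-table sketch (directory path, file, and column names are assumptions):
CREATE DIRECTORY load_dir AS '/u04/feeds';

CREATE TABLE ext_sales_feed (
  sale_id NUMBER,
  amount  NUMBER(12,2)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY load_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('sales_feed.dat')
)
REJECT LIMIT UNLIMITED;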
- Consider using MERGE. See www.ioug.com.
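A MERGE ("upsert") sketch reading from the external table above; the target table is hypothetical, and in 9i both the WHEN MATCHED and WHEN NOT MATCHED clauses are required:
MERGE INTO sales_totals t
USING ext_sales_feed s
   ON (t.sale_id = s.sale_id)
 WHEN MATCHED THEN
   UPDATE SET t.amount = s.amount
 WHEN NOT MATCHED THEN
   INSERT (sale_id, amount)
   VALUES (s.sale_id, s.amount);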
- See DSS database design:
ON-GOING MONITORING:
- Tablespace fragmentation: fragmentation can be avoided completely if INITIAL=NEXT and all objects in a tablespace have uniformly sized extents; it is best if there is only one extent size per tablespace. Also, place database objects into size profiles of small, medium, large, and perhaps huge. See Craig Shallahamer’s “Avoiding a Database Reorganization”, Cary Millsap’s “Oracle 7 Server Space Management”, and “How to Stop Defragmenting and Start Living: The Definitive Word on Fragmentation” by Bhaskar Himatsingka and Juan Loaiza.
REFERENCES:
Top Partitioning Table Issues