PHYSICAL DATABASE CHECKLIST
CONSIDERATIONS FOR DSS/GIS ENVIRONMENT
Prepared by P. Stankus
Last Update 2/24/2003
- In a
batch-load only system there is a large amount of Oracle functionality
available to avoid writing redo (direct writes, unrecoverable or nologging
options). Use these options.
- Run in archivelog
mode for all OLTP production systems.
Archivelog mode is not necessary for batch-only loaded systems, as a
backup can be scheduled off-hours immediately after the batch load, which
should also run off-hours.
DATABASE CREATION ISSUES THAT CANNOT BE CHANGED AFTER
CREATION W/O GREAT DIFFICULTY:
- Oracle recommends that the database block size (db_block_size) be equal to
or a multiple of the operating system block size. DSS systems usually start at 16K.
SIZING DATABASE OBJECTS:
- Set inittrans to the number of simultaneous transactions expected on a block.
Multiple freelists avoid contention when doing simultaneous INSERTs to a table.
Given a batch-only loaded system this parameter will not be a big issue.
UNIX DATA LOADING
- Pay attention to how you organize files so that you maximize modular coding.
- Use scripts to dynamically generate DDL, SQL*Loader control files, analyze
scripts and all other scripts relating to DSS systems. If you use scripts to
dynamically generate these other scripts you will be able to use them for many
different databases and have to debug them only a limited number of times.
- Create an environment script for automated crontab processes and call this
script from all other scripts that run SQL commands as necessary.
- Set up common
and modular code for error handling, job scheduling and other common
aspects of DSS systems.
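The environment-script and generator-script bullets above can be sketched as follows; the file names, table name and column list are hypothetical, and the control-file layout assumes a simple comma-delimited feed:

```shell
#!/bin/sh
# A real env_dss.sh would export ORACLE_SID, ORACLE_HOME, PATH, etc.;
# it is stubbed inline here so the sketch is self-contained.
ORACLE_SID=DSSPROD
export ORACLE_SID

# Dynamically generate a SQL*Loader control file for a given table.
# Usage: gen_ctl <table_name> <data_file>
gen_ctl() {
  tbl=$1
  dat=$2
  cat > "${tbl}.ctl" <<EOF
LOAD DATA
INFILE '${dat}'
APPEND INTO TABLE ${tbl}
FIELDS TERMINATED BY ','
(sale_date DATE 'YYYYMMDD', store_id, amount)
EOF
}

gen_ctl sales_fact sales_19990301.dat
cat sales_fact.ctl
```

Because the generator is parameterized on table and file name, the same script serves every feed and only has to be debugged once.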
- In a
DSS environment you can start by using automatic parallel tuning and then,
based on monitoring, begin to tune the individual parallelism parameters. For
most initialization parameters the best thing to do is to take an educated
guess based on information available at the time and then monitor and tune
in a systematic manner.
- Take advantage of direct parallel loads. Matching database and OS block
sizes of 16K are usually recommended.
- Keep all application-related objects owned by one user within the one
database. Do not make the database name application-specific or program-area
specific, as the database is used across boundaries – unless you wish to keep
it separate because it is provided from a 3rd party.
- Use partitioning. Create scripts for
analyzing the last partition added, purging partitions, etc. Partitioning is
especially useful in systems with large volumes of data.
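A rolling purge of the oldest partition, as mentioned above, might look like this sketch (the table and partition names are hypothetical):

```sql
-- Purge the oldest month by dropping its partition.
ALTER TABLE sales_fact DROP PARTITION sales_199801;

-- Analyze only the partition just loaded rather than the whole table.
ANALYZE TABLE sales_fact PARTITION (sales_199903)
  ESTIMATE STATISTICS SAMPLE 10 PERCENT;
```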
- Maintain some amount of metadata and automate the population of this data.
Also, consider reporting tools or processes that are smart enough to
incorporate this information in report building. Some of the information
stored might include:
-name of the process that loaded the data warehouse (and its source)
-input file name, size and source
-date and time of job start/end
-job completion status
-number of rows inserted, updated, purged
-number of rows rejected or in error
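A minimal audit table capturing the fields listed above might look like this sketch; the table and column names are hypothetical:

```sql
-- Hypothetical metadata table recording each batch load run.
CREATE TABLE load_audit (
  load_id        NUMBER       NOT NULL,
  process_name   VARCHAR2(64),   -- process that loaded the warehouse
  input_file     VARCHAR2(256),  -- input file name and source
  input_bytes    NUMBER,
  job_start      DATE,
  job_end        DATE,
  job_status     VARCHAR2(16),   -- e.g. OK, FAILED
  rows_inserted  NUMBER,
  rows_updated   NUMBER,
  rows_purged    NUMBER,
  rows_rejected  NUMBER
);
```

Each load script would insert one row at job start and update it at completion, giving the reporting tools one place to check data currency.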
- Revokes and grants of roles should be used when loading a "subject area" of
the DSS system. Until all automated data checks are okayed, don't grant
access back to the roles. Often a particular person will perform the second,
manual check. Consider setting up a process by which that user can "publish"
the data.
- Use metadata
to show historical changes – don't try to go through and change all the
history in the data warehouse when there are operational changes!
- Create indexes in parallel using the NOLOGGING option (set the parallel
degree to double the number of processors). However, you will need enough
space in your index (not temporary) tablespace to create the extents implied
by the degree of parallelism you set.
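For example (hypothetical index, table and tablespace names, assuming a 4-CPU machine):

```sql
-- Build the index in parallel with minimal redo; back up afterwards.
CREATE INDEX sales_fact_dt_ix ON sales_fact (sale_date)
  NOLOGGING
  PARALLEL 8          -- double the CPU count on a 4-CPU machine
  TABLESPACE dss_idx;
```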
- Partition indexes (B*tree and bitmapped) even if the underlying table is
not partitioned. An index that is partitioned
identically to its partitioned table is called a LOCAL INDEX. Oracle
provides bitmapped indexes and bitmap merges that are very efficient,
especially for low-cardinality data elements.
- Maintain a table in the data warehouse/ODS that shows scripts, purposes,
authors, schedules, etc. This builds in
self-documentation as scripts are developed.
- Base extent sizes on the estimated number of tables and size of the data.
If the data is large enough, use a tablespace per table. Partitioning and
tablespace layout need to be considered together.
- Use read-only tablespaces along with RMAN incremental backups to avoid
having to back up or recover significant amounts of data.
- Striping data and index segments across disks (esp. with RAID 0) is great
for performance because Parallel Query operations and Parallel Load operations
can be done without introducing I/O contention. However, temporary segment and
rollback segment access is not parallelized, and it is best that these
segments do not contend with data/index segments – this is how we configured
the 6500.
- Use "nologging" to populate objects (with the least amount of redo) but
back up immediately after creation.
- Set inittrans to 1 because you won't have multiple transactions modifying
the same block. Leave maxtrans at the default.
- Consider using a parallel degree of double the number of CPUs on the machine
(can be set in the query, at the table/index level, at the session level, or
for the entire database).
- Horizontal partitioning of large tables (esp. FACT tables) is useful.
Horizontal means subsets of rows stored together (e.g. by transaction date):
-partitions of roughly equal size
-data partitioned on a key that should not have to be updated frequently
-usually a date
-matching how most queries will select data
-easier admin if each partition is in a separate tablespace and both have the
same name
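The bullets above might translate into DDL like this sketch; all table, partition and tablespace names are hypothetical:

```sql
-- FACT table range-partitioned by transaction date, with a tablespace
-- per partition (named after it) for easier administration.
CREATE TABLE sales_fact (
  sale_date  DATE,
  store_id   NUMBER,
  amount     NUMBER
)
PARTITION BY RANGE (sale_date) (
  PARTITION sales_199901 VALUES LESS THAN (TO_DATE('19990201','YYYYMMDD'))
    TABLESPACE sales_199901,
  PARTITION sales_199902 VALUES LESS THAN (TO_DATE('19990301','YYYYMMDD'))
    TABLESPACE sales_199902
);

-- LOCAL index: partitioned identically to the table; bitmap suits
-- low-cardinality columns like store_id.
CREATE BITMAP INDEX sales_fact_store_ix ON sales_fact (store_id) LOCAL;
```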
- Full scans are often chosen by the CBO in data warehousing queries, so make
them efficient: avoid setting up the data warehouse with
objects that have to be updated. If
objects do not have to be updated, set pctfree=0 for more efficient
sequential scans (fewer blocks to scan because there is less wasted space) –
more rows per block.
- Unless there is not a big enough window, analyze objects
after creating them. With partitioning you can
just analyze the new partition using:
ANALYZE TABLE <TABLENAME> PARTITION (<PARTITION_NAME>)
ESTIMATE STATISTICS SAMPLE 10 PERCENT;
10 percent is usually sufficient.
- Conduct architectural feasibility tests with generated
data as early in the process as possible.
If you use generated data then you don't have to have all load and conversion
processes done first:
-test query response
-test time taken to do incremental loads
-test time to load
-test time to back up and recover
-test time to analyze
- Index rebuilds: ensure double the amount of space that the
existing index segments take. You may
decide to use the SINGLEROW option of direct load to maintain indexes while
loading if the number of rows is small and it is faster than rebuilding, or
if there is not enough space.
Check the status column of dba_indexes to make sure the indexes
are in place after the load.
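Checking dba_indexes after the load, as suggested above, could be sketched as (the index name in the rebuild is hypothetical):

```sql
-- Any index (or index partition) left UNUSABLE by a direct load
-- must be rebuilt before queries rely on it.
SELECT owner, index_name, status
  FROM dba_indexes
 WHERE status NOT IN ('VALID', 'N/A');  -- 'N/A' appears for partitioned indexes

ALTER INDEX sales_fact_dt_ix REBUILD NOLOGGING PARALLEL 8;
```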
- Use a temporary tablespace to do all
aggregates/manipulations, then populate the new table only at the end of the
process.
- Restrict random use of SQL*Plus (e.g. through the PRODUCT_USER_PROFILE
table).
- Create one user that owns all objects and runs the processes that perform
conversions in production.
- Rollback segments are used only for load processes, since the database is
read-only at all other times. See the guidelines for number/size above.
- "Lock down" tablespaces after a load by putting them in read-only mode –
esp. with partitioning, so that backups need only cover the new
partitions/tablespaces.
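The lock-down around a load might be sketched as (hypothetical tablespace name):

```sql
-- Open the current partition's tablespace for the load...
ALTER TABLESPACE sales_199903 READ WRITE;

-- ...load the data, then lock it down so future backups can skip it.
ALTER TABLESPACE sales_199903 READ ONLY;
```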
- Using the SQL*Loader direct path means that rollback segments are not used
at all during this phase.
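A direct parallel load invocation might look like this sketch; the username, control and data file names are hypothetical:

```
sqlldr userid=dss_owner control=sales_fact.ctl data=sales_19990301.dat \
       direct=true parallel=true errors=0 log=sales_fact.log
```

With direct=true, SQL*Loader writes formatted blocks above the high-water mark instead of generating INSERT statements, which is why no rollback is used.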
- Large queries can have serious temporary segment requirements, esp. if PQO
is used, because usage will be based on the DEGREE of parallelism. Consider a
ratio of 5-10% of total database size (size of the
data segments) to start, and monitor. Also look at
sort_area_size. If the ratio starts to
approach 15-20% of total data size, question the validity of the queries and
how they are written, and look for opportunities for denormalization and
aggregation.
- You might still need logging, so start with 3 groups of redo logfiles of
100MB apiece.
- Consider how you will populate the development database or
QA database. You really can't afford to store a HUGE
data warehouse repeatedly. We will
start with a development database for testing WebDB, etc.
- Run in archivelog mode and run backups immediately
after the database load. Do most loading
using the unrecoverable option to greatly minimize redo creation (empirically,
redo information is still generated, just a much smaller amount). Using RMAN
for incremental backups further minimizes the amount of recovery needed,
especially in a DSS environment.
- Place partitioned tables into separate tablespaces to
reduce backup and recovery at any particular point in time.
- Look for opportunities to preaggregate, prejoin and denormalize.
CONSIDER THE MAIN PHASES OF DATA WAREHOUSE PROCESSING:
- Extraction from sources and moving data to the final host
- Transformation to dimensions, aggregates, etc. Since the final tables will
likely be aggregated and denormalized from the source, there will be a
separate database (ODS) for doing this type of activity, distinct from the
final storage and presentation database. The ODS will be the first database
we set up, and depending on the requirements its degree of normalization may
change.
- Bulk load into the final database for reporting
- Quality Assurance checks – pre-user and user QA
- Release/publishing to the end-user community
- Controlled and scheduled updates (possibly)
- Backup and recovery
BATCH PROCESSING FOR DATA LOADS, REPORTING:
- Batch load processes are automated to run on a continuing cycle, with
automated alerts to DBAs and other responsible parties to follow up only if
there is a problem and also to signal the completion of the load.
Different errors and successful completion of the load each send a
specific and distinct code for the fastest possible recovery of the
process. An e-mail and a page are sent to DBAs and other responsible parties.
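A minimal sketch of such a wrapper, reporting a distinct, specific code per step; the function name, step names and commands are all hypothetical, and a real version would page/e-mail the DBAs instead of echoing:

```shell
#!/bin/sh
# Run one load step and report a distinct status code for it.
# Usage: run_load <step_name> <command...>
run_load() {
  step=$1; shift
  "$@"
  rc=$?
  if [ $rc -eq 0 ]; then
    echo "LOAD ${step}: SUCCESS (code 0)"
  else
    # Distinct non-zero codes let the on-call DBA recover faster.
    echo "LOAD ${step}: FAILED (code ${rc})"
    # mailx / pager notification would go here.
  fi
  return $rc
}

run_load extract true
if ! run_load transform false; then
  echo "operator notified"
fi
```

Each step's exit code propagates to the caller, so cron logs and the alerting hook see exactly which step failed and why.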
- Batch load processes are scheduled, with
notification by e-mail to the DBAs on the morning of the loads, so that the
DBAs can ensure that the necessary resources will be available and that
nothing will compete with or affect the successful completion of the job.
- DBAs work closely with the batch load
developers to assist in tuning the batch load processes, to minimize the
batch load window so that the routine administration window is available, and
to ensure the best use of the shared resources. Specifically, there are a
number of things the DBAs can do to optimize the loads which can only be done
at the system or database level and cannot be done through coding.
- All database creations, creation of objects
within a database, and installations involving database components are done
by DOH DBA staff and not by vendors of a specific product. This ensures that
new systems from various vendors fit within the existing infrastructure and
framework.
- Consider using external tables as part
of a loading process instead of temporary segments (Oracle 9i). See
www.ioug.com.
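An external table might be sketched as follows; the directory path, table and column names are hypothetical, and the date column is kept as a string to sidestep per-field format clauses:

```sql
-- Directory object pointing at the flat-file load area.
CREATE DIRECTORY load_dir AS '/u01/load';

-- External table: query the flat file directly, no staging segment needed.
CREATE TABLE sales_ext (
  sale_date  VARCHAR2(8),
  store_id   NUMBER,
  amount     NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY load_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('sales_19990301.dat')
);

-- Load with a direct-path insert, converting the date on the way in.
INSERT /*+ APPEND */ INTO sales_fact
  SELECT TO_DATE(sale_date, 'YYYYMMDD'), store_id, amount FROM sales_ext;
```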
- Consider using MERGE (9i) – www.ioug.com.
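MERGE combines the insert and update passes in one statement; a sketch with hypothetical table and column names:

```sql
-- Upsert staged rows into the dimension in a single pass.
MERGE INTO store_dim d
USING store_stage s
   ON (d.store_id = s.store_id)
 WHEN MATCHED THEN
   UPDATE SET d.store_name = s.store_name
 WHEN NOT MATCHED THEN
   INSERT (store_id, store_name) VALUES (s.store_id, s.store_name);
```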
- See DSS database design:
Can completely avoid if initial=next
and if all objects in a tablespace has uniformly sized extents – best if only
one extent size per tbspace. Also,
place database objects in profiles of small, medium large and perhaps
huge. See Craig Shallahamer’s “Avoiding
a Database Reorganization” and Cary Millsap’s “Oracle 7 Server Space
Management” and “How to Stop Defragmenting and Start Living: The Definitive Word on Fragmentation” by
Bhaskar Himatsingka and Juan Loaiza.