PHYSICAL DATABASE CHECKLIST
CONSIDERATIONS FOR DSS/GIS ENVIRONMENT
Prepared by P. Stankus
Last Update 2/24/2003
REDO LOGGING:
- In a batch-load-only system there is a large amount of Oracle functionality available to avoid writing redo (direct writes, unrecoverable or NOLOGGING options). Use these options.
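For example, a direct-path (APPEND) insert into a NOLOGGING table generates minimal redo; the table and source names below are illustrative only:
ALTER TABLE stg_sales NOLOGGING;
INSERT /*+ APPEND */ INTO stg_sales
SELECT * FROM sales_feed;          -- direct-path write above the high-water mark
COMMIT;
-- (a SQL*Loader direct-path load, DIRECT=TRUE, behaves similarly)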
ARCHIVELOG MODE:
- Run all OLTP production systems in archivelog mode.
- Archivelog mode is not necessary for batch-only loaded systems, since a backup can be scheduled off-hours immediately after the batch load (which should also run off-hours).
DATABASE CREATION ISSUES THAT CANNOT BE CHANGED AFTER CREATION W/O GREAT DIFFICULTY:
- Oracle recommends that the database block size (db_block_size) be equal to or a multiple of the operating system block size. DSS systems usually start at 16K.
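For example, for a 16K block size the parameter must be in the init.ora before CREATE DATABASE is run (the value shown is only illustrative):
# cannot be changed without recreating the database
db_block_size = 16384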
SIZING DATABASE OBJECTS:
- Set INITRANS to the number of simultaneous transactions expected on a block. Multiple freelists avoid contention during simultaneous INSERTs into a table. In a batch-only loaded system this parameter will not be a big issue.
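A rough sketch of where these settings go (table, columns, and values are assumptions, not a standard):
CREATE TABLE stg_claims (
  claim_id   NUMBER,
  claim_date DATE,
  amount     NUMBER(12,2)
)
INITRANS 1                     -- batch-only loads: one transaction per block
STORAGE (FREELISTS 4)          -- only matters with concurrent INSERT sessions
TABLESPACE stage_data;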
UNIX DATA LOADING TIPS:
- Pay attention to how you organize files so that you maximize modular coding and efficiency.
- Use scripts to dynamically generate DDL, SQL*Loader control files, analyze scripts, and all other scripts relating to DSS systems. If you use scripts to dynamically generate these other scripts, you will be able to reuse them for many different databases and only have to debug them a limited number of times.
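A small "SQL generating SQL" sketch along these lines, spooling an analyze script for every table the current schema owns (the spool file name is arbitrary):
SET PAGESIZE 0 FEEDBACK OFF VERIFY OFF TERMOUT OFF
SPOOL analyze_all.sql
SELECT 'ANALYZE TABLE ' || table_name ||
       ' ESTIMATE STATISTICS SAMPLE 10 PERCENT;'
  FROM user_tables;
SPOOL OFF
SET TERMOUT ON
@analyze_all.sql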
- Set up an environment script for automated crontab processes and call it from all other scripts that run SQL commands, as necessary and appropriate.
- Set up common, modular code for error handling, job scheduling, and other common aspects of DSS systems.
CONFIGURATION PARAMETERS:
- In a DSS environment you can start by using automatic parallel tuning and then, based on monitoring, begin to tune the individual parameters.
- For most initialization parameters the best approach is to take an educated guess based on the information available at the time and then monitor and tune in a systematic manner.
DATA WAREHOUSE:
- Take advantage of direct parallel loads.
- Large database blocks (usually 16K) that match or are a multiple of the OS block size are recommended.
- Have all application-related objects owned by one user within the one database. Do not give the database an application-specific or program-area-specific name if it will be used across those boundaries, unless you intend to keep it separate (for example, because it is provided by a third-party vendor).
- Set up partitioning. Create scripts for analyzing the last partition added, purging old partitions, etc. Partitioning is especially useful in systems with large volumes of data.
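For instance, a purge script might simply drop the oldest partition, and an analyze script would target only the newest one (table and partition names are illustrative):
ALTER TABLE fact_sales DROP PARTITION p_1999_q1;

ANALYZE TABLE fact_sales PARTITION (p_2003_q1)
ESTIMATE STATISTICS SAMPLE 10 PERCENT;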
- Store some amount of metadata and automate the population of this data. Also consider reporting tools or processes that are smart enough to incorporate this information into report building. Some of the information stored might include:
-name of the process that loaded the data warehouse (and its source)
-input file name, size, and source
-date and time of job start/end
-job completion status
-number of rows inserted, updated, purged
-number of rows rejected or in error
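One possible shape for such a load-audit table (every name and datatype here is an assumption):
CREATE TABLE dw_load_audit (
  load_process     VARCHAR2(60),
  input_file_name  VARCHAR2(200),
  input_file_size  NUMBER,
  input_source     VARCHAR2(60),
  job_start        DATE,
  job_end          DATE,
  job_status       VARCHAR2(20),
  rows_inserted    NUMBER,
  rows_updated     NUMBER,
  rows_purged      NUMBER,
  rows_rejected    NUMBER
);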
- Roles and grants/revokes should be used when loading a “subject area” of the DSS system. Until all automated data checks are okayed, don’t grant access back to the roles. Often a particular person will perform a second, manual check; consider setting up a process by which that user can “publish” the data.
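A sketch of that publish step, assuming a reporting role and fact table that exist only for illustration:
-- before/during the load and QA checks
REVOKE SELECT ON fact_sales FROM dss_report_role;

-- after the automated checks pass and the manual reviewer signs off
GRANT SELECT ON fact_sales TO dss_report_role;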
- Use metadata to show historical changes. Don’t try to go back and change all of the history in the data warehouse as operational changes occur!
- You will probably create indexes in parallel using the NOLOGGING option (set the degree of parallelism to double the number of processors). However, you will need enough space in your index (not temporary) tablespace to create the extents implied by the degree of parallelism you set.
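For example, on a 4-CPU machine (index, table, and tablespace names are illustrative):
CREATE INDEX fact_sales_dt_idx ON fact_sales (sale_date)
  NOLOGGING
  PARALLEL 8             -- double the 4 CPUs
  TABLESPACE fact_idx;   -- must have room for the extents each parallel slave allocates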
- Indexes (B*tree and bitmapped) can be partitioned even if the underlying table is not. An index that is partitioned identically to its partitioned table is called a LOCAL index. Oracle provides bitmapped indexes and bitmap merge operations that are very efficient, especially for low-cardinality data elements.
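A bitmap index partitioned LOCAL to its (partitioned) table might look like this; all names are assumptions:
CREATE BITMAP INDEX fact_sales_region_bix
  ON fact_sales (region_code)     -- low-cardinality column
  LOCAL                           -- one index partition per table partition
  TABLESPACE fact_idx;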
- Create a table in the data warehouse/ODS that records scripts, their purposes, authors, schedules, etc. This builds in self-documentation as scripts are developed.
- Use uniform extent sizes based on the estimated number of tables and the size of the data. If the data is large enough, use a tablespace per table. Partitioning and tablespace layout need to be considered together.
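For example (file path, sizes, and names are illustrative):
CREATE TABLESPACE fact_data
  DATAFILE '/u02/oradata/dw/fact_data01.dbf' SIZE 4000M
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 64M;   -- one extent size per tablespace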
- Use read-only tablespaces along with RMAN incremental backups to avoid having to back up or recover significant amounts of data.
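A sketch, assuming one tablespace per loaded partition:
ALTER TABLESPACE fact_1999_q1 READ ONLY;

-- RMAN can then skip read-only tablespaces on routine backups, e.g.:
-- RMAN> BACKUP DATABASE SKIP READONLY;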
- Striping data and index segments across disks (especially with RAID 0) is great for performance because Parallel Query and Parallel Load operations can be done without introducing I/O contention. However, temporary segment and rollback segment access is not parallelized, and it is best that these segments do not contend with the data/index segments (this is how we configured the 6500).
- Use NOLOGGING to populate objects (with the least amount of redo) but back them up immediately after creation.
- Set INITRANS to 1 because there won’t be multiple transactions modifying the same block. Leave MAXTRANS at its default.
- Consider using a parallel degree of double the number of CPUs on the machine (this can be set in the query, at the table/index level, at the session level, or for the entire database).
- Horizontal partitioning of large tables (especially FACT tables) is useful. Horizontal means subsets of rows are stored together, typically by a transaction date:
-partitions should be of roughly equal size
-the partition key should be data that does not have to be updated frequently
-the key is usually a date
-the key should match how most queries select data
-administration is easier if each partition is in a separate tablespace and the partition and tablespace share the same name
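A minimal range-partitioned FACT table along those lines (names, dates, and tablespaces are illustrative):
CREATE TABLE fact_sales (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER(12,2)
)
PARTITION BY RANGE (sale_date) (
  PARTITION p_1999_q1 VALUES LESS THAN (TO_DATE('1999-04-01','YYYY-MM-DD'))
    TABLESPACE fact_1999_q1,
  PARTITION p_1999_q2 VALUES LESS THAN (TO_DATE('1999-07-01','YYYY-MM-DD'))
    TABLESPACE fact_1999_q2
);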
- Full-table scans are often chosen by the CBO in data warehousing queries, so make them efficient: avoid setting up the data warehouse with objects that have to be updated. If objects do not have to be updated, set PCTFREE=0 for more efficient sequential scans (fewer blocks to scan because there is less wasted space per block).
- Analyze objects after creating them, unless there is not enough window. With partitioning you can analyze just the new partition using:
ANALYZE TABLE <TABLENAME> PARTITION (<PARTITION_NAME>)
ESTIMATE STATISTICS SAMPLE 10 PERCENT;
A 10 percent sample is usually sufficient.
- Conduct architectural feasibility tests with generated data as early in the process as possible. If you use generated data, you don’t have to have all load and conversion processes done first:
-test query response
-test the time taken to do incremental loads
-test the time to load
-test the time to back up and recover
-test the time to analyze
- Index rebuilds: ensure you have double the amount of space that the existing index segments take. You may decide to use the SINGLEROW option of a direct load to maintain indexes while loading, if the number of rows is small and this is faster than rebuilding, or if there is not enough space to rebuild.
- Check the STATUS column of DBA_INDEXES to make sure indexes are in place (usable) after a load.
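For example (the owner name is an assumption):
SELECT index_name, status
  FROM dba_indexes
 WHERE owner = 'DWOWNER'
   AND status NOT IN ('VALID', 'N/A');
-- STATUS is 'N/A' for partitioned indexes; check DBA_IND_PARTITIONS for those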
- Use the temporary tablespace to do all aggregation/manipulation, then populate the new table only at the end of the process (there are various ways to do this).
- Restrict random use of SQL*Plus through the PRODUCT_USER_PROFILE table.
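One documented pattern is inserting a row that disables a command for a given user; the user name below is only an example:
INSERT INTO system.product_user_profile
  (product, userid, attribute, char_value)
VALUES
  ('SQL*Plus', 'REPORT_USER', 'DELETE', 'DISABLED');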
- Create one user that owns all objects and runs the conversion processes in production.
- Rollback segments are used only by load processes, since the database is read-only at all other times. See the guidelines for number/size above.
- “Lock down” tablespaces after the load by putting them in read-only mode, especially with partitioning, so that backups only have to back up the new partitions/tablespaces.
- Using the SQL*Loader direct path means that rollback segments are not used at all during this phase.
- Large queries can have serious temporary segment requirements, especially if parallel query (PQO) is used, because the requirements are based on the DEGREE of parallelism. Consider a ratio of 5-10% of total database size (size of the data segments) to start, and monitor. Also look at SORT_AREA_SIZE. If the ratio starts to approach 15-20% of total data size, question the validity of the queries and how they are written, and look for opportunities for denormalization and aggregation.
- Logging might still be needed, so start with 3 redo log groups of 100MB apiece.
- Consider how you will populate a development or QA database; you really can’t afford to store a HUGE data warehouse repeatedly. We will start with a development database for testing WebDB, etc.
- Run in archivelog mode and run backups immediately after the database load. Do most loading using the unrecoverable/NOLOGGING option to greatly minimize redo creation (empirically, redo information is still generated, just in a much smaller amount). Using RMAN for incremental backups further minimizes the amount of recovery needed, especially in a DSS environment.
- Place the partitions of partitioned tables into separate tablespaces to reduce the backup and recovery needed at any particular point in time.
- Look for opportunities to preaggregate, prejoin, and partition!
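One way to preaggregate is a summary table built in parallel with NOLOGGING; all names here are illustrative:
CREATE TABLE fact_sales_monthly
  PARALLEL 8 NOLOGGING
AS
SELECT TRUNC(sale_date, 'MM') AS sale_month,
       region_code,
       SUM(amount) AS total_amount,
       COUNT(*)    AS sale_count
  FROM fact_sales
 GROUP BY TRUNC(sale_date, 'MM'), region_code;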
CONSIDER THE MAIN PHASES OF DATA WAREHOUSING:
- Extraction from sources and moving the data to the final host
- Transformation to dimensions, aggregates, etc. Since the final tables will likely be aggregated and denormalized from the source, there will be a separate database (ODS) for this type of activity, apart from the final storage and presentation database. The ODS will be the first database we set up, and its degree of normalization may change depending on the requirements.
- Bulk load into the final database for reporting
- Quality assurance checks: pre-user and user QA
- Release/publishing to the end-user community
- Controlled and scheduled updates (possibly)
- Querying
- Auditing
- Security
- Backup and recovery
BATCH PROCESSING FOR DATA LOADS, REPORTING:
- Batch load processes that run on a continuing cycle are automated, with automated alerts to DBAs and other responsible parties so that follow-up is needed only if there is a problem, and also to signal the completion of the load. Different errors and successful completion of the load each send a specific, distinct code to allow the fastest possible recovery of the process. An e-mail and pager alert are sent to the DBAs and other responsible parties.
- Batch load processes are scheduled with e-mail notification to the DBAs the morning of the loads, so that the DBAs can ensure the necessary resources will be available and nothing will compete with or affect the successful completion of the job.
- DBAs work closely with batch load developers to assist in tuning the batch load processes, minimizing the batch load window so that a routine administration window remains available and the shared resources are used as well as possible. Specifically, there are a number of things the DBAs can do to optimize the loads that can only be done at the system or database level and cannot be done through coding.
- All database creations, creation of objects within a database, and installations involving database components are done by DOH DBA staff and not by vendors of a specific product. This ensures that new systems from various vendors fit within the existing infrastructure and framework for these environments.
- Consider using external tables (9i) as part of a loading process instead of temporary segments. See www.ioug.com.
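A minimal 9i external-table sketch (directory path, file, and column names are assumptions):
CREATE DIRECTORY load_dir AS '/u04/feeds';

CREATE TABLE ext_sales_feed (
  sale_id NUMBER,
  amount  NUMBER(12,2)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY load_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('sales_feed.dat')
)
REJECT LIMIT UNLIMITED;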
- Consider using MERGE. See www.ioug.com.
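A MERGE ("upsert") sketch reading from the external table above; the target table is hypothetical, and in 9i both the WHEN MATCHED and WHEN NOT MATCHED clauses are required:
MERGE INTO sales_totals t
USING ext_sales_feed s
   ON (t.sale_id = s.sale_id)
 WHEN MATCHED THEN
   UPDATE SET t.amount = s.amount
 WHEN NOT MATCHED THEN
   INSERT (sale_id, amount)
   VALUES (s.sale_id, s.amount);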
- See DSS database design:
ON-GOING MONITORING:
- Tablespace fragmentation: fragmentation can be avoided completely if INITIAL=NEXT and all objects in a tablespace have uniformly sized extents; it is best if there is only one extent size per tablespace. Also, place database objects into size profiles of small, medium, large, and perhaps huge. See Craig Shallahamer’s “Avoiding a Database Reorganization”, Cary Millsap’s “Oracle 7 Server Space Management”, and “How to Stop Defragmenting and Start Living: The Definitive Word on Fragmentation” by Bhaskar Himatsingka and Juan Loaiza.
REFERENCES:
Top Partitioning Table Issues