Oracle Data Pump - Part I

James Koopmann's picture
articles: 

The last couple articles I have written focused on meta-data or DDL extraction for Oracle. The search for a part III to those articles lead me to Oracle's Data Pump utility. Not necessarily for the data movement piece but because it has an API for meta-data. Well even though I have been using 10g for quite some time, I have yet to use Data Pump. I thought this would be a great way to introduce myself, and possibly you the reader, to this new utility. This article will serve as a basic introduction to Data Pump and then in subsequent articles we will walk through the new command line options for Data Pump's export and import (expdp & impdp), and look at the PL/SQL packages DBMS_DATAPUMP and DBMS_METADATA.

What is Oracle's Data Pump?

Oracle simply states that Data Pump is a "very high-speed" mechanism for moving data and meta-data from one database to another. Is Data Pump a complete replacement for export (exp) and Import (imp)? Oracle clearly states that the only time you would/ should use the original (exp & imp) is when you need backward compatibility to an earlier version that does not have Data Pump export & import (expdp & impdp). Basically if you wanted to import a 10g export into a pre-10g database or import into a 10g database from a pre-10g database. As we go through this evaluation of Data Pump and its usage this will become more clear if there are any other situations. I venture to say there is.

Data Pump has three distinct components.

The uniqueness of Data Pump is that all processing is done through the database server and through DIRECTORY objects. To do this there are two internal packages to Oracle and an optional replacement for the original export and import command line utilities (exp & imp).

expdp & impdp

These to command line utilities are very close to the old standby export & import (exp & imp) utilities. They are not stand-alone utilities in the sense that they use the DBMS_DATAPUMP PL/SQL package to execute the export and import functions. They accept a variety of command line options that, like exp & imp, allow you to pick and choose the objects to be exported and imported.

DBMS_DATAPUMP

The Data Pump API and can be used independently of expdp & impdp. Is the package accessed to move data and / or metadata between databases.

DBMS_METADATA

The meta-data API in Oracle and can also be used independently of expdp & impdp. If you remember this is the package we were using in the last two articles for extracting meta-data. I am very interested in how it interfaces with Data Pump.

Some features I thought interesting

  • Data Pump export and import are not compatible with the old exp & imp. So if you need to import into a pre-10g database it is best to stick with the original export utility (exp).

  • There are new 10g features that are only supported (exported & imported) with the new Data Pump utility.

  • With Data Pump you can allocate parallel threads to fine-tune resource consumption during export and import. Also available is to have multiple client processes (expdp & impdp) to attach to a Data Pump job. Both of these options allow you to throw more resources to get the Data Pump job completed. Data Pump also works on a set of dump files rather than a single sequential dump file and you can add additional dump files to a running dump process. I am sure we will have to be concerned here with too many I/O operations to a single disk area.

  • Restart ability is supported for export jobs but also you can stop and restart the jobs as you see fit. So if you see a job is consuming too many resources or it is running too long and going to effect production, you can just stop the job and then restart at a latter time.

  • Because Data Pump works on the database server through packaged procedures and directory structures, you now have the ability to perform export and import operations over the network.

  • Various features that allow for the re-mapping, re-naming, including, or excluding of database objects and database structures. As we saw in the DBMS_METADATA package in the last two articles how to compare across schemas and translate one schema ownership to another, Data Pump also has these types of abilities for moving objects and data.

  • Versioning capabilities of database objects.

  • To use Data Pump you must have EXP_FULL_DATABASE or IMP_FULL_DATABASE depending if you will be performing export or import operations. These allow you to expdp & impdp across ownership for items such as grants, resource plans, schema definitions, and re-map, re-name, or re-distribute database objects or structures.

  • Access to data is through direct path and external tables. Both of which, under the covers, have the same data format so Oracle can switch between the two depending on a best method approach to move your data. Some of which is dependent upon the targeted structure as direct path can not be used for some object types and database structures.

  • Monitoring mechanisms have been put in place to see how the Data Pump job is performing. You can monitor through a log file created or one of three views that assist in the monitoring of Data Pump jobs.

    - DBA_DATAPUMP shows all the active Data Pump jobs and details on the state of the job.

    - DBA_DATAPUMP_SESSIONS shows the user sessions that are attached to a Data Pump job.

    - V$SESSION_LONGOPS shows the progress of a Data Pump job in regards to the amount of work it needs to do and how much it has done.

  • Through the use of internal tables, and what Oracle calls master & worker processes, there exists an intercommunication between the processes that are performing work on behalf of Data Pump and internal logging information that allows the user, and Oracle, to understand where in the process Data Pump is.

My security concerns

  • Having EXP_FULL_DATABASE & IMP_FULL_DATABASE privileges opens up users being able to see too much and affect too many objects across the database where you may want a higher granularity of security.

  • Operations across the network. This has implications that allow for unsecured network nodes to be infiltrated where your production servers could then be accessed from and then compromised.

  • By definition, Oracle gives permission to the objects in a DIRECTORY that a user would not normally have access to. Also be aware that there is a default server-based directory object, DATA_PUMP_DIR that Data Pump will look for if a DIRECTORY is not specified. While I have not seen this directory created by default in an Oracle installation and the manuals say it needs to be created by a DBA, it is still something to be on the look out for in subsequent releases as it could cause a security hole.

  • Couple the possibility of network holes with the ability to spawn multiple client (expdp & impdp) services, a server could easily be overcome with processes that would either become a bottleneck or bring your system to a halt.

  • It is highly recommended that new monitoring and auditing of these new server processes should be undertaken or at least validating that they are shut down or restricted to limited use.

Data Pump seems to have a lot of nice features to facilitate the movement of data and meta-data between databases. The monitoring and tuning of the Data Pump jobs seems to be the greatest benefit to moving to this new form of export and import. My only concern is the opening up of the database to see the O/S through database links and directory objects. Hopefully this article served as a nice introduction to Data Pump, I know I learned a lot just by writing this high level overview. Come back for subsequent parts to this introduction and we will explore how Data Pump really works and hopefully put to bed any security concerns of mine.

Comments

Just a question: Are you sure about the name of this view DBA_DATAPUMP? Shouldn't it be DBA_DATAPUMP_JOB?

Regards
Jaouad

Jaouad,

The view name is DBA_DATAPUMP_JOBS and it displays all Data Pump jobs in the database.

SN

James, since expdp is slower than exp with similar parameter, can you shed some light under what circumstances we can benefit from expdp aside from what you discussed in the articles.

thanks ted

Ted you can add "PARALLEL" parameter (Change the number of active workers for current job) to your parfile.