Oracle DBA – A lifelong learning experience

DBMS_DATAPUMP using the API

Posted by John Hallas on May 6, 2009

My challenge was to move a schema from one database to another. Simple enough, one would assume, but I had several limitations. I could not use transportable tablespaces because the character sets
were different and I did not want to go through a conversion routine, but the main limitation was the size (2Tb) and the fact that there were multiple partitions, each complete with four subpartitions.
I decided to use export: a single-file export took 15 hours and created a 700Mb dump file, and the import was less than 50% through after 24 hours, so I knew I needed a more efficient method. I had already arranged for a large filesystem to be mounted on the
source database server, the idea being to export to that area and then get it remounted on the target server. This was to be repeated against the production database at a later date, so it was worth getting everything scripted.

I generated 4 export parameter files to take out 3 big tables and ‘all the rest’

userid=xxxx
directory=movetbs
dumpfile=odsprod_odsalltherest.dmp
schemas=odsprod
exclude=TABLE:"IN ('ODS_SLS_TRAN_HEADER','ODS_SLS_PAYMENT_DETAIL','ODS_SLS_ITEM_DETAIL')"
#exclude=TABLE:"LIKE 'ODS_TEMP_STAGE%'"
logfile=ALLTHEREST.log
content=METADATA_ONLY

userid=xxxx
directory=movetbs
dumpfile=odsprod_odsslscheck.dmp
schemas=odsprod
include=TABLE:"IN ('ODS_SLS_CHECK_ERRORS')"
logfile=ODS_SLS_CHECK_ERRORS.log
content=METADATA_ONLY

userid=xxxx
directory=movetbs
dumpfile=odsprod_odstran.dmp
schemas=odsprod
include=TABLE:"IN ('ODS_SLS_TRAN_HEADER')"
logfile=ODS_SLS_TRAN_HEADER.log
content=METADATA_ONLY

Note that the above examples have the content=METADATA_ONLY parameter, as I wanted to get everything working before I started the exports proper. I had already experimented with a command-line syntax but discounted it because the line got too long and rather hard to manage. An example is shown here though:

nohup expdp xxxx directory=movetbs dumpfile=odsprod_odstempstage.dmp schemas=odsprod include=TABLE:\"LIKE \(\'ODS_TEMP_STAGE%\'\)\" logfile=TEMPSTAGE.log &

 

So far so good, but I then realised that the largest table (ODS_SLS_ITEM_DETAIL) was over 4 billion rows and created a 600Gb file by itself, so it was going to be the biggest bottleneck. I did some experimentation with a query parameter on the export, but it took too long to scan the data and was not very efficient. An example is provided below:

userid=xxxx
directory=movetbs
dumpfile=odsprod_odsitemdetquery.dmp
schemas=odsprod
include=TABLE:"IN ('ODS_SLS_ITEM_DETAIL')"
query=ODSPROD.ODS_SLS_ITEM_DETAIL:" where id_pk <= 5000000"
logfile=ODSITEMDETQUERY.log

I then decided to export by partition but could not find a way of specifying a partition list using expdp. That is probably because such an option does not exist even in 11.1.0.7, so I needed to use the DBMS_DATAPUMP API. This is documented in Metalink note 470027.1. To cut to the chase, I ended up with 4 export SQL files which split the partitions up based on dates: 2008a, 2008b, 2009a and 2009b.

cat exp2009a.sql

declare
  h1 number;
begin
  -- table-mode export job
  h1 := dbms_datapump.open (operation => 'EXPORT', job_mode => 'TABLE');

  dbms_datapump.add_file (handle => h1,
                          filename => 'exp2009a%U.DMP',
                          directory => 'MOVETBS',
                          filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE);

  dbms_datapump.add_file (handle => h1,
                          directory => 'MOVETBS',
                          filename => 'EXP2009a.log',
                          filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);

  dbms_datapump.metadata_filter (handle => h1,
                                 name => 'SCHEMA_EXPR',
                                 value => 'IN (''ODSPROD'')');

  dbms_datapump.metadata_filter (handle => h1,
                                 name => 'NAME_EXPR',
                                 value => 'IN (''ODS_SLS_ITEM_DETAIL'')');

  dbms_datapump.set_parallel (handle => h1, degree => 6);

  -- export only the listed partitions (full list abbreviated here)
  dbms_datapump.data_filter (handle => h1,
                             name => 'PARTITION_LIST',
                             value => '''ODS_SLS_ITEM_DETAIL_20090101'', ''ODS_SLS_ITEM_DETAIL_20090102'', ... more here ... ''ODS_SLS_ITEM_DETAIL_20090227'', ''ODS_SLS_ITEM_DETAIL_20090228''',
                             table_name => 'ODS_SLS_ITEM_DETAIL',
                             schema_name => 'ODSPROD');

  dbms_datapump.start_job (handle => h1);
  dbms_datapump.detach (handle => h1);
end;
/
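
As an aside, the four export SQL files only differ in the partition names passed to data_filter, so the list could be built from the data dictionary rather than typed out by hand. The block below is just a sketch of that idea and was not part of the original exercise: it assumes access to DBA_TAB_PARTITIONS, relies on the daily ODS_SLS_ITEM_DETAIL_YYYYMMDD partition naming shown above, and uses made-up dump and log file names (exp2009auto%U.DMP, EXP2009auto.log); the LIKE pattern would need adjusting for each of the four date splits.

declare
  h1    number;
  plist varchar2(32767);
begin
  -- build a comma-delimited list of quoted partition names, e.g. 'P1', 'P2', ...
  for r in (select partition_name
              from dba_tab_partitions
             where table_owner = 'ODSPROD'
               and table_name  = 'ODS_SLS_ITEM_DETAIL'
               and partition_name like 'ODS_SLS_ITEM_DETAIL_2009%'   -- adjust per date split
             order by partition_name)
  loop
    if plist is not null then
      plist := plist || ', ';
    end if;
    plist := plist || '''' || r.partition_name || '''';
  end loop;

  h1 := dbms_datapump.open (operation => 'EXPORT', job_mode => 'TABLE');

  dbms_datapump.add_file (handle => h1, filename => 'exp2009auto%U.DMP',
                          directory => 'MOVETBS',
                          filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE);
  dbms_datapump.add_file (handle => h1, filename => 'EXP2009auto.log',
                          directory => 'MOVETBS',
                          filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);

  dbms_datapump.metadata_filter (handle => h1, name => 'SCHEMA_EXPR',
                                 value => 'IN (''ODSPROD'')');
  dbms_datapump.metadata_filter (handle => h1, name => 'NAME_EXPR',
                                 value => 'IN (''ODS_SLS_ITEM_DETAIL'')');
  dbms_datapump.set_parallel (handle => h1, degree => 6);

  -- same data_filter call as exp2009a.sql, but with the generated list
  dbms_datapump.data_filter (handle => h1, name => 'PARTITION_LIST',
                             value => plist,
                             table_name => 'ODS_SLS_ITEM_DETAIL',
                             schema_name => 'ODSPROD');

  dbms_datapump.start_job (handle => h1);
  dbms_datapump.detach (handle => h1);
end;
/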

Running these 4 jobs in parallel, as well as the 3 exports listed above, I managed to get the data exported in ~2.5 hours, which was a considerable improvement on the original 15 hours. I then moved the mountpoint onto the target server and ran impdp jobs, as well as SQL scripts, to load the data.
Examples are below.
cat ALLTHEREST.imp

userid=xxxx
directory=movetbs
dumpfile=odsprod_odsalltherest.dmp
full=y
logfile=ALLTHERESTimp.log
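
The parameter file imports were kicked off from the shell in the same way as the exports. The exact invocation was not captured above, but (assuming the userid is picked up from the parameter file) it would be along these lines:

nohup impdp parfile=ALLTHEREST.imp &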

cat imp2009a.imp

declare
  k1 number;
begin
  -- table-mode import job
  k1 := dbms_datapump.open (operation => 'IMPORT', job_mode => 'TABLE');

  dbms_datapump.add_file (handle => k1,
                          filename => 'exp2009a%U.DMP',
                          directory => 'MOVETBS',
                          filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE);

  dbms_datapump.add_file (handle => k1,
                          directory => 'MOVETBS',
                          filename => 'IMP2009a.log',
                          filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);

  dbms_datapump.metadata_filter (handle => k1,
                                 name => 'SCHEMA_EXPR',
                                 value => 'IN (''ODSPROD'')');

  dbms_datapump.metadata_filter (handle => k1,
                                 name => 'NAME_EXPR',
                                 value => 'IN (''ODS_SLS_ITEM_DETAIL'')');

  dbms_datapump.set_parallel (handle => k1, degree => 6);

  -- append into the existing table rather than skipping it
  dbms_datapump.set_parameter (k1, 'TABLE_EXISTS_ACTION', 'APPEND');

  -- import only the listed partitions (full list abbreviated here)
  dbms_datapump.data_filter (handle => k1,
                             name => 'PARTITION_LIST',
                             value => '''ODS_SLS_ITEM_DETAIL_20090101'', ''ODS_SLS_ITEM_DETAIL_20090102'', ... more here ... ''ODS_SLS_ITEM_DETAIL_20090227'', ''ODS_SLS_ITEM_DETAIL_20090228''',
                             table_name => 'ODS_SLS_ITEM_DETAIL',
                             schema_name => 'ODSPROD');

  dbms_datapump.start_job (handle => k1);
  dbms_datapump.detach (handle => k1);
end;
/

Note the additional line to ensure that the load is not skipped because the table already exists:

dbms_datapump.set_parameter (k1, 'TABLE_EXISTS_ACTION', 'APPEND');

The total load now seems to take around 6 hours, which is down from the estimated 48-60 hours seen initially. However, I am seeing a lot of enq: TM contention, as I have 6 worker processes on each of the 4 imports, all running against the ODS_SLS_ITEM_DETAIL table. I am still trying to work out how to address that and possibly optimise the number of worker processes by changing the PARALLEL parameter. We have 16 CPUs, but I am more tempted to reduce rather than increase the workers to give less contention. However, this load will only happen twice, so perhaps I do not need to spend too much time fine-tuning.
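
In the meantime, a generic query along the following lines (not something lifted from these particular runs) shows which sessions are waiting on the TM enqueue and which object they are queued behind; for the enq: TM - contention event the object id is reported in P2.

set lines 120
col program for a30 trunc
col event for a30 trunc
col object_name for a30 trunc
select s.sid, s.serial#, s.program, s.event, s.p2 object_id, o.object_name, s.seconds_in_wait
from v$session s, dba_objects o
where s.event = 'enq: TM - contention'
and o.object_id (+) = s.p2
order by s.seconds_in_wait desc;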
I hope that this entry has been useful. I had never used the datapump API before so it was a learning curve for me.

PS This is a useful script to show progress on the expdp/impdp jobs:

set lines 120
col opname for a25 trunc
col username for a15 trunc
col target for a20
col sid for 999999
col serial# for 999999
col "%DONE" for a8
select b.username,a.sid,b.opname,b.target,round(b.SOFAR*100 / b.TOTALWORK,0) || '%' as "%DONE",
b.TIME_REMAINING,to_char(b.start_time,'YYYY/MM/DD HH24:MI:SS') START_TIME
from V$SESSION_LONGOPS b,V$SESSION a where a.sid=b.sid and b.TIME_REMAINING > 0 order by b.SOFAR/b.TOTALWORK;
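
In addition, assuming DBA access, the Data Pump jobs themselves can be listed to check their state and how many sessions are attached to each:

col owner_name for a12
col job_name for a25
col operation for a10
col state for a12
select owner_name, job_name, operation, job_mode, state, degree, attached_sessions
from dba_datapump_jobs
order by job_name;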


