Oracle DBA – A lifelong learning experience

How many mistakes can you make when diagnosing a problem

Posted by John Hallas on December 6, 2010

Last week I attended a talk by Daniel Fink at UKOUG around having a plan to sort out problems and perhaps the key point was –‘listen to the problem before setting off down the investigatory path’. I blogged at the time  that I thought it was an interesting  talk and I wanted to try and run through something similar at work.

This blog entry is perhaps a good example of how not to approach a problem but equally holds some technical information that may be of use to others in a similar situation.

I was approached by a developer who asked me about charactersets as he was running a query on two databases both of which had an external table  which came from the same source and yet he was only returning 3 rows out on one database and 4 rows on the other. The column he had narrowed it down to contained a £ sign and so he had assumed it was a characterset issue. In fact his initial mail  stated

Problem field is C21_DPST_RFRNC2….
As a workaround this problem we have used following regular expression when creating a table at source so that it filters out the special characters.
regexp_replace(Regexp_replace(trim(lower(C21_DPST_RFRNC2)),'[^a-z,_, ,A-Z,0-9,@,&,$,£,%,'']',' '),'[[:space:]]{1,}',' ')

 Mistake one – blind acceptance by DBA of what developer had said.

My immediate thought was ‘oh sh……’ we have made a mess of the charactersets at some time and that is going to be a big issue. Given that I have just presented about standards and charactersets was one of the subjects that I touched upon that was not a good place to be.

Analysis of v$nls_parameters showed that all environments of both databases had the same characterset, which was a bit of a relief to be honest.


<br />
select parameter, value from v$nls_parameters where parameter in ('NLS_LANGUAGE','NLS_CURRENCY','NLS_CHARACTERSET','NLS_NCHAR_CHARACTERSET') ;</p>
<p>NLS_LANGUAGE              ENGLISH</p>
<p>NLS_CURRENCY                #</p>


Note that I have missed one parameter out of that query which I will come back to later.

Nothing obvious there so I turned to my new best friend Google as it is very rare nowadays that I go to MoS for anything first, only when other avenues have been exhausted. Google pointed me at issues with parallelism so I spent a little bit of time with alter session parallel and no parallel with no differences at all.

At this point I realised that I did not really understand the problem at all and the sample code I had seen was just plain confusing with two different tables names (both very long) under different schemas on different databases. I asked for the code to be sent to me and then I tried to recreate the  external tables. Firstly the file that was being used was the same size on both servers and I decided to use the DATA_PUMP_DIR directory for my test to keep any issues with permissions out of the equation.

I created both tables and true enough the query only showed 3 rows on one system and 4 on another. I then realised that charactersets were likely not to be the main problem but had no idea of what the solution was.

I recalled an issue I had worked on elsewhere with defining varchars as either byte or char but V$nls_parameters indicated that NLS_SEMANTICS was BYTE on both databases but I knew the problem was something to do with a £ sign or perhaps other special character.

About time to see the code

Code that works

create table db1_export_bkp (</p>
<p>     C1_CO_DAY_KEY,</p>
<p>    C2_BSNS_UNIT_KEY,</p>
<p>    C3_TCHPNT_KEY,</p>
<p>    C4_DAY_KEY,</p>
<p>    C5_TNDR_TYP_CD,</p>
<p>    C6_TNDR_CLASS_CD,</p>
<p>    C7_MLTPL_TNDR_CLASS_CD,</p>
<p>    C8_OVR_SHT_CD,</p>
<p>    C9_BUSI_DAY_KEY,</p>
<p>    C10_NODE_ID,</p>
<p>    C11_EMP_KEY,</p>
<p>    C12_EXP_RFRNC_CD,</p>
<p>    C13_TNDR_IN_AMT,</p>
<p>    C14_TNDR_IN_AMT_LCL,</p>
<p>    C15_TNDR_IN_AMT_RPT,</p>
<p>    C16_TNDR_OUT_AMT,</p>
<p>    C17_TNDR_OUT_AMT_LCL,</p>
<p>    C18_TNDR_OUT_AMT_RPT,</p>
<p>    C19_TRX_CNT,</p>
<p>    C20_DPST_RFRNC1,</p>
<p>    C21_DPST_RFRNC2,</p>
<p>    C22_DPST_RFRNC3,</p>
<p>    C23_LOAD_DT,</p>
<p>    C24_LAST_UPDT_DT,</p>
<p>    C25_LAST_UPDT_BY,</p>
<p>    C26_CURR_IND</p>
<p>    TYPE oracle_datapump</p>
<p>    DEFAULT DIRECTORY dat_dir</p>
<p>    LOCATION ( 'X_db1_to_db2.exp' ))</p>
<p>as SELECT   </p>
<p>where    (1=1)</p>
<p> And (RET_CO_RECON_STR_DTLS.RECON_DATE &gt;= TO_DATE('02-12-2010 03:19:15','DD-MM-YYYY HH24:MI:SS'))</p>

and the code that fails

create TABLE db2_export_bkp (</p>
<p>     C1_CO_DAY_KEY    NUMBER(30),</p>
<p>    C2_BSNS_UNIT_KEY    NUMBER(30),</p>
<p>    C3_TCHPNT_KEY    NUMBER(30),</p>
<p>    C4_DAY_KEY    NUMBER(30),</p>
<p>    C5_TNDR_TYP_CD    VARCHAR2(30),</p>
<p>    C6_TNDR_CLASS_CD    VARCHAR2(30),</p>
<p>    C7_MLTPL_TNDR_CLASS_CD    VARCHAR2(30),</p>
<p>    C8_OVR_SHT_CD    VARCHAR2(1),</p>
<p>    C9_BUSI_DAY_KEY    NUMBER(30),</p>
<p>    C10_NODE_ID    NUMBER(2),</p>
<p>    C11_EMP_KEY    NUMBER(30),</p>
<p>    C12_EXP_RFRNC_CD    VARCHAR2(50),</p>
<p>    C13_TNDR_IN_AMT    NUMBER(18,4),</p>
<p>    C14_TNDR_IN_AMT_LCL    NUMBER(18,4),</p>
<p>    C15_TNDR_IN_AMT_RPT    NUMBER(18,4),</p>
<p>    C16_TNDR_OUT_AMT    NUMBER(18,4),</p>
<p>    C17_TNDR_OUT_AMT_LCL    NUMBER(18,4),</p>
<p>    C18_TNDR_OUT_AMT_RPT    NUMBER(18,4),</p>
<p>    C19_TRX_CNT    NUMBER(9),</p>
<p>    C20_DPST_RFRNC1    NUMBER(8),</p>
<p>    C21_DPST_RFRNC2    VARCHAR2(14),</p>
<p>    C22_DPST_RFRNC3    NUMBER(8),</p>
<p>    C23_LOAD_DT    DATE,</p>
<p>    C24_LAST_UPDT_DT    DATE,</p>
<p>    C25_LAST_UPDT_BY    VARCHAR2(30),</p>
<p>    C26_CURR_IND    CHAR(1)  </p>
<p>    TYPE oracle_datapump</p>
<p>    DEFAULT DIRECTORY dat_dir</p>
<p>    LOCATION ('X_db1_to_db2.exp'))

Looking in the dat_dir I knew both files were the same size but then I noticed that a log file had been created each time I had created the external tables. Looking at the contents I saw rather a giveaway line

KUP-11003: bad data encountered while reading from file X_db1_to_db2.exp</p>
<p>ORA-12899: value too large for column C21_DPST_RFRNC2 (actual: 16, maximum: 14)</p>
<p>C21_DPST_RFRNC2 : 0X'666F7267656420EFBFBD35206E6F7465'

Once I had seen the error message and known it was a £ sign that was causing the issue I realised that some characters were multibytye and most were not. As this was a free format field someone had entered text which had caused a problem. The table definition in db2 was varchar2(14). Changing that to varchar2(16) quickly resolved the problem.

Mistake 2. A bit of a giveaway really so why had I not thought about checking the number of rows in both tables first rather than just assuming that we were not selecting the records for some reason. Bad diagnosis and going off down a blind avenue wastes time and effort.

So why did db1 work OK with the same data. Looking at the load in db1 it was creating the table as a select from existing data and was using the column size of that data source, which was varchar2(20) and so no problems were encountered.

So what have I learned from all this.

  • Listen to the problem before making an assumption.
  • Ensure that you understand the problem before doing anything.
  • Make a note of what you have done as explaining later (or writing it up in a blog) is much harder if you have not got any notes to work from.

As a side-note how many times have I attended wash-up meetings for majorincidents (anything that impacts a production service) an we list what fixes to put in place but we rarely if ever actually delve into how we approached the diagnosis , what thought processes went into it and how well we documented each decision path.


2 Responses to “How many mistakes can you make when diagnosing a problem”

  1. coskan said

    Nice one John,

    As a DBA who is mostly working as performance and troubleshooting engineer; most of the time I try to keep the information level minimum from the source before I start my checks because it is very easy to waste your time on other people prejudgements and opinions.

    I know it is good to have as much as information upfront but I leave that till I finish my initial lookup and I start to ask questions, otherwise I always find myself looking at the wrong direction.

    One thing I learned in 10 years of career is that everybody has got their own version of truth so better not to assume anything is absolute true:)

    • John Hallas said

      Thanks for the comments Coskan. That seems to be the opposite approach to which I am suggesting. In a way though they are both the same in that they both follow a consistent path which is the best way of tackling problems. I must admit I normally go in like a bull at a gate but this is the new me – cool, calm, collected.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: