Wednesday, August 14, 2013

[DWH] Error Handling Made Easy Using Row Error Logging

Error handling made easy using Row Error Logging




Every time we start with a Data Warehouse or Data Integration project we spent lot of time defining our error handling approach. Still there are good chances that we might miss some scenarios because of unexpected data issues. Here in this article, lets us discuss an approach to handle the unexpected error.


Error handling Configuration.


We do not have any error configuration required during the mapping development. So after the mapping is created, during the session configuration set the session properties as shown in below image.










Here is the details on what each property means.


  • Error Log Type :-  Specifies the type of error log to create. It can be relational database or flat file.
  • Error Log DB Connection :- Database connection for a relational log.
  • Error Log Table Name Prefix :- Specifies the table name prefix for relational logs.
  • Log Row Data :- Specifies whether or not to log transformation row data.
  • Log Source Row Data :- Specifies whether or not to log source row data.
  • Data Column Delimiter :- Data will be delimited by the specified character in DB column.


With this configuration we specified, Informatica PowerCenter will create four different tables for error logging and the table details as below.

  • ETL_PMERR_DATA :- Stores data about a transformation row error and its corresponding source row.
  • ETL_PMERR_MSG :- Stores metadata about an error and the error message.
  • ETL_PMERR_SESS :- Stores metadata about the session.
  • ETL_PMERR_TRANS :- Stores metadata about the source and transformation ports, when error occurs.


With this configuration, we are done with the setting required to capture any error during the session execution. Now lets see how do we retrieve the data from the error log tables and report it to Business Users or IT Department.

Report the Error Data.


Now we have the error data stored in the error table, it is important to share the error data to the Business Users or to IT Department. Lets see how we can pull data from these tables.

 
We can pull the basic error report using the SQL. We can be more fancy with the SQL and get more information from the error tables.



select 

       sess.FOLDER_NAME as 'Folder Name',

       sess.WORKFLOW_NAME as 'WorkFlow Name',

       sess.TASK_INST_PATH as 'Session Name',

       data.SOURCE_ROW_DATA as 'Source Data',

       msg.ERROR_MSG as 'Error MSG'
from 

       ETL_PMERR_SESS sess
left outer join ETL_PMERR_DATA data
      on data.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID and 

      data.SESS_INST_ID = sess.SESS_INST_ID
left outer join ETL_PMERR_MSG msg
      on msg.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID and 

      msg.SESS_INST_ID = sess.SESS_INST_ID
where 

      sess.FOLDER_NAME = <Project Folder Name> and
      sess.WORKFLOW_NAME = <Workflow Name> and
      sess.TASK_INST_PATH = <Session Name> and

      sess.SESS_START_TIME = <Session Run Time>


 

The above SQL can be converted to an Informatica mapping and can be scheduled to run after all your ETL jobs are completed to get a daily error report and the same can be scheduled for an email delivery to the data owners to take relevant action on the errors.


Pros and Cons of this Approach.


We should know the Pros and Cons of this approach  before applying this to your project.



Pros.

  1. Out of the box Solution Provided by Informatica.
  2. Captures all the Error, Including the unthought error.
  3. Less Coding and Testing efforts required by the development team.


Cons.

  1. If an error occurs after an active transformation, Integration service will not be able to capture source data into the erro table.
  2. In sessions with multiple non-pass through partitions, Integration service can not capture source data into error table.
  3. Added overhead to the Session performance (Which is expected and acceptable).






No comments:

Post a Comment