The Result Data Newsletter   
Volume 701 - January 2007   
© Copyright 2007 Result Data Consulting, Ltd.  614-505-0770  www.resultdata.com   

    Result Data Home Page  |  Newsletter Archive  |  Upcoming Events  |  Classes & Workshops  |  Request Information
 Upcoming Events:  Feb 8: MOBOUG User's Group,  March 7: Microsoft BI Seminar,  April 11: Business Objects Seminar

Return to Newsletter Contents...

 

Using the Data Profiler in Data Integrator

by: Tom Hinkle, Consultant, CRCP

Data Integrator includes a Data Profiler feature to help you understand data sources. Data profiling is important because it is a tool to better understand source systems. It gives visibility into the quality and content of various data sources.

Data anomalies can be uncovered by inspecting the true content, distribution, structure and relationship within enterprise data systems. Effective data profiling can help achieve a trusted data source. Data Profiler can verify that the metadata information provided is indeed valid and discover the data quality before starting the design of an ETL project. For example, you can look at the number of nulls and distinct values in a specified column.

The Data Profiler obtains information that can be used to determine:

  • Source data quality before extraction.  Data Profiler can identify anomalies in source data to better define corrective actions in the validation, data cleansing, or other transformations.
  • The distribution, relationship, and structure of source data to better design Data Integrator jobs and data flows, as well as the target data warehouse.
  • The content of source and target data to verify the data extraction job returns the expected results.

Use the Data Profiler to generate profiling tasks to collect information that multiple users can view, such as:

  • Column analysis.  This information includes minimum value, maximum value, average value, minimum string length, and maximum string length. Detailed column analysis can also be generated such as distinct count, distinct percent, median, median string length, pattern count and pattern percent.
  • Relationship analysis.  This information identifies data mismatches between any two columns for which a relationship is defined, including columns that have an existing primary key and foreign key relationship.

      The information generated from profiling tasks can assist in many different tasks, such as:

       

  • Obtaining information on basic statistics, frequencies, ranges and outliers
  • Identify variations of the same data content
  • Discover and validate data patterns and formats
  • Analyze a numeric range
  • Identify and validate redundant data and relationships across data sources
  • Identify duplicate name and address and non-name and address information
  • Identify missing data, nulls and blanks in the source system    

Setting up Data Profiler:  To enable profiling in Data Integrator, a Data Profiler repository must be created, associated to the same or different Job Server as your Designer and added to the Administrator. If different users intend to share the same profiling results then dedicate a Job Server for the Data Profiler repository. The Data Profiler repository is created in the Data Integrator Repository Manager. This repository can be created and connections set up during the Data Integrator installation process or later.

 

Go to Top  |  Return to Newsletter Contents

The Result Data Newsletter is published approximately once a month to share the latest information on business intelligence data management and CRM. There should be a link below to allow you to change or remove yourself from our list. We take your requests very seriously. If you have any difficulty please contact us at 614-505-0770 and we will make sure that your request is handled properly. This is not intended to be an unsolicited message and you can reach us in person if needed.

© Copyright 2007 Result Data Consulting, Ltd. - All Rights Reserved
All trademarks and copyrights are the property of their respective owners. This information is provided without warranty.
Announcements
New Classes at Result Data
With the new year comes new classes at our award winning training center. New classes for Q1 '07 include:

Xcelsius: Designing Interactive Presentations Certified Class

Microsoft SQL Server Integration Services Workshop

Microsoft SQL Server Analysis Services Workshop

Click on the class name to get information on dates or to register.
Microsoft Business Intelligence Seminar
Join us to learn about the new Business Intelligence technologies on the way from Microsoft. The seminar will be held on March 7th at the Microsoft Polaris office. Please check our web site for the agenda and to register. This free 1/2 day seminar will run from 8:30AM to noon. Click here to register.
Quality Control for Dummies
This book, part of the best-selling "Dummies" series from Wiley, has just been released. Co-authored by Michael Wallace, Principal Consultant at Result Data.
Private Training Classes & Mentoring
Ask us about our private classes and mentoring services and how they can help your team get up the curve fast. 
Next MOBOUG Meeting
The next Mid-Ohio Business Objects User Group (MOBOUG) meeting is February 8th, 2007.  Call 614-505-0770 or click here to reserve your seat
Looking for a Few Good Men and Women
Join our award winning team of Business Intelligence consultants and .Net Software developers.

Send your resume and salary requirements to: jobs@resultdata.com