IT-Director.com
Analysis

Turbo charging data quality

By: Andy Hayler, CEO, The Information Difference
Published: 14th April 2009
Copyright The Information Difference © 2009
Master data management (MDM) initiatives are now being deployed with sizeable data volumes. A few years ago 10 million master data records was quite chunky, but we now see some applications with 100 million master data records. Simply processing such volumes is a challenge, and then you have to consider how you are going to keep your shiny new data in mint condition. You can put in a data quality "firewall" which, for example, checks for potential duplicates of records about to be entered into, say, an order processing system. However, applying clever matching algorithms to large volumes of data while still expecting a sensible response time is problematic.
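The duplicate-checking "firewall" described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the record fields, the postcode "blocking" key, the 0.85 similarity threshold and the class and method names are all hypothetical, and Python's standard-library `difflib.SequenceMatcher` stands in for the more sophisticated matching algorithms that commercial tools use. Blocking (only comparing records that share a key) is one common way to keep response times sensible as the master file grows, because comparing each new record against every existing one costs time proportional to the size of the master data.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't hide a match."""
    return " ".join(text.lower().split())


class DuplicateFirewall:
    """Hypothetical pre-insert duplicate check, blocked by postcode."""

    def __init__(self, threshold: float = 0.85) -> None:
        self.threshold = threshold
        # postcode -> normalised names already held in the master data
        self.blocks: dict[str, list[str]] = defaultdict(list)

    def add(self, name: str, postcode: str) -> None:
        self.blocks[normalise(postcode)].append(normalise(name))

    def possible_duplicates(self, name: str, postcode: str) -> list[str]:
        """Return existing names in the same block that look similar to `name`."""
        candidate = normalise(name)
        matches = []
        # Only records sharing the postcode are compared, not the whole master file.
        for existing in self.blocks.get(normalise(postcode), []):
            score = SequenceMatcher(None, candidate, existing).ratio()
            if score >= self.threshold:
                matches.append(existing)
        return matches
```

For example, after `fw.add("John Smith", "MK1 1AA")`, a call to `fw.possible_duplicates("Jon Smith", "mk1 1aa")` flags `"john smith"` for review before the new record is accepted, while the same name under a different postcode is never compared at all. That trade-off (speed in exchange for missing duplicates whose keys differ) is why blocking keys are usually chosen with care.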

Against this background, the general availability of the DataRush engine from Pervasive Software, a company long established in the field of embeddable databases and data integration, is potentially interesting. DataRush uses highly parallel techniques to process large amounts of data very quickly. For example, common data quality algorithms such as "edit distance" are delivered with the engine, so that routine tasks such as name and address checking can be done quickly. Beta users such as TC3, which processes large volumes of health care claims, have seen some dramatic performance improvements over previous approaches. Another documented example is PIERS, a company that collects bills of lading in the shipping world and analyses them to help companies understand trends in international trade. Extensive processing is needed there to eliminate duplicate data before it can be turned into meaningful information.
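Edit distance (Levenshtein distance) counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another, which makes it tolerant of the typos common in names and addresses. The sketch below is a plain dynamic-programming version in Python, not Pervasive's code; the `find_near_duplicates` helper, the default threshold of 2 and the use of `ThreadPoolExecutor` are illustrative assumptions. Splitting the record list into chunks and scoring each chunk independently shows the general shape of a data-parallel approach, but in CPython threads will not actually speed up this CPU-bound loop because of the global interpreter lock; a process pool or a purpose-built engine would be needed for genuine parallel gains.

```python
from concurrent.futures import ThreadPoolExecutor


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so each row stays small
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def _scan_chunk(query: str, chunk: list[str], max_distance: int) -> list[str]:
    """Score one partition of the records against the query."""
    return [r for r in chunk if edit_distance(query, r) <= max_distance]


def find_near_duplicates(query: str, records: list[str],
                         max_distance: int = 2, workers: int = 4) -> list[str]:
    """Hypothetical helper: partition records and scan the partitions concurrently."""
    size = max(1, -(-len(records) // workers))  # ceiling division
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda c: _scan_chunk(query, c, max_distance), chunks)
        # pool.map preserves chunk order, so matches come back in record order.
        return [r for part in results for r in part]
```

With these defaults, `find_near_duplicates("Smith", ["Smyth", "Smithe", "Jones", "Smith"])` keeps `"Smyth"` and `"Smithe"` (one edit each) and the exact match, and drops `"Jones"`. Each comparison costs time proportional to the product of the two string lengths, which is why running it across tens of millions of records is where parallel hardware starts to matter.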

There is no shortage of business use cases where data quality processing has to be applied to large volumes of data (mortgage claims are another example), so there should be a substantial market for something that can make this go much faster. The technology could be picked up by data quality software providers, and perhaps by MDM vendors (MDM applications have a significant data quality component), to turbo-charge their own products. Given Pervasive's track record of producing reliable embeddable software, the company is likely to be taken seriously. The engine could in principle be used for other purposes, such as analytics, but data quality is the obvious focus at present. Beyond software vendors, there are plenty of systems integrators that custom-build applications with data quality elements in specialist areas, and in some of these cases volume and processing time will be a major issue.

It is early days, but with growing interest in data quality (a market that grew 17% in 2008 according to our latest research) and increasing need to deal with high data volumes, DataRush could be in the right place at the right time.

Published by: IT Analysis Communications Ltd.