Analysis

The problem with data quality solutions part 2

By: Philip Howard, Research Director - Data Management, Bloor Research
Published: 28th November 2008
Copyright Bloor Research © 2008

I recently wrote about the deficiencies of traditional data quality tools when it comes to data matching, and about how the conventional pattern-based approach, with user-defined rules for weights, simply can't hack it. In this article I want to take that further and consider the difficulties that arise when you get beyond names, addresses and other relatively simple data and start to consider complex data such as products.

Here the problems are the same but magnified. Not only may you have strings of descriptive data, but embedded within them may be technical terms (and symbols), weights and measures, and abbreviations. Often, the same product code may be used for different parts in different countries. And then there are foreign languages to be considered. Not surprisingly, one company I recently spoke to was only getting a 35% match rate for its product data when using conventional approaches to data quality. In other words, almost two thirds of matches had to be identified manually.

With figures like that it's perhaps no surprise that most data quality vendors still argue that their biggest competitor is hand coding.

Anyway, that doesn't mean that there isn't a solution. That same company today is getting a match rate of better than 85% using Silver Creek's software.

As with the approach I described in my last article, Silver Creek doesn't use pattern-based matching but instead focuses on semantics: understanding the meaning of the data rather than its ability to fit a pattern. Doing this means that it is easy to see that "mtr" = "motor" = "moteur" and so on. What's more, it too is self-learning, so you don't have to manually define and maintain the rules for the weights that determine how probable a match is.
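To make the idea of matching on meaning rather than on pattern concrete, here is a minimal Python sketch. It is not Silver Creek's method, which this article does not describe in detail. It simply assumes a small, hand-built vocabulary (the hypothetical CONCEPTS table) that maps abbreviations and foreign-language terms to a single canonical concept, and treats two descriptions as the same product when they reduce to the same set of concepts. A self-learning system would presumably build and extend such a vocabulary itself; that part is not shown.

```python
import re
import unicodedata

# Hypothetical vocabulary: each surface form maps to one canonical concept.
# The entries are illustrative assumptions, not taken from any real product.
CONCEPTS = {
    "mtr": "motor", "motor": "motor", "moteur": "motor",
    "elec": "electric", "electric": "electric", "electrique": "electric",
}

def strip_accents(text):
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def to_concepts(description):
    # Lower-case, remove accents, split into alphabetic tokens,
    # then replace each token with its canonical concept if one is known.
    tokens = re.findall(r"[a-z]+", strip_accents(description).lower())
    return frozenset(CONCEPTS.get(token, token) for token in tokens)

def same_product(a, b):
    # Word order, punctuation, abbreviation and language no longer matter;
    # only the set of concepts does.
    return to_concepts(a) == to_concepts(b)
```

Under these assumptions, same_product("MTR, ELEC", "moteur électrique") is true even though the two strings share no characters in common, whereas a description containing a term outside the shared concepts (say, "pump") would not match.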

It isn't, of course, that the traditional vendors in this field are not doing something about product data. The typical approach is to add parsing to their existing offerings so that they can support unstructured (and multi-lingual) text. What this effectively does is to use the parsing to recognise patterns within the text, in a similar way to the sorts of products that extract "context" from documents. Certainly, as far as the data quality vendors are concerned, this has improved their capabilities and may even make them reasonably suitable for simple product environments. However, these solutions are still based on pattern recognition and will always require a lot of manual effort (and, therefore, expense). In my opinion, pattern recognition has limited utility when it comes to all but the very simplest matching. Moreover, traditional approaches do not incorporate self-learning.
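The manual effort involved in pattern-based parsing can be illustrated with a small Python sketch. The pattern below is a hypothetical, hand-written rule of my own for illustration, not one taken from any vendor's product: it expects a description laid out as an item name, a power rating in HP and a voltage in V, in that order.

```python
import re

# Hypothetical hand-written rule: "<item> <power>HP <voltage>V".
# Every other layout needs another rule, written and maintained by hand.
PATTERN = re.compile(
    r"^(?P<item>[A-Za-z]+)\s+"
    r"(?P<power>\d+(?:\.\d+)?)\s*HP\s+"
    r"(?P<voltage>\d+)\s*V$",
    re.IGNORECASE,
)

def parse(description):
    # Return the extracted fields, or None when the text does not fit the pattern.
    match = PATTERN.match(description.strip())
    return match.groupdict() if match else None
```

With this rule, parse("Motor 5HP 230V") yields the item, power and voltage, but parse("230 V mtr, 5 hp") returns None: the same information is there, in a different order and with an abbreviation, and the pattern cannot see it without someone adding a further rule.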

The bottom line is that pattern recognition is not good enough. Moreover, the whole probabilistic, statistical, rules-based approach that typifies conventional data quality products is time-consuming, expensive and inadequate. We need a new approach. One such is exemplified by Silver Creek's technology.

Reader Comments


30th November 2008: 'Steve Sarsfield' said:

I'm forced to remember the HAL 9000 telling Dave that he has "made some very poor decisions lately." Hard to imagine a successful solution could actually be self-learning. Real people are the most knowledgeable about corporate data - where it came from, what's broken, and what's most important to fix. People therefore must teach the solution how to solve the data anomalies somehow, whether it's by editing a text file, editing business rules or some special GUI.


1st December 2008: 'Martin Boyd' said:

Steve – Your cynicism is understandable. Most people have not seen this type of capability, especially if they are used to traditional tools and pattern-based technology. However this technology is real and is being used in some of the largest companies in the world to solve problems they have been unable to solve with traditional DQ tools. I’d be happy to show you how AutoLearning works and what it can do. Contact me via our website – www.silvercreeksystems.com and we can schedule a demo. Martin Boyd, Silver Creek Systems.


The messages above were all contributed by IT-Director.com readers. Whilst we take care to remove any posts deemed inappropriate, we can take no responsibility for these comments. If you would like a comment removed please contact our editorial team.

Published by: IT Analysis Communications Ltd.
T: +44 (0)1908 880760 | F: +44 (0)1908 880761