By: Philip Howard, Research Director - Data Management, Bloor Research | Published: 2nd December 2008 | Copyright Bloor Research © 2008
So far in this series of articles I have discussed the failures of traditional data quality tools when it comes to matching in general, and to product and complex data matching in particular. However, these aren't the only areas in which they fall down: they are not very good at dealing with names either (which makes one wonder what they are good at).
Suppose you are Chinese and you go to live in America. Do you keep your Chinese name? Do you anglicise it? If so, how? Do you reverse your names so that your forename goes first? Now consider a data quality solution trying to match in these circumstances. Or think about criminals with 30 different aliases: how do you match these names?
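To make the difficulty concrete, here is a minimal Python sketch of the reordering problem. It is purely illustrative and is not how any of the products discussed in this series work: the functions `normalise` and `name_similarity` and the sample names are all assumptions of mine, and the only library used is Python's standard `difflib`. It shows an exact comparison failing on a reversed name, while a comparison that ignores word order treats the two forms as the same.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> list[str]:
    """Lower-case the name, turn punctuation into spaces, and sort the
    resulting tokens so that word order no longer matters."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return sorted(cleaned.split())


def name_similarity(a: str, b: str) -> float:
    """Return a score between 0.0 and 1.0 for two names, ignoring
    case, punctuation and the order of the name parts."""
    return SequenceMatcher(
        None, " ".join(normalise(a)), " ".join(normalise(b))
    ).ratio()


# Hypothetical sample names, chosen only for illustration.
family_first = "Wang Xiaoming"
forename_first = "Xiaoming Wang"
anglicised = "Michael Wang"

# An exact comparison treats the reversed name as a different person.
exact_match = family_first == forename_first

# Ignoring word order recovers the reversed name completely...
reordered_score = name_similarity(family_first, forename_first)

# ...but an anglicised forename still only matches partially.
anglicised_score = name_similarity(family_first, anglicised)
```

Even this toy version copes with reversed names, but it gets only a partial score for the anglicised forename and has no answer at all for unrelated aliases, which is where specialised name-matching technology comes in.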
Fortunately, the data quality fraternity (or some of it) has owned up to this gap in its capabilities. Thus IBM bought LAS (now Global Name Recognition) and Informatica more recently acquired Identity Systems, though the other vendors in the market remain out in the cold in this regard.
However, if you have read the previous articles in this series, you will know that a lack of ability with names is the least of my concerns about data quality. My real worry is that all the leading products have been built using out-of-date technology that has now been superseded.
In the first article I highlighted Netrics, which uses mathematical modelling as an alternative to the conventional pattern-matching used by the traditional vendors. And in the second article I mentioned Silver Creek, which uses a semantic approach. In particular, both of these products feature self-learning capabilities (as does Zoomix, recently acquired by Microsoft) that improve the efficiency of the match process over time while reducing the amount of human involvement that is required.
It is not that these products are new (Silver Creek has been around for a number of years and Netrics has 150-odd customers), but I have now got to the point where I think we need the existing market to be radically disrupted. Current products are being incrementally improved, but incremental improvements are not enough: we need dramatic improvements. Otherwise, most companies will continue to rely (ineffectively) on manual effort for data cleansing, because they can't see the cost benefits of moving to inadequate pattern-based matching products (and I am not sure I can blame them).
If data quality is the huge issue we all say it is, and it is, then we owe it to users to provide them with technologies that actually help them to resolve their data quality problems, rather than just a sop, which is what they are getting in most cases. Leading vendors need to recognise that the likes of Netrics and Silver Creek offer far better technology than they do, and they need to buy or build comparable capability as soon as possible if they are not to continue to disappoint the market generally.
4th December 2008: 'Bob Barker' said:
Great discussion, Philip, but is the data quality world really this homogeneous? While mathematical modeling and semantic analysis are extremely useful in some solutions, it doesn’t follow that they can solve every problem in every domain equally well. For example, a solution great at matching product data may fail miserably when applied to a Wall Street insider trading problem. Sometimes combining different analytics is more effective, depending on the problem domain. I posted a longer discussion of this point yesterday on www.identityresolution.com for anyone who’s interested.
4th December 2008: 'Philip Howard' said:
Good point. It would suggest that the ultimate data quality product would support multiple matching engines that can be deployed as appropriate, depending on the class of problem, in a way analogous to the use of different algorithms as provided by data mining tools.
10th December 2008: 'Bob Barker' said:
Oops. Last week I referred to an expanded version of my response and made a brilliant typo. The blog address is www.identityresolutiondaily.com if you're interested.
The messages above were all contributed by IT-Director.com readers. Whilst we take care to remove any posts deemed inappropriate, we can take no responsibility for these comments. If you would like a comment removed please contact our editorial team.
Published by: IT Analysis Communications Ltd.