IT-Director.com

Analysis

Open Source Data Quality

By: Philip Howard, Research Director - Data Management, Bloor Research
Published: 5th September 2007
Copyright Bloor Research © 2007

While there are a number of open source ETL (extract, transform and load) vendors, I had not encountered an open source data quality solution until I recently spoke with Infosolve Technologies. However, Infosolve is not your typical open source vendor.

Infosolve in fact has two products, OpenDQ and OpenCDI (for data quality and customer data integration, respectively), the latter building on the former. So, how does Infosolve differ from other open source vendors?

The biggest difference between Infosolve and the rest of the open source community is that Infosolve does not believe you can make money simply by having a download site and then trying to sell support or services on the back of those downloads. No, Infosolve believes that you need to do the complete reverse: go out and sell your professional services (in this case, for data quality) through a direct sales force, and then implement the solution for the customer on a “free” open source platform. In other words, as I have remarked before, Infosolve is using open source simply as a different licensing model. Typical service engagements last between three weeks and nine months, though the company tells me it hopes shortly to sign a two-year engagement.

In addition to its own direct sales force, Infosolve is also exploiting the channel: partnering with systems integrators and sub-licensing OpenDQ to other open source (and, for that matter, non-open source) vendors and ISVs.

Staying with open source, Infosolve is a Sun partner and OpenDQ runs on Sun grid technology. In particular, it is available via Sun's utility computing offering, which means you can have OpenDQ hosted for you and pay for it by the hour at a very low rate. Infosolve refers to this open source, utility-based model as a "zero-based data solution".

This means that, apart from the initial professional services engagement (to determine and set up appropriate data quality business rules, for example) and any ongoing service fees, your costs for the whole project will be more or less zero: actually a little more, but at an hourly rate, not much more. You can, of course, run the OpenDQ software on your own hardware should you prefer.
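
The article does not describe how OpenDQ expresses its data quality business rules. Purely as an illustration of the kind of artefact such an engagement produces, the Python sketch below assumes that a rule is simply a named check applied to each record, and reports which rules a record fails. The rule names, the fields (`email`, `postcode`) and the `check` function are all hypothetical; this is not Infosolve's format or API.

```python
import re
from typing import Callable

# Hypothetical rule format: a rule is a name plus a predicate that
# returns True when a record passes. Fields and rules are illustrative.
Rule = tuple[str, Callable[[dict], bool]]

EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

RULES: list[Rule] = [
    ("email_present", lambda r: bool(r.get("email"))),
    ("email_format", lambda r: EMAIL_PATTERN.fullmatch(r.get("email") or "") is not None),
    ("postcode_present", lambda r: bool((r.get("postcode") or "").strip())),
]


def check(record: dict, rules: list[Rule] = RULES) -> list[str]:
    """Return the names of the rules that the record fails, in rule order."""
    return [name for name, passes in rules if not passes(record)]


customers = [
    {"id": 1, "email": "a@example.com", "postcode": "MK1 1AA"},
    {"id": 2, "email": "not-an-email", "postcode": " "},
]
for customer in customers:
    print(customer["id"], check(customer))
```

Keeping the rules as data rather than hard-coding them is what lets a services team define and adjust them per customer while the engine that applies them stays the same.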

On the technical side, OpenDQ is tightly integrated with Pentaho's data integration product (formerly KETTLE), but perhaps most interesting is that the company will shortly be introducing support for unstructured data. This matters for data other than names and addresses, such as product data, where information about products often arrives in the organisation in unstructured form. The company will be using natural language processing to support unstructured data, which is probably the best approach to take.
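
To give a flavour of what turning unstructured product descriptions into structured attributes involves, here is a minimal Python sketch. Note that it swaps in simple pattern matching against fixed vocabularies, not the natural language processing Infosolve intends to use; the vocabularies, attribute names and the `extract_attributes` function are hypothetical.

```python
import re

# Hypothetical fixed vocabularies. A genuine NLP approach would not rely on
# hand-written lists like these; they stand in for it here.
COLOURS = {"red", "blue", "green", "black", "white"}
MATERIALS = ("stainless steel", "aluminium", "plastic", "glass")  # checked in order

# A number followed by a unit, e.g. "500ml", "1.5 l", "2kg".
SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(ml|l|kg|g)\b")


def extract_attributes(description: str) -> dict:
    """Pull a few structured attributes out of free-text product copy."""
    text = description.lower()
    attributes: dict = {}

    size = SIZE_PATTERN.search(text)
    if size:
        attributes["size"] = float(size.group(1))
        attributes["unit"] = size.group(2)

    for material in MATERIALS:
        if material in text:
            attributes["material"] = material
            break

    words = set(re.findall(r"[a-z]+", text))
    colours = sorted(COLOURS & words)
    if colours:
        attributes["colour"] = colours[0]  # first alphabetically if several

    return attributes


print(extract_attributes("Acme 500ml Stainless Steel water bottle, Blue"))
```

The brittleness of this approach (every new colour, material or unit needs another list entry or pattern) is precisely why a language-processing approach is the more plausible route for product data at scale.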

The introduction of unstructured support is interesting not just because product data quality is clearly becoming more of an issue, but because it suggests (and I want to make it clear that this is my own inference) that Infosolve may introduce an OpenPIM (product information management) product to go alongside its OpenCDI offering. This, of course, raises the whole question of open source MDM (master data management): while that is a discussion for another day, we see no reason why Infosolve shouldn't be as successful with MDM as it is with data quality.

Reader Comments

3rd February 2008: 'Kasper Sørensen' said:

Why one would call OpenDQ "Open Source" is a mystery to me. Where exactly IS the source available? As far as I can see, the source code is only "open" to paying customers, which makes it quite closed source in my eyes and no different from the proprietary solutions sold by almost everybody else.

It seems everybody wants to call themselves open source these days, without even finding out what Open Source is.

Oh yeah, and I read somewhere else (http://blogs.cnet.com/8301-13505_1-9802297-16.html) that the software is accompanied by a GPL license, but it is still quite inconsistent with the Open Source Definition (http://www.opensource.org/docs/osd), with which the GPL is supposed to comply:

"... The license shall not require a royalty or other fee for such sale."

"Where some form of a product is not distributed with source code, there must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction cost preferably, downloading via the Internet without charge."

12th February 2008: 'mark madsen' said:

I was contacted by them a few months ago too and opted not to write about them because they have the benefits of open source backwards. The value to a customer of open source where there is no community and there is no support and service (they offer none) is actually negative. Why would I as a customer hire someone to use and build custom data quality software that I then have to inherit and maintain? I'd be far better off buying off the shelf and implementing because of this.

I talked to a few people about the legality (GPL violation) of what they're doing and it's in a gray area. The legal opinion I trust most suggested that this is legal since the source is available to the people it is distributed to. One of the license authors stated it was gray and subject to interpretation of what is meant by the distribution requirement. Another one said "absolutely not, it should be reported".

13th February 2008: 'Kasper Sørensen' said:

That's exactly my point. Actually, I'm beginning a new Open Source project for data quality, which is still in the very early stages. For anyone interested, have a look at my project, which is called DataCleaner: http://www.eobjects.dk/trac/wiki/DataCleaner

It's being released under the Apache Licence, Version 2.0, and I hope to have the first alpha ready by March 2008.

The messages above were all contributed by IT-Director.com readers. Whilst we take care to remove any posts deemed inappropriate, we can take no responsibility for these comments. If you would like a comment removed please contact our editorial team.

Published by: IT Analysis Communications Ltd.