
HPCC: a Hadoop competitor or a data warehouse in its own right?

By: Philip Howard, Research Director - Data Management, Bloor Research
Published: 14th June 2012
Copyright Bloor Research © 2012

Let's start with some figures comparing HPCC with Hadoop (yes, yes, I know you don't know who or what HPCC is - I'll come to that in a moment). Specifically, let's talk about the TeraSort benchmark, which is something of a standard benchmark within the NoSQL community for sorting data. HPCC currently holds the record, albeit only by sorting 100 GB rather than the terabyte suggested by the name. When I questioned this, the company told me that the previous record holder had also sorted only 100 GB and that it wanted to compare like with like. Anyway, that's not really the point; the really interesting thing is that, with HPCC, the sorting routine in ECL (Enterprise Control Language) required just 4 lines of code. Writing the equivalent routine in Java, to make use of MapReduce, would take several hundred lines (the HPCC guys claim 700). Of course you have to learn ECL but, if that's typical, that's a huge productivity gain.
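To give a feel for why the line counts differ so much, here is a minimal sketch in Python. It is neither ECL nor Hadoop's Java API, and every name in it (`declarative_sort`, `map_phase`, `reduce_phase`, `procedural_sort`, the sample records and the partition boundaries) is a hypothetical chosen for illustration. The first function simply states the desired result; the rest spell out, by hand, the range-partition-then-sort steps that a MapReduce-style distributed sort typically has to describe.

```python
from bisect import bisect_right


def declarative_sort(records):
    """Declarative style: say what is wanted and leave the 'how' to the runtime."""
    return sorted(records, key=lambda r: r[0])


def map_phase(records, boundaries):
    """Procedural style, step 1: route each record to a key-range partition.

    `boundaries` is an assumed, pre-sorted list of split keys; partition i
    receives the keys that fall between boundaries[i-1] and boundaries[i].
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for record in records:
        partitions[bisect_right(boundaries, record[0])].append(record)
    return partitions


def reduce_phase(partitions):
    """Procedural style, step 2: sort each partition, then concatenate in order."""
    result = []
    for partition in partitions:
        result.extend(sorted(partition, key=lambda r: r[0]))
    return result


def procedural_sort(records, boundaries):
    """Chain the hand-written phases together."""
    return reduce_phase(map_phase(records, boundaries))


# Illustrative data only; not taken from any benchmark.
records = [("pear", 3), ("apple", 1), ("zebra", 9), ("mango", 5), ("fig", 2)]
boundaries = ["g", "n"]

print(declarative_sort(records) == procedural_sort(records, boundaries))
```

Both routes produce the same ordering here; the difference is how much of the mechanics the programmer has to write. A real Hadoop job adds job configuration, serialisation and partition sampling on top of this, which is presumably where the hundreds of lines go.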

So, who are these "HPCC guys" and what is HPCC? HPCC stands for High Performance Computing Cluster and it is the database developed by, and underpinning, LexisNexis. LexisNexis, a subsidiary of Reed Elsevier, is possibly the world's leading data aggregator. As its name and the foregoing discussion suggest, HPCC is a NoSQL, open source, schema-free, clustered database that looks on the surface very much like Hadoop, except that it has been in production for a decade, has no single point of failure (no NameNode or JobTracker issues), has built-in data integration capabilities and, by all indications, out-performs Hadoop. This shouldn't be surprising because ECL code (which is declarative rather than procedural) is compiled, whereas Hadoop depends on a Java Virtual Machine. There's also the sort of optimiser that you would expect from a conventional relational database, and the company offers various text and predictive analytic capabilities such as clustering, regression and so on.

The question is, if it's so good, how come everybody is flocking to Hadoop and no-one is talking about HPCC? Well, as is usually the case with such things, it comes down to marketing and, in this case, to one particular marketing decision. HPCC used to be a proprietary product that was marketed and sold in a conventional manner. With increased interest in Hadoop the HPCC guys recognised that they needed to go down the open source route also, but it took them some time to persuade the bosses at Reed Elsevier that this was a good idea. So the product lost time and it didn't become open source until the middle of last summer, when a Community Edition was introduced (the Enterprise Edition is chargeable). Needless to say, it therefore doesn't have the momentum that Hadoop does.

All of this raises a further question. If HPCC can scale to support multiple petabytes (LexisNexis stores more than 4 PB), if it has a declarative programming language and optimiser, if it is able to support multiple different types of data model (tabular, relational, ontological and so on), if it supports complex analytics against both structured and unstructured data, and if it runs on low-cost commodity hardware (which can be delivered as an appliance on a turnkey basis), then should we be thinking about HPCC as a data warehouse in its own right as opposed to just a Hadoop competitor? Indeed, given that all the warehousing vendors are talking about co-existing with Hadoop, doesn't HPCC represent an alternative that has been designed to have all of your data in one place rather than two? The answer to both of these questions has to be yes and, in fact, LexisNexis has customers that do indeed use HPCC as a traditional data warehouse.

Regardless, HPCC is definitely worth a look.

Published by: IT Analysis Communications Ltd.