Cassandra and Hadoop

By: Philip Howard, Research Director - Data Management, Bloor Research
Published: 25th January 2012
Copyright Bloor Research © 2012

I am continuing to investigate Hadoop storage options as I get briefed by more vendors and as new products get released. In this article I want to focus on Cassandra.

DataStax is the leading commercial provider of Cassandra distributions. Cassandra is a BDDB (big data database) but, unlike HDFS (the standard storage mechanism for Hadoop) or GPFS (IBM's alternative), both of which are distributed file systems, it is a column-family store rather than a file system or a simple key-value store. This is not to be confused with a column-based relational database such as HP Vertica or ParAccel. In fact, it is unfortunate that whoever coined the name "column-family" didn't think of something else. The point is that while Infobright and Sensage (two more columnar relational databases) and Cassandra all use columns, this is the limit of their similarity: the former two are relational and Cassandra isn't.

I don't intend to go into the details of column-family databases and how they are architected. At least not now. But the main difference between a column-family database such as Cassandra and a simple key-value store is that the latter stores just a key and a value, while the former stores, against each row key, columns that are tuples consisting of a name, a value and a timestamp. (HDFS, strictly speaking, is a distributed file system that stores files as blocks, and it has no per-value timestamp either.) It is the timestamp that makes a big difference: there are lots of environments - smart metering, security logs and so on - where understanding time series is important, and this means that Cassandra can support applications that Hadoop on HDFS does not readily support. Not surprisingly, DataStax is exploiting this capability. Thus, for example, you can order data either by the time at which it arrived in the database or by the time at which the events actually occurred (which may not be the same thing). You can also index against the timestamps and, indeed, the software supports secondary indexes as well. One further notable feature is that DataStax offers CQL as a query language. This is modelled on SQL but supports only a subset of its features: you can't do such things as joins, because there are no relational tables.
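To make the name/value/timestamp idea concrete, here is a minimal in-memory sketch in Python. It is a toy model only, not the Cassandra API: the class and method names (ToyColumnFamily, insert, get, slice), the meter data and the integer timestamps are all hypothetical. It assumes that when two writes hit the same column the one with the newer timestamp wins, and it uses the event time as the column name so that a range over names returns readings in the order the events occurred, while the timestamp records the order of arrival.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Column:
    """One column: a (name, value, timestamp) tuple."""
    name: str
    value: Any
    timestamp: int  # write time supplied by the caller (hypothetical units)


class ToyColumnFamily:
    """Illustrative in-memory model only; this is not the Cassandra API."""

    def __init__(self) -> None:
        # row key -> {column name -> Column}
        self._rows: Dict[str, Dict[str, Column]] = {}

    def insert(self, row_key: str, name: str, value: Any, timestamp: int) -> None:
        row = self._rows.setdefault(row_key, {})
        current = row.get(name)
        # Assumption: the write with the newest timestamp wins.
        if current is None or timestamp >= current.timestamp:
            row[name] = Column(name, value, timestamp)

    def get(self, row_key: str, name: str) -> Optional[Column]:
        return self._rows.get(row_key, {}).get(name)

    def slice(self, row_key: str, start: str, end: str) -> List[Column]:
        """Return columns whose names fall in [start, end], sorted by name."""
        row = self._rows.get(row_key, {})
        return [row[n] for n in sorted(row) if start <= n <= end]


cf = ToyColumnFamily()
# Column name = time the event occurred; timestamp = order of arrival.
cf.insert("meter-42", "2012-01-25T10:05", 1.7, timestamp=3)
cf.insert("meter-42", "2012-01-25T10:00", 1.2, timestamp=1)
cf.insert("meter-42", "2012-01-25T10:10", 2.1, timestamp=2)
# A late correction with a newer timestamp replaces the earlier reading.
cf.insert("meter-42", "2012-01-25T10:00", 1.3, timestamp=4)
# A stale write with an older timestamp is ignored.
cf.insert("meter-42", "2012-01-25T10:10", 9.9, timestamp=1)

readings = cf.slice("meter-42", "2012-01-25T10:00", "2012-01-25T10:05")
print([c.value for c in readings])  # [1.3, 1.7]
```

A plain key-value store would hold only the latest value for each key; keeping a timestamp with every column is what lets the toy model distinguish a late correction from a stale write.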

As far as Hadoop is concerned, you can run Hadoop and Cassandra on the same cluster. This means that you can have your time-based and real-time applications (real-time being a strength of Cassandra) running under Cassandra, while batch-based analytics and queries that do not require a timestamp run on Hadoop. In practice, in this environment, Cassandra replaces HDFS under the covers, but this is invisible to the developer. You can reassign nodes (dynamically where appropriate) between the Cassandra and Hadoop environments to suit your workload. The other major upside is that using Cassandra removes the single points of failure that are associated with a standard Hadoop deployment, namely the HDFS NameNode and the MapReduce JobTracker, which I have discussed in previous articles.

One final point is that Cassandra has a reputation for being difficult to get started with. To simplify this process, DataStax provides installers, examples and so forth within its Community Edition, while the Enterprise Edition includes, amongst other things, a visual, point-and-click, web-based management environment that integrates with third-party environments such as Tivoli and OpenView.

Reader Comments

Posted: 25th January 2012 | By Andy Ormsby :

Great to see you covering Cassandra, but it's worth mentioning that the commercial organisations supporting Cassandra extend beyond DataStax. Acunu, a company founded in the UK and with offices in London and San Francisco, provides support and training for Cassandra as well as its own distribution of the software, together with a high performance back-end that eliminates some of the complexity of realising the great performance that Cassandra promises. (Disclaimer - I'm an Acunu employee).

Published by: Electronicdawn Ltd.
T: +44 (0)190 888 0760 | F: +44 (0)190 888 0761