By: Philip Howard, Research Director - Data Management, Bloor Research
Published: 8th November 2012
Copyright Bloor Research © 2012
IBM's Information on Demand conference was, as usual, informative, though it is sometimes difficult to keep track of all the new releases unless you attend the press conference (which I don't), and even then you don't hear about what was released the previous week.
Anyway, there was lots of good stuff. However, the most interesting thing, to me at any rate, was about old stuff: specifically, the triple store feature in DB2 10. When I was briefed about this back in the spring, I was told that the data was still stored relationally but was, in effect, tagged. And, of course, DB2 supports SPARQL for querying this data. I subsequently wrote about the feature in these terms and IBM did not correct me (I always check technical features before publishing details about them). I also wrote that I didn't expect great performance, precisely because the data was still being stored relationally.
It turns out that this was all hogwash. In fact, the data in the triple store is not stored relationally but as an encoded vector, which is a different thing altogether. As the person who told me this was Curt Cotner, CTO for IBM Database Servers, I am inclined to believe him. I think the problem is that not enough people within IBM appreciate how important triple stores (otherwise known as graph or RDF databases) are going to be, so few of them have felt the need to understand how they actually work. My personal view is that triple stores will be the next big thing once the fuss about Hadoop has died down, yet I have been at several IBM events where DB2 10 was discussed and this feature was not even mentioned.
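To make the terminology concrete, here is a minimal, hypothetical sketch in Python of what a triple store holds: facts as subject-predicate-object triples, with each term dictionary-encoded to an integer ID, and a lookup in which an unspecified position behaves like a SPARQL variable. It is a toy under stated assumptions, not a description of DB2's encoded-vector format, which this article does not describe; the class name `TinyTripleStore` and the `ex:` identifiers are invented for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple


class TinyTripleStore:
    """Hypothetical toy triple store (not DB2's design): every term is
    dictionary-encoded to an integer ID and each triple is stored as a
    tuple of three IDs."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}   # term -> integer ID
        self._terms: List[str] = []      # integer ID -> term
        self._triples: Set[Tuple[int, int, int]] = set()

    def _encode(self, term: str) -> int:
        # Assign the next free integer ID the first time a term is seen.
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((self._encode(s), self._encode(p), self._encode(o)))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Tuple[str, str, str]]:
        """Return matching triples, decoded and sorted; None acts like a variable."""
        pattern: List[Optional[int]] = []
        for term in (s, p, o):
            if term is None:
                pattern.append(None)
            elif term in self._ids:
                pattern.append(self._ids[term])
            else:
                return []  # a constant never seen before cannot match anything
        results = []
        for triple in self._triples:
            if all(want is None or want == got for want, got in zip(pattern, triple)):
                results.append(tuple(self._terms[i] for i in triple))
        return sorted(results)


store = TinyTripleStore()
store.add("ex:DB2", "ex:supports", "ex:SPARQL")
store.add("ex:DB2", "ex:supports", "ex:XQuery")
store.add("ex:DB2", "ex:hasEngine", "ex:TripleStore")
print(store.match(s="ex:DB2", p="ex:supports"))
# [('ex:DB2', 'ex:supports', 'ex:SPARQL'), ('ex:DB2', 'ex:supports', 'ex:XQuery')]
```

Encoding terms as integers is a common choice in RDF stores because comparisons and joins then work on fixed-size numbers rather than strings; a production store would also maintain several sorted indexes (for example by subject, by predicate and by object) instead of scanning every triple as this sketch does.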
It is interesting to ask why IBM implemented this triple store in the first place, and the answer is that IBM Rational asked for it. Rational was developing Rational Jazz, a collaborative repository for development objects, and could not get it to perform adequately using traditional database technology, so it approached Curt's group for help; that is why the triple store was built into DB2. Tivoli is also making use of this storage mechanism. It should be noted that DB2 is not a full graph database at this point, as it lacks the inference engine that would generally be included in such an environment, but it is likely that DB2 will be integrated with one of the open source engines of this type in due course.
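For readers unfamiliar with the term, an inference engine derives new triples from existing ones according to rules. The Python sketch below is a minimal, hypothetical illustration that applies just two RDFS entailment rules (transitivity of `rdfs:subClassOf`, and instances inheriting their class's superclasses) by naive forward chaining until no new triples appear. Real reasoners support far richer rule sets and are much more efficient, and nothing here reflects how DB2, or any engine IBM might integrate with it, actually works; the function name `infer` and the `ex:` data are invented for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Naive forward chaining over two RDFS rules until a fixpoint:
    - A subClassOf B, B subClassOf C  =>  A subClassOf C
    - x type A, A subClassOf B        =>  x type B
    """
    closure = set(triples)
    while True:
        subclass_pairs = [(s, o) for s, p, o in closure if p == SUBCLASS]
        new: Set[Triple] = set()
        for a, b in subclass_pairs:
            for s, p, o in closure:
                if p == SUBCLASS and s == b:
                    new.add((a, SUBCLASS, o))  # subclass transitivity
                if p == TYPE and o == a:
                    new.add((s, TYPE, b))      # type inheritance
        if new <= closure:  # nothing new was derived: fixpoint reached
            return closure
        closure |= new


data = {
    ("ex:rex", TYPE, "ex:Dog"),
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
}
inferred = infer(data)
print(("ex:rex", TYPE, "ex:Animal") in inferred)  # True
```

The point of the example is that the fact `ex:rex rdf:type ex:Animal` was never stored; it is derived at query time or load time by the rules, which is the capability a plain triple store without an inference engine lacks.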
There's an interesting sidebar to this triple store implementation, as it means that DB2 now effectively has three different storage engines (relational, XML and triple store), each with its own access mechanism (SQL, XQuery and SPARQL respectively). Now, if you've got three storage engines, why not four or five? In fact, IBM is already working on a JSON store, accessible via SQL and with a callable interface similar to MongoDB's. So, why not a Hadoop storage engine (HDFS or GPFS) as well? You would get the advantage of low-cost clustered hardware and the advantage of "schema later", but with a single management environment across multiple storage engines. Okay, don't expect this tomorrow, but I think this is the direction in which IBM is going to move: the addition of the triple store to DB2 is not only important in its own right, it is also, I think, a pointer to the future.
There is one outstanding question: if DB2 already has three distinct storage mechanisms and is likely to have more in the future, is it still valid to call it a relational database management system? Isn't it now a general-purpose database management system, or even just a data management system?
Posted: 10th November 2012 | By Jim Fuller :
Interesting, because that's exactly what MarkLogic has gone and done.
Posted: 11th November 2012 | By Rurik Thomas Greenall :
A nitpick here: from my understanding, Rational Jazz is not based on "traditional database technology", but an IBM implementation of the Jena (TDB?) triplestore.
Posted: 14th November 2012 | By Barry Norton :
Jim, I don't find any reference to indexing graph-like data or supporting the SPARQL query language in your link to MarkLogic - could you explain what you mean?
Posted: 22nd November 2012 | By Philip Howard (Author):
Rational Jazz is based on the triple store embedded in DB2 not Jena.
Posted: 23rd November 2012 | By Rurik Thomas Greenall :
@Philip Howard Now, yes, but not historically; hence my point. Cf. http://www-01.ibm.com/support/docview.wss?uid=swg21498819
The messages above were all contributed by IT-Director.com readers. Whilst we take care to remove any posts deemed inappropriate, we can take no responsibility for these comments. If you would like a comment removed please contact our editorial team.
Published by: IT Analysis Communications Ltd.