By: Philip Howard, Research Director - Data Management, Bloor Research
Published: 9th December 2013
Copyright Bloor Research © 2013
Not for the first time, IBM has been underselling itself. I refer specifically to DB2 with BLU Acceleration. Given this second sentence you might be surprised at the first: IBM has been heavily marketing BLU. However, it's true.
Let me go back. One of the slides that IBM prepared for its BLU launch showed how a query processing 10TB of data could actually fit into 8MB of memory. This is how it works: assuming a 10x compression ratio, that 10TB of raw data is only 1TB on disk. Now suppose that the table you are processing has 200 columns and you only want to read 2 of them: that's a reduction of two orders of magnitude, to 10GB. Use the newly introduced data skipping technology and that's a further reduction to a tenth of that, or 1GB. Next, bear in mind that you have parallel processing across the cores (the slide's figures imply 32 of them), so each core handles about 32MB. Finally, you have vector processing, which means a further reduction to a quarter of that, or 8MB. There the slide stops.
Of course, this is all highly theoretical: you might want 20 columns and not 2 but on the other hand there are not many queries that access 10TB of data in the first place.
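The arithmetic on the slide can be sketched in a few lines of Python. This is only a back-of-envelope illustration: the function name and parameter names are my own, the 32-core figure is inferred from the slide's 1GB to 32MB step, and every factor is the illustrative assumption described above rather than a measurement.

```python
# Back-of-envelope reproduction of the slide's reduction chain.
# Every factor is an illustrative assumption from the article, not a measurement.

TB = 10**12  # decimal units; the slide rounds the last two steps to 32MB and 8MB
MB = 10**6


def working_set_bytes(raw_bytes, compression=10, total_columns=200,
                      columns_read=2, skipping_factor=10, cores=32,
                      vector_factor=4):
    """Return a list of (label, bytes) pairs, one per reduction step."""
    steps = [("raw data", raw_bytes)]
    size = raw_bytes / compression            # 10x compression on disk
    steps.append(("after compression", size))
    size = size * columns_read / total_columns  # read only the needed columns
    steps.append(("after column selection", size))
    size = size / skipping_factor             # data skipping: keep one tenth
    steps.append(("after data skipping", size))
    size = size / cores                       # share of the data per core
    steps.append(("per core", size))
    size = size / vector_factor               # vector processing: one quarter
    steps.append(("after vector processing", size))
    return steps


if __name__ == "__main__":
    for label, size in working_set_bytes(10 * TB):
        print(f"{label:<26}{size / MB:>16,.2f} MB")
```

With the slide's inputs the final figure comes out at roughly 8MB; calling `working_set_bytes(10 * TB, columns_read=20)` for the 20-column case gives roughly ten times that, which shows how sensitive the result is to the number of columns read.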
But the interesting thing, and where IBM has been under-selling itself, is that it doesn't tell you any more about that 8MB. A rarely mentioned feature of BLU is that it uses processor cache wherever possible, and L3 cache in particular. And, as it happens, L3 cache typically starts at 8MB (up to about 24MB currently) so, in this particular example, and if it were the only query running - which it won't be - then the whole query could fit in L3 cache. What's important about that is that L3 cache is around an order of magnitude faster than RAM.
And that's why this article has the heading it does. Consider that IBM is a major manufacturer of processing chips with its Power series. Given that DB2 with BLU Acceleration can exploit processor cache, wouldn't you expect IBM to be developing its next generation of chips with a great deal more L3 cache? I am speculating here and I haven't discussed this with IBM at all: but it makes sense to me albeit that there may be hardware constraints that I am not aware of.
But, if I'm right, where does that leave HANA and other such (purely) in-memory technologies? If processor cache can be significantly expanded then the drive must be towards exploiting this as much as possible and in-memory processing will be limited to non-urgent, less important queries - and if they are not urgent and less important then why not leave them on disk? Will in-memory processing soon become passé?
Posted: 10th December 2013 | By Paul Zikopoulos :
You nailed it. Personally, I hate how we called BLU "In-Memory" or "Dynamic In-Memory", which refers to not having the requirement to have all the active data in memory (vendors vary by granularization) to process the query. In a BigData world I just don't get how that is possible. I wish we called it DB2 BLU Acceleration with In-Cache Analytics. To me, memory is the new disk when it comes to performance.
Posted: 28th December 2013 | By Kent Collins :
I would have serious reservations about buying any solution that only works when dynamically changing data all fits into dynamically changing memory. Perhaps memory is getting cheaper, but keeping a 1.5 ratio of RAM to data in, of all places, SAP seems to be a significant cost. I know of one SAP/BW environment that consumed three years of SAN projected growth in less than one year.
One company I know about has taken some SAP/BW tables and converted them to IBM BLU and seen significant SQL elapsed time reductions. All of this was done apart from SAP or IBM.
I believe sometime in the future we will see DBMS solutions that take disk storage optimized data (best organized for at-rest) and, based on the access plan, determine in real time the memory storage organization best suited for run-time (best organized for memory).
After all I want a Table registered in the DB with data. How it is stored on disk I do not care just do it efficiently. I want to retrieve data from that Table and again I do not care if the access is Row or Column or some combination of the two or even a new third method. Why is it that we store it row and read it row and store it column and read it column?
The messages above were all contributed by IT-Director.com readers. Whilst we take care to remove any posts deemed inappropriate, we can take no responsibility for these comments. If you would like a comment removed please contact our editorial team.
Published by: electronicdawn Ltd.