By: Philip Howard, Research Director - Data Management, Bloor Research. Published: 3rd October 2006. Copyright Bloor Research © 2006
I have recently returned from Netezza's second annual conference. This was well attended, with nearly all of the company's customers (around 75) being represented, as well as a significant number of both prospects and partners. It was very (to use a technical term) buzzy and there was a degree of enthusiasm that I have rarely encountered. However, what was most interesting for me was the number of things I had not previously appreciated about Netezza's technical capabilities. And, of course, its roadmap for the future (though I can't say too much about that).
To begin with there is the question of indexes. Data warehouse appliances in general, and Netezza in particular, tend to be typecast by detractors as only being good for large table scans, because they do not support indexes and therefore cannot run complex joins. However, in the case of Netezza, at any rate, this is misleading, because it uses what might be described as an anti-index, which is called a zonemap. Suppose you load, say, sales by time: the zonemap breaks the relevant data down into blocks and stores the details of the first and last record in each block (so there is a much lower overhead than with an index). What this means is that when you run a query you only read the blocks that contain the data you are interested in, ignoring all the other blocks. This ability to limit the data you read means that joins are much more effective than would otherwise be the case. In its roadmap, Netezza described future approaches that will further reduce the amount of data you need to read.
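To illustrate the idea, here is a minimal Python sketch of block skipping based on per-block boundary values. It is not Netezza's implementation: the names Zone, build_zones and scan, the block size, and the sample data are all hypothetical, and the sketch keeps the lowest and highest key in each block, which coincides with the first and last record only when the data is loaded in order.

```python
from dataclasses import dataclass
from typing import List, Tuple

Row = Tuple[int, float]  # (day number, sale amount); an illustrative layout


@dataclass
class Zone:
    lo: int          # lowest key held in this block
    hi: int          # highest key held in this block
    rows: List[Row]  # the block's data, standing in for a disk extent


def build_zones(rows: List[Row], block_size: int) -> List[Zone]:
    """Split rows into fixed-size blocks and record each block's key range."""
    zones = []
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        keys = [day for day, _ in block]
        zones.append(Zone(min(keys), max(keys), block))
    return zones


def scan(zones: List[Zone], lo: int, hi: int) -> Tuple[List[Row], int]:
    """Return rows with lo <= day <= hi, plus the number of blocks read."""
    hits: List[Row] = []
    blocks_read = 0
    for zone in zones:
        # Skip any block whose key range cannot overlap the query range.
        if zone.hi < lo or zone.lo > hi:
            continue
        blocks_read += 1
        hits.extend(row for row in zone.rows if lo <= row[0] <= hi)
    return hits, blocks_read


# 1,000 rows loaded in day order, 100 rows per block (assumed figures).
sales = [(day, float(day)) for day in range(1000)]
zones = build_zones(sales, 100)
hits, blocks_read = scan(zones, 250, 260)
print(len(zones), blocks_read, len(hits))
```

With the data loaded in time order, the query for days 250 to 260 touches one block out of ten. The only metadata kept is two values per block, which is why the overhead is so much lower than that of a conventional index.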
Another interesting thing to come out of the conference was that a number of Netezza customers have stopped using aggregates as a result of implementing Netezza. For example, Carphone Warehouse told me that it was both faster and more accurate to calculate directly from the raw data. As aggregates are a major issue for database administrators, being able to get rid of them (or, at least, minimise their use) is a significant benefit. Not that Netezza eschews aggregates altogether. More than one user employs a data warehouse appliance (not only from Netezza) as an aggregating engine as a front-end to a third party enterprise data warehouse. I will discuss this further in a subsequent article.
And while talking about enterprise data warehouses (EDW), there are several arguments put against using a data warehouse appliance as an EDW. The first is that you can't use an appliance for complex joins but, as discussed above, this is less and less true, at least as far as Netezza is concerned. Secondly, there is the issue that the large EDW vendors provide pre-built data models. Well, one of the things that Netezza has not made much of is the fact that it has partners that provide exactly this sort of capability (typically built on either a star or snowflake schema). And, thirdly, there is the question of managing mixed workloads. In this last case, Netezza offers guaranteed resource allocation (floors but not ceilings yet), short query bias, materialised views and prioritisation.
Another area in which Netezza has been hiding its light under a bushel is in the matter of FPGAs (field programmable gate arrays). FPGAs are used to process data as it is streamed off disk. Note that this is important to understand. Most data warehouse appliances (and, indeed, conventional products) use a caching architecture whereby data is read from disk and then held in cache for processing. Netezza, on the other hand, uses an approach that queries the data as it comes off disk before passing the results on to memory. In other words it uses a streaming architecture in which the data is streamed through the queries (whose programs have been loaded into the FPGA) rather than being stored (even if in memory) and then queried.
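The distinction can be sketched in a few lines of Python. This is purely an analogy in software: in the appliance the filtering is done in FPGA hardware, and the names read_blocks and stream_filter, the row layout, and the sample data are hypothetical. The point is only that rows are tested as they arrive and that nothing but the qualifying rows is ever held in memory.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]  # (region, amount); an illustrative layout


def read_blocks(table: List[Row], block_size: int) -> Iterator[List[Row]]:
    """Stand-in for a disk: yields one block of rows at a time."""
    for start in range(0, len(table), block_size):
        yield table[start:start + block_size]


def stream_filter(
    blocks: Iterable[List[Row]], predicate: Callable[[Row], bool]
) -> Iterator[Row]:
    """Apply the query predicate to each row as its block streams past.

    Rows that fail the predicate are dropped immediately; only matches
    are passed downstream, so the full table is never materialised.
    """
    for block in blocks:
        for row in block:
            if predicate(row):
                yield row


table = [("north", 10), ("south", 25), ("north", 40), ("east", 5), ("north", 7)]

# The "query" is loaded once, then the data is streamed through it.
matches = list(stream_filter(read_blocks(table, 2), lambda r: r[0] == "north"))
total = sum(amount for _, amount in matches)
print(matches, total)
```

A caching design would, by contrast, first copy every block into memory and only then run the predicate over the stored rows.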
There are several points to make about this. The first is that you can get much better performance when using this sort of approach than when using a conventional one. For example, it is stream-based processing that is used for algorithmic trading, where processing requirements are of the order of 150,000 transactions per second. The second is that FPGAs are the natural way of handling streaming environments. For example, they are widely used for voice and video streaming. They are not yet used for event stream processing but we know of one vendor that plans to do exactly that. In turn, what this means is that FPGAs are very much a commodity item. Those of us working in more conventional environments may not think of FPGAs like that but they are as much of a commodity as, say, an Intel processor.
And talking about processors, the other thing that Netezza uses that may seem odd to some people is a PowerPC chip rather than said Intel (or AMD). Again, this is a commodity device that is widely used in small footprint devices, primarily because of its low power consumption. To be specific, a Netezza Snippet Processing Unit (where a snippet is the compiled SQL query that data is streamed through) requires just 30 watts. A complete Netezza rack with 112 of these and 16.5TB of disks (with 5.5TB of user data) requires little more than 4kW and produces 12,000 BTU heat output. Given the power and cooling issues afflicting most data centres today, this is a substantial advantage, as are the reduced floor space requirements.
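As a rough sanity check on these figures, the arithmetic can be worked through in a few lines of Python. The inputs are the numbers quoted above; treating "little more than 4kW" as exactly 4,000 watts, reading the heat figure as BTU per hour, and using the standard conversion of about 3.412 BTU per hour per watt are assumptions made here, not statements from Netezza.

```python
# Figures quoted in the article.
WATTS_PER_SPU = 30
SPUS_PER_RACK = 112
RAW_DISK_TB = 16.5
USER_DATA_TB = 5.5
QUOTED_HEAT_BTU = 12_000

# Assumptions for this sketch (not from the article).
RACK_WATTS = 4_000             # "little more than 4kW" taken as 4,000 W
BTU_PER_HOUR_PER_WATT = 3.412  # standard conversion factor

spu_watts = WATTS_PER_SPU * SPUS_PER_RACK          # power drawn by the SPUs
spu_share = spu_watts / RACK_WATTS                 # fraction of rack power
heat_btu_per_hour = RACK_WATTS * BTU_PER_HOUR_PER_WATT
raw_to_user_ratio = RAW_DISK_TB / USER_DATA_TB     # raw disk per TB of user data
watts_per_user_tb = RACK_WATTS / USER_DATA_TB

print(f"SPU power: {spu_watts} W ({spu_share:.0%} of the rack)")
print(f"Heat implied by 4 kW: {heat_btu_per_hour:.0f} BTU/hr "
      f"(quoted: {QUOTED_HEAT_BTU})")
print(f"Raw disk per TB of user data: {raw_to_user_ratio:.1f}")
print(f"Power per TB of user data: {watts_per_user_tb:.0f} W")
```

On these assumptions the 112 SPUs account for 3,360 watts, or 84% of the rack's draw, and there are three terabytes of raw disk for every terabyte of user data. A draw of 4,000 watts would imply somewhat more heat than the quoted 12,000 BTU, which suggests that the two quoted figures are rounded or were measured under different conditions.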
Returning to FPGAs for a moment, these are following a price/performance curve similar to that of processors. It is expected that performance and price will both improve by five times by 2010, as will the amount of logic that you can put on an FPGA. This last is particularly important because it will enable Netezza to introduce even more functionality into the FPGA in the future.
Even with the current FPGAs, Netezza plans to introduce features that will increase raw scan-rate performance, tactical query performance and advanced analytic performance. The advanced analytic capabilities will be made available to partners rather than end users and will allow predictive analytics vendors (like SPSS or SAS) to embed scoring capabilities (say) directly into the FPGA, which should provide significant performance advantages.
Another potential use of the functionality embedded in the FPGA would be to implement column-level encryption, which would be useful for companies in the data aggregation and resale market, for example, because you could use different encryption techniques for each customer's data. Encryption generally is not available and is not currently on the roadmap. While I would like to see it, it is arguably unnecessary: given the structure of a Netezza appliance you would need some seriously good hacking skills to read a Netezza disk, even if you could get at one, so column-level encryption on its own may be good enough.
To conclude, I was surprised by this conference, not just by the enthusiasm of the attendees but also by some of the functionality that Netezza can offer, which I don't think it has done a good job of explaining to the market. It has, for obvious reasons, concentrated on performance, price and reduced cost of ownership but, to take TCO, it has tended to focus on the removal of indexes and tuning and hasn't discussed its advantages when it comes to aggregates. Similarly, it hasn't really explained why using FPGAs is a good idea, it hasn't made it clear that zonemaps are a form of anti-index, and it hasn't talked much about its advantages in the data centre. Given all of this, and adding in the rich set of new features in the company's roadmap (a number of which I have not mentioned), there is no reason to expect Netezza to do anything but go from strength to strength.
3rd October 2006: 'Stuart Frost' said:
Philip, you make some interesting observations in your piece on Netezza. However, some of your comments make it seem as though Netezza is the only DW appliance vendor with similar or even better capabilities. That's not the case.
For example, we use sophisticated multi-level partitioning to reduce the quantity of data scanned for a given query without requiring indexes. This is much better than zone maps, which are really only effective for data that is ordered (e.g. by date). In our system, you can also use indexes if absolutely necessary. These features are very easy for DBAs to implement, so they don't lead to a higher TCO.
As you know, we've offered encryption for over a year, so that's hardly new.
On power consumption, our system is roughly the same per disk, so the wattage per TB is around the same.
As for FPGAs - so performance will improve 5X by 2010? Intel's roadmap would indicate around 10X by then. FPGA manufacturers have never been able to keep up with the commodity CPU companies. In addition, Netezza will have to considerably re-architect their FPGA-based software to take advantage of the additional silicon real-estate.
In customer proof of concept projects, we've consistently shown that our more sophisticated architecture beats Netezza on price/performance and price per TB - hence our win rate of over 80%. As we ride the wave of Intel's multi-core CPU innovations, we're very confident that this will continue for the foreseeable future.
Stuart Frost
DATAllegro
3rd October 2006: 'Philip Howard' (Author) said:
Stuart
I have written articles just about DATAllegro in the past, so it doesn't seem unreasonable to write ones just about Netezza.
On the issue of FPGAs see http://newyorkscot.wordpress.com/tag/grid/ where it states that "although FPGAs can deliver up to 1000x faster performance than CPUs, the implementation may actually result in performance gains in the order of 40x or 200x ..." so the issue of increased speed would seem to be more a question of Intel catching up rather than the other way around.
4th October 2006: 'Stuart Frost' said:
Philip - sorry, didn't mean to imply that you hadn't written some excellent stuff on DATAllegro too :) It's great that there's so much detailed coverage of DW appliances coming through from you and other analysts.
However, your comments on FPGAs don't match our real-world experience on customer PoCs and projects.
FPGAs may be faster than general purpose CPUs for certain tasks. However, this comparison isn't really fair for complex software such as databases that require a high degree of parallelism.
We're now achieving a similar level of I/O throughput from each disk as Netezza - with far less 'CPU' power.
In Netezza, each disk has a dedicated FPGA and a general purpose Power CPU. In DATAllegro appliances, we have just two (dual core) Xeons for every 12 disks. If FPGAs are so much better than Xeons, how do you explain the fact that we can match Netezza on disk throughput with such a lower ratio of CPUs to disks?
We also tend to read less data than Netezza for a given query, so overall performance is quite a bit better.
Cheers,
Stuart
DATAllegro
15th March 2007: 'Green Light' said:
The comments are funny. I think the problem of datallegro, which has been called a "shadow of netezza", is personal and emotional: each time someone writes about netezza, they stand up and ask for their name to be mentioned too.
Philip, small suggestion: next time you write about netezza please add a P.S. with something like "datallegro, i think about you too" :)
5th May 2007: 'Sanjay' said:
If Datallegro is doing as well as Stuart indicates, why do they not advertise their customer successes? They do not have a list of customers on their website and only a handful of customer success press releases. I think the reality is that Netezza is streets ahead of Datallegro and has a very solid list of blue chip customers that are all keen to shout about their success.
17th May 2007: 'Leo' said:
Recently Datallegro abandoned their white box Intel & custom storage enclosures due to engineering problems and switched to servers from Dell & Storage Arrays from EMC. Even if this solves their technical problems, having studied this approach, the cost/performance ratio (not raw $/TB) is at best marginal. The big problem I see is their business model. How can they be profitable when the bulk of the dollar content of their system is going to Dell & EMC. They are probably looking to sell the business. They have ridden Netezza's wave and their PR tries to make one think they're in the same league as Netezza when they're not. They've even imitated the look, feel, and colors of Netezza's website.
The messages above were all contributed by IT-Director.com readers. Whilst we take care to remove any posts deemed inappropriate, we can take no responsibility for these comments. If you would like a comment removed please contact our editorial team.
Published by: IT Analysis Communications Ltd.