By: Philip Howard, Research Director - Data Management, Bloor Research
Published: 23rd March 2005
Copyright Bloor Research © 2005
Data warehouses have always had a problem with performance when it comes to (complex) analytic queries. Data Appliances are the latest answer to this issue. The question is: how well will they fare?
First, let’s be clear about the meaning of analytics – this is the processing of queries against transaction-level data. For example, queries like “which customers bought patio furniture within three weeks of purchasing a barbecue?” Note that we are interested in individual customers, not just how many. That is not to say that aggregated queries are free of performance issues, but the issue is more acute when granular detail is required, in part because of the volumes of data that have to be searched.
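To make the example concrete, here is a minimal sketch of such a query in Python, using the standard library's sqlite3 module. The `purchases` table, its columns and the sample rows are all hypothetical, and "within three weeks of purchasing a barbecue" is assumed to mean up to 21 days after the barbecue purchase.

```python
import sqlite3

# Hypothetical transaction-level table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer_id INTEGER, product TEXT, purchase_date TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, "barbecue", "2005-03-01"),
        (1, "patio furniture", "2005-03-15"),  # 14 days later: qualifies
        (2, "barbecue", "2005-03-01"),
        (2, "patio furniture", "2005-04-30"),  # 60 days later: too late
        (3, "patio furniture", "2005-03-10"),  # never bought a barbecue
        (4, "patio furniture", "2005-02-20"),  # furniture came first
        (4, "barbecue", "2005-03-01"),
    ],
)

# Self-join on the detail rows: the answer is a list of individual customers,
# not a count, so every matching transaction has to be examined.
QUERY = """
SELECT DISTINCT b.customer_id
FROM purchases AS b
JOIN purchases AS p ON p.customer_id = b.customer_id
WHERE b.product = 'barbecue'
  AND p.product = 'patio furniture'
  AND julianday(p.purchase_date) - julianday(b.purchase_date) BETWEEN 0 AND 21
ORDER BY b.customer_id
"""

matching_customers = [row[0] for row in conn.execute(QUERY)]
print(matching_customers)  # [1]
```

The self-join against the full transaction table is what makes this kind of query expensive at warehouse scale: there is no pre-aggregated summary to consult.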
As I have stated, this has long been recognised as a problem and various “analytic servers” have been put forward as solutions. The most common technique employed by the suppliers of these solutions has been column-based relational databases, but exotica such as vector-based databases have also been tried. While I have been a fan of these offerings for some time, the truth is that most of the suppliers (Sybase with Sybase IQ is an exception) have diverted into niche or different markets. For example, Alterian is focused especially on marketing campaign analysis, Kx Systems on stock ticker information, and Sand, although relatively successful in this sector, is increasingly focusing on its archiving capabilities. Meanwhile, Aruna has gone out of business and WhiteLight was bought by SymphonyRPM.
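For readers unfamiliar with the column-based idea, the following is a toy illustration in Python, not a description of how any of the products named above is implemented. It assumes a hypothetical four-field purchase record and simply counts how many stored values each layout has to touch to total one field.

```python
# Hypothetical purchase records: (customer_id, product, purchase_date, amount).
FIELDS = ("customer_id", "product", "purchase_date", "amount")
rows = [
    (1, "barbecue", "2005-03-01", 199.0),
    (1, "patio furniture", "2005-03-15", 450.0),
    (2, "barbecue", "2005-03-01", 199.0),
]

# Column-based layout: the same data, stored one field at a time.
columns = {name: [row[i] for row in rows] for i, name in enumerate(FIELDS)}


def sum_amount_row_store(records):
    """Total the amount field; every field of every record is fetched."""
    total, values_read = 0.0, 0
    for record in records:
        values_read += len(record)  # the whole record comes back together
        total += record[3]
    return total, values_read


def sum_amount_column_store(cols):
    """Total the amount field; only the amount column is fetched."""
    amounts = cols["amount"]
    return sum(amounts), len(amounts)


row_total, row_reads = sum_amount_row_store(rows)
col_total, col_reads = sum_amount_column_store(columns)
print(row_total, row_reads)  # 848.0 12
print(col_total, col_reads)  # 848.0 3
```

Both layouts give the same answer, but the column layout touches a quarter of the values here, and the gap widens as records gain more fields.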
In other words, and for whatever reason (including marketing), these solutions have largely failed to capture the attention of the buying public. Now a new solution is available, as supplied by Netezza and Datallegro. These two companies offer data warehousing appliances that promise more (typically much more) performance, plus scalability, at lower cost, than conventional data warehousing solutions.
The big difference between these two vendors and their column-based rivals is that they use conventional relational (open source) databases: Netezza's is based on PostgreSQL and Datallegro's on Ingres. This is likely to make them much more acceptable to the average database administrator, because there is no need to explain the concept of how the product works in software terms.
However, there is an issue over hardware, because the whole point of an appliance is that it blends hardware (Linux-based processors and disks) and software into a single solution. The advantage of this approach is not only that you get everything from one vendor but also that the software is specifically optimised to run on the selected hardware, and as a result you get much better performance. Actually, there is more to it than that because, for example, you can implement software directly within disk controllers, so that for some queries you can retrieve data at close to disk access speeds. To cut a long story short: the results are impressive.
The nice thing is that this is a well-trodden path. Going back to the days of Britton-Lee and then with the likes of Teradata and WhiteCross, the idea of mixing hardware and software, particularly in the data warehousing arena, is well-established. Thus Netezza and Datallegro should not find it difficult to find customers and, indeed, there are a number of well-known companies that have already adopted this technology. It is too early to say that data appliances will take the data warehousing space by storm but there is plenty of opportunity: there are many users that are unhappy with the performance of their existing systems or that have dismissed the whole concept as too expensive. Data appliances offer a potential solution to both of these groups of users.
23rd March 2005: 'Dave the Pharmacist' said:
There is no such thing as a free lunch. Have you ever looked at the load times for Netezza?
Whilst the query times look great, we'd never get all our data loaded in our batch window.
SQL Server does pretty well, but the fastest load times I've seen are from SPDS from SAS, even with lots of indexes.
25th July 2005: 'Satisfied Netezza Customer' said:
I have personally seen Netezza with average load times of 250 gig per hour. Netezza has been very fast in querying, loading data, and in data transformations. Netezza has changed the way we look at our business, and we are now asking questions that were never before possible (and with greater data volumes).
I would highly recommend the Netezza solution to anyone.
The messages above were all contributed by IT-Director.com readers. Whilst we take care to remove any posts deemed inappropriate, we can take no responsibility for these comments. If you would like a comment removed please contact our editorial team.
Published by: IT Analysis Communications Ltd.