Technology -> Data Management
By: Andy Hayler, Associate Analyst, Bloor Research
Published: 19th February 2008
Copyright Bloor Research © 2008
Vertica is one of the plethora of vendors which have emerged in the analytics "fast database" space pioneered by Teradata and more recently opened up by Netezza. The various vendors take different approaches. Some (e.g. Netezza) have proprietary hardware, some (e.g. Kognitio, Dataupia) are software only, some (e.g. ParAccel) rely mainly on in-memory techniques, others simply use different designs from the traditional designs of the mainstream DBMS vendors (Oracle, DB2).
Vertica (whose CTO is Mike Stonebraker of Ingres and Postgres fame) is in the latter camp. Like Sybase IQ (and Sand) it uses a column-oriented design (i.e., it groups data together by column on disk) rather than the usual row-oriented storage used by Oracle and the like. This approach has a number of advantages for query performance. It reduces disk I/O by only having to read the columns referenced by the query and also by aggressively compressing data within columns. Through use of parallelism across clusters of shared-nothing computers, Vertica databases can scale easily and affordably by adding additional servers to the cluster. Normally the drawback to column-oriented approaches is their relatively slow data load times, but Vertica has some tricks up its sleeve (a mix of in-memory processing which trickle feeds disk updating) which, it claims, allow load times comparable to, and sometimes better than, row-oriented databases. Vertica comes with an automated design feature that allows DBAs to provide it with the logical schema, plus training data and queries, which it then uses to come up with a physical structure that organizes, compresses and partitions data across the cluster to best match the workload (though ever-wary DBAs can always override this if they think they are smarter). With a standard SQL interface Vertica can work with existing ETL and business intelligence tools such as Business Objects, and has significantly expanded the list of supported vendors in their upcoming 2.0 release.
With so many competing vendors all claiming tens of times better performance than others, the measure that perhaps matters most is not a lab benchmark but customer take-up. Vertica now has 30 customers such as Comcast, BlueCrest Capital Management, NetworkIP, Sonian Networks and LogiXML, and with its upcoming 2.0 release out on 19/2/2008 is doing joint roadshows with some of these. It has done well in Telcos, who have huge data volumes in their call detail records databases. Two deployed Vertica customers have databases approaching 40 TB in size. Another area is financial services, where hedge funds want to backtest their trading algorithms against historical market data. With one year worth of US financial markets data taking up over 2TB, this can quickly add up, and so Vertica has proved popular amongst this community, as well as with marketing companies with large volumes of consumer data to trawl trough. Vertica runs on standard Linux servers, and it has a partnership with HP and Red Hat to provide a pre-bundled appliance, which is available from select HP resellers.
With solid VC backing, a glittering advisory board (Jerry Held, Ray Lane, Don Hadrele...) and genuine customer traction in an industry long on technology but short on deployed customers, Vertica should be on every vendor short-list for companies with heavy duty analytical requirements which currently stretch performance limits and budgets.
Posted: 25th February 2008 | By Samantha Stone :
I enjoyed reading your article. Businesses have many new, and sometimes confusing choices to enhance their data collection and usage applications.
As a point of clarification, Dataupia, is a complete Data Warehouse Appliance and includes software and commodity, hardware components which are fully integrated. We like to say we pass the Tivo Test (a popular television recording device). A Tivo has a complete computer inside, and yet consumers have no reason to manage the individual components, but rather use their remote control to personalize their television recording choices. Similarly, the Dataupia solution simplifies the management of data using non-propriatary hardware and our omniversal transparency technology which integrates with a clients existing Oracle, DB2 or MS SQL Server environments for enhanced scalability.
The messages above were all contributed by IT-Director.com readers. Whilst we take care to remove any posts deemed inappropriate, we can take no responsibility for these comments. If you would like a comment removed please contact our editorial team.
We automatically stop accepting comments 180 days after a post is published. If you would like to know more about this subject, please contact us and we'll try to help.
Published by: electronicdawn Ltd.