By: Andy Hayler, CEO, The Information Difference
Published: 31st July 2008
Copyright The Information Difference © 2008
SQL Server has long been perceived, rightly or wrongly, as a database suitable for data marts rather than large-scale data warehouses. Indeed, many large corporations have a hub and spoke architecture with a large central data warehouse using a technology like Teradata, and smaller data marts based on either SQL Server or Oracle. Try as it might, Microsoft has not been able to shift this perception of limited scalability. Until now, that is: it has just purchased DATAllegro, one of the more interesting of the flood of appliance start-ups that appeared in the wake of the success of pioneer appliance vendor Netezza.
DATAllegro was interesting because it did not mess around at the medium-sized end of data warehouses: its smallest customer has 50 TB of data (not indexes, not backups, but actual data). It has been cagey about its customer base and revenues, but does have some very large deployed customers, only a few of which are public, such as Sears and Teoco. This proven high-end scalability gives Microsoft an opportunity to kill off the SQL Server scalability perception issue once and for all. First, DATAllegro will have to port its technology from Ingres/Linux to SQL Server/Windows, but given the way the technology was designed this is not as big a job as it may appear. Currently DATAllegro is tightly tied to EMC hardware, but this is also likely to change in the medium term. The port will be to SQL Server 2008, so it does not depend on the next major release of SQL Server, whose release date is unclear but which is unlikely to pop its head over the parapet until 2011.
Many writers seem to have overlooked what is perhaps the most interesting aspect of DATAllegro's technology. As well as massively parallel processing of a single large data warehouse, it has a grid capability, allowing it to run a hub and spoke network of a warehouse and dependent data marts. To achieve this it has developed very fast data transfer speeds using its parallel capabilities. For example, in the Teoco case there is a central call centre detail operational data store, feeding a data warehouse, and then a separate archive data warehouse for less urgent queries. This deployment has 400 TB of actual data, with transfer speeds of around 1 TB a day between the stages. Another example, from manufacturing, has two central hubs and multiple spokes (data marts) for marketing and financial data, with a 300 TB capacity (not fully utilised yet).
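To put the quoted transfer figure in perspective, the short Python sketch below converts a daily volume into a sustained rate. It is a back-of-the-envelope illustration only: it reads the Teoco figure as 1 terabyte per day, assumes decimal units (1 TB = 10^12 bytes) and a perfectly steady 24-hour transfer, and the function names are hypothetical rather than anything from DATAllegro.

```python
# Back-of-the-envelope conversion of a daily transfer volume into a sustained rate.
# Assumptions (not from the article): decimal units and a steady 24-hour transfer.

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6


def sustained_mb_per_sec(tb_per_day: float) -> float:
    """Average throughput in MB/s needed to move tb_per_day terabytes in one day."""
    return tb_per_day * BYTES_PER_TB / BYTES_PER_MB / SECONDS_PER_DAY


def days_to_move(total_tb: float, tb_per_day: float) -> float:
    """Days needed to move total_tb terabytes at a steady tb_per_day rate."""
    return total_tb / tb_per_day


if __name__ == "__main__":
    # Figures quoted for the Teoco deployment: about 1 TB a day, 400 TB in total.
    print(f"{sustained_mb_per_sec(1):.1f} MB/s sustained")      # 11.6 MB/s sustained
    print(f"{days_to_move(400, 1):.0f} days for the full 400 TB")  # 400 days
```

On these assumptions, 1 TB a day works out at roughly 11.6 MB/s averaged over the day. The figure presumably describes incremental movement between stages rather than a reload of the whole 400 TB, which would take over a year at that rate.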
Imagine a company with a central data warehouse and a number of SQL Server data marts. In principle it would be possible to migrate this architecture to a DATAllegro grid where there was a central data warehouse and multiple dependent data marts, this time all running SQL Server. Of course if the data marts are currently independent then considerable work may be needed to harmonise the schemas and master data contained in these, but at least now there is an option to do this all within a single technology platform.
Microsoft is taking this acquisition seriously, as indicated by the fact that Stuart Frost (DATAllegro's founder) will become general manager of a data warehouse division at an organisational level parallel to the SQL Server division, with both reporting to Ted Kummert, who heads up Microsoft's data and storage platform division. These days Microsoft is sensibly leaving acquired R&D teams where they are rather than forcing them to relocate to Redmond, so there is less likelihood of talent leakage.
If Microsoft manages this acquisition well then it has the potential to finally slay the SQL Server scalability dragon, with DATAllegro's grid technology offering a path forward to enterprises wishing to consolidate their data warehouses and data marts to a single platform.
Posted: 1st August 2008 | By O. R. :
I am continuously surprised how here we are in 2008 and there are still vendors and experts around the world that call for hub and spoke architectures. We are in the middle of Analytics as a Service and Agile Analytics and companies are being told to build on failing architectures from the past. Hub and Spoke is a means to overcome architectural limitations of the underlying technology. It adds processing, huge amounts of latency and forces more and more duplication of data inside an organization. 1TB/day is by no means fast and most departmental data marts of larger companies need to pull multiple TB/day just to update their base data. Modern, large scale MPP platforms can process a TB of data in less than 10 sec and new behavioral business analytics require ever increasing volumes of data with more and more realtime delivery.
The messages above were all contributed by IT-Director.com readers. Whilst we take care to remove any posts deemed inappropriate, we can take no responsibility for these comments. If you would like a comment removed please contact our editorial team.
Published by: electronicdawn Ltd.