• Skip Navigation |
  • Accessibility 
IT-Director.com Logo
  • Singularity go SaaS with LiveAgility
  • User Experience Monitoring as Governance?
  • Running IT as a business: don't be daft
 

Main navigation - go to a section of this website:

  • ARCHIVE
  • PAPERS
  • EVENTS
  • NEWSWIRE
  • BLOGS

  

Member Login | Become a Member

 
DOMAINS
  • Enterprise
  • SME
  • Business Issues
  • Technology
    • Data Management
    • Applications
    • Infrastructure
    • Systems Mgmt
    • Security
    • Mobile
    • Storage
    • Personal Productivity
  • Services
  • Channels
FEATURED EVENTS
  • Legal IT Show 2010
    10th February - 11th February
    London, United Kingdom
  • Data Modelling Fundamentals
    15th February - 16th February
    London, United Kingdom
POPULAR PAPERS
  • The IBM Workload Optimized Approach by Sageza Group, Inc.
  • Integrated Systems Management by Sageza Group, Inc.
TRANSLATE PAGE



USEFUL LINKS
  • Last 7 Days
  • Archives
  • Market Place
  • Top Articles
INTERACT
  • Advertising
  • Site Feedback
  • Newsletters
  • Contact Us
  • Registration
CONTENT FEED

Technology -> Data Management
RSS Feed:

RSS Icon

What is RSS?

RANDOM QUOTE
Raw Wit - "I live so far out of town the mailman mails me my letters." - Henny Youngman

ADVERTISEMENT
Analysis

The optimal warehouse

Philip Howard By: Philip Howard, Research Director - Data Management, Bloor Research
Published: 7th August 2009
Copyright Bloor Research © 2009
Logo for Bloor Research
Page Tools

Request Reprints
Tell A Friend
Contact Author

More from author
  • February 2010
    Making sense of it all 2
  • February 2010
    Bribery
  • February 2010
    Making sense of it all 1
  • February 2010
    Columns aren't enough anymore
  • February 2010
    Calpont finally comes to market
  • January 2010
    Informatica 9: (r)evolutionary?
  • January 2010
    Cadis EDM
Syndication
  • Delicious Icon Delicious
  • Digg Icon Digg
  • reddit Icon reddit
  • Facebook Icon Facebook
  • StumbleUpon Icon StumbleUpon

Suppose you wanted the best possible combination of features to provide ultimate performance in data warehousing? Actually, you probably do. What would this look like?

Well, to begin with you would probably want a hybrid storage architecture. Columns are great for large table scans but they are not a lot of use for real-time updates, look-ups or queries involving just a few rows. So, it would probably be a good idea to have a row store as well as a column store.

Secondly, you would want to maximise the use of available chip technology. Standard chips (not multi-cores) have lots of parallel capabilities but databases cannot typically make use of these. For example, suppose you load all the relevant data into memory so that there is no I/O effect, then a typical hand written C++ query will run orders of magnitude faster than a typical database query. You would really like to get close to C++ performance if you can.

Next, of course you would want to support parallelism across any multi-core CPUs within each processor you are using, as well as across processors. And you would want to support pipeline parallelism (where you can start off the next part of a query before the previous part has finished).

And, finally (there are probably more things to consider but this will do for now) you would not only support direct attached storage (DAS) for the fastest possible I/O but also a storage area network (SAN) for its high availability characteristics. However, the latter does not perform as well as DAS so, again, you would want some sort of hybrid mechanism to ensure best possible performance at the same time as providing all that resilience.

The good news is that there are companies doing all of these things though no one company is doing all of them.

On the storage side, SAP BW already has a hybrid architecture while Ingres in conjunction with VectorWise recently announced that it would also be supporting a similar column/row architecture. Of course, there are companies that having been addressing the large table scan issue in other ways, such as Netezza, and Oracle with Exadata, but both of these rely on hardware/software approaches rather than pure software.

As for the exploitation of chips is concerned this was precisely the big news from Ingres and VectorWise. By using vector processing techniques they are able to get much better performance out of their CPUs. Note, however, that this is not a vector database in the sense of storing data in arrays, merely in processing terms. I suppose you could argue that this is what Netezza has been doing at the FPGA level; and anyone who supports compiled queries will get better performance than otherwise, but probably not approaching the performance that VectorWise is claiming.

More general parallelism is of course widespread though the companies that have built cross-core parallelism have not as yet (as far as I know) extended this to cross-processor parallelism. However, it seems an obvious next step.

Finally, in so far as hybrid DAS and SAN support with optimal performance is concerned, this is just what ParAccel has introduced in its latest release.

So, the omens are looking good. This is just as well. With Ingres/VectorWise entering the market this means there will be 24 vendors in the market by my reckoning, not counting specialised plays, and that's far too many. Who's going to disappear? The ones that are not flexible enough to cope with the sort of developments just discussed.

Reader Comments

Sorry, we are no longer accepting comments on this item. We suggest trying to contact the author directly.

  • Site Map
  • | Terms of Use
  • | Privacy

Published by: IT Analysis Communications Ltd.
T: +44 (0)1908 880760 | F: +44 (0)1908 880761