DataRush extends its boundaries

By: Philip Howard, Research Director - Data Management, Bloor Research
Published: 16th June 2010
Copyright Bloor Research © 2010

Pervasive has just released version 4.4 of its DataRush platform, which you might think, being a point release, is just more of the same (whatever that same is; I’ll come to that in a moment). However, that would be an incorrect assumption: DataRush 4.4 represents a radical, and important, new direction for DataRush.

So, to go back to the beginning: what is DataRush? In a nutshell, it’s a very fast parallel engine for doing stuff. In particular, it’s a cross-core parallel engine. What that means is that if you have an eight-core machine then you get eight parallel processing streams. While a few other vendors in particular markets have developed comparable capabilities, most vendors that deliver parallelised products do so across machines: you would need eight servers to get eight-way parallelism, for example, rather than one server with eight cores. As you can imagine, that makes DataRush very much more cost effective.
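To make the idea of cross-core parallelism concrete, here is a minimal sketch in Python. It is not DataRush code and does not use any DataRush API: the function names (`partition`, `partial_sum`, `parallel_sum_of_squares`) and the sum-of-squares workload are illustrative assumptions. It simply splits a data set into one chunk per core on a single machine, processes the chunks in separate worker processes, and combines the results.

```python
import os
from multiprocessing import Pool


def partition(data, n_parts):
    """Split data into at most n_parts contiguous chunks of near-equal size."""
    data = list(data)
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there is less data than parts
            chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum(chunk):
    """Work done independently on one core: sum of squares of one chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=None):
    """Fan the chunks out to one worker process per core, then combine."""
    workers = workers or os.cpu_count() or 1
    chunks = partition(data, workers)
    with Pool(processes=workers) as pool:
        return sum(pool.map(partial_sum, chunks))
```

On an eight-core machine, `parallel_sum_of_squares(range(1_000_000))` would use eight worker processes on that one server. When running it as a script, call it from under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that spawn rather than fork worker processes.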

DataRush differs from those few other suppliers that have built cross-core parallelism in that it is a general-purpose engine. That is to say, you can OEM it for whatever purpose suits you. Pervasive itself has to date focused on high-performance data preparation (the company has both data profiling and matching technologies that run on top of DataRush), both for generic data cleansing purposes and to streamline preparation time for data mining and analytic functions.

So, that was the position up until version 4.2. But with 4.4, DataRush will actually perform your data mining operations for you. With this release the company has introduced an analytics function library that includes k-Means clustering; naïve Bayes, decision tree (C4.5) and k-nearest neighbour classification algorithms; four types of regression; association rule mining; and principal component analysis. This has been integrated with the Eclipse-based workflow from KNIME, the open source data mining vendor (which is German). In addition, DataRush 4.4 supports PMML (Predictive Model Markup Language), so you can import any existing models you may have.
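For readers unfamiliar with the first algorithm in that list, the following is a minimal sketch of k-Means clustering (Lloyd's algorithm) in plain Python. It is an illustration only, not the DataRush implementation: the function name `kmeans`, the restriction to two-dimensional points and the caller-supplied starting centroids are all assumptions made to keep the example short and deterministic.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep a centroid that attracted no points

        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters
```

The assignment step is independent for every point, which is why this kind of algorithm lends itself to the chunk-per-core style of parallelism described earlier.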

The idea with DataRush is that you extract the data from your data warehouse and then process it within the DataRush engine, making use of its inexpensive parallelism. The potential alternatives are a) to do data mining the old-fashioned way, which means extracting the data to an application server and running the analytics there, or b) to perform data mining in the database, where that is available. DataRush should be significantly faster, more accurate (since you shouldn’t need to sample the data) and less expensive than the first of these.

With respect to the second, the short answer is that I don’t know how it will stack up. You still have to move the data, which is a downside, but otherwise it will likely depend on the environment. Typically, you already have a data processing workload on your warehouse or mart, so additional in-database analytics may impact existing workloads, in which case you will have to extend your warehouse. Which approach is more effective in performance and cost terms, DataRush or in-database analytics, will only be proven once we have had some competitive proofs of concept. Of course, a lot of warehouse vendors do not yet have in-database analytics, or have only limited ones, so in those cases DataRush should certainly represent a significant contender.

Published by: IT Analysis Communications Ltd.
T: +44 (0)190 888 0760 | F: +44 (0)190 888 0761