We are faced with ever-growing volumes of data and, within that data, we know that there exist the answers to many of the most critical questions we face: who will buy, who will leave us, when will they change their behaviour, what are the signs that show we need to intervene to save an account, what are the signs that this is a loyal, profitable customer, and so forth. However, to date, extracting that knowledge in an actionable form has required time, expertise, collaboration between technically skilled and business-skilled personnel and, above all, a great deal of cost.
A decade ago people started to realise that having only a handful of models, each of which took months to build, many weeks to tune, and many weeks to explain to the business, was not a viable approach, and a revolution was started in the data mining industry. The goal was to speed up the process and reduce the reliance on very expensive, statistically savvy professionals without, at the same time, reducing the predictive power of the tools to find the interesting gems within the mass of data at our disposal.
This has led to visual interfaces based on workflows, faster means to crunch the data such as machine learning, and more visual displays of the results to help turn them into actionable insights. Everyone has improved their products dramatically from the very technical frameworks that existed at the end of the last century, but, as good as these improvements have been, they have failed to really deliver the power, speed and ease of use that is required to break the mould and unleash the potential of on-demand data mining in the hands of a business-focussed analyst. That could all be about to change: RapidMiner has now arrived as a serious contender, with the financial backing to make an impact on the incumbents.
The product has been around for a number of years, and has built up a useful user base. The GUI is based on building a workflow, a metaphor that SPSS Clementine pioneered and that has proven its worth. In addition, users can add scripts in the R programming language, which is the basis of many university courses and is one of the most widely used languages for statistical computing. Finally, data visualisation is used to display the results, giving a complete visual workbench that delivers an end-to-end solution. Penetration into the marketplace is now in its second phase, that of commercialisation. This has led to the company rebranding to align its name with that of the product, and obtaining the investment required to roll out beyond its current community of devotees.
The product is based on what is essentially a client-server model, with the server also available as SaaS or on cloud infrastructures. The environment can be deployed as a web service with full scheduling into whatever workflow is required, enabling seamless and straightforward integration. Being based on an open source model means that, whilst the company provides the core functionality, the community is encouraged to add all of the bells and whistles that make for a complete entity in every conceivable configuration, and those are made available through a marketplace. This is of vital importance because a proprietary model just cannot provide that breadth of coverage and tailoring at a competitive price. The other advantage of the open source model is that it encourages innovation and early adoption.
The key feature that sets RapidMiner apart is that it is designed to provide 99% of an advanced analytics solution without the need to code. The product includes everything from ETL, to engine, to display, and these are enacted by deploying template-based frameworks that speed delivery and provide tried and tested capability. The environment is very flexible, with 3 core engines that allow it to be deployed in in-memory, in-database and in-Hadoop configurations.
The goal is to deliver the first models with just 5 minutes of set-up. So once the use case is defined and the appropriate template is deployed, with the appropriate data plugged in, capability is available for display and refinement with the business in minutes, which is a quantum leap from the hours, if not days, that have been typical up to now.
A point that I really like is that they have looked at other models, most notably computer gaming, to develop tutorials and help, so that the right sort of information is delivered based on the background, experience and skill of the user. As someone who has failed to understand most pages of help delivered by companies such as Microsoft, who think we are all computer science geeks, this is something that I think is really exciting.
The product also has a clear licensing model, with tiers based on user needs that all of us can understand. The software is available as a trial version which can then be converted into a licensed version. The tiers relate to the memory usage and data sources required.
There are already some very convincing case histories from credible companies performing credible core processes using the tool. This shows that this is not just a marketing message, but tried and tested technology. Part of the investment that has been sought is being used to provide the vital services required to help adopters get up to speed on site with the technology.
I think this is one of the most exciting things to happen to data mining, going beyond the capability of the established players, not with questionable additional capability, but with a focus that delivers the business benefits that we have all sought and to date have found elusive. I think this is a product to take very seriously and I wish them well as they expand and grow.