Released: 20th May 2013
Syncsort, a global leader in Big Data integration solutions, today announced the availability of its Spring ’13 release, including two brand new Hadoop products and breakthrough enhancements to DMX that turn Hadoop into a more robust, feature-rich, and easy-to-use ETL solution.
Big Data is prompting organisations to look at Hadoop to process more data in less time and for less money, but Hadoop is not yet a complete ETL solution. Syncsort’s two new offerings for Hadoop – DMX-h ETL Edition and DMX-h Sort Edition – are designed to strengthen Hadoop by providing the full functionality required to deliver enterprise ETL capabilities. They provide greater ease-of-use and maximise node performance compared to non-native, code-generating ETL tools. In addition, performance and connectivity enhancements to DMX expand usage by end-users and partners.
“Analysing Big Data is critical to our customers’ ability to sustain competitiveness, but the avalanche of information is breaking traditional data integration architectures – many of the tools are too code- and resource-intensive and ultimately drive costs too high,” said Josh Rogers, Senior Vice President, Data Integration business, Syncsort. “With our new DMX editions, we are strengthening Hadoop by providing seamless and powerful ETL and sort capabilities, and at the same time reinvigorating the value proposition of ETL by leveraging the power of Hadoop to scale core processing of Big Data.”
The new DMX-h solutions take advantage of Syncsort’s recent contribution to Apache Hadoop, which provides a unique level of native integration to deliver best-in-class data integration capabilities and Sort acceleration for Apache Hadoop distributions.
Highlights of the DMX-h ETL include:
Recent Syncsort benchmarks show significant Hadoop performance and resource efficiency improvements when using DMX-h. More importantly, the results show very predictable and sustainable throughput even as data volumes grow. Using the TeraSort benchmark, DMX-h Sort Edition achieved a sustainable throughput of over 100 megabytes per second per node (MB/S/N), delivering upwards of 2x higher throughput per node than Hadoop's native sort at 45 MB/S/N. Similarly, DMX-h ETL Edition achieved sustainable throughput in excess of 255 MB/S/N, for up to 2.5x faster performance than Pig when aggregating 2TB of Web log data. In both cases, tests were run for data volumes ranging from 500GB to 2TB. While alternatives such as Hadoop's native sort and Pig reach a saturation point – where throughput starts to decline – at around 500GB of data, DMX-h delivered sustainable and predictable performance from 500GB to 2TB. The implications are huge for organisations, as they can more efficiently size their Hadoop infrastructure, minimise uncertainty, and achieve a more predictable cost structure as Big Data becomes even bigger.
“Hadoop is lowering the cost structure of processing data at scale, but deploying Hadoop at the enterprise level is not free, and significant hardware and IT productivity costs can damage ROI,” said Evan Quinn, Senior Principal Analyst, Enterprise Strategy Group. “Syncsort’s Spring ’13 release provides unique capabilities in Hadoop to help maximise savings, delivering best-in-class ETL technology at a price point that is highly disruptive for the data integration market and more consistent with the cost structure of open source solutions.”
“In tag management, we facilitate a huge number of interactions between marketers and their vendors and, as a result, we are able to see the complex journey a consumer takes prior to making a purchase. This involves a huge amount of data processing. To be competitive, we must convert the high volume of ‘path-to-purchase’ data captured by our platform into actionable intelligence that drives decisions by both marketers and their vendors,” said Ave Wrigely, CTO of TagMan. “What’s compelling about Syncsort’s latest DMX product deliveries is the unique approach to replacing older code-driven approaches with a streamlined, GUI-driven way to collect, cleanse, and distribute information inside and outside of Hadoop, saving time and resources and giving us maximum flexibility in preparing Big Data for business analytics and data visualisation.”
“Cloudera sees ETL as one of the top use cases for Hadoop – it is essential to our mission of maximising the value of big data,” said Amr Awadallah, Chief Technology Officer, Cloudera. “We see Syncsort’s new DMX-h offerings enabling our mutual customers with critical data integration and ETL capabilities which simplify ETL deployments while efficiently processing data natively on Hadoop. The CDH 4.2 release includes Syncsort’s contribution to Apache Hadoop making the sort phase pluggable, enabling DMX-h, and broadening use cases on Hadoop.”
Fast Start DMX-h ETL Test Drive
Anyone looking to leverage DMX-h ETL can now download a free test drive that contains everything they require without the need to set up their own Hadoop cluster. It includes a Linux Virtual Machine with Cloudera CDH 4.2 and DMX-h ETL Edition pre-installed, along with use case accelerators and sample data.
About Syncsort’s Data Integration Business
Syncsort provides data-intensive organisations across the big data continuum with a smarter way to collect and process the ever-expanding data avalanche. With thousands of deployments across all major platforms, including mainframe, Syncsort helps customers around the world to overcome the architectural limits of today’s ETL and Hadoop environments, empowering their organisations to drive better business outcomes in less time, with fewer resources and lower TCO. For more information, visit www.syncsort.com.
Published by: IT Analysis Communications Ltd.