Technology -> Big Data
By: Dana Gardner, Principal Analyst, Interarbor Solutions
Published: 11th April 2013
Copyright Interarbor Solutions © 2013
A data dichotomy has changed the face of information management, bringing with it huge new data challenges for businesses to solve.
The dichotomy means that organizations, both large and small, not only need to manage all of their internal data to provide intelligence about their businesses, they need to manage the growing reams of increasingly external big data that enables them to discover new customers and drive new revenue.
The latest BriefingsDirect software how-to discussion then focuses on bringing far higher levels of automation and precision to the task of solving such varied data complexity. By embracing an agnostic, end-to-end tool chain approach to overall data and information management, businesses are both solving complexity and managing data better as a lifecycle.
To gain more insights on where the information management market has been and where it's going, we are joined by Matt Wolken, Executive Director and General Manager for Information Management at Dell Software. The discussion is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: Dell Software is a sponsor of BriefingsDirect podcasts.]
Here are some excerpts:
Gardner: What are the biggest challenges that businesses need to solve now when it comes to data and information management?
Wolken: About 10 or 15 years ago, the problem was that data was sitting in individual databases around the company, either in a database on the backside of an application, the customer relationship management (CRM) application, the enterprise resource planning (ERP) application, or in data marts around the company. The challenge was how to bring all this together to create a single cohesive view of the company?
That was yesterday's problem, and the answer was technology. The technology was a single, large data warehouse. All of the data was moved to it, and you then queried that larger data warehouse where all of the data was for a complete answer about your company.
What we're seeing now is that there are many complexities that have been added to that situation over time. We have different vendor silos with different technologies in them. We have different data types, as the technology industry overall has learned to capture new and different types of data—textual data, semi-structured data, and unstructured data—all in addition to the already existing relational data. Now, you have this proliferation of other data types and therefore other databases.
The other thing that we notice is that a lot of data isn't on premise any more. It's not even owned by the company. It's at your software-as-a-service (SaaS) provider for CRM, your SaaS provider for ERP, or your travel or human resources (HR) provider. So data again becomes siloed, not only by vendor and data type, but also by location. This is the complexity of today, as we notice it.
All of this data is spread about, and the challenge becomes how do you understand and otherwise consume that data or create a cohesive view of your company? Then there is still the additional social data in the form of Twitter or Facebook information that you wouldn't have had in prior years. And it's that environment, and the complexity that comes with it, that we really would like to help customers solve.
Gardner: When it comes to this so-called data dichotomy, is it oversimplified to say it's internal and external, or is there perhaps a better way to categorize these larger sets that organizations need to deal with?
Wolken: There's been a critical change in the way companies go about using data. There are some people who want to use data for an outcome-based result. This is generally what I would call the line-of-business concern, where the challenge with data is how do I derive more revenue out of the data source that I am looking at.
What's the business benefit for me examining this data? Is there a new segment I can codify and therefore market to? Is there a campaign that's currently running that is not getting a good response rate, and if so, do I want to switch to another campaign or otherwise improve it midstream to drive more real value in terms of revenue to the company?
That’s the more modern aspect of it. All of the prior activities inside business intelligence (BI)—let’s flip those words around and say intelligence about the business—was really internally focused. How do I get sanctioned data off of approved systems to understand the official company point of view in terms of operations?
That second goal is not a bad goal. That's still a goal that's needed, and IT is still required to create that sanctioned data, that master data, and the approved, official sources of data. But there is this other piece of data, this other outcome that's being warranted by the line of business, which is, how do I go out and use data to derive a better outcome for my business? That's more operationally revenue-oriented, whereas the internal operations are around cost orientation and operations.
So where you get executive dashboards for internal consumption off of BI or intelligence for the business, the business units themselves are about visualization, exploration, and understanding and driving new insights.
It's a change in both focus and direction. It sometimes ends up in a conflict between the groups, but it doesn't really have to be that way. At least, we don't think it does. That's something that we try to help people through: How do you get the sanctioned data you need, but also bring in this third-party data and unstructured data and add nuance to what you are seeing about your company.
Gardner: Do traditional technology offerings allow this dichotomy to be joined, or do we need a different way to create these insights across both internal and external information?
Wolken: There are certainly ways to get to anything. But if you're still amending program after program or technology after technology, you end up with something less than the best path, and there might be new and better ways of doing things.
There are lots of ways to take a data warehouse forward in today's environment, manipulate other forms of data so it can enter a data warehouse or relational data warehouse, and/or go the other way and put everything into an unstructured environment, but there's also another way to approach things, and that’s with an agnostic tool chain.
Tools have existed in the traditional sense for a long time. Generally, a tool is utilized to hide complexity and all of the issues underneath the tool itself. The tool has intelligence to comprehend all of the challenges below it, but it really abstracts that from the user.
We think that instead of buying three or four database types, a structured database, something that can handle text, a solution that handles semi-structured or structured, or even a high performance analytical engine for that matter, what if the tool chain abstracts much of that complexity? This means the tools that you use every day can comprehend any database type, data structure type, or any vendor changes or nuances between platforms.
That's the strategy we’re pursuing at Dell. We’re defining a set of tools—not the underlying technologies or proliferation of technologies—but the tools themselves, so that the day-to-day operations are hidden from the complexity of those underlying sources of vendor, data type, and location.
That's how we really came at it—from a tool-chain perspective, as opposed to deploying additional technologies. We’re looking to enable customers to leverage those technologies for a smoother, more efficient, and more effective operation.
Let's just take data integration as a point. I can sometimes go after certain siloed data integration products. I can go after a data product that goes after cloud resources. I can get a data product that only goes after relational. I can get another data product to extract or load into Hive or Hadoop. But what if I had one that could do all of that? Rather than buying separate ones for the separate use cases, what if you just had one?
Gardner: What are the stakes here? What do you get if you do this right?
Wolken: There are a couple of ways we think about it, one of which is institutional knowledge. Previously, if you brought in a new tool into your environment to examine a new database type, you would probably hire a person from the outside, because you needed to find that skill set already in the market in order to make you productive on day one.
Instead of applying somebody who knows the organization, the data, the functions of the business, you would probably hire the new person from the outside. That's generally retooling your organization.
Or, if you switch vendors, that causes a shift as well. One primary vendor stack is probably a knowledge and domain of one of your employees, and if you switch to another vendor stack or require another vendor stack in your environment, you're probably going to have to retool yet again and find new resources. So that's one aspect of human knowledge and intelligence about the business.
There is a value to sharing. It's a lot harder to share across vendor environments and data environments if the tools can't bridge them. In that case, you have to have third-party ways to bridge those gaps between the tools. If you have sharing that occurs natively in the tool, then you don't have to cross that bridge, you don't have the delay, and you don't have the complexity to get there.
So there is a methodology within the way you run the environment and the way employees collaborate that is also accelerated. We also think that training is something that can benefit from this agnostic approach.
But also, generically, if you're using the same tools, then things like master data management (MDM) challenges become more comprehensive, if the tool chain understands where that MDM is coming from, and so on.
You also codify how and where resources are shared. So if you have a person who has to provision data for an analyst, and they are using one tool to reach to relational data, another to reach into another type of data, or a third-party tool to reach into properties and SaaS environments, then you have an ineffective process.
You're reaching across domains and you're not as effective as you would be if you could do that all with one tool chain.
So those are some of the high-level ideas. That's why we think there's value there. If you go back to what would have existed maybe 10 or 15 years ago, you had one set of staff who used one set of tools to go back against all relational data. It was a construct that worked well then. We just think it needs to be updated to account for the variance within the nuances that have come to the fore as the technology has progressed and brought about new types of technology and databases.
Gardner: What are typically some of the business paybacks, and do they outweigh the cost?
Wolken: It all depends on how you go about it. There are lots of stories about people who go on these long investment cycles into some massive information management strategy change without feeling like they got anything out of it, or at least were productive or paid back the fee.
There's a different strategy that we think can be more effective for organizations, which is to pursue smaller, bite-size chunks of objective action that you know will deliver some concrete benefit to the company. So rather than doing large schemes, start with smaller projects and pursue them one at a time incrementally—projects that last a week and then you have 52 projects that you know derive a certain value in a given time period.
Other things we encourage organizations to do deal directly with how you can use data to increase competitiveness. For starters, can you see nuances in the data? Is there a tool that gives you the capability to see something you couldn't see before? So that's more of an analytical or discovery capability.
There's also a capability to just manage a given data type. If I can see the data, I can take advantage of it. If I can operate that way, I can take advantage of it.
Another thing to think about is what I would call a feedback mechanism, or the time or duration of observation to action. In this case, I'll talk about social sentiment for a moment. If you can create systems that can listen to how your brand is being talked about, how your product is being talked about in the environment of social commentary, then the feedback that you're getting can occur in real time, as the comments are being posted.
Now, you might think you'll get that anyway. I would have gotten a letter from a customer two weeks from now in the postal system that provided me that same feedback. That’s true, but sometimes that two weeks can be a real benefit.
Imagine a marketing campaign that's currently running in the East, with a companion program in the West that's slightly different. Let's say it's a two-week program. It would be nice if, during the first week, you could be listening to social media and find out that the campaign in the West is not performing as well as the one in the East, and then change your investment thesis around the program—cancel the one that's not performing well and double down on the one that's performing well.
There's a feedback mechanism increase that also can then benefit from handling data in a modern way or using more modern resources to get that feedback. When I say modern resources, generally that's pointing towards unstructured data types or textual data types. Again, if you can comprehend and understand those within your overall information management status, you now also have a feedback mechanism that should increase your responsiveness and therefore make your business more competitive as well.
Gardner: Given that these payoffs could be so substantial, what's between companies and the feedback benefits?
Wolken: I think it's complexity of the environment. If you only had relational systems inside your company previously, now you have to go out and understand all of the various systems you can buy, qualify those systems, get pure feedback, have some proofs of concept (POCs) in development, come in and set all these systems up, and that just takes a little bit of time. So the more complexity you invite into your environment, the more challenges you have to deal with.
After that, you have to operate and run it every day. That's the part where we think the tool chain can help. But as far as understanding the environment, having someone who can help you walk through the choices and solutions and come up with one that is best suited to your needs, that’s where we think we can come in as a vendor and add lots of value.
When we go in as a vendor, we look at the customer environment as it was, compare that to what it is today, and work to figure out where the best areas of collaboration can be, where tools can add the most value, and then figure out how and where can we add the most benefit to the user.
What systems are effective? What systems collaborate well? That's something that we have tried to emulate, at least in the tool space. How do you get to an answer? How do you drive there? Those are the questions we’re focused on helping customers answers.
For example, if you've never had a data warehouse before, and you are in that stage, then creating your first one is kind of daunting, both from a price perspective, as well as complexity perspective or know-how. The same thing can occur on really any aspect—textual data, unstructured data, or social sentiment.
Each one of those can appear daunting if you don't have a skill set, or don't have somebody walking you through that process who has done it before. Otherwise, it's trying to put your hands on every bit of data and consume what you can and learning through that process.
Those are some of the things that are really challenging, especially if you're a smaller firm that has a limited number of staff and there's this new demand from the line of business, because they want to go off in a different direction and have more understanding that they couldn't get out of existing systems.
How do you go out and attain that knowledge without duplicating the team, finding new vendor tools, and adding complexity to your environment, maybe even adding additional data sources, and therefore more data-storage requirements. Those are some of the major challenges—complexity, cost, knowledge, and know-how.
Gardner: Why are mid-market organizations now more able to avail themselves of some of these values and benefits than in the past?
Wolken: As the products are well-known, there is more trained staff that understands the more common technologies. There are more codified ways of doing things that a business can take advantage of, because there's a large skill set, and most of the employees may already have that skill set as you bring them into the company.
There are also some advantages just in the way technologies have advanced over the years. Storage used to be very expensive, and then it got a little cheaper. Then solid-state drives (SSD) came along and then that got cheaper as well. There are some price point advantages in the coming years, as well.
Dell overall has maintained the status that we started with when Michael Dell started recreating PCs in his dorm room from standard product components to bring the price down. That model of making technology attainable to larger numbers of people has continued throughout Dell’s history, and we’re continuing it now with our information management software business.
We’re constantly thinking about how we can reduce cost and complexity for our customers. One example would be what we call Quickstart Data Warehouse. It was designed to democratize data to a lower price point, to bring the price and complexity down to a much lower space, so that more people can afford and have their first data warehouse.
We worked with our partner Microsoft, as well as Dell’s own engineering team, and then we qualified the box, the hardware, and the systems to work to the highest peak performance. Then, we scripted an upfront install mechanism that allows the process to be up and running in 45 minutes with little more than directing a couple of IP addresses. You plug the box in, and it comes up in 45 minutes, without you having to have knowledge about how to stand up, integrate, and qualify hardware and software together for an outcome we call a data warehouse.
Another thing we did was include Boomi, which is a connector to automatically go out and connect to the data sources that you have. It's the mechanism by which you bring data into it. And lastly, we included services, in case there were any other questions or problems you had to set it up.
If you have a limited staff, and if you have to go out and qualify new resources and things you don't understand, and then set them up and then actually run them, that’s a major challenge. We're trying to hit all of the steps, and the associated costs—time and/or personnel costs—and remove them as much as we can.
It's one way vendors like Dell are moving to democratize business intelligence a little further, bring it to a lower price point than customers are accustomed to and making it more available to firms that either didn’t have that luxury of that expertise link sitting around the office, or who found that the price point was a little too high.
Gardner: You mentioned this concept of the tool chain several times—being agnostic to the data type, holistic management, complete view, and then of course integrate it. What is it about the tool chain that accomplishes both a comprehensive value, but also allows it to be adopted on a fairly manageable path, rather than all at once?
Wolken: One of the things we find advantageous about entering the market at this point in time is that we're able to look at history, observe how other people have done things over time, and then invest in the market with the realization that maybe something has changed here and maybe a new approach is needed.
Whereas the industry has typically gone down the path of each new technology or advancement of technology requires a new tool, a new product, or a new technology solution, we’ve been able to stand back and see the need for a different approach. We just have a different point of view, which is that an agnostic tool chain can enable organizations to do more.
So when we look at database tools, as an example, we would want a tool that works against all database types, as opposed to one that works against only a single vendor or type of data.
The other thing that we look at is if you walk into an average company today, there are already a lot of things laying around the business. A lot of investment has already been made.
We wanted to be able to snap in and work with all of the existing tools. So, each of the tools that we’ve acquired, or have created inside the company, were made to step into an existing environment, recognize that there were other products already in the environment, and recognize that they probably came from a different vendor or work on a different data type.
That’s core to our strategy. We recognize that people were already facing complexity before we even came into the picture, so we’re focused on figuring out how we snap into what they already have in place, as opposed to a rip-and-replace strategy or a platform strategy that requires all of the components to be replaced or removed in order for the new platform to take its place.
What that means is tools should be agnostic, and they should be able to snap into an environment and work with other tools. Each one of the products in the tool chain we’ve assembled was designed from that point of view.
But beyond that, we’ve also assembled a tool chain in which the entirety of the chain delivers value as a whole. We think that every point where you have agnosticism or every point where you have a tool that can abstract that lower amount of complexity, you have savings.
You have a benefit, whether it’s cost savings, employee productivity, or efficiency, or the ability to keep sanctioned data and a set of tools and systems that comprehend it. The idea being that the entirety of the tool chain provides you with advantages above and beyond what the individual components bring.
Now, we're perfectly happy to help a customer at any point where they have difficultly and any point where our tools can help them, whether it's at the hardware layer, from the traditional Dell way, at the application layer, considering a data warehouse or otherwise, or at the tool layer. But we feel that as more and more of the portfolio—the tool chain—is consumed, more and more efficiency is enabled.
Gardner: It also sounds as if this sets you up for a data and information lifecycle benefits, not just on the business and BI benefits, but also on the IT benefits.
Wolken: One of the problems that you uncover is that there's a lot of data being replicated in a lot of places. One of the advantages that we've put together in the tool chain was to use virtualization as a capability, because you know where data came from and you know that it was sanctioned data. There's no reason to replicate that to disk in another location in the company, if you can just reach into that data source and pull that forward for a data analyst to utilize.
You can virtually represent that data to the user, without creating a new repository for that person. So you're saving on storage and replication costs. So if you’re looking for where is there efficiency in the lifecycle of data and how can you can cut some of those costs, that’s something that jumps right out.
Doing that, you also solve the problem of how to make sure that the data that was provisioned was sanctioned. By doing all of these things, by creating a virtual view, then providing that view back to the analyst, you're really solving multiple pieces of the puzzle at the same time. It really enables you to look at it from an information-management point of view.
Gardner: How should enterprises and mid-market firms get started?
Wolken: Most companies aren’t just out there asking how they can get a new tool chain. That's not really the strategy most people are thinking about. What they are asking is how do I get to the next stage of being an intelligent company? How do I improve my maturity in business intelligence? How would I get from Excel spreadsheets without a data warehouse to a data warehouse and centralized intelligence or sanctioned data?
Each one of these challenges come from a point of view of, how do I improve my environment based upon the goals and needs that I am facing? How do I grow up as a company and get to be more of a data-based company?
Somebody else might be faced with more specific challenges, such a line of business is now asking me for Twitter data, and we have no systems or comprehension to understand that. That's really the point where you ask, what's going to be my strategy as I grow and otherwise improve my business intelligence environment, which is morphing every year for most customers.
That's the way that most people would start, with an existing problem and an objective or a goal inside the company. Generically, over time, the approach to answering it has been you buy a new technology from a new vendor who has a new silo, and you create a new data mart or data warehouse. But this is perpetuating the idea that technology will solve the problem. You end up with more technologies, more vendor tools, more staff, and more replicated data. We think this approach has become dated and inefficient.
But if, as an organization, you can comprehend that maybe there is some complexity that can be removed, while you're making an investment, then you free yourself to start thinking about how you can build a new architecture along the way. It's about incremental improvement as well as tangible improvement for each and every step of the information management process.
So rather than asking somebody to re-architect and rip and replace their tool chain or the way they manage the information lifecycle, I would say you sort of lean into it in a way.
If you're really after a performance metric and you feel like there is a performance issue in an environment, at Dell we have a number of resources that actually benchmark and understand the performance and where bottlenecks are in systems.
Sometimes there’s an issue occurring inside the database environment. Sometimes it's at the integration layer, because integration isn’t happening as well as you think. Sometimes it's at the data warehouse layer, because of the way the data model was set up. Whatever the case, we think there is value in understanding the earlier parts of the chain, because if they’re not performing well, the latter parts of the chain can’t perform either.
And so at each step, we've looked at how you ensure the performance of the data. How do you ensure the performance of the integration environment? How do you ensure the performance of the data warehouse as well? We think if each component of the tool chain in working as well as it should be, then that’s when you enable the entirety of your solution implementation to truly deliver value.
We automatically stop accepting comments 180 days after a post is published. If you would like to know more about this subject, please contact us and we'll try to help.
Published by: electronicdawn Ltd.