By: Dana Gardner, Principal Analyst, Interarbor Solutions
Published: 11th July 2013
Copyright Interarbor Solutions © 2013
Debunking myths around big data should be a first step to making better business decisions for improving data analysis and data management capabilities in your company.
As the volume and purpose of data and business intelligence (BI) have dramatically shifted, older notions and misconceptions (what amount to myths about data infrastructure) need to be updated and corrected, too.
So we're here to pose some better questions about data, and provide up-to-date answers for running data-driven businesses that can efficiently and repeatedly predict dynamic market trends and customer wants in real time.
As the volume and types of data brought to bear on business analytics advance, the means to manage and exploit that sea of data need to be neither too costly nor too complex for mid-size companies to master. There are better ways than traditional data architectures.
To help identify what works best around modern big data management, BriefingsDirect interviews Darin Bartik, Executive Director of Products in the Information Management Group at Dell Software. The discussion is conducted by Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: Dell is a sponsor of BriefingsDirect podcasts.]
Here are some excerpts:
Gardner: Are people losing sight of the business value by getting lost in speeds and feeds and technical jargon around big data? Is there some sort of a disconnect between the providers and consumers of big data?
Bartik: You hit the nail on the head with the first question. We are experiencing a disconnect between the technical side of big data and the business value of big data, and that’s happening because we’re digging too deeply into the technology.
With a term like big data, or any one of the trends that the information technology industry talks about so much, we tend to think about the technical side of it. But with analytics, with the whole conversation around big data—what we've been stressing with many of our customers—is that it starts with a business discussion. It starts with the questions that you're trying to answer about the business; not the technology, the tools, or the architecture of solving those problems. It has to start with the business discussion.
That’s a pretty big flip. The traditional approach to BI and reporting has been one of technology frameworks, and a lot of things that were owned more by the IT group. This is part of the reason why a lot of the BI projects of the past struggled, because there was a disconnect between the business goals and the IT methods.
So you're right. There has been a disconnect, and that’s what I've been trying to talk a lot about with customers—how to refocus on the business issues you need to think about, especially in the mid-market, where you maybe don’t have as many resources at hand. It can be pretty confusing.
I've been a part of Dell Software since the acquisition of Quest Software. I was a part of that organization for close to 10 years. I've been in technology coming up on 20 years now. I spent a lot of time in enterprise resource planning (ERP), supply chain, and monitoring, performance management, and infrastructure management, especially on the Microsoft side of the world.
Most recently, as part of Quest, I was running the database management area—a business very well-known for its products around Oracle, especially Toad, as well as our SQL Server management capabilities. We leveraged that expertise when we started to evolve into BI and analytics.
I started working with Hadoop back in 2008–2009, when it was still very foreign to most people. When Dell acquired Quest, I came in and had the opportunity to take over the Products Group in the ever-expanding world of information management. We're part of the Dell Software Group, which is a big piece of the strategy for Dell overall, and I'm excited to be here.
Without disparaging the vendors like us, or anyone else, the current confusion is part of the problem of any hype cycle. Many people jumped on the bandwagon of big data. Just like everyone was talking cloud. Everyone was talking virtualization, bring your own device (BYOD), and so forth.
Everyone jumps on these big trends. So it's very confusing for customers, because there are many different ways to come at the problem. This is why I keep bringing people back to staying focused on what the real opportunity is. It’s a business opportunity, not a technical problem or a technical challenge that we start with.
Gardner: Even the name 'big data' stirs up myths right from the get-go, with 'big' being a very relative term. Should we only be concerned about this when we have more data than we can manage? What is the relative position of big data and what are some of the myths around the size issue?
Bartik: That’s the perfect one to start with. The first word in the definition is actually part of the problem. "Big." What does big mean? Is there a certain threshold of petabytes that you have to get to? Or, if you're dealing with petabytes, is it not a problem until you get to exabytes?
It’s not a size issue. When I think about big data, it's really a trend that has happened as a result of digitizing so much more of the information that we all have already and that we all produce. Machine data, sensor data, all the social media activities, and mobile devices are all contributing to the proliferation of data.
It's added a lot more data to our universe, but the real opportunity is to look for small elements of small datasets and look for combinations and patterns within the data that help answer those business questions that I was referencing earlier.
It's not necessarily a scale issue. What is a scale issue is when you get into some of the more complicated analytical processes and you need a certain data volume to make it statistically relevant. But what customers first want to think about is the business problems that they have. Then, they have to think about the datasets that they need in order to address those problems.
That may not be huge data volumes. You mentioned mid-market earlier. When we think about some organizations moving from gigabytes to terabytes, or doubling data volumes, that’s a big data challenge in and of itself.
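To put a rough number on "a certain data volume," here is a minimal sketch using the standard sample-size formula for estimating a proportion, n = z² · p · (1 − p) / e². It is an illustration rather than anything discussed in the interview: the function name, the default 95 percent confidence level (z = 1.96), and the worst-case proportion of 0.5 are all assumptions, and the formula presumes a simple random sample.

```python
import math


def required_sample_size(margin_of_error, z_score=1.96, expected_proportion=0.5):
    """Approximate observations needed to estimate a proportion.

    Standard formula: n = z^2 * p * (1 - p) / e^2, rounded up.
    The defaults (95% confidence, p = 0.5) are illustrative assumptions.
    """
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    if not 0 < expected_proportion < 1:
        raise ValueError("expected_proportion must be between 0 and 1")
    variance = expected_proportion * (1 - expected_proportion)
    n = (z_score ** 2) * variance / (margin_of_error ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    # A tighter margin of error needs more rows, but still only hundreds
    # or thousands of observations, not petabytes.
    for margin in (0.05, 0.03):
        print(margin, required_sample_size(margin))
```

Under these assumptions, a question that can be answered from a sample needs only a few hundred to a few thousand well-chosen observations, which fits the point that the business question, not raw volume, determines how much data is enough.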
Analyzing big data won't necessarily contribute to your solving your business problems if you're not starting with the right questions. If you're just trying to store more data, that’s not really the problem that we have at hand. That’s something that we can all do quite well with current storage architectures and the evolving landscape of hardware that we have.
We all know that we have growing data, but the exact size, the exact threshold that we may cross, that’s not the relevant issue.
Gardner: I suppose this requires prioritization, which has to come from the business side of the house. As you point out, some statistically relevant data might be enough. If you can extrapolate and you have enough to do that, fine, but there might be other areas where you actually want to get every possible bit of relevant data or information, because you don't know what you're looking for. They are the unknown unknowns. Perhaps there's some mythology about all data. It seems to me that what’s important is the right data to accomplish what it is the business wants.
Bartik: Absolutely. If your business challenge is an operational efficiency or a cost problem, where you have too much cost in the business and you're trying to pull out operational expense and not spend as much on capital expense, you can look at your operational data.
Maybe manufacturers are able to do that and analyze all of the sensor, machine, manufacturing line, and operational data. That's a very different type of data and a very different type of approach than looking at it in terms of sales and marketing.
If you're a retailer looking for a new set of customers or new markets to enter in terms of geographies, you're going to want to look at maybe census data and buying behavior data of the different geographies. Maybe you want datasets that are outside your organization entirely. You may not have the data in your hands today. You may have to pull it in from outside resources. So there's a lot of variability and prioritization that all starts with that business issue that you're trying to address.
Gardner: Perhaps it's better for the business to identify the important data, rather than the IT people saying it’s too big or that big means we need to do something different. It seems like a business term rather than a tech term at this point.
Bartik: I agree with you. The more we can focus on bringing business and IT to the table together to tackle this challenge, the better. And it does start with the executive management in the organization trying to think about things from that business perspective, rather than starting with the IT infrastructure management team.
Gardner: What’s our second myth?
Bartik: I'd think about the idea of people and the skills needed to address this concept of big data. There is the term "data scientist" that has been thrown out all over the place lately. There’s a lot of discussion about how you need a data scientist to tackle big data. But big data isn't necessarily the way you should think about what you’re trying to accomplish. Instead, think about things in terms of being more data driven, and in terms of getting the data you need to address the business challenges that you have. That’s not always going to require the skills of a data scientist.
I suspect that a lot of organizations would be happy to hear something like that, because data scientists are very rare today, and they're very expensive, because they are rare. Only certain geographies and certain industries have groomed the true data scientist. That's a unique blend between a data engineer and someone like an applied scientist, who can think quite differently than just a traditional BI developer or BI programmer.
Don’t get stuck on thinking that, in order to take on a data-driven approach, you have to go out and hire a data scientist. There are other ways to tackle it. That’s where you're going to combine people who can do the programming around your information, around the data management principles, and the people who can ask and answer the open-minded business questions. It doesn’t all have to be encapsulated into that one magical person that’s known now as the data scientist.
There are varying degrees of tackling this problem. You can get into very sophisticated algorithms and computations for which a data scientist may be the one to do that heavy lifting. But for many organizations and customers that we talk to every day, it’s something where they're taking on their first project and they are just starting to figure out how to address this opportunity.
For that, you can use a lot of the people that you have inside your organization, as well as, potentially, consultants that can just help you break through some of the old barriers, such as thinking about intelligence, based strictly on a report and a structured dashboard format.
That’s not the type of approach we want to take nowadays. So often a combination of programming and some open-minded thinking, done with a team-oriented approach, rather than that single keyhole person, is more than enough to accomplish your objectives.
Gardner: It seems also that you're identifying confusion on the part of some to equate big data with BI and BI with big data. The data is a resource that BI can use to offer certain values, but big data can be applied to doing a variety of other things. Perhaps we need to have a sub-debunking within this myth, and that is that big data and BI are different. How would you define them and separate them?
Bartik: That's a common myth. If you think about BI in its traditional, generic sense, it’s about gaining more intelligence about the business, which is still the primary benefit of the opportunity this trend of big data presents to us. Today, I think they're distinct, but over time, they will come together and become synonymous.
I equate it back to one of the more recent trends that came right before big data, cloud. In the beginning, most people thought cloud was the public cloud concept. What’s turned out to be true is that it’s more of a private cloud or a hybrid cloud, where not everything moved from a traditional on-premises model to a highly scalable, highly elastic public cloud. It’s very much a mix.
They've kind of come together. So while cloud and traditional data centers are the new infrastructure, it’s all still infrastructure. The same is true for big data and BI, where BI, in the general sense of how can we gain intelligence and make smarter decisions about our business, will include the concept of big data.
So while we'll be using new technologies, which would include Hadoop, predictive analytics, and other things that have been driven so much faster by the trend of big data, we’ll still be working back to that general purpose of making better decisions.
One of the reasons they're still different today is because we’re still breaking some of the traditional mythology and beliefs around BI—that BI is all about standard reports and standard dashboards, driven by IT. But over time, as people think about business questions first, instead of thinking about standard reports and standard dashboards first, you’ll see that convergence.
Gardner: We probably need to start thinking about BI in terms of a wider audience, because all the studies I've seen don't show all that much confidence and satisfaction in the way BI delivers the analytics or the insights that people are looking for. So I suppose it's a work in progress when it comes to BI as well.
Bartik: Two points on that. There has been a lot of disappointment around BI projects in the past. They've taken too long, for one. They've never really been finished, which of course, is a problem. And for many of the business users who depend on the output of BI—their reports, their dashboard, their access to data—it hasn’t answered the questions in the way that they may want it to.
One of the things in front of us today is a way of thinking about it differently. Not only is there so much data, and so much opportunity now to look at that data in different ways, but there is also a requirement to look at it faster and to make decisions faster. So it really does break the old way of thinking.
Slowness is unacceptable. Standard reports don't come close to addressing the opportunity in front of us, which is to ask a business question and answer it with the new way of thinking supported by pulling together different datasets. That’s fundamentally different from the way we used to do it.
People are trying to make decisions about moving the business forward, and they're being forced to do it faster. Historical reporting just doesn't cut it. It’s not enough. They need something that’s much closer to real time. It’s more important to think about open-ended questions, rather than just say, "What revenue did I make last month, and what products made that up?" There are new opportunities to go beyond that.
Gardner: When it comes to these technology issues, do you also find, Darin, that there is a lack of creativity as to where the data and information reside, and about thinking not so much about being able to run it, but rather acquire it? Is there a dissonance between the data I have and the data I need? How are people addressing that?
Bartik: There is and there isn’t. When we look at the data that we have, that’s oftentimes a great way to start a project like this, because you can get going faster and it’s data that you understand. But if you think that you have to get data from outside the organization, or you have to get new datasets in order to answer the question that’s in front of us, then, again, you're going in with a predisposition to a myth.
You can start with data that you already have. You just may not have been looking at the data that you already have in the way that’s required to answer the question in front of you. Or you may not have been looking at it at all. You may have just been storing it, but not doing anything with it.
Storing data doesn’t help you answer questions. Analyzing it does. It seems kind of simple, but so many people think that big data is a storage problem. I would argue it's not about the storage. It’s like backup and recovery. Backing up data is not that important, until you need to recover it. Recovery is really the game-changing thing.
Gardner: It’s interesting that with these myths, people have tended, over the years, without having the resources at hand, to shoot from the hip and second-guess. People who are good at that and businesses that have been successful have depended on some luck and intuition. In order to take advantage of big data, which should lead you to not having to make educated guesses, but to have really clear evidence, you can apply the same principle. It's more how you get big data in place, than how you would use the fruits of big data.
It seems like a cultural shift we have to make. Let’s not jump to conclusions. Let’s get the right information and find out where the data takes us.
Bartik: You've hit on one of the biggest things that’s in front of us over the next three to five years—the cultural shift that the big data concept introduces.
We looked at traditional BI as more of an IT function, where we were reporting back to the business. The business told us exactly what they wanted, and we tried to give that to them from the IT side of the fence.
But being successful today is less about intuition and more about being a data-driven organization, and, for that to happen, I can't stress this one enough, you need executives who are ready to make decisions based on data, even if the data may be counterintuitive to what their gut says and what their 25 years of experience have told them.
They're in a position of being an executive primarily because they have a lot of experience and have had a lot of success. But many of our markets are changing so frequently and so fast, because of new customer patterns and behaviors, because of new ways of customers interacting with us via different devices. Just think of the different ways that the markets are changing. So much of that historical precedent no longer really matters. You have to look at the data that’s in front of us.
Because things are moving so much faster now, new markets are being penetrated and new regions are open to us. We're so much more of a global economy. Things move so much faster than they used to. If you're depending on gut feeling, you'll be wrong more often than you'll be right. You do have to depend on as much of a data-driven decision as you can. The only way to do that is to rethink the way you're using data.
Historical reports that tell you what happened 30 days ago don't help you make a decision about what's coming out next month, given that your competition just introduced a new product today. It's just a different mindset. So that cultural shift of being data-driven and going out and using data to answer questions, rather than using data to support your gut feeling, is a very big shift that many organizations are going to have to adapt to.
Executives who get that and drive it down into the organization, those are the executives and the teams that will succeed with big data initiatives, as opposed to those that have to do it from the bottom up.
Gardner: Listening to you, Darin, I can tell one thing that isn’t a product of hype is just how important this all is. Getting big data right, doing that cultural shift, recognizing trends based on the evidence and in real time as much as possible is really fundamental to how well many businesses will succeed or not.
So it's not hype to say that big data is going to be a part of your future and it's important. Let's move towards how you would start to implement or change or rethink things, so that you don't fall prey to these myths, but actually take advantage of the technologies, the reduction in costs for many of the infrastructures, and perhaps extend and exploit BI and big data problems.
Bartik: It's fair to say that big data is not just a trend; it's a reality. And it's an opportunity for most organizations that want to take advantage of it. It will be a part of your future. It's either going to be part of your future, or it's going to be a part of your competition’s future, and you're going to be struggling as a result of not taking advantage of it.
The first step that I would recommend (I've said it a few times already, but I don't think it can be said too often) is to pick a project that's going to address a business issue that you've been unable to address in the past.
What are the questions that you need to ask and answer about your business that will really move you forward? Not just, "What data do we want to look at?" That's not the question.
The question is what business issue do we have in front of us that will take us forward the fastest? Is it reducing costs? Is it penetrating a new regional market? Is it penetrating a new vertical industry, or evolving into a new customer set?
These are the kind of questions we need to ask and the dialogue that we need to have. Then let's take the next step, which is getting data and thinking about the team to analyze it and the technologies to deploy. But that's the first step—deciding what we want to do as a business.
That sets you up for that cultural shift as well. If you start at the technology layer, if you start at the level of let's deploy Hadoop or some type of new technology that may be relevant to the equation, you're starting backwards. Many people do it, because it's easier to do that than it is to start an executive conversation and to start down the path of changing some cultural behavior. But it doesn’t necessarily set you up for success.
Gardner: It sounds as if you know you're going on a road trip and you get yourself a Ferrari, but you haven't really decided where you're going to go yet, so you didn’t know that you actually needed a Ferrari.
Bartik: Yeah. And it's not easy to get a tent inside a Ferrari. So you have to decide where you're going first. It's a very good analogy.
Gardner: What are some of the other ways when it comes to the landscape out there? There are vendors who claim to have it all, everything you need for this sort of thing. It strikes me that this is more of an early period and that you would want to look at a best-of-breed approach or an ecosystem approach.
So are there any words of wisdom in terms of how to think about the assets, tools, approaches, platforms, what have you, or not to limit yourself in a certain way?
Bartik: There are countless vendors that are talking about big data and offering different technology approaches today. Based on the type of questions that you're trying to answer, whether it's more of an operational issue, a sales market issue, HR, or something else, there are going to be different directions that you can go in, in terms of the approaches and the technologies used.
I encourage the executives, both on the line-of-business side as well as the IT side, to go to some of the events that are the 'un-conferences', where we talk about the big-data approach and the technologies. Go to the other events in your industry where they're talking about this and learn what your peers are doing. Learn from some of the mistakes that they've been making or some of the successes that they've been having.
There's a lot of success happening around this trend. Some people certainly are falling into the pitfalls, but get smart by going to your peers and going to your industry influencer groups and learning more about how to approach this.
There are technical approaches that you can take. There are different ways of storing your data. There are different ways of computing and processing your data. Then, of course, there are different analytical approaches that get more to the open-ended investigation of data. There are many tools and many products out there that can help you do that.
Dell has certainly gone down this road and is investing quite heavily in this area, with both structured and unstructured data analysis, as well as the storage of that data. We're happy to engage in those conversations as well, but there are a lot of resources out there that really help companies understand and figure out how to attack this problem.
Gardner: In the past, with many of the technology shifts, we've seen a tension and a need for decision around best-of-breed versus black box, or open versus entirely turnkey, and I'm sure that's going to continue for some time.
But one of the easier ways or best ways to understand how to approach some of those issues is through some examples. Do we have any use cases or examples that you're aware of, of actual organizations that have had some of these problems? What have they put in place, and what has worked for them?
Bartik: I'll give you a couple of examples from two very different types of organizations, neither of which is a huge organization. The first one is a retail organization, Guess Jeans. The business issue they were tackling was, “How do we get more sales in our retail stores? How do we get each individual that's coming into our store to purchase more?”
We sat down and started thinking about the problem. We asked what data would we need to understand what’s happening? We needed data that helps us understand the buyer’s behavior once they come into the store. We don't need data about what they are doing outside the store necessarily, so let's look specifically at behaviors that take place once they get into the store.
We helped them capture and analyze video monitoring information. Basically, it followed each person in the store and tracked their location and behavior inside the store. We tracked that data and then compared it against questions like did they buy, what did they buy, and how much did they buy. We were able to help them determine that if you get the customer into a dressing room, you're going to be about 50 percent more likely to close transactions with them.
So rather than trying to give incentives to come into the store or give discounts once they get into the store, they moved towards helping the store clerks, the people who ran the store and interacted with the customers, focus on getting those customers into a dressing room. That itself is a very different answer than what they might have thought of at first. It seems easy after you think about it, but it really did make a significant business impact for them in rather short order.
Now, they're also thinking about other business challenges that they have and other ways of analyzing data and other datasets, based on different business challenges, but that’s one example.
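The comparison described in the retail example (did shoppers who reached a dressing room buy more often than those who did not?) can be sketched in a few lines. This is a hypothetical illustration, not the analysis that was actually run: the `Visit` record, the function names, and the sample counts are all made up, and the counts were chosen only so that the result mirrors the "about 50 percent" figure quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """One in-store visit (hypothetical record layout)."""
    entered_dressing_room: bool
    purchased: bool


def conversion_rate(visits):
    """Share of visits that ended in a purchase."""
    if not visits:
        raise ValueError("need at least one visit")
    return sum(v.purchased for v in visits) / len(visits)


def dressing_room_lift(visits):
    """Return (rate with dressing room, rate without, relative lift)."""
    tried = [v for v in visits if v.entered_dressing_room]
    browsed = [v for v in visits if not v.entered_dressing_room]
    rate_tried = conversion_rate(tried)
    rate_browsed = conversion_rate(browsed)
    if rate_browsed == 0:
        raise ValueError("baseline conversion rate is zero; lift is undefined")
    return rate_tried, rate_browsed, rate_tried / rate_browsed - 1.0


# Invented sample: 20 visits with a dressing-room stop (15 purchases)
# and 20 visits without one (10 purchases).
SAMPLE_VISITS = (
    [Visit(True, True)] * 15
    + [Visit(True, False)] * 5
    + [Visit(False, True)] * 10
    + [Visit(False, False)] * 10
)

if __name__ == "__main__":
    tried, browsed, lift = dressing_room_lift(SAMPLE_VISITS)
    print(f"with dressing room: {tried:.0%}, without: {browsed:.0%}, lift: {lift:.0%}")
```

A gap like this shows correlation only. Deciding that clerks should steer customers toward dressing rooms is still a business judgment, which is the point of starting from the business question.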
Another example is on the higher education side. In universities, one of the biggest challenges is having students drop out or reduce their class load. When students take fewer classes or drop out entirely, it obviously goes right to the top and bottom line of the organization, because it reduces tuition, as well as the other extraneous expenses that students incur at the university.
The University of Kentucky went on an effort to reduce students dropping out of classes or dropping entirely out of school. They looked at a series of datasets, such as demographic data, class data, the grades that they were receiving, what their attendance rates were, and so forth. They analyzed many different data points to determine the indicators of a future drop out.
Now, just raising the student retention rate by one percent would in turn mean about $1 million of top-line revenue to the university. So this was pretty important. And in the end, they were able to narrow it down to a couple of variables that strongly indicated which students were at risk, such that they could then proactively intervene with those students to help them succeed.
The key is that they started with a very specific problem. They started it from the university's core mission: to make sure that the students stayed in school and got the best education, and that's what they are trying to do with their initiative. It turned out well for them.
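As a rough sketch of how a team might narrow many data points down to a couple of strong indicators, the snippet below ranks candidate yes/no indicators by how much the dropout rate differs between students who have the indicator and those who do not. Everything here is assumed for illustration: the indicator names, the eight student records, and the ranking rule are invented, and the interview does not say which variables or method the university actually used.

```python
def dropout_rate(records):
    """Share of student records marked as dropped out."""
    return sum(r["dropped_out"] for r in records) / len(records)


def dropout_rate_gap(records, indicator):
    """Dropout rate among flagged students minus the rate among the rest."""
    flagged = [r for r in records if r[indicator]]
    others = [r for r in records if not r[indicator]]
    if not flagged or not others:
        raise ValueError(f"indicator {indicator!r} does not split the records")
    return dropout_rate(flagged) - dropout_rate(others)


def rank_indicators(records, indicators):
    """Sort indicators from largest to smallest absolute dropout-rate gap."""
    gaps = [(name, dropout_rate_gap(records, name)) for name in indicators]
    return sorted(gaps, key=lambda pair: abs(pair[1]), reverse=True)


def student(low_attendance, failing_grade, lives_off_campus, dropped_out):
    """Build one hypothetical student record."""
    return {
        "low_attendance": low_attendance,
        "failing_grade": failing_grade,
        "lives_off_campus": lives_off_campus,
        "dropped_out": dropped_out,
    }


# Invented records, purely for illustration.
SAMPLE_STUDENTS = [
    student(True, True, False, True),
    student(True, False, True, True),
    student(True, True, True, True),
    student(True, False, False, False),
    student(False, True, True, False),
    student(False, False, False, False),
    student(False, False, True, False),
    student(False, False, False, False),
]

CANDIDATES = ["lives_off_campus", "failing_grade", "low_attendance"]

if __name__ == "__main__":
    for name, gap in rank_indicators(SAMPLE_STUDENTS, CANDIDATES):
        print(f"{name}: {gap:+.2f}")
```

On real data a team would want far more records and a proper statistical model, but even a simple ranking like this keeps the work tied to one specific question: which students are at risk, so someone can intervene early.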
These were very different organizations or business types, in two very different verticals, and again, neither is a huge organization with seas of data. But what they did is much more manageable and much more tangible, the kind of example many of us can apply to our own businesses.
Gardner: Those really demonstrate how asking the right questions is so important.
Darin, we're almost out of time, but I did want to see if we could develop a little bit more insight into the Dell Software road map. Are there some directions that you can discuss that would indicate how organizations can better approach these problems and develop some of these innovative insights in business?
Bartik: A couple of things. We've been in the business of data management, database management, and managing the infrastructure around data for well over a decade. Dell has assembled a group of companies, as well as a lot of organic development, based on their expertise in the data center for years. What we have today is a set of capabilities that help customers take more of a data-type agnostic view and a vendor agnostic view to the way they're approaching data and managing data.
You may have 15 tools around BI. You may have tools to look at your Oracle data, maybe new sets of unstructured data, and so forth. And you have different infrastructure environments set up to house that data and manage it. But the problem is that it's not helping you bring the data together and cross boundaries across data types and vendor toolset types, and that's the challenge that we're trying to help address.
We've introduced tools to help bring data together from any database, regardless of where it may be sitting, whether it's a data warehouse, a traditional database, a new type of database such as Hadoop, or some other type of unstructured data store.
We want to bring that data together and then analyze it. Whether you're looking at more of a traditional structured-data approach and you're exploring data and visualizing datasets that many people may be working with, or doing some of the more advanced things around unstructured data and looking for patterns, we’re focused on giving you the ability to pull data from anywhere.
We're investing very heavily, Dana, into the Hadoop framework to help customers do a couple of key things. One is helping the people that own data today, the database administrators, data analysts, the people that are the stewards of data inside of IT, advance their skills to start using some of these new technologies, including Hadoop.
It's been something that we have done for a very long time, making your C players B players, and your B players A players. We want to continue to do that, leverage their existing experience with structured data, and move them over into the unstructured data world as well.
The other thing is that we're helping customers manage data in a much more pragmatic way. So if they are starting to use data that is in the cloud, via Salesforce.com or Taleo, but they also have data on-premises sitting in traditional data stores, how do we integrate that data without completely changing their infrastructure requirements? With capabilities that Dell Software has today, we can help integrate data no matter where it sits and then analyze it based on that business problem.
We help customers approach it more from a pragmatic view, where you're taking a stepwise approach. We don't expect customers to pull out their entire BI and data-management infrastructure and rewrite it from scratch on day one. That's not practical. It's not something we would recommend. Take a stepwise approach. Maybe change the way you're integrating data. Change the way you're storing data. Change, in some perspective, the way you're analyzing data between IT and the business, and have those teams collaborate.
But you don't have to do it all at one time. Take that stepwise approach. Tackle it from the business problems that you're trying to address, not just the new technologies we have in front of us.
There's much more to come from Dell in the information management space. It will be very interesting for us and for our customers to tackle this problem together. We're excited to make it happen.
Published by: electronicdawn Ltd.