By Paul S. Barth, PhD
One of the critical gating factors to leveraging big data is understanding business needs. Most managers and users have little experience with advanced analytics and complex data systems, and their requirements often fall into two ends of a spectrum:
1. A small, incremental enhancement to today’s systems and reports.
Both ends of this spectrum lead to failure—the first doesn’t deliver significant business value, the second never gets built. Big data exacerbates the challenge in two ways. First, case studies and media coverage are raising expectations for the business value big data can deliver. Second, the variety, complexity, and quality of big data can increase scope and complexity of analytic systems dramatically. Even small increases in scope can drive significant growth in cost and development time. How, then, can business articulate its needs with enough vision to have a meaningful impact and a limited scope for rapid, low-risk delivery?
Historically, systems development methodologies have started with business requirements gathering, which captures current state capabilities and new features and functionality that are desired by the business. This process assumes, rightly, that the business needs are paramount, and that systems are designed to meet these needs and deliver tangible business value. Agile development has emerged to keep tight alignment between business and technology, and validate requirements along the way.
The transformational promise of big data is that it addresses unknown business requirements. According to MIT’s Erik Brynjolfsson,
“We have had a revolution in measurement, over the past few years, that has allowed businesses to understand in much more detail what their customers are doing, what their processes are doing, what their employees are doing. That tremendous improvement in measurement is creating new opportunities to manage things differently.”
“Competing through data: Three experts offer their game plans,” McKinsey Global Institute, October 2011
In this world, data and analytics are critical to business strategy and, in turn, business requirements. New measurements provide insights into customer behavior, product effectiveness, pricing models, and service efficiencies—insights that define the real business opportunities. This turns the systems development model on its head: rather than managers describing requirements for new systems, managers use data and analytics systems to discover requirements for new business capabilities.
For example, one large bank knew that a portion of the customers using its call center would, if asked, purchase additional products and services during the call. However, it was too expensive, or inappropriate, to make an offer on every call. By analyzing billions of records of customer calls, web visits, and banking transactions, the bank identified the subset of customers most open to a sales discussion. Using this intelligence, the bank routed likely purchasers to specially trained representatives, whose sales increased 300%.
Without the analytics beforehand, the business requirements would be based on experience or intuitive rules, such as “Route all high-value customers to sales-ready representatives.” But this rule assumes that high-value customers are the best sales prospects, an assumption that data analysis proved false. Note that if sales for this segment had increased because of other factors, management would have been misled into believing the rule was true. This highlights the importance of data analytics in developing business requirements: deep behavioral analytics are essential beforehand to segment customers accurately for differential treatment, and analytics are necessary afterward to understand what is working and what is not.
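The check on an intuitive rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the bank's analysis: the call records are toy data invented for the example, and the field names (`high_value`, `recent_web_visit`, `purchased`) and the `lift` helper are assumptions introduced here.

```python
from collections import defaultdict

# Toy call records, entirely hypothetical and for illustration only.
CALLS = [
    {"high_value": True,  "recent_web_visit": False, "purchased": False},
    {"high_value": True,  "recent_web_visit": False, "purchased": False},
    {"high_value": True,  "recent_web_visit": True,  "purchased": True},
    {"high_value": True,  "recent_web_visit": False, "purchased": False},
    {"high_value": False, "recent_web_visit": True,  "purchased": True},
    {"high_value": False, "recent_web_visit": True,  "purchased": True},
    {"high_value": False, "recent_web_visit": False, "purchased": False},
    {"high_value": False, "recent_web_visit": True,  "purchased": False},
]


def purchase_rate_by(records, key):
    """Return {group value: share of that group's calls that ended in a purchase}."""
    totals = defaultdict(int)
    purchases = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        purchases[record[key]] += int(record["purchased"])
    return {group: purchases[group] / totals[group] for group in totals}


def lift(records, key):
    """Purchase rate of the flagged group divided by the overall purchase rate."""
    overall = sum(r["purchased"] for r in records) / len(records)
    return purchase_rate_by(records, key)[True] / overall


# Compare the intuitive rule with a behavioral signal found in the data.
rule_lift = lift(CALLS, "high_value")
signal_lift = lift(CALLS, "recent_web_visit")
```

In this toy data the intuitive rule has a lift below 1 (its segment buys less often than average), while the behavioral signal has a lift above 1. Running the same comparison after a rollout is one simple way to see whether a routing rule is working or merely coinciding with other factors.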
Business Insights Sandbox
To enable iterative business discovery, a new type of analytic environment must be created: the business insights sandbox. The sandbox is a place where business analysts and data scientists can quickly access and integrate enterprise data with external data, define new business metrics, and mine the data for latent trends and patterns. Basic business questions should be answerable within an hour; complex analyses shouldn’t take longer than a few days.
The main characteristics of the business insights sandbox are:
- Ready access to high-quality, documented enterprise data
- Pre-configured business intelligence, visualization, and data mining tools supporting business analysts, data analysts, and data scientists (e.g., statisticians and data miners)
- Large user data work spaces for loading external data, integrating data, and creating new derived business metrics for analysis and testing
- Analyses, definitions for business metrics, data quality and integration logic, and models produced in the sandbox are used as requirements for production systems. The sandbox itself is not used in any production processes.
This last point distinguishes this environment from a data warehouse, data mart, operational data store, or master data management hub, although these components simplify the creation and maintenance of the sandbox. Separating discovery processes from production responsibilities simplifies both environments. Because discovery may require hundreds of analyses to gain an insight, speed and flexibility are more important than scale and robustness. The data governance and SLAs surrounding the sandbox must be “light,” so that individuals can load and integrate data, define new business logic, and work with moderate-quality data while analyzing a hypothesis. As insights emerge, the logic and data quality are refined to ensure the accuracy of the insight. Unlike production systems, the sandbox needs to support a Darwinian process of trial and error, testing alternatives and variations until the best model emerges. The overhead of production data standards, common data models, and rigid quality controls would slow the iterative pace of discovery to a crawl.
That said, production systems and data sources are a valuable foundation for the sandbox. Our clients often employ a “virtual sandbox” architecture where a data warehouse or mart is extended with a domain supporting the sandbox. This domain can query the warehouse data, but not change it. Conversely, external data can be loaded into the domain and joined with the warehouse schema, but the external data is invisible to other warehouse users. This approach provides cost and performance benefits, but, more important, it ensures that analyses done in the sandbox stay aligned with production data.
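The “virtual sandbox” pattern can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: SQLite stands in for the warehouse platform (the article names no technology), and the `customers` and `external_scores` tables, their columns, and the helper functions are hypothetical. The sandbox attaches the warehouse read-only, keeps external data in its own private schema, and joins the two.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_demo_warehouse(path):
    """Create a tiny stand-in for the production warehouse (hypothetical schema)."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT)")
    con.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "retail"), (2, "premier"), (3, "retail")],
    )
    con.commit()
    con.close()


def open_sandbox(warehouse_path):
    """Open a private in-memory sandbox with the warehouse attached read-only."""
    # uri=True lets the ATTACH statement accept a file: URI with mode=ro.
    sandbox = sqlite3.connect(":memory:", uri=True)
    read_only_uri = Path(warehouse_path).resolve().as_uri() + "?mode=ro"
    sandbox.execute("ATTACH DATABASE ? AS warehouse", (read_only_uri,))
    return sandbox


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        warehouse_path = Path(tmp) / "warehouse.db"
        build_demo_warehouse(warehouse_path)
        sandbox = open_sandbox(warehouse_path)

        # External data lives only in the sandbox's own schema.
        sandbox.execute("CREATE TABLE external_scores (customer_id INTEGER, propensity REAL)")
        sandbox.executemany("INSERT INTO external_scores VALUES (?, ?)", [(1, 0.8), (3, 0.2)])
        sandbox.commit()

        # The sandbox can join external data with warehouse tables.
        joined = sandbox.execute(
            "SELECT c.customer_id, c.segment, e.propensity "
            "FROM warehouse.customers AS c "
            "JOIN external_scores AS e ON e.customer_id = c.customer_id "
            "ORDER BY c.customer_id"
        ).fetchall()

        # Writes against the warehouse are rejected because it is attached read-only.
        try:
            sandbox.execute("DELETE FROM warehouse.customers")
            warehouse_writable = True
        except sqlite3.OperationalError:
            warehouse_writable = False
        sandbox.close()

        # Another warehouse user sees only the warehouse's own tables.
        other_user = sqlite3.connect(warehouse_path)
        visible_tables = [
            row[0]
            for row in other_user.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        ]
        other_user.close()
    return joined, warehouse_writable, visible_tables


joined, warehouse_writable, visible_tables = demo()
```

A real warehouse would typically enforce the same separation with schema-level grants rather than a read-only file attachment; the sketch only shows the shape of the arrangement.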
Garnering value from business insights requires putting them into production, and this alignment is critical for generating business requirements. The analytic outputs of the sandbox are the foundation for solid, complete business requirements, including:
- the data sources needed
- cleansing, transformation, and integration logic
- business logic defining new metrics
- algorithms for segmentation and prediction
Because these requirements were developed through use of production data, they go much further toward a specification of the solution and will accelerate the delivery of a quality solution. Also, new data and business logic developed in the sandbox will narrow the scope of sourcing and integrating additional data to just what is needed to put the insight into production.
Putting Data in the Driver’s Seat
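One way to picture these outputs is as a structured record handed from the sandbox to the delivery team. The sketch below is a hypothetical illustration, not a prescribed format: the `InsightRequirements` class, its field names, and the sample entries are assumptions made up for this example, loosely based on the call-center case described earlier.

```python
from dataclasses import dataclass, field


@dataclass
class InsightRequirements:
    """Requirements captured from one sandbox analysis (hypothetical structure)."""

    name: str
    data_sources: list = field(default_factory=list)
    cleansing_logic: list = field(default_factory=list)
    metric_definitions: dict = field(default_factory=dict)
    models: list = field(default_factory=list)

    def missing_sections(self):
        """List the sections that are still empty, in declaration order."""
        sections = ("data_sources", "cleansing_logic", "metric_definitions", "models")
        return [section for section in sections if not getattr(self, section)]


# Illustrative entries only; a real specification would come out of the analysis itself.
spec = InsightRequirements(
    name="call-center sales routing",
    data_sources=["call records", "web visits", "banking transactions"],
    cleansing_logic=["deduplicate customers across channels"],
    metric_definitions={"purchase_rate": "purchases / offers made"},
)
gaps = spec.missing_sections()
```

Here `gaps` flags that no segmentation or prediction model has been recorded yet, which is the kind of completeness check a team might run before moving an insight toward production.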
Positioning big data and analytics as a source of business insight, strategy, and requirements disrupts many current organizational paradigms. Most companies do not have experience using data to drive decisions, and those that do often put their data and analytics capabilities in a corner, making it hard to apply discoveries in production systems. Historically, says Brynjolfsson, “data was used more to confirm and support decisions that had already been made, rather than to learn new things and to discover the right answer.” That paradigm is changing, and the change affects more than organizational structure: it demands new roles, skills, incentives, and business processes. Indeed, top executives often have to replace senior management to enable the changes for a data-driven strategy fueled by analytics. But they are rewarded with a powerful new method for discovering competitive opportunities and putting them into production.
About the Author
Paul Barth, PhD, Managing Partner/Founder, NewVantage Partners
Paul Barth co-founded NewVantage Partners in 2001. Paul brings decades of experience as a consultant to the nation’s largest companies. He is a recognized thought leader and practitioner in leveraging information as a strategic asset and in emerging approaches and best practices in data management. Dr. Barth was founder and CTO of Tessera Enterprise Systems, a nationally known systems integration firm formed in 1995. He became CTO of iXL Enterprises, a leading international Internet integration firm, following the merger of Tessera/iXL in December 1999. He holds a PhD in computer science from MIT and an MS from Yale University. Paul was formerly vice president of technology at Epsilon Data Management (an American Express company), and held senior technology positions at Thinking Machines and Schlumberger.