
The Three Techniques for Improving Analytics ROI in the Cloud

By Maurice Lacroix


In an industry as competitive as eCommerce retail, the ability to turn data into actionable insights presents the opportunity to make business decisions that drive more revenue and control costs. Collecting and then analyzing retail data like customer visits, logistics fulfillment, pricing, and customer satisfaction presents a multitude of challenges; overcoming them can be the difference between a good business and a category leader.

It’s my responsibility as the business intelligence product owner at my organization to help our business become truly data-driven. Today we are the leading online retailer in the Netherlands and Belgium, with over 11 million customers, 23 million items, and over 40,000 partners selling their products. Our 2,000 employees analyze steadily growing data from over 250 data sources across 3,000 workbooks. It’s my job to make sure that all of that data can be analyzed to provide the insight the business needs to make decisions.

Starting from one Oracle BI stack and now fully deployed in the cloud, I’ve been a part of a central BI team that has learned a lot about how to support the voracious appetite of the business analyst. Over our years of growth and evolution, we’ve identified three critical focal points that every business should consider when satiating the business’s thirst for data: the right technology, monitoring usage, and continuous improvement.

The Right Technology

Enterprise companies face the challenge of overcoming the limitations of legacy technology while providing the performance necessary to drill into data at scale. The full analytics stack relies on three components: a data warehouse that can support the capacity demands of the business, a modeling platform that provides consistent data definitions analysts can use to drill into data, and a visualization tool to derive the insights that are ultimately used to make business decisions.

The first step in choosing the right technology is to establish the goals of your organization. What are the business outcomes that you’re trying to achieve? As an example, we wanted our organization to be data-driven at scale. Our 2,000 colleagues had to be able to do drill-down analysis on a rapidly growing data volume without having to overly rely on IT.

With your goals established, it’s important to define your technology evaluation criteria.

We settled on three criteria that we felt would drive performance and, ultimately, our business goals. These are: the capacity of the platform, usage of the platform, and the compute cost of the dashboard or data model.

Our evaluation landed us on Google BigQuery as our cloud data warehouse, AtScale for data modeling and our semantic layer, and Tableau for visualization. As a result, our team now serves over 200,000 workbook requests across 3,000 workbooks, modeled through 100 virtual cubes built on 250 data sources.

Monitoring Technology and Usage

Adopting the right cloud technology offers a tremendous opportunity for both cost savings and performance scalability. However, if the technology is used without oversight, there is a very good chance that performance expectations will not be met, and unpredictable costs will erase any of the cloud’s value. This is why it’s incredibly important to implement a monitoring framework to get (and keep) your BI stack in shape.

Performance bottlenecks occur when resource utilization exceeds thresholds at peak load and user concurrency results in queuing. To identify these bottlenecks, we’ve set up very detailed, real-time monitoring of our systems. Metrics we track include CPU, memory, disk I/O, network traffic, and query response times.
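A minimal sketch of such a sampler, assuming the psutil library and a five-second interval purely for illustration (query response times would come from the warehouse and BI tool logs rather than from host metrics):

```python
import time

import psutil

# Minimal resource sampler: prints CPU, memory, disk I/O, and network deltas
# on a fixed interval. In practice these samples would be shipped to a
# monitoring backend and alerted on, not printed.
def sample(interval_seconds: float = 5.0) -> None:
    disk_prev = psutil.disk_io_counters()
    net_prev = psutil.net_io_counters()
    psutil.cpu_percent(interval=None)  # prime the CPU counter
    while True:
        time.sleep(interval_seconds)
        cpu = psutil.cpu_percent(interval=None)   # % since the last call
        mem = psutil.virtual_memory().percent     # % of RAM in use
        disk = psutil.disk_io_counters()
        net = psutil.net_io_counters()
        print(
            f"cpu={cpu:5.1f}%  mem={mem:5.1f}%  "
            f"disk_read={(disk.read_bytes - disk_prev.read_bytes) / 1e6:8.1f} MB  "
            f"net_sent={(net.bytes_sent - net_prev.bytes_sent) / 1e6:8.1f} MB"
        )
        disk_prev, net_prev = disk, net


if __name__ == "__main__":
    sample()
```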

In our experience, the most common bottleneck is user request queues. We’ve found that this can be overcome with small configuration changes in the data platform. In cases where the tuning of the existing environment isn’t enough, the next option is to scale horizontally with more machines or vertically with more powerful machines. This is always the second option, though, as scaling machines is never free!

Without this depth of monitoring, costs can quickly get out of control. In our case, we have to optimize for Google’s costs. Google offers two pricing options for processing data through BigQuery. The first is on-demand pricing, which allows a customer to pay as they go based on the amount of data processed. The second is flat-rate pricing, where there is a fixed fee for guaranteed processing capacity.

When we first adopted the Google Cloud Platform, we thought the on-demand option was the best fit for us. After seeing the bill over our first three months, we realized we needed to shift to flat-rate pricing. With monitoring in place, we quickly understood how our users were querying data and found that we could support the business with fixed capacity for most of the week and pay for flex capacity during times when processing demand would increase. For example, Monday mornings tend to be when the business wants to update their sales reports from the previous week, which creates extra demand for processing power.
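To make the trade-off concrete, here is a back-of-the-envelope sketch of the two models. The prices and usage figures are illustrative assumptions, not our actual numbers; check Google’s current price list before making the call:

```python
# Back-of-the-envelope comparison of BigQuery's two pricing models. All prices
# and usage figures below are illustrative assumptions; check Google's current
# price list before deciding.

ON_DEMAND_PER_TIB = 6.25         # assumed USD per TiB of data processed
COMMITTED_PER_SLOT_MONTH = 24.0  # assumed USD per slot per month of committed capacity
FLEX_PER_SLOT_HOUR = 0.04        # assumed USD per slot-hour of short-term flex capacity


def on_demand_cost(tib_processed: float) -> float:
    """Monthly cost when every query pays for the bytes it scans."""
    return tib_processed * ON_DEMAND_PER_TIB


def capacity_cost(committed_slots: int, flex_slot_hours: float = 0.0) -> float:
    """Fixed monthly commitment, plus flex slots bought only for peak periods
    such as the Monday-morning refresh of last week's sales reports."""
    return committed_slots * COMMITTED_PER_SLOT_MONTH + flex_slot_hours * FLEX_PER_SLOT_HOUR


# Hypothetical month: 3,000 TiB scanned on demand, versus a 500-slot commitment
# plus 500 extra flex slots for 4 hours on each of 4 Monday mornings.
print(f"On-demand: ${on_demand_cost(3_000):>10,.0f}")
print(f"Capacity:  ${capacity_cost(500, flex_slot_hours=500 * 4 * 4):>10,.0f}")
```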

Continuously Improve Your Environment

With the right technology and the proper monitoring in place, it’s time to improve the outcomes of the investment. Improvement is a never-ending process. There are a number of initiatives that can make a world of difference for performance, such as adjusting filter settings in a dashboard, updating a data model, improving data preparation, and code rewriting. The answers for where to focus are in the logs.

The logs are a record of what users are experiencing and the impact those experiences have on a technical environment. To improve return on investment, it’s important to map the logs to the drivers of performance and cost. In our case, it’s optimizing Google BigQuery’s compute costs, which are measured in slot time. As we improve slot time, our query performance increases, and our cost per query improves.
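As a sketch of what mapping logs to slot time can look like, the snippet below aggregates slot hours from BigQuery’s own job metadata view (INFORMATION_SCHEMA.JOBS_BY_PROJECT). The region qualifier, the seven-day window, and grouping by user are illustrative choices, not our exact queries:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Aggregate slot time (BigQuery's compute-cost driver) per user over the last
# seven days, straight from BigQuery's job metadata. The region qualifier is a
# placeholder; group by label, cube, or query pattern as needed.
sql = """
SELECT
  user_email,
  COUNT(*) AS queries,
  SUM(total_slot_ms) / 1000 / 3600 AS slot_hours,
  SUM(total_bytes_processed) / POW(1024, 4) AS tib_processed
FROM `region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
GROUP BY user_email
ORDER BY slot_hours DESC
LIMIT 20
"""

for row in client.query(sql).result():
    print(f"{row.user_email:<40} {row.queries:>7} queries "
          f"{row.slot_hours:>8.1f} slot-hours {row.tib_processed:>8.2f} TiB")
```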

The easiest way to interpret logs is by visualization. We export all of our logs, load them into Google BigQuery, and query the logs for analysis. That analysis is visualized in meaningful depictions like box plots and scatter plots to help identify areas of improvement. Be careful about using averages as they don’t provide a good depiction of performance.
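As a minimal sketch of that kind of visualization, assuming the exported logs have already been reduced to one row per query with hypothetical cube_name and duration_seconds columns:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical log extract: one row per query, tagged with the virtual cube it
# hit and its wall-clock duration. Column names are assumptions for illustration;
# in practice this frame would come from a query against the exported logs.
logs = pd.DataFrame({
    "cube_name": ["sales", "sales", "sales", "logistics", "logistics", "pricing"],
    "duration_seconds": [1.2, 2.3, 44.8, 0.9, 2.1, 7.4],
})

# A box plot per cube shows the spread (median, quartiles, outliers) that a
# single average would hide; the slow outliers are what users actually feel.
logs.boxplot(column="duration_seconds", by="cube_name")
plt.suptitle("")
plt.title("Query duration per virtual cube")
plt.ylabel("seconds")
plt.savefig("cube_durations.png")

# Percentiles per cube make the tail explicit in tabular form as well.
print(logs.groupby("cube_name")["duration_seconds"].quantile([0.5, 0.95]).unstack())
```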

Some of the most effective dashboards we run against these logs evaluate query execution times for each of our virtual data cubes and each cube’s compute cost. The better you evaluate user logs, the more dramatically you can improve compute costs and execution times.

Putting It All Together

Every company has more data than they know what to do with. The issue is that most companies don’t know how to use it. Establish a strategy to choose the right technology for your business, monitor that technology to make sure your business is realizing the value of the investment, and then improve upon that technology by understanding how it’s being used by your team. You can learn even more about this strategy from my in-depth webinar on how to increase your cloud analytics ROI. When you’re able to institute these three techniques, you’ll take your business from being data-conscious to data-driven.
