Advertisement

Case Study: Cornell University Automates Data Warehouse Infrastructure

By on

Cornell University is a privately endowed research university founded in 1865. Ranked in the top one percent of universities in the world, Cornell is made up of 14 colleges and schools serving roughly 22,000 students.

Jeff Christen, data warehousing manager at Cornell University and adjunct faculty in Information Science, and Chris Stewart, VP and general manager, USA at WhereScape, talked with DATAVERSITY® about how Cornell dealt with the end-of-life for the primary product they used to manage their data warehouse.

The Primary Issue

Cornell was using Cognos Data Manager to transform and merge data into an Oracle Data Warehouse. IBM purchased Data Manager and decided to end support for the product. “Unfortunately, we had millions of lines of code written in Data Manager, so we had to shop around for a replacement,” said Christen. He looked at it as an opportunity to add new functionality so that their data warehouse ran more efficiently.

The Assessment

Christen’s IT team had to confine processing to hours when the university was closed, so batch processing from financial, PeopleSoft, or student records couldn’t start warehouse processing until the end of normal operations and had to be completely finished by 8:00 a.m. when staff arrived as they needed access to the warehouse.

“It was getting really close. We were frequently bumping into that time,” said Christen. Because their processing window was so short, errors and issues could be very disruptive.

“Our old tool would just log it if there was an issue, but then we couldn’t load the warehouse, because some network glitch that probably took seconds was enough to take out our nightly ETL processing,” elaborated Christen.

Outdated documentation was also a problem. Stewart said that they joke with their customers about documenting a data warehouse. “There are two types of documentation: nonexistent and wrong. People laugh, but nobody ever argues that point because it’s the thing that people don’t like to do, so it rarely gets done,” said Stewart.

Because it is an academic institution, licensing and staffing costs were important factors for Cornell. Stewart often sees this in government and in higher education organizations where the administration has increasing data needs, yet the pool of available people is small, like Christen’s staff of four.

Stewart said that automation can lift much of that workload so staff can get more accomplished in a shorter amount of time. “You can’t just go out and add two more people. If you have more work, you need to get more out of your existing staff,” said Stewart.

Finding a Solution

Christen started to shop around for ETL tools, with an eye to adding some improvements. There were several key areas he focused on when evaluating vendors: documentation, licensing costs, improving performance and being able to work within existing staffing levels. In 2014, Christen attended the Higher Education Data Warehousing conference to research options.

WhereScape was one of the exhibitors at the conference and one of the features that caught his attention was its approach to documentation. “Our customers were used to having outdated and incomplete documentation, and that was something WhereScape definitely had a handle on,” he said.

Most of the products Cornell considered required licensing by CPU, which could prove cost-prohibitive as Cornell’s extensive data warehouse environment was scaled for end-user query performance.

“We have a ton of CPUs,” Christen said. CPU-based licensing costs would be significant, and they found themselves trying to figure out how to re-architect the entire system to reduce the CPU footprint enough so that the licensing could work, a process that would create other limitations. WhereScape’s license model is a developer seat license, so with four full-time warehouse developers, they only needed to purchase four named user licenses.

“There’s no separate license for the CPU run-time environment with WhereScape, so if we’re successful, we’ll get everything converted, but there’s no penalty for how we configure the warehouse for end-user performance or query performance,” Christen said.

Being able to integrate and use the product without increasing the number of developers was a clear advantage. “That’s has been a key driver for organizations evaluating automation for their teams,” Stewart added.

Cornell didn’t just rely on marketing material to make their decision. They did an on-site proof of concept where one of their developers worked with the product on a portion of their primary general ledger model. They discovered that WhereScape was intuitive enough that one of their ETL developers was able to code a parallel environment in the proof of concept with minimal assistance from WhereScape. The developer hadn’t gone through any formal training, which proved that the learning curve would be manageable. \

The proof of concept allowed them to get a nearly apples-to-apples comparison, which showed “huge improvements” in load time performance compared to Data Manager. “So, it was a robust enough tool, but also intuitive enough that it could be mastered in a few weeks,” said Christen.

About WhereScape

WhereScape helps IT organizations of all sizes leverage automation to design, develop, deploy and operate data infrastructure faster.

“We realized long ago that there were patterns in data warehousing that really transcend any industry vertical or any size of company,” said Stewart.

Because the process of building a data warehouse out is primarily mechanical, and much of that is common among data warehousing organizations, WhereScape automates both the design and modeling of the data warehouse, all the way through to the physical build.

“Even deployments, as you’re moving a project from development to quality assurance environment (QA), and then on to production, we’re scripting all that out as well,” said Stewart. These are all processes companies usually use multiple tools to address – a resource-heavy process that can create a silo for each tool.

“We have one tool suite that covers data warehousing end-to-end and it’s just one set of tools to learn,” said Stewart. Instead of licensing separate tools to for each part of building a data warehouse, then finding a place to install all those tools, and spending weeks for staff training and management – teams have just one tool to learn and use. Handing off the build to WhereScape’s automated process frees up time and energy so that the business can take advantage of that data and produce useful analytics.

Evaluation

The initial wins of the conversion from their traditional ETL tool to WhereScape allowed Cornell to cut their nightly refresh times in half, or better, in some cases. Although they didn’t start that way, they are now a 100 percent WhereScape solution, with 100 percent Amazon-hosting as well.

“We did a major conversion which took a few years to get to WhereScape from our old tool, but that’s behind us. We’re running WhereScape on Amazon Web Services in their Oracle RDS service,” said Christen.

Although they just finished this conversion in the last year, since 2014 when they purchased WhereScape, all new developments and enhancements have been done in WhereScape.

“There’s actually an option to fix the problem, restart it, and still complete before business hours, which is a big win for our customers,” said Christen. “Essentially, we’ve cut our refresh times in half, so not only can the team complete all the processing they need with their batch windows, we’re not brushing up against business hours anymore.”

By automatically generating documentation, WhereScape solved the problem of outdated and incomplete documentation.

What’s Next?

To take full advantage of the automated documentation process, Cornell decided to build in some new subject areas, but the speed of the tool outstripped their internal modified waterfall approval process. Christen believes they can speed up their process now that they can quickly put out a prototype. They can start receiving feedback immediately from customers within days rather than weeks, and from there, refine the model until they’re ready for production.

“So, it’s changing our practices now that we have some new abilities with WhereScape,” said Christen. One of the next steps is to more fully leverage and market the documentation so they can start providing their customers with more information about the attributes that are available in the warehouse.

An unexpected benefit is that Christen’s Business Intelligence Systems students get to use WhereScape to learn Dimensional Data Modeling, ETL concepts, and Data Visualization hands-on with real datasets.

“We’re teaching the concepts of automation so they learn the hard way, with SQL statements, and then we use WhereScape and they can see how quickly they can create these structures to build out real dimensional model data warehouses,” explained Christen.

Stewart noted that they’ve had inquiries from other universities that have heard about Christen’s use of WhereScape in the classroom and are interested in incorporating WhereScape into their curriculum, so the students can get more work done in a semester.

“It’s a similar benefit to what our customers are receiving in their ‘real-world’ application of automation, and it is giving students the chance to understand the full data warehousing lifecycle,” said Stewart.

Image used under license from Shutterstock.com

Leave a Reply