Case Study: Indiana University, Data Virtualization, and The Decision Support Initiative

“We can throw business intelligence tools at problems and we can have really smart technicians write code but if it’s not serving the needs of the business and actually answering questions that people need to know, then ultimately, our work is futile,” said Dan Young, Chief Data Architect at Indiana University (IU). Young said that IU found a way to answer those questions for multiple schools and departments by using data virtualization provided by Denodo.

Young remarked that the program, called the Decision Support Initiative (DSI), is “an opportunity for IU to reinvent ourselves with respect to analytics and how we use data to make decisions.” Indiana University is a multi-campus institution with eight locations throughout Indiana and more than 19,000 employees, serving a student body of more than 114,000 individuals.

Data Assessment

Historically, finding accurate and timely data for decision making had been a challenge at IU. The university’s data warehouse had been in service for 15 years and had become “cluttered with multiple copies of similar data,” Young said. Definitions were also inconsistent across departments and campuses, and the timely, relevant, and accurate information that pivotal decisions require had not been available. According to the DSI website, key university decision makers “are often dealing with too much data, in too many formats, in too many places to be useful. They might not even know where to find all of this data, or that it even exists.”

Young was looking for a way to take the guesswork out of major decisions, but also wanted a project that could be based on an Agile framework. “We really wanted to try to focus our business intelligence efforts and data development work around the idea of Agile BI. We wanted to try to iteratively deliver value to the university.”

The notion of incremental delivery: “Taking what you have and producing just slices, pieces of the complete picture early, so that you can demonstrate value, and people can begin using it sooner rather than later,” was a key piece of the solution he envisioned.

“In a traditional data warehouse type of project, you would go out and spend months gathering requirements, and write all your documentation, and then you build the data, and then you build the visualization. By the time you come to the end of that cycle, you take your product back to the end user and they say, ‘Yes, it’s great. It’s what I asked for, but I don’t need it anymore because my requirements have changed, and it’s been 18 months.’”

With a mission to provide timely, relevant, and accurate data to facilitate better decision making across the University, the Decision Support Initiative was born.

Program Development

Young considered several tools, including object-relational mappers like Hibernate and .NET LINQ, but further thinking about the context of reporting tools led him to some different ideas.

“Ultimately, one of my challenges was to look at the technology space and try to figure out [if there were] tools that could help us to actually become an Agile BI type of organization.”

He continued searching and found a technology called ‘data virtualization.’ Young said that he had heard about data aggregation, but data virtualization offered different tools that were, “Specifically designed to help [with] this Agile approach of exposing data.”

While looking at technologies and “trying to understand what might fit,” he said, they went about assembling a team and hiring developers. After exploring the possibilities, in June 2015, “We decided that Denodo was going to be a good fit for us as we were trying to move forward in the Agile BI methodology.”

Although Denodo is a small company, Ravi Shankar, its CMO, says that it has been working specifically in data virtualization technology for 20 years:

“The concept of data virtualization has been around for some time. It used to be called Enterprise Information Integration (EII), then it changed to data federation, and now it is data virtualization. So the technology as a concept has been there for a long time – it’s just that the nomenclature has evolved as the functionality evolved.”

Data virtualization integrates data from disparate sources, locations and formats, without replicating the data, to create a single “virtual” data layer that delivers unified data services to support multiple applications and users.

Shankar said there are four key attributes of data virtualization:

Data Abstraction: This allows customers to access and use data without having to be concerned with where the data comes from. “The consumers just go to the data virtualization layer and ask for the information, data virtualization goes and figures out that information and brings it back from the different sources, which are all in different formats.”
Connect Don’t Collect: With data virtualization there is zero replication. “It’s the holy grail of any company to be able to integrate information so that they can provide a holistic view of the business.” But data virtualization is a very clean way of enabling access no matter where the data resides, without having to aggregate the data physically in a place, he said. “So if you’re not replicating the data, that actually improves the speed. You connect to the sources and access the data without having to collect them all into a single place.”
Data is Delivered in Real-time: When a business is continuously operating and source systems are being continuously updated, data virtualization provides accurate reporting in real-time as the source data changes.
Data Virtualization Provides Agility: It is a very Agile technology, allowing changes to be made in the underlying sources without the business being impacted.
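
The four attributes Shankar lists can be illustrated with a minimal sketch. This is not Denodo's API; the `VirtualLayer` class, the view names, and the in-memory lists standing in for source systems are all hypothetical, chosen only to show the idea of a layer that holds no data of its own and reads each source at query time.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical source systems, represented here by in-memory lists.
erp_rows: List[Row] = [{"student_id": 1, "credit_hours": 12}]
lms_rows: List[Row] = [{"student_id": 1, "course": "STAT-101"}]
new_lms_rows: List[Row] = [{"student_id": 1, "course": "STAT-201"}]


class VirtualLayer:
    """Maps a logical view name to a function that fetches rows on demand."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], List[Row]]] = {}

    def register(self, view: str, fetch: Callable[[], List[Row]]) -> None:
        # Re-registering a view swaps the source without changing consumers.
        self._sources[view] = fetch

    def query(self, view: str) -> List[Row]:
        # Nothing is stored in the layer between calls; each query
        # goes back to the source.
        return self._sources[view]()

    def join(self, left: str, right: str, key: str) -> List[Row]:
        right_by_key = {r[key]: r for r in self.query(right)}
        return [
            {**l, **right_by_key[l[key]]}
            for l in self.query(left)
            if l[key] in right_by_key
        ]


layer = VirtualLayer()
layer.register("enrollment", lambda: erp_rows)
layer.register("courses", lambda: lms_rows)

# Abstraction: the consumer names logical views, not source systems.
before = layer.join("enrollment", "courses", "student_id")

# Real-time: a change in the source shows up on the next query.
erp_rows[0]["credit_hours"] = 15
after = layer.join("enrollment", "courses", "student_id")

# Agility: the source behind "courses" changes; the consumer call does not.
layer.register("courses", lambda: new_lms_rows)
swapped = layer.join("enrollment", "courses", "student_id")
```

A production tool also has to handle query pushdown, differing formats, and performance, none of which this sketch attempts.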

There are a lot of benefits to data virtualization, Shankar said. “They’re able to deliver things much faster than [if they were to] use other similar data integration technologies like ETL.” Also, he said, “They can accomplish it with fewer resources. They don’t need that many developers to do it – one-fourth [of] the developers and one-fourth the time it takes with other technologies.”

Gartner Research also predicts savings with data virtualization: “By 2020, organizations with data virtualization capabilities will spend 40% less on building and managing data integration processes for connecting distributed data assets.”

Young adds a fifth important attribute of Denodo’s data virtualization that IU needed: Security. “Particularly with the logical data warehouse, but [also with] any of the services that the tool offers, it allows us to provide column-level, row-level, policy-based security, and integrates well with our Active Directory.” He said that the complications of writing controls are no longer an issue because, “You just get the right Active Directory groups in the right spot, and people have access. It’s a much more straightforward implementation for us in that end.”
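
The idea of column-level and row-level security driven by group membership can be sketched as follows. This is an illustration only, not how Denodo or Active Directory implement it; the `Policy` class, the `secure_query` function, the group names, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Policy:
    columns: FrozenSet[str]            # columns the group may see
    row_filter: Callable[[Row], bool]  # rows the group may see


# Hypothetical directory groups mapped to policies.
POLICIES: Dict[str, Policy] = {
    "finance-analysts": Policy(frozenset({"dept", "salary"}), lambda r: True),
    "chem-dept-staff": Policy(
        frozenset({"name", "dept"}), lambda r: r["dept"] == "CHEM"
    ),
}


def secure_query(rows: Iterable[Row], groups: Iterable[str]) -> List[Row]:
    """Return only the rows and columns that the user's groups permit."""
    policies = [POLICIES[g] for g in groups if g in POLICIES]
    visible: List[Row] = []
    for row in rows:
        allowed = set()
        for policy in policies:
            if policy.row_filter(row):       # row-level check
                allowed |= policy.columns    # column-level grant
        if allowed:
            visible.append({k: v for k, v in row.items() if k in allowed})
    return visible


staff = [
    {"name": "A. Rivera", "dept": "CHEM", "salary": 70000},
    {"name": "B. Chen", "dept": "MATH", "salary": 72000},
]

chem_view = secure_query(staff, ["chem-dept-staff"])
finance_view = secure_query(staff, ["finance-analysts"])
no_access = secure_query(staff, [])
```

In this sketch, granting access means adding a user to a group rather than writing new access-control code, which is the simplification Young describes.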

How IU Uses Data Virtualization

The University offers decision makers the opportunity to request a ‘charter’ or project using the Decision Support Initiative, Young said. A user fills out a form online requesting a report or dataset, with a required section outlining, “The business problem it will help us to solve.”

The form also requires assignment of a product owner, to use ‘Agile-speak,’ he said: “‘If this charter is selected, here are the business people that we are willing to commit to help define requirements and to work through that process with you.’” This guarantees that the project will have someone designated, “To work with our business analysts to help translate those requirements into incremental deliverables,” he said. A steering committee then evaluates and prioritizes the charters.

One example of what can be accomplished through the DSI is called Academic Metrics 360 (AM360). Considered IU’s ‘crown jewel,’ the project uses data virtualization to provide a 360-degree view of an academic center, he said. “They can look at how credit hours, what they are teaching, and the volume of students that are taking these things, and how that ultimately translates into revenue and funding.” The end goal is to ensure that, “All divisions can essentially have the same information,” allowing each to take that data to annual reviews with the provost to justify increased funding or to highlight challenges or accomplishments, he said.

Program Evaluation

“We’ve been doing the execution portion of the DSI for a little over two years,” and it has been helping decision-makers know where to find data, where it comes from, and how it is derived, Young said.

The AM360 project is providing, “A common reporting structure for many of these discussions so that the provost has a defendable version of truth that she can use,” and the school’s divisions and departments can all essentially work from the same numbers.

“A school in itself may not agree with our implementation. They might say, ‘Well, those aren’t right because of the X, Y, and Z.’ That’s okay, because it starts a conversation, it starts a discussion with that school, and then we’re willing to refine our algorithms to create these visualizations so that we can make them better.”

Young says that Denodo has given IU a way to build a logical data warehouse by bringing disparate sources and forms of data together in a unified view:

“So whether the data comes from an ERP, or a finance system, or online, cloud-based Learning Management System, the users just come to Denodo to actually retrieve that, and they don’t know – it looks like a database to them, it smells like a database, and they don’t necessarily know some of the technical details of where the data comes from and how it’s all sort of stitched together in that data fabric in the back.”
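
The “looks like a database” experience Young describes can be approximated in miniature with SQLite from Python's standard library. This is an analogy rather than a depiction of Denodo: the two attached in-memory databases stand in for an ERP and a Learning Management System, and every table, column, and view name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two separate databases stand in for two independent source systems.
conn.execute("ATTACH DATABASE ':memory:' AS erp")
conn.execute("ATTACH DATABASE ':memory:' AS lms")
conn.execute("CREATE TABLE erp.enrollment (student_id INTEGER, course_id INTEGER)")
conn.execute("CREATE TABLE lms.course (course_id INTEGER, title TEXT)")
conn.execute("INSERT INTO erp.enrollment VALUES (1, 10), (2, 10), (3, 20)")
conn.execute("INSERT INTO lms.course VALUES (10, 'Statistics'), (20, 'Chemistry')")

# A TEMP view may reference tables in any attached database. It stores
# no rows; the join runs against the sources each time it is queried.
conn.execute(
    """
    CREATE TEMP VIEW course_headcount AS
    SELECT c.title AS title, COUNT(*) AS students
    FROM erp.enrollment AS e
    JOIN lms.course AS c ON c.course_id = e.course_id
    GROUP BY c.title
    """
)


def headcount():
    """Query the unified view as if it were an ordinary table."""
    return conn.execute(
        "SELECT title, students FROM course_headcount ORDER BY title"
    ).fetchall()


first = headcount()

# A new enrollment in the source appears in the view on the next query.
conn.execute("INSERT INTO erp.enrollment VALUES (4, 20)")
second = headcount()
```

The consumer only ever issues a query against `course_headcount` and does not need to know that the rows come from two different systems.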

Conclusion

Young recommends, “Using the right tool for the job, understanding the differences in the technologies and the strengths of each,” and knowing where to put them. Denodo and IU have had ongoing conversations about features and enhancements, he said. “It’s very fluid. Denodo delivers iteratively as well, so it doesn’t take long to get fixes and those sorts of things. We do have a good vendor relationship.”

Shankar said:

“One thing that I just want to mention here is that I have been in the technology space for close to 30 years [and] we have always chased the eternal holy grail of having a single source for all data. Instead of chasing this, having a single repository, to me, data virtualization is a technology that we can use to gain that single virtual repository.”

Denodo’s strength is in data access and delivery, he said.

“If you’re trying to do heavy transformations and you want to store that information and sell the backup, we are not the tool for that particular [task]. There are places where you would use Data Virtualization instead of ETL and there are places where you have to use ETL because you need that level of bulk load especially if you’re moving large volumes of data into a data warehouse or you’re trying to do a lot of transformations that are in different formats.”

Young says that they’ve had, “Very good organic adoption of Denodo from outside technical departments.” Several other groups across campus have seen the usefulness of the tool “to blend and pull data together.” The team has worked to make Denodo “a very low barrier tool” by providing accessible documentation and making it easy for non-technical users to adopt, he said, yet users with technical experience are also on board.

“If you’ve got folks who’ve used database development tools, Denodo fits right in that category. It’s not a stretch for someone who’s used Oracle or a SQL server tool to go in and begin to use Denodo and understand conceptually how this all fits together.”

Young said it’s become “really valuable to us across the university.”