A Report on John O. Biderman and Cameron McLean’s Enterprise Data World 2010 Conference Presentation
by Charles Roe
How does an organization migrate from a legacy, monolithic core system to a componentized architecture integrated via SOA? How does it accomplish such a task while also engaging business SMEs in the ownership of the new data definitions within the new system, which also includes the adoption of an Enterprise Data Warehouse (EDW)? That was the imperative given to John O. Biderman (an Information Architect at Harvard Pilgrim Health Care) and Cameron McLean (CEO of World Class Objects) and their team members as the adoption of the new system at Harvard Pilgrim (HPHC) began to move forward. HPHC is a not-for-profit health plan serving approximately one million members in Massachusetts, New Hampshire and Maine. It has been ranked the #1 commercial health plan in America for eight consecutive years by U.S. News & World Report and the National Committee for Quality Assurance (NCQA), ranked #1 in Member Satisfaction in the Northeast region by J.D. Power and Associates, and rated among the top 10 places to work by both the Boston Business Journal and The Boston Globe. Yet even with such a positive business resume, changing the entire data environment within the organization, which included building a friendlier and more collaborative front end for its metadata, presented issues none of them had faced before. All graphical representations are taken from their PowerPoint presentation.
The Problem Statement
The planned shutdown of the legacy system loomed on the horizon at HPHC, and the sense of impending doom among the business analysts was quickly becoming a reality because the semantic challenges had not yet been resolved. The business analysts all spoke the native vocabulary of the old monolithic system, but soon they were going to face an entirely new set of semantics. The EDW semantics were going to be based on the ELDM (Enterprise Logical Data Model), with the business terms independent of any source applications. The new system would be application neutral, with homogenized data coming in from multiple sources. The metadata was going to change, and everyone involved in the adoption of the new EDW, including the top executives, believed that quality metadata was a significant priority for a successful migration to the new system. A quote from one of the business users summed up the problem facing them: “You can have the best data warehouse and the best BI tool in the world, but if we don’t have good descriptions of the data nobody will be able to use them.” The HPHC adoption team needed to build a quality metadata system that supported the entire strategy. That meant engaging business SMEs in ownership of the new metadata and providing ease of navigation, search functionality and the ability for end users to add annotations, comments and articles that transcended the discrete data elements or subject areas.
HPHC had been documenting metadata for many years, so they didn’t have to start from scratch. It had been an IT function that was mostly ODS or warehouse-based. Their metadata was stored in a commercial repository and mostly used for the publication layer of the legacy system. They had a metamodel of their own design for staging metadata in the repository, but overall most business users found the legacy data definitions far too technical, lacking in context and generally unhelpful to their jobs. The new metadata repository needed business and project team involvement, with ownership of the data definitions moving from IT to business users, and quickly. The existing tool also had a number of usability issues, including poor navigation and a lack of search capabilities. In essence, they needed a business user-friendly Data Dictionary. After interviews with a cross-section of stakeholders from both the business and IT sides, along with full buy-in from important C-level executives, it was decided the system had to meet some key driving requirements:
- Structured: It had to contain formal, approved, seldom-changing data definitions and notes.
- Collaborative: It needed an area where business users could contribute knowledge, insights, and best practices about the data.
- Search: The users wanted a “Google-like” search ability.
- Governance: There had to be an intersection with the budding enterprise data governance structure in place to oversee the quality and comprehensiveness of the data definitions and annotations.
Graphic One shows the Business Context Diagram of how the entire system would work:
The system included an oversight process with an Executive Oversight Committee and a Data Stewardship Board. There were SMEs who really understood the data and wrote the formal definitions that went into the metadata database. They would feed the data into the Structured Content of the Data Dictionary, while other people within the various user communities would annotate and contribute their own experience with the data into the Collaborative Content. A feedback loop connected those areas back to the SMEs and others involved in Authorization Oversight. The entire system drove a number of important assumptions:
- The Data Dictionary would be a business-friendly front end on metadata along with collaboration extensions that addressed the current problems with usability and business ownership.
- The metadata would be stored and displayed through an as-yet unknown metadata management tool.
- The Data Dictionary would leverage the metadata tool to help facilitate EDW adoption and the implementation of a new BI tool.
The adoption team had many possible directions to go in, so they engaged a consultant with widespread market research experience and had him deliver a two-day workshop. They looked at more than 15 different products and received demos of five of them, but in the end came up empty-handed. They concluded that there were many good tools available on the market (this evaluation was in 2008), with robust features and numerous advances in areas such as lineage and SOA. But most lacked truly collaborative components, were aimed primarily at technical users and still had quite complex UIs. Added to that, their implementation would be costly and take much longer than the project team had.
The MediaWiki Solution
Their first step toward a solution was accepting that metadata management and metadata presentation were two entirely different problems that didn’t necessarily require a single solution. They considered writing their own presentation layer, but that would have taken far too long and cost too much. The driving requirements, assumptions, time frame for completing the project, budgetary limitations and other factors led the team towards MediaWiki. Its diverse array of add-ons, rich semantic extensions and Open Source availability (it was already used in-house on other projects) made it a suitable fit for the project. It has built-in programming interfaces that enable pushing content into the tool from outside the UI, along with the ability to protect pages from editing, which fulfilled the “Structured” requirement. Collaboration and search, meanwhile, are integral parts of the system.
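As an illustration of the two capabilities just described (pushing content in from outside the UI and protecting pages from editing), here is a minimal sketch of the request parameters involved. It assumes the present-day MediaWiki Action API (`action=edit` and `action=protect`), which may differ from the interface the HPHC team used at the time. The function names, edit summary and protection reason are hypothetical and do not come from the presentation. The sketch only builds the parameters; obtaining a CSRF token and sending the POST request are left out.

```python
from urllib.parse import urlencode


def build_edit_params(title, wikitext, csrf_token):
    """Parameters for an Action API request that creates or overwrites a page."""
    return {
        "action": "edit",
        "title": title,
        "text": wikitext,
        "summary": "Automated load of approved definition",  # hypothetical summary
        "bot": "1",
        "format": "json",
        "token": csrf_token,
    }


def build_protect_params(title, csrf_token):
    """Parameters for an Action API request that restricts editing to administrators."""
    return {
        "action": "protect",
        "title": title,
        "protections": "edit=sysop|move=sysop",
        "expiry": "infinite",
        "reason": "Formal, approved definition",  # hypothetical reason
        "format": "json",
        "token": csrf_token,
    }


if __name__ == "__main__":
    # Show the form-encoded body that would be POSTed to the wiki's api.php.
    body = urlencode(build_protect_params("Member Identifier", "EXAMPLE_TOKEN"))
    print(body)
```

Under this approach, the formal definition page would be protected, while an associated unprotected page (for example, its discussion page) could carry the user-contributed content.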
Graphic Two demonstrates their general solution for the new Data Dictionary:
The formal metadata capture processes, which include the original ERwin data models, ELDM and PDM workbooks, ETL specifications and the reporting and extracting of metadata, would remain the same. They would then use a custom data load application and their current CMC as the metadata store. They saw no reason to change that system; it worked fine for their needs, and it belonged to the metadata management problem, which they had deliberately kept separate from the metadata presentation problem. The MediaWiki content was stored in a MySQL database, and the actual metadata presentation was handled through MediaWiki and its semantic extensions. This allowed for full user-contributed collaborative content and left open the possibility of later improving the metadata management system without having to completely change the metadata presentation system.
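To make the hand-off from the metadata store to the wiki concrete, the following sketch renders one data element as wikitext with semantic annotations. It assumes an extension such as Semantic MediaWiki, whose inline annotation syntax is `[[Property::value]]`. The presentation does not describe the page layout, so the record fields, property names, category name and sample values here are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataElement:
    """One record as it might come out of the metadata store (hypothetical fields)."""
    name: str
    subject_area: str
    definition: str
    steward: str


def render_wikitext(element):
    """Render a data element as a wiki page body with semantic annotations."""
    lines = [
        f"== {element.name} ==",
        element.definition,
        "",
        # [[Property::value]] is Semantic MediaWiki's inline annotation syntax;
        # the property names themselves are invented for this sketch.
        f"* Subject area: [[Subject area::{element.subject_area}]]",
        f"* Data steward: [[Data steward::{element.steward}]]",
        "",
        "[[Category:Data Dictionary]]",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = DataElement(
        name="Member Identifier",
        subject_area="Membership",
        definition="The unique identifier assigned to a member of the health plan.",
        steward="Membership Operations",
    )
    print(render_wikitext(sample))
```

Because the page body is generated from the store, the wiki stays a presentation layer only: the load application could be rerun whenever the formal definitions change, without touching the collaborative content kept on separate pages.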