by Joe LaFeir
Service oriented architecture concepts have gained significant traction in IT environments during the past several years. When applied effectively, these concepts hold the promise of more flexible development and operational environments, and significant opportunities for improved software reuse and overall IT productivity.
Many of the implementations of SOA to date have been centered on transaction-level user applications and analytical and reporting applications. Large scale data processing and data warehousing applications of SOA have lagged because of legitimate throughput concerns related to the implementation of XML for large-scale, high volume database production.
With continued improvement in technology and SOA-oriented tools, particularly in the area of service orchestration engines, it has now become more practical to apply these powerful concepts in data processing and warehousing environments. But SOA applications in these arenas present different problems, opportunities, and challenges. The objective of this article is to provide you with some insight into each of them.
SOA Reference Architecture For Large-Scale Database Production
An essential and inherent principle of service oriented architecture design is the ability to loosely couple services and minimize inter-service dependencies. In our work in large-scale database production environments, we have been able to work effectively within a reference architecture framework that includes the following major component services:
Data Capture – Within this function, we fetch or receive input data, validate the input data package, and route the file through the relevant production jobs and enhancement services.
Standardize – Within this step, we convert the inbound data file from its native format into a common XML format for the type of data file received. We also perform simple code translations. Our objective in this step is to eliminate or significantly reduce “between sources” differences in common document types. We defer more complex data enhancement logic (which typically is required across data sources) to the next downstream step in our framework.
Enhance – Now that we have standardized inbound documents across data sources, we can fully utilize SOA concepts to enhance the data using web services. The power of XML and service orchestration technology allows us to act only on that data necessary for any individual enhancement service. It also allows us to process data in parallel and in smaller units of work when possible.
Load – Once the inbound data file processing is complete, we convert the data back from XML and load it into a relational database, or a “Single Source of Truth,” subject to applicable business rules.
Assemble – Finally, we create appropriate data marts for downstream applications. While we generally process inbound data files as soon as they become available, assembly jobs are typically scheduled at the time most appropriate for the individual downstream application.
Consistent and disciplined use of this framework has helped us reduce new application design cycle times and has also helped us maximize our ability to leverage services.
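As a rough illustration of the framework above, the following Python sketch strings the five steps together for a tiny in-memory file. Everything in it is assumed for illustration: the pipe-delimited input layout, the element names, the STATE_CODES table, the enhance_name and enhance_region functions (plain functions standing in for web services), and the use of SQLite as the relational store. None of these details come from the production system described in this article.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical code translation table used in the Standardize step.
STATE_CODES = {"Mich.": "MI", "Ohio": "OH"}


def capture(raw_text):
    """Data Capture: accept the input package and reject malformed rows."""
    rows = [line.split("|") for line in raw_text.strip().splitlines()]
    bad = [r for r in rows if len(r) != 3]
    if bad:
        raise ValueError(f"malformed rows: {bad}")
    return rows


def standardize(rows):
    """Standardize: convert the native layout into a common XML format."""
    root = ET.Element("records")
    for rec_id, name, state in rows:
        rec = ET.SubElement(root, "record", id=rec_id)
        ET.SubElement(rec, "name").text = name
        # Simple code translation; unknown codes pass through unchanged.
        ET.SubElement(rec, "state").text = STATE_CODES.get(state, state)
    return root


def enhance_name(record):
    """Stand-in for an enhancement service that only touches <name>."""
    node = record.find("name")
    node.text = " ".join(node.text.split()).title()


def enhance_region(record):
    """Stand-in for an enhancement service that only reads <state>."""
    region = "Midwest" if record.findtext("state") in {"MI", "OH"} else "Other"
    ET.SubElement(record, "region").text = region


def enhance(root, services):
    """Enhance: apply each service to each record, one small unit at a time."""
    for record in root.findall("record"):
        for service in services:
            service(record)
    return root


def load(root, conn):
    """Load: convert back from XML and insert into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(id TEXT PRIMARY KEY, name TEXT, state TEXT, region TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer VALUES (?, ?, ?, ?)",
        [
            (r.get("id"), r.findtext("name"), r.findtext("state"), r.findtext("region"))
            for r in root.findall("record")
        ],
    )
    conn.commit()


def assemble(conn):
    """Assemble: build a small data mart (here, a record count per region)."""
    return dict(conn.execute("SELECT region, COUNT(*) FROM customer GROUP BY region"))


def run_pipeline(raw_text, conn):
    """Run the five steps in order for one inbound file."""
    doc = standardize(capture(raw_text))
    doc = enhance(doc, [enhance_name, enhance_region])
    load(doc, conn)
    return assemble(conn)
```

Calling run_pipeline with a short pipe-delimited string and a sqlite3.connect(":memory:") connection exercises all five steps. Because each enhancement function reads only the elements it needs, services can be added, removed, or reordered without changing the surrounding flow, which is the loose coupling the framework relies on.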
Key Ingredients for Effective Application of SOA in Database Production Applications
Take full advantage of XML. The XML document structure provides tremendous advantages for data management purposes but has generally been underutilized to date because of historical technical barriers to high-volume XML processing. New technical approaches are helping to overcome these barriers and enable data processing applications to move data much more intelligently and efficiently.
Use vendor neutral technology standards wherever possible. The fundamental building blocks of SOA (WSDL program definitions, SOAP messaging structures, common network protocols, etc.) are well-developed and their disciplined application can significantly increase module re-use and long-term flexibility.
Separate business rules from the processing flow. By consistently applying a common reference architecture framework like the one described earlier, you can structure the processing flow more consistently and improve your ability to separate different types of business rules.
Maximize the use of common reference data. Simple and common elements of almost any data management and data processing environment are lists and the relationships between lists. They represent a particular type of business rule, and maximizing the use of standard lists or tables across the data processing environment is preferable to maintaining separate copies. We have used a common data management infrastructure to maintain and publish system-wide tables to promote and optimize their use.
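A minimal sketch of the idea, assuming two hypothetical system-wide lists (countries and regions) and one relationship between them. The list names, the values, and the check_relationship and region_name helpers are invented for illustration and do not describe the infrastructure mentioned above.

```python
from types import MappingProxyType

# Hypothetical published reference lists, exposed as read-only views so
# that individual services cannot modify the shared tables in place.
COUNTRY = MappingProxyType({"US": "United States", "CA": "Canada", "DE": "Germany"})
REGION = MappingProxyType({"NA": "North America", "EU": "Europe"})

# Hypothetical relationship between the two lists: country code -> region code.
COUNTRY_TO_REGION = MappingProxyType({"US": "NA", "CA": "NA", "DE": "EU"})


def check_relationship(parent, child, link):
    """Return link entries that reference codes missing from either list."""
    return [(k, v) for k, v in link.items() if k not in child or v not in parent]


def region_name(country_code):
    """Resolve a country code to its region name through the shared tables."""
    return REGION[COUNTRY_TO_REGION[country_code]]
```

Every service that needs a region name calls the same lookup instead of carrying its own copy of the table, so a correction to the published list takes effect everywhere at once.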
Insist on a strong data governance process. In our operational environment, we only process files that conform to previously defined and published canonical data models; all web services available to the system conform as well. All reference databases available to the system are managed by a limited number of people with appropriate security access.
Integrate data quality inspection points throughout the process. The combined use of loosely-coupled services, XML, and common “governed” data structures creates a very powerful opportunity for parameter driven, re-useable data quality inspection or profiling routines throughout the production process. Effective use of such tools can significantly reduce the cost and effort of rework as well as increase the quality of the databases themselves.
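One way to picture a parameter driven inspection routine is the Python sketch below. It assumes records arrive as XML elements with an id attribute, and it assumes a hypothetical rule format (required, pattern, allowed) chosen only for this example; the function name inspect_records and the sample fields are likewise illustrative rather than taken from any actual tool.

```python
import re
import xml.etree.ElementTree as ET


def inspect_records(root, record_tag, rules):
    """Profile every record against parameterized rules.

    rules maps a field name to a dict with optional keys:
      required (bool), pattern (regex string), allowed (set of values).
    Returns a list of (record id, field, problem) tuples.
    """
    findings = []
    for rec in root.iter(record_tag):
        rec_id = rec.get("id")
        for field, rule in rules.items():
            value = rec.findtext(field)
            if value is None or value.strip() == "":
                if rule.get("required"):
                    findings.append((rec_id, field, "missing"))
                continue  # nothing further to check on an empty field
            if "pattern" in rule and not re.fullmatch(rule["pattern"], value):
                findings.append((rec_id, field, "bad format"))
            if "allowed" in rule and value not in rule["allowed"]:
                findings.append((rec_id, field, "not in reference list"))
    return findings


# Example parameters: the same routine can be reused at any inspection
# point simply by passing a different rule set.
RULES = {
    "zip": {"required": True, "pattern": r"\d{5}"},
    "state": {"required": True, "allowed": {"MI", "OH"}},
}

doc = ET.fromstring(
    "<records>"
    "<record id='1'><zip>48226</zip><state>MI</state></record>"
    "<record id='2'><zip>4822</zip><state>XX</state></record>"
    "<record id='3'><state>OH</state></record>"
    "</records>"
)
findings = inspect_records(doc, "record", RULES)
```

Because the rules are data rather than code, the same routine can run after the Standardize, Enhance, and Load steps with a different rule set at each point, and the allowed values can be drawn from the governed reference tables.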
What to Watch Out for
Of course, the opportunities and benefits associated with service oriented architecture do not come without some pain. When adopting an SOA framework for large scale data management and production environments, it is important to be aware of a number of potential barriers or pitfalls.
Your operating system environment will become more complex. Loose coupling, increased interoperability requirements, increased use of parallel processing and grid computing environments, and similar derivatives of SOA inherently increase infrastructure and configuration complexity. It is important to factor this dimension into your planning as you move ahead with an SOA approach.
Organizational issues are at least as important as technical issues. Service oriented architecture is not just a technical architecture. It implies a new paradigm for establishing and leveraging business rules and sharing information and information systems across an organization. Resolving the business and technical issues inherent in rationalizing databases and systems invariably brings organizational issues and conflicts to the surface. Anticipating these organizational issues and preparing ahead of time with effective processes to resolve these issues in a timely manner is essential to program success.
Don’t underestimate the data migration and data testing effort. When “centralizing” or integrating business rules within a service that previously existed in multiple forms within a legacy environment, there are often trade-offs made and there are always differences in prior processing rules. Even though cleaning up the differences is generally desirable and the approach to doing so may have been “approved” by the appropriate parties during design efforts, the real implications of the changes may not become evident until full-scale data migration and parallel data testing are underway. It is wise to plan the data testing effort thoroughly, allocate sufficient time to it, and set the proper expectations around data differences.
The opportunity to apply SOA in large-scale data management and data processing environments has been relatively untapped for a variety of historical technical reasons, but new advances in foundational technologies and toolsets have opened up this new arena and created significant opportunities for improved business flexibility and productivity.
ABOUT THE AUTHOR
Joe LaFeir, Vice President, Product Development & CTO for RLPTechnologies
LaFeir is responsible for the planning, design and development of RLPTechnologies’ software products. He has a proven track record delivering large-scale software solutions in several industries for more than 14 years, and is a key technology leader at RLPTechnologies. LaFeir provides thought leadership and vision for the development of world-class data management solutions.
Prior to joining RLPTechnologies, LaFeir was Vice President of Application Development and Support for Polk North America. In this role, he was responsible for setting the strategic direction for the application development and support organization, including the development, enhancement, deployment and maintenance of several strategic products.
Mr. LaFeir has led several major initiatives, including a major re-engineering effort for Polk – a system optimization initiative focused on revolutionizing Polk’s internal data operations.
While with Polk/RLPTechnologies, Mr. LaFeir’s leadership on several projects has been recognized by many in the industry as innovative. Acknowledgements include: 2006 Innovator of the Year JBoss|RedHat, 2006 SOA Innovation JBoss|RedHat, Computerworld BI Best Practices Award, 2006 Innovation Award DataFlux, DMReview Innovation Award 2007, Computerworld Laureate Honors 2007 and Ventana Research Leadership Award 2007.
Mr. LaFeir is also the primary inventor of Polk’s proprietary web service orchestration technology, embedded in the OneView360˚ data integration platform, and currently has a patent pending on Method and System for Data Processing Service Orchestration, Patent Application# 11/767,34.
Previously, LaFeir was a Senior Manager with Capgemini Ernst & Young, where he was aligned with the Critical Technology practice with a primary focus on the automotive sector and specialized in the custom development and integration of large-scale information systems. LaFeir’s experience includes leading the successful delivery of eBusiness, ERP, client/server and mainframe solutions across a broad range of industries including automotive, energy/utility, retail, healthcare and government.