Recently, we published Yefim “Jeff” Zhuk’s article, “IT of the Future: Semantic Cloud Architecture.” The paper has been a very popular free download (available here).
One of the readers, Lev Gorodinski (CTO, EPaySpot), approached Jeff directly with some questions and the two engaged in a conversation filled with insights that they wanted to share with our readers. They are kindly allowing us to republish the thread in its entirety.
Lev Gorodinski: I’ve read the article and have some bigger-picture questions and comments, which likely warrant several discussions. Overall, I am interested in methodologies which aim to bridge the gap between knowledge and its technical manifestation and am therefore interested in the subject matter.
The term “sandbox” in BASE may create the impression that it isn’t a production-level system. I think that both the goal of BASE to “Decrease the number of manual operations required for business changes” and its notion of a “playground” are essential to making it ready for production. This will allow an agile and iterative development and exploration process.
Jeff Zhuk: The primary purpose is setting a common ground where business analysts and developers can collaborate on real business tasks. For some companies this ground can serve in production, while other companies might feel more comfortable using it as a playground for safe development and testing before copying to production. The role of the playground will grow without growing maintenance costs.
LG: “Add a semantic layer to Enterprise Service Bus to enable semantic listening and prepare for canonical model integration with the systems speaking different business dialects” – what is the nature of this semantic layer? What does it achieve?
JZ: The Enterprise Service Bus was a great step in the integration evolution. It is an obvious winner over point-to-point interface design when integrating many systems. The ESB is the natural intersection and listener of enterprise messages: multiple applications subscribe to specific messages in order to manage the enterprise. Managing the enterprise is more a business subject than a technical one, yet we currently describe messages and events in very technical terms, with a precise description of every message signature (thousands of them!) provided by developers.
The semantic layer includes an additional listener, the semantic listener, which can direct messages into business topics (think of mailboxes) based on message semantics. The semantic listener will allow business people to specify their interests in business terms and will provide the resulting reports in the proper formats. The semantic layer includes a business ontology, a graph of process-event relationships, and will map the business terms of the requestor (of a report) to the nodes of the business ontology graph, effectively subscribing the requestor to the proper business message topics. The same message can arrive in several topics, and this is OK.
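To make the idea concrete, here is a minimal sketch in Java of how such a semantic listener might route messages. The Message, BusinessOntology, and TopicPublisher types are hypothetical stand-ins invented for the example, not parts of an actual implementation:

```java
import java.util.List;

// Hypothetical types for this sketch only; not from an actual implementation.
interface Message { List<String> businessTerms(); }
interface BusinessOntology { List<String> topicsFor(List<String> terms); }
interface TopicPublisher { void publish(String topic, Message message); }

/**
 * A semantic listener: routes enterprise messages into business topics
 * (mailboxes) based on ontology lookups rather than technical signatures.
 */
public class SemanticListener {
    private final BusinessOntology ontology;  // graph of process-event relationships
    private final TopicPublisher publisher;   // e.g., a thin wrapper over JMS topics

    public SemanticListener(BusinessOntology ontology, TopicPublisher publisher) {
        this.ontology = ontology;
        this.publisher = publisher;
    }

    /** Called for every message crossing the ESB. */
    public void onMessage(Message message) {
        // Map the message's business terms to nodes of the business ontology graph.
        // The same message may map to several topics, and that is OK.
        for (String topic : ontology.topicsFor(message.businessTerms())) {
            publisher.publish(topic, message);
        }
    }
}
```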
Another purpose of the semantic layer (placed on top of the ESB) is to bridge different business dialects while integrating systems from multiple companies. This is the next step in the integration evolution: canonical model integration. The canonical model will be represented by a common ontology, like FIBO, with maps to the proprietary languages and models of specific companies.
LG: What are your thoughts on the potential drawbacks of maintaining a canonical schema/ontology? Specifically, one issue with a canonical schema (in SOA) is that various systems often operate upon a unique perspective of the model, in which case adherence to a centralized schema may be prohibitive: not in terms of performance, but in terms of the effort required to ensure that a unified schema can fulfill all requirements, as well as the effort required to implement mappings between the canonical schema and particular use cases. A strategic pattern for battling the complexities of centralization is the partitioning of a system into bounded contexts, which are specifically designed to delimit the applicability of the model. The design of a bounded context follows principles similar to those of OOD/SOA, namely the pursuit of high cohesion, low coupling, and encapsulation. Ideally, bounded contexts can divide the problem space into units which are manageable both in terms of the mental effort required to reason about them and in terms of performance. Given that bounded contexts bear relationships to other bounded contexts, a context map can be used to represent this higher-level structure. (The terms bounded context and context map are borrowed from Domain-Driven Design (DDD), although DDD doesn’t (yet) attempt to formalize, standardize, or automate these concepts.)
JZ: I completely agree with you that a modular approach to the ontology and its specializations is the key. Ontology integration and mapping tools are works in progress. They must reflect multiple dialects and map between them, while still allowing minimum-size extraction for specific cases; scaling down is extremely important for practical applications. This is one of the tasks listed by FIBO.
DDD and semantic technology share some common ideas. In both cases, domain expertise captured as a model plays a crucial role. DDD mostly focuses on the developer as a domain expert. Semantic technology works hard to offer languages, methods, and tools to describe domain expertise. There could be some beneficial connections between the two.
LG: I can envision a Conversational Semantic Decision Support system bridging the gap between information and technology effectively serving as a translator between human knowledge and a formalized ontology. This seems to be part of what was missing in traditional model-driven architectures (MDA).
JZ: Good point!
LG: Are there existing implementations of a Conversational Semantic Decision Support system? I can imagine this type of system can become quite complex on its own.
JZ: I started with simple forms (specialized wizards) and some components that map ambiguous human sentences to specific subjects indicated in the business ontology. I worked with components from GATE (a UK open-source project), OWLIM-Lite (Ontotext), and Cyc. There is still a lot of work ahead.
LG: The stated answer to why over 50% of IT budgets are spent on technical concerns is “Different types of information were historically presented by different systems.” I think this warrants a discussion on its own. While I would generally agree with this stance, I think the reasons are more complicated. I think we’re facing challenges in raising the level of abstraction in IT (http://gorodinski.com/blog/2012/05/31/abstractions/).
JZ: Very interesting; I tend to agree.
LG: The Semantic Integrator presented in the section “How a semantic approach improves development and prevents duplication” seems like a difficult component to implement. Does it utilize any NLP techniques to extract terms? How is it calibrated? What if data sources aren’t relational databases but NoSQL databases, where the schema must be accessed in a different way?
JZ: This is another task from FIBO’s list: providing a methodology that helps create a specific ontology based on a standard ontology, like FIBO. I only did a small portion of this task by mapping proprietary database fields to standard ontology subjects. This is very helpful for most existing enterprise data systems, as the mapping allows us to represent an existing data system in the common terms of a standard ontology, understandable to both humans and computer programs. NoSQL data systems were created quite recently, so hopefully they have readable and meaningful names that reflect common concepts. In the financial industry, these would be FIBO concepts.
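A minimal sketch of what such a field-to-ontology mapping might look like; the column names and the FIBO-style concept names below are invented for illustration, and a real map would be produced and curated with ontology tooling:

```java
import java.util.Map;

/**
 * Sketch of a proprietary-field-to-ontology mapping. The column names and
 * FIBO-style concept names are invented for the example.
 */
public class FieldOntologyMap {
    private static final Map<String, String> FIELD_TO_CONCEPT = Map.of(
        "CUST_SSN", "fibo:TaxIdentifier",    // proprietary column -> standard concept
        "CUST_NM",  "fibo:PersonName",
        "ACCT_BAL", "fibo:AccountBalance"
    );

    /** Resolve a proprietary field to its common ontology subject, if known. */
    public static String conceptFor(String proprietaryField) {
        return FIELD_TO_CONCEPT.get(proprietaryField);
    }
}
```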
LG: What are some good resources on FIBO? Is this one: http://www.umiacs.umd.edu/~louiqa/2012/BMGT499B/RESOURCES/FIBO_Bennett.pdf?
JZ: That is a good one; Michael Bennett is the main driver behind FIBO. I also recommend another resource: www.edmcouncil.org
LG: The semantic model, once stored in a triple store, can be understood by a computer. What are some example use cases of this understanding? A graph can certainly represent relationships between business entities, but how is this information put into use?
JZ: Semantic tools like AllegroGraph and Fluid Operations can manage a unified landscape of information, where a growing number of subjects and details will not grow the infrastructure. No new tables or applications will be needed. New reports will be generated automatically based on new requirements. Conversational semantic decision support will help the program understand new requirements.
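As a sketch of this idea, a new “report” against a triple store can be just a new query. The example below uses Apache Jena, chosen here only because it is a freely available triple-store API (the tools named above are AllegroGraph and Fluid Operations); the file name, prefix, and properties are assumptions:

```java
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;

/**
 * Sketch: once facts live in a triple store, a new "report" is just a new
 * query; no new tables or applications are needed.
 */
public class SemanticReport {
    public static void main(String[] args) {
        // Load a hypothetical RDF export of the enterprise semantic model.
        Model model = RDFDataMgr.loadModel("enterprise.ttl");

        String query =
            "PREFIX biz: <http://example.com/ontology#> " +
            "SELECT ?customer ?account WHERE { " +
            "  ?customer biz:owns   ?account . " +
            "  ?account  biz:status 'delinquent' . }";

        try (QueryExecution qe = QueryExecutionFactory.create(query, model)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.get("customer") + " -> " + row.get("account"));
            }
        }
    }
}
```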
LG: There is an example at the beginning of the section “Establish the rules of the game with the Decision Tables” regarding the detection of duplicated customer data. I agree that the concept of a duplicate customer can be reused, but I think a primary challenge is not the reuse of this concept but integrating the ontology with the various endpoints where it must manifest. For example, the semantic model may contain the rule that a person should be unique by SSN, but how will that rule translate to a web application implemented in Ruby? The challenge is not the representation of information but the mapping and delivery of specific chunks of that information to the appropriate places.
JZ: This is another dimension of the same problem, and I agree it is even more challenging. The example provided in the article illustrates the implementation of a rules engine with decision tables, built on the concept that “data know how to handle data”. The main idea was to implement it once, so that multiple applications that deal with a data set (like SSN) can use this single implementation. If new regulations change the rules related to this data set, only this one place changes. Today, every application that touches these data must be modified.
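A minimal sketch of the “implement it once” idea, with a couple of invented decision-table-style rules for an SSN and a simple in-memory registry standing in for a real data source:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Sketch of "implement it once": a single shared rule set for one data
 * element (SSN) that every application calls, instead of each application
 * re-implementing the logic. The rules and registry are invented here.
 */
public class SsnRules {
    private static final Set<String> KNOWN_SSNS = new HashSet<>();  // stand-in for a real registry

    // Decision-table rows expressed as named predicates.
    static final Predicate<String> WELL_FORMED =
        ssn -> ssn != null && ssn.matches("\\d{3}-\\d{2}-\\d{4}");
    static final Predicate<String> UNIQUE =
        ssn -> !KNOWN_SSNS.contains(ssn);

    /** The one place that changes when regulations change. */
    public static boolean acceptNewCustomer(String ssn) {
        if (WELL_FORMED.and(UNIQUE).test(ssn)) {
            KNOWN_SSNS.add(ssn);
            return true;
        }
        return false;
    }
}
```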
LG: Would you elaborate on: “Each data attribute can be considered as an extended Java Bean, a placeholder for retrieval and data handling methods. In this world of linked data any application or a rule, which uses a data attribute, will automatically know about major data handlers, because ‘data know how to handle data’.”
JZ: Similar to a Java Bean, which has get/set methods, a DataAttribute has an extended set of methods, like validate, isExisting, etc. There is a default set of these methods, but it is not fixed and can be extended.
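A minimal sketch of such a DataAttribute; the method names follow the description above (validate, isExisting), while the class body and the placeholder logic are assumptions:

```java
/**
 * Sketch of a DataAttribute: a Java Bean extended with data-handling
 * methods, so that "data know how to handle data". Default handlers can
 * be overridden per attribute type.
 */
public class DataAttribute {
    private String name;
    private String value;

    // Standard bean accessors.
    public String getName()             { return name; }
    public void setName(String name)    { this.name = name; }
    public String getValue()            { return value; }
    public void setValue(String value)  { this.value = value; }

    /** Default validation; a subclass (e.g., an SSN attribute) overrides this. */
    public boolean validate() {
        return value != null && !value.isEmpty();
    }

    /** Default existence check; a real version would consult a data source. */
    public boolean isExisting() {
        return validate();  // placeholder logic for the sketch
    }
}
```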
LG: What are your thoughts on this: http://www.udidahan.com/2009/06/07/the-fallacy-of-reuse/
JZ: The author positions faster development as the main goal of reuse, and reuse does not always help achieve that goal. True. But if we think about cutting overall expenses, we need to bring the maintenance cost into the equation. Maintaining multiple approaches is expensive.
Reuse cuts the maintenance cost. Of course, writing from scratch is often easier… and more expensive in the long run.
LG: “Collecting alert stories into a critical situational description” – this seems like an example of complex event processing.
JZ: Yes, there are some common ideas with Complex Event Processing (CEP). At the same time, I’d like to give credit to the authors of The Decision Model book, Barbara von Halle and Larry Goldberg.
LG: There are more questions, as well as some of my own thoughts on the subject matter, to discuss with you later…
JZ: Thank you for the great questions. I will be happy to continue…
Add Your Voice!
If you have additional questions or insights about the topics discussed here, join in by commenting below.