When President Theodore Roosevelt commissioned construction of the Panama Canal (bear with me; this will be made relevant to data topics shortly), the project was overseen by a committee in Washington, which hired a seasoned engineer, John Findley Wallace, and dispatched him to the isthmus with the directive to “make the dirt fly.” Wallace imported two gigantic steam shovels, capable of picking up eight tons of earth in a single scoop, and commenced the digging. The only problem was that there were too few trains to move all this spoil, and many of them buckled or derailed under the intense loads. One year and $78 million into the project (a huge sum in 1905 currency), Wallace had little to show. The dirt just couldn’t fly.
Enter Wallace’s replacement, John Stevens, who dubbed the mandate to “make the dirt fly” an “idiotic howl.” Instead, recognizing the enormous scale of the project, Stevens stopped the digging altogether and set about laying the infrastructure that would make the effort possible. Central to this was an extensive railroad system that would move spoil efficiently and would shift and extend as the project went along.
I learned all this from a fascinating documentary about the building of the canal that ran on US public television’s “American Experience” series not long ago. And as I watched I said to myself, “Ah-ha! Here is exactly the conflict between data management and ‘agile’ development.”
For those who might not know, agile software development involves a series of practices intended to deliver software that is better, faster to ship, and more fit for use. It includes such methods as iterative delivery of functional chunks, often in two- or four-week “sprints,” each of which represents a full lifecycle, and close collaboration between development team members and their sponsors or stakeholders. The “scrum” approach to agile enforces collaborative communication through daily 15-minute “stand-up” meetings in which everyone reports on what they’ve gotten done and what they are working on next. The list of features to be developed is held in a “product backlog,” from which the next set to be worked on is culled for each sprint. Two principles of agile development are that you don’t need to know all requirements up front but can code to what you know, and that you should prepare only the minimum necessary documentation (that is, avoid documentation overload).
There is a lot about agile practices that I like, including the open and frequent communications (nobody can hide) and the close involvement of stakeholders throughout the development process (business is at the table with IT all along the way). I especially like the iterative delivery; my colleagues and I were doing much the same thing back in the late 1980s, showing increments of functionality to users every few days so they could see something tangible and react (a fine antidote to the general truism that users don’t really know what they want until they’ve seen it).
But I fear the trap agile proponents lead us into is a focus on delivering software fast rather than on building the durable architectures that actually allow us to be agile. They want to “make the dirt fly.” This is the core of the cultural dissonance between data architects/modelers and agile developers. Data folks are more like Stevens: we want to lay a good infrastructure before we start shoveling.
We also have different cultures around documentation. Data models are inherently requirements-driven, and though models can be reviewed iteratively as requirements are fleshed out, our nature is to have a pretty thorough picture before releasing the model to developers. And in the metadata world, we are all about documentation: rigorously describing the data and defining the semantics of the information system. Agile folks employ rolling requirements (go with what you know now) and minimal documentation. (In fact, in some cases I fear agile is used as an excuse to bypass good requirements definition and good documentation, to the detriment of the ultimate deliverable.)
These issues are exacerbated in data integration environments (a SOA enterprise message bus, for example, or a data warehouse project). Agile folks approach these first as a software development challenge, when the real challenge is understanding the data and doing the mapping and modeling to achieve semantic alignment across disparate data environments — in other words, requirements and documentation. The agile team wants to start writing code: working software is the deliverable, not some documents or models. Yet to a data person, the “truth” lies in an explicitly articulated information model.
Maybe we can meet each other midway by considering data architecture to be the first foundational phase of building a system, yet still embracing the time-boxed approach of agile development to deliver models and data maps incrementally and iteratively, extending them as requirements are better understood. A good piece of a model or XML schema or set of data transformation rules is as valid a deliverable at the end of a sprint as is a piece of working software. At the end of this process would be a more comprehensive and stable data foundation for the software to be built on, and better-documented business and information requirements, which should only help to accelerate development of the software. Alex Glaros has proposed something like this in an intriguing model he calls “Enterprise Focused Development,” which blends the best of incremental, customer-focused delivery with a requirements-driven, information model-centric approach. In this methodology, an agile development team is led and organized by a data modeler (of all things!) or, you could say, by someone in data management, who, sitting as we do at the intersection of business and IT, has an enterprise perspective and brings key analytical capabilities to the table.
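To make the idea of a data map as a sprint deliverable a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not part of Glaros's methodology: the field names (`cust_nm`, `cust_id`, `st_cd`), the target attribute names, and the helpers `apply_rules` and `unmapped_fields` are all hypothetical. The point is that the rule set delivered in one sprint can be extended in the next without discarding what came before, and that the fields still lacking a mapping can be reported as open requirements.

```python
from typing import Any, Callable, Dict, List, NamedTuple


class FieldRule(NamedTuple):
    """One source-to-target mapping rule (all names here are hypothetical)."""
    source: str                        # field name in the source system
    target: str                        # attribute name in the shared information model
    transform: Callable[[Any], Any]    # conversion applied to the source value


# Sprint 1 deliverable: the part of the map the team understands so far.
SPRINT_1_RULES: List[FieldRule] = [
    FieldRule("cust_nm", "customer_name", str.strip),
    FieldRule("cust_id", "customer_id", int),
]

# Sprint 2 deliverable: the same map, extended as requirements firm up.
SPRINT_2_RULES: List[FieldRule] = SPRINT_1_RULES + [
    FieldRule("st_cd", "state_code", str.upper),
]


def apply_rules(record: Dict[str, Any], rules: List[FieldRule]) -> Dict[str, Any]:
    """Translate a source record into the target model using the given rules."""
    mapped: Dict[str, Any] = {}
    for rule in rules:
        if rule.source in record:
            mapped[rule.target] = rule.transform(record[rule.source])
    return mapped


def unmapped_fields(record: Dict[str, Any], rules: List[FieldRule]) -> List[str]:
    """List source fields that no rule covers yet (open mapping requirements)."""
    covered = {rule.source for rule in rules}
    return sorted(set(record) - covered)


if __name__ == "__main__":
    record = {"cust_nm": "  Ada Lovelace ", "cust_id": "42", "st_cd": "ca"}
    print(apply_rules(record, SPRINT_1_RULES))
    print(unmapped_fields(record, SPRINT_1_RULES))
    print(apply_rules(record, SPRINT_2_RULES))
```

Because the rules are plain data, they can be reviewed by business stakeholders at the end of a sprint just as a demo of working software would be, and the list of unmapped fields doubles as a modest piece of requirements documentation.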
In the agile world, you code to what you know now and “refactor” later. “Refactor” is a kinder, gentler term for “fixing it ’cause it’s wrong.” Wouldn’t fleshing out the requirements and architecture forestall a lot of refactoring? After all, even in Panama John Stevens discovered that attempting to make a sea-level canal was practically impossible. He had to sail back to Washington to convince Roosevelt that they needed to build a lock canal instead if they had any hope of timely success. Hapless Wallace, focused chiefly on making the dirt fly, had been hard at work solving the wrong requirement.
So remember the sublime metaphor presented by the Panama Canal should you find yourself needing to make the case for a data-centric, architectural approach to system development in contrast to pure agile development.