I recently wrote about my observations at the NoSQL Now! Conference, and I am happy to bring you the second and final part.
Hardware: Scale Up Before Scaling Out
Surprisingly, hardware came up several times. It has largely been ignored in the past, but it is coming to the forefront now that a hardware revolution is emerging just as the industry embraces the database revolution. The reality is that a commodity hardware node is a multi-core node, and that will be strikingly true within the next 18-24 months. To remain efficient, databases need to scale up before they scale out; otherwise the new generation of hardware will be wasted, creating unnecessary operational costs for production deployments.
Some industry leaders, like Michael Stonebraker, argued that the cost of computing in databases is tied to the problems of dealing with concurrency, and to how implementations rely on locking, latching and buffer management to overcome design issues. In short, the argument is that if you move to a single thread and only scale out, you can scale as needed by removing those aspects of the implementation. But it isn’t reasonable to expect a machine with eight or more cores, limited to one thread for each of those cores, to consume all the cycles of those cores. Simply put, the behavioral patterns of computation are just not uniform enough that you can queue requests up and have each one exit the front of the queue right when it is needed. There are not hundreds but thousands of concurrent users, and lining them up behind eight single-core threads isn’t the right solution, except perhaps for a narrow use case. What do you think?
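To make the queueing concern concrete, here is a toy sketch in Python. It is not a model of any real database: the function names, the eight-worker count, and the request durations are all assumptions chosen for illustration, and it deliberately ignores the locking and latching costs that the single-thread argument is trying to remove. It only compares how long a skewed batch of requests takes when each request is pinned to one single-threaded partition versus when any free worker can pick up the next request.

```python
import heapq

def partitioned_makespan(durations, workers):
    """Pin request i to partition i % workers; each partition runs serially."""
    busy = [0] * workers
    for i, d in enumerate(durations):
        busy[i % workers] += d
    return max(busy)

def pooled_makespan(durations, workers):
    """Hand each request, in arrival order, to whichever worker frees up first."""
    free_at = [0] * workers
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)       # earliest available worker
        heapq.heappush(free_at, start + d)   # busy until start + d
    return max(free_at)

# Hypothetical, non-uniform workload: two long requests among fourteen short ones.
durations = [10 if i % 8 == 0 else 1 for i in range(16)]

print(partitioned_makespan(durations, 8))
print(pooled_makespan(durations, 8))
```

With this particular input, both long requests land on the same partition, so the pinned layout needs 20 time units while seven cores sit mostly idle; the shared pool finishes in 11. Real workloads and real concurrency overheads will shift these numbers, so treat it as an illustration of the shape of the problem rather than a benchmark.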
At the conference there were a number of companies with different NoSQL database products that had different implementation designs and operational characteristics. With so much variety, it is difficult to pin down what exactly qualifies as NoSQL. The answer could lie in soft schema, which is a major theme I observed in the vast majority of products. This shift toward soft schema likely represents what NoSQL is coming to mean.
The shift toward soft schema is really a movement of architectural responsibility: managing the relations between data moves from the database data model to the application model. This means moving away from database joins to application linking. By moving to soft schema, all of these products are able to speed up access to related data through links, which amounts to a high-speed B-tree style lookup for any relation. Further, since the responsibility for resolution is moved up a layer in the software stack, distribution can be implemented to deliver scale-out. It is the architectural shift to soft schema and data linking, delivering the ability to perform at scale, that defines a product as NoSQL. How would you define NoSQL?
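The difference between a database join and application linking can be sketched in a few lines of Python. Everything here is hypothetical: the record layout, the key names such as `customer:1`, and the helper functions are made up for illustration, and a plain dictionary stands in for whatever keyed index (B-tree or otherwise) a real product would use.

```python
# Join style: the relation lives in a foreign key, and it is resolved by
# matching rows from two collections against each other.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
orders = [
    {"id": 100, "customer_id": 1, "total": 25},
    {"id": 101, "customer_id": 2, "total": 40},
    {"id": 102, "customer_id": 1, "total": 15},
]

def join_totals(customers, orders, name):
    """Nested-loop join: compare every order with every customer."""
    return [o["total"] for c in customers for o in orders
            if c["name"] == name and o["customer_id"] == c["id"]]

# Link style: each record carries the keys of related records, and the
# application follows those links with direct keyed lookups.
store = {
    "customer:1": {"name": "Ada", "orders": ["order:100", "order:102"]},
    "customer:2": {"name": "Grace", "orders": ["order:101"]},
    "order:100": {"total": 25},
    "order:101": {"total": 40},
    "order:102": {"total": 15},
}

def linked_totals(store, customer_key):
    """Follow the links stored on the customer record, one lookup per link."""
    return [store[key]["total"] for key in store[customer_key]["orders"]]

print(join_totals(customers, orders, "Ada"))
print(linked_totals(store, "customer:1"))
```

Both calls return the same totals for the same customer. The point of the second form is that each lookup needs only a key, so the records behind those keys could sit on different nodes without the application code changing shape.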