by David Schlesinger, CISSP
This was an exciting and well-attended DATAVERSITY Webinar, and we are thankful to our guest speakers for taking the time to respond to pressing questions from our audience. As we know, Information Security is a big concern of many CIOs and must be addressed to maximize the maturity and usefulness of NoSQL technology. For a full replay of the Webinar (we know you were very busy that day) please go to: http://www.dataversity.net/webinar-panel-big-data-nosql-security/.
Obviously, you cannot ask your own questions of a recording, but many in our audience asked wise questions and it is profitable to hear the expert answers. Indeed, many of the questions went straight to the issue of integrating Big Data and NoSQL technology, with security, into the mainstream of IT toolsets. Rather than give you a word-for-word account, we’ll cover the main points. For more details, click on the above link to see the entire webinar.
The big questions were:
- Can we have precise access control of highly scalable key-value stores for thousands of users?
- Can we lock down specific documents or sub-sections of documents in document stores?
- Can we provide complete role-based access control (RBAC) to match policy?
- Can we provide Sarbanes-Oxley caliber audit trails that tell us who changed what parts of documents and when?
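To make the last question concrete, here is a minimal sketch of what a field-level audit trail could look like. It is an illustration only, not something the panel presented: the function name `audited_update`, the record layout, and the in-memory list standing in for an audit store are all assumptions.

```python
from datetime import datetime, timezone
from typing import Any

_MISSING = object()  # marks a field that did not exist before the change


def audited_update(
    document: dict[str, Any],
    changes: dict[str, Any],
    user: str,
    audit_log: list[dict[str, Any]],
) -> dict[str, Any]:
    """Return an updated copy of the document, logging every field that changed."""
    updated = dict(document)
    timestamp = datetime.now(timezone.utc).isoformat()
    for field, new_value in changes.items():
        old_value = document.get(field, _MISSING)
        if old_value is _MISSING or old_value != new_value:
            # Record who changed which part of the document, and when.
            audit_log.append({
                "who": user,
                "when": timestamp,
                "field": field,
                "old": None if old_value is _MISSING else old_value,
                "new": new_value,
            })
            updated[field] = new_value
    return updated


# Hypothetical usage: only the field that really changed is logged.
log: list[dict[str, Any]] = []
draft = {"status": "draft", "owner": "alice"}
approved = audited_update(draft, {"status": "approved", "owner": "alice"}, "bob", log)
```

A production audit trail would also need tamper-resistant storage and retention rules, which this sketch does not attempt.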
In the area of Information Security, the panelists pointed out some trends and some gaps. Most of the security requirements they presently receive are for access security, that is, ensuring that only certain users can see certain data. However, the granularity of these user entitlements is still immature. It was suggested in discussion that there are a number of ways to enforce granular access control, including putting access-limiting code at the DB level and also at the application level. This is not very different from many typical access-control solutions using a traditional RDBMS or Data Warehouse.
Another suggested strategy was to keep certain data on specific nodes of the system as a way to control exposure of data to unauthorized users. It was also advocated that putting Web Services in front of the NoSQL DB can integrate access control with Active Directory. Access security for services is still immature, however, so we have a long way to go.
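As a rough illustration of access-limiting code at the application level, the sketch below filters a document down to the fields a user's roles may read before it leaves the application. The role names, field names, and the `redact` helper are hypothetical and do not come from any vendor on the panel.

```python
from typing import Any

# Hypothetical role-to-field policy; names are illustrative only.
ROLE_FIELDS: dict[str, set[str]] = {
    "clerk": {"name", "order_total"},
    "auditor": {"name", "order_total", "ssn"},
}


def redact(document: dict[str, Any], roles: set[str]) -> dict[str, Any]:
    """Return a copy of the document holding only fields the given roles may read."""
    allowed: set[str] = set()
    for role in roles:
        allowed |= ROLE_FIELDS.get(role, set())  # unknown roles grant nothing
    return {key: value for key, value in document.items() if key in allowed}


record = {"name": "A. Customer", "order_total": 129.5, "ssn": "000-00-0000"}
clerk_view = redact(record, {"clerk"})      # sensitive field is dropped
auditor_view = redact(record, {"auditor"})  # full record is visible
```

The weakness of this approach is that every application touching the store has to apply the same filter, which is one reason the panel also discussed enforcement at the DB level.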
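The idea of a service layer in front of the database can be sketched as follows. This is a simplified assumption of how such a facade might work: the `DocumentService` class is hypothetical, and plain dictionaries stand in for both the NoSQL store and the directory lookup (no real Active Directory API is used).

```python
class AccessDenied(Exception):
    """Raised when the caller belongs to no group permitted to read a collection."""


class DocumentService:
    """Hypothetical service facade that checks group membership before each read."""

    def __init__(self, store, directory, collection_groups):
        self._store = store                          # collection -> {doc_id: document}
        self._directory = directory                  # user -> set of group names
        self._collection_groups = collection_groups  # collection -> groups allowed to read

    def get(self, user, collection, doc_id):
        user_groups = self._directory.get(user, set())
        allowed_groups = self._collection_groups.get(collection, set())
        if not user_groups & allowed_groups:  # no shared group means no access
            raise AccessDenied(f"{user} may not read {collection}")
        return self._store[collection][doc_id]


# Hypothetical data: one collection, two users in different groups.
service = DocumentService(
    store={"invoices": {"inv-1": {"amount": 250}}},
    directory={"alice": {"finance"}, "bob": {"marketing"}},
    collection_groups={"invoices": {"finance"}},
)
invoice = service.get("alice", "invoices", "inv-1")
```

Because clients never talk to the database directly, the access rules live in one place and can follow whatever group structure the organization already maintains.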
According to several vendors, while row-level access control is a goal, less granular access control at a more aggregate level is currently deemed sufficient by many users, especially when it is joined with a security technology such as Kerberos, which digitally authenticates the identity of all the servers in the configuration.
Security among Top Three Requirements
Most of the vendors already have this capability since Information Security is usually among the top three requirements of customers. However, they were quick to point out that information regulation and privacy rules are pushing for greater security capabilities in the future. Sarbanes-Oxley, FISMA, NIST requirements, and HIPAA were mentioned as driving this area of development.
Another point made by our esteemed panelists was that Big Data and NoSQL are not the same thing and that they need not automatically be used together to benefit a business. Big Data platforms are a way of using hardware to greatly accelerate information processing. NoSQL is a name for DB technologies that work well on information that is poorly formatted because of the conditions under which it is collected.
Naturally, there are certain business realities and panelists addressed one of them: there are a number of situations where traditional RDBMS are desirable. OLTP was mentioned several times. These technologies are well understood and already in-house for many enterprises. Plus, the granularity and access control capability of more mature traditional DB systems may make them appropriate for specific business security requirements. Potentially, they may be required in certain cases for regulatory control and audit reasons.
A Tool for a Specific Purpose at First
The issue of poorly formatted data generally enters the picture in situations where Internet-generated data does not fit traditional data acquisition practices, and is also of such significant size that transforming it into a traditional RDB schema is cost-inefficient or time-limited. Thus, it is to be expected that many companies will simply add a specific NoSQL system to their IT environment rather than replacing an existing RDBMS.
NoSQL will not replace everything, according to the panel, but one specific NoSQL product should be chosen by a company to handle specific data issues and to solve specific business problems. It will sit alongside the existing RDBMS and Data Warehouse in the corporation’s arsenal of IT tools.
Several panelists responded that these technologies are already starting to move from “Early Adopters” to more traditional organizations (perhaps we could call these companies early adaptors, but that may be presumptuous). The point was that each of these technologies offers capabilities to solve specific problems and any company with one of those business problems would be wise to look into all possible solutions.
A Business-Driven Solution
All the panelists agreed that it is the business that is the driver for these technologies. With the development of global on-line business, the number of people involved in generating, buying, commenting on, and selling information has grown tremendously.
Thus, a perfectly acceptable, traditional RDBMS may suddenly no longer perform with the speed required for specific business situations. When computation speed becomes a competitive advantage, then IT folks begin looking around for better solutions. In this case, Big Data distributed servers and NoSQL technologies are ripe for examination.
The panelists offered another interesting perspective regarding adoption and use of Big Data and NoSQL technologies: mainstream brands of RDBMS and Data Warehouses most often provide data to a number of different applications serving different business uses. The panelists currently see NoSQL systems being used for only one specific business purpose or application within the enterprise.
In terms of support, it was mentioned that many of the traditional support organizations are already maintaining these technologies, and that this should make it easier to make the business case for mainstream acceptance when the need for their capabilities arises. One way to make these technologies better accepted is to build familiar API structures for easier integration into the existing IT infrastructure, which is going on in several places.
The panel mentioned that there are efforts, including a consortium, to develop a “standard” language, such as UnQL, that looks familiar to knowledge workers and so eases learning, but that allows scanning over “soft schema” databases.
Using SQL with a Distributed Database?
While there are places in the modern enterprise for traditional and advanced data store technologies, it was mentioned that they are quite different in terms of how best to optimize the speed of responses. Certain SQL strategies do not work well over a distributed database. A traditional SQL query can take a very long time to work its way through decentralized cluster-like data storage. This is exacerbated by any inner joins required. Perhaps this is due to the different paradigm of the models, since the nature of physical and logical data storage is quite different. In any event, this use of SQL was not recommended.
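A toy model can show why join-style queries are costly over distributed storage. In the sketch below, orders are partitioned by order ID, so finding one customer's orders means contacting every shard. The second function shows a common document-store alternative, embedding orders in the customer document. That alternative is an assumption added for contrast, not a recommendation from the panel, and the shard count and data are made up.

```python
NUM_SHARDS = 4


def shard_for(key: int) -> int:
    """Pick a shard by simple modulo hashing (an illustrative choice)."""
    return key % NUM_SHARDS


# Made-up data: 8 customers and 16 orders.
customers = {cid: {"name": f"customer-{cid}"} for cid in range(8)}
orders = {oid: {"customer_id": oid % 8, "amount": 10 * oid} for oid in range(100, 116)}

# Layout 1: orders partitioned by order ID, as a normalized table might be.
order_shards: list[dict] = [{} for _ in range(NUM_SHARDS)]
for oid, order in orders.items():
    order_shards[shard_for(oid)][oid] = order


def orders_via_join(customer_id: int):
    """Join-style lookup: every shard must be scanned for matching orders."""
    matches, shards_contacted = [], 0
    for shard in order_shards:
        shards_contacted += 1
        matches.extend(o for o in shard.values() if o["customer_id"] == customer_id)
    return matches, shards_contacted


# Layout 2: each customer's orders embedded in the customer document.
customer_shards: list[dict] = [{} for _ in range(NUM_SHARDS)]
for cid, customer in customers.items():
    embedded = [o for o in orders.values() if o["customer_id"] == cid]
    customer_shards[shard_for(cid)][cid] = {**customer, "orders": embedded}


def orders_via_embedding(customer_id: int):
    """Key lookup: only the shard that owns the customer is contacted."""
    document = customer_shards[shard_for(customer_id)][customer_id]
    return document["orders"], 1


join_result, join_cost = orders_via_join(3)
embedded_result, embedded_cost = orders_via_embedding(3)
```

In this model the join touches all four shards while the embedded lookup touches one, and the gap widens as the cluster grows.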
Overall, Big Data and NoSQL security is improving all the time, with a number of acceptable “work-arounds” already available. The panelists agreed that as these technologies mature, distributed Big Data and NoSQL database systems will develop the granular levels of access control and audit capability desired.
Our thanks to the Panel:
Anjul Bhambhri, Vice President of Big Data Products, IBM
Anjul Bhambhri has 23 years of experience in the database industry with engineering and management positions at IBM, Informix and Sybase. She is currently IBM’s Vice President of Big Data Products, overseeing product strategy and business partnerships. Previously at IBM, Anjul focused on application and data lifecycle management tools and spearheaded the development of XML capabilities in the DB2 database server. In 2009, she received the YWCA of Silicon Valley’s “Tribute to Women in Technology” Award.
View Anjul’s DATAVERSITY blogs HERE.
Dwight Merriman, CEO and Co-Founder, 10gen
Dwight is CEO and co-founder of 10gen, and one of the original authors of MongoDB. In 1995, Dwight co-founded DoubleClick and served as its CTO for ten years. Dwight was the architect of the DoubleClick ad serving infrastructure, DART, which serves tens of billions of ads per day. Dwight is co-founder, Chairman, and the original architect of Panther Express (now part of CDNetworks), a content distribution network (CDN) technology which serves hundreds of thousands of objects per second. Dwight is also a co-founder and investor in BusinessInsider.com and Gilt Groupe.
View Dwight’s Keynote speech at NoSQL Now! 2011 HERE.
James Phillips, Co-Founder, Couchbase
A twenty-five-year veteran of the software industry, James Phillips started his career writing software for the Apple II and TRS-80 microcomputer platforms. In 1984, at age 17, he co-founded his first software company, Fifth Generation Systems, which was acquired by Symantec in 1993, forming the foundation of Symantec’s PC backup software business. Most recently, James was co-founder and CEO of Akimbi Systems, a venture-backed software company acquired by VMware in 2006. Book-ended by these entrepreneurial successes, James has held executive leadership roles in software engineering, product management, marketing and corporate development at large public companies including Intel, Synopsys and Intuit and with venture-backed software startups including Central Point Software (acquired by Symantec), Ensim and Actional Corporation (acquired by Progress Software). Currently James is co-founder and Senior Vice President of Products at Couchbase.
View our On Demand Webinar with James on What You Need to Know to Move from a Relational to NoSQL Database HERE.
Robert Greene, Vice President of Technology, Versant
Robert Greene is responsible for providing strategic technical expertise for the Versant product portfolio as it relates to NoSQL solutions and Big Data management, including product architecture, roadmap and overall software ecosystem integration. He is also involved in application implementation within Versant’s key customer accounts, ensuring successful adoption of Versant’s emerging Enterprise NoSQL database technology.
View Robert’s DATAVERSITY blogs HERE.
Srini Penchikala, Editor, InfoQ
Srini currently works as a Security Architecture Program Manager at a major financial services organization in the Austin area. He has over 17 years of experience in security and risk program management. Srini’s main areas of interest are Agile Enterprise and Security Architecture and Agile Risk Management. He has presented at conferences like JavaOne, SEI Architecture Technology Conference (SATURN), IT Architect Conference (ITARC), No Fluff Just Stuff, and Project World Conference. He has also published several articles on Security Architecture and Agile Security Methodologies on websites like InfoQ.com, ServerSide.com, ONJava, DevX Java, java.net and JavaWorld. Srini publishes a blog on Java, JEE, and other topics at http://srinip2007.blogspot.com/.
View the slides from Srini’s NoSQL Now! 2011 presentation HERE.
MODERATOR: Dan McCreary, Consultant/NoSQL Evangelist, Kelly-McCreary & Associates
Dan McCreary is an enterprise data architect and strategist with over 25 years of experience helping organizations leverage advanced technologies. He is interested in NoSQL and advanced web architectures based on W3C standards. He has worked for organizations such as Bell Labs and Steve Jobs’s NeXT Computer, and founded his own consulting firm of over 75 people. He has a background in object-oriented programming, semantics, and declarative and functional programming. He has published articles on various technology topics including XQuery, XForms, XRX, Semantic Web, metadata registries, and enterprise integration strategies. He is author of several articles and Wikibooks on XRX-related technologies. He is also an invited expert of the W3C forms working group. Dan will also be speaking at Enterprise Data World 2012, including an introduction to NoSQL. Dan is also the founder of the August conference, NoSQL Now! in San Jose.