<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DATAVERSITY &#187; Articles</title>
	<atom:link href="http://www.dataversity.net/category/education/articles/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dataversity.net</link>
	<description></description>
	<lastBuildDate>Mon, 20 May 2013 07:10:11 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Analyzing Big Data: Lavastorm Analytics Engine</title>
		<link>http://www.dataversity.net/analyzing-big-data-lavastorm-analytics-engine/</link>
		<comments>http://www.dataversity.net/analyzing-big-data-lavastorm-analytics-engine/#comments</comments>
		<pubDate>Thu, 16 May 2013 07:10:25 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Education]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19803</guid>
		<description><![CDATA[by Jelani Harper Any quick search indicates that there’s no shortage of analytics technologies to extract meaning from Big Data. Yet, a review of the recent activity of Lavastorm Analytics reveals that its Lavastorm Analytics Engine may be one of the more viable. At the end of April 2013, the company partnered with Datawatch Corporation to include its engine in the latter’s Information Optimization Platform, enabling customers to create analytics applications for a variety of unstructured and structured data significantly faster than before. At the beginning of April, Lavastorm collaborated with Cyfeon Solutions to include its analytics engine in Cyfeon’s Answer Factory solution, which also utilizes MongoDB, Apache Solr, and Hadoop, to increase the speed of analytics and capacity for optimization, while processing massive quantities of data. In February, Lavastorm unveiled its Lavastorm Analytics Engine 4.6 with updates that increase support to QlikTech QlikView, VMWare 5 virtual machines, and data visualization tools for Excel. More importantly, the company is currently giving out 14 day trials of the Professional Plus desktop version of its analytics engine, as well as a desktop public edition for free. A third version, the Professional Edition, is available for purchase only. Professional Plus Versus Public Although [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.dataversity.net/wp-content/uploads/2013/05/AgileAnalytics.jpg"><img class="alignleft size-medium wp-image-19804" alt="AgileAnalytics" src="http://www.dataversity.net/wp-content/uploads/2013/05/AgileAnalytics-300x210.jpg" width="300" height="210" /></a>by <a title="Jelani Harper" href="http://www.dataversity.net/contributors/jelani-harper/" target="_blank">Jelani Harper</a></p>
<p>Any quick search indicates that there’s no shortage of analytics technologies to extract meaning from Big Data. Yet, a review of the recent activity of Lavastorm Analytics reveals that its Lavastorm Analytics Engine may be one of the more viable.</p>
<p>At the end of April 2013, the company <a href="http://www.prnewswire.com/news-releases/datawatch-and-lavastorm-announce-strategic-alliance-205214681.html">partnered with Datawatch Corporation</a> to include its engine in the latter’s Information Optimization Platform, enabling customers to create analytics applications for a variety of unstructured and structured data significantly faster than before. At the beginning of April, Lavastorm <a href="http://www.marketwatch.com/story/lavastorm-analytics-selected-by-cyfeon-solutions-to-help-business-analysts-operationalize-big-data-for-greater-business-value-2013-04-02">collaborated with Cyfeon Solutions</a> to include its analytics engine in Cyfeon’s Answer Factory solution, which also utilizes MongoDB, Apache Solr, and Hadoop, to increase the speed of analytics and capacity for optimization, while processing massive quantities of data.</p>
<p>In February, Lavastorm unveiled its Lavastorm Analytics Engine 4.6 with updates that extend support to QlikTech QlikView, VMware 5 virtual machines, and data visualization tools for Excel. More importantly, the company is currently offering <a href="http://www.lavastorm.com/resources/software-downloads-trials/">14-day trials</a> of the Professional Plus desktop version of its analytics engine, as well as a free desktop Public Edition. A third version, the Professional Edition, is available for purchase only.</p>
<p><b>Professional Plus Versus Public</b></p>
<p>Although the desktop version of this visually-based, incrementally agile analytics tool is designed for single users, there are limited reasons to download the Public Edition. Users can get a feel for the tool and how it works, but will be severely restricted in its capabilities. The Professional Plus edition processes up to 10 million rows from databases, Excel, and delimited files, and supports a range of XML, HTML, and image files as well as interfaces for Java, Python, and SQL. The Public Edition, by contrast, handles only 100,000 rows from Excel and delimited files and supports none of the other aforementioned functions.</p>
<p>The Professional Plus enables users to access a variety of advanced features such as Lavastorm’s vaunted InFlow reporting and visualization for analytics, collaborative library nodes for custom controls, access to the Lavastorm Analytics Engine server, and fee-based, on-site training. The Public Edition only comes with basic collaboration and analytics capabilities, although it hints at the speed and full capacity of the engine. The Professional Edition only lacks the advanced collaboration library controls and the programming interface potential of the Professional Plus; it has a maximum of a million rows.</p>
<p><b>Engine Diagnostic</b></p>
<p>The value of the Lavastorm Analytics Engine lies in its ability to allow users to perform data discovery, automated controls, and ad-hoc analytics within the same environment. Its highly scalable architecture (especially with the Professional Plus Edition) also integrates a multitude of data types from disparate sources, so that organizations can preserve legacy silos and still gain a unified picture of their data – without an explicit warehouse. Because Lavastorm’s analytics are visual, users can run continuous analytic models by automating processes and can freely sift through different types of data with discovery tools. A schema is not required, allowing business users to combine data sources without using code.</p>
<p>The engine works by granting users access to over 100 different analytics nodes in the Lavastorm Analytics Library. Each node is pre-packaged to serve a different purpose, such as to gather information about metadata, data acquisition, qualitative or distributive patterns of data, correlations and more. Since each node already comes ready to perform a specific task, users spend less time programming and can simply deploy analytics on the fly or in automated processes, enabling them to spend more time actually analyzing data.</p>
<p>Professionals can supplement their libraries by downloading additional nodes from Lavastorm that are designed for certain types of functions and frequently-used business systems. Recent packs of nodes include those for R Analytics and for Advanced Analytics. The contents of the May 7<sup>th</sup> Enhanced Analytics Node Pack include nodes for statistics such as “Quick Stats”, which provides averages, maximums, minimums, and null counts, as well as nodes for encryption, decryption, and for interfacing URL requests to HTTP servers.  Nodes can also be modified by IT for more specific deployments.</p>
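<p>As a rough illustration of the kind of summary a node like “Quick Stats” is described as producing, the following Python sketch computes the average, maximum, minimum, and null count for a single column. The function name <i>quick_stats</i> and the sample values are assumptions made for this example; it is not the Lavastorm implementation or API.</p>

```python
from statistics import mean
from typing import Optional, Sequence


def quick_stats(values: Sequence[Optional[float]]) -> dict:
    """Summarize one column: average, maximum, minimum, and null count.

    Hypothetical helper for illustration only; not a Lavastorm node.
    """
    # Nulls are represented as None and excluded from the numeric summaries.
    present = [v for v in values if v is not None]
    return {
        "average": mean(present) if present else None,
        "maximum": max(present) if present else None,
        "minimum": min(present) if present else None,
        "null_count": len(values) - len(present),
    }


# Invented sample column with two missing values.
column = [12.0, None, 8.0, 31.0, None]
stats = quick_stats(column)
print(stats)
```

<p>Counting nulls separately from the numeric summaries keeps missing values from silently skewing the average, which is one plausible reason to report both in a single pass.</p>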
<p>Another distinct advantage of using Lavastorm’s analytic engine is its InFlow Reporting, which offers graphical outputs for specific points within a node and helps users to gain insight via visual representation. This feature is not only essential for ascertaining quick information through discovery tools, but it also assists with the reliability of information gleaned from data. The result is improved data transparency and a clear auditing trail from decisions to the information that substantiated them. This sort of self-documentation by visual representation is easily repeated and helps to optimize data use. Users get a clearer understanding of where information from data comes from, enabling them to rely on data more frequently and accurately.</p>
<p>The relative ease with which discovery tools and ad-hoc analytics are performed is ideal for an Agile work environment. Lavastorm Analytics Engine’s schema-less approach permits users to create iterations before developing formal tables, which increases the speed with which they can be performed. Additionally, the visual representations of analytics indicate the precise point at which future iterations should occur by displaying anomalies. Users can streamline processes via automation, enabling greater efficiency and continuous analysis of data types. Results can be published in a variety of application environments utilizing conventional BI tools, ERP, data warehouses, and other data management systems.</p>
<p><b>Lavastorm Analytics Platform</b></p>
<p>In addition to working in conjunction with other solutions or on its own, Lavastorm also offers a more comprehensive analytics platform in which its analytics engine functions as the central component. When used in conjunction with the other components in the Lavastorm Analytics Platform, the analytics engine serves as a means of federating data from five different components and applying user-tailored analytics capabilities to all of them.</p>
<p>Other than the analytics engine, the most integral component of the Lavastorm Analytics Platform is a transaction warehouse that provides automated analyses of data from transactions. The warehouse is extremely scalable, can process billions of records in almost real time, and can aggregate a variety of data sources such as CRM, crash data retrieval systems, billing events, and IP networks. Its processing expedience enables users to discern patterns and trends almost as soon as they take place, which is valuable for detecting fraud and other threats.</p>
<p>The platform also comes with a resolution center that provides a central dashboard with reporting and visualization tools for case management, which serves as an environment to process concerns found in the transaction warehouse and analytics engine. Lavastorm Bill Analyzer functions as a practical extension of the analytics engine and displays combined views of services and accounts to customers through various billing systems. The Lavastorm Data Acquisition module is optional and utilizes metadata to process structured data without using typical ETL systems or code. Lavastorm has a desktop version of its platform for single users which enables them to deploy rules-based analytics for ad-hoc projects.</p>
<p><b>Final Thought</b></p>
<p>As its recent partnerships with Datawatch Corporation and Cyfeon Solutions indicate, Lavastorm’s analytics engine is capable of the type of analytics needed to transform Big Data into valued information. Its key assets are its extreme scalability and data integration utility. The ease of use of its node library allows virtually anyone to create analytics to their liking while minimizing the involvement of IT departments – which can still enhance analytics options as needed.</p>
<p>Users have the capacity to integrate, analyze, and optimize Big Data, increasing its viability and importance in business and organizational processes. The highly visual nature of the analytics engine facilitates a degree of traceability which enhances data lineage and agility, allowing users to generate continuous automated analytics as readily as those for ad-hoc purposes.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/analyzing-big-data-lavastorm-analytics-engine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Closer Look at the MongoDB Document Database</title>
		<link>http://www.dataversity.net/a-closer-look-at-the-mongodb-document-database/</link>
		<comments>http://www.dataversity.net/a-closer-look-at-the-mongodb-document-database/#comments</comments>
		<pubDate>Tue, 14 May 2013 07:10:42 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19729</guid>
		<description><![CDATA[by Paul Williams The document database, MongoDB, is currently one of the most prevalent NoSQL databases on the market. MongoDB offers both open-source and enterprise versions with the complete support of its developers 10gen. It provides an innovative tool for an individual developer experimenting with the freely available community edition and more robust functionality for a large corporation looking for additional flexibility and scalability than what is found with most relational databases. 10gen began development of MongoDB in 2007 and the database entered production ready status three years later. 10gen found their original inspiration in trying to create an &#8220;as a Service&#8221; product similar to Google&#8217;s App Engine, or Microsoft&#8217;s Cloud computing service, Azure. Seeing how relational databases could not support the requirements of the modern Web for scalability and wide distribution, the company began developing a database to meet these needs, eventually discontinuing work on the Cloud services platform after open sourcing MongoDB in 2009. MongoDB is currently in use at a wide array of organizations, both in the public and private sector. Forbes, Craigslist, MTV Networks, The National Archives, Disney Interactive Media Group, Cisco, and Reverb Technologies collectively make up a portion of MongoDB&#8217;s user base. The name, [...]]]></description>
				<content:encoded><![CDATA[<p style="text-align: left;" align="center">by <a title="Paul Williams" href="http://www.dataversity.net/contributors/paul-williams/" target="_blank">Paul Williams</a></p>
<p>The document database, <a href="http://mongodb.org/">MongoDB</a>, is currently one of the most prevalent <a href="http://www.dataversity.net/the-nosql-movement-document-databases/">NoSQL databases</a> on the market. MongoDB offers both open-source and enterprise versions with the complete support of its developer, <a href="http://www.10gen.com/">10gen</a>. It provides an innovative tool for an individual developer experimenting with the freely available community edition, and more robust functionality for a large corporation looking for more flexibility and scalability than is found with most relational databases.</p>
<p>10gen began development of MongoDB in 2007 and the database entered production ready status three years later. 10gen found their original inspiration in trying to create an &#8220;as a Service&#8221; product similar to Google&#8217;s App Engine, or Microsoft&#8217;s Cloud computing service, Azure. Seeing how relational databases could not support the requirements of the modern Web for scalability and wide distribution, the company began developing a database to meet these needs, eventually discontinuing work on the Cloud services platform after open sourcing MongoDB in 2009.</p>
<p>MongoDB is currently in use at a wide array of organizations, both in the public and private sector. Forbes, Craigslist, MTV Networks, The National Archives, Disney Interactive Media Group, Cisco, and Reverb Technologies collectively make up a portion of MongoDB&#8217;s user base. The name, MongoDB, stands for &#8220;huMONGOus&#8221; database.</p>
<p><b>MongoDB Basic Features </b></p>
<p>MongoDB offers the same capabilities at the database level in both editions, including the free community version. For larger companies looking for enterprise-ready features and support, 10gen provides commercial licenses that include those features, noted later in this article.</p>
<p>Documents marked up using a binary formatted version of JSON (JavaScript Object Notation) are the basic storage element in a MongoDB database instance. Flexible schemas are fully supported, so successive documents in the same collection don&#8217;t have to share a common structure. Even if they do hold a common structure, the same fields in successive documents can each store a different data type.</p>
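<p>The flexible-schema idea can be pictured with plain Python dictionaries. The sketch below models a collection as a list of documents whose fields and field types differ from one document to the next. All names and values are invented for illustration, and no MongoDB driver is involved.</p>

```python
import json

# A "collection" modeled as a plain list of dict documents.
# Titles, fields, and values are invented for this example.
albums = [
    {"title": "First Album", "artist": "Example Trio", "year": 1999},
    {"title": "Untitled Demo", "year": "unknown"},         # year stored as a string
    {"title": "Live Set", "tracks": ["Intro", "Encore"]},  # no year, extra field
]

# Each document carries its own structure, so it is inspected per document.
year_types = [type(doc.get("year")).__name__ for doc in albums]
for doc, type_name in zip(albums, year_types):
    print(doc["title"], "-", type_name)

# Documents map naturally onto JSON text, which BSON represents in binary form.
print(json.dumps(albums[0]))
```

<p>Because nothing enforces a shared structure, application code reading such documents typically has to tolerate missing fields and mixed types, as the <i>doc.get</i> call does here.</p>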
<p>This data modeling flexibility allows developers and modelers to closely align a database structure with an application&#8217;s object models. The documents are essentially programming objects, after all. Indexes, as well as normalization and de-normalization are possible, allowing fine-tuning of the performance of any application using MongoDB as its persistence engine.</p>
<p>MongoDB easily scales both horizontally and vertically, with full support for sharding and map/reduce processing. Mirroring across LANs and WANs, in addition to MongoDB&#8217;s replication functionality, combine to help ensure a high availability factor.</p>
<p>As mentioned before, documents in MongoDB are stored in BSON, which is a binary representation of JSON. Since BSON documents are limited in size to 16MB, MongoDB supports GridFS, allowing larger documents to be successfully managed by the database. GridFS breaks up larger files into chunks, with one collection storing the chunks, while another collection stores the file&#8217;s metadata.</p>
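<p>The chunking scheme described above can be sketched in a few lines of Python. This is an in-memory model, not the GridFS implementation: the two lists stand in for the two collections, the field names are loosely modeled on GridFS conventions, and the tiny chunk size is an assumption chosen so the example stays readable.</p>

```python
CHUNK_SIZE = 4  # bytes; deliberately tiny for the demo, real chunks are far larger


def store_file(name: str, data: bytes, files: list, chunks: list) -> int:
    """Split data into chunks; keep chunks and file metadata in separate lists."""
    file_id = len(files) + 1
    for n, start in enumerate(range(0, len(data), CHUNK_SIZE)):
        piece = data[start:start + CHUNK_SIZE]
        chunks.append({"files_id": file_id, "n": n, "data": piece})
    # The metadata record describes the file but holds none of its bytes.
    files.append({"_id": file_id, "filename": name,
                  "length": len(data), "chunkSize": CHUNK_SIZE})
    return file_id


def read_file(file_id: int, chunks: list) -> bytes:
    """Reassemble a file by concatenating its chunks in sequence order."""
    parts = sorted((c for c in chunks if c["files_id"] == file_id),
                   key=lambda c: c["n"])
    return b"".join(c["data"] for c in parts)


files_collection, chunks_collection = [], []
fid = store_file("notes.txt", b"hello gridfs", files_collection, chunks_collection)
restored = read_file(fid, chunks_collection)
print(len(chunks_collection), restored)
```

<p>Keeping the sequence number <i>n</i> on every chunk is what allows the file to be rebuilt in order, regardless of how the chunks happen to be stored.</p>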
<p>MongoDB provides a command line shell that fully supports the JavaScript language. In fact, all database CRUD operations are available through JavaScript commands like <i>find, update, save, insert, </i>and<i> remove</i>, with upsert behavior available as an option on <i>update</i>. A wide range of client drivers are also provided, supporting the following languages: Python, Ruby, PHP, Perl, Java, Scala, C#, C, C++, Haskell, and Erlang. The C# driver includes support for the .NET Framework&#8217;s LINQ (Language Integrated Query) feature.</p>
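<p>To make the CRUD vocabulary concrete, here is a minimal Python sketch of an in-memory collection with insert, find, update (with an optional upsert), and remove. The <i>Collection</i> class is hypothetical and written only for this example. It mimics the shape of the operations, not the exact semantics of MongoDB: in particular, merging changes into matched documents is a simplification.</p>

```python
class Collection:
    """Tiny in-memory stand-in for a document collection (illustrative only)."""

    def __init__(self):
        self.docs = []

    def insert(self, doc: dict) -> None:
        # Store a copy so later changes to the caller's dict do not leak in.
        self.docs.append(dict(doc))

    def find(self, query: dict) -> list:
        # Match documents whose fields equal every key/value pair in the query.
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in query.items())]

    def update(self, query: dict, changes: dict, upsert: bool = False) -> int:
        matched = self.find(query)
        for d in matched:
            d.update(changes)  # simplification: merge fields into the document
        if not matched and upsert:
            # Upsert: insert a new document when nothing matched the query.
            self.insert({**query, **changes})
            return 1
        return len(matched)

    def remove(self, query: dict) -> int:
        matched = self.find(query)
        self.docs = [d for d in self.docs if d not in matched]
        return len(matched)


artists = Collection()
artists.insert({"name": "Ada", "genre": "jazz"})
artists.insert({"name": "Bo", "genre": "rock"})
artists.update({"name": "Ada"}, {"albums": 3})
artists.update({"name": "Cy"}, {"genre": "jazz"}, upsert=True)
artists.remove({"genre": "rock"})
jazz = artists.find({"genre": "jazz"})
print(jazz)
```

<p>The upsert branch shows why the option is convenient: the caller does not need to check whether a document exists before deciding between an insert and an update.</p>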
<p><b>Getting Started with MongoDB</b></p>
<p>The open-source version of MongoDB is <a href="http://www.mongodb.org/downloads">freely available for download</a> at MongoDB.org. Versions exist for Mac OS X (64-bit), Linux (32 and 64-bit), Windows (32 and 64-bit), and Solaris. Additionally, a demo version of MongoDB Enterprise is available directly from 10gen.</p>
<p>Before experimenting with MongoDB, it is necessary to know how to use a command line shell and to be at least somewhat familiar with JavaScript syntax. A host of documentation and tutorials are available for MongoDB, but the database install doesn&#8217;t include a graphical IDE for administration. The mongo shell provides a full JavaScript environment with access to the standard database functions; it assumes a running database server available through a <i>localhost</i> interface.</p>
<p>For users who just want to try out the shell without dealing with setting up a database server, 10gen provides a useful browser-based version at MongoDB.org. It automatically connects to a database and includes a basic tutorial that covers all the basic database CRUD functions.</p>
<p align="center"> <a href="http://www.dataversity.net/wp-content/uploads/2013/05/MongoDB-Pic1.png"><img class="alignnone size-large wp-image-19730" alt="MongoDB Pic1" src="http://www.dataversity.net/wp-content/uploads/2013/05/MongoDB-Pic1-1024x629.png" width="620" height="380" /></a></p>
<p align="center"><b>The convenient MongoDB browser-based shell steps a user through creating, reading, updating, and deleting a few document database records.</b></p>
<p>When running the tutorial, the user creates a few document records and adds them to a database collection using JavaScript functions. The tutorial then explains how to query those records using the <i>find</i> statement, in addition to the other CRUD functionality. While running the tutorial, this handy <a href="http://docs.mongodb.org/manual/reference/sql-comparison/">SQL to MongoDB conversion chart</a> serves nicely to illustrate the differences between the two databases, especially when it comes to syntax.</p>
<p style="text-align: center;"><a href="http://www.dataversity.net/wp-content/uploads/2013/05/MongoDB-Pic2.png"><img class="alignnone size-large wp-image-19731" alt="MongoDB Pic2" src="http://www.dataversity.net/wp-content/uploads/2013/05/MongoDB-Pic2-1024x629.png" width="620" height="380" /></a></p>
<p align="center"><b>Querying the new &#8220;artists&#8221; collection for jazz and rock albums using the MongoDB browser shell.</b></p>
<p>The MongoDB browser shell and its embedded tutorial are a great way to whet the appetite and get a feel for this version of the CRUD process before diving into a full MongoDB install and setting up a database server. Installation instructions for MongoDB are <a href="http://docs.mongodb.org/manual/installation/">available here</a> for each platform.</p>
<p><b>Enterprise-Ready Commercial MongoDB Licenses and Support Options</b></p>
<p>As mentioned earlier, larger companies needing a range of enterprise-ready features for their MongoDB application have a host of options provided by 10gen. A commercially-licensed version of MongoDB, branded as MongoDB Enterprise, includes Kerberos Authentication which allows easy integration into existing security systems.</p>
<p>On-premise monitoring is another feature of Enterprise, offering a collection of over 100 system metrics. It works in a similar fashion to 10gen&#8217;s Cloud-based monitoring service, which is freely available to all MongoDB users, regardless of license. Users with support subscriptions gain the additional benefit of having 10gen&#8217;s engineers predictively analyze deployment issues and make recommendations before problems occur.</p>
<p>MongoDB Enterprise also offers support for SNMP – Simple Network Management Protocol – which facilitates the integration of the database into other enterprise applications and monitoring services. Additionally, Enterprise is certified to operate with a variety of Linux distributions, providing additional peace of mind for larger corporate installations.</p>
<p>While the Enterprise version of MongoDB includes some additional functionality suitable for an enterprise deployment, the open source edition essentially works the same at the database level. 10gen offers three levels of MongoDB support subscriptions, with the highest level including access to MongoDB Enterprise. Additional differences between support levels, other than the per server cost, relate to the Service Level Agreement, support availability, and whether or not emergency software patches are included.</p>
<p>Finally, 10gen recently introduced into limited release a Cloud-based backup service for MongoDB instances. It follows a &#8220;pay as you use&#8221; billing model, with an agent process running on the server performing backups in the background every 6 hours. It retains multiple copies based on a set retention policy, with restores performed by 10gen on demand.</p>
<p><b>Options for Learning More about MongoDB</b></p>
<p>The MongoDB.org website includes the <a href="http://docs.mongodb.org/manual/">documentation</a>, installation guides, and tutorials that serve nicely in getting anyone started using the database. There are separate sections suitable for developers and administrators as well as a full reference covering shell methods, in addition to database commands and query operators. For a higher level view, 10gen provides a collection of case studies, datasheets, and white papers on their website.</p>
<p>Considering the rapidly growing popularity of MongoDB, it makes sense that a large collection of books has hit the stores in the past year or so. <i>MongoDB: The Definitive Guide </i>from O&#8217;Reilly Books is recommended by 10gen; it looks to be a robust volume that provides a full overview of the database, including administrative functions, in addition to client development examples for the Java, PHP, Python, and Ruby languages.</p>
<p>The NoSQL movement continues to increase in relevance as society becomes ever more interconnected with a resultant exponential increase in data volume. MongoDB is one of the major databases at the center of this movement. It is a technology worth exploring for every database professional.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/a-closer-look-at-the-mongodb-document-database/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Enterprise Data World 2013 Conference Overview</title>
		<link>http://www.dataversity.net/enterprise-data-world-2013-conference-overview/</link>
		<comments>http://www.dataversity.net/enterprise-data-world-2013-conference-overview/#comments</comments>
		<pubDate>Thu, 09 May 2013 07:10:31 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Conference and Webinar Communities]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Enterprise Data World]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19669</guid>
		<description><![CDATA[by Jelani Harper Numbers don’t lie: 800 attendees. 27 different countries. 7 continents. 12 individual tracks (plus combinations). 5 days and one idyllic location. Representing a range of industries from financial to education, media to healthcare, and attended by professionals from franchises such as Walmart, Bank of America, and McDonald’s, the event was truly a global affair at San Diego’s Sheraton Hotel &#38; Marina from April 28 to May 2. Such diversity was well represented in the presentations, which ran the gamut from metadata and modeling to analytics and enterprise information architecture. Some speakers discussed overviews of their organizations or specific products. Presentations on Big Data and semantics were particularly well attended, while Agile principles, data driven business, and “smart” data were common throughout the show. Conference producers DATAVERSITY™ and DAMA International stratified the sessions by interest while maintaining an educational slant. In addition to workshops and tutorials (some of the latter which ran over three hours) on cutting-edge Data Management technologies, attendees were privy to industry-specific presentations and those designed for growth and coaching. A lunchtime session held by Global IDs was particularly insightful, as were the five minute “lightning talks” hosted by InfoAdvisors’ Karen Lopez, which included a [...]]]></description>
				<content:encoded><![CDATA[<p style="text-align: left;" align="center"><a href="http://www.dataversity.net/wp-content/uploads/2013/03/edw.png"><img class="alignleft size-medium wp-image-18315" alt="edw" src="http://www.dataversity.net/wp-content/uploads/2013/03/edw-300x78.png" width="300" height="78" /></a>by <a title="Jelani Harper" href="http://www.dataversity.net/contributors/jelani-harper/" target="_blank">Jelani Harper</a></p>
<p>Numbers don’t lie: 800 attendees. 27 different countries. 7 continents. 12 individual tracks (plus combinations). 5 days and one idyllic location.</p>
<p>Representing a range of industries from financial to education, media to healthcare, and attended by professionals from franchises such as Walmart, Bank of America, and McDonald’s, the event was truly a global affair at San Diego’s Sheraton Hotel &amp; Marina from April 28 to May 2.</p>
<p>Such diversity was well represented in the presentations, which ran the gamut from metadata and modeling to analytics and enterprise information architecture. Some speakers discussed overviews of their organizations or specific products. Presentations on Big Data and semantics were particularly well attended, while Agile principles, data driven business, and “smart” data were common throughout the show.</p>
<p>Conference producers DATAVERSITY™ and DAMA International stratified the sessions by interest while maintaining an educational slant. In addition to workshops and tutorials (some of which ran over three hours) on cutting-edge Data Management technologies, attendees were privy to industry-specific presentations and those designed for growth and coaching. A lunchtime session held by Global IDs was particularly insightful, as were the five-minute “lightning talks” hosted by InfoAdvisors’ <a href="http://www.dataversity.net/contributors/karen-lopez/">Karen Lopez</a>, which included a Skype presentation by Unifyo’s Benjamin Wirtz and an exhortation from Millennium Data Management’s Cynthia Hauer about the manageability of Big Data.</p>
<p><a href="http://www.dataversity.net/wp-content/uploads/2013/05/RFAV71161.jpg"><img class="alignright size-medium wp-image-19671" alt="RFAV7116" src="http://www.dataversity.net/wp-content/uploads/2013/05/RFAV71161-300x199.jpg" width="300" height="199" /></a>From Sunday evening’s kick-off panel – in which attendees were urged to broaden their horizons by learning about data topics that they were less than familiar with – it became clear that the implicit theme of the conference was expanding the traditional conceptions and deployments of data. DATAVERSITY’s Tony Shaw commented that:</p>
<p style="padding-left: 30px;">“The whole purpose of this particular gathering at the start of the event is to set the tone for the sort of information sharing we’re going to have for the next few days and give people a sense of the fact that there is an existing community of others who are willing to help you, who are willing to share their experiences, and at the same time who want to learn from you.”</p>
<p><b>Critical Keynotes</b></p>
<p>Nowhere was such an expansion of learning and data technology more prevalent than in the keynote addresses, particularly during the April 30<sup>th</sup> speech given by GitHub’s Tim Berglund, “Then Our Buildings Shape Us: A New Way to Think About Technology Selection”. By spanning different facets of the humanities such as architecture, music, and literature in various historical periods, Berglund was able to demonstrate how form inevitably shapes content. The technologies that data professionals use invariably affect the way data is ingested, governed, and applied. He validated such points with a plethora of disparate examples utilizing the works and words of John Coltrane, Winston Churchill, and Frederick Winslow Taylor. Berglund was able to craft his presentation so that it mirrored the innovation he was advocating in the consideration of technology selection. He stated that:</p>
<p style="padding-left: 30px;">“I want you to get the form into your head and interpret the form, not the content. I want you to ask yourselves how is this system going to change you intellectually as a decision maker, as an educator, as whatever your position is. You are a creative person. You’re creating and molding technology just like novelists, just like composers, just like architects. You have to be conscious of the form you’re creating in and of the impact it’s having on you.”</p>
<p>While Berglund emphasized the need for professionals to be cognizant of the fact that technologies used for Data Management affect their perception and use of data, James Fowler’s Social Data Keynote on May 1<sup>st</sup>, “Connected: How Social Networks Affect Everything You Feel, Think and Do” focused on expanding data applications beyond the traditional realms of operations and business into social purposes. The University of California at San Diego professor of political science and medical genetics described an exacting research process that utilized data to document the influence that people have on one another. His findings were framed within the context that while social media such as Facebook provides a means of online socialization and facilitation of relationships between people, such social networks have always existed and play a considerable role in the shaping of individual and collective lives.</p>
<p>Fowler’s keynote was preceded by the Big Data Keynote of Qubole’s Ashish Thusoo, “Taming Elephants, Bees and Pigs—The Big Data Circus”, which largely documented the process in which Facebook – for whom the speaker worked from 2007 to 2011 – devised the increasingly ubiquitous technology for Big Data. In addition to providing a historical analysis of several drivers and concerns that led to the development of Hadoop and Hive, Thusoo addressed Big Data’s evolution to include Cloud-based applications for access to analytics. He posited that:</p>
<p style="padding-left: 30px;">“Cloud architecture has for the first time allowed you to do things like get analytics resources on demand. And if you think about that, analytics is a lot of the Big Data workload. So if you marry these two things together – marry the Cloud with Big Data – what you get is these resources when you need them. You don’t have to be Facebook to get 100,000 nodes on the Cloud itself.”</p>
<p><b>Big Fun: Vendors and Food</b></p>
<p>While most attendees seemed to value the education-based approach of the individual sessions, the shameless promotional tactics of the vendors were just as much – if not more so – appreciated. As if the complimentary beer, wine, and hors d&#8217;oeuvres (which included an assortment of chicken, veggies, breads, and even ice cream) weren’t enough, several vendors augmented live demonstrations of various products with giveaways and raffle prizes.</p>
<p>Gold sponsor Denodo lured participants to its data virtualization demonstration by raffling off a Kindle Fire HD, while Ataccama enticed participants to view its Data Quality Center (DQC) by handing out packs of gummy bears – and raffling off a 6,000 calorie gummy bear. Gold sponsor Informatica dished out collateral CD overviews of its data quality and Master Data Management solutions in between raffling off a pair of Bose headphones. Data Advantage Group, another gold sponsor, gave out an iPad mini while recruiting attendees to view its MetaCenter platform. Platinum sponsor Global IDs talked to participants about its Enterprise Information Management Suite 8.0 and kitted out attendees with computer-carrying satchels; fellow platinum sponsor Adaptive demonstrated metadata management tools and dished out solar ruler calculators.</p>
<p>Gold sponsor HP discussed aspects of its Vertica Community Edition, which includes a column orientation that reduces storage costs, while gold sponsor IBM disseminated foldable water bottles and demonstrated its Information Governance Unified Process software. Other notable demos included Intellicus’ Big Analytics and Business Intelligence products and TopQuadrant’s Semantics software. San Mateo’s SnapLogic provided information about its Big Data-as-a-Service solution. There was even a certain Data Management education website that got in on the action by handing out DATAVERSITY t-shirts, sweatshirts, and sweatpants.</p>
<p>IMCue Solutions’ <a href="http://www.dataversity.net/contributors/john-ladley/">John Ladley</a> best summarized the sentiment about the vendors and the plethora of freebies:</p>
<p style="padding-left: 30px;">“Let’s recap: why are we here? Valuable merchandise. iPad – another electronic distraction. Something else to mess with your sleep pattern. Kindle – Amazon’s version of messing with my sleep pattern. Storage devices, also known as belt drives or flash drives. Software licenses – someone is giving away software licenses to their product. There’s cookies, candy, all these little things&#8230; You can get your holiday shopping done right here.”</p>
<p><b>In Perspective</b></p>
<p>Despite the giveaways, the continental breakfasts and extravagant three-course lunches (including sumptuous chicken and quinoa dishes, multi-layered vegetable lasagna, raspberry iced tea, and an assortment of tarts and pies for dessert), the shuttle rides into historic parts of San Diego, and the choice location nestled against the bay, it would be hard to dispute the fact that the most valuable resource at EDW 2013 was the participants, speakers, and company representatives themselves.</p>
<p>The sense of community among data professionals appears to be growing, as evinced by the presence of co-conference producer DAMA, which had representatives from chapters in Brazil, Canada, Japan, South Africa, and the U.S. present. The non-profit Data Management organization issued awards to Walgreen’s Mike Jennings for professional excellence, Philip LaPlanet for academic excellence, Daniel Moody for community excellence, and Daniel Paolini for government excellence. According to DAMA’s Peter Aiken:</p>
<p style="padding-left: 30px;">“What DAMA is about is the facilitating of discipline into IT organizations, and we spend a lot of time doing it. Our organization is 100 percent volunteer based. We’re the largest vendor independent, technology independent, methodology independent Data Management organization in the world. Unlike for profit organizations, we want to make sure we stay as neutral as possible so that practicing data managers can get the best advice and expertise.”</p>
<p>Ultimately, EDW 2013 was about the exchange of such expertise and the fostering of a global community in which to do it. The many panels, sessions, and conversations between professionals reveal that the applications for data are growing. As data usage increases throughout the public and private spheres of professional industries, the world is truly getting smaller and bringing people closer together.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/enterprise-data-world-2013-conference-overview/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Notes from the World of Data Science – April 2013</title>
		<link>http://www.dataversity.net/notes-from-the-world-of-data-science-april-2013/</link>
		<comments>http://www.dataversity.net/notes-from-the-world-of-data-science-april-2013/#comments</comments>
		<pubDate>Thu, 02 May 2013 07:08:38 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Education]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19550</guid>
		<description><![CDATA[by Paul Williams This article takes a glance at some stories in the last few weeks emerging from the growing world of Data Science. Included is news about a Data Science Consortium taking shape in North Carolina, a new tool for yet another &#8220;as a Service&#8221; acronym (DSaaS), and a look at Software Defined Networking technology and its importance for the future in making Big Data Science operate smoothly for the enterprise. Data Science Gets Attention from the New York Times The New York Times recently trained its learned eye toward the practice of Data Science, focusing on the trend of new university programs in Data Science, something previously covered here at DATAVERSITY™. The Times article talks about many of the issues leading to the importance of data in today&#8217;s connected society, most obviously the parallel growths of Big Data and social networking. Of special note is a new master&#8217;s program specializing in data at Columbia University. Rachel Schutt, a research scientist for Johnson Research Labs, taught an introductory course in Data Science at Columbia. She commented on what makes up a data scientist: &#8220;A hybrid computer scientist software engineer statistician. The best tend to be really curious people, [...]]]></description>
				<content:encoded><![CDATA[<p style="text-align: left;" align="center"><a href="http://www.dataversity.net/wp-content/uploads/2013/05/DS-News.jpg"><img class="alignleft size-medium wp-image-19551" alt="DS News" src="http://www.dataversity.net/wp-content/uploads/2013/05/DS-News-300x198.jpg" width="300" height="198" /></a>by <a title="Paul William" href="http://www.dataversity.net/contributors/paul-williams" target="_blank">Paul Williams</a></p>
<p>This article takes a glance at some stories in the last few weeks emerging from the growing world of Data Science. Included is news about a Data Science Consortium taking shape in North Carolina, a new tool for yet another &#8220;as a Service&#8221; acronym (DSaaS), and a look at Software Defined Networking technology and its importance for the future in making Big Data Science operate smoothly for the enterprise.</p>
<p><b>Data Science Gets Attention from the New York Times</b></p>
<p>The New York Times recently <a href="http://www.nytimes.com/2013/04/14/education/edlife/universities-offer-courses-in-a-hot-new-field-data-science.html?pagewanted=all&amp;_r=0">trained its learned eye toward the practice of Data Science</a>, focusing on the trend of new university programs in Data Science, something previously covered here at <a href="http://www.dataversity.net/data-science-programs-on-the-increase-at-universities/">DATAVERSITY</a>™. The Times article talks about many of the issues leading to the importance of data in today&#8217;s connected society, most obviously the parallel growths of Big Data and social networking.</p>
<p>Of special note is a new master&#8217;s program specializing in data at Columbia University. Rachel Schutt, a research scientist for Johnson Research Labs, taught an introductory course in Data Science at Columbia. She commented on what makes up a data scientist: &#8220;A hybrid computer scientist software engineer statistician. The best tend to be really curious people, thinkers who ask good questions and are O.K. dealing with unstructured situations and trying to find structure in them.”</p>
<p>Chris Wiggins, an applied mathematics professor at Columbia, talked about how today&#8217;s generation is already experienced in the use cases typically found in Data Science:</p>
<p>“This is a generation of kids that grew up with data science around them — Netflix telling them what movies they should watch, Amazon telling them what books they should read — so this is an academic interest with real-world applications. And, they know it will make them employable.”</p>
<p>The big challenge for these universities and their nascent programs is training the data scientists quickly enough to satiate the growing need in business. Another issue is the wide array of different skills needed in Data Science, and trying to fit the right mix of classes into the curriculum. Some mixture of industry input will be needed to help these colleges grow the next group of important technology workers.</p>
<p><b>Software Defined Networking to Make Data Science Hum in the Future</b></p>
<p>Earlier in April, PC World took a look at <a href="http://www.pcworld.com/article/2035654/bigdata-science-requires-sdn-internet2-chief-says.html">Software Defined Networking</a>, a set of technologies at the forefront of Internet hardware research. Essentially, they allow the virtualization of the switches and routers that transport Internet traffic all over the world today. What now runs in an array of racks in a data center would in the future operate inside a computer.</p>
<p>Internet2 is a firm currently operating a network linking various research institutions using elements of SDN in their overall infrastructure. Company head David Lambert feels SDN is an important part of ensuring the success of Big Data science. He feels the current Internet is unable to handle the large data sets necessary for this kind of research. “The genomics community finds very little in our current-generation Internet that is capable of supporting the needs they have,” Lambert said.</p>
<p>The current research into SDN technologies reflects more of an open-source development model than the commercial products currently purveyed by the giants in the Internet telecommunications industry. This openness remains vital in fostering the innovation for the <i>new</i> Internet, according to Lambert. The advancement of Data Science obviously will benefit from a similar growth in SDN technology.</p>
<p><b>Data Science Research Goes to Tobacco Road</b></p>
<p>A <a href="http://www.news-medical.net/news/20130422/North-Carolina-set-to-become-a-leading-hub-for-data-intensive-business-data-science-research.aspx">new Data Science consortium</a> was formed earlier this year in the Research Triangle area of North Carolina. Called the National Consortium for Data Science (NCDS), this new organization brings together Data Science researchers at universities and those involved in the practice at the corporate and government level. The group hopes to address the issues surrounding the emergence of Data Science – namely handling the challenges in collecting, managing, and sharing Big Data.</p>
<p>NCDS launched at the University of North Carolina&#8217;s Renaissance Computing Institute (RENCI), and part of the organization&#8217;s charter is to make Tobacco Road central to Data Science research in the United States. &#8220;Those who harness the power of big data and use it to develop new data-intensive business sectors will be the winners in the 21st century economy,&#8221; said Stanley C. Ahalt, professor of computer science at UNC-Chapel Hill and director of RENCI. &#8220;Our members understand that, want to find solutions to big data problems and put North Carolina on the map as a center of data science innovation.&#8221;</p>
<p>In addition to data-oriented research, the consortium plans to serve as an incubator for Data Science-focused businesses, as well as fostering the growth of data analytics programs at the university level. The industry members of the consortium include Cisco, GE, IBM, NetApp, and SAS. Academia is represented by UNC-Chapel Hill, RENCI, North Carolina State University, UNC Charlotte, UNC General Administration, Duke University, and Drexel University. The Hamner Institutes for Health Sciences, MCNC, the National Institute of Environmental Health Sciences, RTI International, and the U.S. Environmental Protection Agency make up the governmental and non-profit contingent.</p>
<p><b>A Tool for Data Science as a Service Emerges</b></p>
<p>The continued growth of Cloud computing services has led to a plethora of &#8220;as a Service&#8221; acronyms, most of which were <a href="http://www.dataversity.net/2013-trends-in-data-as-a-service-and-cloud-computing/">previously mentioned at DATAVERSITY</a>. It makes perfect sense that Data Science follows this trend, and the upstart Swedish company, <a href="http://www.augify.com/">Augify</a>, recently introduced a <a href="http://www.marketwatch.com/story/augify-launches-new-data-science-visualization-platform-for-financial-markets-2013-04-15">visualization tool providing Data Science as a Service</a> (DSaaS) for consumers in the financial industry.</p>
<p>The Cloud-based tool gleans meaningful stock-related activity from the Big Data morass of social networks and other sources. One of the ultimate goals of the system is to detect buy and sell signals, and thus predict large investment transactions on the stock market. Developing a system to project the various market cycles has long been the holy grail of many working in financial markets:</p>
<p>&#8220;The new tape is not only Twitter &#8211; it&#8217;s Twitter plus thousands of other data sources. We need new ways to aggregate, add meaning, and visualize this data in real-time,&#8221; said Augify chief, Jay Solomon. &#8220;Data Science-as-a-Service is a crucial tool for human understanding as we transition from the Information to the Knowledge Revolution.&#8221;</p>
<p><b>The Data Scientist as a Rockstar</b></p>
<p>If coverage in the New York Times is proof that Data Science is a hot topic, an early April <a href="http://www.wired.com/wiredenterprise/2013/04/phd-data-scientist/">feature article</a> in the hip technology magazine, Wired, is just icing on the proverbial cake. This article focuses on the premise that working with data isn&#8217;t necessarily something for nerds; that the Data Scientist is the 21st Century rockstar. Don&#8217;t expect a collection of data professionals on stage performing analytics on their laptops to start selling out 20,000-seat arenas, but there is no denying the growing importance of the practice in today&#8217;s socially-connected world.</p>
<p>The article mentions the success of Nate Silver&#8217;s predictions in the 2012 Presidential Election, as well as the data analysis-heavy <i>Moneyball</i> techniques that have allowed some baseball front offices to successfully compete against franchises with much larger budgets. It also illustrates how advanced degrees in mathematics aren&#8217;t a necessity to become a successful Data Scientist.</p>
<p>“In fact, I argue that often Ph.D.s in computer science in statistics spend too much time thinking about what algorithm to apply and not enough thinking about common sense issues like which set of variables (or features) are most likely to be important,” says Anthony Goldbloom, CEO of the Big Data analytics company, Kaggle. Answering the &#8220;why&#8221; is sometimes more important than answering the &#8220;what&#8221; or the &#8220;how.&#8221;</p>
<p>The science itself is a vital part of the Data Science equation, but many current practitioners feel there is a measure of art involved as well: “That’s because math is only half the problem when it comes to data science — it’s also an art as well. The artistry comes in the form of people who have intuition and who creatively approach a problem,” says Douglas Merrill, a former CIO at Google who formed his own data analytics firm, Zest Finance.</p>
<p>Still, intuition and artistry only go so far, as once someone decides to be a data scientist, it is still important that he or she keeps their finger on the pulse of new technology changes, lest they get left in the wake of a quickly growing movement. The best Data Scientists will be the ones able to merge the science and the art to solve the problems found within Big Data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/notes-from-the-world-of-data-science-april-2013/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Chief Data Architect</title>
		<link>http://www.dataversity.net/the-chief-data-architect/</link>
		<comments>http://www.dataversity.net/the-chief-data-architect/#comments</comments>
		<pubDate>Tue, 30 Apr 2013 07:10:26 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19511</guid>
		<description><![CDATA[by Michael Brackett Over the years many titles have been proposed and used for primary data management positions, such as data administrator, data guardian, data czar, data custodian, and so on.  A recent title is data governor, presumably to manage the data governance function.  However, data cannot be governed—only people can be governed.[i]  Another recent title is data scientist, presumably for managing big data.[ii]  However, that implies that a data scientist is not needed for small data. The whole scenario of primary data management titles seems to be searching for a title that works, then keeping that title—a scenario known as silver bullet titles.  That scenario is simply another form of the ongoing hype-cycles in data management.  It avoids precise definitions of position responsibilities and position location in an organization. The scenario continues with proposals for the creation of a Chief Data Officer (CDO).  The good news is that the proposals finally admit that the Chief Information Officer (CIO) has not properly managed, and is not properly managing, the data resource.  The proposals also recognize the difference between data and information, meaning the CDO manages the data resource and the CIO manages the production of information from that data resource.  However, the [...]]]></description>
				<content:encoded><![CDATA[<p style="text-align: left;" align="center"><a href="http://www.dataversity.net/wp-content/uploads/2013/04/Architect.jpg"><img class="alignleft size-medium wp-image-19512" alt="Architect" src="http://www.dataversity.net/wp-content/uploads/2013/04/Architect-300x300.jpg" width="300" height="300" /></a>by <a title="Michael Brackett" href="http://www.dataversity.net/contributors/michael-brackett" target="_blank">Michael Brackett</a></p>
<p>Over the years many titles have been proposed and used for primary data management positions, such as data administrator, data guardian, data czar, data custodian, and so on.  A recent title is data governor, presumably to manage the data governance function.  However, data cannot be governed—only people can be governed.<a title="" href="#_edn1">[i]</a>  Another recent title is data scientist, presumably for managing big data.<a title="" href="#_edn2">[ii]</a>  However, that implies that a data scientist is not needed for small data.</p>
<p>The whole scenario of primary data management titles seems to be searching for a title that works, then keeping that title—a scenario known as <i>silver bullet titles.  </i>That scenario is simply another form of the ongoing hype-cycles in data management.  It avoids precise definitions of position responsibilities and position location in an organization.</p>
<p>The scenario continues with proposals for the creation of a Chief Data Officer (CDO).  The good news is that the proposals finally admit that the Chief Information Officer (CIO) has not properly managed, and is not properly managing, the data resource.  The proposals also recognize the difference between data and information, meaning the CDO manages the data resource and the CIO manages the production of information from that data resource.  However, the bad news is that the proposals likely won’t work any better than previous titles.</p>
<p>The CIO evolved from the Chief Financial Officer (CFO) largely because data processing originated in Finance in most organizations.  As data processing evolved into Information Technology (IT), the CIO position became part of IT.  The problem is that the CIO and IT never adequately managed data as a critical resource of the organization, as evidenced by the huge quantities of disparate data that are blocking efficient use of the data resource to support the operational and analytical business information demand.</p>
<p>Information Technology, including the CIO, is typically hardware and software centric.  It is primarily concerned with hardware and software acquisitions, physical data structures, and the manipulation of data to produce information.  It is seldom interested in the formal design and development of a high-quality data resource within a single organization-wide data architecture.</p>
<p>That orientation is not likely to change with a CDO.  The CDO retains the <i>Officer</i> concept and will likely remain oriented toward hardware and software acquisition, database management, and the physical manipulation of the data.  The CDO will not likely have a major orientation toward the formal design and development of data as a critical resource of the organization based on business needs.  Very little will have been gained.</p>
<p>The question becomes <i>What can be done to resolve the disparate data situation so that the business information needs are met?</i>  Specifically, <i>What should the title be and where should data management be placed within public and private sector organizations to ensure that an organization’s data resource is properly designed and implemented to fully support the operational and analytical business information demand?</i></p>
<p>The answer is the establishment of a Chief Data Architect.  The Chief Data Architect was introduced in the mid-1990s to face challenges of managing an organization’s data as a critical resource and to build a single organization-wide data architecture, according to formal concepts, principles, and techniques, and within which an organization’s data are formally designed and managed.<a title="" href="#_edn3">[iii]</a></p>
<p>An architect is one who designs and advises on the construction of something; one who plans and achieves a difficult objective; a master builder.  The Chief Data Architect is an architect who expedites and facilitates the design and development of a high-quality data resource, within a single organization-wide data architecture, to meet an organization’s current and future business information demand.  The Chief Data Architect must understand the business and be <i>Of the business, by the business, and for the business</i>.</p>
<p>The Chief Data Architect is responsible for analyzing the business needs and synthesizing a solution.  They must follow the sequence of business understanding, to logical data resource design, to physical data resource implementation.<a title="" href="#_edn4">[iv]</a>  They must avoid any brute-force-physical development effort, stop the burgeoning data disparity, and resolve the existing data disparity.</p>
<p>Data architects work for the Chief Data Architect and are responsible for the detailed design and development of the data resource.  Data modelers work with the data architects to produce complete models of the data architecture, including formal names, comprehensive definitions, proper structures, precise integrity rules, and robust documentation.  Data architects typically have people skills and data modelers typically have the tool skills.  Both must be skilled at portraying the business needs in a manner that is readily understandable to business professionals based on their perception of the business world.<a title="" href="#_edn5">[v]</a></p>
<p>The Chief Business Architect thoroughly understands the business activities and works closely with the Chief Data Architect to design and develop the data resource.  They expedite and facilitate the development of a single organization-wide business activity architecture and the resolution of the business activity disparity, which is often larger than the data resource disparity.  They ensure that business professionals step up to the task of clearly explaining their data needs.</p>
<p>Information Technology manages the hardware, system software, databases, system upgrades, backup / recovery, data storage and retrieval, performance, migrations, and so on.  Database professionals within IT work with the data architects and data modelers for physical implementation of the logical design without compromising that logical design.</p>
<p>A small organization may have only one Chief Data Architect.  A medium-sized organization could have a Chief Data Architect and one or more data architects and data modelers.  Larger organizations could have a Chief Data Architect, senior data architects and data modelers, and junior data architects and data modelers.</p>
<p>The Chief Data Architect must be located at an executive level in the business, not in IT, to ensure that design and development of the data resource progresses from business needs, to logical design, to physical implementation.  The position must be equivalent to the directors of finance, human resources, and facilities management, and must have responsibility across all business functions.  Data architects and data modelers may be assigned to the Chief Data Architect, or to specific business functions or subject areas.  They can be permanently located or assigned as needed depending on the size of the organization, its structure, and the workload.  The basic principle is that they must follow formal data resource design and development rules, just like following finance rules, human resource rules, or property management rules.</p>
<p>Several guidelines are available for the primary data management responsibility.  Public and private sector organizations have a better chance of successfully managing their data as a critical resource of the organization when:</p>
<p style="padding-left: 30px;">The primary responsibility for managing data as a critical resource is located in the business rather than in IT, and is separate from the Chief Information Officer.</p>
<p style="padding-left: 30px;">A Chief Data Architect is assigned that primary responsibility which extends across all business functions.</p>
<p style="padding-left: 30px;">The design and development of the high-quality data resource is driven by the current and future operational and analytical business information needs, and is based on formal concepts, principles, and techniques.</p>
<p style="padding-left: 30px;">The data resource is designed and developed within a single organization-wide data architecture, avoiding multiple independent and competing data architectures, and preventing data disparity.</p>
<p style="padding-left: 30px;">The data resource is designed and developed from business needs, to logical design, to physical implementation, avoiding any brute-force-physical approaches.</p>
<p style="padding-left: 30px;">Business professionals are actively involved in the design and development of the data resource.</p>
<p style="padding-left: 30px;">Database professionals are actively involved in the physical implementation and operation of the data resource without compromising the logical design.</p>
<p style="padding-left: 30px;">A teamwork approach is established between the design, development, and implementation of the data resource, and the preparation of information from that data resource.</p>
<p>The primary responsibility for data resource management—managing data as a critical resource of the organization—must go to the business, because IT, including the CIO, has not adequately managed data as a critical resource.  A Chief Data Architect must lead data resource management, must be located at an executive level, and must have responsibility across all business functions.  The Chief Data Architect expedites and facilitates the analysis of business information needs, the logical design of a data resource that supports those needs, and the physical implementation of that design.</p>
<p>The CIO has responsibility for the development of information from the data resource.  Database professionals will remain in IT as long as they properly implement the logical design.  Otherwise, that function must go to the business under the Chief Data Architect.</p>
<p>Data management professionals and business professionals must actively promote and support the creation of a Chief Data Architect if they ever expect to develop a high-quality data resource that fully supports their operational and analytical business information demand.  Organizations that don’t establish a Chief Data Architect have a greater chance of failing.</p>
<div><br clear="all" />
<hr align="left" size="1" width="33%" />
<div>
<p><a title="" href="#_ednref1">[i]</a> Michael Brackett, <i>Can Data Really Be Governed.</i> DATAVERSITY, May 29, 2012.</p>
</div>
<div>
<p><a title="" href="#_ednref2">[ii]</a> Michael Brackett, <i>Is Big Data A Meaningful Term.</i> DATAVERSITY, February 26, 2013.</p>
</div>
<div>
<p><a title="" href="#_ednref3">[iii]</a> Michael Brackett, <i>Data Sharing Using a Common Data Architecture.</i> John Wiley &amp; Sons, New York, 1994.</p>
</div>
<div>
<p><a title="" href="#_ednref4">[iv]</a> Michael Brackett, <i>The Five Horsemen of Disparate Data.</i> DATAVERSITY, November 29, 2012.</p>
</div>
<div>
<p><a title="" href="#_ednref5">[v]</a> Michael Brackett, <i>The Umwelt Principle</i>. DATAVERSITY, January 29, 2013.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/the-chief-data-architect/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Big Data Plus Data Virtualization Equals?</title>
		<link>http://www.dataversity.net/big-data-plus-data-virtualization-equals/</link>
		<comments>http://www.dataversity.net/big-data-plus-data-virtualization-equals/#comments</comments>
		<pubDate>Thu, 25 Apr 2013 08:08:42 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19419</guid>
		<description><![CDATA[by Jelani Harper Organizations are becoming more data-driven in their operational and business processes, and it has become clear that Big Data is here to stay. International Data Corporation forecasts that the Big Data technology and services market will expand from $3.2 billion in 2010 to $16.9 billion in 2015 – a compound annual growth rate of about 40 percent. One of the factors influencing both little and Big Data’s hold on the IT world is Data Virtualization, otherwise known as Data-as-a-Service (DaaS) accessed through the Cloud. The benefits of this approach are manifold: customers pay only for the data they need, have access to a plethora of data sources, and are not burdened by inflexible architecture and costly storage issues. Until recently, Big Data was considered a resource for only the largest and most well-financed operations. However, a number of highly reputable vendors – including IBM, Oracle, and Microsoft – have developed solutions that facilitate Big Data-as-a-Service (BDaaS). Suddenly, the architecture necessary to handle the velocity, volume, and variety of Big Data has become more accessible to small- and medium-sized businesses. In addition to utilizing major tech vendors to access data via the Cloud, users can also employ a third-party Managed Service Provider [...]]]></description>
				<content:encoded><![CDATA[<p style="text-align: left;" align="center"><a href="http://www.dataversity.net/wp-content/uploads/2013/04/bigdatavirtualization.jpg"><img class="alignleft size-medium wp-image-19420" alt="bigdatavirtualization" src="http://www.dataversity.net/wp-content/uploads/2013/04/bigdatavirtualization-300x214.jpg" width="300" height="214" /></a>by <a title="Jelani Harper" href="http://www.dataversity.net/contributors/jelani-harper/" target="_blank">Jelani Harper</a></p>
<p>Organizations are becoming more data-driven in their operational and business processes, and it has become clear that Big Data is here to stay. International Data Corporation <a href="http://www.idc.com/getdoc.jsp?containerId=prUS23355112#.UWXlNjWykyc">forecasts</a> that the Big Data technology and services market will expand from $3.2 billion in 2010 to $16.9 billion in 2015 – a compound annual growth rate of about 40 percent. One of the factors influencing both little and Big Data’s hold on the IT world is Data Virtualization, otherwise known as Data-as-a-Service (DaaS) accessed through the Cloud. The benefits of this approach are manifold: customers pay only for the data they need, have access to a plethora of data sources, and are not burdened by inflexible architecture and costly storage issues.</p>
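<p>The roughly 40 percent annual growth figure can be checked against the two market sizes quoted above. The short sketch below is illustrative only: it assumes growth compounds once per year over the five years from 2010 to 2015, and the function name <code>cagr</code> is our own.</p>

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.40 means 40 percent)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    # Solve start * (1 + r) ** years == end for r.
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article: $3.2 billion (2010) to $16.9 billion (2015).
rate = cagr(3.2, 16.9, 2015 - 2010)
print(f"Implied annual growth: {rate:.1%}")
```

<p>Under these assumptions the implied rate comes out at about 39.5 percent, which is consistent with the rounded 40 percent in the forecast.</p>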
<p>Until recently, Big Data was considered a resource for only the largest and most well-financed operations. However, a number of highly reputable vendors – including IBM, Oracle, and Microsoft – have developed solutions that facilitate Big Data-as-a-Service (BDaaS). Suddenly, the architecture necessary to handle the velocity, volume, and variety of Big Data has become more accessible to small- and medium-sized businesses.</p>
<p>In addition to utilizing major tech vendors to access data via the Cloud, users can also employ a third-party Managed Service Provider (MSP) to extract pertinent data from Cloud-based sources. The critical component in informing real-time decisions based on Big Data is the deployment of analytics and intelligence that transforms raw data into useful information. Although MSPs are happy to provide this service as well (more on that later), it behooves organizations to integrate Big Data with their own analytics.</p>
<p><b>Oracle’s Exadata</b></p>
<p>Oracle’s Exadata system is a parallel data warehouse that operates in private Clouds. Its primary strengths include its security (via transparent encryption and fine-grained access control) and its fast analytics querying, which utilizes specialized indexes and materialized views. A recent <a href="http://www.oracle.com/us/corporate/press/1898156">Oracle press release</a> indicates that Australian supermarket chain Coles, which has approximately 2,000 stores and 100,000 employees, made its processes and queries nearly six times faster using Exadata.</p>
<p>However, Exadata’s analytics capabilities require a substantial amount of disk space and CPU power. Oracle offers training certifications for Exadata administrators; the solution is compatible with SQL and uses advanced compression. Exadata’s storage and intelligence capacity can be increased by integrating it with Oracle’s Exalytics In-Memory Machine and Big Data Appliance.</p>
<p><b>IBM</b></p>
<p>IBM’s recent acquisition of Netezza and its technologies has enhanced its potential for delivering BDaaS. Netezza powers IBM’s <a href="https://www-304.ibm.com/partnerworld/wps/servlet/ContentHandler/pw_com_nws_expert_integrated_systems?cmp=pw&amp;cpb=pw&amp;ct=pwrss&amp;cr=pwrss&amp;ccy=zz">PureData System for Analytics</a> (one of several tools in its PureSystems offering), which delivers very fast analysis of terabytes of data. PureSystems is an integral component of the company’s SmartCloud services, through which users can set up private and hybrid Clouds to access Big Data. SmartCloud is based on open standards, offers the scalability that is essential for Big Data, and facilitates an Agile environment.</p>
<p>The PureData and SmartCloud suites include a variety of tools that are useful for managing and accessing Big Data through the Cloud. SmartCloud Desktop Infrastructure is designed to manage virtual desktops (a benefit for those computing in various locations), while PureApplication System, part of the PureSystems family, relies on POWER7+ to organize Cloud-based analytics and transactions and to expedite access to individual Cloud deployments.</p>
<p>These tools enable users to simplify their Cloud computing experience while accessing all of the conventional benefits of Big Data. Other IBM products that aid in the delivery of BDaaS include the Netezza Customer Intelligence Appliance, which comes close to presenting a unified architecture for data by compiling and categorizing data based on mobile devices, social media, web-based transactions, and physical retail transactions. Such solutions exemplify the predictive capacity of analytics while preserving the security that Cloud computing is known for.</p>
<p><b>Managed Service Providers</b></p>
<p>Another viable option for small- and mid-sized businesses to access BDaaS is through third-party Managed Service Providers (MSPs). Oftentimes, MSPs utilize some variety of the aforementioned solutions before distributing results to end users. MSPs are a vital part of service-oriented architecture and supply all of its conventional applications, such as SaaS and PaaS.</p>
<p>This critical intermediary between customers and data-related services has influenced the research and development of Big Data solutions, since MSPs’ patronage represents not only their own market growth but also that of their customers. Certain solutions (such as IBM’s PureSystems) offer a number of options specifically for MSPs, including variable pricing and editions designed to foster new models of service. The objective of MSP editions of Cloud-based solutions is to allow these organizations to expand their architecture while minimizing gaps in customer service.</p>
<p>BDaaS provided through MSPs frees customers from traditional support and architectural issues and enables them to access Big Data at affordable prices. Although customers reap the benefits of Big Data analytics, they are reliant upon MSPs to share and implement their vision for actionable data – and the two visions may not always be congruent. This option sacrifices autonomy and agility for access and lower financial risk. Organizations can reduce that sacrifice by utilizing their own analytics and intelligence tools, although doing so increases the costs associated with processing Big Data.</p>
<p>According to <a href="http://www.idc.com/getdoc.jsp?containerId=IDC_P17988">IT Consulting and Systems Integration Services</a> senior research analyst Ali Zaidi, “Talent gap and lack of knowledge base in the analytics space will continue to force businesses to rely on service providers to fulfill their business analytics needs in the near future.” Other concerns associated with MSPs include security issues related to an outside party’s knowledge (and determination) of an organization’s data, which is why this option is best suited for obtaining general data to base decisions on or information pertinent to a particular industry (such as education).</p>
<p>The concern for security is a principal part of the symbiotic relationship between MSPs and their customers. By discerning what sort of data is most valued and requested by customers, MSPs are able to refine their querying, visualization, and reporting capabilities to accommodate them. The knowledge of what customers actually need is useful in spurring future developments in predictive analytics and Big Data intelligence solutions as well as cloud providers – which will in turn benefit the end user.</p>
<p><b>Microsoft</b></p>
<p>Microsoft has released analytics tools that integrate with its Azure Cloud platform to offer timely querying and reporting of Big Data. <a href="http://research.microsoft.com/en-us/projects/daytona/default.aspx">Project Daytona</a> enables users to analyze large amounts of data from a variety of sources and computers without requiring extensive knowledge of Cloud computing. Daytona utilizes MapReduce to access large chunks of data, which are in turn stratified in Azure for further analysis before being output to users. Ease of use is partly facilitated by preset algorithms that users can upload and that span a number of computer networks in the Azure Cloud.</p>
<p>There are many customization options for Daytona, which lets users manipulate the number of computers and machines Daytona scours and enables them to create their own algorithms. The solution is available via free download to Azure customers and supports agility via the iterative capabilities of MapReduce. The actual data is stored in non-persistent disks and in memory, is paralleled in Azure blob storage so that data is accessible without maximizing computer disk space, and is securely backed up in case of failures. Daytona was designed for analytics, but can integrate with virtually any intelligence or data manipulation tools.</p>
<p><b>Beyond Business: The Future of BDaaS</b></p>
<p>Ultimately, the emergence of BDaaS will continue the trend of utilizing data upon which to base operational and business decisions. More importantly, BDaaS has the potential for expanding the data-driven insights beyond marketing, financial and health information industries. In many ways it already has, with scientists, doctors, <a href="http://www-03.ibm.com/press/us/en/pressrelease/40314.wss">stock analysts and media analysts</a> utilizing this technology to instantly access reams of data to aid their respective industries.</p>
<p>The ease of use of MSP-oriented BDaaS also increases the possibility of applying Big Data architecture and cloud-based services to help with little data needs. This aspect of service-oriented architecture has the capacity to level the playing field of data resources between individuals and the largest organizations, which should only increase the reliance on data and its role in shaping operational processes – not just those of business – for a plethora of industries in the future.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/big-data-plus-data-virtualization-equals/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Teradata Closes In On Unified Data Analytics</title>
		<link>http://www.dataversity.net/teradata-closes-in-on-unified-data-analytics/</link>
		<comments>http://www.dataversity.net/teradata-closes-in-on-unified-data-analytics/#comments</comments>
		<pubDate>Tue, 23 Apr 2013 07:10:59 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19363</guid>
		<description><![CDATA[by Jelani Harper Teradata has long aspired to provide an environment in which it can readily analyze and integrate all forms of data for the enterprise. On April 15, 2013, it revealed a number of technologies that have made the dream a reality. The prominent data analytics solutions provider substantially bolstered its Unified Data Architecture (UDA) with the Teradata Enterprise Access for Hadoop technology and its fabric-based computing system, Mellanox’s InfiniBand. Both technologies revolutionize the ease of accessing and running analytics for Big Data through Hadoop. Teradata also announced the release of an updated Teradata Active Enterprise Data Warehouse 6700 and a smaller version, the Teradata Data Mart Appliance 670, a departmental warehouse designed for testing and development. Fabric-Based Computing The company’s commitment to fabric-based computing is essential for carrying out its goal of providing universal analytics through UDA. InfiniBand operates as the common backing through which users can move data seamlessly between UDA’s principal components, the Teradata Integrated Data Warehouse and the Aster Discovery Platform. Enterprise Access for Hadoop includes Hadoop in the free exchange and analysis of data, which UDA expands to conventional data marts and analytical archives. According to Teradata’s strategic deployment of Big Data solutions head [...]]]></description>
				<content:encoded><![CDATA[<p style="text-align: left;" align="center"><a href="http://www.dataversity.net/wp-content/uploads/2013/04/Fabric.jpg"><img class="alignleft size-medium wp-image-19364" alt="Fabric" src="http://www.dataversity.net/wp-content/uploads/2013/04/Fabric-300x232.jpg" width="300" height="232" /></a>by <a title="Jelani Harper" href="http://www.dataversity.net/contributors/jelani-harper/" target="_blank">Jelani Harper</a></p>
<p>Teradata has long aspired to provide an environment in which it can readily analyze and integrate <i>all</i> forms of data for the enterprise.</p>
<p>On April 15, 2013, it revealed a number of technologies that have made the dream a reality.</p>
<p>The prominent data analytics solutions provider substantially bolstered its <a href="http://www.teradata.com/white-papers/Teradata-Unified-Data-Architecture-A-Visionary-Framework-for-Leveraging-the-Potential-of-All-Your-Data/">Unified Data Architecture</a> (UDA) with the Teradata Enterprise Access for Hadoop technology and its fabric-based computing system, Mellanox’s InfiniBand. Both technologies revolutionize the ease of accessing and running analytics for Big Data through Hadoop. Teradata also announced the release of an updated Teradata Active Enterprise Data Warehouse 6700 and a smaller version, the Teradata Data Mart Appliance 670, a departmental warehouse designed for testing and development.</p>
<p><b>Fabric-Based Computing</b></p>
<p>The company’s commitment to fabric-based computing is essential for carrying out its goal of providing universal analytics through UDA. InfiniBand operates as the common backing through which users can move data seamlessly between UDA’s principal components, the Teradata Integrated Data Warehouse and the Aster Discovery Platform. Enterprise Access for Hadoop includes Hadoop in the free exchange and analysis of data, which UDA expands to conventional data marts and analytical archives.</p>
<p>According to Tasso Argyros, who heads Teradata’s strategic deployment of Big Data solutions and who founded Aster Data before Teradata acquired it in 2011:</p>
<p style="padding-left: 30px;">“The reason this is important is because one size doesn’t fit all anymore. You can’t build your data architecture on only one data source. There’s a huge advantage to using best of breed technologies working together. Having a monitoring infrastructure that allows you to view all sources from one place is very important for operation efficiency.”</p>
<p>UDA’s “best of breed” technologies include not only Big Data access through Hadoop and Apache developments like HCatalog, but also a variety of products such as <a href="http://hortonworks.com/products/hortonworksdataplatform/">Hortonworks Data Platform</a>, Intel Xeon Processors, and Linux’s enterprise server operating system. UDA users can monitor and manage data in any location with Viewpoint, while InfiniBand provides the hardware for Teradata’s BYNET V5 software for massively parallel processing broadcast functions.</p>
<p>InfiniBand provides the foundation for Teradata’s fabric-based computing, and is considered a highly scalable, swift means of enabling connectivity between analytic and reporting tools and of transferring data between sources. Its reliability, speed, and scalability are largely enhanced by BYNET, which boosts the capacity of Teradata’s Enterprise Data Warehouse to 61 petabytes and works best when moving data between dual networks. The result is that users can perform real-time in-query data sorting in an environment that is designed to optimize the speed and performance of business intelligence and analytics – which is crucial for integrating various types of structured and unstructured data under tight time constraints. BYNET also increases network fail-over capability.</p>
<p><b>Enterprise Access for Hadoop</b></p>
<p>The ultimate benefit of fabric-based computing is the unified analytics it makes possible by moving data between various sources, which the Teradata Enterprise Access for Hadoop facilitates when Big Data is involved. The most important features of this release are Teradata’s SQL-H (as in Hadoop) and Teradata’s Smart Loader for Hadoop. The latter enables analysts and non-specialists to manipulate and move data from Hadoop to Teradata’s secure, proprietary integrated data warehouse. It does so through the power of the former, which allows users to formulate queries and issue reports on Big Data using SQL, an impact that Argyros says should not be taken lightly:</p>
<p style="padding-left: 30px;">“Now, all the SQL analysts that most enterprises already have can do SQL analytics on Big Data without knowing anything about Hadoop. That’s one of the reasons SQL-H has been so successful so far, because instead of enterprises having to go and hire 30 Hadoop data scientists, they can utilize the 25 SQL analysts they already have and, with SQL-H, only hire five more people.”</p>
<p>SQL-H also mitigates security concerns about accessing data in Hadoop (which is open source), since it allows users to move data into their own data warehouses. Thanks to InfiniBand and BYNET, analysts can access Big Data in real time and issue queries and reports without code or script. This self-service aspect of SQL-H encourages operations, business, and executive use for either ad hoc or planned analysis. Organizations can still extract information from Big Data sources utilizing the conventional architecture and methods for BI that they’re already acquainted with, without extensive overhead costs for hiring and training in NoSQL.</p>
<p>SQL-H integrates with Hortonworks Data Platform and <a href="http://incubator.apache.org/hcatalog/">Apache HCatalog</a> to facilitate intelligent data access across a multitude of systems. The latter enables users to minimize replication and data movement costs by moving only the data required for a query into Teradata’s integrated data warehouse. The combination approach of integrating and performing analytics on data from virtually all sources is the basis for Teradata’s claim for UDA. Users can choose between Cloudera Distribution and Hortonworks Data Platform for commercial distribution of Hadoop, while Teradata’s integrated data warehouse grants numerous users simultaneous access.</p>
<p>Teradata Studio with Smart Loader for Hadoop simplifies the Hadoop browsing experience by presenting data in tables (with table properties) for an easy, point-and-click experience. Bi-directional table copies create maps of data by type between Teradata and Hadoop sources for ready comparison. Other features include transfer status and history functions for users to track statuses of loads.</p>
<p><b>6700</b></p>
<p>The Teradata Active Enterprise Data Warehouse (EDW) 6700 provides operational and strategic intelligence with real-time updates. Its speed is due in part to running BYNET on InfiniBand, as is the extreme scalability it offers. The most recent version of the Active EDW Platform is available in two different models, the 6700C and the 6700H. The 6700H has more memory, more storage capacity, and higher Teradata performance per node. One of the central differences between the two is that the 6700H comes with a hybrid storage architecture that utilizes both Solid State Drive (SSD) and Hard Disk Drive (HDD) technologies; the 6700C comes with HDD and can be upgraded to include SSD. One of the primary benefits of this platform is that more regularly used “hot” data is placed on SSD for fast access, whereas less frequently used data is relegated to HDD. Teradata’s Virtual Storage allows users to specify in which technology they would like data placed.</p>
<p>The primary distinction between the recently released Teradata Active EDW Platform and its predecessor is that the updated version incorporates an eight-core Intel Xeon processor and high-performance computing nodes that, when combined with recent fabric-based computing technologies, make it significantly faster. It utilizes Viewpoint for convenient monitoring of data and supports subsequent and prior platform generations to increase investment protection and encourage sustainability. The Data Mart Appliance 670 also features an Intel Xeon processor and high-performance computing nodes and is available in either HDD or hybrid versions, yet has substantially less storage than the 6700. Argyros commented:</p>
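<p>The hot-versus-cold placement idea described above can be illustrated with a small sketch. This is not Teradata Virtual Storage’s actual algorithm, which the article does not describe; it simply assumes that “hot” means “most frequently accessed” and that the SSD tier can hold a fixed number of data blocks. The function name, block names, and <code>ssd_slots</code> parameter are all hypothetical.</p>

```python
from collections import Counter


def assign_tiers(access_log, ssd_slots):
    """Map each block to "SSD" or "HDD" by access frequency (illustrative policy only)."""
    counts = Counter(access_log)
    # most_common() orders blocks from most to least frequently accessed.
    ranked = [block for block, _ in counts.most_common()]
    hot = set(ranked[:ssd_slots])  # the hottest blocks fill the SSD tier
    return {block: ("SSD" if block in hot else "HDD") for block in ranked}


# Hypothetical access log: each entry is one read of a named data block.
log = ["orders", "orders", "orders", "customers", "customers", "archive_2009"]
placement = assign_tiers(log, ssd_slots=2)
print(placement)
```

<p>In this toy log the two most frequently read blocks land on SSD and the rarely read archive block stays on HDD. A production system would also weigh recency, block size, and any placement the user specifies explicitly.</p>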
<p style="padding-left: 30px;">“We have products that are very cost effective and geared toward point problems all the way to high end products like the 6700. That allows you to integrate structured data from across the enterprise scaled to many terabytes, and it supports hundreds of thousands of users.”</p>
<p><b>Unified</b></p>
<p>Ultimately, Teradata representatives base UDA’s viability and its claim to comprehensive data analytics on the strength of its integrated data warehouse, which draws on Big Data from Hadoop and Aster’s discovery tools to unlock its full potential. When one considers all of the other data sources that can integrate with it, Teradata’s claim for offering unified analytics appears convincing. Argyros reflected on the process of UDA’s development:</p>
<p style="padding-left: 30px;">“We were looking for how we could unify the analytics and the processing trail. In order to do so you need to be able to move data from Hadoop and Teradata into Aster, and from Aster and Teradata into Hadoop. And you ideally want to make sure that analytics frame data from Aster, Teradata and Hadoop at the same time. That’s kind of the holy grail of software integration, and we’ve done that.”</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/teradata-closes-in-on-unified-data-analytics/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Mobile Business Intelligence</title>
		<link>http://www.dataversity.net/mobile-business-intelligence/</link>
		<comments>http://www.dataversity.net/mobile-business-intelligence/#comments</comments>
		<pubDate>Tue, 16 Apr 2013 07:10:01 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Education]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19223</guid>
		<description><![CDATA[by Jelani Harper Mobile Business Intelligence (BI) extends the decision-making and analytics capabilities of conventional BI beyond the office. The ubiquity of smartphones and tablet devices (such as iPads), in addition to a growing number of vendors, mobile platforms, and applications, has rendered this form of BI one of the most viable means of accessing and extracting value from data today. Depending on which app is selected and integrated with which particular device, Mobile BI offers all of the capabilities and features of traditional BI, plus additional benefits such as displaying analytics on other portable and desktop devices in any location. However, the nature of Mobile BI presents a set of considerations that are distinct from enterprise versions. The principal concern is to select an appropriate app and mobile device that best integrate with existing BI software. All of the traditional concerns for mobile BI, such as issues of security and platform extensions with current BI tools, have largely been addressed. Users are still responsible for determining what sorts of data will be analyzed most frequently via Mobile BI, while evaluating platforms and devices to determine which is most compatible. Although all forms of BI are accessible via mobile devices, [...]]]></description>
				<content:encoded><![CDATA[<p style="text-align: left;" align="center"><a href="http://www.dataversity.net/wp-content/uploads/2013/04/Mobile-BI.jpg"><img class="alignleft size-medium wp-image-19224" alt="Mobile BI" src="http://www.dataversity.net/wp-content/uploads/2013/04/Mobile-BI-300x226.jpg" width="300" height="226" /></a>by <a title="Jelani Harper" href="http://www.dataversity.net/contributors/jelani-harper/" target="_blank">Jelani Harper</a></p>
<p>Mobile Business Intelligence (BI) extends the decision-making and analytics capabilities of conventional BI beyond the office. The ubiquity of smartphones and tablet devices (such as iPads), in addition to a growing number of vendors, mobile platforms, and applications, has rendered this form of BI one of the most viable means of accessing and extracting value from data today. Depending on which app is selected and integrated with which particular device, Mobile BI offers all of the capabilities and features of traditional BI, plus additional benefits such as displaying analytics on other portable and desktop devices in any location.</p>
<p>However, the nature of Mobile BI presents a set of considerations that are distinct from enterprise versions. The principal concern is to select an appropriate app and mobile device that best integrate with existing BI software. All of the <a href="http://timoelliott.com/blog/2012/01/what-mobile-bi-used-to-look-like-and-where-its-going-back-to-the-future.html">traditional concerns</a> for mobile BI, such as issues of security and platform extensions with current BI tools, have largely been addressed. Users are still responsible for determining what sorts of data will be analyzed most frequently via Mobile BI, while evaluating platforms and devices to determine which is most compatible. Although all forms of BI are accessible via mobile devices, some data is better visualized than others, depending on the portable device and app selection.</p>
<p><b>Modernizing Mobile BI</b></p>
<p>Around the turn of the millennium, Mobile BI was severely limited in its utility. It required a dedicated server and was difficult to integrate with existing tools. With the development of tablet devices and smartphones in more recent years, however, BI vendors began supporting mobile devices so that they served as extensions of conventional tools on the same server. Querying, reporting, visualizations, and animations are shared between conventional and Mobile BI.</p>
<p>Other than the proliferation of mobile devices in recent years, the most significant factor in the rapid adoption of Mobile BI is the virtual elimination of security concerns about it. Most forms of Mobile BI offer security at <a href="http://www.information-management.com/newsletters/mobile_BI_integration_apps_data_management-10020527-1.html?portal=performance_management">three levels</a>: the device, the network, and the point of transmission. Handsets have security features such as firewall and antivirus software, full disk encryption, and passcodes. Secure Sockets Layer (SSL) connections and virtual private networks help fortify security at the network level, while the same security clearances required for BI tools in the office are required for mobile devices as well.</p>
<p><b>Operational and Business Value</b></p>
<p>Aside from the convenience of accessing BI as needed wherever mobile devices are supported, Mobile BI can assist real-time decision-making for those who need it most in the field – such as sales representatives. Access to information from remote locations helps to complete transactions more quickly, increase productivity, and decrease administrative and material costs. Mobile BI is ideally suited for informing short-term decisions, which makes it valuable for analyzing operational data and flexible metrics related to pricing.</p>
<p>Although Mobile BI analyzes the same data that traditional BI does, it comes with a variety of features that significantly enhance its use. In addition to providing filters and alerts, Mobile BI has become increasingly characterized by intuitive graphical user interfaces that can accommodate sophisticated levels of visualization. Whereas early Mobile BI apps could only access previously canned reports, current tools can generate new queries. Particularly competitive apps like those from <a href="http://www.logianalytics.com/index.php?q=see-logi">Logi Analytics</a> (LogiXML, Logi Info, Logi Ad Hoc) enable users to do so without code, simplifying the querying process and eliminating input from IT. Apps integrate with all of the conventional sources of data such as relational databases, warehouses, CRM and ERP, and provide a bevy of dashboards, reports, graphs and grids which can be rapidly deployed many times.</p>
<p>Other features found in products such as <a href="http://www.microstrategy.com/mobile/">MicroStrategy Mobile</a> pertain specifically to mobile devices, such as integration possibilities with email, calendars, phone lists, and other apps on the mobile unit itself. Those that incorporate HTML5 have a plethora of offline capabilities, which may be enhanced by IT personnel. Sensor-based querying, with which users can scan barcodes to generate and change queries, is a distinct advantage over traditional BI. Queries can also be facilitated via convenient voice-to-text and voice recognition functions. Several solutions accommodate ad hoc querying and reporting, which can be updated via GPS. Mobile platforms can support a variety of Mobile BI apps, while keystroke shortcuts and multi-touch gestures facilitate ease of use.</p>
<p><b>Selectivity Concerns</b></p>
<p>Mobile BI utilizes <a href="http://blogs.forrester.com/business_process/2009/11/not-all-mobile-bi-applications-are-created-equal.html">web browsers on portable devices</a> to access conventional BI tools. Costs typically revolve around purchasing devices, a BI solution, mobile apps, and whatever training and design assistance is required. Most BI vendors have their own mobile apps that extend enterprise service remotely via interfaces that are similar to the desktop version. There may be additional licensing fees for desktop users to access Mobile BI. Other apps have been designed to integrate with a variety of BI platforms. Although not all apps can integrate with existing BI tools, popular BI vendors are supported by a number of different apps. There are more limitations on the type of mobile device that a particular app can work with, as some are designed only for particular manufacturers or for certain smartphone operating systems.</p>
<p>Other selectivity concerns relate to the design of Mobile BI, which is considerably different from that of the desktop version. Limitations related to screen size, memory capacity, and processing speed/capacity of mobile devices influence the way reports will look – which affects their overall utility. Too much information may appear cluttered on smaller devices. Mobile BI design should ideally limit the number of objects on the screen or dashboard to increase usability. A chief determinant in achieving this objective is the process of data categorization, in which organizations specify which types of data will be accessible to which individuals and design tools for apps accordingly.</p>
<p>Although Mobile BI can handle all of the functions of desktop versions, the presentation of data on individual handsets factors heavily into what data is most advantageous to base designs around. There is a direct correlation between the type of mobile device an app supports and the type of data that it works best with. The primary goal in selecting Mobile BI solutions is to standardize data, yet the particular form of data analysis most frequently used factors into what sort of device is desirable. Lengthier data mining processes tend to work better on bigger tablets, while smartphones are ideal for scanning data and making quick decisions relating to real-time information for pricing and operations.</p>
<p><b>Facilitating Simplification</b></p>
<p>Adoption rates of Mobile BI are <a href="http://www.gartner.com/newsroom/id/1513714">projected to increase</a> in the very near future. How rapidly they do so largely depends on the effectiveness of implementing these platforms with current BI solutions. Vendor support on both ends (from the mobile and enterprise community) is already there. Most tablet devices present functionality and usability similar to desktops, while certain uses of smartphones (including scanning and GPS incorporation) make them viable options as well.</p>
<p>In that respect, the trend towards Mobile BI merely reflects the larger movement towards the simplification of BI. Cloud-based Data-as-a-Service options are another integral component in moving BI out of the realm of IT and into the daily world of operations and business professionals. Many of the features of Mobile BI – alerts, ad hoc tools, trends related to key performance indicators – increase usability, particularly in products in which code is not required for design.</p>
<p>Although such platforms and applications are increasing, the primary challenge in utilizing Mobile BI today lies in facilitating a design that optimizes visualization, reporting, and other key tools on the mobile device a particular solution supports. As more solutions allow such tools to be manipulated without the need of an IT team, the variety of data types which can be presented optimally (per device) should only increase, further spurring adoption rates. The potential for enhancing decision-making and incorporating data to generate business value will become significantly more accessible.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/mobile-business-intelligence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What to Look for in an Enterprise Cloud Provider</title>
		<link>http://www.dataversity.net/what-to-look-for-in-a-an-enterprise-cloud-provider/</link>
		<comments>http://www.dataversity.net/what-to-look-for-in-a-an-enterprise-cloud-provider/#comments</comments>
		<pubDate>Thu, 11 Apr 2013 07:36:01 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19145</guid>
		<description><![CDATA[by Ezekiel James Cloud technology is being adopted on a large scale by all sorts of organizations and users, yet the technology itself is still quite new. As with any new-ish technology, there is always some level of reluctance when it comes to early or even on-time adoption. The reality is that the benefits of the cloud are – depending on who you ask – virtually undeniable. On the Data Management front, enterprise-level organizations are rapidly adopting and implementing Cloud-based tech, or at least moving rapidly towards doing so. Now with new advances in enterprise-level cloud computing it’s no longer a matter of if the enterprise will move to the Cloud, but rather a matter of when these organizations will make the leap. Still, there are plenty of questions to answer in regards to security, stability and big-picture cost. Predicting the Future of the Enterprise Cloud When it comes to looking into the future of enterprise Cloud adoption, all we really have to go on are some key insights from IT research firms. As new service providers break out of the simplistic IaaS, PaaS and SaaS provider scenarios and lead the way in enterprise-specific Cloud technologies, more and more enterprises are [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.dataversity.net/wp-content/uploads/2013/04/cloud-computing-2.jpg"><img class="alignleft size-medium wp-image-19146" alt="cloud computing (2)" src="http://www.dataversity.net/wp-content/uploads/2013/04/cloud-computing-2-300x284.jpg" width="300" height="284" /></a>by <a title="Ezekiel James" href="http://www.dataversity.net/ezekiel-james/" target="_blank">Ezekiel James </a></p>
<p>Cloud technology is being adopted on a large scale by all sorts of organizations and users, yet the technology itself is still quite new. As with any new-ish technology, there is always some level of reluctance when it comes to early or even on-time adoption. The reality is that <a href="http://www.creativsymantec.com/webhosting/downl/b-choosing-a-cloud-hosting-provider-with-confidence_WP.pdf">the benefits of the cloud</a> are – depending on who you ask – virtually undeniable. On the Data Management front, enterprise-level organizations are rapidly adopting and implementing Cloud-based tech, or at least moving rapidly towards doing so.</p>
<p>Now with new advances in enterprise-level cloud computing it’s no longer a matter of if the enterprise will move to the Cloud, but rather a matter of when these organizations will make the leap. Still, there are plenty of questions to answer in regards to security, stability and big-picture cost.</p>
<p><b>Predicting the Future of the Enterprise Cloud</b></p>
<p>When it comes to looking into the future of enterprise Cloud adoption, all we really have to go on are some key insights from IT research firms. As new service providers break out of the simplistic IaaS, PaaS and SaaS provider scenarios and lead the way in enterprise-specific Cloud technologies, more and more enterprises are expected to move to the Cloud at a rapid rate.</p>
<p>For instance, the newcomer to the enterprise Cloud table, Business-Process-as-a-Service (BPaaS), is expected to grow from roughly $84 billion to upwards of $144 billion by 2016, <a href="http://www.forbes.com/sites/louiscolumbus/2012/07/02/forecasting-public-cloud-adoption-in-the-enterprise-2/">according to a recent report by the respected IT research firm, Gartner</a>.</p>
<p>On top of this, Application-as-a-Service/Software-as-a-Service (AaaS/SaaS) is expected to see significant growth in the coming months and years. According to the same Gartner report, enterprise-focused SaaS providers are expected to experience growth of roughly 17.5%. The bottom line here is that powerful Cloud technologies are being geared towards the enterprise. In the SaaS category, enterprise-specific providers dealing with BI are expected to grow by 27% by 2016.</p>
<p><b>Benefits of the Cloud for the Enterprise</b></p>
<p>When it comes to the Cloud, proponents throw around a variety of different reasons why the enterprise should adopt. The trouble is it can be hard to sift through the real benefits of a Cloud solution when a company is simply trying to sell you their solution. It’s important to have a solid grasp on how any Cloud provider will directly benefit your organization on every level. With all the types of Cloud services available, it’s almost impossible to drill down into every possible benefit, but taking a 30,000-foot view, here are some of the most common benefits of the enterprise Cloud:</p>
<ul>
<li><b>Low Cost: </b>Managing IT budgets, costs and actual impacts of technology is a complex process. The Cloud promises to dramatically cut costs. In some circles, from a cost perspective, the Cloud is being touted as a budgetary magic bullet. If we’re being honest this is the noisiest aspect of the Cloud discussion, and clarity is needed. The big argument for the Cloud is that through simplified business processes made more accessible through Cloud solutions, enterprise companies will always enjoy the benefits of cost effective measures as a result. While this is mostly true, it’s important to see both sides of the coin. For instance, Joe Weinman, respected Cloud computing analyst and founder of Cloudonomics, said in his talk at CloudCamp last year that this isn’t always true. The low-cost discussion usually assumes that every enterprise org will want to migrate every application, service and process to the Cloud, which is almost never the case. In short, cost is relative to the needs of the organization in question.</li>
</ul>
<ul>
<li><b>Agility: </b>Agility can mean different things to different people. But in most cases – especially in enterprise Data Management circles – agility refers to the ability to easily respond to rapid changes in business/Data Management processes. In very basic terms, the Cloud gives the enterprise the wiggle room to scale up and out – storage, computing resources and network capacity – without much strain on budget or resources.</li>
</ul>
<ul>
<li><b>Streamlined IT: </b>Perhaps one of the key benefits of the Cloud has to do with streamlined business processes – freedom from IT constraints and misdirected focus. In other words, by minimizing, or at least effectively managing, risk and by consolidating mission/business critical resources, the enterprise can focus on more business-critical processes.</li>
</ul>
<p><b>Barriers to Full-Scale Enterprise Cloud Adoption</b></p>
<p>Beyond the basic benefits of the Cloud, there are plenty of potentially negative things to consider. <a href="http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns836/ns976/white_paper_c11-617239.html">These barriers</a> often – and rightfully so – keep many enterprise CIOs from taking their company through the in-house to Cloud migration process:</p>
<ul>
<li><b>Security: </b>If you want to know the biggest issue that CIOs have with Cloud providers, performing a quick Google search on the subject will do the trick. Most IT professionals and CIOs are primarily concerned with security, and in some circles this is a heated discussion. This largely comes down to the type of Cloud platform a company uses to deploy Cloud services. The real security argument comes down to public, private and hybrid Clouds. The bottom line is that the enterprise is attempting to manage most or all of its data through a third party. This means that the provider you choose needs to match or beat the level of security you already provide for in-house applications.</li>
</ul>
<ul>
<li><b>Stability: </b>The enterprise needs to know, above all, that <a href="http://accentplus.com/2011/03/stability-and-security-in-cloud-computing/">the Cloud infrastructure that is housing and delivering their data is stable</a>. Outages with major Cloud providers like Amazon and Google don’t do a lot to quell any fears related to Cloud stability. In fact, enterprise CIOs should be concerned about these outages, and how they can affect their overall business goals. As any IT professional knows, big server outages have major consequences. <a href="http://www.evolven.com/blog/2011-devastating-outages-major-brands.html">2011 was a particularly bad year for outages. </a> Heavy hitters like Google, Amazon, Verizon, Yahoo and Microsoft all experienced outages that significantly impacted their business. There’s no question that <a href="http://readwrite.com/2012/12/11/should-enterprise-users-be-wary-of-cloud-apps">the enterprise was paying attention.</a></li>
</ul>
<ul>
<li><b>Control: </b>Lastly, there’s the issue of controlling Data Management processes. This is part barrier, part myth. The big idea is that when a company uses a Cloud provider, it relinquishes a great deal of control over its data processes. On one level this is undeniably true. When you migrate your data to a third-party service provider you are entrusting them to manage and distribute your data better than you could yourself. On the flip side of that coin, few enterprise companies relinquish control over all of their data processes or applications.</li>
</ul>
<p><b>Finding the Perfect Cloud Provider: Is it Possible? </b></p>
<p>Perfect is a pretty loaded word with often-unrealistic expectations and implications, but the short answer is “maybe.” The truth is that what’s perfect for one enterprise company is completely unrealistic for another. In an effort to avoid oversimplifying the issue, let’s take a look at some elements of a “perfect” Cloud provider:</p>
<ul>
<li><b>Cross-platform Interoperability: </b>For every enterprise CIO, the most important way to view a Cloud provider is as someone who helps the company meet its business objectives. Part of this partnership means getting the most mileage out of the service. In other words, an optimal cloud provider will offer the capability and flexibility to use their service across multiple platforms and environments. Some would argue that this is simply a hybrid Cloud solution with an interoperability costume on. However you slice it, interoperability in the Cloud should be non-negotiable.</li>
</ul>
<ul>
<li><b>SLAs: </b>With the influx of datacenter outages in recent years, it’s more important than ever to really dig into the fine print of Service Level Agreements. For instance, it’s becoming common that extended outages might actually be allowed under certain SLAs. Avoid these agreements if at all possible.</li>
</ul>
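<p>To see why the SLA fine print matters, it can help to translate an uptime percentage into the downtime it actually permits. The following is a minimal sketch; the function name, the 30-day month, and the sample uptime figures are illustrative assumptions, not terms taken from any particular provider's agreement.</p>

```python
def allowed_downtime_minutes(uptime_percent, days=30):
    """Minutes of downtime a provider may incur while still meeting the SLA."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


# Hypothetical uptime guarantees, compared over an assumed 30-day month.
for uptime in (99.0, 99.9, 99.99):
    print(uptime, round(allowed_downtime_minutes(uptime), 1))
```

<p>Under these assumptions a 99% guarantee permits more than seven hours of downtime per month, which is worth weighing against the cost of an outage to the business.</p>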
<ul>
<li><b>Security: </b>We covered this in some detail earlier in this article, so it goes without saying that an ideal Cloud provider places a high emphasis on providing the highest level of security possible. Many Cloud providers implement security measures after the infrastructure has been built and deployed. Search out Cloud providers that think outside of the security box – this often means the provider builds its Cloud architecture on robust security principles. Don’t be afraid to spend ample time investigating and evaluating a cloud provider before giving a “yes” or “no.”</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/what-to-look-for-in-a-an-enterprise-cloud-provider/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Closer Look at Neo4j &#8211; the Graph Database</title>
		<link>http://www.dataversity.net/a-closer-look-at-neo4j-the-graph-database/</link>
		<comments>http://www.dataversity.net/a-closer-look-at-neo4j-the-graph-database/#comments</comments>
		<pubDate>Tue, 09 Apr 2013 07:10:07 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19090</guid>
		<description><![CDATA[by Paul Williams The recent upward tick in the popularity of graph databases within the NoSQL movement reflects the trend of Big Data, partly generated by social data. Enterprises are turning to graph databases to figure out customer purchasing patterns, voting history, as well as that proverbial &#8220;needle in the haystack&#8221; bit of valuable information. The latter use-case is especially important to those working with law enforcement or homeland security applications. One of the prominent examples of a graph database is Neo4j. Developed by Neo Technology, a company dually located in San Francisco and Sweden, the open source Neo4j is available in a variety of licenses. The free of charge Community version is an easy download and install, making it straightforward for exploring the database and its web-based admin interfaces. Additional modules suitable for more esoteric or production level functionality are part of the Advanced version, with an Enterprise edition also an option for larger corporations. A Robust Graph Database At its core, Neo4j features a fast and agile graph database where the data gets stored in nodes, with each containing a number of properties. The relationships between nodes are what matters in a graph database – a model which ties [...]]]></description>
				<content:encoded><![CDATA[<p style="text-align: left;" align="center">by <a title="Paul Williams" href="http://www.dataversity.net/contributors/paul-williams" target="_blank">Paul Williams</a></p>
<p>The recent upward tick in the popularity of graph databases within the <a href="http://www.dataversity.net/the-nosql-movement-what-is-it/">NoSQL movement</a> reflects the trend of Big Data, partly generated by social data. Enterprises are turning to graph databases to figure out customer purchasing patterns, voting history, as well as that proverbial &#8220;needle in the haystack&#8221; bit of valuable information. The latter use-case is especially important to those working with law enforcement or homeland security applications.</p>
<p>One of the prominent examples of a graph database is <a href="http://www.neo4j.org/">Neo4j</a>. Developed by Neo Technology, a company dually located in San Francisco and Sweden, the open source Neo4j is available under a variety of licenses. The free of charge Community version is an easy download and install, making it straightforward to explore the database and its web-based admin interfaces. Additional modules suitable for more esoteric or production level functionality are part of the Advanced version, with an Enterprise edition also an option for larger corporations.</p>
<p><b>A Robust Graph Database</b></p>
<p>At its core, Neo4j features a fast and agile graph database where the data gets stored in nodes, with each containing a number of properties. The relationships between nodes are what matters in a graph database – a model which ties in nicely with the applications found in the world of social networking. Both the nodes and the relationships can be indexed, which enhances the database&#8217;s overall performance.</p>
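<p>The model described above (nodes carrying properties, typed relationships between them, and an index for fast lookup) can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Neo4j's implementation or API; the <code>Graph</code> and <code>Node</code> classes and the sample people are hypothetical.</p>

```python
from collections import defaultdict


class Node:
    """A graph node: an id plus a free-form dictionary of properties."""

    def __init__(self, node_id, **properties):
        self.node_id = node_id
        self.properties = properties


class Graph:
    """Hypothetical in-memory property graph, for illustration only."""

    def __init__(self):
        self.nodes = {}
        # Outgoing relationships: node id mapped to a list of (type, target id).
        self.outgoing = defaultdict(list)
        # A simple exact-match index: (key, value) mapped to a list of node ids.
        self.index = defaultdict(list)

    def add_node(self, node_id, **properties):
        node = Node(node_id, **properties)
        self.nodes[node_id] = node
        for key, value in properties.items():
            self.index[(key, value)].append(node_id)
        return node

    def relate(self, source_id, rel_type, target_id):
        self.outgoing[source_id].append((rel_type, target_id))

    def find(self, key, value):
        """Look nodes up through the index instead of scanning every node."""
        return [self.nodes[i] for i in self.index[(key, value)]]

    def neighbours(self, node_id, rel_type):
        """Follow relationships of one type from a node."""
        return [self.nodes[t] for (r, t) in self.outgoing[node_id] if r == rel_type]


# Made-up sample data.
graph = Graph()
graph.add_node(1, name="Paul", age=40)
graph.add_node(2, name="Ann", age=31)
graph.relate(1, "FRIEND", 2)

paul = graph.find("name", "Paul")[0]
friend_names = [n.properties["name"] for n in graph.neighbours(paul.node_id, "FRIEND")]
print(friend_names)
```

<p>The point of the sketch is that a query starts from an indexed node and then walks relationships, rather than joining tables.</p>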
<p>Even considering its overall power, Neo4j remains flexible; the database is suitable for distributed environments, and its smaller footprint means the graph database technology can be embedded inside another application written in a language that runs on the JVM, including Scala, Clojure, and of course, Java. The database supports a REST interface for client applications, in addition to providing a Java API for more robust client development. Neo4j itself was primarily developed in Java.</p>
<p>Neo4j is cross-platform, with support for the Windows, Mac OS, and Linux platforms. Typical of open source software, there are usually a variety of current, legacy, and still-in-development versions of Neo4j available for <a href="http://www.neo4j.org/download">download from their website</a>. With an array of older editions of the software, the company provides documentation to assist with the upgrade and migration process.</p>
<p><b>Neo4j is an Easy Installation</b></p>
<p>Installing the Community Edition of Neo4j was a breeze; there are also pretty detailed instructions and a video to follow on their website. On Windows, the ZIP package basically installs all the necessary files in a directory structure. The user simply then navigates to the bin directory and runs a batch file to start the database server.</p>
<p align="center"><a href="http://www.dataversity.net/wp-content/uploads/2013/04/Paul-Neo4J-Article-Pic-1.png"><img class="alignnone size-large wp-image-19091" alt="Paul Neo4J Article Pic 1" src="http://www.dataversity.net/wp-content/uploads/2013/04/Paul-Neo4J-Article-Pic-1-1024x629.png" width="620" height="380" /></a></p>
<p align="center"><b>Neo4j&#8217;s web administration screen is easily accessible after installation.</b></p>
<p>Once the Neo4j server is running, the web administration screen is easily accessible by typing in the local server address in a web browser. A few minutes is all it took to get up and running. The administration interface provides easy navigation to a data browser, a query language console, a screen for index maintenance, etc. Links to documentation and the Neo4j community site are definitely useful as well.</p>
<p>The community site is a great place to find sample databases and other information that helps with the Neo4j learning process. The database has engendered a vibrant user family providing help and fleshing out different aspects of the software. The web page about using <a href="http://www.neo4j.org/develop/ruby">Neo4j with Ruby on Rails</a> is a fine example of this kind of extra content generated by the Neo4j user community.</p>
<p><b>Interacting With a Neo4j Graph Database Using Java or Cypher</b></p>
<p>As mentioned earlier, Neo4j provides full support for interacting with a graph database using any language compatible with the Java Virtual Machine (JVM). There is also the native Cypher query language, which provides a framework-independent way to access the database.</p>
<p>Cypher is a powerful and descriptive query language that should be easy to learn by anyone familiar with SQL. There is even a <a href="http://console.neo4j.org/">live web-based console</a> for users to try out different Cypher queries. The console features a visual graph that reacts to the queries, helping to cement the concepts of graph databases for the novice user.</p>
<p align="center"><a href="http://www.dataversity.net/wp-content/uploads/2013/04/Paul-Neo4J-Article-Pic-2.png"><img class="alignnone size-large wp-image-19092" alt="Paul Neo4J Article Pic 2" src="http://www.dataversity.net/wp-content/uploads/2013/04/Paul-Neo4J-Article-Pic-2-1024x629.png" width="620" height="380" /></a></p>
<p align="center"> <b>Neo4j&#8217;s web console is great for experimenting with Cypher.</b></p>
<p>Cypher uses the START, RETURN, and MATCH clauses to serve more or less as the graph database equivalents of SQL&#8217;s FROM, SELECT and WHERE. START stipulates the node in the graph database at which the query begins, RETURN lists the properties, or fields, returned by the query, and MATCH describes the pattern of relationships to follow. The following query looks up Paul&#8217;s node through the people index and finds those of his friends who are older than 18:</p>
<p>START me=node:people(name='Paul')</p>
<p>MATCH me-[:FRIEND]-&gt;friend</p>
<p>WHERE friend.age &gt; 18</p>
<p>RETURN me, friend.name</p>
<p>ORDER BY friend.age asc</p>
<p>SKIP 5 LIMIT 10</p>
<p>The ORDER BY, SKIP, and LIMIT clauses should be self-explanatory to anyone familiar with SQL. Cypher uses the CREATE and DELETE clauses to add and remove nodes and relationships. The language also provides a host of aggregate and other types of functions. A measure of transaction support is also available at the console, using the BEGIN, COMMIT and ROLLBACK statements.</p>
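<p>For readers who think more readily in code than in query clauses, the filtering and paging in the query above can be mimicked on an ordinary list. The sketch below is plain Python, not Cypher and not a Neo4j client; the function name and the sample friends are made up, and it returns only the friend names rather than the <code>me</code> node as well.</p>

```python
def friends_query(friends, min_age=18, skip=5, limit=10):
    """Mimic WHERE / ORDER BY / SKIP / LIMIT on a list of friend records."""
    # WHERE friend.age > 18
    matching = [f for f in friends if f["age"] > min_age]
    # ORDER BY friend.age asc
    ordered = sorted(matching, key=lambda f: f["age"])
    # SKIP 5 LIMIT 10: drop the first `skip` rows, then keep at most `limit`.
    page = ordered[skip:skip + limit]
    # RETURN friend.name
    return [f["name"] for f in page]


# Made-up sample data: 20 friends aged 15 to 34.
friends = [{"name": "friend%d" % age, "age": age} for age in range(15, 35)]
names = friends_query(friends)
print(names)
```

<p>With this sample data, sixteen friends pass the age filter, the five youngest of those are skipped, and the next ten are returned.</p>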
<p>While Cypher provides a robust collection of console-level functionality, Neo4j also shines when controlled using Java or any other language that supports the JVM. In fact, the &#8220;4j&#8221; in the name more or less means, &#8220;for Java.&#8221; Remember, the Neo4j server also supports the REST interface, which provides an easy way to interact with the database; additionally, server plug-ins written in Java can be used to extend the basic REST functionality.</p>
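<p>As a rough illustration of the REST route, the sketch below builds an HTTP request that carries a Cypher query as JSON. The URL is an assumption: it presumes a default local install on port 7474 and the <code>/db/data/cypher</code> endpoint of the REST API from this era, so check the documentation for the version you run. The helper name is hypothetical, and the request is built but deliberately not sent.</p>

```python
import json
import urllib.request

# Assumed values: a default local install listening on port 7474 and the
# Cypher endpoint exposed by the 1.x-era REST API. Check your server's docs.
CYPHER_URL = "http://localhost:7474/db/data/cypher"


def build_cypher_request(query, params=None):
    """Build (but do not send) a POST request carrying a Cypher query."""
    payload = json.dumps({"query": query, "params": params or {}})
    return urllib.request.Request(
        CYPHER_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_cypher_request(
    "START me=node:people(name={name}) MATCH me-[:FRIEND]->friend RETURN friend.name",
    {"name": "Paul"},
)
print(request.get_method(), request.full_url)
# To actually run it against a live server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

<p>Passing the name as a parameter rather than splicing it into the query string keeps the query text constant and avoids quoting problems.</p>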
<p>The native API offers another path to client development integrating graph database functionality. Neo4j also works when embedded into a JVM process; its small footprint makes embedding possible. All the nodes, relationships, and paths in a graph database are accessible as programming objects, in addition to the ability to run Cypher queries in code, allowing the development of a wide variety of applications.</p>
<p>There are <a href="http://www.neo4j.org/java/jvm">many examples</a> of projects using Neo4j technology in interesting new ways, including a Neo4j JDBC driver and the Spring Data Neo4j project which leverages the Spring Framework to offer object-graph mapping in concert with Neo4j. Once again, this speaks to the power of an engaged software developer user community.</p>
<p><b>Options for Neo4j Licensing and Support</b></p>
<p>Neo4j includes a wide array of licensing options depending on the installed version. Obviously, the Community edition is all that is needed to learn about graph databases and to play around with client development using either the REST interface or the Java API. There is enough documentation and help available to get anyone successfully up and running.</p>
<p>The other editions of Neo4j add the additional functionality that makes the product suitable for enterprise production instances. The Advanced edition enhances the monitoring provided by the database. The Enterprise edition adds online backups and clustering support to the Advanced edition features. Neo4j is available under a dual license that combines AGPL with a commercial license from Neo Technology; the commercial license allows enterprises to include Neo4j in a closed source system.</p>
<p>Neo Technology also provides support for Neo4j for owners of the Advanced and Enterprise editions. The Enterprise edition offers 24/7 phone support, as opposed to the email-only support provided with the Advanced edition. The list of organizations currently using Neo4j is impressive, including Cisco, Adobe, Accenture, and Lufthansa, among many other startups and enterprises.</p>
<p>Graph databases are growing in importance and popularity, driven by the exponential expansion of social data. Neo4j is helping to lead this revolution, all the while staying close to the innovative nature of its open source roots. Anyone interested in learning more about graph databases would do well to download the Community edition and explore Neo4j.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/a-closer-look-at-neo4j-the-graph-database/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
