<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DATAVERSITY</title>
	<atom:link href="http://www.dataversity.net/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dataversity.net</link>
	<description></description>
	<lastBuildDate>Fri, 24 May 2013 07:04:10 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>CEVA Logistics Taps IBM SmartCloud to Deliver Real-time Supply Chain Services</title>
		<link>http://www.dataversity.net/ceva-logistics-taps-ibm-smartcloud-to-deliver-real-time-supply-chain-services/</link>
		<comments>http://www.dataversity.net/ceva-logistics-taps-ibm-smartcloud-to-deliver-real-time-supply-chain-services/#comments</comments>
		<pubDate>Fri, 24 May 2013 07:04:10 +0000</pubDate>
		<dc:creator>A.R. Guess</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud-Based Data]]></category>
		<category><![CDATA[Data Daily]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[CEVA Logistics]]></category>
		<category><![CDATA[cloud based data]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Smarter Commerce]]></category>
		<category><![CDATA[supply chain management]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=20021</guid>
		<description><![CDATA[by Angela Guess A recent article out of IBM reports that the company has announced &#8220;a four-year contract with CEVA Logistics to accelerate business delivery results for its customers through a new cloud-based information exchange for its supply network.  This solution is a combination of CEVA&#8217;s technology and an element of IBM&#8217;s growing SmartCloud portfolio for specific lines of business such as supply chain. By improving the quality of information shared across its customers&#8217; supply chain networks, CEVA speeds material and information flow to reduce costs and improve customer service.  CEVA estimates the IBM and CEVA solution will help the company reduce IT-related supply chain costs by upwards of five percent over the course of the four-year contract, resulting in potentially several million dollars in savings that can be directed toward developing new supply chain offerings for customers.&#8221; The article continues, &#8220;CEVA is a global supply chain management company serving industry leaders in the automotive, consumer, retail, healthcare, pharmaceutical, industrial, and technology sectors. Through its Smarter Commerce initiative, IBM is helping CEVA streamline its customers&#8217; supply chain and logistics processes with technologies that makes it easier to access, share, and process information in real time.  Built on IBM&#8217;s leading cloud B2B integration service and the company&#8217;s CEVA Matrix Connect, the new cloud service delivers a [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.dataversity.net/wp-content/uploads/2013/05/ce.jpg"><img class="alignleft size-medium wp-image-20022" alt="ce" src="http://www.dataversity.net/wp-content/uploads/2013/05/ce-300x164.jpg" width="300" height="164" /></a>by <a href="http://www.dataversity.net/contributors/angela-guess/" target="_blank">Angela Guess</a></p>
<p><a href="http://www.sacbee.com/2013/05/22/5440112/ceva-logistics-taps-ibm-smartcloud.html">A recent article out of IBM</a> reports that the company has announced &#8220;a four-year contract with CEVA Logistics to accelerate business delivery results for its customers through a new cloud-based information exchange for its supply network.  This solution is a combination of CEVA&#8217;s technology and an element of IBM&#8217;s growing SmartCloud portfolio for specific lines of business such as supply chain. By improving the quality of information shared across its customers&#8217; supply chain networks, CEVA speeds material and information flow to reduce costs and improve customer service.  CEVA estimates the IBM and CEVA solution will help the company reduce IT-related supply chain costs by upwards of five percent over the course of the four-year contract, resulting in potentially several million dollars in savings that can be directed toward developing new supply chain offerings for customers.&#8221;</p>
<p>The article continues, &#8220;CEVA is a global supply chain management company serving industry leaders in the automotive, consumer, retail, healthcare, pharmaceutical, industrial, and technology sectors. Through its Smarter Commerce initiative, IBM is helping CEVA streamline its customers&#8217; supply chain and logistics processes with technologies that makes it easier to access, share, and process information in real time.  Built on IBM&#8217;s leading cloud B2B integration service and the company&#8217;s CEVA Matrix Connect, the new cloud service delivers a competitive advantage for CEVA&#8217;s customers by enabling them to quickly and efficiently respond to market and customer changes.&#8221;</p>
<p><a href="http://www.sacbee.com/2013/05/22/5440112/ceva-logistics-taps-ibm-smartcloud.html" target="_blank">Read more here.</a></p>
<p><em>photo credit: CEVA</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/ceva-logistics-taps-ibm-smartcloud-to-deliver-real-time-supply-chain-services/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Banks Cashing In With Data Mining</title>
		<link>http://www.dataversity.net/banks-cashing-in-with-data-mining/</link>
		<comments>http://www.dataversity.net/banks-cashing-in-with-data-mining/#comments</comments>
		<pubDate>Fri, 24 May 2013 07:03:22 +0000</pubDate>
		<dc:creator>A.R. Guess</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Data Daily]]></category>
		<category><![CDATA[Data Modeling]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Unstructured Data]]></category>
		<category><![CDATA[banks]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data management]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[financial industry]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=20016</guid>
		<description><![CDATA[by Angela Guess Mindy Powers recently wrote an article for Forbes stating, &#8220;Across industries, organizations are collecting and using large sets of data to get a leg up on the competition. One industry that has been particularly aggressive at harnessing the power of data is the banking business. And it’s no wonder. In 2009, the McKinsey Global Institute estimated that U.S. banks and capital markets firms collectively had more than one exabyte of stored data. And, according to IDC Financial Insights, the volume of digital content is expected to increase this year from last by 48 percent… So, what kind of data is being collected? Believe it or not, banks are working to figure out how to store and mine videos, images, news and social media data to draw out accurate customer profiles.&#8221; Powers continues, &#8220;Accurate profiles help banks improve customer experience and retention. The more banks know about customers, the more they can tailor solutions to meet customer needs. Mining data &#8216;tells us what the customer wants, not what we think they want,&#8217; says James Gifas, U.S. head of global transaction services for RBS Citizens. Analyzing customer data also allows banks to spot any potential customer problems, improve management of customer accounts, engage in real-time dialogue with customers, and improve overall customer [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.dataversity.net/wp-content/uploads/2013/05/stairs_of_escape.jpg"><img class="alignleft size-medium wp-image-20018" alt="Stairs of Escape" src="http://www.dataversity.net/wp-content/uploads/2013/05/stairs_of_escape-300x200.jpg" width="300" height="200" /></a>by <a href="http://www.dataversity.net/contributors/angela-guess/" target="_blank">Angela Guess</a></p>
<p><a href="http://www.forbes.com/sites/centurylink/2013/05/14/banks-looking-to-cash-in-by-mining-social-data/">Mindy Powers recently wrote an article for Forbes</a> stating, &#8220;Across industries, organizations are collecting and using large sets of data to get a leg up on the competition. One industry that has been particularly aggressive at harnessing the power of data is the banking business. And it’s no wonder. In 2009, the McKinsey Global Institute estimated that U.S. banks and capital markets firms collectively had <a href="http://www.informationweek.com/software/business-intelligence/big-data-brings-customer-challenges-oppo/232600653" target="_blank">more than one exabyte of stored data</a>. And, according to IDC Financial Insights, the volume of digital content is expected to <a href="http://www.banktech.com/business-intelligence/232600252?itc=edit_stub" target="_blank">increase this year from last by 48 percent</a>… So, what kind of data is being collected? Believe it or not, banks are working to figure out how to store and mine videos, images, news and social media data to draw out accurate customer profiles.&#8221;</p>
<p>Powers continues, &#8220;Accurate profiles help banks improve customer experience and retention. The more banks know about customers, the more they can tailor solutions to meet customer needs. Mining data &#8216;tells us what the customer wants, not what we think they want,&#8217; <a href="http://www.banktech.com/business-intelligence/232600252?itc=edit_stub" target="_blank">says James Gifas</a>, U.S. head of global transaction services for RBS Citizens. Analyzing customer data also allows banks to spot any potential customer problems, improve management of customer accounts, engage in real-time dialogue with customers, and improve overall customer experience online and over the phone. On the flip side, harnessing Big Data also helps banks to identify and contain fraud and to comply with money-laundering rules and sanctions.&#8221;</p>
<p><a href="http://www.forbes.com/sites/centurylink/2013/05/14/banks-looking-to-cash-in-by-mining-social-data/" target="_blank">Read more here.</a></p>

						<div id="pdrp_endAttribution">
						photo by: 
						 
							<a href="http://flickr.com/8510225@N07/6205752426" target="_blank" class="pdrp_link pdrp_attributionLink">
								John Loo</a>
						</div>
					]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/banks-cashing-in-with-data-mining/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Concurrent Introduces Pattern Scoring Engine for Hadoop</title>
		<link>http://www.dataversity.net/concurrent-introduces-pattern-scoring-engine-for-hadoop/</link>
		<comments>http://www.dataversity.net/concurrent-introduces-pattern-scoring-engine-for-hadoop/#comments</comments>
		<pubDate>Fri, 24 May 2013 07:02:35 +0000</pubDate>
		<dc:creator>A.R. Guess</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Data Daily]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Concurrent]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[open-source]]></category>
		<category><![CDATA[Pattern]]></category>
		<category><![CDATA[platform]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=20012</guid>
		<description><![CDATA[by Angela Guess According to a recent article out of the company, &#8220;Concurrent, Inc., the enterprise Big Data application platform company, today introduced Pattern, a free, open source, standard-based scoring engine that enables analysts and data scientists to quickly deploy machine-learning applications on Apache Hadoop™. Leveraging the power and broad platform support of the Cascading application framework, Pattern lowers the barrier to Hadoop adoption by enabling companies to leverage existing intellectual property (IP) in predictive models, existing investments in software tooling and the core competencies of existing analytics staff to run Big Data applications from existing machine-learning models using Predictive Model Markup Language (PMML) or through a simple programming interface.&#8221; The article continues, &#8220;Hadoop is rapidly becoming the tool of choice for tackling enterprise Big Data analytics needs in an effort to make the most of growing volumes of unstructured and semi-structured data. The need for Hadoop to easily integrate with existing data management and analytics systems, however, has created a real barrier to comprehensive Hadoop adoption. With the introduction of Pattern, companies can now leverage existing skill sets, core competencies and product investments by carrying them over to Hadoop via the standards-based PMML technology. PMML is the standard export [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.dataversity.net/wp-content/uploads/2013/05/conc.png"><img class="alignleft size-medium wp-image-20013" alt="conc" src="http://www.dataversity.net/wp-content/uploads/2013/05/conc-300x114.png" width="300" height="114" /></a>by <a href="http://www.dataversity.net/contributors/angela-guess/" target="_blank">Angela Guess</a></p>
<p><a href="http://www.concurrentinc.com/posts/2013/05/21/concurrent-completes-the-big-data-hat-trick-for-hadoop-applications/">According to a recent article</a> out of the company, &#8220;Concurrent, Inc., the enterprise Big Data application platform company, today introduced Pattern, a free, open source, standard-based scoring engine that enables analysts and data scientists to quickly deploy machine-learning applications on Apache Hadoop™. Leveraging the power and broad platform support of the Cascading application framework, Pattern lowers the barrier to Hadoop adoption by enabling companies to leverage existing intellectual property (IP) in predictive models, existing investments in software tooling and the core competencies of existing analytics staff to run Big Data applications from existing machine-learning models using Predictive Model Markup Language (PMML) or through a simple programming interface.&#8221;</p>
<p>The article continues, &#8220;Hadoop is rapidly becoming the tool of choice for tackling enterprise Big Data analytics needs in an effort to make the most of growing volumes of unstructured and semi-structured data. The need for Hadoop to easily integrate with existing data management and analytics systems, however, has created a real barrier to comprehensive Hadoop adoption. With the introduction of Pattern, companies can now leverage existing skill sets, core competencies and product investments by carrying them over to Hadoop via the standards-based PMML technology. PMML is the standard export format for tools, such as R, MicroStrategies® and SAS®; and with Pattern, analysts and data scientists familiar with these technologies can now run predictive data models at scale and integrate ETL, data preparation and predictive analytics in the same application to greatly reduce development time and unlock accessibility to large Hadoop data sets. Pattern in turn will enable a whole new class of use cases and simplify experiments.&#8221;</p>
<p><a href="http://www.concurrentinc.com/posts/2013/05/21/concurrent-completes-the-big-data-hat-trick-for-hadoop-applications/" target="_blank">Read more here.</a></p>
<p><em>photo credit: Concurrent</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/concurrent-introduces-pattern-scoring-engine-for-hadoop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NoSQL Job of the Day: Director, Web Architecture</title>
		<link>http://www.dataversity.net/nosql-job-of-the-day-director-web-architecture/</link>
		<comments>http://www.dataversity.net/nosql-job-of-the-day-director-web-architecture/#comments</comments>
		<pubDate>Fri, 24 May 2013 07:01:41 +0000</pubDate>
		<dc:creator>A.R. Guess</dc:creator>
				<category><![CDATA[Job of the Day]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[NoSQL Job of the Day]]></category>
		<category><![CDATA[Covance]]></category>
		<category><![CDATA[Director of Web Architecture]]></category>
		<category><![CDATA[New Jersey]]></category>
		<category><![CDATA[NJ]]></category>
		<category><![CDATA[NoSQL job]]></category>
		<category><![CDATA[Princeton]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=20008</guid>
		<description><![CDATA[by Angela Guess Covance is looking for a Director of Web Architecture in Princeton, NJ. According to the post, &#8220;The Director, Web Architecture role’s responsibilities are to construct and implement the technical design for Covance&#8217;s web and portal based applications. This includes the design and development of web architectural elements, including data, security, and mobility requirements. The position involves the selection of Web/Portal application hardware and software platforms and designing an application framework to support Covance’s web application needs. This position also needs to work with other web development resources to manage web project deliveries and on-going support of web applications. Exceptional knowledge of various Web/Portal development platforms is required, with emphasis on content management systems and designing for mobility.&#8221; Qualifications for the position include: &#8220;10+ years experience with relational database development (SQL Server and Oracle). 5+ years experience with .Net technologies (ASP.Net, MVC, LINQ, WPF). Experience with enterprise collaboration and social media technologies. Experience with web application development using JavaScript, HTML and CSS. Experience with service oriented architectures with SOAP/REST services. Experience with NoSQL and semantic web technologies (RDF, SPARQL). Experience with cloud computing (EC2, S3, DynamoDB).&#8221; Learn more and apply here. photo credit: Covance]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.dataversity.net/wp-content/uploads/2013/05/co.jpg"><img class="alignleft size-medium wp-image-20009" alt="co" src="http://www.dataversity.net/wp-content/uploads/2013/05/co-300x149.jpg" width="300" height="149" /></a>by <a href="http://www.dataversity.net/contributors/angela-guess/" target="_blank">Angela Guess</a></p>
<p>Covance is looking for a <a href="https://sjobs.brassring.com/en/asp/tg/cim_jobdetail.asp?partnerid=20090&amp;siteid=5090&amp;jobId=359665&amp;codes=Indeed">Director of Web Architecture</a> in Princeton, NJ. According to the post, &#8220;The Director, Web Architecture role’s responsibilities are to construct and implement the technical design for Covance&#8217;s web and portal based applications. This includes the design and development of web architectural elements, including data, security, and mobility requirements. The position involves the selection of Web/Portal application hardware and software platforms and designing an application framework to support Covance’s web application needs. This position also needs to work with other web development resources to manage web project deliveries and on-going support of web applications. Exceptional knowledge of various Web/Portal development platforms is required, with emphasis on content management systems and designing for mobility.&#8221;</p>
<p>Qualifications for the position include: &#8220;10+ years experience with relational database development (SQL Server and Oracle). 5+ years experience with .Net technologies (ASP.Net, MVC, LINQ, WPF). Experience with enterprise collaboration and social media technologies. Experience with web application development using JavaScript, HTML and CSS. Experience with service oriented architectures with SOAP/REST services. Experience with NoSQL and semantic web technologies (RDF, SPARQL). Experience with cloud computing (EC2, S3, DynamoDB).&#8221;</p>
<p><a href="https://sjobs.brassring.com/en/asp/tg/cim_jobdetail.asp?partnerid=20090&amp;siteid=5090&amp;jobId=359665&amp;codes=Indeed" target="_blank">Learn more and apply here.</a></p>
<p><em>photo credit: Covance</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/nosql-job-of-the-day-director-web-architecture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Beyond Big: Integration Possibilities for Big Data</title>
		<link>http://www.dataversity.net/beyond-big-integration-possibilities-for-big-data/</link>
		<comments>http://www.dataversity.net/beyond-big-integration-possibilities-for-big-data/#comments</comments>
		<pubDate>Thu, 23 May 2013 07:10:58 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Conference and Webinar Communities]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Enterprise Data World]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19972</guid>
		<description><![CDATA[by Jelani Harper The increasing prevalence of Big Data in today’s business climate is indisputable, yet there are still several issues related to its integration with the enterprise that are preventing organizations from adopting it on a wider scale. The speed, size, and variety of Big Data present challenges to a number of conventional processes including governance, analytics, metadata, and storage. All too often, organizations that do incorporate Big Data do so in a silo format, which detracts from the value of enterprise-wide integration that Big Data can significantly enhance. Intelligent Business Strategies’ Managing Director Mike Ferguson addressed a number of these concerns for several hours during an Enterprise Data World 2013 presentation entitled “Integrating BIG Data Analytics Into The Enterprise.” Aside from denoting many of the key attributes of Big Data and its ramifications on integration, Ferguson also detailed a variety of solutions to such issues and current products that address them. Ultimately, he concluded that Big Data was merely a launching point for increased data integration throughout the enterprise. Big Data Governance Governance concerns for Big Data are similar to those for little data. The goal is to ensure data quality and a manageable format in which data [...]]]></description>
				<content:encoded><![CDATA[<p style="text-align: left;" align="center"><a href="http://www.dataversity.net/wp-content/uploads/2013/05/BigDataIntegration.jpg"><img class="alignleft size-medium wp-image-19973" alt="BigDataIntegration" src="http://www.dataversity.net/wp-content/uploads/2013/05/BigDataIntegration-300x194.jpg" width="300" height="194" /></a>by <a title="Jelani Harper" href="http://www.dataversity.net/contributors/jelani-harper/" target="_blank">Jelani Harper</a></p>
<p>The increasing prevalence of Big Data in today’s business climate is indisputable, yet there are still several issues related to its integration with the enterprise that are preventing organizations from adopting it on a wider scale. The speed, size, and variety of Big Data present challenges to a number of conventional processes including governance, analytics, metadata, and storage. All too often, organizations that do incorporate Big Data do so in a silo format, which detracts from the value of enterprise-wide integration that Big Data can significantly enhance.</p>
<p>Intelligent Business Strategies’ Managing Director Mike Ferguson addressed a number of these concerns over several hours during an <a href="http://www.enterprisedataworld.com" target="_blank">Enterprise Data World 2013</a> presentation titled “Integrating BIG Data Analytics Into The Enterprise.” Aside from describing many of the key attributes of Big Data and its ramifications for integration, Ferguson also detailed a variety of solutions to such issues and current products that address them. Ultimately, he concluded that Big Data was merely a launching point for increased data integration throughout the enterprise.</p>
<p><b>Big Data Governance</b></p>
<p><a href="http://www.dataversity.net/big-data-governance-over-streaming-data/">Governance concerns for Big Data</a> are similar to those for little data. The goal is to ensure data quality and a manageable format in which data is easily archived, stored, and accessed in order to assist those professionals who use it most. Still, there are a number of key aspects of Big Data that make its governance concerns unique. The primary distinction between Big Data and little data is that the latter is structured, conforms to a universal definition of metadata, and is able to be readily categorized and accessed by professionals accordingly.</p>
<p>One may argue that the entire point of capturing and utilizing Big Data is to glean insights from unstructured data, the likes of which users themselves may not be fully aware of at the point of capture. Therefore, there is an aspect of data exploration (ideally performed by <a href="http://www.dataversity.net/data-science-programs-on-the-increase-at-universities/">data scientists</a>) that is vital to the integration of Big Data and occupies a primary place in its governance – which may be secondary or unnecessary for traditional data.</p>
<p>This distinction manifests itself in a number of different ways. Whereas data stewards are seen as the principal curators of the governance of traditional data, data scientists are often the front-line professionals who are responsible not only for exploring Big Data, but also for providing its essential governance principles. Regardless of what technology an organization uses to access Big Data (<a href="http://www.dataversity.net/the-apache-software-foundation-and-its-influence-on-data-management/">Apache Hadoop</a> is certainly one of the most popular), the first level of governance is for data scientists to explore various aspects of data in sandboxes to analyze and stratify its characteristics.</p>
<p>Thus, governance issues related to Big Data involve policies about data science projects, policies for the results of data once it has been moved into a warehouse, and policies for discovered schema and data processed through BI tools. Other governance concerns include what sources can be integrated into Big Data technologies and who can access such data while attempting to present as much structure (and avoidance of duplication) as possible. <a href="http://www.greenplum.com/products/chorus">EMC’s Greenplum Chorus</a> is a tool that enables organizations to govern different sandboxes and workspaces; <a href="http://www-01.ibm.com/software/data/bigdata/enterprise.html">IBM’s Big Data Platform</a> also has governance capabilities. However, Ferguson claims there may be a more pressing issue:</p>
<p style="padding-left: 30px;">“I think we will see more data governance capabilities in the Hadoop world, but remember it’s un-modeled data and so some of the things associated with data governance – like common definitions for a data model – may not apply yet. Instead, we’ll need a data scientist team to work on a data source to derive structure from unstructured data. Then we’ll want to map that into some kind of model that may adhere to our standard canonical data names and definitions, so that we can then consume that data easily in the enterprise.”</p>
<p><b>Analytics Options</b></p>
<p>One particularly insightful aspect of Ferguson’s presentation was that it helped to clarify the analytics challenges of Big Data. Organizations may take terms such as structured, unstructured, semi-structured, and poly-structured data for granted – until they hope to actually transform such data into information. Depending on an enterprise’s particular area of focus, Big Data encompasses not just sentiment data from social media and other websites, but also clickstream data, transactional and vertical industries data, and event and sensor data, which can range from text (in various languages and jargon) to audio/video and sensor readings. Most of this data is in a constant state of flux, arriving steadily and leaving little time for analysis.</p>
<p>The approach to analyzing Big Data inverts that of a conventional data warehouse, in which users perform analytics on data that is already stored. The trick with running analytics on Big Data is to analyze it first and then determine whether or not such data should be stored. The primary drivers for Big Data analytics are transactional volume and analytics complexity.</p>
<p>The three principal platform types for Big Data analytics are SQL-based relational databases, NoSQL databases, and Hadoop. There are also hybrid solutions that combine Hadoop and SQL databases, such as Teradata Aster, and conventional RDBMSs that work with a finite volume of data. Developments in SQL technologies such as in-database analytics, columnar storage, and in-memory data have greatly expanded their analytics capabilities for the size concerns of Big Data, while Hadoop’s extreme scalability and low cost (it is open source) make it one of the most sought-after platforms for Big Data.</p>
<p>These two aspects of Hadoop, as well as its other frequently used components such as MapReduce – a parallel data processing model – and its data warehouse Hive, have led numerous vendors of SQL-based technologies to create applications that allow users to access and analyze Big Data through Hadoop. A number of top vendors package products with Hadoop, such as <a href="http://www.sas.com/software/information-management/big-data/hadoop.html">SAS</a> and IBM’s BigInsights. <a href="http://hortonworks.com/stinger/">Hortonworks’ Stinger Initiative</a> significantly increases the speed of Hive (up to 100 times), enabling self-service BI querying. <a href="http://www.cloudera.com/content/cloudera/en/products/cdh/impala.html">Cloudera Impala</a> allows users to make real-time queries with SQL technologies and is supported by a number of top BI vendors.</p>
<p>The principal boon of querying and analyzing data with SQL is that it simplifies data integration processes. Virtually all of the aforementioned solutions have data integration tools, while some platforms, such as the <a href="http://www.teradata.com/Teradata-Enterprise-Access-for-Hadoop/">Teradata Enterprise Access For Hadoop</a> (which is part of its <a href="http://www.dataversity.net/teradata-closes-in-on-unified-data-analytics/">Unified Data Architecture</a>), provide access to Hadoop and a myriad of other data sources in a fully integrated data warehouse. Thus, users can run advanced analytics on Big Data in near real time and readily integrate it with the rest of their data.</p>
<p><b>More Than Just Big</b></p>
<p>Ferguson commented on this trend:</p>
<p style="padding-left: 30px;">“Data management vendors are working to exploit the Hadoop platform to get scalable ETL processing. This opens up the opportunity to potentially offload that kind of processing from data warehouses into a Hadoop environment which raises another question: could we turn Hadoop into a data hub where we load data in there and process it and then move it on to wherever it has to go for subsequent analysis? Maybe it could start moving just within the Hadoop cluster to data scientists’ sandboxes, or into other platforms like data warehouses for subsequent analysis.”</p>
<p>The implications for Hadoop as a data hub are significant because it suggests the possibility of utilizing numerous data sources and respective tools all on the same platform. According to Ferguson, this usage of Hadoop could represent the wave of the future:</p>
<p style="padding-left: 30px;">“There’s going to constantly be a call to move data between platforms, and it’s going to get faster and the degree of integration is going to get more tightly controlled. You may have Hadoop on a cluster in the cloud and your relational database in-house, but already we’re seeing vendors put both of them in the same place. We’re seeing relational databases putting multiple NoSQL stores in the database. These are all signs of deeper integration in order to allow you to control analytical workloads for multiple platforms, exploit the best platform for analytics and make sure you’re using your data.”</p>
<p>Consequently, Hadoop’s potential for integrating Big Data with other enterprise data sources could result in a more profound integration throughout the enterprise. Such integration could require moving master data into Hadoop and truly streamlining various ETL, NoSQL, Hadoop, and warehousing technologies via common metadata terminology to effectively stretch Data Management to include all data solutions and assets.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/beyond-big-integration-possibilities-for-big-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HOSTING Launches Cloud Visibility Reporting Tool</title>
		<link>http://www.dataversity.net/hosting-launches-cloud-visibility-reporting-tool/</link>
		<comments>http://www.dataversity.net/hosting-launches-cloud-visibility-reporting-tool/#comments</comments>
		<pubDate>Thu, 23 May 2013 07:04:29 +0000</pubDate>
		<dc:creator>A.R. Guess</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud-Based Data]]></category>
		<category><![CDATA[Data Daily]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[360 Degrees Reporting]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[cloud visibility]]></category>
		<category><![CDATA[HOSTING]]></category>
		<category><![CDATA[platform]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19998</guid>
		<description><![CDATA[by Angela Guess A new article out of the company reports, &#8220;HOSTING, a leading provider of managed cloud services for business-critical applications, today announced a new business insights tool that provides visibility into HOSTING&#8217;s cloud services using the five tenets of ITIL service design: availability, performance, recovery, security and capacity. The new 360 Degrees Report™ provides a concise summary of both historical and predictive cloud information that customers can use to help optimize their environment in HOSTING&#8217;s cloud. 360 Degrees Report is the only cloud report on the market that enables customers to compare their own performance levels with those of organizations in similar industries, running similar applications or with similar critical success factors.&#8221; The article continues, &#8220;The 360 Degrees Report offers flexibility to tailor reports based on an enterprise&#8217;s own critical success factors. Customers configure threshold targets within the five ITIL areas to align with their business priorities. This enables the delivery of relevant intelligence based on the operational performance and security posture deemed most important by individual customers. The monthly 360 Reports provide quantitative metrics for each ITIL area, along with recommendations for improving the score achieved.  For example, a 360 Degrees Report could recommend that a customer add monitoring to additional devices [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.dataversity.net/wp-content/uploads/2013/05/hos.png"><img class="alignleft size-medium wp-image-19999" alt="hos" src="http://www.dataversity.net/wp-content/uploads/2013/05/hos-300x172.png" width="300" height="172" /></a>by <a href="http://www.dataversity.net/contributors/angela-guess/" target="_blank">Angela Guess</a></p>
<p><a href="http://www.prnewswire.com/news-releases/hosting-launches-cloud-visibility-reporting-tool-for-measuring-operational-performance-206416401.html">A new article out of the company</a> reports, &#8220;<a href="http://www.hosting.com/" target="_blank">HOSTING</a>, a leading provider of managed cloud services for business-critical applications, today announced a new business insights tool that provides visibility into HOSTING&#8217;s cloud services using the five tenets of ITIL service design: availability, performance, recovery, security and capacity. The new 360 Degrees Report™ provides a concise summary of both historical and predictive cloud information that customers can use to help optimize their environment in HOSTING&#8217;s cloud. 360 Degrees Report is the only cloud report on the market that enables customers to compare their own performance levels with those of organizations in similar industries, running similar applications or with similar critical success factors.&#8221;</p>
<p>The article continues, &#8220;The 360 Degrees Report offers flexibility to tailor reports based on an enterprise&#8217;s own critical success factors. Customers configure threshold targets within the five ITIL areas to align with their business priorities. This enables the delivery of relevant intelligence based on the operational performance and security posture deemed most important by individual customers. The monthly 360 Reports provide quantitative metrics for each ITIL area, along with recommendations for improving the score achieved.  For example, a 360 Degrees Report could recommend that a customer add monitoring to additional devices to improve availability scoring.&#8221;</p>
<p><a href="http://www.prnewswire.com/news-releases/hosting-launches-cloud-visibility-reporting-tool-for-measuring-operational-performance-206416401.html" target="_blank">Read more here.</a></p>
<p><em>photo credit: HOSTING</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/hosting-launches-cloud-visibility-reporting-tool/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IBM Unveils Watson Engagement Advisor</title>
		<link>http://www.dataversity.net/ibm-unveils-watson-engagement-advisor/</link>
		<comments>http://www.dataversity.net/ibm-unveils-watson-engagement-advisor/#comments</comments>
		<pubDate>Thu, 23 May 2013 07:03:11 +0000</pubDate>
		<dc:creator>A.R. Guess</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud-Based Data]]></category>
		<category><![CDATA[Data Daily]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Semantic Technology]]></category>
		<category><![CDATA[Unstructured Data]]></category>
		<category><![CDATA[Engagement Advisor]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[new]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[Watson]]></category>
		<category><![CDATA[webinar]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19994</guid>
		<description><![CDATA[by Angela Guess IBM recently announced, &#8220;Ushering in a new era of cognitive computing systems, IBM today unveiled the IBM Watson Engagement Advisor, a technology breakthrough that allows brands to crunch big data in record time to transform the way they engage clients in key functions such as customer service, marketing and sales. Now businesses can better serve consumers with a cognitive computing assistant that learns, adapts and understands a company&#8217;s data quickly and easily, enabling users to have IBM Watson at work quickly, while increasing its knowledge and value over time.&#8221; The article continues, &#8220;Two years after its triumph on Jeopardy!, the IBM Watson Engagement Advisor is a first of a kind system designed to help customer-facing personnel assist consumers with deeper insights more quickly than previously possible. Delivered through cloud-delivered services and online chat sessions, IBM Watson will empower a brand&#8217;s customer service agents to provide fast, data-driven answers, or sit directly in the hands of consumers via mobile device. In one simple click, the solution&#8217;s Ask Watson feature will quickly help address customers&#8217; questions, offer feedback to guide their purchase decisions, and troubleshoot their problems.&#8221; Read more here, or watch a video about the announcement here. photo credit: IBM]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.dataversity.net/wp-content/uploads/2013/05/wat.png"><img class="alignleft size-medium wp-image-19995" alt="wat" src="http://www.dataversity.net/wp-content/uploads/2013/05/wat-300x176.png" width="300" height="176" /></a>by <a href="http://www.dataversity.net/contributors/angela-guess/" target="_blank">Angela Guess</a></p>
<p><a href="http://www-03.ibm.com/press/us/en/pressrelease/41122.wss">IBM recently announced</a>, &#8220;Ushering in a new era of cognitive computing systems, IBM today unveiled the IBM Watson Engagement Advisor, a technology breakthrough that allows brands to crunch big data in record time to transform the way they engage clients in key functions such as customer service, marketing and sales. Now businesses can better serve consumers with a cognitive computing assistant that learns, adapts and understands a company&#8217;s data quickly and easily, enabling users to have IBM Watson at work quickly, while increasing its knowledge and value over time.&#8221;</p>
<p>The article continues, &#8220;Two years after its triumph on Jeopardy!, the <a href="http://asmarterplanet.com/blog/2013/05/connect.html">IBM Watson Engagement Advisor</a> is a first of a kind system designed to help customer-facing personnel assist consumers with deeper insights more quickly than previously possible. Delivered through cloud-delivered services and online chat sessions, IBM Watson will empower a brand&#8217;s customer service agents to provide fast, data-driven answers, or sit directly in the hands of consumers via mobile device. In one simple click, the solution&#8217;s <a href="http://bit.ly/10JLFoj">Ask Watson</a> feature will quickly help address customers&#8217; questions, offer feedback to guide their purchase decisions, and troubleshoot their problems.&#8221;</p>
<p><a href="http://www-03.ibm.com/press/us/en/pressrelease/41122.wss">Read more here</a>, or <a href="https://www.youtube.com/watch?v=6X6W6Tc6E9A&amp;feature=youtu.be" target="_blank">watch a video about the announcement here</a>.</p>
<p><em>photo credit: IBM</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/ibm-unveils-watson-engagement-advisor/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>10 Data Governance Red Flags</title>
		<link>http://www.dataversity.net/10-data-governance-red-flags/</link>
		<comments>http://www.dataversity.net/10-data-governance-red-flags/#comments</comments>
		<pubDate>Thu, 23 May 2013 07:02:13 +0000</pubDate>
		<dc:creator>A.R. Guess</dc:creator>
				<category><![CDATA[Data Daily]]></category>
		<category><![CDATA[Data Governance and Quality]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[data governance]]></category>
		<category><![CDATA[EDW]]></category>
		<category><![CDATA[implementation]]></category>
		<category><![CDATA[indicators]]></category>
		<category><![CDATA[problems]]></category>
		<category><![CDATA[red flags]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19990</guid>
		<description><![CDATA[by Angela Guess Rick Vanover of TechRepublic recently shared ten signs that your company might have a data governance problem. He writes, &#8220;One thing we all can agree on is that hindsight is always very clear. When we’re discussing and reviewing issues in technology systems and failed projects, we can usually make a data path to the problem. Data governance issues can exist in organizations of any size, but if you don’t know much about data governance, that’s an indication of a potential issue. I recently attended Enterprise Data World, a conference around topics including Big Data. During that event, it was pretty clear that there is a priority to ensure that data of all sizes and profiles are managed well for organizations big and small. I came up with this list of 10 indications that you may have a data governance issue.&#8221; The first indication is pockets of adoption: &#8220;When it comes to data and its access, pockets of adoption may not cut it. If you hear this type of conversation, keep in mind that it takes only one problem spot to cause a data handling issue. Adopting data governance has to include the entire cycle and scope of the [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.dataversity.net/wp-content/uploads/2013/05/red_flags.jpg"><img class="alignleft size-medium wp-image-19991" alt="Red flags" src="http://www.dataversity.net/wp-content/uploads/2013/05/red_flags-300x225.jpg" width="300" height="225" /></a>by <a href="http://www.dataversity.net/contributors/angela-guess/" target="_blank">Angela Guess</a></p>
<p><a href="http://www.techrepublic.com/blog/10things/10-signs-that-you-might-have-a-data-governance-problem/3732">Rick Vanover of TechRepublic</a> recently shared ten signs that your company might have a data governance problem. He writes, &#8220;One thing we all can agree on is that hindsight is always very clear. When we’re discussing and reviewing issues in technology systems and failed projects, we can usually make a data path to the problem. Data governance issues can exist in organizations of any size, but if you don’t know much about data governance, that’s an indication of a potential issue. I recently attended <a href="http://edw2013.dataversity.net/" target="_blank">Enterprise Data World</a>, a conference around topics including Big Data. During that event, it was pretty clear that there is a priority to ensure that data of all sizes and profiles are managed well for organizations big and small. I came up with this list of 10 indications that you may have a data governance issue.&#8221;</p>
<p>The first indication is pockets of adoption: &#8220;When it comes to data and its access, pockets of adoption may not cut it. If you hear this type of conversation, keep in mind that it takes only one problem spot to cause a data handling issue. Adopting data governance has to include the entire cycle and scope of the organization. The reality is that it takes just one system to improperly handle a piece of sensitive data and cause an issue.&#8221;</p>
<p><a href="http://www.techrepublic.com/blog/10things/10-signs-that-you-might-have-a-data-governance-problem/3732" target="_blank">Read more here.</a></p>

						<div id="pdrp_endAttribution">
						photo by: 
						 
							<a href="http://flickr.com/50839356@N00/116017204" target="_blank" class="pdrp_link pdrp_attributionLink">
								rvw</a>
						</div>
					]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/10-data-governance-red-flags/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Big Data Job of the Day: Data Engineer</title>
		<link>http://www.dataversity.net/big-data-job-of-the-day-data-engineer/</link>
		<comments>http://www.dataversity.net/big-data-job-of-the-day-data-engineer/#comments</comments>
		<pubDate>Thu, 23 May 2013 07:01:05 +0000</pubDate>
		<dc:creator>A.R. Guess</dc:creator>
				<category><![CDATA[Big Data Job of the Day]]></category>
		<category><![CDATA[Job of the Day]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Big Data job]]></category>
		<category><![CDATA[CA]]></category>
		<category><![CDATA[California]]></category>
		<category><![CDATA[Chegg]]></category>
		<category><![CDATA[Data Engineer]]></category>
		<category><![CDATA[Santa Clara]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19986</guid>
		<description><![CDATA[by Angela Guess Chegg is looking for a Data Engineer in Santa Clara, CA. According to the post, &#8220;As a Data Engineer you will execute the strategy and roadmap for data at Chegg. You will play multiple roles that span data architecture and design, data warehousing, and data quality control. You will work with a team of data engineers and leverage multiple data platforms including: Hadoop, MongoDB, Cassandra, MySQL, and Aster. You will work with analysts that leverage data with scientific and analytic tools such as R, Qlikview, and Tableau. You will engage with analysts and leaders to research and develop new data engineering capabilities; troubleshooting challenges as they arise.&#8221; Qualifications for the position include: &#8220;7-10+ years working as a developer in a Data Engineering, Data Warehousing / BI team. Extensive experience working with structured data platforms, ELT/ETL, and Unix/Linux shell scripting languages such as Bash, Perl, or Ruby. Expertise troubleshooting data quality issues, analyzing data requirements, and utilizing big data systems such as Hadoop. Experience in report development, dimensional data modeling. Previous experience with scientific programming frameworks such as R, Matlab, or Python is a plus. A strong desire to build data platforms that drive insights from Chegg’s network [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.dataversity.net/wp-content/uploads/2013/05/che.jpg"><img class="alignleft size-medium wp-image-19987" alt="che" src="http://www.dataversity.net/wp-content/uploads/2013/05/che-300x125.jpg" width="300" height="125" /></a>by <a href="http://www.dataversity.net/contributors/angela-guess/" target="_blank">Angela Guess</a></p>
<p>Chegg is looking for a <a href="http://www.chegg.com/jobs/listings/?nl=1&amp;jvi=oFhuXfw7,Job&amp;jvs=Indeed&amp;jvk=Job">Data Engineer</a> in Santa Clara, CA. According to the post, &#8220;As a Data Engineer you will execute the strategy and roadmap for data at Chegg. You will play multiple roles that span data architecture and design, data warehousing, and data quality control. You will work with a team of data engineers and leverage multiple data platforms including: Hadoop, MongoDB, Cassandra, MySQL, and Aster. You will work with analysts that leverage data with scientific and analytic tools such as R, Qlikview, and Tableau. You will engage with analysts and leaders to research and develop new data engineering capabilities; troubleshooting challenges as they arise.&#8221;</p>
<p>Qualifications for the position include: &#8220;7-10+ years working as a developer in a Data Engineering, Data Warehousing / BI team. Extensive experience working with structured data platforms, ELT/ETL, and Unix/Linux shell scripting languages such as Bash, Perl, or Ruby. Expertise troubleshooting data quality issues, analyzing data requirements, and utilizing big data systems such as Hadoop. Experience in report development, dimensional data modeling. Previous experience with scientific programming frameworks such as R, Matlab, or Python is a plus. A strong desire to build data platforms that drive insights from Chegg’s network of students, physical and digital content, and learning aids. An insatiable appetite to transform education through data.&#8221;</p>
<p><a href="http://www.chegg.com/jobs/listings/?nl=1&amp;jvi=oFhuXfw7,Job&amp;jvs=Indeed&amp;jvk=Job" target="_blank">Learn more and apply here.</a></p>
<p><em>photo credit: Chegg</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/big-data-job-of-the-day-data-engineer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Low Down on Recovering Deleted Files</title>
		<link>http://www.dataversity.net/the-low-down-on-recovering-deleted-files/</link>
		<comments>http://www.dataversity.net/the-low-down-on-recovering-deleted-files/#comments</comments>
		<pubDate>Wed, 22 May 2013 07:10:51 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[David Logue]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=19978</guid>
		<description><![CDATA[by David Logue Recently, I’ve received several questions related to the recovery of deleted files.  What happens when a file is deleted on a Windows-based system, and what causes those files to be lost and therefore unrecoverable?  Further, what could I have done to prevent their loss?  To answer those questions, we first need to answer another very important question. How does Windows save file data on a NTFS volume? When you create a new file, like a picture from your vacation (vacation.jpg), and save it to your hard drive (formatted with the NTFS file system), Windows does a couple things.  It finds an open file record in the metadata area of the disk (called the Master File Table or MFT) and writes some information about the file, such as the file name and date.  If there are no open file records, Windows will expand the MFT and create a new file record. Windows then finds some free data blocks on the volume to write the actual file data to.  Once the data blocks are identified, Windows links the new file record to the data blocks and writes the actual data to the disk.  The picture below illustrates the vacation.jpg [...]]]></description>
				<content:encoded><![CDATA[<p>by <a title="David Logue" href="http://www.dataversity.net/contributors/david-logue/" target="_blank">David Logue</a></p>
<p>Recently, I’ve received several questions related to the recovery of deleted files.  What happens when a file is deleted on a Windows-based system, and what causes those files to be lost and therefore unrecoverable?  Further, what could I have done to prevent their loss?  To answer those questions, we first need to answer another very important question.</p>
<p>How does Windows save file data on an NTFS volume?</p>
<p>When you create a new file, like a picture from your vacation (vacation.jpg), and save it to your hard drive (formatted with the NTFS file system), Windows does a couple of things.  It finds an open file record in the metadata area of the disk (called the Master File Table or MFT) and writes some information about the file, such as the file name and date.  If there are no open file records, Windows will expand the MFT and create a new file record.</p>
<p>Windows then finds some free data blocks on the volume to write the actual file data to.  Once the data blocks are identified, Windows links the new file record to the data blocks and writes the actual data to the disk.  The picture below illustrates the vacation.jpg file as written to the disk.</p>
<p style="text-align: center;"><a href="http://www.dataversity.net/wp-content/uploads/2013/05/DL-Pic1.png"><img class="alignnone size-full wp-image-19979" alt="DL Pic1" src="http://www.dataversity.net/wp-content/uploads/2013/05/DL-Pic1.png" width="615" height="378" /></a>
</p>
<p>So what happens when a file is deleted (assuming it is not going into the Recycle Bin)?  Two very important things happen (from a data recovery perspective):</p>
<ol>
<li>The file record is marked as deleted and available for reuse.</li>
<li>The data area is marked as free space and available for reuse.</li>
</ol>
<p style="text-align: center;"><a href="http://www.dataversity.net/wp-content/uploads/2013/05/DL-Pic2.png"><img class="alignnone size-full wp-image-19980" alt="DL Pic2" src="http://www.dataversity.net/wp-content/uploads/2013/05/DL-Pic2.png" width="617" height="392" /></a></p>
<p>The image above shows that the areas of the disk that hold the data for the vacation.jpg file have now been marked as free space and are available for new files or for expanding existing files.  The file record has also been marked as deleted and is available for reuse by the file system.</p>
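<p>The write and delete behavior described above can be sketched with a toy model. This is only an illustration, not how NTFS is actually implemented: the <code>ToyVolume</code> and <code>FileRecord</code> names, the block contents, and the eight-block volume size are all assumptions made for the example. What it shows is that deleting a file only flips flags, so the record and the data are still present until something reuses them.</p>

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """One slot in a toy file table (loosely modeled on an MFT record)."""
    name: str = ""
    blocks: list = field(default_factory=list)
    in_use: bool = False


class ToyVolume:
    """Illustrative model only; real NTFS structures are far more involved."""

    def __init__(self, num_blocks):
        self.records = []                      # the toy "MFT"
        self.block_data = [b""] * num_blocks   # contents of each data block
        self.block_free = [True] * num_blocks  # allocation map

    def write_file(self, name, chunks):
        # Reuse an open file record if one exists, otherwise grow the table.
        record = next((r for r in self.records if not r.in_use), None)
        if record is None:
            record = FileRecord()
            self.records.append(record)
        # Claim free blocks, lowest index first, and write the data into them.
        free = [i for i, is_free in enumerate(self.block_free) if is_free]
        if len(chunks) > len(free):
            raise OSError("toy volume is full")
        chosen = free[:len(chunks)]
        for index, chunk in zip(chosen, chunks):
            self.block_data[index] = chunk
            self.block_free[index] = False
        record.name, record.blocks, record.in_use = name, chosen, True
        return record

    def delete_file(self, name):
        # Deletion only flips flags; the name, block links and data stay put.
        record = next(r for r in self.records if r.in_use and r.name == name)
        record.in_use = False
        for index in record.blocks:
            self.block_free[index] = True
        return record


volume = ToyVolume(num_blocks=8)
volume.write_file("vacation.jpg", [b"AAAA", b"BBBB", b"CCCC", b"DDDD"])
deleted = volume.delete_file("vacation.jpg")

# Nothing has been overwritten yet, so following the deleted record's
# block links still yields the complete file.
recovered = b"".join(volume.block_data[i] for i in deleted.blocks)
print(recovered)  # b'AAAABBBBCCCCDDDD'
```

<p>In this sketch, a later <code>write_file</code> call would reuse both the deleted record and the freed blocks, which is the situation that makes recovery harder.</p>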
<p>To recover deleted data, your data recovery company or software needs to be able to find deleted file records that have not been overwritten and the data blocks that relate to those files.  The DR company or software should also scan the unallocated space on the disk for data blocks that were in use, but whose file records have been overwritten.</p>
<p>An example of such a process is as follows:</p>
<ol>
<li>Limit access to the disk (write blocker)</li>
<li>Scan volume metadata for file records marked as deleted</li>
<li>Recover deleted file records and their related data blocks into new files</li>
<li>Scan volume for raw data that is currently in unallocated or free areas of the drive</li>
<li>Recover raw data blocks into new files</li>
</ol>
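<p>Steps 4 and 5 of the process above are often handled by looking for known file signatures in raw data. The sketch below is a simplified illustration of that idea for JPEG files, which begin with the bytes FF D8 FF and end with FF D9. The <code>carve_jpegs</code> function and the sample data are hypothetical, and the approach assumes the file's data is stored contiguously; real recovery tools handle many more cases, such as embedded thumbnails and fragmented files.</p>

```python
JPEG_START = b"\xff\xd8\xff"   # start-of-image marker plus first marker byte
JPEG_END = b"\xff\xd9"         # end-of-image marker


def carve_jpegs(raw):
    """Return byte strings that look like complete JPEGs inside raw data.

    Simplification: assumes each file is stored contiguously and that the
    first end marker after a start marker closes that file.
    """
    found = []
    position = 0
    while True:
        start = raw.find(JPEG_START, position)
        if start == -1:
            break
        end = raw.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break
        found.append(raw[start:end + len(JPEG_END)])
        position = end + len(JPEG_END)
    return found


# Hypothetical "unallocated space": filler bytes around one tiny fake JPEG.
fake_jpeg = JPEG_START + b"\xe0" + b"picture-bytes" + JPEG_END
unallocated = b"\x00" * 16 + fake_jpeg + b"\x00" * 16

carved = carve_jpegs(unallocated)
print(len(carved), carved[0] == fake_jpeg)  # 1 True
```

<p>A scan like this depends on the file type having a recognizable signature and on the data blocks sitting next to each other on the disk.</p>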
<p>What are some of the reasons deleted data cannot be recovered?</p>
<ol>
<li>File record is overwritten and:
<ol>
<li>No signature for the file data</li>
<li>Data is fragmented</li>
<li>Data is completely overwritten</li>
<li>Data is partially overwritten</li>
</ol>
</li>
</ol>
<p>The figure below illustrates a file that has been deleted, whose file record has been overwritten by a new file, and whose data is fragmented on the drive.</p>
<p style="text-align: center;"><a href="http://www.dataversity.net/wp-content/uploads/2013/05/DL-Pic3.png"><img class="alignnone size-full wp-image-19981" alt="DL Pic3" src="http://www.dataversity.net/wp-content/uploads/2013/05/DL-Pic3.png" width="609" height="370" /></a></p>
<p>Our example file (vacation.jpg) has been deleted and the file record overwritten with a new file (birthday.jpg).  The only recovery possible for the vacation.jpg file is to find and assemble the raw data blocks (assuming there isn’t another copy of the file record somewhere else on the volume).  The success rate for this type of recovery is very high, as the data blocks (Blocks 1-4) in our example have not been overwritten by new data.</p>
<p>If the new file (birthday.jpg) had overwritten some of the data blocks, as in the example below, then the file would only be partially recoverable (blocks 2 and 3 overwritten).</p>
<p style="text-align: center;"><a href="http://www.dataversity.net/wp-content/uploads/2013/05/DL-Pic4.png"><img class="alignnone size-full wp-image-19982" alt="DL Pic4" src="http://www.dataversity.net/wp-content/uploads/2013/05/DL-Pic4.png" width="620" height="384" /></a></p>
<p>If all of the data blocks had been overwritten, as in the example below, then the file would not be recoverable (blocks 1-4 overwritten).</p>
<p style="text-align: center;"><a href="http://www.dataversity.net/wp-content/uploads/2013/05/DL-Pic5.png"><img class="alignnone size-full wp-image-19983" alt="DL Pic5" src="http://www.dataversity.net/wp-content/uploads/2013/05/DL-Pic5.png" width="609" height="375" /></a></p>
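<p>The three outcomes described above can be summarized in a few lines of code. This is a minimal sketch using the block numbers from the vacation.jpg example; the <code>classify_recovery</code> function is hypothetical and ignores practical factors such as fragmentation and file format.</p>

```python
def classify_recovery(original_blocks, overwritten_blocks):
    """Classify a deleted file by how many of its blocks were reused."""
    original = set(original_blocks)
    lost = original.intersection(overwritten_blocks)
    if not lost:
        return "fully recoverable"
    if len(lost) == len(original):
        return "not recoverable"
    return "partially recoverable"


vacation_blocks = [1, 2, 3, 4]

print(classify_recovery(vacation_blocks, []))            # fully recoverable
print(classify_recovery(vacation_blocks, [2, 3]))        # partially recoverable
print(classify_recovery(vacation_blocks, [1, 2, 3, 4]))  # not recoverable
```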
<p>So what can you do to make sure this doesn’t happen to you?</p>
<ol>
<li>Back up or replicate your data.  I know it sounds cliché, but a simple backup or replication can save a ton of heartache.</li>
<li>If you accidentally delete a file, and don’t have a backup, stop using the system as soon as possible.  Browsing the Internet or continuing to work writes additional data to the disk and can cause the data blocks to be overwritten.</li>
<li>Restore a backup to a different drive to make sure that the backup contains the data you need and that the files are in working order.</li>
<li>If you want to attempt the recovery yourself, make a copy of the drive if possible and work on the copy.  If it is not possible to make a copy, make sure the drive is attached to the system as a secondary (non-boot) disk.  Do not install recovery software on the drive you want to recover from.</li>
<li>Seek professional assistance.  Good data recovery companies offer a free consultation, so you can discuss your specific needs with a data loss expert.</li>
</ol>
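<p>The advice above about restoring a backup to a different drive and checking that it contains what you need can be partly automated. The sketch below is one possible approach, not a complete backup test: it compares SHA-256 checksums of every file in an original folder against a restored copy. The <code>verify_restore</code> function and the folder and file names are assumptions for the example, and matching checksums only show the copies are identical, not that the files open correctly.</p>

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir, restored_dir):
    """Return relative paths that are missing or differ in the restored copy."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file() or sha256_of(source) != sha256_of(restored):
            problems.append(str(relative))
    return problems


# Demonstration with throwaway folders standing in for two drives.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original"
    restored = Path(tmp) / "restored"
    original.mkdir()
    restored.mkdir()
    (original / "vacation.jpg").write_bytes(b"holiday photo")
    (restored / "vacation.jpg").write_bytes(b"holiday photo")
    (original / "birthday.jpg").write_bytes(b"cake photo")
    # birthday.jpg is deliberately absent from the restored copy.
    problems = verify_restore(original, restored)

print(problems)  # ['birthday.jpg']
```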
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/the-low-down-on-recovering-deleted-files/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
