Why Rich Discovery and Analysis Can Make or Break Your Unstructured Data Migration Plan

By Brian Murphy

Unstructured data is growing faster than ever, and analyst firms generally agree that it is growing significantly faster than structured data. According to IDC, 80 percent of worldwide data will be unstructured by 2025. As the volume of unstructured data grows, so does its value in enabling businesses to make faster, smarter, and more strategic data-driven decisions. As a result, file system capacities today typically exceed hundreds of terabytes (TBs) and, in many cases, reach multiple petabytes (PBs). This is data organizations want to keep, protect, and leverage.

This massive growth in unstructured data across file system environments has, however, made hardware refreshes an extremely arduous process. To make matters worse, the usage and access patterns of the data sets in these environments are often not fully understood. For instance, user home directories in many enterprise environments store employee data that is not critical to the business. Nonetheless, some employees’ personal data has made it through multiple tech refreshes intact.

Moreover, some regulations require that data sets be stored and protected for protracted periods, if not indefinitely. Finally, it is not unusual for customers to find themselves locked into their storage vendor because they fear cross-platform migrations.

“A goal without a plan is just a wish.” – Antoine de Saint-Exupéry

Planning and preparation are arguably the most critical steps in an unstructured data migration. To build the best plan, however, you need the greatest possible visibility into your data environment. The best way to get it (I would argue the only way) is with a software tool that provides deep discovery and analysis capabilities. When seeking such a tool, look for one that can hook directly into the management APIs of the industry’s top file system technologies in order to fully discover and analyze all of the relevant paths. In addition, the tool should be able to discover, analyze, and report on all of the following areas:

SMB Shares
NFS Exports and Aliases
Quotas (advisory, soft, and hard)
Used Capacity
Replication

With such a planning report in hand, the user can determine the ideal strategy for the migration and then execute it.

Let’s take a deeper look into these critical visibility areas.

SMB/NFS Shares and Exports

With Shares and Exports, your software tool’s discovery process should have the ability to generate a report showing the following data points:

File Server
Path
Shared Status
List of Shares, Exports, and Aliases
Number of Shares and Exports
Number of Child Shares and Exports
Number of Parent Shares and Exports

This information, in turn, can be used to identify which paths are relevant in your environment and to further develop your migration strategy. For example, administrators can identify paths that are shared or exported multiple times. Knowing which paths are reachable under several names helps determine where symlinks can provide the same user experience after a cross-platform migration.
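To make the share and export data points concrete, here is a minimal Python sketch of how such a report could be derived from raw discovery output. The record layout, the sample servers and paths, and the function names are all assumptions made for illustration; they do not describe any particular product's API or report format.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical discovery output: (file_server, path, share_or_export_name).
# The layout and the sample values are assumptions for illustration only.
RECORDS = [
    ("filer01", "/vol/home", "home$"),
    ("filer01", "/vol/home", "users"),
    ("filer01", "/vol/home/alice", "alice"),
    ("filer01", "/vol/projects", "projects"),
]


def summarize_shares(records):
    """Group share/export names by (server, path) and count parent/child shares."""
    names = defaultdict(list)
    for server, path, name in records:
        names[(server, PurePosixPath(path))].append(name)

    report = []
    for server, path in sorted(names):
        # Other shared paths on the same file server.
        others = [p for (s, p) in names if s == server and p != path]
        report.append({
            "file_server": server,
            "path": str(path),
            "shares": sorted(names[(server, path)]),
            "share_count": len(names[(server, path)]),
            # Shares defined on directories above this path.
            "parent_shares": sum(
                len(names[(server, p)]) for p in others if p in path.parents
            ),
            # Shares defined on directories below this path.
            "child_shares": sum(
                len(names[(server, p)]) for p in others if path in p.parents
            ),
        })
    return report


def shared_more_than_once(report):
    """Paths exposed under several names: candidates for symlinks after migration."""
    return [row["path"] for row in report if row["share_count"] > 1]
```

Calling `shared_more_than_once(summarize_shares(RECORDS))` on the sample data flags `/vol/home`, the one path exposed under two names.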

Quotas

Quota reports provide users with access to information that can be very helpful in determining which values might need to be migrated to the new storage. In Quota reports, you should find:

File Server Information
Path
Quota Origin
Capacity Quota
Capacity Origin
Used Capacity
Approximate Item Count
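
As a rough illustration of how a quota report might feed migration decisions, the Python sketch below separates quotas set directly on a path from inherited ones and flags paths that are close to their limit. The field names, the "explicit" and "inherited" origin values, and the 90 percent threshold are assumptions for this example, not values taken from any specific tool.

```python
from dataclasses import dataclass


@dataclass
class QuotaRow:
    file_server: str
    path: str
    quota_origin: str          # assumed values: "explicit" or "inherited"
    capacity_quota_gb: float
    used_capacity_gb: float


def quotas_to_recreate(rows):
    """Quotas set directly on a path; inherited ones follow from their parent."""
    return [r for r in rows if r.quota_origin == "explicit"]


def near_limit(rows, threshold=0.9):
    """Paths whose used capacity is at or above `threshold` of their quota."""
    return [
        r.path
        for r in rows
        if r.capacity_quota_gb > 0
        and r.used_capacity_gb / r.capacity_quota_gb >= threshold
    ]


# Hypothetical sample rows.
rows = [
    QuotaRow("filer01", "/vol/home", "explicit", 1000.0, 400.0),
    QuotaRow("filer01", "/vol/home/alice", "inherited", 1000.0, 120.0),
    QuotaRow("filer01", "/vol/projects", "explicit", 500.0, 480.0),
]
```

With these rows, only `/vol/home` and `/vol/projects` would need their quotas recreated on the new storage, and `/vol/projects` is worth reviewing before the move because it is nearly full.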

Capacity

Capacity reports can be used to identify capacity in the environment at various levels. Users can derive capacity by looking at the child level, for example. Capacity reports should provide users with the following data points:

File Server
Path
Capacity Origin
Used Capacity
Approximate Item Count
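
One way to read "deriving capacity from the child level" is to sum child-directory figures up to a chosen depth. The Python sketch below does this under the assumption that each row reports the used capacity of one non-overlapping directory; the row format and the `rollup` helper are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def rollup(rows, depth):
    """Sum used capacity per ancestor directory at the given depth.

    `rows` is an iterable of (path, used_bytes) pairs for non-overlapping
    directories. Depth 1 means the first component below the root.
    """
    totals = defaultdict(int)
    for path, used_bytes in rows:
        parts = PurePosixPath(path).parts      # e.g. ('/', 'vol', 'home', 'alice')
        key = str(PurePosixPath(*parts[: depth + 1]))
        totals[key] += used_bytes
    return dict(totals)


# Hypothetical child-level capacity rows (bytes).
CHILD_ROWS = [
    ("/vol/home/alice", 100),
    ("/vol/home/bob", 50),
    ("/vol/projects/x", 30),
]
```

Rolling the same rows up at different depths gives capacity per volume or per top-level directory, which is useful when deciding how to split data across targets.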

Replication

Replication reports can be used to identify if replication is configured on the path, if data is replicated as part of a parent directory replication, if child directories are replicated, if a filtered file set is replicated, or if there is no replication at all. The following data points should be captured in a replication report:

File Server
Path
Replicate
Replication Paths
Replication Targets
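
The replication cases described above can be sketched as a simple path classification. This Python example assumes the discovery step yields a flat list of replicated source paths; the category labels and the `classify_replication` function are hypothetical, and the filtered-file-set case is left out because it depends on vendor-specific filter rules.

```python
from pathlib import PurePosixPath


def classify_replication(path, replicated_paths):
    """Classify how `path` relates to the set of replicated source paths."""
    target = PurePosixPath(path)
    replicated = [PurePosixPath(p) for p in replicated_paths]

    if target in replicated:
        return "direct"          # replication is configured on the path itself
    if any(p in target.parents for p in replicated):
        return "via parent"      # covered by a parent directory's replication
    if any(target in p.parents for p in replicated):
        return "children only"   # only some child directories are replicated
    return "none"


# Hypothetical replicated source paths.
REPLICATED = ["/vol/home", "/vol/projects/x"]
```

For example, `/vol/home/alice` is covered through its parent, while `/vol/projects` itself is not replicated even though one of its children is, a distinction that matters when recreating replication on the target platform.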

Measure Twice, Cut Once

The data discovery and analysis now available in select software solutions enables the creation and execution of intelligent data migration strategies. For those who wish to take the next step, the findings can also help determine how to carve up storage in the new environment.

The bottom line: I am going to go out on a limb and say that no administrator can risk a failed data migration. As the saying often attributed to Sir Winston Churchill goes, “He who fails to plan is planning to fail.” With the right planning, enabled by the right tools, your next unstructured data migration will be a lot less stressful and, even more importantly, a complete success!
