On March 29th, 2012, the Obama Administration announced what it called the “Big Data Research and Development Initiative,” a $200 million project to “bolster the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data.” Under the initiative, several federal programs and agencies will receive federal investment to better comprehend the government’s Big Data problem.
Dr. John P. Holdren, Assistant to the President and Director of the White House Office of Science and Technology Policy, explained the rationale behind this investment in a press release (.pdf):
“In the same way that past Federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security.”
On the same day, the Office of Science and Technology Policy (OSTP) released a Big Data Fact Sheet (.pdf) through the Executive Office of the President that highlighted the federal agencies and programs that will be a part of the initiative. The OSTP described this document as “highlights […] that address the challenges of, and tap the opportunities afforded by, the big data revolution to advance agency missions and further scientific discovery and innovation.”
According to the document, the agencies include the Department of Defense (DoD), Department of Homeland Security (DHS), Department of Energy (DoE), Department of Veterans Affairs (VA), Health and Human Services (HHS), the Food and Drug Administration (FDA), the National Archives and Records Administration (NARA), National Aeronautics and Space Administration (NASA), National Institutes of Health (NIH), the National Science Foundation (NSF), the National Security Agency (NSA), and the United States Geological Survey (USGS).
For the Department of Defense, the initiative will improve a variety of programs and projects. One of these is the Anomaly Detection at Multiple Scales (ADAMS) program that “addresses the problem of anomaly-detection and characterization in massive data sets.” Upgrades to ADAMS will help the federal government recognize significant changes and anomalies in an array of digital projects and in data that is shared and transported over secure networks. Another project, the Cyber-Insider Threat (CINDER) program, will seek “to develop novel approaches to detect activities consistent with cyber espionage in military computer networks.” Further DoD projects under the initiative cover enhanced surveillance, artificial intelligence, security challenges in cloud computing, military imagery analysis, and hard intelligence analysis.
For the Department of Energy, the funds will help bolster its Office of Advanced Scientific Computing Research to more efficiently compile data in coordination with the High Performance Storage System (HPSS), software developed by the DoE and IBM that “manages petabytes of data on disks and robotic tape systems.” Other areas of the initiative will seek to improve methods of moving large amounts of data, collect data on climate change and atmospheric radiation, and cross-check data between internationally shared data sets in nuclear physics, nuclear energy, and fusion energy.
The Department of Veterans Affairs will update a system it has used called ProWatch (Protecting Warfighters using Algorithms for Text Processing to Capture Health Events), a “surveillance program that relies on newly developed informatics resources to detect, track, and measure health conditions associated with military deployment.” Other projects will allow the VA to more efficiently capture hand-written texts associated with medical files and paperwork and store them in a digital framework.
Within Health and Human Services, the Centers for Disease Control and Prevention (CDC) will update its BioSense 2.0 system, “the first system to take into account the feasibility of regional and national coordination for public health situation awareness through an interoperable network of systems.” Other big data improvements will target the CDC’s Special Bacteriology Reference Laboratory and the data warehouse and platforms of the Centers for Medicare & Medicaid Services (CMS).
With the funding, the Obama Administration will also seek to create a Virtual Laboratory Environment (VLE) for the FDA. According to the Big Data Fact Sheet, the VLE will “enable a virtual laboratory data network, advanced analytical and statistical tools and capabilities, crowd sourcing of analytics to predict and promote public health, document management support, tele-presence capability to enable worldwide collaboration, and basically make any location a virtual laboratory with advanced capabilities in a matter of hours.”
The National Archives and Records Administration seeks to create the Cyber Infrastructure for a Billion Electronic Records (CI-BER), a “testbed notable for its application of a multi-agency sponsored cyber infrastructure and the National Archives’ diverse 87+ million file collection of digital records and information [that will] evaluate technologies and approaches to support sustainable access to ultra-large data collections.”
Despite massive budget cuts earlier in the year, NASA will mature Big Data capabilities to “reduce the risk, cost, size and development time of Earth Science Division space-based and ground-based information systems and increase the accessibility and utility of science data.” NASA will also make further improvements to its Global Earth Observation System of Systems (GEOSS), a “collaborative, international effort to share and integrate Earth observation data.” NASA will continue to enhance projects that are “testing the utility of hybrid computers systems using a highly-integrated, non-SQL database as a means for data delivery to accelerate the execution of modeling and analysis software.”
The National Institutes of Health aims to bolster The Cancer Imaging Archive (TCIA), The Cancer Genome Atlas (TCGA), Big Data needs for the National Heart, Lung, and Blood Institute (NHLBI), and several other NIH-supported projects concerning biometrics, biomedical data banks, medicine libraries, and advanced neuroscience research that will be shared internationally with the German Federal Ministry of Education and Research.
The National Science Foundation has taken on a broader mandate, seeking to improve the new Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA), “a joint solicitation between NSF and NIH that aims to advance the core scientific and technological means of managing, analyzing, visualizing and extracting useful information from large, diverse, distributed and heterogeneous data sets.” It also seeks to mitigate Big Data problems in several projects, such as Data and Software Preservation for Open Science (DASPOS), the Digging Into Data Challenge, EarthCube, Ideas Lab, Information Integration and Informatics, the Laser Interferometer Gravitational Wave Observatory (LIGO), and the Theoretical and Computational Astrophysics Networks (TCAN).
The NSA will also address compilation and analysis of Big Data sets for national security needs such as domestic intelligence and coordination with other agencies. There is also a plan to “explore the feasibility of conducting an online contest for developing data visualizations in the defense of massive computer networks.”
Finally, the United States Geological Survey will create eight new research projects for transforming Big Data sets and big ideas about earth science theories into scientific discoveries. This will be facilitated at the USGS John Wesley Powell Center for Analysis and Synthesis.
For many commentators in the Big Data community, this investment is a positive sign that the public sector is recognizing the benefits of researching methods and strategies for managing its data glut. The Obama Administration is one of the first executive offices to treat the problem as a serious concern.
As Gigaom.com has summarized, the Big Data Research and Development Initiative is just a part of a larger picture:
“The White House has also teamed with Amazon Web Services to make the 1,000 Genomes Project data freely available to genetic researchers. The data set weighs in at a whopping 200TB, and is a valuable source of data for researching gene-level causes and cures of certain diseases. Hosting it in the cloud is critical because without access to a super-high-speed network, [one] wouldn’t want to move 200TB of data across today’s broadband networks. While the data itself is free, though, researchers will have to pay for computing resources needed to analyze it.”
As private sector organizations around the world have already begun to leverage Big Data analytics to gain more insight from their data, the Obama Administration’s initiative wins on several fronts. First, it enhances the structure and functionality of government agencies and programs that have a “Big Data problem.” Second, it sends a signal to all public agencies that Big Data is an issue on the agenda of government and the public sector. Finally, the initiative sustains the perception of the President as the first tech-savvy Commander in Chief, who has repeatedly stated that American supremacy is dependent on the pursuit of technological innovation.