Case Study: PrecisionProfile Advances Healthcare Analytics with Improved Data Preparation


There’s one phrase that people never want to hear from their doctor: “I’m sorry, but you have cancer.”

According to the National Cancer Institute, an estimated 1,735,350 new cases of cancer will be diagnosed in the United States this year and 609,640 people will die from the disease. Fortunately, and despite these statistics, many types of cancer are very treatable and have high survival rates. But there’s an opportunity to further improve outcomes for patients with all types of cancer, and it is based on helping data scientists and doctors work smarter with big data.

That’s what PrecisionProfile is aiming to do. Cancer is driven by mutations in the human genome, and different forms of cancer come from one or several mutations in a person’s genomic makeup. While that principle is well understood, acting on it has only recently become practical because the cost of sequencing a cancer tumor has dropped to about $1,000 or less, according to Dave Parkhill, co-founder of PrecisionProfile. The increased popularity of tumor sequencing has created a new source of big data that may hold unprecedented intelligence for the healthcare community. And new big data platforms from tech giants like Google, Facebook, and Twitter have made it possible to store and process large volumes of data. But transforming raw data into ready-to-use information remains a challenge.

PrecisionProfile takes on the job of collecting genomic and affiliated data and applying big data management technologies to it, pulling in genomic data in a variety of formats from web-based systems and from internal and external clinic information systems. Once compiled, the data can be structured and transformed into something useful that researchers can analyze and from which oncologists can draw conclusions.

Using PrecisionProfile’s Oncology Workbench platform, systems and molecular biologists can perform genomic or other -omic research on the alterations driving cancer or any other genomically driven disease. Practicing oncologists can then distill information on biomarkers, molecular diagnostics, and relevant treatment protocols, along with relevant clinical trial guidance. Pharmaceutical researchers can use the system, too, either to look for drug targets or to manage genomic profiles and phenotypes for clinical trial purposes.

Aiming at Better Disease Treatment

In the case of cancer, once an oncologist confirms a specific diagnosis for a patient, such as non-small cell lung cancer (NSCLC), PrecisionProfile’s Oncology Workbench helps the doctor assess which treatment option would be most appropriate for that specific patient. This requires requesting a molecular profile (genomic sequence) of the patient’s tumor; ingesting the patient’s molecular diagnostic report; presenting that data alongside preloaded patient data sets identified as having similar genomic aberrations; and then combining that information with additional third-party data about known treatments and their outcomes for patients with genetically, phenotypically, and molecularly similar characteristics.
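As a rough illustration of what "similar genomic aberrations" could mean in practice, the sketch below ranks previously loaded patients by how much their sets of mutated genes overlap with a new patient's. This is not PrecisionProfile's method, which the article does not describe. Jaccard similarity is swapped in here as one simple, well-known overlap measure, and the patient IDs, function names, and cohort contents are all hypothetical.

```python
# Illustrative sketch only: ranks patients by overlap of mutated genes.
# Jaccard similarity is an assumed stand-in, not the product's algorithm.
# Patient IDs and cohort contents are hypothetical.

def jaccard(a, b):
    """Return |a ∩ b| / |a ∪ b| for two collections of gene names."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty profiles: treat as no measurable similarity
    return len(a & b) / len(union)


def most_similar(patient_genes, cohort, top_n=3):
    """Rank cohort patients by similarity to the new patient's mutated genes."""
    scored = [(pid, jaccard(patient_genes, genes)) for pid, genes in cohort.items()]
    # Highest score first; ties broken by patient ID for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Hypothetical preloaded cohort: patient ID -> set of mutated genes.
cohort = {
    "P001": {"EGFR", "TP53"},
    "P002": {"KRAS", "STK11"},
    "P003": {"EGFR", "TP53", "PIK3CA"},
}

new_patient = {"EGFR", "TP53"}

if __name__ == "__main__":
    for pid, score in most_similar(new_patient, cohort, top_n=2):
        print(f"{pid}: {score:.2f}")
```

A real system would weigh far more than gene-level overlap, such as the specific variant, phenotype, and treatment history. The point of the sketch is only that the comparison becomes straightforward once the data is in a consistent structure.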

The concept of precision medicine is not new, but previously it would take weeks of effort, multiple technologies and teams of people to gather and structure the data for the whole process. According to PrecisionProfile, using its technology reduces that to just a few hours.

An important piece of the Oncology Workbench software is its use of Paxata’s data preparation solution, the Paxata Adaptive Information Platform. The enterprise-grade information management platform supports self-service data preparation for large data sets at scale and at speed. It gives PrecisionProfile the ability to quickly and efficiently separate, for example, the multiple data elements that most molecular biology tools tend to cram together in a single field. More importantly, it allows PrecisionProfile to combine, clean, and shape big data so that it is useful for analytics. Incorporating Paxata’s algorithms into the Oncology Workbench represents a big time savings for molecular biology research analysts, who can spend 50 to 80 percent of their time wrangling data.

“They are spending that time taking various data sources, restructuring them, then putting them in a format to analyze them. With traditional tools, as little as 20 percent of their time is spent on analyzing the data to conduct research that will be applied to the tasks oncologists undertake,” Parkhill says. “There’s a real opportunity to help Data Scientists get over the drudgery of shaping data and to avoid writing JavaScripts to make sense of these enormous data sets.”
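To make the "multiple data elements crammed into a single field" problem concrete, here is a minimal sketch of the kind of hand-written script analysts would otherwise maintain. It assumes a semicolon-separated `key=value` layout, which is an illustrative guess rather than a format named in the article. The function names, field names, and sample rows are hypothetical, and none of this represents Paxata's implementation.

```python
# Illustrative sketch only: splits a packed annotation field into columns.
# The "key=value;key=value" layout, field names, and sample rows are
# assumptions for this example, not a format taken from the article.

def split_packed_field(packed, pair_sep=";", kv_sep="="):
    """Turn 'GENE=TP53;AF=0.42;SOMATIC' into a dict of separate elements."""
    result = {}
    for item in packed.split(pair_sep):
        item = item.strip()
        if not item:
            continue  # skip empty pieces such as a trailing separator
        if kv_sep in item:
            key, value = item.split(kv_sep, 1)
            result[key.strip()] = value.strip()
        else:
            result[item] = True  # bare token with no value: record as a flag
    return result


def expand_rows(rows, field="info"):
    """Replace the packed field in each row with one column per element."""
    expanded = []
    for row in rows:
        flat = {k: v for k, v in row.items() if k != field}
        flat.update(split_packed_field(row[field]))
        expanded.append(flat)
    return expanded


# Hypothetical input rows with everything packed into "info".
rows = [
    {"sample": "S1", "info": "GENE=TP53;AF=0.42;SOMATIC"},
    {"sample": "S2", "info": "GENE=FGFR3;AF=0.17"},
]

if __name__ == "__main__":
    for row in expand_rows(rows):
        print(row)
```

Each source tool tends to need its own variant of a script like this, and it has to be revisited whenever the field layout changes, which is the maintenance burden the quote above refers to.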

PrecisionProfile provides sophisticated interactive filtering capabilities and visual presentations that help oncologists more easily review, dice, and otherwise use the data results, too.

According to Parkhill, the solution isn’t trying to be as ambitious as IBM Watson’s Artificial Intelligence is in the healthcare space. When it comes to assessing and using Oncology Workbench’s collected and coherent data, “we leave treatment decisions in the hands of practicing oncologists,” he says.

“The goal is to give them all the information they need to make treatment decisions for any given patient. We want to impact the outcomes for cancer patients by helping oncologists save time, get more accurate information and come to quicker treatment plans.”

PrecisionProfile conducted an original pilot project with the University of Colorado, which involved merging a sizable sample of bladder cancer tumor data, sequenced so that mutations could be identified, with data from the Cancer Genome Atlas. When originally combined, the dataset totaled about 18 terabytes but, after applying Paxata, PrecisionProfile reduced it to a couple of hundred gigabytes, which made it much easier for researchers to work with. “The molecular biology community says that genomic data is going to be the big data problem as it can easily run into the petabyte scale. Needless to say, that swamps all other big data applications,” says Parkhill.

Where PrecisionProfile is Headed

For any given file type, Parkhill says, the company saves about two to three months in preparing it for use.

“This data preparation problem exists across the genomic field as there are very few standards and technologies are evolving at a rapid pace,” he says. “The whole process of analyzing someone’s genomic profile is software-driven with significant probability analysis buried deep in the process.”

From year to year there may be additions to the fields that are included in files or changes in the meaning of the field category, and formats change as that happens. “They let us absorb those changes pretty easily,” he says.

Moving forward, PrecisionProfile and Paxata are thinking ahead to self-service analytics capabilities for those doing cancer or genomic research.

“We will get there 100 percent,” Parkhill says. “But right now, if we can cut out 50 to 60 percent of the data preparation process, that’s a huge win so that people can go back to the lab and do discovery and oncologists can see more patients and get treatment plans to them more quickly.”
