Diagnosis Fraud: How Data Science Can Diagnose Healthcare Fraud and Identify Dirty Doctors

By Gaurav Deshpande and Todd Blaschka

Six doctors and seven pharmacists were among those charged in a Texas healthcare fraud and opioid takedown. As part of the investigation, officials uncovered Medicare fraud schemes and networks of “pill mill” clinics. The result: $66 million in losses and the distribution of 6.2 million pills, according to the Department of Justice.

Eleven physicians were charged with illegally dispensing opioids through pill mills in a targeted opioid crackdown in Appalachia. The crackdown exposed the overprescription of controlled substances that totaled more than 17 million pills.

A California doctor was charged with murder in the deaths of four patients after he allegedly overprescribed opioids and narcotics.

A federal judge in Virginia sentenced a former doctor to 40 years in prison for opioid-related crimes, including 859 counts of writing illegal prescriptions.

Behind the Headlines: Healthcare Fraud, Abuse

The stories about dirty doctors and massive opioid pill trafficking networks are all too familiar today. Estimates of what healthcare-related fraud costs the U.S. range from about $68 billion annually to as much as 10 percent (and rising) of total healthcare spending. Some perspective: Americans spent $3.65 trillion on healthcare in 2018, and this number is expected to climb to $6 trillion by 2027. Fraud is a major contributing factor to escalating healthcare costs.

Is there a way to prevent or preempt healthcare-related fraud, or at least make a dent in these numbers? 

Enter the combination of data science, machine learning (ML), and graph databases. These technologies can fuel risk assessment for healthcare providers while illustrating the relationships among patients, providers, and doctors. Discovering fraudulent activities requires finding patterns in the data and integrating data from silos across the organization — this includes patient, hospital, and physician data. In other words, advanced analytics enable us to discover any hidden connections that exist within the datasets. 

The Devil is in the Details and the Data

As it stands, most tools for storing and analyzing healthcare data are built on relational databases. These databases store the data for each entity (member, prescriber, claim, or facility) in separate tables or databases. However, to understand the relationships among members, prescribers, facilities, and the claims connecting them to each other, all of these tables or databases must be joined together. Relational databases fall short in identifying relationships among disparate datasets (similar treatment, similar providers) that may help detect fraud. Graph databases, however, are up to the challenge; these databases model the data the way we naturally think, connecting information to understand the context.

Good Doctor vs. Bad Doctor

If we look at graph databases in action, we can see how questionable relationships and fraudulent patterns are flagged. We can generate graph-based ML features for a “good doctor” (low risk) and a “bad doctor” (high risk) and use these features to train the ML model to look for these profiles within huge healthcare datasets. 

For example, we can drill down on a feature called “stable group for routine ICD codes.” The stable group includes ICD (International Classification of Diseases) code groups that are billed frequently in claims for the provider over a period of time. The low-risk provider has a stable group with one or more ICD code groups. If a particular provider’s specialty is “emergency medicine,” they are likely to see members with various healthcare conditions. This results in multiple ICD codes. But this is justified, as their specialty isn’t associated with a specific ICD code group. However, if a doctor is routinely generating claims for multiple ICD codes that are unrelated to their specialty (such as a podiatrist generating claims for “respiratory system” and “nervous system” code groups), they are a high-risk provider or bad doctor. 
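
The "stable group" check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `EXPECTED_GROUPS` table, the code groups listed for podiatry, and the `min_share` cutoff of 20 percent are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical mapping from specialty to the ICD code groups expected for it.
# None means the specialty is not tied to specific code groups
# (the emergency medicine case described in the article).
EXPECTED_GROUPS = {
    "podiatry": {"musculoskeletal system", "skin"},
    "emergency medicine": None,
}

def stable_group(claim_groups, min_share=0.2):
    """Return the ICD code groups that make up at least min_share of the claims."""
    counts = Counter(claim_groups)
    total = sum(counts.values())
    return {group for group, n in counts.items() if n / total >= min_share}

def off_specialty_groups(specialty, claim_groups, min_share=0.2):
    """Return frequently billed code groups that fall outside the specialty."""
    expected = EXPECTED_GROUPS.get(specialty)
    if expected is None:
        # Unconstrained (or unknown) specialty: nothing is flagged.
        return set()
    return stable_group(claim_groups, min_share) - expected

# Toy claims for a podiatrist: one ICD code group per claim.
podiatrist_claims = (
    ["musculoskeletal system"] * 5
    + ["respiratory system"] * 3
    + ["nervous system"] * 2
)

print(sorted(off_specialty_groups("podiatry", podiatrist_claims)))
print(off_specialty_groups("emergency medicine", podiatrist_claims))
```

A non-empty result would feed the ML model as a risk signal rather than act as a verdict on its own; the cutoff and the specialty table would need to come from real claims data.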

Another key graph-computed feature called “cost of care” compares the cost of patient-prescribed medications, tests, and procedures with the average cost for treating a medical condition among similar members within a referral network. A low-risk prescriber (good doctor) has an average or below-average cost, while a high-risk prescriber (bad doctor) will have a higher-than-average cost for the prescribed medications, tests, and procedures. 
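
As a rough sketch of how such a feature might be scored, the snippet below compares a prescriber's mean per-member cost with the mean for similar members in the same referral network. The function names, the sample dollar amounts, and the threshold of 1.0 are assumptions for illustration; a production system would likely allow some margin above the average before flagging anyone.

```python
from statistics import mean

def cost_of_care_ratio(provider_costs, peer_costs):
    """Provider's mean per-member cost divided by the peer-group mean."""
    return mean(provider_costs) / mean(peer_costs)

def risk_label(ratio, threshold=1.0):
    """Average or below-average cost is low risk; above the threshold is high risk."""
    return "high risk" if ratio > threshold else "low risk"

# Hypothetical per-member costs for one condition, in dollars.
provider_costs = [1200, 1500, 1800]   # members treated by the prescriber
peer_costs = [900, 1000, 1100]        # similar members in the referral network

ratio = cost_of_care_ratio(provider_costs, peer_costs)
print(ratio, risk_label(ratio))
```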

This treatment plan “cross-comparison” is difficult to perform with traditional relational database solutions, as it requires extensive mapping. The member journey starts with a prescriber (such as a general practitioner), then goes to other providers (such as a substance abuse treatment center for opioid addiction), and then produces a set of follow-on claims for medication, tests, and behavioral therapy. This detailed mapping requires computationally expensive joins across massive tables in a relational database that connects patient, claim, doctor, pharmacy, and treatment center data.

Native graph databases can simply traverse from the patient to claims, doctors, pharmacies, and treatment centers to map out the journey. After plotting out the member journey (with necessary treatment center visits and medication claims), the graph-based solution calculates the cost of care for each member. It then finds similar patients and calculates the average cost of care for opioid addiction treatment for each patient population. For example, Dr. UptoNoGood is revealed as a bad doctor because his cost of care for opioid addiction is 180 percent more than the area average.
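
The traversal can be pictured with a toy in-memory graph. The sketch below uses plain Python dictionaries rather than any particular graph database, and every vertex name, edge type, and dollar amount is made up for the example. It reads "180 percent more than the average" as a cost 2.8 times the average, which is an interpretation of the figure above.

```python
# Toy graph as adjacency lists: each edge is (edge_type, target_vertex).
edges = {
    "member:ann": [("filed", "claim:1"), ("filed", "claim:2")],
    "member:bob": [("filed", "claim:3")],
    "claim:1": [("billed_by", "doctor:uptonogood")],
    "claim:2": [("billed_by", "center:new_day")],
    "claim:3": [("billed_by", "doctor:other")],
}

# Hypothetical claim amounts in dollars.
claim_amount = {"claim:1": 1800.0, "claim:2": 1000.0, "claim:3": 1000.0}

def member_journey(member):
    """Walk member -> claim -> provider and return (claim, provider) pairs."""
    journey = []
    for edge_type, claim in edges.get(member, []):
        if edge_type != "filed":
            continue
        for claim_edge_type, provider in edges.get(claim, []):
            if claim_edge_type == "billed_by":
                journey.append((claim, provider))
    return journey

def cost_of_care(member):
    """Sum the amounts of the distinct claims on the member's journey."""
    claims = {claim for claim, _ in member_journey(member)}
    return sum(claim_amount[claim] for claim in claims)

def percent_above(cost, average):
    """How far cost sits above average, as a percentage of the average."""
    return 100.0 * (cost - average) / average

print(member_journey("member:ann"))
print(percent_above(cost_of_care("member:ann"), cost_of_care("member:bob")))
```

In a native graph database the same walk would be expressed as a query over stored edges instead of nested loops, but the shape of the computation is the same: follow edges outward from the member, then aggregate.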

The Proof Is in the Patterns

Graph databases also can identify potential collusion among providers and facilities with a graph-computed feature called “potential undeclared prescriber-facility relationships.”

Here, we dig into undisclosed connections among providers and pharmacies as well as substance abuse facilities. The low-risk good doctor does not have such connections, while a high-risk bad doctor does. Deep link analysis, or the ability to look across multiple data connections, can help find a doctor who is referring a large number of patients to a specific opioid addiction treatment center. This deep analysis could uncover that one of the previous addresses for the administrator of the substance abuse treatment center is the same as the address for the doctor referring patients to that treatment center. Finding this hidden relationship between the doctor and the administrator requires a deep link query with eight hops (or connections) across the claims data for the same patients' doctor visits and treatment center visits. Only then can we see that Dr. UptoNoGood shares an address with the administrator of “New Day Opioid Treatment Facility.”
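
A hop-limited search of this kind can be illustrated with a breadth-first search over a small undirected graph. This is a simplified sketch, not a graph database query: the vertex labels (including the address) are hypothetical, and the toy graph needs far fewer than eight hops.

```python
from collections import deque

def add_edge(graph, a, b):
    """Add an undirected edge between vertices a and b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def shortest_path(graph, start, goal, max_hops=8):
    """Breadth-first search; returns the shortest path, or None if none exists within max_hops."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

graph = {}
# Declared relationship: the same patient has claims with both providers.
add_edge(graph, "doctor:uptonogood", "claim:1")
add_edge(graph, "claim:1", "patient:ann")
add_edge(graph, "patient:ann", "claim:2")
add_edge(graph, "claim:2", "center:new_day")
# Undeclared relationship: a shared (hypothetical) address.
add_edge(graph, "doctor:uptonogood", "address:12_elm_st")
add_edge(graph, "address:12_elm_st", "person:admin")
add_edge(graph, "person:admin", "center:new_day")

print(shortest_path(graph, "doctor:uptonogood", "center:new_day"))
```

Here the path through the shared address is shorter than the path through the claims, so it surfaces first. Lowering `max_hops` shows why shallow queries miss such links: with a limit of two hops, no path between the doctor and the treatment center is found at all.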

AI, ML, and Graph for Good

We often read about the transformative power of AI and ML when it comes to the clinical side of healthcare (and even detecting hard-to-diagnose heart defects on MRI scans). These same technologies, along with graph databases, can be applied to the operational side to “diagnose” fraud, abuse, and collusion. These potentially deadly data patterns – much like an actual medical condition – can then be addressed or even prevented before they cause additional harm.

Think of it as AI, ML, and graph for good: a proactive and preemptive approach to identifying patterns and eradicating illegal and irresponsible activities. With it, we can hope to see fewer headlines about doctor arrests, opioid pill mills, and surging healthcare costs. At the same time, we can ensure patients get the care they need, when, where, and how they need it.
