Risk Analysis: Is It Based On The Big Data Picture?

By Jennifer Zaino / November 4, 2014

Businesses size people up through certain data points, which formally, or in some cases informally, factor into determining whether they are a worthy credit risk, a safe driver, an appropriate job prospect, and so on. But companies, and therefore their risk analysis of any one individual, may be missing the bigger picture when they both limit the data they take into consideration and understand that data only at a surface level.

That’s bad for the individual who is kept from getting a good deal on a loan, qualifying for a low-cost auto insurance policy, or being promoted into his dream position. But it’s bad for the company, too, because it may have eliminated a completely worthy prospect, one who would have proved an asset to its own business.

Things can get better, both for people and for the companies that want to do business with them, when those enterprises go beyond the data and analysis they usually rely on to inform their decisions. “You have to see the fundamentals, not just a person’s past track record,” says Paul Gu, co-founder of Upstart, a startup populated by former Googlers that is doing just that as it helps people obtain unsecured funding through fixed-rate loans.

“Looking at how someone has done in the past to know how good he will be in the future is an approach that works pretty well, yes. But the past is not the only indicator, nor is it a perfect indicator, of the future,” he says.

What’s the True Risk?

At Upstart, for example, the goal is to understand people’s true level of credit risk, so qualifying someone for a loan (for pretty much any purpose) takes more than a FICO credit score into consideration. FICO scores are based on a person’s payment history, amounts owed, length of credit history, types of credit in use, and new credit obtained. While lenders can add to that mix with data such as salary and employment history, Upstart says it looks at a slate of core variables, and how they connect to each other, to understand the true drivers of credit risk.

“The drivers are the fundamentals, which are the employability, income potential, and level of debt and personal responsibility,” Gu says.

What is the risk of an individual becoming unemployed, for example, based on her education and occupation in the context of unemployment trends in that field? And if that risk is high, how will her other expenses and debt obligations affect her ability to weather a short- or long-term loss of employment while still meeting the new loan obligations?
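The unemployment question above can be framed as a simple stress test. The sketch below is a toy illustration, not Upstart’s method: it assumes hypothetical inputs (savings, monthly living expenses, existing debt payments, the new loan payment, a field-level job-loss probability, and a typical unemployment spell for the field) and asks whether savings alone would cover every obligation for as long as a typical job search lasts.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    # All fields are hypothetical inputs chosen for illustration.
    savings: float                # liquid savings, in dollars
    monthly_expenses: float       # living costs per month
    monthly_debt_payments: float  # existing debt obligations per month
    new_loan_payment: float       # monthly payment on the requested loan
    p_job_loss: float             # assumed annual chance of unemployment in this field
    months_to_reemploy: float     # assumed typical unemployment spell in this field


def runway_months(a: Applicant) -> float:
    """Months the applicant could cover every obligation from savings with no income."""
    monthly_burn = a.monthly_expenses + a.monthly_debt_payments + a.new_loan_payment
    return a.savings / monthly_burn


def shortfall_risk(a: Applicant) -> float:
    """Crude annual risk that a job loss outlasts the savings buffer.

    If a typical spell is longer than the runway, a job loss would likely lead
    to missed payments, so the risk equals the job-loss probability; otherwise 0.
    """
    return a.p_job_loss if a.months_to_reemploy > runway_months(a) else 0.0


thin = Applicant(9_000, 2_000, 500, 500, p_job_loss=0.06, months_to_reemploy=4)
cushioned = Applicant(15_000, 2_000, 500, 500, p_job_loss=0.06, months_to_reemploy=4)
print(runway_months(thin), shortfall_risk(thin))            # 3.0 0.06
print(runway_months(cushioned), shortfall_risk(cushioned))  # 5.0 0.0
```

The two applicants have identical obligations and differ only in savings: the one with a five-month cushion absorbs a typical four-month job search, while the other does not.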

Or, in another example, a person with a low college GPA who wants a low-interest loan would need to offset that with stronger employment, a better credit track record, or less debt. “Take two people working the same job and earning the same income; the one with the higher GPA is less likely to default on loan obligations,” Gu says, citing Upstart’s analytics. Roughly speaking, GPA is a more important predictor of whether someone will default on a loan than an SAT score is, he says. The reason, Gu explains, is that SAT scores tend to measure native intelligence, while GPA reflects both intelligence and organizational ability. And, “for something like paying loans in a timely fashion, you care about organizational ability – not just raw ability.”
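One common way to let a strong factor offset a weak one is a logistic model, in which each variable shifts the log-odds of default. The coefficients below are invented for illustration and are not Upstart’s; the sketch only shows how, in such a model, a lower GPA can be balanced by carrying less debt.

```python
import math

# Invented coefficients for illustration only; they are not Upstart's model.
INTERCEPT = -1.0
COEF_GPA = -0.8              # higher GPA lowers the log-odds of default
COEF_YEARS_EMPLOYED = -0.15  # longer employment lowers the log-odds
COEF_DEBT_TO_INCOME = 3.0    # more debt relative to income raises the log-odds


def default_probability(gpa: float, years_employed: float, debt_to_income: float) -> float:
    """Logistic model: p = 1 / (1 + exp(-z)), where z is a weighted sum of the inputs."""
    z = (INTERCEPT
         + COEF_GPA * gpa
         + COEF_YEARS_EMPLOYED * years_employed
         + COEF_DEBT_TO_INCOME * debt_to_income)
    return 1.0 / (1.0 + math.exp(-z))


# Same job and income (same employment and debt load), different GPAs.
high_gpa = default_probability(gpa=3.8, years_employed=2, debt_to_income=0.30)
low_gpa = default_probability(gpa=2.8, years_employed=2, debt_to_income=0.30)

# One GPA point is worth 0.8 in log-odds; 0.8 / 3.0 less debt-to-income offsets it.
low_gpa_less_debt = default_probability(gpa=2.8, years_employed=2,
                                        debt_to_income=0.30 - 0.8 / 3.0)
print(round(high_gpa, 4), round(low_gpa, 4), round(low_gpa_less_debt, 4))
```

Because the model is linear in log-odds, the trade-off between any two inputs is just the ratio of their coefficients, which makes this kind of offsetting easy to read off.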

Upstart’s Ideas at Work in Any Industry

Other verticals also could benefit from a deeper understanding of an individual’s risk profile. The auto insurance industry, for example, tends to hold youth against young drivers. While it’s true that they are riskier on average, Gu says, “there might be more fundamental inputs indicating their level of responsibility” in other areas of their life that should be considered in pricing their policy. And those insurance companies that do so might turn that cautious and good young driver into a loyal lifelong customer.

The consumer financial services sector at large is also missing opportunities to better leverage data that would put firms in good stead with the clients who will be the great wealth creators of the future.

“They wait until [those clients] are well off before they engage them,” he says. “But if you use data to identify who you want as customers in five years a few years earlier, you would be in a much better position from a competition standpoint to build relationships with those people, and enjoy better business as a result.”

A lot of the principles Upstart employs, and indeed a lot of the work it has done, can be extrapolated to other industries, consumer financial services in particular, to solve their own risk analysis challenges, Gu notes. Upstart has spent more than two years collecting anonymized data from open government datasets, private licensed datasets, and Web mining and scraping, then modeling that information, using “a lot of non-traditional methods to join it all together,” he says, to discover what patterns emerge among borrowers and what correlations exist across various data attributes. “It is a very data science intensive process.”

Specific user information to run against its models is collected from loan applicants themselves, with their full knowledge and consent, and is added anonymously to its datasets to further inform Upstart’s data models. People come to Upstart, Gu says, because they know the company collects more data and uses it in creative ways, which in turn can lead to good prospects being offered lower rates.

Getting Better All the Time

The data modeling, Gu says, improves every day. Machine learning in pretty much all its forms – including Monte Carlo probabilistic methods; linear, logistic and random regressions; and other categorization techniques – is at its core. As Upstart organizes more loans and as more loans are repaid, the company continues to collect more information and identify more variables, building what Gu says is “the best dataset in the world for understanding the kind of credit risk we are trying to predict.”
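To give a feel for the Monte Carlo side of this, here is a minimal sketch that estimates a default probability by simulating thousands of possible loan terms under a deliberately simple, hypothetical model of job loss and savings. The parameter names and the dynamics are assumptions for illustration, not a description of Upstart’s system.

```python
import random


def simulate_default_rate(savings: float, monthly_surplus: float, monthly_burn: float,
                          p_loss_per_month: float, mean_spell_months: float,
                          term_months: int = 36, n_trials: int = 10_000,
                          seed: int = 0) -> float:
    """Monte Carlo estimate of the chance a borrower runs out of cash during the term.

    Hypothetical toy dynamics:
    - while employed, cash grows by monthly_surplus (income left after all payments),
      and the job is lost with probability p_loss_per_month at the end of each month;
    - while unemployed, cash falls by monthly_burn (living costs plus all debt
      payments), and each month the spell ends with probability 1 / mean_spell_months,
      so spell length is geometric with mean mean_spell_months;
    - a default is recorded the first time cash goes negative.
    """
    if mean_spell_months < 1:
        raise ValueError("mean_spell_months must be at least 1")
    rng = random.Random(seed)  # seeded so results are reproducible
    p_reemploy = 1.0 / mean_spell_months
    defaults = 0
    for _ in range(n_trials):
        cash, employed = savings, True
        for _ in range(term_months):
            if employed:
                cash += monthly_surplus
                if rng.random() < p_loss_per_month:
                    employed = False
            else:
                cash -= monthly_burn
                if cash < 0:
                    defaults += 1
                    break
                if rng.random() < p_reemploy:
                    employed = True
    return defaults / n_trials


rate = simulate_default_rate(savings=6_000, monthly_surplus=200, monthly_burn=3_000,
                             p_loss_per_month=0.01, mean_spell_months=4)
print(f"estimated default probability: {rate:.3f}")
```

Each trial is one simulated loan term; averaging over many trials turns a model of individual months into an estimated probability, which is why such systems run thousands of simulations per applicant.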

An important part of the picture is cross-validating all of its models to prevent overfitting; that is, Upstart doesn’t want to end up with models so specific to the data they were built on that they cannot be extrapolated beyond that dataset.
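k-fold cross-validation is a standard way to catch overfitting: the data is split into k folds, each fold is held out once while the model is fit on the rest, and training accuracy is compared with held-out accuracy. The plain-Python sketch below uses a deliberately overfit “memorizer” model (a hypothetical stand-in, not anything Upstart has described) on synthetic rows with random labels. It scores perfectly on rows it has seen and near chance on held-out rows, which is exactly the gap cross-validation exposes.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[Tuple[float, ...], int]         # (features, label: 1 = defaulted)
Model = Callable[[Tuple[float, ...]], int]  # maps features to a predicted label


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle row indices once and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(model: Model, rows: Sequence[Row]) -> float:
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def cross_validate(rows: Sequence[Row], k: int,
                   fit: Callable[[Sequence[Row]], Model],
                   seed: int = 0) -> Tuple[float, float]:
    """Return (mean training accuracy, mean held-out accuracy) over k folds."""
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    train_scores, val_scores = [], []
    for val_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(val_idx)
        train = [r for j, r in enumerate(rows) if j not in held_out]
        val = [rows[j] for j in val_idx]
        model = fit(train)  # the model never sees the held-out fold
        train_scores.append(accuracy(model, train))
        val_scores.append(accuracy(model, val))
    return sum(train_scores) / k, sum(val_scores) / k


def fit_memorizer(train: Sequence[Row]) -> Model:
    """Deliberately overfit: remember every training row, guess 0 for anything unseen."""
    table: Dict[Tuple[float, ...], int] = {x: y for x, y in train}
    return lambda x: table.get(x, 0)


rng = random.Random(1)
rows = [((float(i),), rng.randint(0, 1)) for i in range(200)]  # labels are pure noise
train_acc, val_acc = cross_validate(rows, k=5, fit=fit_memorizer)
print(train_acc, round(val_acc, 3))
```

A large gap between the two numbers is the warning sign; a model that generalizes shows training and held-out scores that stay close to each other.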

All the work takes place in the cloud. For each applicant, the system runs through thousands of different computations, simulations, and theoretical outcomes. It currently takes about seven seconds to run all the algorithms, and work is ongoing to drive that time down. “Speed is very important, and being able to get people information in real time is very important,” he says.

To date, the company has taken applications from more than 70,000 people and made more than 1,000 loans, totaling close to $20 million. And, Gu says, its analysis has been pretty much spot on in terms of which users its models say should qualify for loans and which actually do.

This work in risk analysis is just the tip of the iceberg, though. Gu says he believes that “there are a lot more ways to creatively use data to solve problems for people in lending and elsewhere.”
