[Editor's Note: This guest post was provided by Tom Ilube, Managing Director, Callcredit Consumer Markets, and includes additional contributions from Rob Styles, Principal Technical Consultant, Callcredit]
If you intend to introduce Big Data and Linked Data approaches to your company you may wish to learn from the way pygmies hunt elephants.
Forty years ago my father took me on safari in remote parts of Kenya. We ended up slightly off the beaten track, in a pygmy village. They were very welcoming and one evening as we sat around chatting some of the young men told us how they hunt elephants. Pygmies, I am reliably informed, approach this formidable task in three easy steps.
Step 1: A team of two pygmies roll around in elephant dung, so that they can sneak up on said elephant without being detected;
Step 2: Pygmy A (let us call him “Joe”) climbs on Pygmy B’s (“Fred”) shoulders, underneath the elephant (“Nelly”). Joe uses a short, sharp spear to attack the soft underbelly and aim directly for Nelly’s heart. Then they run. Fast.
Step 3: Joe and Fred return to the village, carrying an ear each, and are received as heroes as the whole village feasts for weeks.
Mind you, if Joe and Fred return covered in dung and without ears then they are given pretty short shrift by their fellow villagers, let me tell you! There is also the minor risk of being stomped on. But otherwise, it’s a foolproof plan.
This is precisely the approach that Rob Styles (Pygmy A) and I (Pygmy B) are taking in introducing Big Data concepts at Callcredit, the UK credit reference agency. Well, perhaps not precisely. But let’s see how far I can push this ridiculous analogy.
The first point is that introducing Big Data must be done from the inside, by people that “smell” like the company. They need to really understand the company and be part of it, not approach it as outsiders trying to foist clever new ideas on it, otherwise they risk being stomped on! So when as an organisation we decided that Big Data was important to our future, I sought out one of the UK’s most experienced practitioners and asked him to join as a full-time member of staff. I wanted an expert on the team who could immerse himself in the culture and dynamics of the company.
The second point is that our expert with the short, sharp spear (Pygmy A or “Rob”) stands on the shoulders of two very influential executives within the organisation, i.e. Pygmy B (me) and the Chief Operating Officer. There is no ambiguity about the support and sponsorship he has. I often see “skunkworks” projects within companies that don’t have the right sponsorship. That’s a recipe for failure.
The third point is that we asked Rob to move very quickly. Often people think Big Data equals Big Budgets. They refuse to get started unless they have enough hardware to boil the ocean. Rubbish. Get started with whatever you can lay hands on. Within weeks of joining, Rob had assembled a platform that included all the key elements that we were interested in: Hadoop for processing data from multiple sources and spitting it out in a form we could work with; a graph store to allow us to do network analysis; a simple web based front end to let us explore the results; and as much data from within and outside the organisation as we could lay our hands on easily. We worked with a business owner on a specific problem that had a business case with immediate revenue potential. Within two months we were able to demonstrate a Big Data based solution to a real business problem to the executive team and win ongoing sponsorship and significant investment.
Point four is to be very focused on what your immediate target is. Get to the heart of things quickly. Find a business problem that has the characteristics that will enable a Big Data approach to make a real difference and find a business executive willing to say something like “if you can build what you claim, then I can sell £mm next year.” Don’t approach Big Data as an R&D project. It’s a real solution to a real world problem that someone out there will pay real money for.
We wanted a problem domain that involved combining structured data from within and across the company with unstructured data harvested from outside; where analysing this data in a graph format has clear benefits and where the ability to pre-process that data using our distributed Hadoop cluster illustrated the advantages and potential of that approach.
Over the past few years much of the focus of Big Data has been on document stores and key/value stores. Building applications on these often requires de-normalisation of the data to serve a specific application. More recently, though, we see renewed discussion of graphs and the power they bring. The combination of Big Data (lots of stuff) and Linked Data (graphs) gives us Big Linked Data. These large graphs are where we think much of this is heading as it allows us to use the horsepower of Big Data technologies without building data structures specific to an individual application.
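To make the graph idea a little more concrete, here is a minimal sketch in Python. It is not our production code: the two datasets, the field names and the predicate names (`heldBy`, `livesAt`, `mentionedWith`) are all invented for illustration, and a plain in-memory dictionary stands in for a real graph store. It simply shows records from two unrelated sources being turned into subject/predicate/object triples and then walked as one graph, without first reshaping the data for a particular application.

```python
from collections import defaultdict

# Hypothetical records from two unrelated sources (all names are invented).
accounts = [
    {"account": "acc:1", "holder": "person:alice", "address": "addr:10-high-st"},
    {"account": "acc:2", "holder": "person:bob", "address": "addr:10-high-st"},
]
web_mentions = [
    {"person": "person:bob", "mentioned_with": "person:carol"},
]


def to_triples(accounts, web_mentions):
    """Express both datasets as (subject, predicate, object) triples."""
    triples = []
    for row in accounts:
        triples.append((row["account"], "heldBy", row["holder"]))
        triples.append((row["holder"], "livesAt", row["address"]))
    for row in web_mentions:
        triples.append((row["person"], "mentionedWith", row["mentioned_with"]))
    return triples


def build_graph(triples):
    """Build an undirected adjacency map; predicates are ignored for simplicity."""
    graph = defaultdict(set)
    for subject, _predicate, obj in triples:
        graph[subject].add(obj)
        graph[obj].add(subject)
    return graph


def connected(graph, start):
    """Return every node reachable from `start` (depth-first walk)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


graph = build_graph(to_triples(accounts, web_mentions))
network = connected(graph, "person:alice")
```

In this toy data Alice and Carol never appear in the same record, yet the walk links them through a shared address and a web mention. That is the sort of question a graph answers naturally and a de-normalised, application-specific structure does not.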
That brings me to my final point. The elephant ears (gosh, this pygmy analogy is hard work!).
Our view is that the Big Data approach for an organisation will only deliver meaningful results when it brings home the twin benefits of giving you a graph based view across your multiple, heterogeneous datasets and delivering near instantaneous responses as a result of large scale, Hadoop-based pre-processing.
It takes a shift in mindset to appreciate the value of large-scale pre-processing. As software engineers we are trained to optimise resources. I started life as an assembler programmer on mainframe based airline reservation systems. My entire programme had to fit into a 4k block and sometimes you would resort to writing machine code rather than that “high-level” assembler stuff just to squeeze in another instruction or two. So the idea of simply pre-processing everything knowing that 95% of it is wasted effort is anathema to old software engineers like me. But the true Big Data mindset says go ahead and pre-process. Why? Because we can! And because if we have “all” the answers pre-processed then we can answer your queries instantaneously.
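For the sceptical old engineers, the trade can be sketched in a few lines of Python. Again this is an illustration under assumptions, not our actual pipeline: the data and the function names (`precompute_housemates`, `housemates`) are made up, and a single in-process loop stands in for what would be a distributed Hadoop batch job. The point is only the shape of the approach: compute the answer for everyone up front, whether or not anyone ever asks, so that answering a query is nothing more than a lookup.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical (person, address) pairs; in practice this would be a large dataset.
residences = [
    ("alice", "addr1"),
    ("bob", "addr1"),
    ("carol", "addr2"),
    ("dave", "addr2"),
    ("erin", "addr3"),
]


def precompute_housemates(residences):
    """Batch step: work out, for every person, who shares an address with them."""
    by_address = defaultdict(set)
    for person, address in residences:
        by_address[address].add(person)

    answers = {person: set() for person, _address in residences}
    for people in by_address.values():
        for a, b in permutations(people, 2):
            answers[a].add(b)
    return answers


# Done once, ahead of time, for everybody.
ANSWERS = precompute_housemates(residences)


def housemates(person):
    """Query step: a dictionary lookup, no computation at request time."""
    return ANSWERS.get(person, set())
```

Most of the entries in `ANSWERS` may never be read, which is exactly the "wasted" effort that offends the resource-optimising instinct. The payoff is that `housemates("alice")` costs the same whether the underlying data has five rows or five billion.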
To conclude, introducing Big Data to your company requires an approach that involves:
- A very small, agile, hands-on, expert team inside the company
- Executive level sponsorship from the start
- Very fast progress towards a working solution that solves a real business problem
- A mindset shift to graph based analysis and near-instantaneous response times enabled by large scale pre-processing
If you achieve this then you will be rewarded with Big Data elephant ears and the village will feast for weeks. If you don’t, you had better learn to run very fast before you get stomped on!
About the Authors
Tom Ilube is Managing Director of Callcredit Consumer Markets and founder of Noddle.co.uk, its new consumer initiative. In 2005, he founded Garlik, the venture-backed online identity company acquired by Experian in 2011, which made extensive use of linked data technologies to deliver services to hundreds of thousands of consumers. Prior to Garlik, Tom was Chief Information Officer of Egg plc, the UK's pioneering online bank.
Rob Styles is Principal Technical Consultant for Big Data at Callcredit Group. He has spent the past seven years working with Big Data and Linked Data technologies at Semantic Web firm Talis where he worked with clients including the BBC, UK and US governments and the British Library. Prior to Talis he worked with Tom as one of the technical leaders behind internet bank Egg.