Dr. Cindi Thompson considers Data Scientists to be raccoons: “Raccoons are both smart and dangerous, and that’s kind of the way we think about Data Scientists.” Speaking at the DATAVERSITY® Enterprise Analytics Online 2017 Conference, in her presentation titled “How to Get the Most Value from Data Science Teams,” Thompson said, “Raccoons have that desire to get beneath the surface,” to keep digging until the problem is solved. She talked about the care and feeding of Data Science teams, and how to get business value by ensuring that Data Science is tied to strategic business goals.
Start with Communication
Thompson stressed that the starting point for building a team of Data Scientists should always be questions that reflect your strategic business goals, such as attracting new customers, targeting VIP customers, or automating processes. “You want to be able to get your stakeholders, your peers, and decision makers on board with what you’re doing and be able to argue for that return on investment.” To ensure that Data Science is successful, she said, consider these questions:
- What will drive an optimal outcome and what incentives will there be?
- How does the Data Science team work with stakeholders? With the engineering team?
- How are investments in infrastructure prioritized, approved, funded, and managed?
- How are costs allocated and benefits understood?
- How will business, legal, IT and data teams operate without creating unacceptable risk?
“No matter which approach you decide to take in structuring and growing your Data Science capabilities, you really have to think a lot about communication, priority settings, and expectation management.”
Structuring the Data Science Team
The organizational structure of the Data Science team will vary based on the size of the company, how many varied business functions there are, geographical distribution, and company culture, but there are common factors to consider when integrating Data Scientists into a larger data-driven organization, she said. Team structure can range from a centralized model to a highly distributed model. A centralized model is sometimes called a ‘shared’ model, or a ‘center of excellence,’ where all the scientists are in the same location working together. This model encourages collaboration and cohesiveness of the Data Science team, allowing them to “bounce ideas off of each other and get that quick help for questions with the coding work,” she said.
In the center of the spectrum is the ‘hub-and-spoke model,’ where scientists are in a centrally located team, but team members are embedded as needed in different business units, similar to “a temporary consultancy within your company.” Those Data Scientists benefit from spending time with the business unit and gaining that knowledge, she said.
As organizations get larger, they tend to choose a distributed or decentralized model, where the business units themselves hire their own Data Scientists, allowing them to be close to managers, engineers, and stakeholders. “So the Data Scientist’s job then is to learn the domain language and the problem set that the group they’re working with faces every day,” she said, adding that there are many variations on these models.
An Agile Approach to Problem Solving
There is a common belief that insights are gained after a series of established stages, starting with business goals and ending with deployment, but in reality, the process is more iterative than linear, she said. “You go back to those earlier phases later in your project because you’re reassessing the information that you’ve discovered along the way.” Thompson used a street map analogy to illustrate how the routes on a journey from one place to another will change based on feedback, such as closed roads or high traffic. “The agility that we want is to be able to respond to the feedback that we get as we’re trying to maneuver through that landscape.” An Agile approach works similarly in a Data Science project – when one path to a goal isn’t fruitful, another path is tried. “So we’re agile in how we solve the problem but we’re intentional in the actual problem being solved and in how we understand success.”
Thompson broke her approach into four levels of granularity:
- Charter: Defining the project, staying focused on business goals
- Investigation Themes: Finding themes tied to the business goals, such as segmenting customers or predicting product failure, which are carried out in time-boxed sprints
- Epics: Defining the hypothesis and the methods used to test
- Stories: Identifying specific tasks and activities that validate themes – or not – and then feeding back into satisfying the charter
“Once you complete a sprint, you want to be able to talk about what happened in that sprint with your client, and your stakeholder, and we do that in retrospective.” This is where next steps for the project are discussed and all parties reach an understanding of value created so far.
Five steps of the retrospective process:
- Discuss highlights of the process to date
- Review stories from the current sprint, including whether or not each was completed
- Provide a demo of the work produced so far – Thompson called this “the core of the retrospective”
- Discuss lessons learned
- Formulate recommendations for the next sprint
These tasks are often split into a ‘sprint review’ and a ‘sprint retrospective,’ but Thompson finds this combined method saves meeting time.
Hiring and Managing Data Scientists
Thompson cautions that the perfect Data Scientist doesn’t exist, so the focus should be on finding someone with the ability to solve the problems you’re working on, especially if you have a smaller team. “You can’t just hire the perfect unicorn Data Scientist and hope that they’re going to do all the things that you need,” she said. It’s better to hire someone with the specific skills needed to meet your organizational objectives. Consider whether your need is more for ad hoc data analysis or product development. Companies that have a greater need for ad hoc data insights should look for scientists with an ability to communicate well with the business. On the other end of the spectrum, if product development is more important, look for strong software engineering skills. “Identifying what side of that spectrum you’re on can inform the types of skills that you’re looking for.”
Critical Skills and Mindsets
Although the first instinct is often to create a checklist of needed tools or technical skills, she said that the actual critical skills are outside that realm, and are more like mindsets than skills:
- Communication: “No matter where you are, you’re going to need communication skills”
- Ability to learn quickly, due to the rapidly changing nature of Data Science
- Ability to translate a problem from business language into a hypothesis: “The business knows they have a problem but they can’t quite articulate what they’re going after, or how Data Science might help them.”
- Impactful vs. interesting: “We’re recognizing that a lot of PhDs and technical people can get stuck in doing things that are interesting, but they might not actually be impactful to the problem they’re trying to solve,” so look for Data Scientists who can resist that temptation.
- Intellectual curiosity: An ability to drill down into a problem and break it into a clear set of hypotheses that can be tested.
- Experimentation mindset: “You need to be willing to fail and not get trapped in assumptions along the way.”
- Agility: As previously discussed
- Attention to detail in technical work
Additional skills, which she called ‘the more the merrier skills,’ such as Machine Learning and programming, can add flexibility to the team. “You don’t need someone who’s an all-star in all of the categories here. You’re really looking for a Data Scientist with a subset of skills that meet your needs.” If you hire Data Scientists who are able to learn, it’s not critical to hire for a specific tool or language; however, “If your whole team is working with the same tools and platforms and evolving together, it’s better to have that consistency once they’re hired.”
The Well-Rounded Interview
Thompson recommends that engineers, key people from the business, stakeholders, and other Data Scientists be involved in on-site interviews. Many organizations now use a take-home or in-house problem for applicants to solve as an assessment tool. Not all qualified applicants will be able or willing to do this. In that case, she recommends having them walk through a technical project they’ve completed or provide a portfolio of their work.
To assess a candidate’s communication skills, she convenes a team of solution architects, data engineers and someone from the business side, and the candidate is asked how they would approach a scenario using a typical problem.
Care and Feeding of Data Scientists
Once you have a team assembled, “Giving them that support for their professional development and lifelong learning is really important,” she said. Many Data Scientists have an academic mindset and a willingness to experiment, but in the pursuit of a perfect solution, they may become ‘lost in the weeds,’ she said, so it’s essential to check in with them. Stay connected, but also allow enough autonomy so that they can continue to publish, contribute to open source, or pursue other meaningful activities in their field. Make your support visible.
“The end goal for all of this is that it’s all in the service of value to the business.”
Check out Enterprise Analytics Online at http://eanalyticsonline.com/