Kristen Serafin, associate director at the Financial Industry Regulatory Authority (FINRA), and Lizzie Westin, lead systems analyst at FINRA, spoke at the DATAVERSITY® Enterprise Analytics Online Conference about how they gained traction for a successful machine learning program. Their presentation was titled Ushering in the Age of Machine Learning.
FINRA is a private, not-for-profit organization responsible for regulating equities and options trading activity. Although FINRA reports to the federal Securities and Exchange Commission (SEC), it is not a government agency. “We’ve ushered an age of machine learning here at FINRA and we encourage all of you to evaluate whether it’s appropriate for you to do the same,” Serafin said. In March 2019, the machine learning program had two active projects; seven months later, it had ten.
Westin provided a brief overview of machine learning, drawing parallels between how children and machines learn. When parents take children to a zoo, she said, they point to an animal and say its name, essentially creating a ‘label.’ A child learns to label animals properly through reinforcement when they succeed and through correction when they misidentify one, for example by mistaking an animal that resembles a dog for a dog and then learning that it is called a fox. After repeated encounters with animals, a child eventually learns how to correctly label them without help.
Labels play an important role in supervised machine learning as well, and training the machine is part of the process. Distinguishing one animal from another and knowing its name (or ‘label’) occurs within the human mind without conscious focus. “We don’t have to verbalize details, such as explaining the shape of the tail, the eyes or the ears. Similarly, with machine learning, we provide labeled data to the computer.” With supervised machine learning, preset labeled examples are used for training, and during the validation process, the machine draws conclusions as to what label is appropriate for examples it has not seen before. Rather than following explicitly programmed rules, the machine learns from the examples how to identify patterns, make its own decisions, and derive its own labels.
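The training and validation steps described above can be illustrated with a minimal sketch. It assumes a toy nearest-centroid classifier and invented animal measurements (weight and ear length); neither the technique nor the numbers come from the presentation, and this is not a description of FINRA's models.

```python
from collections import defaultdict
from math import dist

# Hypothetical labeled examples: (weight_kg, ear_length_cm) -> label.
# The numbers are invented purely for illustration.
TRAINING = [
    ((30.0, 10.0), "dog"),
    ((25.0, 9.0), "dog"),
    ((35.0, 11.0), "dog"),
    ((6.0, 8.0), "fox"),
    ((5.0, 7.5), "fox"),
    ((7.0, 8.5), "fox"),
]

# Held-out labeled examples the model never sees during training.
VALIDATION = [
    ((28.0, 10.5), "dog"),
    ((6.5, 8.0), "fox"),
]


def train(examples):
    """Learn one centroid (mean feature vector) per label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Assign the label whose centroid is closest to the features."""
    return min(model, key=lambda label: dist(model[label], features))


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(model, features) == label for features, label in examples)
    return correct / len(examples)


model = train(TRAINING)
validation_accuracy = accuracy(model, VALIDATION)
print(f"validation accuracy: {validation_accuracy:.2f}")
```

No rule such as "a fox weighs less than a dog" is written anywhere in the code; the distinction is derived from the labeled examples, and the held-out validation set checks whether it generalizes.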
Three Components for Success
Serafin identified three components contributing to the success of their program: understanding the nature of the data, evaluating skills and tools, and building stakeholder engagement.
The Nature of the Data: Characteristics and Volume
Westin said that the characteristics and volume of the data influenced their approach to machine learning. FINRA has structured data, such as the date a stock is traded, price, and quantity of shares, as well as unstructured data from customer service inquiries, emails and calls. Volume is another consideration. In 2018, FINRA processed 66.7 billion records on average per day and this was a key driver in the move to machine learning as a potential solution. At that level, she said, “It’s not efficient to throw more analyst resources toward reviewing the growing volume. We need an innovative approach.” Although it’s not necessary to have data volumes of this size to successfully use machine learning, there must be enough data to create workable training and validation sets, she said.
The Nature of the Data: Quality and Context
Westin showed an image of a dog, obscured through a cloudy lens. “Machine learning can’t work optimally if the inputs are not clear.” If the data is damaged, incomplete or has other quality issues, such as in the image, she said, machine learning model outputs can be impacted. With market data, for example, if a data value is outside of an expected range, it could look like an anomaly when it’s actually a data issue. “You need to make sure your processes continue to include data validations, just like with anything else.” Understanding data context is just as critical. Pointing to a slide of a human being and a chicken, she asked what links existed between the two species:
“If a machine learning model is telling you that the number of legs is an important criterion, and your knowledge about this context suggests otherwise, question the process and maybe try a different approach.”
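Westin's point about out-of-range values can be sketched as a validation step that runs before any model sees the data. The record fields, bounds, and function names below are hypothetical, chosen only for illustration; real market data would need domain-specific checks.

```python
from dataclasses import dataclass


@dataclass
class TradeRecord:
    symbol: str
    price: float
    quantity: int


# Hypothetical bounds; real limits would come from domain experts.
MIN_PRICE = 0.01
MAX_PRICE = 1_000_000.0


def validation_errors(record):
    """Return a list of data-quality problems found in one record."""
    errors = []
    if not record.symbol:
        errors.append("missing symbol")
    # A NaN price fails this comparison too, so it is also flagged.
    if not (MIN_PRICE <= record.price <= MAX_PRICE):
        errors.append("price out of expected range")
    if record.quantity <= 0:
        errors.append("non-positive quantity")
    return errors


def split_valid(records):
    """Separate clean records from those needing a data-quality review."""
    valid, rejected = [], []
    for record in records:
        errors = validation_errors(record)
        if errors:
            rejected.append((record, errors))
        else:
            valid.append(record)
    return valid, rejected


records = [
    TradeRecord("ABC", 101.25, 200),
    TradeRecord("XYZ", -5.0, 100),   # likely a data issue, not a market anomaly
    TradeRecord("", 50.0, 0),
]
valid, rejected = split_valid(records)
print(len(valid), "valid;", len(rejected), "sent for data-quality review")
```

Routing rejected records to a separate review queue keeps a bad feed from being mistaken for suspicious trading activity downstream.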
Evaluate Skills & Tools
Having the right people set up for success with the most appropriate tools is a critical component, said Serafin. It’s a myth that organizations need to hire highly skilled data scientists to start a machine learning program. “Chances are someone on your team already has a deep understanding of the data or the business context, or you may already have a very skilled data analyst or engineer.” To foster growth of machine learning skills, look to subject matter experts who already have a deep understanding of the data and business context; then evaluate existing resources against required skill sets to determine where new skills can be developed. Good data analysts or data engineers may be in a position to build or learn machine learning skills quickly.
Westin had an analyst on her team with coding skills and a deep understanding of the data who displayed an interest in machine learning. She freed him from some of his existing work, giving him time to take classes and start experimenting with machine learning. “It wasn’t overnight, and there was a lot of trial and error, but eventually we had our first data scientist.” The hardest part of the skill development process, she said, wasn’t finding someone who was interested in machine learning, nor was it trying to get funding: It was giving existing staff time and space. “As a manager, I had to continually remind myself that he needed to be freed up to experiment, which meant figuring out which deadlines could move and which ones couldn’t.” While developing staff resources and building the program, determine when to augment or fill gaps by partnering with vendors or by hiring staff.
Not everyone will need the same training. For best results, offer internal and external training opportunities, she said, which can be targeted towards specific audiences, such as analysts or developers, and involve your stakeholders when appropriate. Another potential internal candidate was an engineer who could identify features and create models, but she lacked the business context and an understanding of FINRA data. In this situation, external training was not the answer, but there was a building full of stakeholders next door, Serafin said:
“Getting your stakeholders involved opens up a whole new world. They can provide a ton of training and insights into not only the data, but also feature identification, feedback, and validation that your approach to the problem makes sense.”
Although effective tools are important, Westin said that tools are always secondary to skills. Serafin adds: “A misguided person with a good tool is not particularly helpful.” That said, even a skilled data scientist with an incompatible tool can quickly encounter roadblocks and issues. Keep in mind the nature, the volume, and the type of data when selecting the appropriate tool to use within your organization. If purchasing new tools is not an option, look for existing tools that can be utilized for machine learning. “You don’t need to spend top dollar on a shiny Cadillac when a used car will get you rolling to your next destination,” she said.
Build Stakeholder Engagement
Skills and tools alone are not enough, Serafin said. Stakeholder engagement is the paramount component for success. Some FINRA stakeholders already knew about machine learning, were excited to give it a try, and readily articulated additional benefits, which helped make a stronger case for funding a formalized program. Stakeholders suggested using machine learning for real-world problems, such as keeping up with the ever-changing stock market.
One forum that provided a good environment for training and coaching was an annual ‘hack-a-thon’ event. The event became so popular that it was opened up to the entire organization and rebranded the ‘Create-a-thon.’ The 2018 theme, “AI-Ready,” encouraged participants to experiment with artificial intelligence (AI) and machine learning. “This provided a fantastic opportunity to obtain buy-in and sponsorship from the business by offering training and classes in all facets of machine learning leading up to the event.” This also provided a forum to demonstrate real-world applications, although Serafin said that a special event isn’t necessary. In 2018, more than 500 participants and 57 teams worked on six different business challenges. In 2019 the number of participants exceeded 600, generating a multitude of useful ideas, most with working prototypes, said Westin. Over a year later, Create-a-thon projects continue to be introduced into the R&D pipeline, continuing to support the flow of ideas and foster innovation.
Achievable, Valuable, and Transformative
“As your team generates more ideas and projects, you need to establish a system for prioritizing and assessing those projects,” said Serafin. Experiment with prototypes to solve existing problems through machine learning, selecting ideas that are achievable incrementally. For example, chat bots can save call center time spent on answering common questions. The human component isn’t removed, she said; it’s now just focused on more difficult problems. The Create-a-thon evolved into a year-round formal R&D Analytics program, with a team that includes managers, data engineers, data scientists and subject matter experts. The team meets regularly to share ideas and to evaluate and prioritize proposals. FINRA looks for proposals that are achievable, valuable and have the potential to transform the company. It’s important to ensure that measures of success are in line with company values and goals.
Innovation Through Experimentation
Westin used the process of drilling for oil as a metaphor for how they innovate through experimentation. Management supports a trial-and-error approach and encourages staff to submit a multitude of quick, inexpensive ideas without worrying about risk in an experimental context. As with oil drilling, they first select a general area to explore, with the goal of quickly finding the best place to concentrate their efforts. They may spend two or three months pursuing a particular path, but if the exploration is not going as planned, they quickly move on to the next one. If it’s promising, they explore further or scale it out more broadly, she said. “The R&D program fosters a culture of innovation and allows the organic growth of ideas alongside [our] regular projects.” Anyone in the organization can submit new ideas for discussion into this forum, which fosters collaborative experimentation company-wide.
Communities of Practice
Communities of practice are groups of people who share an interest in a subject and a desire to learn how to do it better as they interact regularly. At FINRA, these communities have evolved as a forum to share ideas and evaluate machine learning proposals. In addition to the R&D Analytics program and communities of practice, FINRA holds a weekly data science forum which provides a more technical, deep dive into a subject.
Determine Next Steps
In retrospect, Serafin and Westin identified additional best practices:
- Identify challenges unique to your particular business, such as implementing machine learning in a regulatory environment
- Establish criteria and use them to assess and prioritize potential projects
- Determine economic viability
- Develop measures and tracking mechanisms for success, such as an R&D Analytics Program, and reassess project status periodically
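One way to apply such criteria is a simple weighted score over the three qualities FINRA looks for in proposals: achievable, valuable, and transformative. The sketch below is only an assumption about how that could be done; the weights, the 1 to 5 rating scale, and the proposal names other than the chat bot example are hypothetical and do not describe FINRA's actual process.

```python
from dataclasses import dataclass

# Hypothetical weights for the three criteria named in the article.
WEIGHTS = {"achievable": 0.4, "valuable": 0.4, "transformative": 0.2}


@dataclass
class Proposal:
    name: str
    achievable: int      # each criterion rated 1 (low) to 5 (high)
    valuable: int
    transformative: int


def score(proposal):
    """Weighted sum of the criterion ratings."""
    return sum(weight * getattr(proposal, criterion)
               for criterion, weight in WEIGHTS.items())


def prioritize(proposals):
    """Return proposals ordered from highest to lowest score."""
    return sorted(proposals, key=score, reverse=True)


# Illustrative ratings only.
proposals = [
    Proposal("chat bot for common questions", achievable=5, valuable=4, transformative=2),
    Proposal("hypothetical idea B", achievable=2, valuable=3, transformative=5),
    Proposal("hypothetical idea C", achievable=3, valuable=5, transformative=5),
]

for proposal in prioritize(proposals):
    print(f"{score(proposal):.1f}  {proposal.name}")
```

Re-running the ranking whenever ratings change is one lightweight way to reassess project status periodically.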
Westin credits their effective communication and collaboration structure as key to the successful integration of machine learning at FINRA. Although it can’t solve every problem, Serafin adds, “machine learning has helped us solve some tough issues, so that we can focus on some even more difficult ones.”