Machine learning (ML) has enabled a whole host of innovations and new business models in fintech, driving breakthroughs in areas such as personalized wealth management, automated fraud detection, and real-time small business accounting tools. For a long time, one of the most significant challenges of machine learning has been the amount and quality of data required to train models. Recent developments in Transformer architectures, however, have started to change this equation.
Transformer-based models like BERT (Bidirectional Encoder Representations from Transformers, developed at Google) and GPT (Generative Pre-trained Transformer, developed at OpenAI) have driven some of the most significant advances in machine learning in recent years. These technologies were initially developed to process natural language data but are now creating exciting new opportunities across many applications, including fintech.
One of the main benefits of Transformer-based architectures is that these models consider context when analyzing text. Earlier approaches to turning text into vectors, like Word2Vec, assigned a single vector to each word and were therefore unable to distinguish between “bank” as a financial institution and “bank” as the edge of a river. By contrast, Transformer-based models generate context-specific vectors. This ability to differentiate between concepts based on context drastically improves the quality of these models’ predictions.
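To make the limitation concrete, here is a toy illustration of why a static embedding table cannot disambiguate “bank.” The vectors below are made up for illustration; a real Word2Vec model would learn them from data, and a real Transformer encoder would produce different vectors for the same word in different sentences.

```python
# Toy illustration: a static (Word2Vec-style) embedding table assigns
# exactly one vector per word. The numbers here are invented.
static_embeddings = {
    "bank": [0.8, 0.1, 0.3],  # one vector must cover BOTH meanings
    "river": [0.1, 0.9, 0.2],
    "loan": [0.9, 0.2, 0.1],
}

sentence_a = "she deposited cash at the bank"
sentence_b = "they fished from the river bank"

# The lookup ignores the surrounding words entirely:
vec_a = static_embeddings["bank"]  # lookup for sentence_a
vec_b = static_embeddings["bank"]  # lookup for sentence_b
print(vec_a == vec_b)  # True: identical vectors despite different meanings
```

A Transformer encoder, by contrast, computes each token’s representation by attending to the whole sentence, so “bank” in the two sentences above would receive two different vectors.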
Developing data-driven fintech products means dealing with high volumes of complex and, at times, unstructured data. Natural language processing mechanisms like classification and named entity recognition are crucial to turning disparate or unstructured transaction information into data sets that can be analyzed much more efficiently. Once processed, this data can be used for various applications. For example:
- Fraud detection: Classifying transactions represented by Transformer-based vectors as “fraud” or “non-fraud”
- Product recommendations: Comparing Transformer-based vectors representing product descriptions and calculating the similarity between those descriptions
- Semantic search: Retrieving search results by comparing vector representations of a natural language search query with vector representations of all searchable data
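All three applications above reduce to comparing vectors. The ranking step of semantic search, for instance, can be sketched with plain cosine similarity. The vectors and document names below are made up stand-ins for embeddings that a Transformer encoder would produce.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for Transformer embeddings of searchable items.
documents = {
    "monthly account fees": [0.9, 0.1, 0.2],
    "wire transfer limits": [0.2, 0.8, 0.3],
    "card fraud report":    [0.1, 0.3, 0.9],
}

# Stand-in embedding of a query such as "how much does my account cost".
query_vector = [0.85, 0.15, 0.25]

# Rank documents by similarity to the query embedding.
ranked = sorted(documents.items(),
                key=lambda item: cosine_similarity(query_vector, item[1]),
                reverse=True)
print(ranked[0][0])  # best match: "monthly account fees"
```

In production, the same comparison is typically delegated to an approximate nearest-neighbor index so that millions of documents can be searched in milliseconds, but the underlying idea is this similarity ranking.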
While these language processing techniques have been around for a while, Transformer models have made them much more accurate and efficient.
Transformer models bring the concept of transfer learning to natural language processing. Companies like Facebook or Google train large models to have a broad concept of language understanding. Many of those pre-trained models are available as open-source downloads on platforms like TensorFlow Hub or HuggingFace. These pre-trained generic models can then be fine-tuned for any domain-specific application. Because these models already have a basic understanding of the world, fine-tuning requires much less training data than training an entire model from scratch. This approach not only delivers better performance in tasks like classification or entity extraction, but also makes building proofs of concept for ML applications much easier, reducing the risk of failure.
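The core idea can be sketched with a deliberately tiny example: keep a “pretrained” feature extractor frozen and train only a small classification head on a handful of labeled transactions. Everything here is invented for illustration; in practice the frozen extractor would be a pre-trained Transformer and the head would be trained with a library such as HuggingFace Transformers.

```python
import math

def pretrained_features(text):
    # Stand-in for a FROZEN pre-trained encoder: crude, fixed features.
    words = text.lower().split()
    return [
        sum(w in {"refund", "unauthorized", "dispute"} for w in words),  # fraud-ish terms
        sum(w in {"coffee", "groceries", "rent"} for w in words),        # routine terms
        1.0,                                                             # bias feature
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Only the head's weights are trained; the extractor above stays frozen.
weights = [0.0, 0.0, 0.0]
training_data = [  # tiny labeled set: 1 = fraud-related, 0 = routine
    ("unauthorized charge dispute", 1),
    ("refund never arrived dispute", 1),
    ("coffee and groceries", 0),
    ("rent payment", 0),
]

learning_rate = 0.5
for _ in range(200):  # plain logistic-regression gradient updates
    for text, label in training_data:
        x = pretrained_features(text)
        pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        weights = [w + learning_rate * (label - pred) * xi
                   for w, xi in zip(weights, x)]

def classify(text):
    x = pretrained_features(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x))) > 0.5
```

Four labeled examples suffice here because the frozen extractor already encodes useful structure; the same asymmetry, on a much larger scale, is what makes fine-tuning pre-trained Transformers so data-efficient.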
Transformer models contain an extensive amount of language knowledge. This comes at a cost: an immense number of parameters. For example, the full-size version of BERT contains around 340 million parameters, while recent GPT-3 models contain 175 billion parameters. These enormous amounts of model data create two major challenges:
- Domain-specific fine-tuning of such models requires access to GPUs or TPUs
- A large number of parameters also means that computing predictions takes longer. This increased latency can impede the viability of Transformer-based models in fintech applications: banking applications often require high throughput, which may demand expensive GPU-based hardware, sometimes running 24/7
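A back-of-envelope calculation shows why the parameter counts mentioned above translate into real hardware costs. Assuming 32-bit (4-byte) floating-point weights, which is an assumption; deployed models are often quantized to less:

```python
# Rough memory footprint of the weights alone, assuming 4 bytes per parameter.
BYTES_PER_PARAM = 4

bert_large_params = 340_000_000    # "full-size" BERT
gpt3_params = 175_000_000_000      # GPT-3

bert_gb = bert_large_params * BYTES_PER_PARAM / 1e9
gpt3_gb = gpt3_params * BYTES_PER_PARAM / 1e9

print(f"BERT-Large: ~{bert_gb:.2f} GB of weights")  # ~1.36 GB
print(f"GPT-3:      ~{gpt3_gb:.0f} GB of weights")  # ~700 GB
```

BERT-Large fits on a single commodity GPU; GPT-3’s weights alone exceed the memory of any single accelerator, which is part of why such models are served as hosted APIs rather than self-hosted.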
Another easily underestimated pitfall is inherent bias in pre-trained models. Since those models are trained on large corpora of real-world text (such as Wikipedia), they can reflect and perpetuate stereotypes expressed in those texts. The exact corpus used to pre-train large models is often not accessible, so it can be unclear whether it contained personal information or underlying bias. This area has received a lot of attention recently, especially in the context of predictions that can have personal and financial consequences.
The mechanisms behind Transformer models are still being improved in several ways, for example in versatility: while the original BERT model was trained on two tasks (masked-token prediction and next-sentence prediction), more recent models focus on meta-learning: learning how to learn. Current models like GPT-3 can even follow brief instructions, such as translating natural language sentences into SQL statements or Python code.
At the same time, model sizes are increasing drastically, to a point where only companies like OpenAI will be able to host such models. GPT-3, for example, is only available as a hosted API service. It is possible that highly evolved meta-learning models will become a reality, but they will most likely be marketed as paid APIs.
Transformer models can be powerful tools for improving existing fintech applications and creating new opportunities. But they also bring new challenges that require careful planning and consideration of bias and possible hidden costs.