Interview: Alejandra Litterio, Chief Research Officer, Eye Capital
Unlocking the potential of NLP for Finance
During the AI and Data Science in Trading Conference in New York, I had the honor of joining outstanding speakers in the panel discussion on the latest developments in NLP.
Here I’d like to share my personal insights about NLP (I prefer the term NLU, Natural Language Understanding) and address three interrelated topics:
• Open Source NLU libraries and their value for financial applications
• An all-purpose NLU model
• Institutions’ investment in terms of resources to get started with NLU/NLP
It seems that when it comes to NLP, or rather NLU, it all revolves around a decision-making process. Pragmatically speaking, the choice between open source and proprietary libraries is also a philosophical and political matter.
So, what is the primary reason for adopting Open Source technology in the financial arena? Cost effectiveness, customizability and communal validation seem to be the “correct” answer.
Among the most popular open source tools for NLP you can find the libraries Natural Language Toolkit (NLTK), Gensim, spaCy and Flair, as well as the pretrained language model BERT. Choosing the right one at the enterprise level is not only a challenge; it also requires a deep understanding of the domain (financial services) and the type of data.
Being able to test and deploy without a budgetary sign-off is not the only advantage. Open source goes far beyond eliminating vendor lock-in and relying on a community for support: it also means personal development, since data scientists would rather learn a foundational technology than one vendor’s proprietary system. Besides, making your own software open source advances the state of the art by encouraging dialogue, collaboration and feedback.
Now, talking about models: Is it possible to develop an all-purpose NLP engine? Does source content matter?
Designing an all-purpose NLU engine requires careful analysis. Let’s think for a minute of humans as perfect machines: their cognitive process would no doubt be able to produce an all-purpose “engine” that could be adapted to any scenario. But there’s a catch: human beings take different approaches when confronted with the same piece of discourse or data, and then adjust their reading to each contextual situation or scenario.
Needless to say, the main challenge in handling different data sources has to do with what we call hermeneia and semiotics, and financial markets are clearly no exception. And, of course, the source matters, mainly for the qualitative analysis of unstructured data, where a holistic approach that combines quantitative and qualitative methods is essential.
Today the training of different NLU algorithms makes it possible to better understand and identify contexts: the crux here is "interpretation".
How big an investment is it in terms of resources to get started with NLP? How should companies go about it?
Without any doubt the answer, or at least part of it, depends on the size and type of the project and on the institution where it will be carried out. From my experience, anyone who wants to undertake a project from scratch needs a team of both developers and traditional linguists, or what I have defined as the MetaQuant Model, since most of the time projects do not end well when they consider the Quantamental side only.
Having a good balance in a multidisciplinary team implementing Open Source solutions is the ideal approach when looking for the best results, although it can be very laborious at first. Another aspect that I highly recommend is working with your own financial lexicons and with rules of discursive patterns created by linguists.
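To make the idea of an in-house lexicon combined with linguist-written rules a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the terms and weights in `FINANCIAL_LEXICON`, the negation rule, and the function name `score_sentence` are hypothetical, and a real project would rely on lexicons and discursive rules curated by its own linguists.

```python
import re

# Hypothetical mini-lexicon: term -> polarity weight (illustrative values only).
FINANCIAL_LEXICON = {
    "upgrade": 1.0,
    "beat": 1.0,
    "growth": 0.5,
    "downgrade": -1.0,
    "default": -1.0,
    "loss": -0.5,
}

# Hypothetical discursive rule: a negator flips the polarity of the
# next lexicon term that follows it in the sentence.
NEGATORS = {"no", "not", "without", "never"}


def score_sentence(sentence: str) -> float:
    """Sum the lexicon weights found in a sentence, applying the negation rule."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0.0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True  # remember the negation until a lexicon term is seen
            continue
        weight = FINANCIAL_LEXICON.get(token)
        if weight is not None:
            score += -weight if negate else weight
            negate = False
    return score
```

Calling `score_sentence("The company did not default")` applies the negation rule to the lexicon term "default". The sketch deliberately keeps the lexicon and the rule as separate, readable data structures so that linguists can review and extend them without touching the scoring code.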
Nevertheless, when implementation times are shorter or the R&D teams are not yet sufficiently established, there are interesting alternatives to explore, such as IBM Watson NLU. Naturally, this puts much at stake in terms of institutional interests: Would a renowned company use a black box developed by a third party? In any case, I find it worth mentioning as an alternative that reduces costs and human resources, if that is the primary focus for the market players.