Interview: Peter Hafez, Chief Data Scientist, RavenPack

04/12/2022

Hi Peter, thank you for joining us today for this interview. We are interested to learn more about you and what you do at RavenPack.

Our research this year showed a lot more firms and practitioners talking about NLP than usual – why do you think this is? Where are you seeing the optimal utility for NLP and where does it have the potential to go?

Recent developments in (transformer-based) NLP technology have been driving more intense media attention, which has, in turn, elevated NLP to its next phase of adoption. Analyzing human language is hard, especially at scale. Large attention-based deep learning models like GPT-3 have shown great promise and will keep advancing as the technology evolves. There are still many challenging problems to be solved, such as text summarization, trending theme detection and topic modeling, especially as the amount of textual data grows. And it has been growing exponentially on all fronts due to better data digitization practices, growing disclosure requirements and increasingly affordable storage. Previously inaccessible content is also being made available.
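To make one of these tasks concrete, here is a minimal sketch of transformer-based summarization using the open-source Hugging Face transformers library. The model choice and the sample text are illustrative assumptions, not a description of RavenPack's stack.

```python
# Minimal sketch: transformer-based summarization with Hugging Face
# transformers. The model and the input text are illustrative only.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

article = (
    "Acme Corp said on Tuesday it would cut carbon emissions by 40 percent "
    "by 2030, bringing forward an earlier 2040 target after pressure from "
    "institutional investors and new EU disclosure rules."
)

# do_sample=False keeps the output deterministic
result = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```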

The techniques to exhaustively analyze all this data are also evolving, but they are not there yet. With each milestone, though, we'll progressively be able to tap nuanced information related to sustainability, labor trends, innovation or geopolitical issues that drive corporate strategy and market narratives. For example, comparing sustainability narratives from different sources can allow us to identify new trends or detect greenwashing with greater confidence. Bottom-up analysis can inform company-specific trends on any of these fronts, while a top-down aggregated approach can be used to assess global macro trends. We can combine the various sources and analytical approaches in countless ways to ultimately derive better insights.
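As a rough illustration of the bottom-up versus top-down distinction, the following sketch aggregates entity-level ESG sentiment scores both per company and per sector; all column names and values are invented for illustration.

```python
# Hypothetical sketch: the same entity-level ESG sentiment scores feed
# a bottom-up (per company) view and a top-down (per sector) view.
# Columns and values are invented for illustration.
import pandas as pd

scores = pd.DataFrame({
    "company": ["AAA", "BBB", "CCC", "DDD"],
    "sector":  ["Energy", "Energy", "Tech", "Tech"],
    "esg_sentiment": [-0.4, 0.1, 0.3, 0.5],  # e.g. news-derived scores
})

# Bottom-up: company-specific trends
company_view = scores.set_index("company")["esg_sentiment"]

# Top-down: aggregate the same scores into a sector-level macro view
sector_view = scores.groupby("sector")["esg_sentiment"].mean()

print(company_view)
print(sector_view)
```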




Alternative data is still considered a source of alpha for many – what roadblocks do firms tend to come across in sourcing, cleaning, and using this data? How do you view the alt data market at present?

Alt data is commonly referred to as the new oil. But building an infrastructure to process data efficiently and in a timely manner requires heavy investment, as data scientists usually spend at least 80% of their time scrubbing the data, and even more so when it comes to textual data. The alpha capture race has become particularly competitive as institutional investors get more experienced, consuming more and more diverse sources (e.g. machine-readable news, SEC filings, transcripts, geolocation and satellite data) while excavating shorter-lived predictive signals. As a result, we have seen more and more pure players across the investment industry adopting a holistic, tech-oriented business model, integrating data science teams while investing heavily in artificial intelligence.

Sourcing content has become a full-time job. Many firms have invested heavily in building out Data Strategy teams that not only identify interesting (alternative) data to consume, but also evaluate or even improve its quality before the data is shown to internal clients. This means that Data Scientists are becoming directly involved in data strategy at a very early stage. This is especially true on the ESG scene, where most players have been adding dedicated ESG resources to their teams. There has also been a lot of consolidation in the market through a very dynamic acquisition spree. The bigger alt data vendors are going more direct and looking to leverage their own brands and client relationships by acquiring smaller players.

For alt-data vendors, it can be challenging to maintain a high-quality data offering. To be relevant for clients and users, the data needs to be maintained in a point-in-time fashion, and entities need to be tracked, captured and mapped as efficiently and painlessly as possible. The process of data collection needs to be extremely robust to ensure the reliability of timestamps and attached analytics.
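To illustrate what "point-in-time" means in practice, here is a minimal sketch using a pandas as-of join, so that each evaluation date only sees data published on or before it (avoiding lookahead bias). The data and column names are invented for illustration.

```python
# Sketch: point-in-time lookups with an as-of join. Each evaluation
# date only sees the latest value published on or before that date,
# which avoids lookahead bias. Data and columns are illustrative.
import pandas as pd

ratings = pd.DataFrame({
    "published": pd.to_datetime(["2022-01-10", "2022-02-15", "2022-03-20"]),
    "ticker": ["AAA"] * 3,
    "esg_score": [55, 61, 48],
}).sort_values("published")

dates = pd.DataFrame({
    "asof": pd.to_datetime(["2022-02-01", "2022-03-01", "2022-04-01"]),
    "ticker": ["AAA"] * 3,
}).sort_values("asof")

# merge_asof picks the latest score published at or before each date
pit = pd.merge_asof(dates, ratings,
                    left_on="asof", right_on="published", by="ticker")
print(pit)
```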

There is also a misconception that alt-data needs to be about “new data”. This is not entirely true. The underlying data can be static, but the insights that we derive from it can change, as we develop better models and smarter ways to extract actionable signals from it.


ESG and sustainable investing remain a major topic, with many funds listening to investors' demands and letting those demands shape their portfolio management. How do you see this progressing in the coming years?

Given the direction that regulation is taking and the momentum in the adoption of ESG standards, we can expect the sustainability movement to go truly mainstream. In a few years, it will be necessary for fund managers to have at least some degree of ESG offering from a product perspective. We can also expect a strong push for more transparency in the way managers integrate ESG into their investment framework, as well as stricter reporting guidelines and requirements.


What tools are currently available to consume ESG data points, and how will they change in the future?

There are a lot of emerging datasets available in the data market, serving a variety of applications. We can distinguish two types of dataset available today:

- Generalist datasets: high-level aggregated ratings, typically analyst ratings.
- Specialist datasets: more targeted datasets that focus on one specific dimension of ESG.

At RavenPack, we have a specialized solution. Building on our 20 years of expertise in natural language processing (NLP) and news analytics, we have a unique ability to identify and track ESG controversies in the news.

We anticipate two sources of change in the near future:

First, investors currently at the forefront of ESG are starting to experience the limitations of traditional ratings, and we believe that the adoption of new specialist datasets will accelerate. This means that investors will progressively move away from ratings and focus on fundamental data, targeted to specific use cases.

Second, we can expect regulation to take over and make the disclosure of ESG data mandatory.

We have seen the first signs of this in Europe, with the SFDR regulation setting strong requirements on how funds can be labeled as "ESG". For now, most of the pressure is carried by investors, but ultimately, we can expect that there will be legislation in place to make the corporate disclosure of ESG data mandatory.


What are the challenges that businesses have in measuring and analyzing data around ESG?

There is a general lack of definition and transparency around the concepts surrounding ESG. Data providers use different frameworks that may not be consistent across datasets, which makes comparing ESG scores from different sources a challenging exercise. Moreover, most high-level ratings are too broad in scope and fail to capture precise nuances. In addition, the more granular ratings lack the required transparency: they are mostly analyst ratings, which can be biased and are largely based on opinion.
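One common, if crude, way to make scores built on different frameworks comparable is to rank-normalize each provider's scores within its own universe. The sketch below uses made-up numbers to show the idea; disagreement that survives normalization hints at genuine framework differences.

```python
# Hypothetical sketch: two providers score the same companies on
# different scales, so raw scores are not directly comparable.
# Percentile ranks put them on a shared footing. Numbers are made up.
import pandas as pd

provider_a = pd.Series({"AAA": 72, "BBB": 55, "CCC": 90})    # 0-100 scale
provider_b = pd.Series({"AAA": 2.1, "BBB": 4.5, "CCC": 3.0}) # 1-5 scale

# Percentile rank within each provider's own universe
rank_a = provider_a.rank(pct=True)
rank_b = provider_b.rank(pct=True)

print(pd.DataFrame({"provider_a": rank_a, "provider_b": rank_b}))
```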

Today, most fundamental ESG data points are voluntary disclosures from companies, and the numbers reported may be calculated using inconsistent frameworks. Reported numbers may not be comparable apples-to-apples from one company to the next, and reported numbers for the same company may not be consistent over time. Mapping can also be challenging. Most ESG data sources are semi-structured, and attaching data points to a useful system of tickers is always problematic.
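As a toy illustration of the mapping problem, the sketch below fuzzily matches free-text company names against a hypothetical ticker master list using Python's standard-library difflib. Real entity mapping must also handle aliases, renames and point-in-time identifiers.

```python
# Sketch: mapping free-text company names from a semi-structured ESG
# source onto a ticker master list. Names and tickers are made up;
# production entity mapping is far more involved.
import difflib

ticker_master = {
    "Acme Corporation": "ACME",
    "Globex Industries": "GLBX",
    "Initech Holdings": "INIT",
}

def map_to_ticker(raw_name, cutoff=0.6):
    """Return the ticker of the closest-matching master name, or None."""
    lowered = {name.lower(): tic for name, tic in ticker_master.items()}
    match = difflib.get_close_matches(raw_name.lower(), list(lowered),
                                      n=1, cutoff=cutoff)
    return lowered[match[0]] if match else None

print(map_to_ticker("ACME Corp."))       # -> ACME
print(map_to_ticker("Unknown Widgets"))  # -> None
```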


Any concluding remarks?

Sustainability-focused investors are at a turning point. Sustainability, by definition, is concerned with the future and the long-term effects of our actions. NLP will enable practitioners to identify trends in the ESG world that may otherwise have been missed, and it can help us move beyond traditional short-term market analysis to a broader understanding of those actions' effects on society at large. Investors will need to make the transition from qualitative indicators to measurable data points, and using NLP is a good way to start. True success will lie in getting a handle on what's contained in ESG reports, what's going on with companies, and whether they're actually delivering on their ESG goals. More and more textual data is being generated on corporate ESG practices, from corporate sustainability disclosures to reports from NGOs, sell-side analysts and industry experts. A lot of work is going into finding ways to make sense of the data coming from these diverse sources, and we think this trend will only continue as the potential benefits of incorporating NLP into ESG investing become clearer.


Thank you, Peter, for joining us today and answering our questions. We are looking forward to hearing more at Quant Strats New York.

Did you know Peter is speaking at the Quant Strats event on May 5th? Don't miss him on the keynote panel 'Identifying unexplored sources of alpha within underutilized assets for increased performance and returns', where he will be exploring the technology, tools and application of quantitative investing within digital assets and other alternative assets alongside Sameer Gupta, Head of Data Solutions, Point72; Gilbert Haddad, Head of Investment Decision Science, Fidelity Investments; Vlasios Voudouris, Chief Data Officer, Argus Media; and Carlos Gomez, Chief Investment Officer, Belobaba Crypto Asset Fund. Find out more by downloading the agenda today.

