Data lineage is an increasingly critical capability in data management. What do treasurers need to know as they move closer to the role of data scientist? We ask an expert.
Data lineage pinpoints the origins of data, what happens to it during processing and how it evolves over time. For Mark Hermeling, CTO of Asset Control (a provider of data management software solutions to banks, broker-dealers, hedge funds and investment managers), the ability to track and visualise data lineage is increasingly important in managing regulatory compliance and financial reporting across a wide range of sectors.
“Data lineage is becoming hard-wired in regulations and in data quality frameworks,” notes Hermeling. “Ultimately this is all related to the need for ‘explainability’. If a bank, for example, values a position at US$25m, it might need to explain why it is valued at that amount, how it came to that decision and what data points it used in arriving at that valuation. All this context and more may need to be tracked.”
Treasury transparency
Corporate treasurers and finance staff have a major role to play in ensuring that the process of reporting and consolidating financial data is consistently transparent and explainable – and ultimately the ability of the business to achieve visibility of its data will be key to this.
However, says Hermeling, treasury departments and other finance operations can also use data lineage and the related concept of ‘explainability’ in other contexts. When changes are made to existing processes, for example, data lineage can be used in diagnostics to improve data quality. It can also be key for data licensing. If an organisation is licensing third-party content and therefore has to abide by the associated restrictions, it will need to know what data is a derivative of another piece of data.
Finally, data lineage can be key to managing client records better and to taking greater care over where client data is moved and how it is used in future. “After all,” adds Hermeling, “if a business client, under GDPR, demands that their data be expunged from the records, the organisation will need to know where all of that customer’s details have ended up in order to be able to achieve this.”
Implementation essentials
To do all this efficiently and well, organisations will effectively need to implement two different kinds of data lineage: horizontal and vertical.
Horizontal data lineage traces the journey of a specific item of data as it moves through the system from source to destination, typically across systems and reports.
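One way to picture horizontal lineage is as a graph of hops between systems. The following is a minimal sketch only: the system names and the `HOPS` list are hypothetical, and a real deployment would draw this information from its own metadata store rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical hops for one data item: (from_system, to_system).
HOPS = [
    ("vendor_feed", "staging"),
    ("staging", "security_master"),
    ("security_master", "risk_report"),
    ("security_master", "treasury_dashboard"),
]

def build_index(hops):
    """Index each system's immediate upstream and downstream neighbours."""
    upstream, downstream = defaultdict(list), defaultdict(list)
    for src, dst in hops:
        upstream[dst].append(src)
        downstream[src].append(dst)
    return upstream, downstream

def trace(start, neighbours):
    """Return every system reachable from start, in breadth-first order."""
    seen, queue = [], [start]
    while queue:
        node = queue.pop(0)
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen

upstream, downstream = build_index(HOPS)
origins = trace("risk_report", upstream)          # where did the report's data come from?
destinations = trace("vendor_feed", downstream)   # where did the vendor's data end up?
```

Tracing upstream answers the 'where did this come from' question, while the same index traced downstream shows everywhere a given source has ended up, which is the view needed for an erasure request.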
Vertical data lineage, in contrast, describes the transformations that happen to a piece of data on that journey. It could be an element feeding into a calculation: one of the sources of a bond curve calculation, for example. And the lineage in this case would be to ‘go back’ from the bond curve and see what individual bonds formed part of the input at a specific point in time.
“In short, horizontal data lineage traces data back to the original source, while vertical data lineage reverse-engineers the transformations that happen along the way, whether they are simple processes like cross-referencing, or tracking the different taxonomies that exist for financial instruments or industry classifications,” says Hermeling.
Often, in order to compare like for like, the organisation might want to express an issuer or counterparty within the same taxonomy. So, for example, if one taxonomy labels a segment ‘IT’ and another calls the same segment ‘computer systems’, it might want to ensure that the same label is used for both.
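The taxonomy example can be sketched as a simple cross-reference table. The taxonomy names, the `CROSSWALK` table and the common label ‘Information Technology’ are all assumptions made for illustration; the point is only that the original label is kept alongside the mapped one, so the transformation can be explained later.

```python
# Hypothetical cross-reference table:
# (source taxonomy, source label) -> common in-house label.
CROSSWALK = {
    ("taxonomy_a", "IT"): "Information Technology",
    ("taxonomy_b", "computer systems"): "Information Technology",
}

def to_house_label(taxonomy, label):
    """Map a provider's label to the common label, keeping the original for lineage."""
    key = (taxonomy, label)
    if key not in CROSSWALK:
        raise KeyError(f"no mapping for {label!r} in {taxonomy}")
    return {
        "house_label": CROSSWALK[key],
        "source_taxonomy": taxonomy,
        "source_label": label,
    }

a = to_house_label("taxonomy_a", "IT")
b = to_house_label("taxonomy_b", "computer systems")
same_segment = a["house_label"] == b["house_label"]
```

Raising an error on an unmapped label, rather than passing it through, is a design choice here: it surfaces gaps in the cross-reference table instead of letting inconsistent labels reach a report.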
Meeting the challenge
“The specific challenge in terms of the organisation’s ability to reverse engineer is that it will need to keep track both of its input data sources and their value at the time the transformation took place,” says Hermeling. This includes all the calculation parameters that fed into the calculation and their value at the time it was done, and the algorithm that was used.
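As a rough illustration of what keeping track might involve, the sketch below stores the inputs, parameters and algorithm alongside each calculated value. The `LineageRecord` structure, the bond identifiers and the toy averaging calculation are hypothetical and stand in for whatever a firm's real curve-building logic does.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class LineageRecord:
    output_name: str
    output_value: float
    inputs: dict        # input identifier -> value used at calculation time
    parameters: dict    # calculation parameters as they were at calculation time
    algorithm: str      # name and version of the algorithm used
    calculated_at: datetime

def average_yield(bond_yields, parameters):
    """Toy 'curve point': the mean yield of the input bonds, rounded."""
    value = round(sum(bond_yields.values()) / len(bond_yields), parameters["decimals"])
    record = LineageRecord(
        output_name="curve_point_5y",
        output_value=value,
        inputs=dict(bond_yields),     # copy, so later updates do not alter the record
        parameters=dict(parameters),
        algorithm="average_yield v1",
        calculated_at=datetime.now(timezone.utc),
    )
    return value, record

yields = {"BOND_A": 3.10, "BOND_B": 3.30}
value, record = average_yield(yields, {"decimals": 2})
yields["BOND_A"] = 3.50  # a later market update leaves the stored inputs untouched
```

Because the record holds copies of the inputs as they stood when the calculation ran, it is possible to ‘go back’ from the output to the individual bonds and values that produced it, even after the live data has moved on.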
Horizontal data lineage is much more focused on keeping track of the data that the business has consumed, where it subsequently went and who touched it on its journey. The objective is to trace the journey of the data upstream, whereas vertical data lineage involves the ability to reverse-engineer the manipulations that happened to the data in the past.
“Most enterprises and their treasury departments may know that their audit trail is broken within a specific application,” comments Hermeling, “but they don’t typically have an overarching view or the ability to follow data around on its journey across the organisation or to efficiently document it.”
To address the data lineage challenge, firms need ‘bi-temporality’, so that they know the value of the data and the associated business logic at the time the calculation took place. They need to be able to track metadata and keep cross-reference tables between different taxonomies and classification schemes up to date. They also need a clear administrative process detailing who can access data, where it goes, where it came from and what its sources were. Moreover, they need a sourcing hierarchy, so that the process of looking at data sources is clearly documented and accessible to everybody who needs it.
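Bi-temporality is commonly implemented by storing two dates against every value: the date it applies from and the date the firm recorded it. The sketch below assumes that layout; the `HISTORY` rows and the `as_of` helper are hypothetical and are not drawn from any particular product.

```python
from datetime import date

# Hypothetical history for one price. Each row: (valid_from, recorded_on, value).
# valid_from  = the business date the value applies from
# recorded_on = the date the firm's systems learned of it
HISTORY = [
    (date(2024, 1, 1), date(2024, 1, 1), 100.0),
    (date(2024, 2, 1), date(2024, 2, 1), 102.0),
    (date(2024, 1, 1), date(2024, 3, 1), 101.0),  # late correction to the January value
]

def as_of(history, valid_on, known_on):
    """Return the value applying on valid_on, as the firm knew it on known_on."""
    candidates = [
        row for row in history
        if row[0] <= valid_on and row[1] <= known_on
    ]
    if not candidates:
        return None
    # Prefer the latest applicable business date, then its latest recorded version.
    return max(candidates, key=lambda row: (row[0], row[1]))[2]

used_in_february_run = as_of(HISTORY, date(2024, 1, 15), date(2024, 2, 10))
current_view = as_of(HISTORY, date(2024, 1, 15), date(2024, 3, 15))
```

Here a calculation run on 10 February would have used 100.0 for mid-January, whereas the same question asked on 15 March returns the corrected 101.0. Being able to answer both is what allows a past valuation to be explained as it was made at the time.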
“It’s a complex undertaking overall,” says Hermeling, “but those organisations, and, more specifically, treasury departments that understand the requirement and can put the right combination of processes and technology in place to support it will be best placed both to meet data lineage requirements and gain competitive edge.”