The Financial Conduct Authority (FCA) recently explored the use of synthetic data in financial services. The initiative, launched in March, was aimed at incumbents and start-ups and sought industry views on the potential for synthetic data to boost innovation in finance, as well as its possible risks and limitations. Synthetic data is artificial data created by algorithms. One of its most infamous forms is the ‘deepfake’, which presents artificial information as if it were real. Synthetic data is generated by studying the patterns and statistical properties of real data; algorithms then reproduce those patterns in a synthetic dataset that mimics real-world information. Its main advantage over real-world data is that it can be used without identifying specific people. As long as no person can be identified within the synthetic data, data-protection measures do not apply.
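To make the idea of learning statistical properties and then reproducing them more concrete, here is a minimal sketch in Python. It is not how the FCA or any particular vendor generates synthetic data: it assumes just two numeric columns that are roughly normally distributed, and the function names (`fit_gaussian`, `sample_synthetic`) and the "income" and "monthly spend" figures are invented purely for illustration. Production tools use far richer models and add explicit privacy safeguards.

```python
import math
import random
import statistics


def fit_gaussian(rows):
    """Learn the means, standard deviations and correlation of two numeric columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    std_x, std_y = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (len(rows) - 1)
    return {"mean": (mean_x, mean_y), "std": (std_x, std_y), "corr": cov / (std_x * std_y)}


def sample_synthetic(params, n, rng):
    """Draw n rows from the fitted model instead of copying any real row."""
    (mx, my), (sx, sy), rho = params["mean"], params["std"], params["corr"]
    # Guard against tiny negative values caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho ** 2))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + residual * z2)  # keeps the learned correlation
        rows.append((x, y))
    return rows


# Invented stand-in for "real" customer data: (income, monthly spend).
rng = random.Random(42)
real_rows = []
for _ in range(500):
    income = rng.gauss(40_000, 8_000)
    real_rows.append((income, 0.05 * income + rng.gauss(0, 200)))

params = fit_gaussian(real_rows)
synthetic_rows = sample_synthetic(params, 5_000, rng)

print("real correlation:     ", round(params["corr"], 3))
print("synthetic correlation:", round(fit_gaussian(synthetic_rows)["corr"], 3))
```

The synthetic rows follow the same overall trend as the originals, yet each one is a fresh draw from the fitted model rather than a record belonging to a customer. Whether that is enough to rule out re-identification depends on the model and the data, which is why the sketch should be read as an illustration of the principle only.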
As companies focus more on data-driven business strategies, the opportunities to use analytics to generate valuable insights from business and customer data continue to grow. However, as more data is integrated within a company, the privacy risk, and the controls required to manage personal information, increase with it. In the finance industry, the bulk of customer data is considered highly sensitive. This is where synthetic data offers an opportunity for financial businesses. Synthetic data is a privacy-preserving approach that fabricates information in a way that replicates the trends found in ‘real’ data sets. A synthetic data set can stand in for the real one, so that insights are drawn from the synthesised data while the privacy rights that could be compromised by using the real data set remain protected.
With many data analysis techniques, there is a risk that information can be connected to a person; properly generated synthetic data is designed to avoid this risk. In the finance industry, synthetic data is used as test data for new products, for model validation and for AI training. The FCA has emphasised that many challenges in today’s AI industry stem from a lack of data, from datasets that are too small, or from the difficulty of accessing data without potentially breaching privacy rights. In a recent consultation, the FCA explained that historical data can often be biased and unrepresentative, and that algorithms based on this information will replicate these biases. Synthetic data could provide a solution to these problems.
Aside from addressing data privacy concerns, the technology can fill specific gaps where the required data is scarce or does not exist. Synthetic information can be used to create realistic but uncommon scenarios, for example to support risk management within financial services.
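As a rough illustration of how rare scenarios might be synthesised, the Python sketch below fits a distribution to a year of daily returns and then samples only from its worst 1% tail. Everything in it is an assumption made for the example: the returns are invented, the helper names (`fit_returns`, `sample_tail_scenarios`) are hypothetical, and the normal distribution is a deliberate simplification, since real market returns tend to have fatter tails.

```python
import random
from statistics import NormalDist


def fit_returns(returns):
    """Fit a normal distribution to observed returns (a simplifying assumption)."""
    return NormalDist.from_samples(returns)


def sample_tail_scenarios(dist, n, tail_prob, rng):
    """Draw n synthetic returns from the worst `tail_prob` share of the fitted distribution."""
    scenarios = []
    for _ in range(n):
        # Inverse-CDF sampling restricted to the lower tail. The small floor
        # keeps the argument strictly above 0, as inv_cdf requires.
        u = rng.uniform(1e-12, tail_prob)
        scenarios.append(dist.inv_cdf(u))
    return scenarios


# Invented daily returns standing in for roughly one year of observations.
rng = random.Random(7)
observed = [rng.gauss(0.0005, 0.01) for _ in range(250)]

dist = fit_returns(observed)
stress = sample_tail_scenarios(dist, 1_000, 0.01, rng)

print("observed days at or below the 1% cut-off:",
      sum(r <= dist.inv_cdf(0.01) for r in observed))
print("synthetic stress scenarios generated:", len(stress))
```

A year of history contains only a handful of days this bad, whereas the fitted model can supply as many as a stress test needs. The quality of those scenarios is only as good as the model behind them, which is one reason the question of trust matters.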
Synthetic data could help resolve the tension between emerging technologies and the restrictions on what production data can be used. Many financial businesses operate expensive processes to control the risk of privacy and data-protection breaches.
When applied correctly, using synthetic data for analytics removes much of the risk of a breach, making it a major mitigating factor in managing privacy risk. Freed from these operational overheads, the marginal cost of analytics falls considerably, enabling companies to scale their analytical ambitions and accelerate innovation.
Synthetic data could also widen access to data assets across the finance industry, for incumbents and new businesses alike. As the FCA has reported, access to an individual’s data is possible through consent processes, but developing new technologies requires broader access to large data sets.
A key barrier to the adoption of synthetic data is trust: whether the data is an accurate enough representation of reality to generate valuable insights. There is an opportunity here for regulators to support and promote the integration of synthetic data through a transparent, standardised framework. The FCA has shown interest in possibly taking on the role of synthetic data regulator to manage the potential challenges. An FCA-approved standard would enable businesses to take their data and create a synthetic dataset to apply to their projects. This approach would drive greater adoption of synthetic data by increasing trust that it is representative, while compliance risk would be managed by ensuring that synthetic data meets regulator-defined criteria.
Further collaboration with other regulators will also be critical to creating additional standards for producing synthetic data from a business’s information. Without such standards, wide-scale adoption would struggle, because delivering bespoke synthetic datasets would require significant investment.