In recent years, some of the biggest advances in diagnostic healthcare have come from leveraging the power of big data. Supplied with enough information, machine learning models can identify and diagnose many types of disease, sometimes with greater accuracy than experienced medical practitioners. This technology is powerful because it can overlay millions of data points, learn, and become increasingly accurate.
But until recently, there has been a dilemma. Machine learning models are hungry. They need a lot of data… and the more data they get, the more accurate they become. Patient data, however, is personal, sensitive and subject to specific processing conditions. The healthcare sector must therefore take preventive measures against data breaches, which complicates data alliances. So, potentially life-saving data is being siloed – with thousands of hospitals across the world protecting patient records to maintain confidentiality.
All of this is good for patient privacy. But the approach prevents machine diagnosis from becoming more accurate and generating better outcomes. Which is where a concept called ‘federated learning’ comes in. It introduces a way for models to be trained – without the need to store the data in a centralised location.
The approach was initially proposed by Google to overcome data protection issues around machine learning for mobile devices. Google needed to feed its algorithms with enough data to improve functionality including speech recognition, text entry and photo selection. But the tech giant had to find a way to train machine learning models, without transmitting personal information like passwords, photos, messages and URLs.
Using federated learning, each client or device computes updates to the current global model locally, and only encrypted updates are communicated to the server.
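This update cycle can be sketched in a few lines. The example below is a minimal, illustrative federated-averaging round for a toy one-parameter linear model: each "client" computes a weight update on its own private data, and only the updated weights (here unencrypted, for simplicity – a real deployment would encrypt or securely aggregate them) are averaged into a new global model. All names and datasets are hypothetical.

```python
def local_update(w, data, lr=0.05):
    """One local training step: a gradient-descent update for a
    toy 1-D linear model y = w * x, using mean squared error.
    Only the resulting weight leaves the client, never the data."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def federated_round(global_w, client_datasets):
    """Each client trains locally on its private data; the server
    averages the returned weights (federated averaging)."""
    client_ws = [local_update(global_w, d) for d in client_datasets]
    return sum(client_ws) / len(client_ws)

# Two hypothetical "hospitals" holding private samples of the
# same underlying relation y = 2 * x.
hospital_a = [(1.0, 2.0), (2.0, 4.0)]
hospital_b = [(3.0, 6.0), (4.0, 8.0)]

w = 0.0  # initial global model
for _ in range(50):
    w = federated_round(w, [hospital_a, hospital_b])
# w converges towards 2.0 without either hospital's raw data
# ever being shared.
```

The raw patient records never leave `hospital_a` or `hospital_b`; only the locally computed weights are exchanged and averaged, which is the core idea the paragraphs above describe.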
Handling the data in this way means that data protection laws – including the European Union’s General Data Protection Regulation (GDPR) – are complied with. This is because all personal data stays on the device – and only masked updates flow out to external servers. Further, there is no need to store the updates once they have improved the current model.
In a hospital setting, federated learning allows hospitals to efficiently train machine learning models using data related to different patients from different hospitals. Each hospital locally computes model updates, which are encrypted, communicated, aggregated and fed back into the hospital’s respective models.
In short, a hospital gets to benefit from the data of many other hospitals – without breaking a single data privacy law.
Maintaining privacy while still being able to learn from the data is clearly a huge advantage of this approach. But federated learning also suffers from the problem of data heterogeneity: different organisations and institutions collect and manage data in different ways, which can make it difficult to compare one data set with another. In this case, we can apply transfer learning techniques to address the data heterogeneity problem and increase the scale of available data. In brief, transfer learning applies knowledge and skills learned on earlier tasks to new, related tasks.
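The core mechanic of transfer learning – reusing what was learned on a large source task as the starting point for a small, related target task – can be sketched with the same toy linear model. Everything here is illustrative: the datasets, learning rates and the fine-tuning recipe are assumptions, not a specific published method.

```python
def train(w, data, lr, rounds):
    """Plain gradient descent for a toy 1-D linear model y = w * x."""
    for _ in range(rounds):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Pre-train on a large "source" dataset following y = 2 * x.
source = [(i / 10, 2 * i / 10) for i in range(1, 101)]
w_pretrained = train(0.0, source, lr=0.01, rounds=200)

# Fine-tune on a small, related "target" dataset (y = 2.2 * x).
# Starting from the pre-trained weight, a handful of steps on very
# little data is enough – the point of transfer learning.
target = [(1.0, 2.2), (2.0, 4.4)]
w_finetuned = train(w_pretrained, target, lr=0.05, rounds=20)
```

In practice the "weight" is a large pre-trained network rather than a single number, but the pattern is the same: initialise from the source task, then adapt on the target task's smaller dataset.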
For example, researchers have already used transfer learning to attempt to identify Alzheimer’s disease. They took a pre-trained machine learning model and fine-tuned it on a dataset of 3D MRI images, achieving accuracy rates that outperformed traditional approaches.
To give another example, suppose two hospitals treat different populations (e.g., older and younger citizens) and want to form a data alliance. Because the patient domains of the two hospitals differ, a machine learning model trained on one may fail on the other. Luckily, transfer learning techniques enable us to incorporate domain adaptation. Further, by using federated learning, models can be trained without potentially exposing sensitive data. In short, the combination of transfer and federated learning, which is called federated transfer learning (FTL), might offer a solution to our problem.
Since data regulations like the GDPR give more power to customers and less to organisations, organisations are incentivised to develop cybersecurity measures and improve data-protection awareness. As workarounds and manual processes for regulation compliance may become infeasible, organisations are obliged to develop smart ways to handle and learn from data in compliance with the regulations.
Federated learning has a huge potential to boost machine learning, in an environment where individual privacy is – quite rightly – heavily protected. This is true in medicine – and it is also true in a number of other fields.
In finance, it could be used to identify customer behaviour and fraud. In smart retail, it could improve understanding of shopper behaviour and needs. Indeed, federated learning could have a role to play in everything from education to edge computing – all while maintaining personal privacy.
The Centre for Financial Leadership and Digital Transformation conducts action-oriented research involving the finance function of tomorrow and also serves as a knowledge platform. Finance leaders who want to develop a competitive and effective finance department or accounting office, embrace the technology dimension, and stay up-to-date with the most recent technological developments impacting the finance function could benefit from our unique knowledge platform and research.