Healthcare data is being collected at unprecedented volume and speed (a 36% compound annual growth rate!). This opens the door to applying advanced data-driven modeling techniques to exciting applications like reliable disease diagnostics and rapid drug discovery.
However, because the healthcare system is complex, patient data is fragmented and distributed, usually across multiple organizations. Clinical institutions, public health agencies, and pharmaceutical companies often cannot access enough data, in either volume or diversity, to build high-performing models. Centralizing data across organizations is too risky: healthcare data is sensitive, and any data leak would be a serious privacy breach (not to mention a serious violation of HIPAA, GDPR, and other health privacy regulations).
Federated learning is a collaborative learning paradigm in which each participant trains a model on its own data and shares only model updates, never raw records, with a coordinating server that combines them. It has surged in popularity as a promising framework for learning advanced models across siloed data. In this post, we introduce several applications of federated learning in healthcare.
We focus on three particular use cases below and discuss how federated learning is applied to each.
Disease Diagnostics
Doctors diagnose diseases based on patterns they observe in sources such as symptoms, physiological signals, and medical images, and machine learning models have been developed to help them identify these patterns faster and more reliably. But such models usually require a high volume of training data, which is often not available to any individual institution.
Federated learning is becoming a popular solution to data limitations in diagnostics. Li et al. (2020) studied the identification of autism spectrum disorder (ASD) from rs-fMRI brain imaging time series distributed across four different sites, each with between 52 and 167 participants. Correlation vectors were extracted from the raw imaging data as predictive features, and a multi-layer perceptron (MLP) classifier was trained to distinguish ASD patients from healthy controls.
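To make "correlation vectors" concrete, here is a minimal sketch under stated assumptions: each subject's scan has already been reduced to one time series per brain region of interest (ROI), and the features are Pearson correlations between every pair of ROIs. The function names and toy dimensions are hypothetical, not taken from Li et al.'s code.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length series (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_features(roi_series: list[list[float]]) -> list[float]:
    """Flatten pairwise ROI correlations into one feature vector.

    roi_series[i] is the time series of region i. Only pairs (i, j) with i < j
    are kept, because the correlation matrix is symmetric with a diagonal of 1s.
    """
    return [pearson(roi_series[i], roi_series[j])
            for i, j in combinations(range(len(roi_series)), 2)]


if __name__ == "__main__":
    import random
    random.seed(0)
    # Hypothetical subject: 10 ROIs, 100 timepoints each.
    scan = [[random.gauss(0, 1) for _ in range(100)] for _ in range(10)]
    print(len(correlation_features(scan)))  # 45 = 10 * 9 / 2 ROI pairs
```

With n ROIs the vector has n(n-1)/2 entries, so even a modest brain parcellation yields thousands of inputs, which is one reason a downstream MLP can have many parameters to learn.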
This model had many parameters to learn, and no single site had enough patients to train it accurately on its own. The study compared a federated model, trained collaboratively on data from all four sites, with models trained separately within each single site. The results (you can see the details in Table 3 of the original paper) showed that the best single-site model achieved an accuracy of 0.695, while the federated model achieved 0.849 on the same test data, a substantial improvement.
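The federated-versus-single-site setup can be illustrated with federated averaging (FedAvg), a widely used aggregation scheme: each site runs a few local training steps on its own data, then a server averages the resulting parameters, weighted by each site's sample count. This is a minimal sketch under loud assumptions: it uses a logistic-regression model and synthetic site data rather than the paper's MLP and fMRI features, and it is not necessarily the aggregation Li et al. used. `local_update`, `fed_avg`, `train_federated`, and all hyperparameters are hypothetical.

```python
import math
import random

Example = tuple[list[float], float]  # (features, label in {0, 1})
Weights = list[float]                # one weight per feature, bias last


def predict(w: Weights, x: list[float]) -> float:
    """Logistic-regression probability of the positive class."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]  # zip stops before the bias
    z = max(-500.0, min(500.0, z))                    # avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def local_update(w: Weights, data: list[Example], lr: float = 0.1,
                 epochs: int = 5) -> Weights:
    """Full-batch gradient descent on one site's data; raw data never leaves the site."""
    w = list(w)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
            grad[-1] += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def fed_avg(site_weights: list[Weights], site_sizes: list[int]) -> Weights:
    """Server step: average site parameters, weighted by number of samples."""
    total = sum(site_sizes)
    return [sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
            for j in range(len(site_weights[0]))]


def train_federated(sites: list[list[Example]], dim: int, rounds: int = 20) -> Weights:
    global_w = [0.0] * (dim + 1)
    for _ in range(rounds):
        # Every site starts from the current global model and trains locally;
        # only the updated parameters are sent back for averaging.
        updates = [local_update(global_w, data) for data in sites]
        global_w = fed_avg(updates, [len(data) for data in sites])
    return global_w


if __name__ == "__main__":
    random.seed(0)

    def make_site(n: int) -> list[Example]:
        # Synthetic site: the label is 1 when the first feature is positive.
        rows = []
        for _ in range(n):
            x = [random.gauss(0, 1), random.gauss(0, 1)]
            rows.append((x, 1.0 if x[0] > 0 else 0.0))
        return rows

    sites = [make_site(n) for n in (30, 50, 80, 100)]  # hypothetical site sizes
    print([round(v, 2) for v in train_federated(sites, dim=2)])
```

Only parameters cross site boundaries here; production systems typically layer secure aggregation or differential privacy on top, since model updates can still leak some information about the training data.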
One interesting issue the paper discusses is data heterogeneity across sites, which the authors call "domain shift" and which raises barriers to collaboration. The authors proposed domain adaptation techniques to counter domain shift, so that sites do not need completely homogeneous data to collaborate, opening up new collaboration opportunities. Applications of federated learning in this area could greatly improve the accuracy and efficiency of disease diagnostics.
Drug Discovery
Understanding properties of compounds such as absorption, distribution, metabolism, and excretion (ADME) is critical in the early stages of drug discovery. Obtaining this information from chemical or biological experiments is expensive and slow, so pharmaceutical companies use a technique called quantitative structure-activity relationship (QSAR) analysis, which builds models that predict compound properties from inputs such as theoretical molecular descriptors.
Because compounds are complex, QSAR models are often complex and high-dimensional, requiring large volumes of training data. However, collecting training data is costly and time-consuming, and collaboration across institutions has been limited by concerns about sharing IP related to compounds (e.g., proprietary structures) and by other trust issues. Chen et al. (2020) investigated federated learning as the key to unlocking collaboration, letting entities build QSAR models jointly. The experiment results (see Figure 1 below) showed a significant improvement in prediction performance of the federated model over any single-client model trained only on private data. Federated learning opens up new opportunities for collaboration in the pharmaceutical industry and could help researchers discover drugs they could not have found on their own.
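A common way to ease the IP concern above is secure aggregation, in which the server learns only the sum of client updates, never any single company's update. The minimal sketch below shows the core idea of pairwise additive masking under strong simplifying assumptions: every pair of clients already shares a random seed, no client drops out, and masks are real-valued rather than drawn from a finite field. It illustrates a general technique and is not the mechanism used in FL-QSAR; `pair_seed` and `masked_update` are hypothetical names.

```python
import random


def pair_seed(i: int, j: int, round_id: int) -> str:
    """Hypothetical seed shared by clients i and j.

    Real protocols agree on this secret via key exchange. Here it is derived
    from the sorted client ids (so the server could recompute it, which means
    this toy offers no actual secrecy); it only shows how the masks cancel.
    """
    a, b = min(i, j), max(i, j)
    return f"{a}:{b}:{round_id}"


def masked_update(client_id: int, update: list[float], n_clients: int,
                  round_id: int = 0) -> list[float]:
    """Add pairwise masks that cancel out when all clients' vectors are summed."""
    masked = list(update)
    for other in range(n_clients):
        if other == client_id:
            continue
        # Both members of a pair seed identical generators, so they draw the same mask.
        rng = random.Random(pair_seed(client_id, other, round_id))
        mask = [rng.uniform(-1e3, 1e3) for _ in update]
        # The lower-id client adds the mask and the higher-id client subtracts it.
        sign = 1.0 if client_id < other else -1.0
        masked = [m + sign * r for m, r in zip(masked, mask)]
    return masked


if __name__ == "__main__":
    # Hypothetical model updates from three pharmaceutical clients.
    updates = [[0.1, -0.2], [0.3, 0.0], [-0.05, 0.4]]
    masked = [masked_update(i, u, len(updates)) for i, u in enumerate(updates)]
    server_sum = [sum(col) for col in zip(*masked)]
    print([round(s, 6) for s in server_sum])  # ≈ true sum [0.35, 0.2]
```

Each masked vector looks like noise on its own, yet the sum the server uses for averaging is exact up to floating-point error. Real protocols additionally handle client dropout and use modular arithmetic; this sketch does neither.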
Quality of Care
Hospital mortality ratio is often considered an important measure of patient safety and quality of care. There is an urgent need to understand which factors affect patient mortality, so that doctors can deliver more tailored care and treatment. Identifying these factors requires learning from large and diverse patient populations, but most hospitals have access only to their own patient data, which limits their ability to make accurate predictions.
Vaid et al. (2020) built federated learning models that use electronic health record (EHR) data to predict mortality within seven days of hospital admission in patients diagnosed with COVID-19. The study used data on COVID-19-positive patients from the Epic EHR systems of five hospitals within the Mount Sinai Health System (MSHS) in New York City, including patient demographics, past medical history, and admission vitals and labs (e.g., heart rate, respiration rate, glucose). Two types of models were used to predict mortality: a multilayer perceptron (MLP) and L1-regularized logistic regression.
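For readers unfamiliar with the second model: L1 regularization adds a penalty λ‖w‖₁ to the logistic loss, which drives the weights of uninformative features to exactly zero and so doubles as feature selection, a useful property when the goal is to see which factors affect mortality. The sketch below fits such a model with proximal gradient descent (a gradient step on the logistic loss followed by soft-thresholding). The solver, function names, synthetic data, and the value of `lam` are illustrative assumptions, not details reported by Vaid et al.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|: shrink toward zero, clipping small values to 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X: list[list[float]], y: list[float], lam: float = 0.05,
                    lr: float = 0.5, iters: int = 500) -> tuple[list[float], float]:
    """Minimize mean logistic loss + lam * ||w||_1 (the bias is not penalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth loss, then the L1 proximal step on w only.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    random.seed(0)
    # Synthetic patients: only the first of three features drives the outcome.
    X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
    y = [1.0 if xi[0] + 0.3 * random.gauss(0, 1) > 0 else 0.0 for xi in X]
    w, b = fit_l1_logistic(X, y)
    print([round(v, 3) for v in w])  # noise features typically shrink to 0.0
```

Larger values of `lam` zero out more features; in practice the penalty strength would be tuned, for example by cross-validation.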
The authors compared the performance of federated models with that of local models, measured by AUC-ROC, and found that the federated models outperformed the local models at most sites. Federated learning makes it possible to deliver safer, more effective, and more personalized care by giving healthcare providers a better understanding of the risk factors that inform treatment decisions.
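AUC-ROC has a convenient interpretation that makes site-by-site comparisons intuitive: it is the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counting as half. The minimal sketch below computes it directly from that definition; the function name and the example scores are made up for illustration and do not come from the study.

```python
def auc_roc(labels: list[int], scores: list[float]) -> float:
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5.

    This pairwise loop is O(n_pos * n_neg): fine for a sketch, though
    rank-based formulas scale better to large cohorts.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical held-out patients at one site (1 = died within seven days).
    labels = [1, 0, 0, 1, 0, 1]
    local_scores = [0.6, 0.7, 0.2, 0.4, 0.5, 0.8]
    federated_scores = [0.7, 0.4, 0.1, 0.6, 0.65, 0.9]
    print(auc_roc(labels, local_scores), auc_roc(labels, federated_scores))
```

In this toy example the "federated" scores rank the patients who died above more of the survivors, so they earn the higher AUC; because AUC depends only on ranking, it is insensitive to how well the raw probabilities are calibrated.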
There are many more ways to use federated learning in healthcare. Here are a few more of our favorite papers that discuss some of these use cases:
- The future of digital health with federated learning by Rieke, Nicola, et al. (2020)
- Federated learning for smart healthcare: A survey by Nguyen, Dinh C., et al. (2021)
- Federated learning for healthcare informatics by Xu, Jie, et al. (2021)
Integrate.ai makes it easy and safe for healthcare organizations to train models across sites. Unlock groundbreaking health products and insights that improve patient outcomes, while protecting patient privacy.
References
Chen, Shaoqi, et al. "FL-QSAR: a federated learning-based QSAR prototype for collaborative drug discovery." Bioinformatics (2020): 5492-5498.
Li, Xiaoxiao, et al. "Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results." Medical Image Analysis (2020): 101765.
Nguyen, Dinh C., et al. "Federated learning for smart healthcare: A survey." arXiv preprint arXiv:2111.08834 (2021).
Rieke, Nicola, et al. "The future of digital health with federated learning." npj Digital Medicine (2020): 1-7.
Vaid, Akhil, et al. "Federated learning of electronic health records improves mortality prediction in patients hospitalized with COVID-19." medRxiv (2020).
Xu, Jie, et al. "Federated learning for healthcare informatics." Journal of Healthcare Informatics Research (2021): 1-19.