85% of data science projects will fail.
One of the top reasons for project failures is lack of access to the right data. This is a shame, because in most cases the right data exists; it just sits in data silos that data scientists cannot access. Federated learning solves problems with data access by enabling machine learning across data silos. But before investing in federated learning, you need to know when to use it.
Before we dive into thinking about when to use federated learning…
Let’s define what we mean by federated learning
In traditional machine learning, all data must be centralized in one database before training a model. In federated learning, models are trained on decentralized datasets - that is, the data resides in two or more separate databases and never needs to be moved. Portions of a machine learning model are trained where the data is located, and only the resulting model parameters (not the underlying records) are shared and combined to produce an improved global model. Because no raw data leaves its source, organizations can train models while complying with privacy regulations, protecting data security and sensitive IP, and avoiding the pain and expense of moving data.
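To make the idea of sharing parameters rather than data concrete, here is a minimal sketch of federated averaging (often called FedAvg), one common way to combine locally trained parameters. It is an illustration under stated assumptions, not a description of any particular product: the three silos, the synthetic data, the simple linear model, and every function name below are hypothetical.

```python
import random


def local_update(params, data, lr=0.05, epochs=5):
    """Train on one silo's own data, starting from the global parameters.

    Only the updated (w, b) pair is returned; the records stay in the silo.
    """
    w, b = params
    n = len(data)
    for _ in range(epochs):
        # Gradient of mean squared error for the model y = w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Combine silo parameters, weighting each silo by its number of records."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train_federated(silos, rounds=200):
    """Repeat: send global params to every silo, collect updates, average them."""
    params = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(params, data) for data in silos]
        params = federated_average(updates, [len(d) for d in silos])
    return params


def make_silo(n, x_low, x_high, rng):
    """Hypothetical silo: synthetic (x, y) records from y = 3x + 1 plus noise."""
    records = []
    for _ in range(n):
        x = rng.uniform(x_low, x_high)
        records.append((x, 3.0 * x + 1.0 + rng.gauss(0, 0.05)))
    return records


rng = random.Random(0)
# Each silo sees a different slice of the population (assumed for illustration).
silos = [
    make_silo(50, 0.0, 1.0, rng),
    make_silo(80, 1.0, 2.0, rng),
    make_silo(30, -1.0, 0.0, rng),
]
w, b = train_federated(silos)
print(f"global model: y = {w:.2f} * x + {b:.2f}")
```

In this sketch the only values that cross silo boundaries are the two numbers `w` and `b`. Real deployments typically add protections such as secure aggregation or differential privacy on top of this basic loop, since parameters alone can still reveal information about the training data.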
One important point to keep in mind is that there are two broad categories of federated learning.
- Federated learning at the edge - Federated learning was originally developed by Google as a way to train models across Android phone users (specifically, for their keyboard predictions) without needing to collect user data in the cloud, and it continues to be used as a way to train models across networks of thousands of edge devices.
- Federated learning for data silos - More recently, federated learning has been used to train models across decentralized datasets. We think this category is particularly exciting. For machine learning projects at risk of failure because they don’t have access to the right data, this can be a lifeline because it unlocks new data sources.
For the sake of this article, we will be focusing on federated learning across data silos, not across edge devices.
Now that you understand what federated learning is, how can you, as somebody working on delivering value from data, decide whether investing in it will accelerate your business goals?
5 steps to spot a federated learning use case
Look out for these five criteria to determine whether you have a project that will get value out of federated learning:
Step 1: You have a use case for machine learning
Think about anywhere your team has considered using machine learning, or is already using it today. If you’re in product or marketing, this could be personalization; in consulting or software, facilitating data consortia across clients; or in healthcare, better forecasting of patient outcomes.
Step 2: You can’t build a powerful enough machine learning model for your use case today
In some cases, your team may not be able to build any model. In others, they may be able to build a model, but it is not accurate enough to meet your business objectives. For a personalization engine, this could mean that your personalizations are not improving customer conversions; in data consortia, your model isn’t any more accurate than a model your clients could create on their own; and in healthcare, you haven’t reached your accuracy goal in forecasting patient outcomes.
Step 3: You can’t build that model because you don’t have enough data
Simply put, your team has squeezed as much value as possible out of available data. Model performance cannot improve unless there is more data for training the model.
Step 4: You know what data you would need to build a better model
Your data science team should have an idea about what data could improve the model. It may be that they need data about more individuals, or they need more data about individuals who are already in your dataset. Some teams also have biased datasets (where their data is not representative of the population as a whole) and are looking for more diverse data.
Step 5: The data you need resides in data silos
Even though you know this data exists, it may not be possible for your team to access this data for machine learning. This is a result of data silos or data fragmentation, and they exist for several reasons:
- Regulatory - Regulations like GDPR, PIPEDA, CCPA, or HIPAA restrict the usage of personally identifiable information (PII) or sensitive health information outside of its intended use or jurisdiction
- Contractual - Companies may already be sharing data (for example, when you integrate a database with a SaaS tool), but they only give consent for that data to be used for a very specific purpose
- Trust - Many companies are unwilling to share their data, due to the cost of structuring and enforcing data sharing agreements, the risk of leaking sensitive information, or the risk of exposing intellectual property
- Technical - Particularly in large organizations or those that have a history of mergers, legacy systems and distributed data architectures make it costly to centralize and maintain governance over data for machine learning
If your project meets all five of these criteria, then federated learning could mean new or better machine learning models, and new product and analytics opportunities.
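As a rough aid, the five criteria can be written down as a simple checklist. The sketch below is only one hypothetical way to record an assessment; the class, field, and function names are made up for illustration and do not come from any library.

```python
from dataclasses import dataclass, fields


@dataclass
class ProjectAssessment:
    """One yes/no answer per step (field names are illustrative)."""
    has_ml_use_case: bool          # Step 1
    model_falls_short: bool        # Step 2
    limited_by_data: bool          # Step 3
    needed_data_identified: bool   # Step 4
    needed_data_in_silos: bool     # Step 5


def unmet_criteria(assessment):
    """Return the names of the criteria the project does not meet."""
    return [f.name for f in fields(assessment) if not getattr(assessment, f.name)]


def is_federated_learning_candidate(assessment):
    """A project qualifies only when all five criteria are met."""
    return not unmet_criteria(assessment)


# Hypothetical project: the team has not yet pinned down which data it needs.
project = ProjectAssessment(
    has_ml_use_case=True,
    model_falls_short=True,
    limited_by_data=True,
    needed_data_identified=False,
    needed_data_in_silos=True,
)
print(is_federated_learning_candidate(project))
print(unmet_criteria(project))
```

Listing the unmet criteria, rather than returning only yes or no, shows which step to work on before revisiting the decision.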
Putting federated learning in context: helpful use cases
To help illustrate how federated learning is unlocking new opportunities that were never possible before, let’s look at a few examples of where federated learning is already being used today:
- Saving patient lives with better cancer diagnostics - Research hospitals across the globe rely on machine learning to identify cancer in MRI images, but due to healthcare privacy regulations, they aren’t allowed to combine their patients’ data to build more powerful machine learning models. With federated learning, they can take advantage of large, diverse, global datasets to train highly accurate machine learning models and improve their ability to identify cancer patients so that they can deliver life-saving treatments.
- Facilitating consortia between clients - Consulting companies and software companies have clients who lack sufficient data to build fraud risk models that rival the big banks, or personalization models that rival Amazon. With federated learning, these companies facilitate value exchange across their clients by building powerful global models and delivering offline reports, copies of the models, or access to APIs that deliver predictions in real time. The data remains on each client’s server, enabling clients to benefit from more powerful models without risking data security.
- Building trust to enhance franchisees’ marketing campaigns - Franchisees of a global brand use their data to train conversion models for marketing campaigns. Previous efforts to enter data partnerships to build more accurate models failed due to mistrust between the franchisees. With federated learning, franchisees can collaborate on data to build more powerful models without providing each other access to the raw data, enabling them to improve ad targeting and campaign ROI.
- Personalizing the customer experience in a new market - A meal kit subscription company relies on a personalized web experience to convert and retain customers. However, when entering a new geographic market, privacy regulations restrict them from using customer data from their other markets to build personalization models. With federated learning, the company can train a global model across geographies and deliver a personalized experience from the start, and then localize the model in the new market as they begin acquiring local customer data.
- Protecting against global financial fraud - Financial institutions lose trillions of dollars to fraud. They build sophisticated models to identify fraud as it happens, but the limitations of their own data restrict the power of their models, and fraudulent transactions fall through the cracks. Privacy regulations and a lack of trust between competing companies have prevented data collaboration. With federated learning, banks across the globe can collaborate to improve fraud detection and stop more fraudulent transactions before they occur.
Now it’s your turn to think about how federated learning can help accelerate your business. Where could you unlock new opportunities, if only your company could train machine learning models on more data?
integrate.ai enables you to unlock the full value of your users' private, siloed, and distributed data to do machine learning and analytics.