February 01, 2022 · Federated Learning

Federated learning: useful, slow, and not for most of you

Bhaskar Paratey
CEO & Founder

Federated learning is a clever idea that gets reached for by the wrong people about half the time. The idea: train a model by learning from a thousand devices without the raw data ever leaving any of them. When you have the problem it solves, nothing else will do. When you don't, you've just signed up for a distributed-systems headache to impress someone with the word "federated."

The mechanism is simpler than the marketing. The cloud holds a master model. Each device works out how that master would need to change to better fit its own local data — an update, not the data itself — and sends only the update up. The cloud averages thousands of these, applies the result, and ships the improved master back down. The raw data stays put. The owner of the fleet learns from everyone and sees no one.
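To make the averaging step concrete, here is a minimal sketch in plain Python. It assumes updates are flat lists of floats and that each device's update is weighted by how many local examples it trained on. That weighting is my assumption (it is the common choice), not something the description above requires. The names `federated_average`, `apply_update` and `server_lr` are illustrative, not any library's API.

```python
from typing import List, Sequence


def federated_average(updates: Sequence[Sequence[float]],
                      num_examples: Sequence[int]) -> List[float]:
    """Weighted mean of device updates, weighted by local example counts."""
    if not updates or len(updates) != len(num_examples):
        raise ValueError("need one example count per update")
    total = sum(num_examples)
    if total <= 0:
        raise ValueError("total example count must be positive")
    dim = len(updates[0])
    averaged = [0.0] * dim
    for update, n in zip(updates, num_examples):
        if len(update) != dim:
            raise ValueError("all updates must have the same length")
        for i, value in enumerate(update):
            averaged[i] += value * n / total
    return averaged


def apply_update(master: Sequence[float], averaged: Sequence[float],
                 server_lr: float = 1.0) -> List[float]:
    """Move the master model by the averaged update (hypothetical server step)."""
    return [w + server_lr * d for w, d in zip(master, averaged)]


# Two devices: the second trained on three times as much data,
# so its update counts three times as much.
master = [0.0, 1.0]
device_updates = [[1.0, -1.0], [3.0, 1.0]]
device_sizes = [1, 3]

averaged = federated_average(device_updates, device_sizes)
new_master = apply_update(master, averaged)
```

The server only ever touches `device_updates`, never the data that produced them. Everything hard about real deployments (stragglers, dropouts, compression) sits around this loop, not inside it.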

That's worth something in four situations, and you should be able to point at one of them before you start.

Privacy, the obvious case. A model can learn everyone's keyboard patterns without anyone's typing leaving their phone. That's how your autocorrect improves.

Regulation, the underrated case. GDPR, HIPAA and their cousins make it genuinely hard to move certain data across organisational lines. Federated learning sidesteps much of the problem because the raw records never cross those lines; only model updates do.

Rare events that need many sources. If you need a model good at a rare medical condition or an unusual industrial fault, you need examples from many sites. This pools the learning without pooling the data.

Collaboration between rivals. Two hospitals, two banks, two competitors who can't share raw records can still jointly improve a model. Fraud detection across banks is the standard example.

In the wild it's real, not vapour: Gboard predicts your next word from a model trained across millions of phones; Apple has used the same approach to tune Siri's voice recognition on-device; research consortia train diagnostic models across hospitals that legally can't share patient records.

Now the parts vendors skip, and these are the ones that decide whether you should touch it.

Privacy is not automatic. A model update can, in the wrong hands, leak information about the data behind it. Serious deployments pair federated learning with differential privacy, which adds calibrated noise, or secure aggregation, which encrypts updates so only the average is ever visible. If someone's selling you federated learning with neither, they either don't understand it or hope you don't.
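A toy sketch of both ideas, under loud assumptions. `clip_and_noise` shows the shape of the differential-privacy step (bound each update's norm, then add Gaussian noise), but `noise_std` here is an arbitrary number, not one calibrated to a privacy budget. `mask_updates` shows why pairwise masks leave only the sum visible, but it simulates every device's secrets from one random generator; a real secure-aggregation protocol derives masks from pairwise key exchange, encodes updates as fixed-point integers, and handles dropouts. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence

MODULUS = 2 ** 32  # masked arithmetic is done modulo this value


def clip_and_noise(update: Sequence[float], clip_norm: float,
                   noise_std: float, rng: random.Random) -> List[float]:
    """Scale the update to at most clip_norm (L2), then add Gaussian noise."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale + rng.gauss(0.0, noise_std) for v in update]


def mask_updates(updates: Sequence[Sequence[int]],
                 rng: random.Random) -> List[List[int]]:
    """Each pair (i, j) shares a random mask: i adds it, j subtracts it."""
    n, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(dim):
                m = rng.randrange(MODULUS)
                masked[i][k] = (masked[i][k] + m) % MODULUS
                masked[j][k] = (masked[j][k] - m) % MODULUS
    return masked


def sum_mod(vectors: Sequence[Sequence[int]]) -> List[int]:
    """Coordinate-wise sum modulo MODULUS, which is all the server computes."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) % MODULUS for k in range(dim)]


rng = random.Random(0)
quantised_updates = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # integer-encoded updates
masked_updates = mask_updates(quantised_updates, rng)

# Individually the masked vectors look like noise; summed, the masks cancel.
visible_sum = sum_mod(masked_updates)
```

The point of the second half: the server can compute `visible_sum` and divide by the device count, but no single entry of `masked_updates` tells it anything useful about one device.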

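It's slower. Convergence drags because you're averaging updates from a messy, uneven fleet of devices on flaky connections, and each device sees its own skewed slice of the data, so the updates pull in different directions. Centralised training on the same data would finish far sooner.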

Some devices lie. Train across user hardware and a fraction of it is hostile. Defending against poisoned updates is an open research problem, not a checkbox.
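One partial mitigation, sketched below, is to swap the plain mean for a coordinate-wise median, which a single wild update cannot drag far. I'm naming that technique as an illustration, not as a solution: it only helps while hostile devices are a minority, and subtler attacks get past it. The function names and the numbers are made up for the example.

```python
from statistics import median
from typing import List, Sequence


def coordinate_mean(updates: Sequence[Sequence[float]]) -> List[float]:
    """Plain average per coordinate: one extreme update can dominate it."""
    dim = len(updates[0])
    return [sum(u[k] for u in updates) / len(updates) for k in range(dim)]


def coordinate_median(updates: Sequence[Sequence[float]]) -> List[float]:
    """Median per coordinate: robust while outliers stay a minority."""
    dim = len(updates[0])
    return [median(u[k] for u in updates) for k in range(dim)]


honest_updates = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]]
poisoned_update = [1000.0, -1000.0]  # one hostile device
all_updates = honest_updates + [poisoned_update]

naive = coordinate_mean(all_updates)     # pulled hundreds of units off course
robust = coordinate_median(all_updates)  # stays near the honest updates
```

Note the trade: the median also throws away information from honest devices whose data is genuinely unusual, which is exactly the rare-event case federated learning is often bought for.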

Debugging is genuinely painful. When a federated model misbehaves you can't open the data and look. You infer from updates and from metrics you had the foresight to design. Teams underestimate this cost every time, without exception.

A middle path I've come to like: federated fine-tuning. Train the base model centrally on public or licensed data, then let each organisation fine-tune it on its own data, in a federated setup, without sending any of that data back. General knowledge in the cloud, specific knowledge at the edge. It's a sensible division of labour and it sidesteps a lot of the misery above.
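Here is a deliberately tiny sketch of that split. Everything in it is an assumption made for illustration: `base_model` stands in for a centrally trained network and is never modified, the "head" is just a scale and a bias on top of it, and the two sites hold three made-up data points each. Each site runs a few gradient steps on the head locally and returns only the change to those two numbers.

```python
from typing import List, Sequence, Tuple

Head = Tuple[float, float]          # (scale, bias) applied to the base output
Dataset = Sequence[Tuple[float, float]]  # (input, target) pairs held on site


def base_model(x: float) -> float:
    """Frozen stand-in for the centrally trained base model."""
    return 2.0 * x


def loss(head: Head, data: Dataset) -> float:
    """Mean squared error of the head on top of the frozen base."""
    a, b = head
    return sum((a * base_model(x) + b - y) ** 2 for x, y in data) / len(data)


def local_fine_tune(head: Head, data: Dataset,
                    lr: float = 0.01, steps: int = 5) -> Head:
    """Run a few local gradient steps; return only the change to the head."""
    a, b = head
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in data:
            f = base_model(x)
            err = a * f + b - y
            grad_a += 2.0 * err * f / len(data)
            grad_b += 2.0 * err / len(data)
        a -= lr * grad_a
        b -= lr * grad_b
    return (a - head[0], b - head[1])


def federated_round(head: Head, sites: Sequence[Dataset]) -> Head:
    """Average the head deltas from every site and apply them."""
    deltas: List[Head] = [local_fine_tune(head, data) for data in sites]
    mean_a = sum(d[0] for d in deltas) / len(deltas)
    mean_b = sum(d[1] for d in deltas) / len(deltas)
    return (head[0] + mean_a, head[1] + mean_b)


# Two organisations whose data follow y = 3x + 1; neither shares its points.
site_a = [(0.0, 1.0), (1.0, 4.0), (2.0, 7.0)]
site_b = [(1.0, 4.0), (2.0, 7.0), (3.0, 10.0)]

initial_head: Head = (1.0, 0.0)
head = initial_head
for _ in range(200):
    head = federated_round(head, [site_a, site_b])
```

Only two numbers per site cross the wire each round, which is part of the appeal: a small head means small updates, less to leak, and less to debug than federating the whole model.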

So before you commit, answer three questions honestly. Do you have a real data-residency or privacy constraint, or do you just like the word? Can you accept slower, messier training as the price of not moving the data? Do you have the engineering maturity to run a distributed system in production? Three yeses and federated learning may be exactly the right shape for your problem. Any no, and you almost certainly want centralised training with access controls you actually enforce.

Bhaskar Paratey
CEO & Founder

Bhaskar founded Partech Systems after three decades of building software that had to work the first time — newsroom systems at Reuters, case-management for government departments, and a long run of enterprise projects since. He started the company because he was tired of watching good technology fail for boring, human reasons. He writes here about where AI actually earns its keep, and where it doesn't.