Appendix E — Federated & Privacy-Preserving Recommendation

1. The idea: move the model, not the data

Federated learning inverts the usual flow. Instead of sending data to the model, you send the model to the data, and only model updates come back. One round of the standard protocol:

the server broadcasts the current global model \(w\) to a sample of client devices;
each client trains \(w\) for a few steps on its own local data (which never leaves the device), producing a local update;
clients send back only the updated parameters (or, equivalently, the gradients or parameter deltas), never the raw interactions;
the server aggregates the updates into a new global \(w\), and the loop repeats.

The raw data stays put; what travels is math. “Federated” is the apt word — many autonomous parties cooperate on a shared model under a coordinator, like a federation of states, without surrendering their local sovereignty (their data).
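The round described above can be sketched on a toy problem. This is a minimal illustration, not a production protocol: the one-parameter model \(\hat{y} = w x\), the `Client` class, the learning rate, and the step counts are all assumptions made for the example. The only values a client returns are its parameter delta and its example count.

```python
from dataclasses import dataclass


@dataclass
class Client:
    """A device holding private (x, y) pairs that never leave this object."""
    xs: list
    ys: list

    def local_update(self, w, lr=0.05, steps=5):
        """Train a local copy of w for a few steps; return (delta, n_examples)."""
        w_local = w
        n = len(self.xs)
        for _ in range(steps):
            # Gradient of the mean squared error for the model y_hat = w * x.
            grad = sum(2 * (w_local * x - y) * x
                       for x, y in zip(self.xs, self.ys)) / n
            w_local -= lr * grad
        return w_local - w, n


def federated_round(w, clients):
    """Broadcast w, collect (delta, n) from each client, aggregate by data size."""
    updates = [c.local_update(w) for c in clients]      # broadcast + local training
    total = sum(n for _, n in updates)
    return w + sum(n * d for d, n in updates) / total   # server-side aggregation


# Two hypothetical clients whose private data roughly follows y = 2x.
clients = [
    Client(xs=[1.0, 2.0, 3.0], ys=[2.1, 3.9, 6.2]),
    Client(xs=[1.0, 2.0], ys=[1.8, 4.1]),
]

w = 0.0
for _ in range(20):
    w = federated_round(w, clients)
```

After a handful of rounds `w` settles close to 2, although the server code never reads `xs` or `ys`.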

Figure E.1: **The federated-learning loop.** The server broadcasts the global model \(w\) to client devices; each client trains on its *own* data \(D_k\) (which never leaves the device) and returns only a parameter update \(\Delta_k\); the server combines them by **FedAvg** (§2). Across rounds a single shared model improves while the raw interactions stay distributed.

2. FedAvg: aggregate by a data-weighted average

The aggregation rule that started the field is Federated Averaging (FedAvg; McMahan et al., 2017): the new global model is the average of the clients’ local models, weighted by how much data each holds,

\[ w_{\text{global}} \;=\; \frac{\sum_k n_k\,w_k}{\sum_k n_k}, \]

where \(w_k\) is client \(k\)’s locally updated model and \(n_k\) is its number of local examples. The weighting matters: a client that trained on \(1000\) interactions should count more than one that saw \(5\).

Worked example — one FedAvg step. Two clients report a single parameter. Client 1 has \(n_1=30\) local examples and ends at \(w_1=0.8\); client 2 has \(n_2=10\) and ends at \(w_2=0.4\). The data-weighted global is \[ w_{\text{global}} = \frac{30(0.8) + 10(0.4)}{30+10} = \frac{24+4}{40} = \mathbf{0.7}. \] Note it lands at \(0.7\), pulled toward the data-rich client 1 — not the plain mean \(0.6\) you would get by ignoring the counts. And the server learned this without seeing a single raw interaction from either device: only \(w_1, w_2\) and the counts crossed the wire.
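The same arithmetic can be checked in a few lines. The function below is a minimal sketch of the weighted average in the formula above; the name `fedavg` and the list-of-lists representation of the models are choices made for this example.

```python
def fedavg(local_models, counts):
    """Data-weighted average of client parameter vectors.

    local_models: one equal-length list of floats per client (the w_k).
    counts: the number of local examples per client (the n_k).
    """
    total = sum(counts)
    dim = len(local_models[0])
    return [
        sum(n * w[j] for w, n in zip(local_models, counts)) / total
        for j in range(dim)
    ]


# The worked example: one parameter, two clients.
w_global = fedavg([[0.8], [0.4]], [30, 10])   # approximately [0.7]
plain_mean = (0.8 + 0.4) / 2                  # approximately 0.6, ignores the counts
```

With equal counts the two quantities coincide; they diverge only when clients hold different amounts of data.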

Figure E.2: **FedAvg is a data-weighted average (the §2 worked example).** Two clients report \(w_1{=}0.8\) (from \(n_1{=}30\) examples — large dot) and \(w_2{=}0.4\) (from \(n_2{=}10\) — small dot). A plain mean would sit at \(0.6\); weighting by data size pulls the global model to \(\mathbf{0.7}\), toward the data-rich client 1 — and no raw interaction ever leaves a device.

3. Federated recommendation — and its unique leak

Drop matrix factorization (Traditional Recommender Systems §5) into this loop and you get Federated Collaborative Filtering (FCF) (Ammad-ud-din et al., 2019): the item factors are the shared global model the server aggregates, while each user’s own factor vector stays on their device and is updated locally. A client computes gradients for the item factors from its private ratings and sends only those gradients up. The user’s taste vector — the most personal part — is never transmitted.
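A rough sketch of this split between shared and private parameters follows. It is a simplification under stated assumptions, not the FCF algorithm as published: it uses a plain squared-error loss on explicit ratings with gradient steps on both sides, and the names `FCFClient` and `server_apply`, the latent dimension, and the learning rate are all hypothetical.

```python
import random

K = 2  # latent dimension (illustrative)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class FCFClient:
    def __init__(self, ratings, seed):
        rng = random.Random(seed)
        self.ratings = ratings                                # private: {item_id: rating}
        self.p = [rng.uniform(-0.1, 0.1) for _ in range(K)]   # private user factor

    def step(self, Q, lr=0.02):
        """Update the on-device user factor; return item-factor gradients only."""
        errs = {i: dot(self.p, Q[i]) - r for i, r in self.ratings.items()}
        # Gradient of the squared error with respect to each rated item's factor.
        item_grads = {i: [2 * e * pf for pf in self.p] for i, e in errs.items()}
        # Gradient with respect to the user factor, applied locally and never sent.
        grad_p = [sum(2 * errs[i] * Q[i][f] for i in errs) for f in range(K)]
        self.p = [pf - lr * g for pf, g in zip(self.p, grad_p)]
        return item_grads


def server_apply(Q, uploads, lr=0.02):
    """Apply every client's item gradients to the shared item factors."""
    for grads in uploads:
        for i, g in grads.items():
            Q[i] = [q - lr * gf for q, gf in zip(Q[i], g)]


rng = random.Random(0)
Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(K)] for i in range(4)}
clients = [FCFClient({0: 5.0, 1: 1.0}, seed=1), FCFClient({1: 2.0, 3: 4.0}, seed=2)]


def total_loss():
    return sum((dot(c.p, Q[i]) - r) ** 2
               for c in clients for i, r in c.ratings.items())


q2_before = list(Q[2])          # item 2 is rated by nobody
loss_before = total_loss()
for _ in range(500):
    uploads = [c.step(Q) for c in clients]
    server_apply(Q, uploads)
loss_after = total_loss()
```

Note what each upload contains: gradients keyed by item id, and nothing derived from `self.p` beyond them. The factor of item 2 never changes, because no client ever sends a gradient for it.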

But recommendation has a leak generic federated learning does not, and it is worth seeing clearly: the very act of sending an update for item \(i\) reveals that the user interacted with item \(i\). The gradient is non-zero only for items the user touched, so the set of items you send updates for is itself private information. FedRec (Lin et al., 2021) addresses exactly this for explicit feedback with tricks like sampling extra “fake” items and hybrid filling of their ratings, so the server cannot tell the real interactions from the decoys. More generally, because raw gradients can still leak information, federated systems layer on secure aggregation (the server sees only the sum of updates, not any individual one) and differential privacy (calibrated noise added to updates) — at some cost in accuracy.
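The leak, and the decoy idea, can be made concrete with a small sketch. It is loosely modeled on the description above and is not FedRec's exact procedure: the helper names, the choice of four decoys, and the use of the user's mean rating as the filled-in value are assumptions for illustration.

```python
import random


def observed_items(update):
    """What a curious server can read off an update: the item ids it contains."""
    return set(update)


def pad_with_decoys(ratings, all_items, n_decoys, rng):
    """Extend a rating dict with sampled unrated items.

    Each decoy gets the user's mean rating as a filled-in value, so its
    gradient is non-zero like a real one. A simplification for illustration.
    """
    unrated = sorted(set(all_items) - set(ratings))
    decoys = rng.sample(unrated, min(n_decoys, len(unrated)))
    mean_rating = sum(ratings.values()) / len(ratings)
    padded = dict(ratings)
    padded.update({i: mean_rating for i in decoys})
    return padded


def item_gradients(p, Q, ratings):
    """Squared-error gradients for the item factors, one entry per listed item."""
    grads = {}
    for i, r in ratings.items():
        err = sum(pf * qf for pf, qf in zip(p, Q[i])) - r
        grads[i] = [2 * err * pf for pf in p]
    return grads


rng = random.Random(0)
all_items = list(range(10))
Q = {i: [0.1, -0.2] for i in all_items}   # shared item factors (toy values)
p = [0.3, 0.5]                            # private user factor
ratings = {2: 5.0, 7: 3.0}                # the private interaction set

naive = item_gradients(p, Q, ratings)
masked = item_gradients(p, Q, pad_with_decoys(ratings, all_items, 4, rng))
```

`observed_items(naive)` is exactly the user's interaction set, while `observed_items(masked)` is a superset in which real and decoy items look alike. The decoy gradients do perturb the shared model, which is one source of the accuracy cost mentioned above.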

4. The catch, and where it pays off

Federation is not free, and a reader new to the topic should weigh its costs honestly:

Communication cost. Many rounds of sending models to and from millions of devices is expensive; FedAvg’s “several local steps per round” exists precisely to cut the number of rounds.
Statistical heterogeneity (non-IID data). Each user’s data is small and unrepresentative of the whole, so naive averaging can converge slowly or poorly — an active research problem.
Residual privacy risk. Updates alone can leak; secure aggregation and differential privacy are needed for real guarantees, and they trade off against accuracy.
An accuracy gap. A federated model typically trails a centralized model trained on the same data pooled in one place; that gap is the price of privacy.
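The two defenses named under residual privacy risk can be sketched together. This is a toy under stated assumptions: real secure aggregation works over finite-field integers with key agreement and dropout handling, and a real differential-privacy guarantee needs the noise scale calibrated to a privacy budget. The clipping norm `C`, the noise multiplier `sigma`, and the helper names are hypothetical.

```python
import math
import random


def clip(update, max_norm):
    """Scale an update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def pairwise_masks(n_clients, dim, rng):
    """Build one mask per client such that all masks sum to the zero vector.

    Each pair of clients shares a random vector; one adds it, the other subtracts it.
    """
    masks = [[0.0] * dim for _ in range(n_clients)]
    for a in range(n_clients):
        for b in range(a + 1, n_clients):
            m = [rng.uniform(-100, 100) for _ in range(dim)]
            for j in range(dim):
                masks[a][j] += m[j]
                masks[b][j] -= m[j]
    return masks


def vec_sum(vectors):
    return [sum(col) for col in zip(*vectors)]


rng = random.Random(0)
updates = [[0.5, -1.0], [3.0, 4.0], [-0.2, 0.1]]   # toy client updates
C = 1.0                                            # clipping norm (assumed)

clipped = [clip(u, C) for u in updates]
masks = pairwise_masks(len(clipped), 2, rng)
masked = [[x + m for x, m in zip(u, mk)] for u, mk in zip(clipped, masks)]

true_sum = vec_sum(clipped)
server_sum = vec_sum(masked)    # matches true_sum up to float rounding

sigma = 0.5                     # noise multiplier (assumed, not calibrated)
noisy_sum = [s + rng.gauss(0.0, sigma * C) for s in server_sum]
```

Each entry of `masked` looks like noise on its own, yet the masks cancel in the sum, so the server recovers only the aggregate. Clipping bounds how much any one client can move that aggregate, and the added Gaussian noise is what costs accuracy.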

It pays off where the data is too sensitive or too regulated to centralize (health, finance, keyboard/typing models, on-device personalization): the setting where the alternative is not “a slightly better central model” but no model at all. Like the RL appendix, this appendix is a pointer rather than a core chapter. Federated recommendation is an adjacent paradigm that the book’s collaborative-filtering through-line touches only at the edge, since it is matrix factorization and gradient descent rearranged for privacy.

5. Glossary

Term	Plain meaning
Federated learning	Train a shared model across devices that keep their raw data local; only updates are shared.
Client / server	Devices holding private data / the coordinator that aggregates their updates.
FedAvg	Aggregate by a data-size-weighted average of clients’ local models (McMahan et al., 2017).
FCF	Federated collaborative filtering: item factors are global; each user’s factor stays on-device.
The recsys leak	Sending an update for item \(i\) reveals the user touched \(i\); FedRec masks this with decoy items.
Secure aggregation	The server sees only the sum of client updates, never any individual one.
Differential privacy	Add calibrated noise to updates so that the influence of any single user’s data on what is shared is provably bounded.
Non-IID data	Client datasets follow different distributions and are often small, so no single client is representative of the whole; a core difficulty of federation.

6. References

McMahan, B., Moore, E., Ramage, D., Hampson, S., & Agüera y Arcas, B. (2017). Communication-efficient learning of deep networks from decentralized data (FedAvg). In Proceedings of AISTATS (PMLR 54). arXiv:1602.05629
Ammad-ud-din, M., et al. (2019). Federated collaborative filtering for privacy-preserving personalized recommendation system. arXiv:1901.09888 (FCF; preprint).
Lin, G., Liang, F., Pan, W., & Ming, Z. (2021). FedRec: Federated recommendation with explicit feedback. IEEE Intelligent Systems, 36(5), 21–30. https://doi.org/10.1109/MIS.2020.3017205
Wang, L., et al. (2024). Horizontal federated recommender system: A survey. ACM Computing Surveys. https://doi.org/10.1145/3656165 (a current survey of the field, including privacy mechanisms).

Online sources verified June 2026.