Recommender Systems from Zero
A Hand-Checkable Path from Linear Algebra to Graph- and LLM-Based Collaborative Filtering
A free, hand-checkable, from-zero handbook on recommender systems — a vertical path from linear algebra and calculus through losses, neural networks and the Transformer, to graph- and LLM-based collaborative filtering (LightGCN, contrastive SSL, the spectral view, RLMRec), plus how to train and serve a recommender. Every worked example is reproducible by hand. Open access, CC BY 4.0.
Preface
You are reading the live web edition, chapter by chapter; new versions land here first. To download the full book (PDF) and cite it, use the archived release on Zenodo — doi.org/10.5281/zenodo.20952963 (CC BY 4.0).
Welcome
This is a handbook about recommender systems — the technology that decides which movie, product, song, or article to put in front of you — written for someone starting from zero. Not “zero recommender-systems knowledge”. Zero machine learning. If you remember a little high-school algebra and you are willing to follow a worked example with a calculator, you have everything you need to begin. Every other idea — a vector, a derivative, a probability, a neural network, an embedding — is built up in front of you, in order, with a picture and a number you can check yourself.
You are here: the Preface (front matter). It tells you who the book is for, the two promises it keeps, and how to read it. The chain of chapters it introduces:
Who this handbook is for
You are the reader if you are any of these:
- a student who wants to understand how modern recommenders really work, not just run a library;
- a working engineer pivoting into recommendation, who needs the foundations and the current state of the art in one place;
- an aspiring researcher who wants the unbroken through-line from the math to the LLM-and-graph frontier where the field is moving now.
What you do not need: a degree in math, prior machine-learning experience, or a powerful computer. The worked examples are deliberately tiny — three users, three movies, two-dimensional vectors — so you can check every step by hand. The goal is understanding you can reconstruct, not facts you have to trust.
A promise to the complete beginner. Nowhere in this book will a symbol, a term, or a technique be used before it has been explained. If you ever meet one that wasn’t, that is a bug in the book, not a gap in you.
The two promises
Every chapter keeps the same two promises. They are the whole method.
Promise 1 — intuition first, and nothing unexplained. A new idea is introduced with a plain-language intuition and, wherever it helps, why it carries its name (the history of a name is often the fastest way to remember what it does — you will learn why a sigmoid is “S-shaped”, why a Transformer “transforms”, why Long Short-Term Memory is a short memory that lasts). Then every formula is decoded component by component — what each symbol is and what it does — so an equation is never a wall of notation.
Promise 2 — everything is hand-checkable. Each concept carries one fully worked numerical example, small enough to verify with a calculator, and every number in it was computed with code before it was written down. A worked example with a wrong arithmetic step is worse than none, so the book treats its own numbers as claims to be checked.
Together these serve one aim that matters most for a beginner: understanding that sticks in memory, because you saw the intuition, watched the formula come apart into meaning, and reproduced the number yourself.
How the handbook is organized
The book climbs a ladder in three tiers (eight parts). Each tier is a prerequisite for the next; within Tier 1, the three math primers are independent siblings you can read in any order; and the three recommender views of Tier 3 — sequence, graph, and LLM — each build on the Traditional/Evaluation foundation rather than on each other.
Figure. The ladder. Master a tier and the next one has no unexplained prerequisites.
Part I · Mathematical Foundations — the language. Linear algebra (vectors, the dot product = a similarity score), calculus (the derivative, the gradient, how a model learns by descending it), and probability & statistics (distributions, likelihood, Bayes, how we test whether a result is real).
Part II · Machine-Learning Fundamentals — the machine. What a loss function is and how a model is trained to minimize it; neural networks and back-propagation; and representation learning — how a discrete thing (a word, a movie) becomes a vector, up through the Transformer and the LLM.
Parts III–VIII · Recommender Systems — the destination. Traditional recommenders (similarity, matrix factorization) and how we evaluate them; the sequence view (next-item prediction — Markov chains and FPMC, GRU4Rec, and the self-attentive SASRec and BERT4Rec); the graph view (LightGCN, self-supervised/contrastive learning, the spectral/filter lens); the LLM era — how large language models fuse with the collaborative signal; training and serving a recommender end to end; and a bounded Frontiers part at the field’s edges — deep-CTR ranking, bandits and online learning, with reinforcement-learning and federated/privacy pointers.
The recommender families, at a glance. One idea recurs in every chapter — turn things into vectors, score by comparing them, learn the vectors — but it takes many forms. Here is the map of the families this book builds, in reading order (the what-kind companion to the prerequisite ladder above):
| Family | Core idea | Built in |
|---|---|---|
| Content-based | match an item’s features to a user’s profile | Traditional §2 |
| Memory-based CF (\(k\)-NN) | “users like you also liked…” — score by similarity | Traditional §4 |
| Matrix factorization | learn user/item embeddings; score by their dot product | Traditional §5 |
| Sequential / session | predict the next item from the order of past ones | Sequential & Session-Based |
| Graph CF | propagate embeddings over the user–item graph | From Graphs to LightGCN · SSL · Spectral |
| LLM-augmented | fuse a language model’s world knowledge with the collaborative signal | LLM × RecSys |
| Deep CTR (ranking) | predict a click from rich feature interactions | Click-Through Rate Prediction |
| Bandits / online | choose what to show and learn from the feedback | Bandits & Online Recommendation |
…and how they assemble into a real system. A deployed recommender is two-stage: a cheap retrieval step narrows millions of items to a few hundred, then a richer ranking step orders those. The families above slot into one stage or the other — which is why the book builds the embedding (retrieval) models first and the feature-rich (ranking) models later:
Figure. The two-stage funnel of a real recommender. Cheap retrieval narrows the catalogue to a few hundred candidates with a fast embedding dot-product (the models of Traditional Recommender Systems and From Graphs to LightGCN); a richer ranking model then orders them (Click-Through Rate Prediction). The two stages are assembled end-to-end in Training and Serving a Recommender.
How to read a chapter
Every chapter has the same shape, so you always know where you are:
- The body — concepts built bottom-up, each defined the first time it appears (and, where a name has a real origin, why it is called that), each formula decoded symbol by symbol, and each carrying a small worked example you can check by hand.
- A glossary — one plain-language line per term the chapter introduces.
- References — every external work web-verified, in author–year (Harvard) form.
- “Where this fits in the book” — a short closing map of how the chapter connects to the ones before and after it.
- A one-sentence summary and a “Next:” pointer to the following chapter.
Two practical notes. Each chapter is a folder holding a .md (the source you
can read or edit on screen) and a .pdf (typeset, for reading or printing) —
they are kept in lockstep. And the navigation banner is identical in every chapter, so the
map never changes under you.
How to actually study it. Read with a pen. When a worked example appears, do the arithmetic yourself before reading the answer. When a formula is decoded, cover the decode and try to name each symbol’s job first. The book is built so that this works — that is the entire point of keeping the numbers tiny.
The one idea that recurs
If you remember nothing else, remember this — it is the spine that connects all sixteen chapters:
Turn each thing into a vector whose location carries its meaning; score two things by comparing their vectors (a dot product); learn those vectors so the comparisons match reality.
A user is a vector. A movie is a vector. “Will this user like this movie?” is, at heart, how aligned are their two vectors — a single dot product (Part I). Every method in the book is a more powerful way to learn those vectors: from counting co-occurrences, to matrix factorization, to word2vec, to graph propagation (LightGCN), to a self-supervised second view, to an LLM’s semantic view. Same skeleton; better and better embeddings.
What this handbook is not
To stay deep where it matters, the book stays narrow on purpose. It is not a comprehensive AI/ML textbook: it does not cover computer vision or general natural-language processing beyond the representation-learning lineage that recommenders actually use. Within recommendation itself it follows one through-line — collaborative filtering, from its classical form to graph- and LLM-based models — and does not try to survey every branch of the field. Many adjacent topics the through-line genuinely reaches get real treatment: factorization machines (the bridge from matrix factorization to feature-rich models), the bias / fairness / feedback-loop questions, a full chapter on sequential / session-based recommendation (Markov chains and FPMC, GRU4Rec, the self-attentive SASRec and BERT4Rec), and a full chapter on training and serving a recommender (negative sampling, the training loop, the retrieval-then-ranking funnel). A deliberately bounded Frontiers part then carries the field’s edges at calibrated depth — not a chapter for everything, but the right amount for each: full chapters on deep-CTR (wide-and-deep, DeepFM, DCN — the ranking-stage workhorse) and bandits / online recommendation (explore–exploit, UCB, Thompson sampling); a deepened treatment of generative recommendation (semantic IDs, RQ-VAE) inside the LLM chapter; and thin appendix pointers for reinforcement learning and federated / privacy-preserving recommendation — the framing and the key works, not a full course. The rest stays deliberately out of scope, with a one-line pointer where the path meets it rather than a chapter of its own: causal / counterfactual and context-aware / conversational recommendation, and knowledge-graph, multi-stakeholder, and large-scale MLOps settings. A focused ladder you can finish beats an encyclopedia you cannot.