Recommender Systems from Zero

A Hand-Checkable Path from Linear Algebra to Graph- and LLM-Based Collaborative Filtering

A free, hand-checkable, from-zero handbook on recommender systems — a vertical path from linear algebra and calculus through losses, neural networks and the Transformer, to graph- and LLM-based collaborative filtering (LightGCN, contrastive SSL, the spectral view, RLMRec), plus how to train and serve a recommender. Every worked example is reproducible by hand. Open access, CC BY 4.0.

Authors

Affiliations

Duong Tan Nghia

Hanoi University of Science and Technology (HUST), School of EEE

Minastik Technology JSC — Chief AI Officer

Tran Manh Hoang

Hanoi University of Science and Technology (HUST), School of EEE

Published

June 28, 2026

Doi

10.5281/zenodo.20952963

Preface

Read free online — download & cite on Zenodo

You are reading the live web edition, chapter by chapter; new versions land here first. To download the full book (PDF) and cite it, use the archived release on Zenodo — doi.org/10.5281/zenodo.20952963 (CC BY 4.0).

Welcome

This is a handbook about recommender systems — the technology that decides which movie, product, song, or article to put in front of you — written for someone starting from zero. Not “zero recommender-systems knowledge”. Zero machine learning. If you remember a little high-school algebra and you are willing to follow a worked example with a calculator, you have everything you need to begin. Every other idea — a vector, a derivative, a probability, a neural network, an embedding — is built up in front of you, in order, with a picture and a number you can check yourself.

You are here: the Preface (front matter). It tells you who the book is for, the two promises it keeps, and how to read it. The chain of chapters it introduces:

Who this handbook is for

You are the reader if you are any of these:

a student who wants to understand how modern recommenders really work, not just run a library;
a working engineer pivoting into recommendation, who needs the foundations and the current state of the art in one place;
an aspiring researcher who wants the unbroken through-line from the math to the LLM-and-graph frontier where the field is moving now.

What you do not need: a degree in math, prior machine-learning experience, or a powerful computer. The worked examples are deliberately tiny — three users, three movies, two-dimensional vectors — so you can check every step by hand. The goal is understanding you can reconstruct, not facts you have to trust.

A promise to the complete beginner. Nowhere in this book will a symbol, a term, or a technique be used before it has been explained. If you ever meet one that wasn’t, that is a bug in the book, not a gap in you.

The two promises

Every chapter keeps the same two promises. They are the whole method.

Promise 1 — intuition first, and nothing unexplained. A new idea is introduced with a plain-language intuition and, wherever it helps, why it carries its name (the history of a name is often the fastest way to remember what it does — you will learn why a sigmoid is “S-shaped”, why a Transformer “transforms”, why Long Short-Term Memory is a short memory that lasts). Then every formula is decoded component by component — what each symbol is and what it does — so an equation is never a wall of notation.

Promise 2 — everything is hand-checkable. Each concept carries one fully worked numerical example, small enough to verify with a calculator, and every number in it was computed with code before it was written down. A worked example with a wrong arithmetic step is worse than none, so the book treats its own numbers as claims to be checked.

Together these serve one aim that matters most for a beginner: understanding that sticks in memory, because you saw the intuition, watched the formula come apart into meaning, and reproduced the number yourself.

How the handbook is organized

The book climbs a ladder in three tiers (eight parts). Each tier is a prerequisite for the next; within Tier 1, the three math primers are independent siblings you can read in any order; and the three recommender views of Tier 3 — sequence, graph, and LLM — each build on the Traditional/Evaluation foundation rather than on each other.

Figure 1: three independent siblings – read in any order

Figure. The ladder. Master a tier and the next one has no unexplained prerequisites.

Part I · Mathematical Foundations — the language. Linear algebra (vectors, the dot product = a similarity score), calculus (the derivative, the gradient, how a model learns by descending it), and probability & statistics (distributions, likelihood, Bayes, how we test whether a result is real).

Part II · Machine-Learning Fundamentals — the machine. What a loss function is and how a model is trained to minimize it; neural networks and back-propagation; and representation learning — how a discrete thing (a word, a movie) becomes a vector, up through the Transformer and the LLM.

Parts III–VIII · Recommender Systems — the destination. Traditional recommenders (similarity, matrix factorization) and how we evaluate them; the sequence view (next-item prediction — Markov chains and FPMC, GRU4Rec, and the self-attentive SASRec and BERT4Rec); the graph view (LightGCN, self-supervised/contrastive learning, the spectral/filter lens); the LLM era — how large language models fuse with the collaborative signal; training and serving a recommender end to end; and a bounded Frontiers part at the field’s edges — deep-CTR ranking, bandits and online learning, with reinforcement-learning and federated/privacy pointers.

The recommender families, at a glance. One idea recurs in every chapter — turn things into vectors, score by comparing them, learn the vectors — but it takes many forms. Here is the map of the families this book builds, in reading order (the what-kind companion to the prerequisite ladder above):

Family	Core idea	Built in
Content-based	match an item’s features to a user’s profile	Traditional §2
Memory-based CF (\(k\)-NN)	“users like you also liked…” — score by similarity	Traditional §4
Matrix factorization	learn user/item embeddings; score by their dot product	Traditional §5
Sequential / session	predict the next item from the order of past ones	Sequential & Session-Based
Graph CF	propagate embeddings over the user–item graph	From Graphs to LightGCN · SSL · Spectral
LLM-augmented	fuse a language model’s world knowledge with the collaborative signal	LLM × RecSys
Deep CTR (ranking)	predict a click from rich feature interactions	Click-Through Rate Prediction
Bandits / online	choose what to show and learn from the feedback	Bandits & Online Recommendation

…and how they assemble into a real system. A deployed recommender is two-stage: a cheap retrieval step narrows millions of items to a few hundred, then a richer ranking step orders those. The families above slot into one stage or the other — which is why the book builds the embedding (retrieval) models first and the feature-rich (ranking) models later:

Figure. The two-stage funnel of a real recommender. Cheap retrieval narrows the catalogue to a few hundred candidates with a fast embedding dot-product (the models of Traditional Recommender Systems and From Graphs to LightGCN); a richer ranking model then orders them (Click-Through Rate Prediction). The two stages are assembled end-to-end in Training and Serving a Recommender.

How to read a chapter

Every chapter has the same shape, so you always know where you are:

The body — concepts built bottom-up, each defined the first time it appears (and, where a name has a real origin, why it is called that), each formula decoded symbol by symbol, and each carrying a small worked example you can check by hand.
A glossary — one plain-language line per term the chapter introduces.
References — every external work web-verified, in author–year (Harvard) form.
“Where this fits in the book” — a short closing map of how the chapter connects to the ones before and after it.
A one-sentence summary and a “Next:” pointer to the following chapter.

Two practical notes. Each chapter is a folder holding a .md (the source you can read or edit on screen) and a .pdf (typeset, for reading or printing) — they are kept in lockstep. And the navigation banner is identical in every chapter, so the map never changes under you.

How to actually study it. Read with a pen. When a worked example appears, do the arithmetic yourself before reading the answer. When a formula is decoded, cover the decode and try to name each symbol’s job first. The book is built so that this works — that is the entire point of keeping the numbers tiny.

The one idea that recurs

If you remember nothing else, remember this — it is the spine that connects all sixteen chapters:

Turn each thing into a vector whose location carries its meaning; score two things by comparing their vectors (a dot product); learn those vectors so the comparisons match reality.

A user is a vector. A movie is a vector. “Will this user like this movie?” is, at heart, how aligned are their two vectors — a single dot product (Part I). Every method in the book is a more powerful way to learn those vectors: from counting co-occurrences, to matrix factorization, to word2vec, to graph propagation (LightGCN), to a self-supervised second view, to an LLM’s semantic view. Same skeleton; better and better embeddings.

What this handbook is not

To stay deep where it matters, the book stays narrow on purpose. It is not a comprehensive AI/ML textbook: it does not cover computer vision or general natural-language processing beyond the representation-learning lineage that recommenders actually use. Within recommendation itself it follows one through-line — collaborative filtering, from its classical form to graph- and LLM-based models — and does not try to survey every branch of the field. Many adjacent topics the through-line genuinely reaches get real treatment: factorization machines (the bridge from matrix factorization to feature-rich models), the bias / fairness / feedback-loop questions, a full chapter on sequential / session-based recommendation (Markov chains and FPMC, GRU4Rec, the self-attentive SASRec and BERT4Rec), and a full chapter on training and serving a recommender (negative sampling, the training loop, the retrieval-then-ranking funnel). A deliberately bounded Frontiers part then carries the field’s edges at calibrated depth — not a chapter for everything, but the right amount for each: full chapters on deep-CTR (wide-and-deep, DeepFM, DCN — the ranking-stage workhorse) and bandits / online recommendation (explore–exploit, UCB, Thompson sampling); a deepened treatment of generative recommendation (semantic IDs, RQ-VAE) inside the LLM chapter; and thin appendix pointers for reinforcement learning and federated / privacy-preserving recommendation — the framing and the key works, not a full course. The rest stays deliberately out of scope, with a one-line pointer where the path meets it rather than a chapter of its own: causal / counterfactual and context-aware / conversational recommendation, and knowledge-graph, multi-stakeholder, and large-scale MLOps settings. A focused ladder you can finish beats an encyclopedia you cannot.

A note from the author

My own road in began in 2016, at the Kyushu Institute of Technology in Japan, where I worked on implementing computer-vision algorithms on Altera FPGA boards. I fell for the field completely — which was not a small decision. Before Japan I had spent years on fingerprint recognition and had real results to show for it; choosing AI/ML meant setting that hard-won expertise aside and starting over from zero, a genuinely risky move for someone about to begin a PhD.

I made the leap anyway. Back in Vietnam I pursued a PhD in AI/ML, straight into real difficulty. The material was not just scarce and scattered — it was often inconsistent, different sources using different notation and telling the same story in incompatible ways. Worse, the mathematical foundation was rarely explained with any intuition or worked example, so the symbols stayed opaque and the why went missing. And at that time few people around me knew the field or could support a direction this new and this challenging, so I learned largely alone, getting stuck for days on ideas that felt like they should be simple.

But I was never entirely on my own. Beyond my formal PhD supervision, one person has been the steadiest champion of the whole journey — Tran Manh Hoang, the second author of this book. He has guided and encouraged me for more than fifteen years — ever since I stood up to present as a final-year undergraduate. He held no official title over my thesis; he simply never stopped pushing me to go further, and he opened doors I could never have opened myself, introducing me to close friends of his who are among the world’s leading AI researchers and practitioners. Their generosity, and his belief in me, changed what I thought was possible. That his name now stands beside mine on this cover is no formality: it marks a road we have walked together for a very long time.

Whenever I found a blog, course, or paper that was genuinely intuitive — one that made an idea visible with a good picture, an honest worked example, or an explanation of why a thing was true rather than only that it was — I felt a small jolt of relief, and I saved it. Those rare finds did more than teach me; they kept me going. But the collection only grew, and the harder, quieter work became synthesizing those scattered gems into something coherent I could actually reuse, or relearn from later — a patience and dedication that, on a frustrating day, quietly tempts a capable learner to give up. A single clear diagram could dissolve a week of confusion; the lack of one could cost a month.

This handbook is my attempt to be the resource I kept searching for — and to do that synthesis for the reader. I wrote it from two chairs at once: as a lecturer and researcher at the School of Electrical and Electronic Engineering, Hanoi University of Science and Technology (HUST), where I care that nothing is hand-waved and a curious beginner can rebuild every claim; and as the AI lead at Minastik, where I care that it reaches what is actually used in production today — honestly, trade-offs and all. It puts the fundamentals and the newest trend on one consistent, unbroken staircase, and every chapter leads with intuition, decodes each formula symbol by symbol, and carries a number you can check by hand — because those are the things that once made the difference for me.

It is, in short, the explanation I wish someone had handed me when I started.

Welcome aboard. Turn the page and let’s build the first vector.

— Duong Tan Nghia, Hanoi · 2026