AI Study Notebook AI-generated
The TeXbook
Donald Knuth
The TeXbook — Chapter-by-Chapter Outline
Author: Donald E. Knuth
First published: 1984 (Addison-Wesley; Volume A of the Computers and Typesetting series)
Edition covered: Standard edition (21st printing, revised 1992). The text has changed little since the revisions that accompanied TeX 3.0 in 1990: Knuth has frozen TeX's design, its version numbers converge to π (those of METAFONT converge to e), and later printings differ mainly in corrections.
Central thesis
TeX is a typesetting system that treats the production of a document as a precise mathematical problem: given a set of glyphs, measurements, and aesthetic constraints, find the globally optimal arrangement of characters, words, and lines on the page. Knuth argues that high-quality typography requires a computer to make hundreds of micro-decisions simultaneously — breaking paragraphs into lines, breaking lines into pages, placing mathematical symbols, stretching and shrinking spaces — and that these decisions are best expressed as algorithms, not ad hoc rules.
The TeXbook is simultaneously a user manual and a philosophical treatise on the nature of quality in typesetting. Knuth's central claim is that a document's appearance should be determined by a small set of clearly defined primitives — boxes, glue, and penalties — composed by a macro language of arbitrary power. Everything visible on a page, from a letter to an aligned table to a complex integral, is ultimately a box containing other boxes separated by glue. Learning TeX means learning to think in these terms.
How do you instruct a computer to arrange marks on a page so that the result looks as though a master craftsman set it by hand?
Chapter 1 — The Name of the Game
Central question
What is TeX, where does its name come from, and what promises does the book make to the reader?
Main argument
The etymology of TeX. The name comes from the Greek letters τεχ (tau, epsilon, chi), the beginning of the Greek word for art and craft and the root of the English words "technology" and "technique." Knuth is careful about pronunciation: the final letter is the Greek chi, not the Latin X, so it sounds like the ch in "loch" or "Bach" rather than like an x, and TeX rhymes with "blecchhh." The same care extends to spelling: the name is written TeX (with a lowered E in print), not TEX or Tex, an early signal of the book's insistence on precision.
The scope of the system. TeX is a program for producing high-quality typeset documents, especially those containing mathematics. Knuth distinguishes it from a word processor: TeX separates content from formatting instructions, and its output is device-independent (the DVI format), later rendered by a driver for any output device.
The layered audience. Knuth acknowledges that The TeXbook addresses readers at different levels of expertise. Sections marked with a single dangerous-bend symbol (⚠) are for advanced users; double dangerous-bend sections are for experts and language implementors. Beginners are encouraged to skip these sections on a first reading.
Key ideas
- TeX's name reflects its Greek heritage; its final sound is the Greek chi (the ch of "Bach"), not an x.
- The system is designed so that the same input always produces the same output on any machine — reproducibility is a design goal.
- The dangerous-bend notation structures the book for progressive mastery.
- TeX was created because Knuth was dissatisfied with the appearance of the second edition of The Art of Computer Programming produced by then-current phototypesetting technology.
Key takeaway
TeX is a precision instrument for typography, named from the Greek root for "art and craft," and the book is structured so that novices and experts can both extract what they need.
Chapter 2 — Book Printing versus Ordinary Typing
Central question
What are the systematic differences between professional typography and typewriter conventions, and why do they matter?
Main argument
Quotation marks. A typewriter uses the same straight mark (") to open and close a quotation; professional printing uses distinct opening and closing curly quotes. In TeX, two grave accents (``) produce opening double quotes and two apostrophes ('') produce closing double quotes. The distinction affects readability and signals typographic literacy.
Dashes. Knuth distinguishes four kinds: the hyphen (-), the en-dash (--), the em-dash (---), and the minus sign ($-$, typed in math mode). Typewriters collapse them into a single hyphen. Proper usage: hyphens join compound words, en-dashes indicate ranges (pages 3--7), em-dashes mark parenthetical breaks---like this, and minus signs appear only in formulas.
Ligatures. Certain letter pairs (fi, fl, ff, ffi, ffl) are routinely combined into single glyphs (ligatures) in professional typography because their natural shapes collide. TeX inserts ligatures automatically from font metric information.
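Spacing after punctuation. A typewriter convention uses two spaces after a period; professional typography uses a slightly larger space (an "inter-sentence space") but not two full word spaces. TeX applies this automatically with a simple rule: a period, question mark, or exclamation point followed by a space ends a sentence and gets extra space, unless the punctuation directly follows an uppercase letter, which TeX takes to be an initial (as in "Donald E. Knuth"). The rule misfires on mid-sentence abbreviations such as "Prof." or "et al.", so the user overrides it there with a tie (~) or a control space (a backslash followed by a space).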
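The conventions in this chapter can be tried in a few lines of plain TeX. This is a minimal sketch; the sample sentences are invented for illustration and are not taken from the book.

```tex
% Plain TeX input illustrating the conventions above.
``Typeset quotes'' come from two grave accents and two apostrophes.

Read pages 3--7 of the well-known manual---carefully.
The difference $5-3$ uses a minus sign, not a hyphen.

% The fi, ff, and ffi ligatures below are inserted automatically.
The office staff filed five affidavits.

% A period that follows an uppercase letter is assumed not to end a
% sentence, so the initial needs no help. The second line shows the
% two manual overrides: a tie and a control space.
Donald E. Knuth wrote it.
See Prof.~Knuth et al.\ for details.
\bye
```

Without the tie and the control space, TeX would put sentence-ending space after "Prof." and "al." in the last line.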
Key ideas
- Typography has accumulated conventions over centuries that a typewriter cannot represent; TeX restores them.
- Many of these conventions (ligatures, kerning, dash distinction) are handled automatically; the user need only know when to override them.
- The difference between TeX output and typewriter output is not cosmetic — it reflects whether you are communicating in a professional medium.
- Knuth uses this chapter to calibrate expectations: TeX output will look different from a typed page, and the differences are intentional.
Key takeaway
Professional book typography differs from typewriter convention in dozens of systematic ways; TeX implements the correct conventions by default, so users must learn what to expect rather than fight the system.
Chapter 3 — Controlling TeX
Central question
How does a user give TeX instructions, and what is the fundamental syntax of TeX's command language?
Main argument
The escape character. The backslash \ is TeX's escape character: it signals that what follows is a control sequence rather than literal text. Everything in TeX is either plain text (characters to be typeset) or a command beginning with \.
Control words and control symbols. TeX distinguishes two kinds of control sequences: control words, which are a backslash followed by one or more letters (e.g., \TeX, \hfill, \noindent), and control symbols, which are a backslash followed by exactly one non-letter character (e.g., a backslash followed by a space for a forced space, \$ for a literal dollar sign, \% for a literal percent sign). A control word absorbs all following spaces; a control symbol does not.
The role of spaces. Spaces following a control word are consumed by TeX and produce no output. To get a space after a control word like \TeX, the user must write \TeX\ (with a control-space) or \TeX{}. This asymmetry is a frequent source of confusion for new users.
Case sensitivity. TeX is case-sensitive: \Box and \box are different control sequences.
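The space-absorbing behavior is easiest to see side by side. The following plain TeX sketch uses invented sample text; each comment describes what the line typesets.

```tex
% Spaces after a control word vanish; spaces after a control symbol do not.
\TeX is fun.      % typesets "TeXis fun." because the space was absorbed
\TeX\ is fun.     % a control space restores the gap
\TeX{} is fun.    % an empty group also ends the control word
{\TeX} is fun.    % so does enclosing the control word in a group
Save 10\% today.  % \% is a control symbol, so the space after it is kept
\bye
```

Which of the three repairs to use is a matter of taste; the control space is the shortest to type, while the brace forms survive automatic reformatting of the source more reliably.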
Key ideas
- The backslash/control-sequence syntax is the foundation of all TeX programming — every built-in primitive and every user-defined macro uses this form.
- The distinction between control words (absorb trailing spaces) and control symbols (do not) is a design decision that enables readable source but requires awareness.
- TeX source is essentially a stream of tokens — characters and control sequences — and the syntax governs how that stream is tokenized.
Key takeaway
TeX is controlled by escape sequences beginning with a backslash; the distinction between control words and control symbols governs how spaces are consumed, which is one of the first subtleties a new user must internalize.
Chapter 4 — Fonts of Type
Central question
How does TeX select and use typefaces, and what typographic varieties are available?
Main argument
Built-in font commands. TeX's default format (plain TeX) provides five font-switching commands: \rm (roman), \it (italic), \sl (slanted/oblique), \bf (bold), and \tt (typewriter/monospace). Each is a shorthand that switches the current font. Plain TeX has no small-caps command; a caps-and-small-caps font such as cmcsc10 must be loaded explicitly with \font.
Font loading with \font. Individual fonts are loaded with the \font command, which maps a control sequence name to a specific font file: \font\bigrm=cmr12 loads 12-point Computer Modern Roman as \bigrm. The font files contain character metrics (widths, heights, depths, italic corrections, kerning pairs, ligatures) in TFM (TeX Font Metric) format.
Italic correction. An italic or slanted letter often leans past the right edge of its box, so the character that follows a switch back to roman can collide with it. The \/ command inserts an "italic correction," a small amount of extra space recorded for each character in the font metrics, to prevent the crowding: {\it italics\/} roman.
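Sizing. Plain TeX expresses magnification in steps of 1.2: \magstep1 scales by 1.2, \magstep2 by 1.44, and so on up to \magstep5, with \magstephalf standing for √1.2 (about 1.095). The steps can scale a single font (\font\big=cmr10 scaled \magstep1) or, through \magnification, the whole document. TeX's internal unit system (see Chapter 10) governs all size specifications.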
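A short plain TeX sketch ties font loading, magnification, and italic correction together. The name \hugerm is hypothetical, chosen for this example; \bigrm follows the usage above, and the Computer Modern metric files cmr12 and cmr10 are assumed to be installed, as they are in standard distributions.

```tex
% Load two fonts by their TFM names and give them control-sequence names.
\font\bigrm=cmr12                      % 12-point Computer Modern Roman
\font\hugerm=cmr10 scaled \magstep2    % cmr10 magnified by 1.2 * 1.2 = 1.44

{\bigrm Larger text} and {\hugerm even larger text}.

% With \/ the slanted f clears the parenthesis; without it they crowd.
({\it half\/}) versus ({\it half}).

\rm Roman, {\it italic}, {\sl slanted}, {\bf bold}, {\tt typewriter}.
\bye
```

Each font switch is wrapped in braces so that it ends with the group, a habit explained in Chapter 5.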
Key ideas
- TeX does not embed fonts in the source; it references external font metric files that describe glyph dimensions.
- The Computer Modern family, designed by Knuth using METAFONT, is TeX's default and provides consistent metrics across roman, italic, bold, math, and special symbol fonts.
- \/ (italic correction) is one of the most frequently forgotten commands; omitting it produces visually poor italic-to-roman transitions.
- Font choice in TeX is a low-level operation; higher-level systems like LaTeX automate it via font-selection schemes (NFSS).
Key takeaway
TeX treats fonts as metric files that describe the geometry of each glyph; selecting a font loads these metrics and thereafter every box TeX builds for that character uses the file's precise measurements.
Chapter 5 — Grouping
Central question
How does TeX scope its settings so that local changes do not contaminate the rest of a document?
Main argument
Curly braces as group delimiters. In TeX, { begins a group and } ends it. Any change to a parameter, font, or register made inside a group is automatically undone when the group closes. This is the fundamental scoping mechanism: {{\bf bold} roman again}.
What can be grouped. Font changes, register assignments (counters, dimensions, token-list registers), category-code changes, and virtually all TeX parameter changes are local to the enclosing group by default. The \global prefix overrides this, making an assignment global regardless of nesting depth.
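Braces are not just for scoping. Braces also delimit arguments to macros and to built-in commands. When TeX reads \hbox{some text}, the {...} is both a group (scoping any changes inside) and the delimiter of the box's content. Braces around a macro argument, by contrast, are stripped when the argument is read and do not by themselves open a group.
The "semi-simple group." Knuth distinguishes ordinary groups {...} from "semi-simple groups" opened by \begingroup and closed by \endgroup. These behave identically for scoping purposes, but the two kinds cannot be mixed: \begingroup must be closed by \endgroup, not by }. Because \begingroup and \endgroup are ordinary control sequences rather than braces, one macro can open a group that another macro closes, which is useful in macro programming.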
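The save-and-restore behavior can be observed directly with a counter. In this plain TeX sketch the register name \demo is hypothetical, allocated only for the example.

```tex
\newcount\demo        % \demo is a hypothetical register name
\demo=1
{\demo=2 Inside the group: \the\demo. }%       typesets 2
After the group: \the\demo.                    % typesets 1 (value restored)

{\global\demo=3 }%
After a global assignment: \the\demo.          % typesets 3

\begingroup \bf bold text \endgroup\ and roman text again.
\bye
```

The \global prefix is the only thing that lets the value 3 survive the closing brace; the font change in the last line is undone by \endgroup just as a closing brace would undo it.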
Key ideas
- Grouping is the mechanism by which TeX keeps changes local: every non-global assignment made inside a group is automatically rolled back when the group ends.
- Without grouping, a font change anywhere in a document would propagate to all following text.
- The \global prefix escapes the scope, enabling shared mutable state; it is used sparingly for counters and flags in macro programming.
- Groups delimit both scopes and arguments, so understanding grouping is prerequisite to understanding macro expansion.
Key takeaway
Curly braces in TeX are not merely delimiters; they implement a save-and-restore stack for all parameter changes, which keeps every non-global assignment local to its group.
Chapter 6 — Running TeX
Central question
How does a user actually interact with the TeX program, and what happens from the moment a source file is submitted until a DVI file appears?
Main argument
The five experiments. Knuth structures this chapter as a hands-on session: the reader sits at a terminal and runs five progressively more complex TeX jobs. Experiment 1 is a trivial "hello world" — a single line of text, \bye to signal end-of-file, and TeX producing a one-page DVI. Experiments 2–5 introduce errors, multi-paragraph text, font changes, and the use of an existing format file (plain TeX), building the user's intuition for the TeX interactive session.
The TeX run cycle. Running TeX produces several output files: a .dvi file (the formatted document), a .log file (a transcript of the run, including error messages), and possibly auxiliary files written by the macro package (LaTeX's .aux files, for example). The DVI file must be processed by a driver program (e.g., dvips for PostScript or dvipdfmx for PDF) before it can be printed or viewed.
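Error recovery. When TeX encounters an error, it pauses and prompts the user. The prompt accepts several short responses: Return (continue), H (show a help message explaining the error), I<text> (insert tokens before continuing), a digit from 1 to 9 (delete that many upcoming tokens), S, R, or Q (carry on through later errors with progressively less interaction), E (stop and edit the file), and X (exit). Typing ? lists the available options. This interactive error recovery is a deliberate design feature allowing on-the-fly patching.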
The format mechanism. Loading all macros from scratch at every run would be slow. TeX supports "format files" (.fmt) — precompiled memory images — that include a pre-loaded macro package. Invoking tex &plain myfile loads the plain TeX format before processing myfile.tex.
Key ideas
- TeX is a batch-mode program at heart; the source file is processed top-to-bottom and the result is a DVI file.
- Interactive error recovery is built in; understanding the error prompt saves hours of frustration.
- Log files record the complete history of a TeX run, including every file loaded, every error, and page output.
- Format files (initex versus production TeX) are how large macro packages (plain TeX, LaTeX, ConTeXt) achieve practical startup speed.
Key takeaway
Running TeX means invoking a batch process that tokenizes your source, expands macros, builds boxes, breaks lines, and ships pages to a DVI file — with an interactive error-recovery protocol that lets you inspect and patch problems mid-run.
Chapter 7 — How TeX Reads What You Type
Central question
How does TeX convert a stream of ASCII characters in a source file into the tokens that its processing engine acts upon?
Main argument
Three stages of input processing. Knuth describes TeX's input pipeline as having three stages: (1) characters are read from the file and converted to tokens using category codes; (2) tokens are either expanded (if they are expandable control sequences like \if or \csname) or passed to the stomach (the portion of TeX that builds boxes); (3) the stomach assembles the tokens into boxes and glue.
Category codes (catcodes). Every character in TeX's input has a category code (catcode), a number from 0 to 15 that determines how TeX treats it. Catcode 11 is "letter," catcode 12 is "other character," catcode 1 is "begin group" ({ by default), catcode 2 is "end group" (} by default), catcode 3 is "math shift" ($), catcode 10 is "space," catcode 0 is "escape character" (\). TeX reads these codes — not the characters themselves — to tokenize input.
Changing catcodes. The command \catcode`\@=11 makes the at-sign @ a letter (catcode 11), allowing it to appear in control-word names. This is the mechanism by which LaTeX and other packages create "private" command names like \@ifstar that cannot be accidentally typed by document authors.
Tokens. The output of the input stage is a stream of tokens. Each token is either a character token (a character paired with its catcode) or a control sequence token (a \name). The rest of TeX operates on this token stream, never on raw characters.
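The private-name technique can be sketched in plain TeX as follows. The macro names \my@helper and \public are hypothetical, invented for this example.

```tex
\catcode`\@=11                  % from here on, @ is a letter
\def\my@helper{internal text}   % \my@helper is a hypothetical private name
\def\public{\my@helper}         % the token \my@helper is formed right here
\catcode`\@=12                  % restore @ to "other"

\public                         % typesets "internal text"
% Typing \my@helper at this point would be read as \my followed by @helper.
\bye
```

The definition of \public keeps working after the catcode is restored because its replacement text already contains the finished token \my@helper; catcodes matter only while characters are being read.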
Key ideas
- Catcodes are the most fundamental hook in TeX; by changing them, you can turn any character into a command escape, group delimiter, math shift, or anything else.
- Catcode assignment is permanent: a character token carries both its character code and the catcode in force when it was read, and later \catcode changes do not affect tokens that have already been formed.
- Catcode 13 (\active in plain TeX) makes a single character behave like a control sequence; it is used for special characters like ~ (non-breaking space) and for clever macro tricks.
- Understanding catcodes is prerequisite to understanding why certain TeX behaviors seem mysterious to users who think in ASCII strings rather than token streams.
Key takeaway
TeX does not process raw characters; it processes tokens produced by pairing each character with a category code, and by changing category codes a TeX programmer can fundamentally alter the language's syntax.
Chapter 8 — The Characters You Type
Central question
Which characters have special meaning in TeX by default, and how do you produce literal versions of them?
Main argument
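The ten special characters. In plain TeX, ten characters have special catcodes and thus special behavior: \ (escape), { (begin group), } (end group), $ (math shift), & (alignment tab), # (parameter), ^ (superscript), _ (subscript), ~ (active, non-breaking space), and % (comment). Five of them can be typeset by prefixing a backslash: \$, \#, \%, \&, and \_. The others need different means: \{ and \} are math-mode delimiters in plain TeX ($\{$, $\}$), \^ and \~ produce accents rather than the bare characters, and \\ is not defined in plain TeX, so a backslash is obtained with $\backslash$ or with \char`\\ in a typewriter font.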
Comments. The % character makes the rest of the line (including the newline) a comment. This is useful for removing unwanted space in macro programming — a technique that Chapter 20 relies on heavily.
ASCII and extended characters. TeX works on 256-character input. Characters 0–127 follow standard ASCII; characters 128–255 are available for extended character sets. Knuth introduces \char (which typesets a character by its code number) and ^^ notation (which encodes a character by XOR with 64 — e.g., ^^@ is character 0).
Generating characters with \char. \char65 produces the character at position 65 in the current font (the letter A in most fonts). This mechanism is how TeX accesses glyphs that have no direct keyboard representation.
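The escapes and character-code notations of this chapter can be exercised together. This is a plain TeX sketch with invented sample text; it assumes the default Computer Modern fonts, where slot 65 holds the letter A.

```tex
Cost: \$5 \& up, 100\% of item \#1, in file\_name.
Braces need math mode in plain TeX: $\{x\}$.
Backslash: {\tt\char`\\} or $\backslash$.

% Decimal, hexadecimal, and character-constant forms name the same slot,
% and two carets followed by two lowercase hex digits give that character.
\char65, \char"41, \char`\A, and ^^41 all typeset A.

% For codes below 128 the two-caret form flips bit 6 (XOR 64):
% M is 77, so caret-caret-M is 13; @ is 64, so caret-caret-@ is 0.
Codes: \number`\^^M\ and \number`\^^@.
\bye
```

The last line typesets the numbers 13 and 0. The two-caret forms are written there only inside control symbols, because a bare caret-caret-M in running text would be read as an end of line.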
Key ideas
- Five of the ten special characters (\$, \#, \%, \&, \_) become literal output with a \ prefix; the rest need math mode or \char. Memorizing all ten is a prerequisite for basic TeX use.
- The % comment character also swallows the newline, preventing unwanted space tokens in macro definitions.
- ^^ notation allows any character code to be written in TeX source; TeX also uses it to show unprintable characters in log files.
- \char is the primitive for typesetting a glyph by its position in the current font; its companion \chardef gives such a position a name, which is how plain TeX defines symbols like \ss and \ae.
Key takeaway
Ten characters carry special meaning in plain TeX; most can be produced literally by preceding them with \, and any glyph in a font can be produced by number with \char.
Chapter 9 — TeX's Roman Fonts
Central question
What is the structure of the Computer Modern Roman font family, and how does TeX access accents, special symbols, and international characters?
Main argument
The Computer Modern family. Knuth designed the Computer Modern (CM) typefaces using his METAFONT program. The default plain TeX format loads several CM fonts: cmr10 (10pt roman), cmti10 (italic), cmsl10 (slanted), cmbx10 (bold), cmtt10 (typewriter), cmmi10 (math italic), cmsy10 (math symbols), and others. The family is parameterized — METAFONT generates different sizes from a single set of equations — and is metrically consistent across styles.
Accents. Plain TeX provides accent commands: \' (acute, é), \` (grave, è), \" (umlaut, ë), \^ (circumflex, ê), \~ (tilde, ñ), \= (macron, ō), \. (dot above, ṁ), \u (breve), \v (háček), \H (double acute), \c (cedilla, ç), \d (dot below), \b (bar below). The accents that sit above a letter are built on TeX's \accent primitive, which centers an accent glyph from the font over the base character; the under-accents \c, \d, and \b are macros that position their mark with box and alignment commands.
Special symbols. Plain TeX provides \dag (†), \ddag (‡), \S (§), \P (¶), \copyright (©), \AA (Å), \aa (å), \AE (Æ), \ae (æ), \OE (Œ), \oe (œ), \O (Ø), \o (ø), \ss (ß), \l (ł), \L (Ł), and others needed for European text.
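A few sample words show how the accent and symbol commands read in source. This is a plain TeX sketch; the word list is invented for illustration.

```tex
% An accent command applies to the letter that follows it; \i is the
% dotless i, used so that the accent does not collide with a dot.
Caf\'e, na\"\i ve, fa\c cade, pi\~nata, \v Cesk\'y, Erd\H os, \d h, \b k.

% Ready-made letters and symbols for European text.
\AE sop, \oe uvre, Stra\ss e, \L\'od\'z, sm\o rrebr\o d, \AA ngstr\"om.
\dag\ \ddag\ \S\ \P\ \copyright
\bye
```

Because commands such as \ss and \o are control words, the space typed after them is absorbed and the word continues without a gap.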
Key ideas
- The Computer Modern fonts are the default and are inseparable from TeX's visual identity; they were designed specifically to work with TeX's metric machinery.
- Accent placement is not typeset from a pre-built accented glyph but constructed dynamically by TeX from the base character and a separate accent glyph — making it language-extensible.
- The distinction between \it (italic, CM italic style) and \sl (slanted, mathematically skewed roman) reflects different design philosophies in typesetting.
Key takeaway
TeX's default fonts are Donald Knuth's own Computer Modern family; accents are constructed dynamically using TeX's box machinery, and the plain TeX format provides macros for the full range of Western European characters.
Chapter 10 — Dimensions
Central question
How does TeX express lengths and measurements, and what units are available?
Main argument
TeX's unit system. All dimensions in TeX are specified as a number followed by a unit. The fundamental unit is the scaled point (sp): 1 sp = 2^{-16} pt. This means all internal calculations are exact integer arithmetic — TeX never accumulates floating-point rounding errors.
Physical units. TeX accepts: pt (point, 1/72.27 inch), pc (pica, 12pt), in (inch), cm (centimeter), mm (millimeter), bp (big point, 1/72 inch — the PostScript point), dd (Didot point, European typographic tradition), cc (cicero, 12dd), sp (scaled point, the atomic unit).
Relative units. TeX also accepts em (the "quad" width of the current font, traditionally the width of a capital M and the natural unit for horizontal spacing) and ex (the font's "x-height," roughly the height of a lowercase x and the natural unit for vertical spacing). Both values come from the font's metric file, so they scale automatically with font size.
Dimension registers. TeX has 256 dimension registers (\dimen0 through \dimen255) plus many named parameters like \hsize (the current line width), \vsize (the current page height), \baselineskip (the nominal distance between text baselines), and \parindent (paragraph indentation). These are set with \hsize=6.5in.
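Arithmetic. A dimension can be scaled by writing a numeric factor in front of a register or parameter: \hsize=2\dimen0 or \dimen0=0.5\hsize. Registers are changed in place with \advance (add or subtract a dimension), \multiply, and \divide (scale by an integer). Every dimension must stay below 16384pt in absolute value.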
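The unit system and register arithmetic can be inspected from inside a document. This plain TeX sketch uses the scratch registers \dimen0 and \dimen2; the particular values are arbitrary.

```tex
\dimen0=1pt
One point is \number\dimen0\ scaled points.        % typesets 65536

\dimen0=1in
One inch is \the\dimen0.                           % typesets 72.26999pt

\dimen2=0.5\hsize                % a decimal factor scales a dimension
\advance\dimen2 by -1cm          % addition and subtraction use \advance
\multiply\dimen2 by 3            % integer scaling
\divide\dimen2 by 2              % integer division, truncated
Result: \the\dimen2.
\bye
```

The inch prints as 72.26999pt rather than 72.27pt because the value is stored as a whole number of scaled points and then converted back to decimal.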
Key ideas
- Using integer scaled points throughout eliminates floating-point error; two runs of TeX on the same source always produce identical DVI files.
- The em and ex units adapt to the current font, making them the right choice for font-relative spacing in macros.
- \hsize and \vsize are the two most important dimension parameters: they determine the text block geometry.
- The distinction between pt (TeX point) and bp (PostScript point) matters when integrating with modern PDF workflows.
Key takeaway
TeX uses exact integer arithmetic for all measurements, with scaled points as the atomic unit, and provides both physical and font-relative units so that dimensions can be expressed independently of or relative to the current typeface.
Chapter 11 — Boxes
Central question
What is a "box" in TeX's abstract model, and how are boxes constructed and manipulated?
Main argument
The box as universal container. Everything TeX typesets is ultimately a box: a rectangle with a reference point, a width (horizontal extent), a height (extent above the baseline), and a depth (extent below the baseline). Characters are boxes; words are boxes of characters; lines are boxes of words; pages are boxes of lines. This uniformity means that any operation that works on a character also works on a page, and complex structures are built by nesting.
Horizontal boxes (\hbox). \hbox{content} creates a horizontal box by lining up items from left to right. The width of the result is the sum of the widths of the contents; height and depth are the maximum height and depth of any item. \hbox to 5cm{...} creates a box of exactly 5cm width, distributing any excess or deficit as glue stretch/shrink.
Vertical boxes (\vbox and \vtop). \vbox{content} stacks items vertically. The two variants differ in where the reference point sits: a \vbox shares its baseline with its last item (so its depth is that item's depth), while a \vtop shares its baseline with its first item (so its height is that item's height). This affects how the box aligns with surrounding content.
Struts. A strut is an invisible box of zero width but full height and depth. Plain TeX's \strut is 8.5pt high and 3.5pt deep, which together equal the standard 12pt \baselineskip; inserting a strut into a box ensures consistent vertical spacing even when the visible content would produce a shorter box.
Box registers. TeX has 256 box registers (\box0 through \box255). \setbox0=\hbox{hello} stores a box in register 0; \box0 retrieves it (consuming it in the process); \copy0 retrieves it without consuming. \wd0, \ht0, \dp0 access the width, height, and depth of the stored box.
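The register operations can be tried in a few lines of plain TeX. This is a minimal sketch; the word being boxed and the 5cm width are arbitrary choices.

```tex
\setbox0=\hbox{typography}       % build a box and store it in register 0
Width \the\wd0, height \the\ht0, depth \the\dp0.

Copy: \copy0; the register is still full.
Box: \box0; now it is empty.
\ifvoid0 (Register 0 is void.)\fi

% An explicit width: the \hfil glue absorbs the leftover space.
\hbox to 5cm{left\hfil right}
\bye
```

Measuring a box before deciding where to put it, as the second line does, is the basis of most layout macros.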
Key ideas
- The box model is TeX's single most important abstraction: it unifies characters, words, lines, and pages under one data structure.
- \hbox and \vbox are the building blocks of all complex layout in TeX and LaTeX.
- The distinction between \box (destructive) and \copy (non-destructive) matters in macro programming.
- Over- and underfull boxes arise when glue cannot stretch or shrink enough to meet the requested width; these are the most common TeX warnings.
Key takeaway
A TeX box is a rectangle with width, height, and depth; everything on the page is a nested hierarchy of boxes, and \hbox/\vbox are the fundamental construction primitives.
Chapter 12 — Glue
Central question
How does TeX produce justified text and flexible spacing, and what is the mathematical model behind it?
Main argument
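Glue as a triple. TeX's glue is not a fixed length but a specification (natural width, stretch, shrink): 5pt plus 2pt minus 1pt means "natural size 5pt, stretchability 2pt, shrinkability 1pt." The two limits are not symmetric: glue never shrinks by more than its shrinkability (here, never below 4pt), but it may stretch beyond its stretchability at the cost of rapidly growing badness. TeX assembles lists of boxes and glue, then adjusts the glue to make the total match a target dimension (a line width, a page height).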
The glue-setting ratio. When TeX sets a line or a box, it computes a glue ratio r: if the available space exceeds the natural size, it stretches the glue proportionally; if space is too tight, it shrinks. The ratio is dimensionless; stretch and shrink are distributed across all glue items in proportion to their declared stretch and shrink.
Infinity orders. TeX supports three orders of infinity for stretch and shrink: fil (first order), fill (second order), and filll (third order). Higher-order glue dominates lower-order glue completely: \hfil (0pt plus 1fil) is infinitely stretchable relative to normal glue, \hfill (0pt plus 1fill) overrides any \hfil in the same list, and third-order glue, for which plain TeX supplies no ready-made command and which is written explicitly as \hskip 0pt plus 1filll, overrides both. This hierarchy allows \hfill to override a \hfil without requiring the programmer to know how much space is available.
Badness. The badness of a line is a measure of how far its glue was forced to stretch or shrink from its natural size. TeX computes badness as approximately 100 × r³, capped at 10000, where r is the ratio of the required adjustment to the available stretch or shrink. A badness of 0 is perfect; 12 or less is decent; 100 is poor; 10000 means the line is hopelessly loose. A line that would have to shrink by more than its total shrinkability is overfull and counts as infinitely bad.
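As a worked illustration of the cubic rule (the measurements are invented for the example), suppose a line falls 5pt short of the target width. With 10pt of total stretch available the adjustment is mild; with only 4pt available the same shortfall is far more costly.

```latex
\[
r = \frac{5\,\mathrm{pt}}{10\,\mathrm{pt}} = 0.5, \qquad
b \approx 100\,r^{3} = 100 \times 0.125 = 12.5
\]
\[
r = \frac{5\,\mathrm{pt}}{4\,\mathrm{pt}} = 1.25, \qquad
b \approx 100\,r^{3} = 100 \times 1.953 \approx 195
\]
```

TeX works with an integer approximation of this formula, so the values it reports are whole numbers close to these.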
Explicit glue commands. Plain TeX provides: \quad (1em), \qquad (2em), \, (thin space, 3/18 em) and \! (negative thin space), both of which work only in math mode in plain TeX, \hfil (stretchable fill), \hfill (dominant fill), \hss (infinite stretch and shrink), and \vfil, \vfill for vertical spacing.
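The glue commands are easiest to understand inside boxes of fixed width. The following plain TeX sketch uses an arbitrary 4cm width and invented text.

```tex
% Finite glue: natural width 5pt, stretchability 2pt, shrinkability 1pt.
\hbox to 4cm{A\hskip 5pt plus 2pt minus 1pt B\hfil}

% Infinite glue of the same order shares the leftover space equally ...
\hbox to 4cm{\hfil centered\hfil}
% ... but second-order fill makes first-order fil irrelevant.
\hbox to 4cm{\hfil pushed left\hfill}

% \hss lets content overhang a box that is too narrow, without a warning.
\hbox to 0pt{\hss sticks out to the left}
\bye
```

In the first box the \hfil takes up all the slack, so the finite glue between A and B stays at its natural 5pt.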
Key ideas
- Glue is a first-class data type in TeX; it is not a workaround but the designed mechanism for flexible spacing.
- The badness formula (100r³) penalizes large adjustments steeply; small stretches are nearly free, large ones are expensive.
- The order-of-infinity hierarchy is a clever algebraic trick: it allows dominant glue to absorb all available space without requiring numerical comparisons.
- \hss (infinitely stretchable and shrinkable) is the Swiss-army knife for making a box fit an arbitrary width; it is used internally in many plain TeX macros.
Key takeaway
Glue is a flexible-length specification (natural, stretch, shrink) that allows TeX to justify text by distributing excess or deficit space proportionally across the glue in a line, which in ordinary text means the interword spaces; TeX does not adjust the spacing between letters.
Chapter 13 — Modes
Central question
What modes does TeX operate in, and how do the rules for building lists differ across modes?
Main argument
Six modes. TeX operates in six modes: horizontal mode (assembling text into lines), restricted horizontal mode (inside an \hbox), vertical mode (assembling lines and vertical material into a page), internal vertical mode (inside a \vbox), display math mode (a displayed equation), and math mode (inline math). The mode determines which commands are legal and how items are added to the current list.
Main vertical mode and the page builder. When TeX is at the top level (not inside any box), it is in vertical mode, building the "main vertical list" that will eventually be broken into pages. Each completed line from horizontal mode is appended to this list as a box, preceded by interline glue that TeX calculates so that consecutive baselines are normally \baselineskip apart.
Horizontal mode and paragraphs. TeX enters horizontal mode when, in vertical mode, it encounters a character, a math shift, \indent or \noindent, or another inherently horizontal command such as \hskip. An \hbox does not start a paragraph; in vertical mode it is simply appended to the vertical list, which is why macros that must begin a paragraph with a box say \leavevmode first. Text is collected into a horizontal list, which is eventually broken into lines by the paragraph algorithm (Chapter 14). The resulting line boxes are appended to the enclosing vertical list, and TeX returns to vertical mode.
Mode transitions. Some commands are intrinsically mode-specific. \par ends a paragraph and returns TeX to vertical mode. A $ enters math mode; $$ enters display math mode. \hbox enters restricted horizontal mode. Certain commands are valid in any mode; others are restricted. TeX reports an error ("Missing $ inserted," "You can't use X in Y mode") when a command is used in the wrong mode.
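The effect of the current mode shows up clearly when the same box command is issued in vertical and in horizontal mode. This is a plain TeX sketch with invented content.

```tex
% TeX starts in vertical mode, where boxes stack and spaces are ignored.
\hbox{first box}
\hbox{second box, placed below the first}

% \leavevmode switches to horizontal mode, so the boxes sit side by side.
\leavevmode\hbox{left box} \hbox{right box}

% A letter starts a paragraph; a display interrupts it, and \par (or a
% blank line) ends it and returns TeX to vertical mode.
Inline math $a+b$ stays in the line, while a display
$$a+b=c$$
interrupts the paragraph.\par
\bye
```

The space between the two boxes in the second example becomes ordinary interword glue only because TeX is already in horizontal mode when it reads it.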
Key ideas
- Understanding modes is essential for diagnosing "you can't use this here" errors; TeX's mode is a state machine and commands are mode-qualified.
- The six modes fall into three pairs (outer/inner): vertical/internal-vertical, horizontal/restricted-horizontal, display-math/math.
- The main vertical list is the ultimate destination of all TeX output; everything eventually becomes a box on this list.
- Mode transitions are not always explicit; in vertical mode a letter or other character, \char, \hskip, or a math shift implicitly starts a paragraph, whereas spaces are simply ignored.
Key takeaway
TeX operates in six modes corresponding to different list-building contexts; the current mode determines which commands are legal and how items accumulate, and mode transitions are triggered implicitly by the characters and commands TeX encounters.
Chapter 14 — How TeX Breaks Paragraphs into Lines
Central question
How does TeX decide where to break a paragraph into lines, and what makes this algorithm superior to greedy line-by-line approaches?
Main argument
Global optimality. Unlike most word processors, TeX does not break lines greedily (one at a time from top to bottom). Instead, it considers the entire paragraph at once and chooses the set of breakpoints that minimizes the total demerits across all lines. Because the choice is made for the paragraph as a whole, a change near the end of a paragraph can alter the breaks in its earlier lines, which is why editing a single word sometimes reflows lines well above it.
Breakpoints. TeX identifies potential breakpoints in the horizontal list: after spaces (glue), at explicit penalties, or at hyphenation points. A breakpoint at glue discards the glue; a breakpoint at a penalty incurs that penalty's cost in the objective function.
Badness and demerits. Each line's badness b measures glue distortion (0 = perfect, 10000 = hopeless). The demerits d for a line combine b with the penalty p at the line's breakpoint and the parameter \linepenalty (l, 10 in plain TeX): d = (l + b)² + p² if 0 ≤ p < 10000; d = (l + b)² − p² if −10000 < p < 0; and d = (l + b)² if p ≤ −10000 (a forced break). Additional demerits are charged for pairs of lines: \adjdemerits when adjacent lines fall into incompatible fitness classes, \doublehyphendemerits when two consecutive lines end in hyphens, and \finalhyphendemerits when the second-to-last line of a paragraph ends in a hyphen. The algorithm minimizes the sum of all demerits over the paragraph.
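A short worked example may help fix the scale of these numbers. It assumes the TeXbook's rule that a line with badness b, ending at a breakpoint with penalty p ≥ 0, costs (l + b)² + p² demerits, with \linepenalty l = 10 and \hyphenpenalty = 50 as in plain TeX; the badness values are invented for illustration.

```latex
% Decent line broken at a space: b = 12, p = 0
\[ d = (10 + 12)^2 = 484 \]
% The same line broken at a hyphen: b = 12, p = 50
\[ d = (10 + 12)^2 + 50^2 = 484 + 2500 = 2984 \]
% Very loose line broken at a space: b = 100, p = 0
\[ d = (10 + 100)^2 = 12100 \]
```

On this scale a single hyphen costs several decent lines, and the plain TeX default of 10000 for \adjdemerits costs about as much as one very loose line.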
Dynamic programming. The optimal solution is found by dynamic programming (the Knuth-Plass algorithm): for each potential breakpoint, TeX records the optimal set of breaks from the start of the paragraph to that point. The total computational complexity is O(n²) in the worst case (where n is the number of potential breakpoints) but typically near-linear in practice.
Fitness classes. Each line is classified into one of four fitness classes according to how its glue is set: tight, decent, loose, or very loose. Extra demerits (\adjdemerits) are incurred when adjacent lines fall in classes that are more than one step apart, preventing an aesthetically jarring alternation of cramped and spacious lines.
Tolerance and \looseness. TeX first tries to break the paragraph without hyphenation, accepting only lines whose badness does not exceed \pretolerance. If that fails, it hyphenates and tries again with \tolerance as the limit. Setting \tolerance=10000 tells TeX to accept any line, however bad. The \looseness parameter (default 0) asks TeX to make the paragraph that many lines longer (positive values) or shorter (negative values) than optimal, if this can be done within the tolerance; it is useful for manual fine-tuning.
Hyphenation. When the first pass fails, TeX finds legal hyphenation points in each word using the pattern-matching algorithm of Frank Liang (described in Appendix H) and inserts them into the horizontal list as discretionary breaks. The line-breaking algorithm then treats them as optional breakpoints, each charged \hyphenpenalty.
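A minimal plain TeX sketch of how these parameters are typically set for one difficult paragraph. The override values (1000, 2em, 1) are arbitrary choices for illustration.

```tex
% Plain TeX's defaults, restated so the example is self-describing.
\pretolerance=100  % limit for the first pass, no hyphenation
\tolerance=200     % limit for the second pass, hyphenation allowed
\emergencystretch=0pt

% Local overrides for one stubborn paragraph. TeX uses the values in
% force when the paragraph ends, so \par stays inside the group.
{\tolerance=1000 \emergencystretch=2em \looseness=1
A paragraph that should run one line longer than optimal, if TeX can
manage that without exceeding the tolerance.\par}
\bye
```

TeX resets \looseness to 0 after every paragraph, so the request never leaks into the following text.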
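A short plain TeX sketch of the user-level hyphenation tools. The words are arbitrary examples.

```tex
\hyphenation{data-base}            % exception list: the only break allowed
\showhyphens{database typesetting} % logs the hyphens TeX would find
A type\-setting ex\-am\-ple with explicit discretionary hyphens.
\bye
```

\showhyphens writes its findings to the log as an underfull-box message rather than to the page, so it can be left in a draft without affecting the output.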
Key ideas
- TeX's line-breaking is a global optimization, not a greedy algorithm; this is the single largest quality difference from word processors.
- Badness is cubic in the adjustment ratio: b ≈ 100r³, so a slightly under-filled line (r = 0.5, b ≈ 12) is far less costly than a very under-filled line (r = 1.5, b ≈ 337).
- The demerits formula penalizes not just badness but also consecutive hyphens, incompatible fitness classes, and explicit penalties.
- \emergencystretch (a last-resort additional glue) and \tolerance together control TeX's willingness to accept imperfect lines.
Key takeaway
TeX breaks paragraphs into lines by global optimization using dynamic programming, minimizing total demerits (a function of badness, penalties, and fitness-class incompatibilities) across all lines simultaneously.
Chapter 15 — How TeX Makes Lines into Pages
Central question
How does TeX decide where to break a sequence of lines and other vertical material into pages?
Main argument
The page builder. TeX accumulates vertical material in a list of "recent contributions" and periodically moves it onto the current page. This is not done with a global, paragraph-like algorithm; instead, the page builder works "asynchronously": it examines each legal breakpoint as material arrives and, once the accumulated material can no longer fit in \vsize (the target page height), breaks at the best breakpoint seen so far.
Page breaking as a one-dimensional problem. A page break must occur at a legal breakpoint: at glue that follows a non-discardable item (such as the interline glue between two lines), at a kern that is followed by glue, or at a \penalty. At each candidate breakpoint, TeX computes the "cost" of breaking there, combining the badness of the resulting page (how well its vertical glue fills \vsize), the penalty at the breakpoint, and any penalties from split insertions.
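For reference, the cost rule can be written as a formula. This follows the TeXbook's definition as summarized for this note: b is the badness of the page, p the penalty at the breakpoint, and q the accumulated \insertpenalties.

```latex
\[
c =
\begin{cases}
p,         & b < \infty,\ p \le -10000,\ q < 10000;\\
b + p + q, & b < 10000,\ -10000 < p < 10000,\ q < 10000;\\
100000,    & b = 10000,\ -10000 < p < 10000,\ q < 10000;\\
\infty,    & (b = \infty \text{ or } q \ge 10000),\ p < 10000.
\end{cases}
\]
% b = \infty means the material no longer fits on the page at all;
% TeX then falls back to the lowest-cost breakpoint seen so far.
```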
The output routine. When TeX decides to break a page, it does not directly write lines to the DVI file. Instead, it places the accumulated material into box register 255 (\box255) and invokes the output routine (stored in \output). The output routine (defined by the format, e.g., plain TeX's \plainoutput) is responsible for adding headers, footers, and page numbers, then calling \shipout to write the page to the DVI file. This design makes page-level formatting customizable at the user level.
Penalties and \goodbreak. \penalty-10000 forces a page break; \penalty10000 forbids one. Plain TeX provides \goodbreak (\penalty-500, a mild suggestion), \filbreak (break here if the next block will not fit on the page), \break (\penalty-10000), \nobreak (\penalty10000), and \eject (\par\break). Because \eject adds no glue of its own, the usual way to end a short page is \vfill\eject.
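The plain TeX break macros reduce to a handful of penalties, listed here in comments with their values from plain.tex, followed by a small usage sketch with placeholder text.

```tex
% What the plain TeX break macros amount to:
%   \break      = \penalty-10000
%   \nobreak    = \penalty10000
%   \goodbreak  = \par\penalty-500
%   \filbreak   = \par\vfil\penalty-200\vfilneg
%   \eject      = \par\break
First chapter text.
\vfill\eject      % fill the rest of the page with space, then break
Second chapter text.
\goodbreak        % a break here is welcome but not required
More text.
\bye
```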
Insertions. Footnotes and floats are handled via TeX's insertion mechanism. \insert<class>{...} injects material associated with a given insertion class; the page builder allocates space for it on the page and adjusts the text height accordingly. Plain TeX's \footnote macro uses insertions.
Key ideas
- Unlike line-breaking, page-breaking is asynchronous and not globally optimal — TeX cannot look ahead to future pages.
- The output routine's intermediary role (receiving \box255 and calling \shipout) makes TeX's page formatting entirely programmable.
- Insertions (footnotes, floats) interact with page-breaking in complex ways: the page builder must allocate space for them while determining how much text fits.
- \topskip is the glue between the top of the page box and the first line's baseline; it normalizes the first-line position across pages.
Key takeaway
TeX breaks vertical lists into pages by a cost-minimizing one-pass algorithm; the actual page output is delegated to a user-programmable output routine that receives the page material in a box register and calls \shipout.
Chapter 16 — Typing Math Formulas
Central question
What is the basic syntax for typesetting mathematical expressions in TeX, and what formatting conventions does TeX apply automatically?
Main argument
Math mode entry. A single $...$ enters inline math mode (the formula appears within the running text); $$...$$ enters display math mode (the formula is centered on its own line with vertical space above and below). Inside math mode, TeX applies different spacing rules, uses math fonts instead of text fonts, and interprets ^ and _ as superscript and subscript.
Superscripts and subscripts. x^2 produces x², x_i produces xᵢ, and x^{ij} sets both i and j in the superscript (the braces group a multi-character superscript). These can be nested: in x^{y^z}, z is a superscript of y, which is itself a superscript of x. Sizes follow TeX's style levels: text style for the base, script style for a first-level script, and scriptscript style for anything deeper, after which sizes stop shrinking.
Symbols and operators. TeX provides commands for Greek letters (\alpha, \beta, ..., \Omega), binary operators (\times, \div, \pm, \cap, \cup), relations (\leq, \geq, \neq, \subset, \in), arrows (\to, \leftarrow, \Rightarrow), and many others. Spacing around binary operators and relations is automatically correct; no manual spacing is required for standard expressions.
Fractions. \over produces a fraction: {a+b \over c+d} renders (a+b)/(c+d) as a vertical fraction with a horizontal bar. The \above command gives explicit control over the bar thickness; \atop omits the bar (useful for binomial coefficients). In LaTeX, \frac{numerator}{denominator} wraps this.
Delimiters. Parentheses and brackets ( ), [ ], \{ \} are available at fixed sizes. For tall expressions, \left( ... \right) automatically scales the delimiters to match the height of the enclosed formula. \left. and \right. are "null" delimiters (invisible) for when only one side is needed.
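A few lines of plain TeX exercising the constructs described above. The formulas themselves are arbitrary.

```tex
Inline: $x^2 + y_i^{2k}$ and $a \times b \le c$.
$$ {a+b \over c+d} \qquad {n \atop k} \qquad {n \choose k} $$
$$ \left( {1 \over 1 + {1 \over x}} \right) \qquad
   \left. {dy \over dx} \right|_{x=0} $$
\bye
```

The last formula uses \left. as the invisible partner of \right|, so that only the evaluation bar is drawn and it still grows to the height of the fraction.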
Key ideas
- TeX's math mode is a separate sublanguage with its own spacing model, font selection, and syntax.
- Automatic spacing around operators and relations is one of TeX's most visible advantages over manual typesetting.
- \over is a primitive that creates a generalized fraction; LaTeX's \frac is a macro wrapping it.
- The three style levels (text, script, scriptscript) automatically shrink nested superscripts/subscripts.
Key takeaway
TeX's math mode provides a concise syntax (^, _, \over, \left/\right) and automatically handles spacing, font sizing, and delimiter scaling so that standard mathematical expressions require no manual intervention.
Chapter 17 — More about Math
Central question
What advanced mathematical structures does TeX support, and how are spacing and font selection controlled in math mode?
Main argument
Accents in math. Math mode has its own accent commands distinct from text accents: \hat{x} (x̂), \bar{x} (x̄), \dot{x} (ẋ), \ddot{x} (ẍ), \vec{x} (x⃗), \tilde{x} (x̃), \widehat{xyz} (wide hat that stretches over multi-character arguments).
Radical signs. \sqrt{x} produces √x; plain TeX's \root n \of {x} produces the nth root (∜x for n=4; LaTeX writes this as \sqrt[n]{x}). The radical sign grows automatically to match the height of its radicand.
Sums, integrals, and large operators. Commands like \sum, \int, \prod, \bigcup, \lim are "large operators" that change size between text and display mode. In display mode, \sum_{i=1}^{n} places limits above and below the sigma; in inline mode, limits appear as superscripts/subscripts to avoid excessive line height.
Matrices and arrays. TeX does not have a dedicated matrix command at the primitive level; matrices are built with \matrix (a plain TeX macro built on \halign) or with explicit alignment structures. Each entry is separated by & and rows by \cr.
Mathematical spacing. Inside math mode, explicit spacing commands are available: \, (thin space, 3/18 em), \> (medium space, 4/18 em; LaTeX also accepts \:), \; (thick space, 5/18 em), \! (negative thin space). These override TeX's automatic spacing when the author wants a non-standard visual grouping.
Math fonts. Plain TeX selects math alphabets with the font-switching commands \rm (roman), \it (italic), \bf (bold), and \cal (calligraphic capital letters A–Z); LaTeX wraps these as \mathrm, \mathit, \mathbf, and \mathcal, and packages add \mathbb (blackboard bold). Latin letters in math mode are automatically set in math italic (cmmi10), not text italic, while digits are set in roman.
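The following plain TeX lines illustrate the constructs of this chapter in plain TeX spelling (\root ... \of for nth roots, \pmatrix for a parenthesized matrix, \bf, \cal, and \rm for alphabets). The formulas are arbitrary examples.

```tex
$\hat x + \widehat{xyz}$, $\sqrt{x+1}$, $\root 3 \of {x+1}$

Inline $\sum_{i=1}^n i$ versus display:
$$ \sum_{i=1}^n i = {n(n+1) \over 2}, \qquad \int_0^1 x\,dx = {1 \over 2} $$

$$ A = \pmatrix{a & b \cr c & d \cr}, \qquad
   {\bf v} \in {\cal V}, \qquad {\rm d}x $$
\bye
```

The thin space \, before dx is a case where an explicit spacing command is needed, because TeX has no way to know that dx is a separate unit.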
Key ideas
- The distinction between text-mode and math-mode accents, fonts, and spacing means that switching between the two contexts requires re-learning conventions.
- Large operators (\sum, \int) have display-style and text-style forms with different limit placement; TeX selects the right form automatically.
- \cal (LaTeX's \mathcal) draws on the math symbol font family cmsy and only covers uppercase letters.
- The \phantom command (inserts an invisible box of the same size as its argument) is used to control spacing and alignment in multi-line structures.
Key takeaway
Advanced math in TeX involves accent commands, stretchy radicals, large operators with automatic limit placement, and fine-grained spacing control using thin/medium/thick space commands.
Chapter 18 — Fine Points of Mathematics Typing
Central question
What subtle typographic rules govern mathematics that TeX encodes, and how does a user override them when the default is wrong?
Main argument
Atom types. In math mode, every item has a type (mathord, mathop, mathbin, mathrel, mathopen, mathclose, mathpunct, mathinner) that determines the spacing around it. For example, a binary operator (+, \times) automatically gets medium space on each side; a relation (=, <, \leq) gets thick space. Knuth encodes the 8×8 spacing table that governs these interactions.
Overriding atom types. Sometimes TeX guesses wrong. A minus sign used as a unary negation (−x) should not get binary-operator spacing; TeX handles the common cases itself, and {-}x or \mathord{-}x suppresses the extra space in the rest. Conversely, \mathbin{...} gives an arbitrary subformula binary-operator spacing; plain TeX's \bmod is built around \mathbin{\rm mod}. Symbols such as \circ are already classified as binary operations, so f \circ g needs no override.
Displayed equations with numbers. Inside a display, \eqno followed by the number, as in $$x=y \eqno(1)$$, sets an equation number at the right of the displayed equation; \leqno places it on the left. These are primitives, not macros.
Multi-line displays. Plain TeX's \eqalign (built on \halign) aligns a set of equations at a common column — typically the equals sign. \eqalignno adds equation numbers. These structures require understanding TeX's alignment primitives (see Chapter 22).
The math spacing table. Knuth gives the complete 8×8 table of spacings between adjacent atom types, with entries 0 (no space), 1 (thin space), 2 (medium space), and 3 (thick space); a parenthesized entry means the space is omitted in script and scriptscript styles, and a few combinations cannot occur at all. The table is built into TeX and cannot be changed by users, although the three amounts (\thinmuskip, \medmuskip, \thickmuskip) can be, and its application can be worked around by changing atom types.
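A small plain TeX sketch of the spacing amounts and of atom-type overrides. The three \...muskip lines restate plain TeX's own settings; the formulas are arbitrary examples.

```tex
% Plain TeX's settings for the thin, medium, and thick math spaces.
\thinmuskip=3mu
\medmuskip=4mu plus 2mu minus 4mu
\thickmuskip=5mu plus 5mu

$a - b$             % minus between two Ord atoms: binary, medium space
$-b$                % minus at the start: treated as Ord, no space
$a = {-}b$          % braces make the minus an Ord explicitly
$x \mathrel{R} y$   % R spaced as a relation
$a \mathbin{\#} b$  % \# spaced as a binary operation
\bye
```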
Key ideas
- The atom-type system automates the spacing rules of professional mathematical typography without requiring the author to know them.
- The spacing table is part of the TeX language definition, not a stylistic choice: it encodes decades of mathematical typography convention.
- \mathord, \mathbin, \mathrel, etc. are available for overriding TeX's automatic classification.
- \phantom, \vphantom, \hphantom (invisible boxes) are the tools for manual alignment in complex multi-line formulas.
Key takeaway
TeX automatically applies a codified spacing table based on the type of each mathematical atom, and users can override these classifications to handle the cases where TeX's heuristic classification is semantically wrong.
Chapter 19 — Displayed Equations
Central question
How are large, centered, multi-line displayed equations typeset, and what tools does TeX provide for equation numbering and alignment?
Main argument
Display math fundamentals. Entering $$...$$ places TeX in display math mode: the equation is centered on its own line, preceded and followed by \abovedisplayskip and \belowdisplayskip glue. The formula is set in display style (larger operators, limits above and below) rather than the text style of inline math.
Short display optimization. If the last line of the preceding paragraph ends before the point where the formula begins, the display can sit closer to that line without colliding with it. TeX then uses \abovedisplayshortskip and \belowdisplayshortskip, which are typically smaller (less vertical space), in place of their normal counterparts.
Equation numbering. \eqno and \leqno add a right-side or left-side equation number, respectively. Plain TeX does not number equations automatically; in practice the user allocates a counter with \newcount and defines a macro that increments it and formats the number.
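For concreteness, these are the values plain TeX assigns to the four display skips, followed by a display after a short line where the short variants apply.

```tex
% Plain TeX's settings, restated.
\abovedisplayskip=12pt plus 3pt minus 9pt
\belowdisplayskip=12pt plus 3pt minus 9pt
\abovedisplayshortskip=0pt plus 3pt
\belowdisplayshortskip=7pt plus 3pt minus 4pt

Short line.
$$ a^2 + b^2 = c^2 $$
The display above follows a short line, so TeX uses the short skips.
\bye
```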
Multi-line aligned equations. \eqalign{...} is a plain TeX macro built on \halign (Chapter 22) that aligns multiple lines of an equation at a designated column (the & token marks the alignment point). \eqalignno adds equation numbers. These tools handle the common case of breaking a derivation over multiple lines with alignment at the equals sign.
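A minimal sketch of automatic numbering combined with aligned displays. The names \eqnumber and \nexteq are hypothetical; plain TeX defines neither.

```tex
\newcount\eqnumber   % hypothetical counter name
\def\nexteq{\global\advance\eqnumber by 1 (\the\eqnumber)}

$$ E = mc^2 \eqno\nexteq $$

$$\eqalignno{
  (x+y)^2 &= (x+y)(x+y)      & \nexteq \cr
          &= x^2 + 2xy + y^2 & \nexteq \cr}$$
\bye
```

In \eqalignno the first & marks the alignment point and the second introduces the equation number for that line; leaving the third entry empty gives an unnumbered line.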
Key ideas
- Display math mode ($$) is not merely a centering directive; it also changes operator size and limit placement and inserts the display skip glue above and below the formula.
- The short-display skip optimization is a subtle, automatic space-saving feature that most users never notice but that improves the visual density of pages.
- \eqalign is the plain TeX tool for multi-line alignment, predating LaTeX's align and gather environments.
Key takeaway
Displayed equations are centered with automatic vertical spacing; TeX provides equation numbering primitives and plain TeX an alignment macro (\eqalign) for multi-line derivations.
Chapter 20 — Definitions (also called Macros)
Central question
How does TeX's macro system work, and what techniques allow macros to be parameterized, conditional, and self-referential?
Main argument
\def and its variants. \def\macroname{replacement text} defines a macro: whenever \macroname appears in the input, TeX replaces it with the replacement text. \edef ("expanded def") expands the replacement text at definition time. \gdef is a global \def (not scoped to the current group). \xdef is a global \edef.
Parameters. A macro can have up to 9 parameters: \def\add#1#2{#1+#2} defines a two-argument macro; \add{3}{4} expands to 3+4. Parameters can also be delimited: \def\findword#1 {#1} matches a word up to the next space. This "delimited argument" syntax allows TeX to parse arbitrarily complex input patterns.
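A short plain TeX sketch of undelimited and delimited parameters. \name is a hypothetical macro introduced for the example.

```tex
\def\add#1#2{#1+#2}
$\add{3}{4}$            % expands to 3+4
$\add xy$               % unbraced arguments: #1 is x, #2 is y

% Delimited parameters: #1 ends at comma-space, #2 at the period.
\def\name#1, #2.{#2 #1}
\name Knuth, Donald.    % typesets: Donald Knuth
\bye
```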
\let. \let\newname=\oldname makes \newname equivalent to the current meaning of \oldname. Unlike \def, \let snapshots the current binding — if \oldname is later redefined, \newname retains the original definition.
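The difference between \def, \edef, and \let shows up as soon as the underlying macro is redefined. The single-letter macro names below are arbitrary.

```tex
\def\x{A}
\def\y{\x}    % replacement text is the token \x
\edef\z{\x}   % replacement text is A, expanded at definition time
\let\w=\x     % \w receives the current meaning of \x
\def\x{B}
\y\z\w        % typesets BAA
{\gdef\g{set inside a group}} \g   % \gdef survives the group
\bye
```

Only \y follows the redefinition, because it still contains the token \x; \z and \w were fixed when they were created.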
\expandafter. The single most important advanced macro tool. \expandafter\X\Y expands \Y (the token after \X) once before TeX processes \X, allowing the programmer to control the order of macro expansion precisely. Chains of \expandafter are used to "reach through" macro arguments and process their expanded forms.
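A small example makes the effect visible. \first, \stop, and \word are hypothetical names; \stop serves only as a delimiter and is never executed.

```tex
\def\first#1#2\stop{#1}   % keeps the first argument, drops the rest
\def\word{abc}

\first\word\stop               % #1 is the token \word: typesets abc
\expandafter\first\word\stop   % \word is expanded first: typesets a
\bye
```

Without \expandafter, \first sees the single token \word as its first argument; with it, \first sees the letters a, b, c and keeps only the a.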
Conditional execution. TeX provides a set of \if... primitives: \if (test two character tokens for equality), \ifx (test two tokens for identical meaning), \ifnum (compare integers), \ifdim (compare dimensions), \ifhmode, \ifvmode, \ifmmode (test current mode), \iftrue, \iffalse, and others. Conditionals are closed with \fi and may include an \else branch. They are expandable — they can appear in contexts where only expansion is occurring.
\csname and \endcsname. This pair constructs a control sequence name dynamically from a string of tokens: \csname abc\endcsname produces the control sequence \abc. Combined with \expandafter, this allows TeX macros to build and invoke control sequences whose names are computed at runtime — the foundation of associative-array and dispatch-table techniques in TeX programming.
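A brief sketch combining a conditional with names built by \csname. The macros \parity and \lookup and the color:... naming scheme are invented for the example.

```tex
\def\parity#1{\ifodd#1 odd\else even\fi}
7 is \parity{7}; 10 is \parity{10}.

% A tiny lookup table keyed by name.
\expandafter\def\csname color:red\endcsname{1 0 0}
\expandafter\def\csname color:blue\endcsname{0 0 1}
\def\lookup#1{\csname color:#1\endcsname}
Red is \lookup{red}.
\bye
```

The \expandafter before \def is what lets \csname build the name first; without it, \def would try to define \csname itself. A name that was never defined comes back from \csname meaning \relax, so a lookup of an unknown key silently produces nothing.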
Key ideas
- TeX's macro system is Turing-complete: its combination of \def, conditionals, recursion, and token manipulation allows arbitrary computation at expansion time.
- The expansion/execution distinction is fundamental: expandable commands (\if, \csname, user \def macros) are processed at the expansion stage, which Knuth calls TeX's "gullet"; non-expandable commands (\setbox, \advance) run during the "stomach" stage.
- \expandafter is the key to advanced TeX programming; mastery of it separates macro novices from macro experts.
- Delimited arguments allow TeX macros to parse input that does not use braces, enabling entirely new syntaxes to be implemented within TeX.
Key takeaway
TeX's macro language is a Turing-complete token-rewriting system; \def creates text-replacement rules, parameters handle arguments, \expandafter controls expansion order, and \if... primitives enable conditional execution, together enabling programs of arbitrary complexity to run inside TeX.
Chapter 21 — Making Boxes
Central question
What are the advanced box-manipulation commands, and how can boxes be positioned, measured, and altered after construction?
Main argument
\raise and \lower. These commands shift a box vertically relative to the current baseline: \raise 3pt \hbox{X} lifts the box 3pt; \lower 2pt \hbox{Y} drops it 2pt. They are allowed only in horizontal mode. The shifted box keeps its width; the shift changes only how much height and depth it contributes to the enclosing line.
\moveright and \moveleft. Analogues for horizontal displacement inside a vertical list: \moveright 1cm \vbox{...} shifts a vbox 1cm to the right. These are used in custom page layout macros.
Rule boxes. \hrule (a horizontal rule) and \vrule (a vertical rule) produce filled rectangular boxes. Their dimensions are specified by width, height, and depth keywords: \hrule width 3cm height 0.4pt. By default an \hrule is 0.4pt high, 0pt deep, and as wide as the enclosing vertical box, while a \vrule is 0.4pt wide and as tall and deep as the enclosing horizontal box. Rules are the primitive from which all lines and boxes in output are drawn.
Unboxing. \unhbox0 extracts the contents of a stored horizontal box and inserts them directly into the current horizontal list, discarding the outer box and emptying the register; \unhcopy0 does the same but leaves the register intact. Similarly \unvbox0 and \unvcopy0 extract from a vertical box. This allows post-construction manipulation: measuring a box, modifying its contents, and re-inserting.
\smash. Plain TeX's \smash{...} macro typesets its argument but pretends it has zero height and depth — useful for placing material in math mode without affecting the bounding box height.
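The commands of this chapter in one short plain TeX sketch. The first line repeats plain TeX's own definition of the logo, which is built from \kern and \lower; the rest uses arbitrary sample text.

```tex
% The plain TeX logo as defined in plain.tex: E is lowered and kerned.
\def\TeX{T\kern-.1667em\lower.5ex\hbox{E}\kern-.125emX}

\setbox0=\hbox{Sample}
\noindent Natural width of box 0: \the\wd0.
A\raise 3pt\hbox{B}\lower 2pt\hbox{C}D, then a copy: \unhcopy0.\par

\hrule width 3cm height 0.4pt   % explicit size instead of the defaults
\smallskip
\hbox{\vrule\ a rule as tall and deep as this line}
\bye
```

\unhcopy0 is used rather than \unhbox0 so that box 0 is still available afterwards.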
Key ideas
- \raise/\lower and \moveright/\moveleft are the primitives for off-baseline positioning; they are used in logo construction, superscript placement, and diacritical marks.
- Rules are the only way to draw horizontal or vertical lines in TeX; all lines (underlines, tabular borders, fraction bars) are ultimately \hrule or \vrule.
- \unhbox is the key to modifying pre-built boxes; it is used in complex page-layout macros that measure content before placing it.
Key takeaway
TeX provides primitives for vertical and horizontal displacement (\raise, \lower, \moveright, \moveleft), for drawing rules, and for "unboxing" stored boxes — together enabling precise control over final glyph placement.
Chapter 22 — Alignment
Central question
How does TeX typeset tables and aligned columns, and what is the structure of \halign?
Main argument
\halign fundamentals. \halign{preamble\cr row\cr row\cr} aligns rows of material in columns. The preamble specifies a template for each column with # as the placeholder for cell content; columns are separated by &. For example, \halign{#\hfil & \hfil#\hfil & \hfil#\cr A & B & C\cr} creates a three-column table with the first column left-aligned, the second centered, and the third right-aligned.
Column width. \halign automatically sets each column width to the widest entry in that column. The programmer need not specify widths: TeX reads the entire alignment into memory, records the natural width of every entry, and only then packages each row using the column maxima. A very long table therefore occupies memory until its closing brace is reached.
\tabskip. The \tabskip glue is inserted between columns (and at the margins). Setting \tabskip=0pt closes gaps; setting it to a flexible glue spreads columns evenly. Each \tabskip value in the preamble governs the space after that column.
The \omit command. Inside a \halign, \omit suppresses the preamble template for the current cell, using only the cell content directly. This allows headers or special cells to span or override the column template.
\span and multi-column cells. Used in place of &, \span merges the current entry with the next one, so that the combined material occupies both columns. Plain TeX's \multispan n builds on \omit and \span to produce a cell spanning n columns, which is TeX's equivalent of colspan in HTML tables.
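A compact plain TeX table showing a three-column template (left-aligned, centered, right-aligned), a spanning title, and an \omit override. The data are made up.

```tex
\tabskip=0pt
\halign{#\hfil\quad&\hfil#\hfil\quad&\hfil#\cr
\multispan3\hfil\bf Price list\hfil\cr   % one cell across all three columns
\noalign{\smallskip}
Item&Unit&\omit\hfil Price\hfil\cr       % \omit: ignore the \hfil# template
Apples&kg&3.50\cr
Pears&kg&12.00\cr}
\bye
```

Entries are written without spaces before & because a space there becomes part of the cell; TeX ignores spaces only after an &.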
\valign. The vertical counterpart to \halign: \valign{preamble\cr col\cr col\cr} arranges columns of stacked items side by side. This is used for setting items in parallel vertical lists.
Key ideas
- \halign is a general-purpose columnar alignment engine; LaTeX's tabular and array environments, as well as \matrix and \eqalign, are all built on top of it.
- Because widths are fixed only after every row has been read, \halign always produces perfectly fitted columns without manual width specification.
- \omit, \span, and \tabskip provide fine-grained control for the exceptional cells that do not follow the standard template.
- Display-math \eqalign and \cases (a macro for piecewise functions) are both thin wrappers around \halign.
Key takeaway
\halign is the universal alignment engine in TeX: it accepts a column template, automatically computes column widths, and handles merged cells — making it the underlying mechanism for tables, matrices, and multi-line equation alignment.
Chapter 23 — Output Routines
Central question
How is TeX's page-output process customized, and what is the contract between the page builder and the output routine?
Main argument
The \output token register. When the page builder decides a page is full, it places the page material in \box255 and executes the token list stored in \output. Plain TeX sets \output={\plainoutput}; \plainoutput assembles a box containing the headline (from the \headline token register), the contents of \box255 with any insertions, and the footline (from \footline), and passes that box to \shipout to write the page to the DVI file.
\shipout. This primitive serializes the contents of a box to the DVI output file. It is the only way material reaches the output; everything else in TeX manipulates internal data structures. \shipout\hbox{...} can ship any box, not just a page, making it possible to produce DVI files with non-standard page sizes.
Marks. \mark{text} inserts a mark into the vertical list. The output routine can retrieve the first and last marks on a page via \firstmark and \botmark, and the mark in force at the top of the page via \topmark; this is the standard mechanism for producing running headers that reflect the current chapter or section.
Insertions in the output routine. Footnote insertions (collected by the page builder in insertion boxes) are available in the output routine via \box<class>. The output routine is responsible for placing them at the bottom of the page with appropriate spacing.
Custom output routines. Complex page layouts (two-column formats, crop marks, watermarks) require custom output routines. The key technique is "saving the page" — placing \box255 into another register before calling \shipout — so the routine can measure, modify, and then ship the material.
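A minimal sketch of a running head driven by marks, together with a stripped-down output routine in the spirit of \plainoutput. It assumes no floats are pending at the end of the job, because it omits the \supereject handling that \plainoutput performs; the section names are placeholders.

```tex
% Running head from marks; \headline and \footline are plain TeX registers.
\headline={\tenrm\firstmark\hfil\folio}
\footline={\hfil}   % suppress plain TeX's centered page number

% \makeheadline, \pagebody and \makefootline are plain TeX macros;
% \pagebody places \box255 together with any footnote insertions.
\output={\shipout\vbox{\makeheadline\pagebody\makefootline}%
         \advancepageno}

\mark{Section 1}First section text.\par
\mark{Section 2}Second section text.
\bye
```

Both marks land on the same page here, so the head shows the first of them; choosing \botmark instead would show the last.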
Key ideas
- The output routine is the programming interface to pages; everything at the page level (headers, footers, footnotes, margins) is customized here.
- \firstmark and \botmark are how running headers know which section they are in: the author marks the sectional divisions, and the output routine reads the marks on each page.
- Two-pass output routines (save, measure, re-set, ship) are the workaround for TeX's lack of forward look-ahead at the page level.
- The output routine runs inside a group; any assignments it makes are local unless explicitly global.
Key takeaway
TeX's output routine receives each completed page in \box255 and is responsible for adding headers, footers, and footnotes before calling \shipout; mark registers allow the routine to track which section material appears on each page.
Chapter 24 — Summary of Vertical Mode
Central question
What is the complete set of commands that are legal in vertical mode, and what do they produce?
Main argument
This chapter is a systematic reference, enumerating every TeX primitive and plain TeX command that can be used while TeX is in vertical mode (assembling lines into pages). The chapter is organized as a classified list:
Items that go on the vertical list. Boxes (produced by \hbox, \vbox, or shifted with \moveleft/\moveright), rules (\hrule), insertions (\insert), marks (\mark), and whatsits (deferred or driver-specific actions such as \write and \special).
Vertical glue and penalties. \vskip, \vfil, \vfill, \vfilneg, \vss insert vertical glue. \penalty inserts a vertical penalty. \bigskip, \medskip, \smallskip are plain TeX macros for common inter-paragraph spacings.
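Horizontal mode triggers. Encountering a letter, \unhbox, \vrule, \hskip, \valign, a math shift, or other inherently horizontal material in vertical mode causes TeX to start a paragraph and enter horizontal mode automatically (an "implicit \indent"). An \hbox does not have this effect: in vertical mode it is simply appended to the vertical list.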
Parameters that affect vertical mode. \baselineskip, \lineskip, \lineskiplimit, \topskip, \maxdepth, \splitmaxdepth, \vsize, \prevdepth all govern how lines are spaced and how they are assembled into pages.
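A short plain TeX sketch of interline glue at work. Everything stays in vertical mode, because boxes appended with \hbox do not start a paragraph.

```tex
\baselineskip=12pt \lineskip=1pt \lineskiplimit=0pt  % plain TeX defaults
\hbox{first line}
\hbox{second line}      % interline glue puts the baselines 12pt apart
\nointerlineskip        % plain TeX macro: sets \prevdepth to -1000pt
\hbox{third line}       % stacked directly, no interline glue
\vskip 6pt plus 2pt minus 1pt
\hbox{after explicit glue}
\bye
```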
Key ideas
- Vertical mode's command set is the dual of horizontal mode: it builds lists of line-sized boxes rather than character-sized boxes.
- Understanding which commands are vertical-mode-only is essential for writing output routines and custom page-layout code.
- \prevdepth (the depth of the last box added to the vertical list) is used by TeX's \baselineskip insertion logic to maintain consistent baseline spacing.
Key takeaway
Chapter 24 is the definitive reference for vertical-mode commands, documenting every primitive that contributes to the main vertical list and the parameters that control their behavior.
Chapter 25 — Summary of Horizontal Mode
Central question
What is the complete set of commands legal in horizontal mode, and how do they interact with the character stream?
Main argument
This chapter is the horizontal-mode counterpart to Chapter 24, cataloguing all commands and parameters relevant to building horizontal lists (lines of text).
Items on the horizontal list. Characters (set from the current font), ligatures (formed automatically by TeX from font ligature tables), kerns (explicit or automatic spacing from font metrics), boxes (\hbox, \vbox, \raise/\lower boxes), rules (\vrule), and glue.
Horizontal glue. \hskip, \hfil, \hfill, \hfilneg, \hss, and plain TeX's \quad and \qquad insert horizontal glue; \thinspace and similar plain TeX commands insert fixed kerns instead.
Word spacing. Inter-word space in horizontal mode comes from the \spaceskip and \xspaceskip parameters (or, by default, from the current font's space, stretch, and shrink metrics). \spacefactor modifies spacing: after a period it is set to 3000, causing the inter-sentence space to be slightly larger.
Discretionary hyphens. \- inserts a discretionary hyphen at a specific point; the \discretionary primitive (with pre-break, post-break, and no-break texts) allows full control over how a word breaks.
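A few plain TeX lines touching on space factors and discretionary breaks. The sentences are arbitrary; the German word backen is the customary example of a word whose spelling changes at the break.

```tex
\message{sfcode of period: \the\sfcode`\.}  % plain TeX reports 3000
Dr.\ Knuth wrote it. It was 1984.   % control space after an abbreviation
{\frenchspacing No extra space. After periods here.\par}
ba\discretionary{k-}{k}{ck}en       % breaks as bak-ken, else backen
man\-u\-script
\bye
```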
Parameters. \hsize, \rightskip, \leftskip, \parindent, \parskip, \spaceskip, \spacefactor, \parfillskip, \emergencystretch and many others affect horizontal-mode behavior.
Key ideas
- Horizontal mode is where typography "happens" at the character level: ligatures, kerning, discretionary hyphens, and word spacing are all resolved here.
- \spacefactor is TeX's mechanism for distinguishing inter-word from inter-sentence spaces without explicit markup.
- \leftskip and \rightskip are the tools for hanging indentation, ragged-right, and centered text; they add glue at the beginning and end of every line.
Key takeaway
Chapter 25 catalogs all horizontal-mode commands, documenting how characters, glue, kerns, and boxes are assembled into lines and how dozens of parameters fine-tune the process.
Chapter 26 — Summary of Math Mode
Central question
What is the complete set of math-mode commands and parameters, and how does TeX's math typesetting engine work internally?
Main argument
This chapter is the reference for math mode, cataloguing the math atoms, commands, and parameters that govern formula typesetting.
Math atoms. Every item in a math list is an atom with a type (Ord, Op, Bin, Rel, Open, Close, Punct, Inner, Over, Under, Acc, Rad, Vcent). The atom type determines spacing (from the table in Chapter 18) and how the item participates in limit placement and fraction building.
Math styles. Formulas are typeset in one of four basic styles: Display (D), Text (T), Script (S), ScriptScript (SS), each of which also has a "cramped" variant in which exponents are raised less. TeX automatically selects the style based on context (display or inline, subscript depth); \displaystyle, \textstyle, \scriptstyle, \scriptscriptstyle override this.
Math fonts and families. TeX supports 16 math font families (0–15); each family has three sizes (text, script, scriptscript). The plain TeX format sets up families 0 (roman), 1 (math italic, cmmi), 2 (math symbols, cmsy), 3 (math extension font cmex10 for large operators and delimiters).
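Math parameters. Spacing and sizing in formulas are governed by two kinds of parameters. The first are font parameters: the math symbols font (family 2) must supply 22 \fontdimen values and the extension font (family 3) 13, covering such things as superscript shifts, axis height, and default rule thickness. The second are TeX's own parameters, including \mathsurround, \thinmuskip, \medmuskip, \thickmuskip, \nulldelimiterspace, \scriptspace, \delimiterfactor, and \delimitershortfall. The \mathcode and \delcode tables determine how each character behaves as a math symbol or delimiter.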
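To make the \mathcode encoding concrete, here are a few math codes from plain.tex read digit by digit, plus a line that prints one of them. The final assignment merely restates plain TeX's own setting for the exclamation mark.

```tex
% Math codes from plain.tex, read as class | family | position (hex).
%   \mathcode`\+="202B         class 2 (Bin), family 0, position "2B
%   \mathcode`\=="303D         class 3 (Rel), family 0, position "3D
%   \mathchardef\alpha="010B   class 0 (Ord), family 1, position "0B
\message{mathcode of +: \the\mathcode`\+}  % logs 8235, which is "202B

\mathcode`\!="5021   % class 5 (Close), so n! is followed by Close spacing
$n! = n\,(n-1)!$
\bye
```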
Key ideas
- The 16 math font families allow a single TeX document to simultaneously use multiple math typefaces (roman, calligraphic, blackboard bold, etc.).
- \mathcode assigns each character code a math code: a 15-bit number encoding family, position in font, and atom type. It is the foundation of TeX's math font-selection machinery.
- \delimiterfactor and \delimitershortfall together determine how large a \left/\right delimiter must be relative to the enclosed formula.
Key takeaway
Chapter 26 is the definitive reference for math mode internals, covering atom types, style levels, font families, and the font and spacing parameters that govern every spacing and sizing decision in mathematical typesetting.
Chapter 27 — Recovery from Errors
Central question
How does TeX communicate errors, and what strategies allow the user to diagnose and recover from them interactively?
Main argument
TeX's error messages. When TeX encounters a problem, it writes an error message to the terminal and log file. Messages follow a consistent format: a line beginning with ! states the error; subsequent lines show the context (the current input line and a pointer to the problem). Common errors include Undefined control sequence, Missing $ inserted, Runaway argument, and Missing { inserted. Overfull \hbox is a warning rather than an error and does not stop the run.
The error-recovery prompt. After an error, TeX pauses and displays ?. The user can type: <Return> (continue with TeX's best guess), H (get a hint explaining the error), I<text> (insert tokens before the current point), a digit from 1 to 9 (delete that many of the upcoming tokens), E (invoke an editor at the error location), S (scroll mode, continue without stopping), R (non-stop mode), Q (batch mode, which also suppresses terminal output), X (exit the current run).
Common errors and remedies. Knuth catalogs the errors a typical user encounters, explains their causes, and gives remedies. An overfull box (text too wide) has solutions: increase \tolerance, add manual \- hyphens, rewrite the sentence, or set a nonzero \emergencystretch (LaTeX packages the same idea as \sloppy, which raises \tolerance to 9999 and sets \emergencystretch). A missing $ error typically means a math command appeared in text mode.
Error categories. Knuth distinguishes errors that TeX can patch and continue (inserting missing tokens) from fatal errors that force a TeX abort. The \errorstopmode, \scrollmode, \nonstopmode, \batchmode primitives control how aggressively TeX stops for errors.
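The interaction-mode primitives and a user-raised error can be tried out with a few lines of plain TeX. The message texts are placeholders.

```tex
\scrollmode        % keep going after errors, stop only for missing files
% \nonstopmode     % never stop, but still print to the terminal
% \batchmode       % never stop and print nothing; consult the log file
% \errorstopmode   % the default: stop and prompt at every error

\errhelp{This text appears when the user types H at the prompt.}
\errmessage{A hypothetical user-raised error}
\bye
```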
Key ideas
- TeX's error messages are precise and contain context, but require learning their vocabulary to interpret efficiently.
- Interactive insertion (I<text> at the ? prompt) allows fixing a typo in a running job without restarting from scratch.
- Overfull and underfull box warnings are the most common non-fatal messages; they indicate where TeX's optimization produced an imperfect result.
- Running in batch mode (\batchmode) suppresses all interaction; the log file becomes the sole record of errors.
Key takeaway
TeX provides a detailed interactive error-recovery protocol with context display, hint messages, token insertion, and mode controls; understanding it converts cryptic error messages into actionable diagnoses.
The book's overall argument
- Chapter 1 (The Name of the Game) — establishes TeX's identity, purpose, and the book's layered reading structure, framing precision as the central value.
- Chapter 2 (Book Printing versus Ordinary Typing) — enumerates the typographic conventions (quotes, dashes, ligatures, spacing) that TeX implements correctly by default, motivating the system's existence.
- Chapter 3 (Controlling TeX) — introduces the escape character and control-sequence syntax as the universal mechanism for giving TeX instructions.
- Chapter 4 (Fonts of Type) — explains how TeX loads and selects fonts via metric files, establishing glyphs as metric objects rather than pixels.
- Chapter 5 (Grouping) — introduces curly-brace scoping, the mechanism that prevents local changes from propagating globally.
- Chapter 6 (Running TeX) — grounds the conceptual machinery in practical operation: how to invoke TeX, read its output, and recover from errors interactively.
- Chapter 7 (How TeX Reads What You Type) — reveals the tokenization layer: catcodes turn characters into typed tokens, and this layer is the deepest hook for altering TeX's behavior.
- Chapter 8 (The Characters You Type) — catalogs the ten special characters and the \char/^^ mechanisms for literal output of any character code.
- Chapter 9 (TeX's Roman Fonts) — details the Computer Modern family, accent construction, and the special symbols provided by plain TeX.
- Chapter 10 (Dimensions) — establishes TeX's exact integer measurement system, avoiding floating-point error across all size specifications.
- Chapter 11 (Boxes) — introduces the box as the universal data structure: everything visible is a rectangle with width, height, and depth.
- Chapter 12 (Glue) — introduces the flexible-spacing mechanism (natural/stretch/shrink) and badness, completing the box-and-glue model.
- Chapter 13 (Modes) — reveals the state-machine structure governing which operations are legal in which context.
- Chapter 14 (How TeX Breaks Paragraphs into Lines) — presents the globally optimizing Knuth-Plass algorithm as the core innovation distinguishing TeX from earlier typesetting systems.
- Chapter 15 (How TeX Makes Lines into Pages) — extends the one-dimensional breaking problem to pages, introducing the asynchronous page builder and the output routine interface.
- Chapter 16 (Typing Math Formulas) — opens the math sublanguage with its syntax for superscripts, subscripts, fractions, and automatic operator spacing.
- Chapter 17 (More about Math) — extends math to accents, radicals, large operators, and matrix-like structures.
- Chapter 18 (Fine Points of Mathematics Typing) — codifies the spacing rules and atom-type system that make TeX's math output match professional standards.
- Chapter 19 (Displayed Equations) — handles the vertical dimension of math: centering, vertical spacing, numbering, and multi-line alignment.
- Chapter 20 (Definitions / Macros) — reveals TeX as a programmable language: \def, parameters, conditionals, and \expandafter together constitute a Turing-complete token-rewriting system.
- Chapter 21 (Making Boxes) — gives the advanced box-manipulation tools (raise, lower, rules, unbox) needed for complex layout.
- Chapter 22 (Alignment) — presents \halign as the universal table and column engine underlying all of TeX's structured layout.
- Chapter 23 (Output Routines) — exposes the page-output interface: the output routine, \shipout, marks, and insertions make page-level formatting fully programmable.
- Chapter 24 (Summary of Vertical Mode) — provides the definitive reference for vertical-mode commands, closing the loop on page composition.
- Chapter 25 (Summary of Horizontal Mode) — provides the definitive reference for horizontal-mode commands, closing the loop on line composition.
- Chapter 26 (Summary of Math Mode) — provides the definitive reference for math mode, its atom taxonomy, style levels, and parameters.
- Chapter 27 (Recovery from Errors) — completes the practical loop: understanding TeX's error messages and recovery protocol turns the system from opaque into transparent.
Common misunderstandings
Misunderstanding: TeX and LaTeX are the same thing.
TeX is the underlying typesetting engine (a program with about 300 primitives). LaTeX is a macro package written in TeX that provides document-structure commands (\section, \begin{...}, \bibliography). Plain TeX (described in The TeXbook) is a different, lower-level macro package. The TeXbook teaches plain TeX and TeX primitives, not LaTeX.
Misunderstanding: TeX is a markup language like HTML.
TeX is a macro-programming language that happens to produce typeset output. A .tex file is a program: it defines macros, executes conditionals, loops, and manipulates token streams. The "document" is computed by running this program.
Misunderstanding: TeX's line-breaking is just a fancy word-wrap.
TeX's paragraph-level dynamic-programming algorithm is fundamentally different from the greedy line-by-line approach used by word processors. It weighs the feasible break sequences for the entire paragraph together and minimizes total demerits. Enumerating every combination of breaks naively would take time exponential in the paragraph length, but this is not an intractable problem: dynamic programming finds the optimum in O(n²) worst-case time.
Misunderstanding: The dangerous-bend sections can be safely skipped forever.
The dangerous-bend sections contain material that is advanced but often necessary for real use: font loading, catcode changes, \expandafter, output routines. A user who never reads them cannot write their own macros or customize page layout.
Misunderstanding: TeX's behavior changes between versions.
Knuth has deliberately frozen TeX. The version number asymptotically approaches π (currently 3.141592653); each new release appends one more decimal digit. Only bug fixes are made, and no new features will be added. This stability is a design goal, not a limitation: a TeX source file written decades ago is meant to produce the same output today.
Misunderstanding: "Overfull \hbox" is always an error the user must fix.
Overfull and underfull box warnings are informational diagnostics, not errors. TeX produces the best output it can; the message tells you where it could not satisfy all constraints simultaneously. For most documents, occasional overfull hboxes below 1pt overage are invisible in print.
Central paradox / key insight
The central paradox of The TeXbook is that a typesetting system of extraordinary output quality is produced by a programming language of extraordinary primitiveness. TeX has no loops with break conditions, no data structures beyond registers and token lists, no garbage collection, and no first-class functions. Yet Knuth and subsequent macro writers have built entire document-processing ecosystems (LaTeX, ConTeXt, and more) from roughly 300 primitives.
The resolution is that TeX's primitives are chosen with extreme care: boxes, glue, and penalties are the right atomic units for typesetting; catcodes are the right hook for syntactic flexibility; the \expandafter + conditionals + \csname triad provides exactly enough power for Turing-complete macro programming. Knuth's insight was that quality in typography reduces to a small number of well-chosen constraints — and that once those constraints are correctly formalized, a computer can satisfy them better than a human can.
"TeX has to make literally hundreds of micro-typographic decisions every minute, and it gets them right every time, because it has been told the correct rules."
Important concepts
Box
TeX's universal layout object: a rectangle defined by a reference point, width (horizontal extent), height (extent above baseline), and depth (extent below baseline). Characters, lines, and pages are all boxes; complex structures are nested boxes.
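As a rough illustration of how nested boxes get their dimensions, the sketch below computes the natural size of a horizontal box from its children: widths add up, while height and depth are the largest values among the children. The Box class, the hbox helper, and the sample metrics are hypothetical; glue, kerns, and vertical shifts are left out for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    width: float   # horizontal extent, in points
    height: float  # extent above the baseline, in points
    depth: float   # extent below the baseline, in points

def hbox(children):
    """Natural dimensions of a horizontal box built from child boxes."""
    return Box(
        width=sum(c.width for c in children),
        height=max((c.height for c in children), default=0),
        depth=max((c.depth for c in children), default=0),
    )

# Made-up metrics for two glyph boxes; the second one has a descender.
line = hbox([Box(5, 7, 0), Box(5, 4, 2)])
print(line)
```

Because the result is itself a Box, the same function can be applied again to build lines out of words and larger structures out of lines.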
Glue
A flexible-length specification written as (natural) plus (stretch) minus (shrink), e.g., 5pt plus 2pt minus 1pt. Glue can stretch or shrink to fill available space; TeX distributes stretch/shrink proportionally to satisfy line-width or page-height constraints.
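The proportional distribution described above can be sketched in a few lines. This is a simplified model, assuming finite glue only (no fil, fill, or filll) and treating every item on the line as glue; the function name set_glue is hypothetical. It returns the final widths together with the glue ratio, and it refuses to shrink glue below natural minus shrink, which is the situation that leads to an overfull box.

```python
def set_glue(naturals, stretches, shrinks, target):
    """Distribute the difference between target and natural width.

    Returns (final_widths, ratio). A positive ratio means stretching,
    a negative ratio means shrinking.
    """
    excess = target - sum(naturals)
    flex = stretches if excess >= 0 else shrinks
    total = sum(flex)
    if total == 0:
        # Nothing can stretch or shrink: keep the natural widths.
        return list(naturals), 0.0
    ratio = excess / total
    if ratio < -1.0:
        # Glue never shrinks by more than its stated shrinkability.
        ratio = -1.0
    return [n + ratio * f for n, f in zip(naturals, flex)], ratio

# Two glue items: 5pt plus 2pt minus 1pt, and 5pt plus 6pt minus 1pt.
print(set_glue([5, 5], [2, 6], [1, 1], 14))  # stretch to 14pt
print(set_glue([5, 5], [2, 6], [1, 1], 9))   # shrink to 9pt
```

In the stretching case the second item absorbs three times as much of the extra space as the first, because its stretchability is three times larger.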
Badness
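A dimensionless measure of how far a line's glue must deviate from its natural size. Defined approximately as 100r³, where r is the glue ratio: the amount of stretch or shrink actually used divided by the amount available. A badness of 0 means no deviation; the value is capped at 10000, which TeX treats as infinitely bad. Used by the Knuth-Plass algorithm to evaluate line quality.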
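A small numeric sketch of the 100r³ approximation may help. It is only an approximation of the definition given here, not a reproduction of TeX's internal integer routine, so results can differ from real TeX by a unit or so; the function name and argument names are hypothetical.

```python
INF_BAD = 10000  # the cap that TeX treats as infinitely bad

def approx_badness(needed, available):
    """Approximate badness for glue that must change by `needed`.

    needed:    absolute amount of stretch or shrink required (>= 0)
    available: total stretchability or shrinkability on the line
    """
    if needed == 0:
        return 0
    if available <= 0:
        return INF_BAD
    ratio = needed / available
    return min(INF_BAD, round(100 * ratio ** 3))

for needed in (0, 10, 20, 50):
    print(needed, approx_badness(needed, 10))
```

The cubic growth is the point: using all of the available stretch costs 100, while using twice the available stretch already costs 800.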
Demerits
The objective function minimized by the Knuth-Plass line-breaking algorithm. Combines badness, penalties, and incompatibility terms (consecutive hyphens, adjacent lines with very different fitness classes) into a single number per line; the algorithm minimizes the sum across the paragraph.
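The per-line part of this objective can be written out directly. The sketch below follows the formula given in the book's chapter on line breaking, with l standing for \linepenalty (10 in plain TeX), b for badness, and p for the penalty at the break. The additional terms for consecutive hyphens and mismatched fitness classes are omitted, and the function name is hypothetical.

```python
def line_demerits(badness, penalty, line_penalty=10):
    """Demerits of one line, without the extra hyphen/fitness terms."""
    base = (line_penalty + badness) ** 2
    if penalty >= 0:
        return base + penalty ** 2
    if penalty > -10000:
        # A negative penalty is a bonus: it reduces the demerits.
        return base - penalty ** 2
    # A penalty of -10000 or less forces the break, so it adds nothing.
    return base

print(line_demerits(0, 0))       # perfect line, ordinary break
print(line_demerits(12, 50))     # slightly loose line ending at a hyphen
print(line_demerits(0, -10000))  # forced break
```

Squaring the sum means a single very bad line costs more than several mildly bad ones, which pushes the optimizer toward evenly spaced paragraphs.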
Catcode (category code)
A number from 0 to 15 assigned to each character, determining how TeX tokenizes it. Changing a character's catcode changes the syntax of the TeX language for that character. The ten special characters in plain TeX have non-default catcodes.
Token
The output unit of TeX's input stage: either a (character, catcode) pair or a control-sequence name. TeX's processing engine operates entirely on the token stream, never on raw characters.
Control sequence
A named TeX command beginning with \. Defined either as a primitive (built into the TeX engine) or as a macro (defined via \def or related commands). Control words absorb trailing spaces; control symbols do not.
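The interplay of catcodes, tokens, and control sequences can be shown with a toy tokenizer. This is a deliberately reduced sketch: it covers only a handful of plain TeX's category codes, handles a single line, and ignores the ^^ notation, end-of-line handling, and TeX's reading states. All names in it are hypothetical.

```python
ESCAPE, BEGIN_GROUP, END_GROUP, MATH_SHIFT = 0, 1, 2, 3
SPACE, LETTER, OTHER, COMMENT = 10, 11, 12, 14

SPECIAL = {"\\": ESCAPE, "{": BEGIN_GROUP, "}": END_GROUP,
           "$": MATH_SHIFT, " ": SPACE, "%": COMMENT}

def catcode(ch):
    """Category code of a character under this reduced table."""
    if ch in SPECIAL:
        return SPECIAL[ch]
    return LETTER if ch.isascii() and ch.isalpha() else OTHER

def tokenize(line):
    """Turn one input line into ('cs', name) and (char, catcode) tokens."""
    tokens, i = [], 0
    while i < len(line):
        ch = line[i]
        code = catcode(ch)
        if code == COMMENT:
            break  # the rest of the line is ignored
        if code == ESCAPE:
            j = i + 1
            if j < len(line) and catcode(line[j]) == LETTER:
                # Control word: the longest run of letters.
                while j < len(line) and catcode(line[j]) == LETTER:
                    j += 1
                tokens.append(("cs", line[i + 1:j]))
                # Control words absorb the spaces that follow them.
                while j < len(line) and catcode(line[j]) == SPACE:
                    j += 1
            else:
                # Control symbol: exactly one non-letter, spaces are kept.
                tokens.append(("cs", line[j:j + 1]))
                j += 1
            i = j
        elif code == SPACE:
            tokens.append((" ", SPACE))  # a run of spaces is one token
            while i < len(line) and catcode(line[i]) == SPACE:
                i += 1
        else:
            tokens.append((ch, code))
            i += 1
    return tokens

print(tokenize(r"\TeX is {fun}% note"))
print(tokenize(r"\$ 5"))
```

Changing an entry in SPECIAL changes what the tokenizer produces for the same input, which is the sense in which catcodes alter the syntax of the language.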
Macro
A user-defined control sequence created with \def. When TeX encounters the macro name, it replaces it with the replacement text (after parameter substitution). The combination of macros, conditionals, and \expandafter makes TeX Turing-complete.
Mode
TeX's processing state, determining which commands are legal and how items are added to the current list. The six modes are: horizontal, restricted horizontal (inside \hbox), vertical, internal vertical (inside \vbox), display math, and math.
Output routine
A user-programmable token list (stored in \output) that receives each completed page in \box255 and is responsible for adding headers, footers, and footnotes before calling \shipout. The output routine is the programming interface to page-level formatting.
DVI (Device Independent) file
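The output format of the TeX program: a binary file that records the exact position of every character and rule on each page in device-independent units. By the time the file is written, all glue has been set, so no flexible spacing remains. DVI files are rendered to PostScript, PDF, or other formats by driver programs (dvips, dvipdfmx).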
Knuth-Plass algorithm
The dynamic programming algorithm (developed by Knuth and Michael Plass) for globally optimal paragraph line-breaking. It minimizes total demerits across the entire paragraph simultaneously, achieving O(n²) worst-case complexity (near-linear in practice) versus the 2^n naive enumeration.
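The difference between greedy and globally optimal breaking shows up even in a toy model. The sketch below is a simplification, not the Knuth-Plass algorithm itself: words are measured in characters, there is no stretchable glue or hyphenation, and the squared trailing space on each line except the last stands in for demerits. The function names are hypothetical.

```python
def greedy_lines(words, width):
    """Fill each line as far as possible before starting the next."""
    lines, current = [], []
    for word in words:
        if current and len(" ".join(current + [word])) > width:
            lines.append(" ".join(current))
            current = []
        current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines

def optimal_lines(words, width):
    """Minimize the summed squared slack with O(n^2) dynamic programming."""
    if any(len(w) > width for w in words):
        raise ValueError("a word is wider than the line")
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: minimal cost of words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: where the line from i ends
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i + 1, n + 1):
            length += len(words[j - 1]) + 1
            if length > width:
                break
            cost = 0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i], nxt[i] = cost + best[j], j
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines

words = "aaa bb cc ddddd".split()
print(greedy_lines(words, 6))
print(optimal_lines(words, 6))
```

On this input the greedy version fills the first line completely and then strands "cc" on a nearly empty line, while the optimizing version accepts a shorter first line in exchange for a better second one.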
Plain TeX
The macro package loaded by default when running tex. Provides document-level commands (\beginsection, \headline, \footline, \bye), math macros (\eqalign, \cases), and font-loading conventions. Distinct from LaTeX.
\expandafter
A TeX primitive that reverses the normal expansion order: \expandafter\X\Y expands \Y (the token after \X) once before processing \X. It is the fundamental tool for controlling token-stream manipulation in complex macros.
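The reordering can be modeled on a plain list of tokens. This is a toy model under strong assumptions: macros take no parameters, a macro table is a dictionary from names to replacement token lists, and nothing else about TeX's expansion machinery is represented. The function names and the macro table are hypothetical.

```python
def expand_once(tokens, macros):
    """Replace the first token by its replacement text if it is a macro."""
    if tokens and tokens[0] in macros:
        return list(macros[tokens[0]]) + list(tokens[1:])
    return list(tokens)

def expandafter(tokens, macros):
    """Model of \\expandafter: set the first token aside, expand the
    token after it once, then put the first token back in front."""
    if not tokens:
        return []
    return [tokens[0]] + expand_once(tokens[1:], macros)

# Hypothetical macro table: \Y expands to the two tokens a b.
macros = {r"\Y": ["a", "b"]}
print(expandafter([r"\X", r"\Y", "c"], macros))
```

After the call, \X sees the tokens a b c rather than \Y c, which is what allows a macro to receive an already expanded argument.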
Scaled point (sp)
TeX's atomic unit of measurement: 1 sp = 2^{-16} pt ≈ 0.0000054 mm. TeX stores every dimension as an integer number of scaled points and computes with exact integer arithmetic, so that the same source yields the same line and page breaks on every machine.
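The effect of storing dimensions as integers can be seen in a short sketch. It assumes non-negative lengths given in points as decimal strings and rounds to the nearest scaled point; TeX's own conversion routine and its handling of other units are more involved, and the function names here are hypothetical.

```python
import math
from fractions import Fraction

SP_PER_PT = 65536  # 2**16 scaled points in one point

def pt_to_sp(text):
    """Convert a non-negative decimal string in points to whole sp."""
    exact = Fraction(text) * SP_PER_PT
    return math.floor(exact + Fraction(1, 2))  # round to nearest sp

def sp_to_pt(sp):
    """Exact value in points of an integer number of scaled points."""
    return Fraction(sp, SP_PER_PT)

print(pt_to_sp("12"))                                        # 12pt in sp
print(pt_to_sp("0.1") + pt_to_sp("0.2") == pt_to_sp("0.3"))  # integers
print(0.1 + 0.2 == 0.3)                                      # floats
```

Rounding happens once, when a length is read; every later addition is an exact integer operation, so results do not depend on the floating-point behavior of the host machine.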
References and Web Links
Primary book and edition information
- Knuth, Donald E. The TeXbook (Computers and Typesetting, Volume A). Addison-Wesley, 1984. 21st printing revised, 1992.
The Knuth-Plass line-breaking algorithm
- Knuth, Donald E. and Michael F. Plass. "Breaking Paragraphs into Lines." Software: Practice and Experience 11 (1981), 1119–1184.