
Probability

Probability is the mathematics of uncertainty. It quantifies how often things happen — or, on a different reading, how much we should believe they will.


About this HTML presentation

This Shipslides page presents Probability as an interactive HTML presentation deck in the Mathematics catalog with 34 slides. The share page keeps the uploaded deck sandboxed while exposing readable context, topics, and a slide outline for viewers and search engines.


Topics covered

mathematics probability


Slide outline

01 Probability.
02 Opening: What probability is.
03 Chapter I: Gambling problems.
04 Chapter II: Cardano's Liber de ludo aleae.
05 Chapter III: The 1654 correspondence.
06 Chapter IV: The first textbook.
07 Chapter V: The law of large numbers.
08 Chapter VI: The normal approximation.
09 Chapter VII: Bayes's theorem.
10 Chapter VIII: Laplace, the consolidator.
11 Chapter IX: Random variables.
12 Chapter X: The standard distributions.
13 Chapter XI: The central limit theorem.
14 Chapter XII: Markov chains.
15 Chapter XIII: The 1933 axioms.
16 Chapter XIV: Measure-theoretic probability.
17 Chapter XV: Conditional probability.
18 Chapter XVI: Two interpretations.
19 Chapter XVII: Stochastic processes.
20 Chapter XVIII: Brownian motion.
21 Chapter XIX: Ergodic theory.
22 Chapter XX: Probability in physics.
23 Chapter XXI: Probability in genetics.
24 Chapter XXII: Probability in finance.
25 Chapter XXIII: Game theory.
26 Chapter XXIV: The Monty Hall problem.
27 Chapter XXV: The birthday paradox.
28 Chapter XXVI: Statistical paradoxes.
29 Chapter XXVII: Probability in machine learning.
30 Chapter XXVIII: Quantum probability.
31 Chapter XXIX: Twenty essentials.
32 Chapter XXX: Watch & read.
33 Chapter XXXI: Where to begin.
34 The end of the deck.

Page data

Canonical: https://shipslides.com/d/mathematics-probability
Category: Mathematics
Size: 514.0 KB
Updated: 2026-05-17
LLM text: https://shipslides.com/d/mathematics-probability/llms.txt

Presentation Transcript

Detailed slide-by-slide text content extracted from this presentation.

Slide 01

Probability.

Vol. XIII · Deck 05 · The Deck Catalog
From Pascal-Fermat correspondence to Kolmogorov's axioms; from Bayes's posthumous theorem to the modern statistical machinery underwriting science, finance, and machine learning.
Founded: 1654
Axiomatised: 1933
Pages: 32

Slide 02

Opening: What probability is.

Lede
Probability is the mathematics of uncertainty. It quantifies how often things happen — or, on a different reading, how much we should believe they will.
The discipline is much younger than algebra or geometry. Its founding correspondence is dated July–October 1654: a series of letters between Blaise Pascal and Pierre de Fermat about gambling problems posed by a French nobleman. Three centuries later, the framework supports the standard apparatus of empirical science, financial pricing, machine learning, and most quantitative decision-making.
This deck traces the line from the Pascal–Fermat letters to Kolmogorov's measure-theoretic axiomatisation, to modern Bayesian inference, statistical paradoxes, and the probabilistic machinery underneath everything from Brownian motion to large language models.

Slide 03

Chapter I: Gambling problems.

Pre-history
Dice were known to ancient civilisations. Roman, Greek, Egyptian, and Indian gamblers all developed empirical intuitions about long-run frequencies. None developed a mathematical theory.
The earliest written analyses come from the late medieval period. Luca Pacioli's 1494 Summa de arithmetica posed the "problem of points": how should the stakes be divided when a multi-round game is interrupted? Pacioli proposed dividing in proportion to wins so far — a poor answer, as later mathematicians would show.
Niccolò Tartaglia (1556) and Galileo (c. 1620) made some progress. The first systematic treatment came from a gambler-physician who would not get the credit.

Slide 04

Chapter II: Cardano's Liber de ludo aleae.

Cardano
Gerolamo Cardano (1501–1576), physician, mathematician, and compulsive gambler, wrote the Liber de Ludo Aleae ("Book on Games of Chance") around 1564. It contains the first explicit calculation of the probability of a dice outcome in the modern sense, an early statement of the law of large numbers as an empirical observation, and an analysis of fair betting odds.
The book was not published until 1663 — eighty-seven years after Cardano's death and nine years after the Pascal–Fermat correspondence. By then probability had been founded by other hands and the priority was lost.
Cardano was the first to recognise the central principle: in a "fair" game, each elementary outcome should be assigned an equal share of probability, and the probability of an event is the sum over the outcomes that compose it. The principle is so foundational that we forget how recently it was invented.

Slide 05

Chapter III: The 1654 correspondence.

Pascal & Fermat
The chevalier de Méré, a French nobleman and gambler, posed two problems to Blaise Pascal in 1654. The first: which is more likely — at least one six in four throws of one die, or at least one double-six in 24 throws of two dice? The second: the problem of points (Pacioli's, still unsolved).
Pascal wrote to Pierre de Fermat. Their correspondence (dating from July 1654) is the founding document of probability theory. Both reached the same answers by different routes: Fermat by enumerating the equally likely ways the game could have continued, Pascal by a recursive argument on expected values.
The analysis used expected value for the first time as the deciding criterion: a fair division of stakes gives each player the expected value they would accumulate by playing the game out. The concept is the ancestor of today's risk-neutral pricing in finance.
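De Méré's first question can be settled with a few lines of exact arithmetic. The sketch below assumes fair dice and independent throws; the helper name `at_least_one` is illustrative and not from the deck.

```python
from fractions import Fraction


def at_least_one(p_success: Fraction, throws: int) -> Fraction:
    """Probability of at least one success in `throws` independent tries."""
    return 1 - (1 - p_success) ** throws


# At least one six in four throws of one die.
one_six_in_4 = at_least_one(Fraction(1, 6), 4)
# At least one double-six in 24 throws of two dice.
double_six_in_24 = at_least_one(Fraction(1, 36), 24)

print(one_six_in_4, float(one_six_in_4))   # 671/1296, about 0.5177
print(float(double_six_in_24))             # about 0.4914
```

Under these assumptions the first bet is slightly favourable and the second slightly unfavourable.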

Slide 06

Chapter IV: The first textbook.

Huygens
Christiaan Huygens (1629–1695) heard rumours of the Pascal–Fermat correspondence on a 1655 visit to Paris. Without seeing the original letters, he reconstructed the results and added new ones. The result was De Ratiociniis in Ludo Aleae ("On Reasoning in Games of Chance," 1657) — the first published textbook on probability.
Huygens introduced expected value formally as the mathematical certainty equivalent of an uncertain prospect. He gave the first systematic treatment of conditional probability and the duration-of-play problems that would occupy probabilists for the next century.
The book remained the standard reference for fifty years. Jakob Bernoulli annotated it heavily; the annotations grew into Ars Conjectandi.

Slide 07

Chapter V: The law of large numbers.

LLN
Jakob Bernoulli's posthumous Ars Conjectandi (1713) contains the first rigorous statement and proof of the (weak) law of large numbers: as the number of independent trials grows, the observed proportion of successes converges (in probability) to the true probability of success.
The intuition is ancient — gamblers had observed it informally for centuries. The proof was the technical achievement: bounding the probability that the empirical proportion deviates from the true probability by more than a given amount.
The strong law of large numbers — almost-sure convergence rather than convergence in probability — was proved by Émile Borel (1909) and refined by Andrei Kolmogorov in his 1933 measure-theoretic foundation. The two laws together justify the entire frequentist interpretation of probability: that probabilities are long-run frequencies.
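A small simulation makes the convergence visible. This is a sketch only: the success probability 0.3, the seed, and the function name `observed_proportion` are arbitrary choices for illustration.

```python
import random


def observed_proportion(p: float, n_trials: int, seed: int = 0) -> float:
    """Fraction of successes in n_trials independent Bernoulli(p) trials."""
    rng = random.Random(seed)
    successes = sum(rng.random() < p for _ in range(n_trials))
    return successes / n_trials


# The observed proportion should settle near p as the number of trials grows.
for n in (100, 10_000, 1_000_000):
    print(n, observed_proportion(0.3, n))
```

A single run illustrates the law but does not prove it; Bernoulli's contribution was the bound on how likely a given deviation is.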

Slide 08

Chapter VI: The normal approximation.

de Moivre
Abraham de Moivre (1667–1754), French Huguenot exile in London, friend of Newton and Halley. His Doctrine of Chances (1718, expanded 1738, 1756) was the standard probability textbook for a century.
De Moivre's 1733 Approximatio — privately printed and added to the second edition — proved the first version of the central limit theorem: the binomial distribution, suitably scaled, approaches the normal distribution as the number of trials grows. The result gave the famous bell curve its first explicit description.
The normal distribution had not been seen before. Gauss (1809) rediscovered it in his analysis of measurement errors, and it now carries his name. Pierre-Simon Laplace's 1812 Théorie analytique des probabilités generalised de Moivre's result to the central limit theorem in something close to its modern form: the suitably scaled sum of independent, identically distributed random variables with finite variance converges in distribution to a normal distribution.
[Image: Pascal (1623–62), whose 1654 correspondence with Fermat founded probability theory]

Slide 09

Chapter VII: Bayes's theorem.

Bayes
Thomas Bayes (c. 1701–1761), English Presbyterian minister and amateur mathematician. His "Essay Towards Solving a Problem in the Doctrine of Chances" was published posthumously in 1763 by Richard Price.
The theorem in its modern form:
P(H | E) = P(E | H) · P(H) / P(E)
The probability of hypothesis H given evidence E equals the likelihood of E under H, weighted by the prior probability of H, normalised by the total probability of E.
Bayes's theorem is the formal rule for updating beliefs in light of evidence. It is mathematically uncontroversial. What divided statisticians for two centuries was whether prior probabilities on hypotheses make sense. That debate (frequentist vs Bayesian) cooled only in the late 20th century, when computational tools made Bayesian methods tractable for realistic models.

Slide 10

Chapter VIII: Laplace, the consolidator.

Laplace
Pierre-Simon Laplace (1749–1827) wrote the most influential book on classical probability. Théorie analytique des probabilités (1812) ran to 700 pages and synthesised everything from Pascal to its publication date. Laplace independently rediscovered Bayes's theorem and made it the foundation of his approach.
His popular Essai philosophique sur les probabilités (1814), the introduction to the technical treatise, contains the famous formulation of "Laplace's demon" — the deterministic conception in which an intellect that knew the position and momentum of every atom could predict the entire future of the universe.
Laplace's most-quoted line: "Probability theory is nothing but common sense reduced to calculation." The same essay also presents Laplace's rule of succession: having observed s successes in n trials, the probability of success on the next trial is (s+1)/(n+2), which becomes (n+1)/(n+2) when all n trials succeeded. The rule encodes a uniform prior on the underlying probability and remains a much-debated baseline.
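The rule of succession can be checked exactly as a posterior mean under a uniform prior. The sketch below is an illustration, not Laplace's own derivation; it uses the standard identity that the integral of p^a (1 − p)^b over [0, 1] equals a! b! / (a + b + 1)!, and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a: int, b: int) -> Fraction:
    """Exact integral of p**a * (1 - p)**b over [0, 1] for integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Posterior mean of the success probability under a uniform prior."""
    failures = trials - successes
    # Numerator integrates p * likelihood; denominator integrates the likelihood.
    return beta_integral(successes + 1, failures) / beta_integral(successes, failures)


print(rule_of_succession(10, 10))  # 11/12, i.e. (n+1)/(n+2) with n = 10
print(rule_of_succession(3, 7))    # 4/9, i.e. (s+1)/(n+2) with s = 3, n = 7
```

With no data at all the function returns 1/2, which is the uniform prior's own mean.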

Slide 11

Chapter IX: Random variables.

Random variables
A random variable is a function from a probability space to the real numbers. The roll of a die, the number of heads in 100 coin flips, the high temperature in New York tomorrow — each is a random variable.
Two key summaries: the expected value E[X] (long-run average) and the variance Var(X) = E[(X − E[X])²] (typical squared deviation from the mean). The square root of variance — the standard deviation — is in the same units as the variable itself and is the standard scale for "typical fluctuation."
Chebyshev's inequality: for any random variable with finite variance, P(|X − E[X]| ≥ k σ) ≤ 1/k². The result is general but loose. Markov's inequality, weaker still, applies to any non-negative random variable. Together these are the workhorse concentration bounds — the more sophisticated bounds (Chernoff, Hoeffding, Azuma) refine them by exploiting more structure.
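To see how loose Chebyshev's bound can be, the sketch below compares it with exact tail probabilities for one roll of a fair die. The die is an illustrative choice, not an example from the deck.

```python
from fractions import Fraction
from math import sqrt

# Fair six-sided die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
sigma = sqrt(variance)


def tail_probability(k: float) -> Fraction:
    """Exact P(|X - E[X]| >= k * sigma) for the die."""
    return sum((p for x, p in pmf.items() if abs(x - mean) >= k * sigma), Fraction(0))


# Print k, the exact tail probability, and Chebyshev's upper bound 1/k^2.
for k in (1.0, 1.2, 1.5):
    print(k, float(tail_probability(k)), 1 / k**2)
```

In every row the exact tail sits at or below the bound, usually by a wide margin, which is the sense in which the inequality is general but loose.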

Slide 12

Chapter X: The standard distributions.

Distributions
Bernoulli (single trial): μ = p
Binomial (n trials): μ = np
Poisson (rare events): μ = σ² = λ
Geometric (first success): μ = 1/p
Normal (Gaussian): parameters μ, σ²
Exponential: μ = 1/λ
Uniform [a,b]: μ = (a+b)/2
The discrete distributions count things; the continuous distributions measure things. Each arises naturally in a particular limit.
The Poisson distribution is the limit of the binomial as n → ∞ with np = λ held fixed: the law of rare events. The number of typing errors per page, of phone calls per minute, of decays per second from a radioactive sample: each is Poisson to close approximation. Ladislaus Bortkiewicz's 1898 study of Prussian cavalrymen kicked to death by horses (an average of 0.61 per army corps per year) is the classical empirical check.
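The binomial-to-Poisson limit can be observed numerically. The sketch below fixes λ = 0.61 (the Bortkiewicz average quoted above) and measures the largest gap between the two probability mass functions over k = 0..5; the function names are illustrative.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)


def max_gap(n: int, lam: float = 0.61) -> float:
    """Largest pointwise difference between Binomial(n, lam/n) and Poisson(lam) for k = 0..5."""
    return max(abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for k in range(6))


# The gap should shrink as n grows with n * p held at lam.
for n in (10, 100, 10_000):
    print(n, max_gap(n))
```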

Slide 13

Chapter XI: The central limit theorem.

CLT
The most useful theorem in applied probability. For independent identically-distributed random variables X₁, X₂, … with mean μ and finite variance σ²:
(X̄_n − μ) / (σ / √n) → N(0, 1) in distribution as n → ∞.
The theorem says that the sum (or average) of many independent random influences is approximately normally distributed, whatever the underlying distribution of each component, provided its variance is finite. This is why the bell curve appears so often: heights, measurement errors, IQ scores, and the grades in a large class can all be modelled as sums of many independent contributions.
de Moivre proved a special case (binomial) in 1733; Laplace generalised in 1812; Lyapunov and Lindeberg gave the modern conditions in the early 20th century. The Lindeberg–Feller condition gives essentially the most general statement.
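The standardised statistic in the theorem can be simulated directly. This sketch draws averages of 30 uniform variables (an arbitrary, clearly non-normal choice) and checks how often the standardised mean lands within ±1.96, which a standard normal does about 95% of the time.

```python
import random
from math import sqrt


def standardized_mean(n: int, rng: random.Random) -> float:
    """(sample mean - mu) / (sigma / sqrt(n)) for n draws from Uniform(0, 1)."""
    mu, sigma = 0.5, sqrt(1 / 12)  # mean and standard deviation of Uniform(0, 1)
    sample_mean = sum(rng.random() for _ in range(n)) / n
    return (sample_mean - mu) / (sigma / sqrt(n))


rng = random.Random(0)
z_values = [standardized_mean(30, rng) for _ in range(20_000)]
inside = sum(abs(z) <= 1.96 for z in z_values) / len(z_values)
print(inside)  # expected to be close to 0.95 if the normal approximation holds
```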

Slide 14

Chapter XII: Markov chains.

Markov
A Markov chain is a stochastic process in which the future depends on the past only through the present state. Andrey Markov introduced the framework in 1906; his 1913 study of vowels and consonants in Pushkin's Eugene Onegin applied it to 20,000 letters of the text to estimate the transition probabilities.
The Perron–Frobenius theorem applied to the transition matrix gives the long-run behaviour: under irreducibility and aperiodicity, the chain converges to a unique stationary distribution from any starting state.
Markov chains are everywhere in modern computing. Markov chain Monte Carlo (MCMC) draws samples from intractable distributions by constructing a Markov chain whose stationary distribution is the target — Metropolis (1953) and Hastings (1970) introduced the standard algorithms. PageRank (Brin and Page, 1998) computes the stationary distribution of a Markov chain on the web graph. The hidden Markov model underlies speech recognition.
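Convergence to a stationary distribution can be seen by repeatedly applying the transition matrix. The two-state matrix below is hypothetical, chosen only because its stationary distribution (5/6, 1/6) is easy to verify by hand; it is irreducible and aperiodic, as the convergence statement requires.

```python
def step(dist: list[float], P: list[list[float]]) -> list[float]:
    """One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(P: list[list[float]], start: list[float], iterations: int = 200) -> list[float]:
    """Approximate the stationary distribution by power iteration."""
    dist = start
    for _ in range(iterations):
        dist = step(dist, P)
    return dist


# Hypothetical transition matrix; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

# Two different starting states lead to the same long-run distribution.
print(stationary(P, [1.0, 0.0]))
print(stationary(P, [0.0, 1.0]))
```

Power iteration of this kind is also the basic idea behind computing PageRank, although production systems add damping and sparse-matrix techniques not shown here.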

Slide 15

Chapter XIII: The 1933 axioms.

Kolmogorov
For 280 years probability theory had operated without rigorous foundations. Andrey Kolmogorov's 1933 monograph Grundbegriffe der Wahrscheinlichkeitsrechnung ("Foundations of the Theory of Probability") fixed this in 62 pages.
The axioms: a probability space is a triple (Ω, F, P) where Ω is a sample space, F is a sigma-algebra of subsets of Ω (the events), and P : F → [0, 1] is a measure with P(Ω) = 1. Three axioms (non-negativity, normalisation, countable additivity) and probability theory becomes a special case of measure theory.
The framework absorbed continuous probability, conditional expectation, stochastic processes, and the central limit theorems into a single coherent structure. Within a generation, every research paper in probability used Kolmogorov's framework. It is one of the swiftest and most complete axiomatic conquests in the history of mathematics.

Slide 16

Chapter XIV: Measure-theoretic probability.

Measure
The mature framework. A random variable is a measurable function on a probability space; the expectation is its Lebesgue integral. Convergence concepts — almost sure, in probability, in L^p, in distribution — are all measure-theoretic.
The framework gives clean statements for what was previously informal. The Radon–Nikodym theorem formalises conditional probability density. The strong Markov property generalises Markov chains to continuous-time processes. Doob's martingale theorems (1940s) provide the convergence machinery for stochastic processes that are "fair games" given the past.
Standard graduate texts: Kolmogorov (1933, in translation), Loève (1955), Billingsley's Probability and Measure (1979), Durrett's Probability: Theory and Examples (1991, many editions). All assume measure theory; all use it relentlessly.
[Image: The normal distribution, the most-studied curve in mathematics]

Slide 17

Chapter XV: Conditional probability.

Conditional
The probability of A given B, written P(A | B), is the probability of A within the restricted universe in which B is known to have occurred. The formula:
P(A | B) = P(A ∩ B) / P(B)
The concept is harder than it looks. Most people intuitively confuse P(A | B) with P(B | A) — the prosecutor's fallacy. Most legal evidence and medical-test interpretation requires getting these straight.
A medical test with 99% sensitivity and 99% specificity, applied to a population with 0.1% disease prevalence, gives a positive predictive value of about 9% — most positives are false positives. The framework is correct; intuitions about it are not.
For continuous random variables, conditional probability requires more care: the event "X = x" usually has probability zero. The regular conditional probability framework, formalised by Doob and others, handles this via Radon–Nikodym derivatives.
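The medical-test figure quoted on this slide follows from Bayes's theorem applied to the stated sensitivity, specificity, and prevalence. A minimal sketch (the function name is illustrative):

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """P(disease | positive test) via Bayes's theorem."""
    true_positive = sensitivity * prevalence                # P(positive and diseased)
    false_positive = (1 - specificity) * (1 - prevalence)   # P(positive and healthy)
    return true_positive / (true_positive + false_positive)


ppv = positive_predictive_value(sensitivity=0.99, specificity=0.99, prevalence=0.001)
print(round(ppv, 4))  # 0.0902, i.e. about 9%
```

The healthy population is so much larger than the diseased one that its 1% false-positive rate outweighs the 99% detection rate among the few who are ill.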

Slide 18

Chapter XVI: Two interpretations.

Bayes vs frequentist
What does "probability" mean? Two camps.
Frequentists: probability is the long-run relative frequency of an event in repeated trials. Hypotheses don't have probabilities; data does. The standard tools are p-values, confidence intervals, and likelihood-ratio tests. Founded by R. A. Fisher, Jerzy Neyman, and Egon Pearson in the 1920s–30s.
Bayesians: probability is a degree of belief, applicable to anything that is uncertain — including hypotheses. The standard tools are prior distributions, posterior distributions, and Bayesian credible intervals. Founded effectively by Bayes (1763), Laplace (1812), and revived by Jeffreys, de Finetti, and Savage in the 20th century.
The schism dominated 20th-century statistics. The 1990s computational revolution (MCMC, large datasets, fast computers) made Bayesian methods practical for previously intractable problems and the religious heat has cooled. Today most working statisticians use both; Bayesian inference is now the default for many machine-learning and biomedical applications.

Slide 19

Chapter XVII: Stochastic processes.

Stochastic processes
A stochastic process is a family of random variables indexed by time (discrete or continuous). The basic examples: random walks (discrete time), Markov chains, Brownian motion, Poisson processes.
The 20th-century explosion. Norbert Wiener's 1923 construction of Brownian motion as a measure on continuous-function space. Andrey Kolmogorov's 1931 paper on continuous-time Markov processes. Joseph Doob's 1953 Stochastic Processes codified the modern framework. Kiyoshi Itô's 1942–51 stochastic calculus made nonlinear stochastic differential equations tractable.
Today the subject is the language of high-dimensional dynamical systems with noise. Climate models, epidemiology, queueing theory, financial mathematics, neuroscience — each has stochastic processes at its core. The boundary between probability theory and applied mathematics has blurred to invisibility.

Slide 20

Chapter XVIII: Brownian motion.

Brownian
Robert Brown (1827) observed pollen grains in water jittering randomly. The motion was unexplained for 78 years.
Albert Einstein's 1905 paper "On the Movement of Small Particles…" derived a quantitative law for the motion: a particle's mean-squared displacement grows linearly with time, with a coefficient depending on temperature and viscosity. The result made the molecular hypothesis experimentally testable, and measurements soon confirmed it. Marian Smoluchowski reached the same conclusion independently and published it in 1906.
Norbert Wiener's 1923 mathematical construction of Brownian motion as a measure on continuous-function space gave the rigorous framework. The Wiener process — continuous everywhere, differentiable nowhere, with stationary independent Gaussian increments — is the foundational continuous-time stochastic process. Itô calculus, the Black–Scholes formula, and most of mathematical finance live downstream.
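The linear growth of mean-squared displacement can be illustrated with a discretised one-dimensional walk built from independent Gaussian increments. This is a sketch under stated assumptions: the diffusion coefficient `diffusion`, the time step, and the path count are arbitrary illustrative values, and the one-dimensional form MSD = 2·D·t is the standard textbook statement rather than something quoted from the deck.

```python
import random


def mean_squared_displacement(n_steps: int, dt: float, diffusion: float,
                              n_paths: int, seed: int = 0) -> float:
    """Average of x(t)^2 over simulated paths, with t = n_steps * dt."""
    rng = random.Random(seed)
    step_sd = (2 * diffusion * dt) ** 0.5  # each increment has variance 2 * D * dt
    total = 0.0
    for _ in range(n_paths):
        x = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, step_sd)
        total += x * x
    return total / n_paths


# With D = 1, the estimate at time t should be near 2 * t.
for n_steps in (50, 100, 200):
    t = n_steps * 0.01
    print(t, mean_squared_displacement(n_steps, 0.01, 1.0, n_paths=2_000))
```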

Slide 21

Chapter XIX: Ergodic theory.

Ergodic
The mathematical bridge between probability theory and dynamical systems. An ergodic system is one in which time averages along a single trajectory equal space averages over the entire phase space.
George David Birkhoff's 1931 ergodic theorem: for an ergodic measure-preserving system, time averages converge almost surely to the corresponding space averages. John von Neumann's mean ergodic theorem (1931) gave an L² version.
The theorem provides the rigorous foundation for the statistical-mechanical assumption that time averages equal ensemble averages — the basis of Boltzmann's original justification of thermodynamic equilibrium. The ergodic hypothesis remains a delicate matter in physics; in pure mathematics it has flowered into a full theory of dynamical systems by Sinai, Ornstein, Margulis, Furstenberg, and many others.

Slide 22

Chapter XX: Probability in physics.

Physics
Ludwig Boltzmann's statistical mechanics (1872, 1896) reconceived thermodynamics as the macroscopic average of microscopic random behaviour. Entropy S = k log W — the formula on his tombstone — connects the macroscopic state to the number of microscopic configurations consistent with it.
Quantum mechanics (1925–) makes probability irreducible. The wave function gives amplitudes whose squared magnitudes are probabilities; the measurement postulate (Born 1926) puts probability into the foundations of physics, not just into our ignorance of the microscopic detail.
The interpretation question, whether quantum probabilities reflect epistemic limits or ontic randomness, is open. Bell's theorem (1964) and the experiments of Clauser, Aspect, and Zeilinger (1972–2015, recognised by the 2022 Nobel Prize) rule out broad classes of "hidden variables." The probabilities are not reducible to ignorance about deeper deterministic mechanisms, at least not in any local way.

Slide 23

Chapter XXI: Probability in genetics.

Genetics
Gregor Mendel's 1866 paper on pea-plant inheritance used probability implicitly: 3:1 ratios of phenotype emerge from independent random assortment of allele pairs. The work was largely ignored until rediscovered around 1900.
The mathematical theory of population genetics was built by R. A. Fisher, Sewall Wright, and J. B. S. Haldane in the 1920s–30s. Fisher's theorem on the rate of natural selection; Wright's island model and the concept of effective population size; Haldane's dilemma on substitutional load — these are probability calculations applied to gene frequencies.
The Hardy–Weinberg equilibrium (independently derived by Godfrey Hardy and Wilhelm Weinberg, 1908) is the founding identity of population genetics: in a randomly mating population, allele frequencies are conserved across generations and genotype frequencies follow a binomial expansion of the allele frequencies. Modern human genetics — disease association, ancestry inference, polygenic scores — sits on top of probabilistic models that descend from this work.
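The Hardy–Weinberg identity is short enough to verify exactly. The sketch below assumes a single locus with two alleles, A and a, and an illustrative allele frequency p = 3/10; the function names are hypothetical.

```python
from fractions import Fraction


def genotype_frequencies(p: Fraction) -> dict[str, Fraction]:
    """Genotype frequencies under random mating: the expansion of (p + q)^2."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def allele_frequency(genotypes: dict[str, Fraction]) -> Fraction:
    """Frequency of allele A: all of AA plus half of the heterozygotes."""
    return genotypes["AA"] + genotypes["Aa"] / 2


p = Fraction(3, 10)
offspring = genotype_frequencies(p)
print(offspring)                    # AA = 9/100, Aa = 21/50, aa = 49/100
print(allele_frequency(offspring))  # 3/10, unchanged from the parents
```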

Slide 24

Chapter XXII: Probability in finance.

Finance
Louis Bachelier's 1900 PhD thesis "Théorie de la spéculation" modelled stock prices as a Brownian motion — five years before Einstein's paper on Brownian motion in physics. Bachelier's work was ignored until Paul Samuelson rediscovered it in 1955.
The 1973 Black–Scholes–Merton option pricing formula derived a closed-form price for European options under geometric Brownian motion. The paper changed Wall Street: by 1980 every options trading desk used variants of the formula; the volume of options trading exploded. Myron Scholes and Robert Merton won the 1997 Nobel Prize in Economics; Black had died in 1995 and was therefore ineligible.
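For concreteness, here is a sketch of the closed-form price of a European call on a non-dividend-paying stock, in the standard textbook form of the formula. The input values (spot 100, strike 100, 5% rate, 20% volatility, one year) are hypothetical and chosen only for illustration.

```python
from math import erf, exp, log, sqrt


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def black_scholes_call(spot: float, strike: float, rate: float,
                       volatility: float, maturity: float) -> float:
    """European call price under geometric Brownian motion, no dividends."""
    d1 = (log(spot / strike) + (rate + 0.5 * volatility**2) * maturity) / (volatility * sqrt(maturity))
    d2 = d1 - volatility * sqrt(maturity)
    return spot * normal_cdf(d1) - strike * exp(-rate * maturity) * normal_cdf(d2)


price = black_scholes_call(spot=100, strike=100, rate=0.05, volatility=0.2, maturity=1.0)
print(round(price, 2))  # 10.45
```

The price depends on volatility but not on the stock's expected return, which is the practical face of risk-neutral pricing.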
Modern quantitative finance is applied stochastic calculus. Stochastic volatility, jump diffusions, fractional Brownian motion, regime-switching processes — every refinement is an enrichment of the underlying probability model. The 2008 financial crisis sharpened scepticism about over-reliance on Gaussian tail-thinness and rare-event modelling.
[Image: Thomas Bayes (c. 1701–61), the only known portrait, contested]

Slide 25

Chapter XXIII: Game theory.

Game theory
The mathematical theory of strategic interaction. John von Neumann's 1928 Zur Theorie der Gesellschaftsspiele proved the minimax theorem for two-player zero-sum games. The 1944 von Neumann–Morgenstern Theory of Games and Economic Behavior founded the modern subject.
John Nash's 1950 thesis (28 pages, written at 21) proved the existence of equilibria in finite n-player non-cooperative games. The Nash equilibrium has become the central solution concept in game theory; the 1994 Nobel Prize (Nash, Harsanyi, Selten) was the formal recognition.
Probability enters in mixed strategies — randomised choices between pure actions. In rock-paper-scissors there is no pure equilibrium; the unique Nash equilibrium has each player choosing each action with probability 1/3. Probability is not optional in strategic settings; it is forced by the structure of the games.
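The rock-paper-scissors claim can be checked by computing expected payoffs. In the sketch below the payoff matrix uses the usual convention of +1 for a win, −1 for a loss, and 0 for a tie; the "biased" opponent is a hypothetical example.

```python
from fractions import Fraction

# Row player's payoff; rows and columns are ordered rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def expected_payoffs(opponent_mix: list[Fraction]) -> list[Fraction]:
    """Expected payoff of each pure action against a mixed opponent strategy."""
    return [sum(PAYOFF[i][j] * opponent_mix[j] for j in range(3)) for i in range(3)]


uniform = [Fraction(1, 3)] * 3
print(expected_payoffs(uniform))  # all zero: no action beats the uniform mix

# A rock-heavy opponent can be exploited by playing paper.
biased = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
print(expected_payoffs(biased))
```

Against the uniform mix every action earns the same payoff, so neither player can gain by deviating, which is the equilibrium condition.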

Slide 26

Chapter XXIV: The Monty Hall problem.

Monty Hall
You face three doors. Behind one is a car; behind the other two, goats. You pick door A. The host (who knows where the car is) opens door C, revealing a goat, and offers to let you switch to door B. Should you?
Yes. Your original choice had a 1/3 chance of being right; the host's reveal does not change this. The remaining unopened door (B) now carries the entire 2/3 probability that the car is not behind A. Switching wins twice as often.
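The 1/3 versus 2/3 split can be confirmed by enumerating every case exactly. The sketch assumes the standard rules: the car is placed uniformly at random, the host always opens a goat door other than your pick, and he chooses uniformly when two such doors are available.

```python
from fractions import Fraction
from itertools import product

DOORS = ("A", "B", "C")


def win_probabilities() -> tuple[Fraction, Fraction]:
    """Exact winning probabilities for the stay and switch strategies."""
    stay = switch = Fraction(0)
    for car, pick in product(DOORS, repeat=2):
        weight = Fraction(1, 9)  # car position and first pick: uniform, independent
        openable = [d for d in DOORS if d not in (pick, car)]
        for opened in openable:
            branch = weight / len(openable)
            other = next(d for d in DOORS if d not in (pick, opened))
            if pick == car:
                stay += branch
            if other == car:
                switch += branch
    return stay, switch


stay, switch = win_probabilities()
print(stay, switch)  # 1/3 2/3
```

If the host's behaviour differs from these assumptions, for example if he opens a door at random and merely happens to reveal a goat, the answer changes.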
The problem became famous when Marilyn vos Savant presented it correctly in Parade magazine (September 1990). She received roughly 10,000 letters disputing her answer, including from PhD mathematicians. Paul Erdős, told the answer, refused to believe it until shown a simulation.
The problem is the canonical example of how probability defeats intuition. Variants — Bertrand's box paradox, the three-prisoners problem — have circulated since the 19th century. The 1990 publicity made Monty Hall the famous one.

Slide 27

Chapter XXV: The birthday paradox.

Birthday
How many people must be in a room before the probability that two share a birthday exceeds 1/2? The answer surprises: 23.
The calculation: the probability that n people all have different birthdays is 365·364·…·(365 − n + 1) / 365^n. At n = 23 this drops below 1/2; at n = 50 it is below 3%. By n = 70, the probability of a shared birthday is over 99.9%.
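The product formula is easy to evaluate directly. This sketch assumes 365 equally likely birthdays and ignores leap years; the function name is illustrative.

```python
def prob_all_distinct(n: int, days: int = 365) -> float:
    """Probability that n people all have different birthdays."""
    prob = 1.0
    for i in range(n):
        prob *= (days - i) / days
    return prob


# Probability of at least one shared birthday for several group sizes.
for n in (22, 23, 50, 70):
    print(n, round(1 - prob_all_distinct(n), 4))
```

The crossover happens between 22 and 23 people, matching the figure quoted above.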
The paradox illustrates the difference between "is X in the set?" (low probability for any specific X) and "is there any match?" (much higher, because the number of pairs grows as n²).
The phenomenon is exploited in cryptographic birthday attacks: to find a hash collision in a function with N possible outputs, you only need about √N trials. This is why cryptographic hashes are at least 256 bits long — to make 2¹²⁸ birthday-attack work prohibitive.

Slide 28

Chapter XXVI: Statistical paradoxes.

Paradoxes
Simpson's paradox: a trend that appears in different groups of data can disappear or reverse when the groups are combined. The 1973 University of California, Berkeley, sex-discrimination case is the classic example: aggregate admissions favoured men, but department-by-department admissions favoured women — men applied more often to less competitive departments.
The base rate fallacy: ignoring prior probability. A diagnostic test that is 99% accurate, applied to a 1-in-1000 disease, gives a positive predictive value around 9%.
Berkson's paradox: when sampling from a hospital population, two diseases can appear negatively correlated even when independent in the general population.
The Saint Petersburg paradox (Daniel Bernoulli, 1738): a game with infinite expected value should command an arbitrarily large entry fee, but rational people will pay only a small amount. Bernoulli's resolution — diminishing marginal utility — was the founding insight of utility theory.
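The Saint Petersburg game can be made concrete under one common formulation, assumed here because the slide does not specify the payoffs: a fair coin is tossed until the first head, and a head on toss k pays 2^k. The sketch truncates the game at a maximum number of tosses and compares expected payoff with expected logarithmic utility, which is Bernoulli's proposed resolution.

```python
from math import exp, log


def truncated_expectations(max_rounds: int) -> tuple[float, float]:
    """Expected payoff and expected log-utility, summing over the first max_rounds tosses."""
    expected_payoff = 0.0
    expected_log_utility = 0.0
    for k in range(1, max_rounds + 1):
        prob = 0.5**k       # first head appears on toss k
        payoff = 2.0**k     # assumed payoff schedule
        expected_payoff += prob * payoff
        expected_log_utility += prob * log(payoff)
    return expected_payoff, expected_log_utility


for rounds in (10, 50):
    payoff, utility = truncated_expectations(rounds)
    # exp(utility) is the sure amount with the same log-utility as the gamble.
    print(rounds, payoff, exp(utility))
```

Each extra round adds 1 to the expected payoff, so it grows without bound, while the certainty equivalent under log utility settles near 4 in this formulation.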

Slide 29

Chapter XXVII: Probability in machine learning.

ML
Modern machine learning is applied probability theory. Supervised learning fits a probabilistic model P(y | x) to data; generative models fit P(x) directly. The training objective, whether maximum likelihood or its Bayesian counterpart, maximum a posteriori estimation, is a probability statement.
The major frameworks: graphical models (Pearl, Lauritzen, Spiegelhalter, late 1980s) factorise joint distributions over many variables; variational inference approximates intractable posteriors; diffusion models generate images by reversing a learned noise process; large language models are autoregressive probability distributions over token sequences.
The field has rediscovered Bayesian intuitions repeatedly. Dropout regularisation (Hinton et al., 2012) can be read as approximate Bayesian model averaging; ensemble methods resemble mixture models; modern neural-network uncertainty quantification (deep ensembles, Laplace approximations) approximates Bayesian inference explicitly. Probability is not optional in machine learning. It is the substrate.

Slide 30

Chapter XXVIII: Quantum probability.

Quantum
Quantum probability deviates from classical in a fundamental way. Probabilities arise as squared magnitudes of complex amplitudes; amplitudes can interfere — adding constructively or destructively — in ways probabilities cannot.
The classical probability axioms allow a joint distribution over any collection of random variables; quantum mechanics does not, when those random variables correspond to non-commuting observables. The Kochen–Specker theorem (1967) gives a precise no-go statement for non-contextual hidden variables.
The mathematical formalism, quantum probability theory, replaces the classical Kolmogorov triple (Ω, F, P) with a non-commutative algebra of operators on a Hilbert space: quantum states are positive trace-1 operators, observables are self-adjoint operators, and probabilities are given by Born's rule. The structure was clarified by von Neumann's 1932 Mathematical Foundations of Quantum Mechanics and remains the standard framework today.

Slide 31

Chapter XXIX: Twenty essentials.

Reading list
1713 · Ars Conjectandi · J. Bernoulli
1738 · The Doctrine of Chances (2nd ed.) · de Moivre
1763 · An Essay Towards Solving a Problem… · Bayes (Price)
1812 · Théorie analytique des probabilités · Laplace
1933 · Foundations of the Theory of Probability · Kolmogorov
1953 · Stochastic Processes · Doob
1957 · An Introduction to Probability Theory · Feller
1968 · The Foundations of Statistics · Savage
1979 · Probability and Measure · Billingsley
1988 · Probabilistic Reasoning in Intelligent Systems · Pearl
1991 · Probability: Theory and Examples · Durrett
2003 · Probability Theory: The Logic of Science · Jaynes
2005 · Pattern Recognition and Machine Learning · Bishop
2007 · The Black Swan · Taleb
2010 · Probability and Random Processes · Grimmett & Stirzaker
2011 · Thinking, Fast and Slow · Kahneman
2013 · The Theory That Would Not Die · McGrayne
2014 · Bayesian Data Analysis (3rd) · Gelman et al.
2017 · Statistical Rethinking · McElreath
2019 · The Book of Why · Pearl & Mackenzie

Slide 32

Chapter XXX: Watch & read.

Watch & Read
Watch · 3Blue1Brown · Bayes' theorem · the geometry of changing beliefs
Watch · The Monty Hall Problem · explained
Watch · The Law of Large Numbers · visualised

Slide 33

Chapter XXXI: Where to begin.

Closing
William Feller's An Introduction to Probability Theory and Its Applications (volumes 1957, 1966) is unmatched as a literate, intuitive introduction. Sheldon Ross's A First Course in Probability is the standard undergraduate textbook.
For the modern measure-theoretic subject: Durrett's Probability: Theory and Examples is the standard graduate text; Billingsley's Probability and Measure is denser and more careful.
For the Bayesian view: E. T. Jaynes's Probability Theory: The Logic of Science (posthumous, 2003) is the polemical masterpiece of the late-20th-century Bayesian revival; Gelman et al.'s Bayesian Data Analysis is the practical handbook.
For the popular shelf: Sharon Bertsch McGrayne's The Theory That Would Not Die tells the history of Bayes; Daniel Kahneman's Thinking, Fast and Slow shows where intuition about probability goes wrong.
The reward of going deep: probability is the discipline most readers will actually use. It rewards the time. There is no domain of contemporary thinking — scientific, financial, computational, or ordinary — where it does not pay.

Slide 34

The end of the deck.

Colophon
Probability — Volume XIII, Deck 05 of The Deck Catalog. Set in Inter and Tiempos Text. White #fafafa; deep ink with teal and coral accents.
Thirty leaves on the mathematics of uncertainty. From a 1654 letter about gambling to the inference engine inside every modern AI — three and a half centuries on, the calculations are still common sense, reduced.
FINIS