
Statistics

The Science of Learning from Data


About this HTML presentation

This Shipslides page presents Statistics as an interactive HTML presentation deck in the Mathematics catalog with 23 slides. The share page keeps the uploaded deck sandboxed while exposing readable context, topics, and a slide outline for viewers and search engines.


Key sections

  • 01 Statistics
  • 02 What Is Statistics?
  • 03 A Brief History
  • 04 Probability: The Language of Uncertainty
  • 05 Key Distributions
  • 06 The Central Limit Theorem
  • 07 Descriptive Measures
  • 08 Sampling Methods
  • 09 Hypothesis Testing
  • 10 The p-Value Controversy
  • 11 Confidence Intervals
  • 12 Linear Regression
  • 13 Beyond Simple Regression
  • 14 ANOVA: Analysis of Variance
  • 15 Bayesian Statistics
  • 16 Experimental Design
  • 17 Non-Parametric Methods
  • 18 Correlation vs. Causation
  • 19 The Replication Crisis
  • 20 Statistics & Machine Learning
  • 21 Statistics in Practice
  • 22 Common Pitfalls
  • 23 The Art of Uncertainty

Page data
Canonical
https://shipslides.com/d/mathematics-statistics
Category
Mathematics
Size
40.4 KB
Updated
2026-05-17
LLM text
https://shipslides.com/d/mathematics-statistics/llms.txt

Presentation Transcript

Detailed slide-by-slide text content extracted from this presentation.

Slide 01

Statistics

  • The Science of Learning from Data
  • From probability theory to machine learning -- how we quantify uncertainty and extract meaning from noise
  • Probability
  • Inference
  • Regression
  • Bayesian
  • Hypothesis Testing
Slide 02

What Is Statistics?

  • Statistics is the mathematical discipline concerned with collecting, organizing, analyzing, interpreting, and presenting data. It provides the tools to make decisions under uncertainty -- the universal condition of the real world.
  • Descriptive Statistics
  • Summarizes data that has already been collected. Tools include means, medians, standard deviations, histograms, and box plots. The goal: condense raw data into understandable patterns.
  • Inferential Statistics
  • Draws conclusions about a population based on a sample. Tools include confidence intervals, hypothesis tests, and regression models. The goal: generalize from the observed to the unobserved.
  • "All models are wrong, but some are useful." -- George E.P. Box, 1976
Slide 03

A Brief History

  • 1654
  • Pascal and Fermat exchange letters on the "Problem of Points," founding probability theory.
  • 1713
  • Jacob Bernoulli's Ars Conjectandi proves the Law of Large Numbers.
  • 1763
  • Thomas Bayes's theorem published posthumously -- the foundation of Bayesian statistics.
  • 1809
  • Gauss publishes the method of least squares and the normal distribution theory.
  • 1900-1930
  • Karl Pearson, R.A. Fisher, Jerzy Neyman, and Egon Pearson create modern frequentist statistics: chi-squared test, ANOVA, maximum likelihood, p-values, confidence intervals.
  • 1950s-present
  • Computational revolution: bootstrap, MCMC, machine learning, and causal inference transform the field.
Slide 04

Probability: The Language of Uncertainty

  • Probability is the mathematical foundation of statistics. It assigns a number between 0 and 1 to every possible event, where 0 means impossible and 1 means certain.
  • Three Interpretations
  • Classical
  • Probability = favorable outcomes / total outcomes. Works when outcomes are equally likely (dice, cards). Laplace's definition.
  • Frequentist
  • Probability = long-run relative frequency. Flip a fair coin 10,000 times; heads appears ~50% of the time. Von Mises formalized this view.
  • Bayesian
  • Probability = degree of belief. Updated as evidence accumulates via Bayes' theorem. Subjective but rigorous. De Finetti championed this.
  • Kolmogorov's Axioms (1933)
  • P(A) ≥ 0  |  P(Ω) = 1  |  P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅
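As a rough illustration of the frequentist interpretation above, the sketch below simulates fair coin flips and reports the share that land heads. The function name `relative_frequency` and the fixed seed are assumptions made for this example; they do not come from the deck.

```python
import random

def relative_frequency(n_flips, seed=0):
    """Simulate n_flips fair coin flips and return the share that land heads."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The relative frequency should settle near 0.5 as the number of flips grows.
for n in (10, 100, 10_000):
    print(n, relative_frequency(n))
```

With only 10 flips the share can sit far from 0.5; the long-run frequency is what the frequentist definition refers to.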
Slide 05

Key Distributions

  • A probability distribution describes how the values of a random variable are spread. Choosing the right distribution is half the battle in applied statistics.
  • Discrete
  • Bernoulli: Single yes/no trial (coin flip)
  • Binomial: Number of successes in n trials
  • Poisson: Count of rare events (calls/hour, mutations/gene)
  • Geometric: Trials until first success
  • Continuous
  • Normal (Gaussian): Bell curve; heights, errors, IQ scores
  • Exponential: Time between Poisson events
  • Uniform: Equal probability across an interval
  • Student's t: Normal-like but heavier tails; small samples
  • Normal Distribution PDF
  • f(x) = (1 / (σ√(2π))) · e^(-(x-μ)² / (2σ²))
  • The normal distribution appears everywhere because of the Central Limit Theorem: the sum of many independent random variables with finite variance tends toward normality regardless of their individual distributions.
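The normal density on this slide can be written out term by term and cross-checked against Python's standard library. This is a minimal sketch; `normal_pdf` is a name chosen here for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x, following the formula term by term."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Cross-check the hand-written formula against the standard library.
print(normal_pdf(1.0, mu=0.0, sigma=2.0))
print(NormalDist(mu=0.0, sigma=2.0).pdf(1.0))
```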
Slide 06

The Central Limit Theorem

  • Perhaps the most important theorem in all of statistics, the CLT explains why the normal distribution is so ubiquitous and why sampling works.
  • Statement
  • If X₁, X₂, ..., Xₙ are independent and identically distributed random variables with finite mean μ and variance σ², then as n → ∞, the distribution of the sample mean X̄ is increasingly well approximated by a normal distribution N(μ, σ²/n).
  • Why It Matters
  • Sampling: You can estimate a population mean with a known margin of error
  • Polling: A random sample of ~1,000 can represent 330 million Americans with ~3% margin of error
  • Quality control: Manufacturing processes use CLT-based control charts to detect defects
  • Finance: Portfolio returns approximate normality when composed of many independent assets (though fat tails remain a trap)
  • Standardized Sample Mean
  • Z = (X̄ - μ) / (σ / √n) → N(0, 1)
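One way to see the theorem at work is to simulate it. The sketch below draws repeated samples from a skewed Exponential(1) distribution (an arbitrary choice for this example, with μ = 1 and σ = 1) and compares the spread of the sample means with the predicted σ/√n. It also reproduces the polling figure with the usual worst-case proportion p = 0.5. The function names are illustrative assumptions.

```python
import math
import random
import statistics

def sample_means(n, n_samples, seed=0):
    """Draw n_samples samples of size n from Exponential(rate=1); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)

means = sample_means(n=50, n_samples=2000)
# The CLT predicts the means centre near mu = 1 with spread near 1 / sqrt(50).
print(statistics.fmean(means), statistics.stdev(means), 1 / math.sqrt(50))
print(margin_of_error(1000))  # roughly 0.03, i.e. about 3 percentage points
```

Note that the margin of error depends on the sample size, not on the population size, which is why about 1,000 respondents can suffice for a very large population.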
Slide 07

Descriptive Measures

  • Measures of Center
  • Mean: Arithmetic average. Sensitive to outliers. Sum/n.
  • Median: Middle value when sorted. Robust to outliers. Preferred for income, home prices.
  • Mode: Most frequent value. Useful for categorical data.
  • Trimmed mean: Drops top/bottom k% before averaging. Compromise between mean and median.
  • Measures of Spread
  • Range: Max minus min. Crude but intuitive.
  • Variance (s²): Average squared deviation from the mean (the sample version divides by n - 1 rather than n).
  • Standard deviation (s): Square root of variance. Same units as data.
  • IQR: Q3 minus Q1. Used in box plots. Robust to outliers.
  • Shape
  • Skewness: Measures asymmetry. Positive = right tail, negative = left tail. Income is right-skewed; test scores often left-skewed.
  • Kurtosis: Measures tail heaviness. Normal = 3 (or 0 if excess kurtosis). Financial returns are leptokurtic ("fat-tailed").
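The measures above can be computed with Python's `statistics` module. The income figures below are hypothetical and include one deliberate outlier so that the gap between mean and median is visible; `trimmed_mean` is a helper written for this sketch, not a library function.

```python
import statistics

def trimmed_mean(data, proportion=0.1):
    """Mean after dropping the given proportion of values from each end."""
    ordered = sorted(data)
    k = int(len(ordered) * proportion)  # number of values dropped per tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept)

incomes = [28, 31, 33, 35, 35, 38, 41, 45, 52, 250]  # hypothetical, in thousands

q1, q2, q3 = statistics.quantiles(incomes, n=4)  # quartile cut points
print("mean:", statistics.fmean(incomes))
print("median:", statistics.median(incomes))
print("mode:", statistics.mode(incomes))
print("10% trimmed mean:", trimmed_mean(incomes, 0.1))
print("sample variance:", statistics.variance(incomes))
print("sample std dev:", statistics.stdev(incomes))
print("IQR:", q3 - q1)
```

The single value of 250 pulls the mean well above the median, while the median, trimmed mean, and IQR barely move.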
Slide 08

Sampling Methods

  • The foundation of inferential statistics is the idea that a well-chosen sample can represent a much larger population. But how the sample is chosen matters enormously.
  • Probability Sampling
  • Simple random: Every member has equal chance (lottery, random number generator)
  • Stratified: Divide population into strata (age, region), sample from each proportionally
  • Cluster: Randomly select groups (schools, city blocks), sample everyone within
  • Systematic: Select every k-th member from a list
  • Non-Probability Sampling
  • Convenience: Whoever is available (college students, online volunteers)
  • Snowball: Participants recruit others (hard-to-reach populations)
  • Quota: Non-random selection meeting demographic targets
  • Purposive: Expert judgment selects "representative" cases
  • "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a body of data." -- John Tukey, 1986
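Three of the probability sampling schemes above can be sketched with the standard `random` module. The population of 1,000 numeric IDs and the two strata are hypothetical, and the function names are chosen for this example only.

```python
import random

def simple_random_sample(population, k, rng):
    """Every member has the same chance of selection."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Pick a random start, then take every step-th member of the list."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, fraction, rng):
    """Sample the same fraction from each stratum (proportional allocation)."""
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(42)
population = list(range(1000))  # hypothetical member IDs
strata = {"north": population[:700], "south": population[700:]}
print(len(simple_random_sample(population, 100, rng)))
print(len(systematic_sample(population, 10, rng)))
print(len(stratified_sample(strata, 0.1, rng)))
```

Stratified sampling guarantees that each stratum appears in proportion to its size, which simple random sampling only achieves on average.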
Slide 09

Hypothesis Testing

  • The most widely used (and misunderstood) framework in applied statistics. Developed by Fisher, Neyman, and E. Pearson in the 1920s-30s.
  • The Steps
  • State the null hypothesis (H0): no effect, no difference
  • State the alternative hypothesis (H1): an effect exists
  • Choose a significance level (α), typically 0.05
  • Compute a test statistic (z, t, F, χ²) from your data
  • Find the p-value: the probability of seeing data at least this extreme if H0 is true
  • If p ≤ α, reject H0; otherwise, fail to reject
  • Type I Error (α)
  • Rejecting H0 when it is true. A "false positive." Controlled by choosing α.
  • Type II Error (β)
  • Failing to reject H0 when it is false. A "false negative." Power = 1 - β.
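The steps above can be traced in a small two-sided z-test. This sketch assumes the population standard deviation is known, which is rarely true in practice (a t-test is the usual substitute); the numbers and the name `one_sample_z_test` are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 with known population sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))   # test statistic
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))   # two-sided tail area
    return z, p_value, p_value <= alpha                # True means reject H0

z, p, reject = one_sample_z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Here z = 2.0 and the p-value falls just under 0.05, so H0 is rejected at α = 0.05 but would not be at α = 0.01.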
Slide 10

The p-Value Controversy

  • The p-value has been called "the most used and abused statistical concept in science." In 2016, the American Statistical Association took the unprecedented step of issuing a formal statement on its interpretation.
  • What a p-Value IS
  • The probability of observing data at least as extreme as what was collected, assuming the null hypothesis is true. It measures the compatibility of the data with H0.
  • What a p-Value is NOT
  • The probability that H0 is true (that requires Bayes' theorem)
  • The probability that the result occurred by chance
  • A measure of effect size or practical significance
  • A binary "significant/not significant" verdict -- the 0.05 threshold is arbitrary
  • "The earth is round (p < .05)." -- Jacob Cohen, 1994 -- satirizing the ritual of null hypothesis testing
  • In 2019, over 800 scientists signed a letter in Nature calling for the retirement of "statistical significance" as a concept. The debate continues.
Slide 11

Confidence Intervals

  • A confidence interval provides a range of plausible values for a population parameter, offering more information than a simple point estimate or p-value.
  • 95% CI for a Mean
  • x̄ ± z_{0.025} · (s / √n) = x̄ ± 1.96 · (s / √n)
  • Correct Interpretation
  • If we repeated the experiment many times, 95% of the resulting intervals would contain the true population parameter. Any single interval either contains the parameter or it does not -- we just don't know which.
  • Width Depends On
  • Sample size (n): Larger n → narrower interval
  • Variability (s): More variation → wider interval
  • Confidence level: Higher confidence (99% vs. 95%) → wider interval
  • 90%: z = 1.645
  • 95%: z = 1.960
  • 99%: z = 2.576
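The interval formula and the z values listed above can be reproduced as follows. This is a sketch using the z-based interval from the slide; for small samples a Student's t critical value would normally replace z. The measurements are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(data, confidence=0.95):
    """z-based confidence interval for the mean: x_bar +/- z * s / sqrt(n)."""
    n = len(data)
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)                              # sample std dev
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2)  # critical value
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Critical values for the three confidence levels on the slide.
for level in (0.90, 0.95, 0.99):
    print(level, round(NormalDist().inv_cdf(1 - (1 - level) / 2), 3))

measurements = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]  # hypothetical
print(mean_confidence_interval(measurements, 0.95))
```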
Slide 12

Linear Regression

  • Regression models the relationship between a dependent variable (Y) and one or more independent variables (X). It is the workhorse of applied statistics.
  • Simple Linear Regression
  • Y = β₀ + β₁X + ε, where ε ~ N(0, σ²)
  • Key Concepts
  • β₀ (intercept): Expected value of Y when X = 0
  • β₁ (slope): Change in Y for a one-unit increase in X
  • R²: Proportion of variance in Y explained by the model (0 to 1)
  • Residuals: Observed minus predicted values. Should be randomly scattered.
  • Least squares: Choose β₀, β₁ to minimize the sum of squared residuals
  • Assumptions (LINE)
  • Linearity, Independence of errors, Normality of residuals, Equal variance (homoscedasticity). Violating these can invalidate inference.
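For simple linear regression the least-squares estimates have a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of X, and the intercept follows from the means. The sketch below implements that and computes R² from the residuals. The data and the name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # intercept
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)        # unexplained variation
    ss_tot = sum((y - y_bar) ** 2 for y in ys)     # total variation
    return b0, b1, 1.0 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical observations
print(fit_line(xs, ys))
```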
Slide 13

Beyond Simple Regression

  • Multiple Regression
  • Y = β₀ + β₁X₁ + β₂X₂ + ... + ε
  • Controls for confounders by including multiple predictors. Adjusted R² penalizes adding unnecessary variables.
  • Logistic Regression
  • For binary outcomes (yes/no, survived/died). Models the log-odds as a linear function of predictors.
  • log(p/(1-p)) = β₀ + β₁X
  • Polynomial & Spline
  • Captures nonlinear relationships by adding X², X³ terms or using piecewise smooth functions (splines). Flexible but risks overfitting.
  • Regularization
  • Ridge (L2): Shrinks coefficients. Lasso (L1): Shrinks and sets some to zero (variable selection). Elastic Net: Combines both.
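The log-odds relationship used by logistic regression can be made concrete by converting in both directions. The coefficients below are made up for illustration and are not fitted to any data; the function names are likewise assumptions.

```python
import math

def log_odds(p):
    """Logit: map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def predict_probability(x, b0, b1):
    """Invert the logit: p = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

b0, b1 = -3.0, 0.5  # hypothetical coefficients, not estimated from data
for x in (0, 6, 12):
    p = predict_probability(x, b0, b1)
    # The log-odds column is linear in x even though p is not.
    print(x, round(p, 3), round(log_odds(p), 3))
```

Each one-unit increase in X adds b1 to the log-odds, which is why the coefficients are usually interpreted on the odds scale rather than the probability scale.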
Slide 14

ANOVA: Analysis of Variance

  • ANOVA tests whether the means of three or more groups differ significantly. Developed by R.A. Fisher in the 1920s for agricultural experiments.
  • The Logic
  • ANOVA partitions total variability into two components:
  • Between-group variance: How much group means differ from the overall mean
  • Within-group variance: How much individual observations vary within each group
  • F-statistic
  • F = (Between-group variance) / (Within-group variance) = MS_between / MS_within
  • A large F means the groups differ more than expected by chance. If significant, post-hoc tests (Tukey's HSD, Bonferroni) identify which specific groups differ.
  • Variants
  • One-way ANOVA: One factor (e.g., drug A vs. B vs. C)
  • Two-way ANOVA: Two factors and their interaction (drug × dosage)
  • Repeated measures ANOVA: Same subjects measured multiple times
  • MANOVA: Multiple dependent variables simultaneously
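The partition of variability behind the F-statistic can be computed by hand for a one-way layout. This is a minimal sketch with three small hypothetical groups; it returns only F and leaves the p-value (which needs the F distribution) to a statistics package.

```python
def one_way_anova_f(groups):
    """F = MS_between / MS_within for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means versus the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: observations versus their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)        # degrees of freedom: k - 1
    ms_within = ss_within / (n_total - k)    # degrees of freedom: N - k
    return ms_between / ms_within

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical measurements
print(one_way_anova_f(groups))
```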
Slide 15

Bayesian Statistics

  • The Bayesian approach treats probability as a measure of belief, updated by evidence. Once controversial, it has become mainstream thanks to computational advances like Markov Chain Monte Carlo (MCMC).
  • Bayes' Theorem
  • P(H | D) = P(D | H) · P(H) / P(D)
  • Components
  • Prior P(H): What you believed before seeing data
  • Likelihood P(D|H): How probable the data is under H
  • Posterior P(H|D): Updated belief after seeing data
  • Evidence P(D): Normalizing constant
  • Advantages
  • Directly answers "What is the probability that H is true?"
  • Naturally incorporates prior knowledge
  • Handles small samples gracefully
  • Produces full posterior distributions, not just point estimates
  • Key tools: Stan, PyMC, BUGS/JAGS for computational Bayesian modeling. Used extensively in clinical trials, A/B testing, astrophysics, and natural language processing.
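Bayes' theorem over a small set of competing hypotheses fits in a few lines. The scenario below (a coin that is either fair or biased towards heads with probability 0.8, observed to give 8 heads in 10 flips) is invented for this sketch, as is the function name `posterior`.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a discrete set of hypotheses.

    priors[h] is P(H = h); likelihoods[h] is P(D | H = h).
    """
    evidence = sum(priors[h] * likelihoods[h] for h in priors)   # P(D)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}

# Prior: no initial preference between the two hypotheses.
priors = {"fair": 0.5, "biased": 0.5}
# Data D: 8 heads in 10 flips. The binomial coefficient is identical under
# both hypotheses, so it cancels and is left out.
likelihoods = {"fair": 0.5 ** 10, "biased": 0.8 ** 8 * 0.2 ** 2}
print(posterior(priors, likelihoods))
```

The evidence term P(D) only rescales the products of prior and likelihood so that the posterior probabilities sum to 1.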
Slide 16

Experimental Design

  • Good statistics begins before data collection. The design of an experiment determines what conclusions can be drawn.
  • Fisher's Principles (1935)
  • Randomization: Randomly assign subjects to treatment/control so that confounders are balanced across groups on average
  • Replication: Multiple independent observations to estimate variability
  • Blocking: Group similar subjects together to reduce noise
  • Modern Designs
  • Randomized Controlled Trial
  • The gold standard in medicine. Random assignment + blinding (single or double) + placebo control. Phase III drug trials typically enroll hundreds to thousands.
  • A/B Testing
  • The tech industry's version of the RCT. Users are randomly assigned to variant A or B. Netflix, Google, and Amazon run thousands of A/B tests per year.
  • Observational studies (where randomization is impossible) use techniques like matching, instrumental variables, and regression discontinuity to approximate causal inference.
Slide 17

Non-Parametric Methods

  • When data violates the assumptions of parametric tests (normality, equal variance), non-parametric methods provide distribution-free alternatives.
  • Parametric → Non-Parametric
  • t-test → Mann-Whitney U (Wilcoxon rank-sum)
  • Paired t-test → Wilcoxon signed-rank
  • One-way ANOVA → Kruskal-Wallis
  • Pearson correlation → Spearman rank correlation
  • How They Work
  • Instead of using raw data values, non-parametric tests rank the observations and analyze the ranks. This makes them robust to outliers and skewed distributions.
  • Trade-off: they have less statistical power than parametric tests when parametric assumptions are met.
  • The Bootstrap (Efron, 1979)
  • Resample your data with replacement thousands of times to estimate the sampling distribution of any statistic -- mean, median, regression coefficients, anything. No distributional assumptions needed. One of the most influential statistical ideas of the 20th century.
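The bootstrap idea can be sketched with the standard library alone. The example below builds a percentile interval for the median, which is one of several bootstrap interval constructions and is chosen here for simplicity. The data, the seed, and the name `bootstrap_ci` are assumptions for this example.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.median, n_resamples=5000,
                 confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        stat(rng.choices(data, k=n))   # resample with replacement
        for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2
    lower = estimates[int(tail * n_resamples)]
    upper = estimates[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper

data = [12, 15, 9, 22, 18, 30, 11, 14, 17, 95]  # hypothetical, one outlier
print(statistics.median(data), bootstrap_ci(data))
```

Passing a different function as `stat` (for example `statistics.fmean`) gives an interval for that statistic without changing the rest of the code.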
Slide 18

Correlation vs. Causation

  • The most important lesson in statistics: correlation does not imply causation. Two variables can be correlated because of a common cause, reverse causation, or pure coincidence.
  • Pearson Correlation Coefficient
  • r = Σ(xᵢ - x̄)(yᵢ - ȳ) / √[Σ(xᵢ - x̄)² · Σ(yᵢ - ȳ)²], where -1 ≤ r ≤ 1
  • Famous Spurious Correlations
  • Per capita cheese consumption correlates with death by bedsheet entanglement (r = 0.95)
  • Nicolas Cage films correlate with swimming pool drownings (r = 0.67)
  • Organic food sales correlate with autism diagnoses (both rising over time)
  • Causal Inference Tools
  • Randomized experiments: The gold standard for establishing causation
  • Instrumental variables: Find a "natural experiment" (Angrist & Imbens, Nobel 2021)
  • Difference-in-differences: Compare before/after changes in treatment vs. control groups
  • Directed acyclic graphs (DAGs): Judea Pearl's framework for causal reasoning
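The correlation coefficient above translates directly into code. The two series in this sketch are invented: they share nothing except that both rise over time, which is enough to produce a high r and illustrates how a common trend can create a spurious correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, following the formula term by term."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Two hypothetical series that both simply increase year over year.
years = list(range(2000, 2010))
series_a = [10 + 2 * (y - 2000) for y in years]
series_b = [5 + 0.5 * (y - 2000) ** 2 for y in years]
print(pearson_r(series_a, series_b))
```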
Slide 19

The Replication Crisis

  • Beginning around 2011, science confronted a disturbing reality: many published findings could not be reproduced.
  • 36%: Psychology studies replicated (Open Science Collaboration, 2015)
  • 11%: Landmark preclinical cancer studies confirmed (Begley & Ellis, Nature, 2012)
  • 70%: Researchers who tried and failed to reproduce others' work (Nature survey, 2016)
  • Root Causes
  • P-hacking: Running many analyses and reporting only significant ones
  • HARKing: Hypothesizing After Results are Known
  • Publication bias: Journals prefer positive results; null findings go unpublished ("file drawer problem")
  • Small samples: Underpowered studies produce unreliable effect estimates
  • Reforms
  • Pre-registration of studies, registered reports, open data mandates, and the push for larger sample sizes and effect-size reporting are reshaping how science uses statistics.
Slide 20

Statistics & Machine Learning

  • Machine learning grew out of statistics and computer science. The boundaries are blurry: many ML methods are rebranded statistical techniques.
  • Statistical Thinking
  • Emphasizes interpretability and inference
  • Models are hypothesis-driven
  • Focus on understanding why
  • Smaller, carefully designed datasets
  • Tools: R, SAS, Stata
  • ML Thinking
  • Emphasizes prediction accuracy
  • Models are data-driven
  • Focus on predicting what
  • Large-scale, messy datasets
  • Tools: Python, TensorFlow, PyTorch
  • Shared Foundations
  • Cross-validation, regularization, the bias-variance trade-off, maximum likelihood estimation, and information criteria (AIC, BIC) are core to both fields. Random forests are extensions of CART (Classification and Regression Trees), which is itself a statistical method. Neural networks are nonlinear regression models with many parameters.
Slide 21

Statistics in Practice

  • Medicine
  • Clinical trials, survival analysis (Kaplan-Meier curves), meta-analysis. The FDA requires statistical evidence (usually p < 0.05 in two independent trials) before approving a drug.
  • Economics
  • Econometrics applies regression and time series to economic data. Instrumental variables, panel data models, and difference-in-differences are standard tools. Nobel Prizes in 2000, 2003, and 2021 recognized econometric methods.
  • Sports
  • "Moneyball" (2003) popularized sabermetrics. Expected goals (xG) in soccer, win probability models in baseball, and player tracking data have transformed every major sport.
  • Tech / Industry
  • A/B testing, recommendation engines, fraud detection, natural language processing, and quality control (Six Sigma). Google reportedly runs over 10,000 A/B tests per year on Search alone.
Slide 22

Common Pitfalls

  • Even professionals fall into these traps. Statistical literacy means knowing what can go wrong.
  • Simpson's Paradox: A trend that appears in several groups reverses when the groups are combined. UC Berkeley gender bias case (1973): overall admission rates favored men, but department by department, women were admitted at similar or higher rates in most departments.
  • Survivorship Bias: Analyzing only the successes. WWII planes: engineers reinforced where returning planes had bullet holes, until Abraham Wald pointed out they should reinforce where the holes were not -- those planes didn't return.
  • Base Rate Neglect: Ignoring the prevalence of a condition. A 99%-accurate test for a 1-in-10,000 disease will mostly produce false positives.
  • Ecological Fallacy: Inferring individual behavior from group-level data. States with high ice cream sales have high crime rates -- but individuals eating ice cream aren't committing crimes.
  • Overfitting: A model that fits the training data perfectly but predicts poorly on new data. Adding enough parameters can fit any dataset -- including noise.
  • Multiple Comparisons: Testing 20 true null hypotheses at α = 0.05 yields one false positive on average. Bonferroni correction: divide α by the number of tests.
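Two of the pitfalls above come down to short calculations. The sketch below works through base rate neglect and multiple comparisons. It assumes that "99%-accurate" means 99% sensitivity and 99% specificity, and that the 20 tests are independent with every null hypothesis true; both are assumptions of this example, and the function names are illustrative.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)

def familywise_error_rate(alpha, n_tests):
    """P(at least one false positive) across independent tests when every H0 is true."""
    return 1.0 - (1.0 - alpha) ** n_tests

# Base rate neglect: a 1-in-10,000 disease with a 99%/99% test.
print(positive_predictive_value(1 / 10_000, 0.99, 0.99))

# Multiple comparisons: 20 tests at alpha = 0.05.
print(20 * 0.05)                              # expected number of false positives
print(familywise_error_rate(0.05, 20))        # chance of at least one
print(familywise_error_rate(0.05 / 20, 20))   # after Bonferroni correction
```

Under these assumptions fewer than 1 in 100 positive results indicates the disease, and the uncorrected chance of at least one false positive across 20 tests is well above one half.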
Slide 23

The Art of Uncertainty

  • Statistics is not about eliminating uncertainty -- it is about quantifying it honestly. In a world drowning in data, statistical thinking is no longer optional.
  • 400+ years of theory
  • 2.5 quintillion bytes of data created daily
  • ∞ decisions to be made
  • "To call in the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of."
  • -- R.A. Fisher, 1938