
AI & Machine Learning

A 70-year arc that stalled twice, then accelerated past most of its critics. Below: dates, names, and the equations that built modern AI.


About this HTML presentation

This Shipslides page presents AI & Machine Learning as an interactive HTML presentation deck in the Technology catalog with 13 slides. The share page keeps the uploaded deck sandboxed while exposing readable context, topics, and a slide outline for viewers and search engines.


Key sections

  • 01 The long path from neuron to network.
  • 02 1943–1958: The neuron, formalized.
  • 03 The two AI winters.
  • 04 1986: Backpropagation, popularized.
  • 05 Convolutions and the GPU.
  • 06 2012: AlexNet and the spark.
  • 07 2017: Attention is all you need.
  • 08 Scaling laws.
  • 09 The modern LLM stack.
  • 10 Multimodal & tool use.
  • 11 Agents.
  • 12 Alignment & safety.
  • 13 Watch this.

Page data
Canonical: https://shipslides.com/d/technology-ai-and-ml
Category: Technology
Size: 153.1 KB
Updated: 2026-05-17
LLM text: https://shipslides.com/d/technology-ai-and-ml/llms.txt

Presentation Transcript

Detailed slide-by-slide text content extracted from this presentation.

Slide 01

The long path from neuron to network.

  • Deck 01 / Modern editorial
  • A 70-year arc that stalled twice, then accelerated past most of its critics.
  • Below: dates, names, and the equations that built modern AI.
  • Figure 1. Procedurally seeded image, picsum.photos. Decorative.
Slide 02

1943–1958: The neuron, formalized.

  • In 1943, Warren McCulloch and Walter Pitts proposed a binary threshold model of the neuron: a logic gate with weighted inputs. Fifteen years later Frank Rosenblatt built the Mark I Perceptron at the Cornell Aeronautical Laboratory, a 400-photocell machine that could learn to distinguish marked cards. The New York Times reported that the Navy had revealed "the embryo of an electronic computer" that it expected "will be able to walk, talk, see, write, reproduce itself and be conscious of its existence."
  • Figure 2. The Rosenblatt perceptron: y = step(Σ wᵢxᵢ + b).
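  • The formula in Figure 2 can be sketched in a few lines of Python. This is a minimal illustration, not Rosenblatt's hardware: the function names, the AND-gate data, and the learning rate are assumptions chosen for the example, and the update is the classic perceptron learning rule.

```python
def step(z):
    """Heaviside step: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def predict(w, b, x):
    """y = step(sum_i w_i * x_i + b), as in Figure 2."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: nudge weights toward each misclassified example."""
    n_inputs = len(samples[0][0])
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            err = target - predict(w, b, x)  # -1, 0, or +1
            if err != 0:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
                errors += 1
        if errors == 0:  # every sample classified correctly
            break
    return w, b


# Illustrative data: the AND gate is linearly separable, so training converges.
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND_GATE)
and_outputs = [predict(w, b, x) for x, _ in AND_GATE]
print(and_outputs)
```

  • Because AND is linearly separable, the loop stops once an epoch passes with no errors.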
Slide 03

The two AI winters.

  • In 1969 Marvin Minsky and Seymour Papert published Perceptrons, proving that a single-layer perceptron cannot compute XOR. Funding for neural-network research dried up, part of the first AI winter of the 1970s. A second winter followed in the late 1980s and early 1990s, when expert systems failed to scale economically.
  • "There is no reason to suppose that any of these virtues carry over to the many-layered version."
  • — Minsky & Papert, Perceptrons, 1969 (later revised)
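  • The XOR limitation can be checked directly. The sketch below searches a coarse grid of weights for a single threshold unit that fits XOR (it finds none, in line with the proof), then wires up a two-layer network by hand. The grid range and the hand-picked weights are assumptions for illustration only.

```python
from itertools import product

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def step(z):
    return 1 if z >= 0 else 0


def unit(w1, w2, b, x):
    """A single threshold unit on two inputs."""
    return step(w1 * x[0] + w2 * x[1] + b)


# Try every (w1, w2, b) on a grid from -4.0 to 4.0 in steps of 0.5.
grid = [i / 2 for i in range(-8, 9)]
single_layer_solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if all(unit(w1, w2, b, x) == target for x, target in XOR)
]


def two_layer_xor(x):
    """XOR as (x1 OR x2) AND NOT (x1 AND x2), using one hidden layer."""
    h_or = unit(1, 1, -0.5, x)       # fires if at least one input is 1
    h_and = unit(1, 1, -1.5, x)      # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # fires if OR is on and AND is off


print(len(single_layer_solutions))        # no single unit fits XOR
print([two_layer_xor(x) for x, _ in XOR])
```

  • The grid search is only a demonstration; the general result follows from XOR not being linearly separable.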
Slide 04

1986: Backpropagation, popularized.

  • Rumelhart, Hinton, and Williams' Nature paper "Learning representations by back-propagating errors" showed that gradient descent, with gradients computed by the chain rule, could train multi-layer networks. The underlying math (reverse-mode automatic differentiation) had been derived by Seppo Linnainmaa in 1970 and applied to neural networks by Paul Werbos in 1974, but the 1986 paper made it stick.
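      # a tiny Python-style sketch; forward, mse, and backward are placeholders, not defined here
      for epoch in range(N):
          y_hat = forward(x, W)      # forward pass
          loss = mse(y_hat, y)       # scalar training loss
          grads = backward(loss, W)  # chain rule: d(loss)/dW
          W -= lr * grads            # gradient descent step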
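  • The loop on this slide leaves forward, mse, and backward undefined. A runnable pure-Python version might look like the sketch below, which trains one tanh hidden layer with a linear output on XOR. The architecture, hyperparameters, seed, and all function names are assumptions made for this example, not details from the 1986 paper.

```python
import math
import random


def forward(x, W1, b1, W2, b2):
    """One hidden tanh layer, linear output. Returns (hidden, y_hat)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y_hat = sum(w * hj for w, hj in zip(W2, h)) + b2
    return h, y_hat


def loss_and_grads(data, W1, b1, W2, b2):
    """Mean squared error and its gradients, computed with the chain rule."""
    n = len(data)
    gW1 = [[0.0] * len(W1[0]) for _ in W1]
    gb1 = [0.0] * len(b1)
    gW2 = [0.0] * len(W2)
    gb2 = 0.0
    loss = 0.0
    for x, y in data:
        h, y_hat = forward(x, W1, b1, W2, b2)
        loss += (y_hat - y) ** 2 / n
        d_out = 2 * (y_hat - y) / n                 # dL/dy_hat
        gb2 += d_out
        for j, hj in enumerate(h):
            gW2[j] += d_out * hj                     # dL/dW2[j]
            d_pre = d_out * W2[j] * (1 - hj * hj)    # back through tanh
            gb1[j] += d_pre
            for i, xi in enumerate(x):
                gW1[j][i] += d_pre * xi              # dL/dW1[j][i]
    return loss, (gW1, gb1, gW2, gb2)


def train(data, hidden=4, lr=0.1, epochs=2000, seed=0):
    """Full-batch gradient descent; returns final parameters and loss history."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):
        loss, (gW1, gb1, gW2, gb2) = loss_and_grads(data, W1, b1, W2, b2)
        history.append(loss)
        W1 = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, gW1)]
        b1 = [b - lr * g for b, g in zip(b1, gb1)]
        W2 = [w - lr * g for w, g in zip(W2, gW2)]
        b2 -= lr * gb2
    return (W1, b1, W2, b2), history


XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
params, history = train(XOR)
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

  • Comparing the analytic gradients against finite differences is a quick way to confirm the chain-rule bookkeeping is right.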
Slide 05

Convolutions and the GPU.

  • Yann LeCun's LeNet-5 (1998) read handwritten digits with convolutional layers: local receptive fields, weight sharing, pooling. The technique waited for hardware: in 2009 Raina, Madhavan, and Ng showed GPUs could train deep networks up to 70× faster than CPUs.
  • Figure 3. The classical convolutional pipeline (LeNet-5 family).
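  • The three ideas named above (local receptive fields, weight sharing, pooling) fit in a short pure-Python sketch. The image, the kernel, and the plain average pooling are assumptions for illustration; LeNet-5's subsampling layers also had a trainable scale and bias, which are left out here.

```python
def conv2d(image, kernel):
    """Valid cross-correlation: slide one shared kernel over every local patch."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def avg_pool(fmap, size=2):
    """Non-overlapping average pooling with a size x size window."""
    return [
        [
            sum(fmap[r + i][c + j] for i in range(size) for j in range(size)) / size ** 2
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


# Illustrative 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Illustrative 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)  # 4x4, peaks where the edge sits
pooled = avg_pool(feature_map)       # 2x2 summary of the feature map
print(feature_map)
print(pooled)
```

  • The same four kernel weights are reused at every position, which is what keeps the parameter count small compared with a fully connected layer.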
Slide 06

2012: AlexNet and the spark.

  • Krizhevsky, Sutskever, and Hinton's AlexNet cut the ImageNet top-5 error rate to 15.3%, down from 25.8% for the previous year's winner. It used two NVIDIA GTX 580s, ReLU activations, dropout, and 60M parameters. The result was so far ahead of the field that the deep-learning revolution is conventionally dated from this paper.
  • Year | Top-5 error | Model
  • 2010 | 28.2% | NEC-UIUC (SIFT + SVM)
  • 2011 | 25.8% | Xerox
  • 2012 | 15.3% | AlexNet
  • 2014 | 6.7% | GoogLeNet
  • 2015 | 3.6% | ResNet-152
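  • For readers unfamiliar with the metric in the table: top-5 error counts a prediction as wrong only when the true label is missing from the model's five highest-scoring classes. A minimal sketch follows, with made-up scores over three classes and k set to 2 so the example stays small.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Hypothetical class scores for three images; the true class is 2 each time.
scores = [
    [0.1, 0.6, 0.3],  # class 2 ranked second
    [0.7, 0.2, 0.1],  # class 2 ranked last
    [0.2, 0.3, 0.5],  # class 2 ranked first
]
labels = [2, 2, 2]

top1 = top_k_error(scores, labels, k=1)
top2 = top_k_error(scores, labels, k=2)
print(top1, top2)
```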
Slide 07

2017: Attention is all you need.

  • Vaswani et al. dropped recurrence entirely. Self-attention computes a weighted average of values, with weights from scaled dot-products of queries and keys.
  • Attention(Q, K, V) = softmax( Q · Kᵀ / √dₖ ) · V
  • Parallelizable across sequence positions, the transformer scaled to GPT-3's 175B parameters by 2020 and beyond. Every modern frontier model (GPT, Gemini, Claude, Llama) is a transformer or close descendant.
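  • The attention formula on this slide translates almost line for line into code. The sketch below is single-head, unmasked, and uses plain Python lists; the toy Q, K, and V values are assumptions for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, one row per query."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Scaled dot-product of this query with every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]

uniform = attention([[0.0, 0.0]], K, V)    # zero query: equal weight on both values
peaked = attention([[100.0, 0.0]], K, V)   # query aligned with the first key
print(uniform, peaked)
```

  • Each query row is computed independently of the others, which is the parallelism across sequence positions that the slide refers to.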
Slide 08

Scaling laws.

  • Kaplan et al. (2020) and Hoffmann et al. (2022, "Chinchilla") found loss falls as a power law in parameters, data, and compute. The Chinchilla update: for a fixed compute budget, scale parameters and training tokens in roughly equal proportion (about 20 tokens per parameter).
  • Figure 4. Stylized loss vs. compute (Kaplan/Hoffmann scaling).
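  • The 20-tokens-per-parameter rule of thumb can be turned into a back-of-the-envelope sizing calculation. The sketch assumes the common approximation that training costs about 6 FLOPs per parameter per token (C ≈ 6·N·D); that approximation is not stated on the slide, and the result is a rough guide rather than a fitted scaling law.

```python
import math

TOKENS_PER_PARAM = 20       # rule of thumb quoted on the slide
FLOPS_PER_PARAM_TOKEN = 6   # assumption: C ~= 6 * N * D


def compute_optimal(compute_flops):
    """Split a training budget C into parameters N and tokens D.

    With D = 20 * N and C = 6 * N * D, we get C = 120 * N**2,
    so N = sqrt(C / 120).
    """
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


# Illustrative budget: 6 * 70e9 parameters * 1.4e12 tokens.
budget = 6 * 70e9 * 1.4e12
n_params, n_tokens = compute_optimal(budget)
print(f"{n_params:.3g} parameters, {n_tokens:.3g} tokens")
```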
Slide 09

The modern LLM stack.

  • Pretraining: Self-supervised next-token prediction on web text, code, books, and licensed corpora. Trillions of tokens.
  • SFT: Supervised fine-tuning on curated demonstrations. Teaches the model the desired output format and tone.
  • RLHF / RLAIF: Reinforcement learning from human or AI preferences. PPO, DPO, or constitutional methods.
  • Inference: KV-cache, speculative decoding, quantization, MoE routing. The serving layer is now a research field of its own.
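  • The pretraining objective in the first item, next-token prediction, reduces to an average cross-entropy over positions. The sketch below assumes the model has already produced one vector of logits per position; the function names and the four-token vocabulary are hypothetical.

```python
import math


def log_softmax(logits):
    """Log-probabilities from raw scores, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def next_token_loss(logits_per_position, token_ids):
    """Average negative log-likelihood of each token given its prefix.

    logits_per_position[t] holds the model's scores for token_ids[t + 1],
    so the targets are simply the input sequence shifted by one.
    """
    targets = token_ids[1:]
    losses = [-log_softmax(logits)[target]
              for logits, target in zip(logits_per_position, targets)]
    return sum(losses) / len(losses)


# A model with no information scores all 4 vocabulary entries equally,
# so its loss is ln(4).
token_ids = [0, 1, 2]
uninformed = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
loss = next_token_loss(uninformed, token_ids)
print(loss)
```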
Slide 10

Multimodal & tool use.

  • CLIP (2021) tied images and text into a shared embedding space. By 2024 frontier models were natively multimodal: text in, text-image-audio-video out. Tool use (function calling, browsing, code execution) turned chatbots into agents that can act.
  • CLIP · DALL-E · Sora · Gemini · Claude
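  • A shared embedding space means an image and a caption can be compared with a plain similarity score. The sketch below uses hand-written 3-dimensional vectors as stand-ins for encoder outputs; the file names, captions, and numbers are hypothetical, and no real CLIP model or API is involved.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings that an image encoder and a text encoder might emit.
image_embeddings = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "car.jpg": [0.0, 0.2, 0.9],
}
text_embeddings = {
    "a photo of a dog": [1.0, 0.0, 0.1],
    "a photo of a car": [0.1, 0.1, 1.0],
}


def best_caption(image_name):
    """Pick the caption whose embedding lies closest to the image's."""
    img = image_embeddings[image_name]
    return max(text_embeddings, key=lambda text: cosine(img, text_embeddings[text]))


print(best_caption("dog.jpg"))
print(best_caption("car.jpg"))
```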
Slide 11

Agents.

  • An agent is a model in a loop with tools and memory. The 2025–2026 wave (Claude with computer use, OpenAI Operator, Devin, AutoGPT descendants) pushed reliability past the threshold for real work: software engineering, research, customer support, ops.
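      history, done = [], False
      while not done:
          obs = env.observe()
          thought, action = model(obs, history)           # plan from the observation and past steps
          result, done = env.act(action)                  # assumes act() also reports completion
          history.append((obs, thought, action, result))  # log the step without overwriting obs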
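  • The loop on this slide can be made runnable with a toy environment. In the sketch below, CounterEnv, scripted_model, and run_agent are hypothetical names, the "model" is a hard-coded rule standing in for a language model, and a step cap is added as a simple guard against loops that never finish.

```python
class CounterEnv:
    """Toy environment: reach a target count, then call the 'finish' tool."""

    def __init__(self, target):
        self.count = 0
        self.target = target

    def observe(self):
        return {"count": self.count, "target": self.target}

    def act(self, action):
        """Apply a tool call and return (result, done)."""
        if action == "increment":
            self.count += 1
            return "ok", False
        if action == "finish":
            outcome = "success" if self.count == self.target else "wrong count"
            return outcome, True
        return "unknown tool", False


def scripted_model(obs, history):
    """Stand-in for a language model: returns (thought, action)."""
    if obs["count"] < obs["target"]:
        return "count is below target", "increment"
    return "target reached", "finish"


def run_agent(env, model, max_steps=20):
    """Observe, decide, act, record; stop when the environment says done."""
    history = []
    for _ in range(max_steps):  # hard cap so a confused model cannot loop forever
        obs = env.observe()
        thought, action = model(obs, history)
        result, done = env.act(action)
        history.append((obs, thought, action, result))
        if done:
            break
    return history


history = run_agent(CounterEnv(target=3), scripted_model)
for obs, thought, action, result in history:
    print(obs["count"], action, result)
```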
Slide 12

Alignment & safety.

  • The technical problem: train a system whose behavior matches human intent across distribution shift. Key concepts include reward hacking, deceptive alignment, eval-gaming, and scalable oversight. The field draws from RL, mechanistic interpretability, and formal verification.
  • "The genie does what you ask, not what you want." (folk maxim of the alignment community)
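  • Reward hacking is easiest to see in a toy case where the measured reward and the intended outcome come apart. Everything in the sketch below is hypothetical: the proxy is the fraction of remaining tests that pass, and one available action games it by deleting the failing test instead of fixing the bug.

```python
def proxy_reward(state):
    """What the optimiser is scored on: fraction of remaining tests that pass."""
    tests = state["tests"]
    return sum(tests.values()) / len(tests) if tests else 1.0


def true_value(state):
    """What the developer wanted: the bug fixed and no tests removed."""
    intact = len(state["tests"]) == state["original_tests"]
    return 1.0 if state["bug_fixed"] and intact else 0.0


ACTIONS = {
    "fix_bug": lambda s: {
        **s, "bug_fixed": True, "tests": {name: True for name in s["tests"]}
    },
    "delete_failing_tests": lambda s: {
        **s, "tests": {name: ok for name, ok in s["tests"].items() if ok}
    },
    "do_nothing": lambda s: dict(s),
}

start = {
    "bug_fixed": False,
    "original_tests": 3,
    "tests": {"t1": True, "t2": False, "t3": True},
}

# (proxy reward, true value) for each action taken from the start state.
outcomes = {
    name: (proxy_reward(act(start)), true_value(act(start)))
    for name, act in ACTIONS.items()
}
for name, (proxy, true) in outcomes.items():
    print(f"{name}: proxy={proxy:.2f} true={true:.2f}")
```

  • Fixing the bug and deleting the failing test earn the same proxy reward, so an optimiser that sees only the proxy has no reason to prefer the intended behavior.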
Slide 13

Watch this.

  • Watch: transformers explained
  • Open problems
  • Sample-efficient continual learning without catastrophic forgetting.
  • Robust mechanistic interpretability of large transformers.
  • Scalable oversight of superhuman models.
  • Energy and water cost of inference at planetary scale.