Good Links
Blogs, Videos, & Visualizations that I like
Machine Learning
You Could Have Invented Rope
Why positional encoding was worth looking at
The type of thinking done when tweaking a transformer
Karpathy Recipe For Training Neural Networks
A classic, with some nice tips for getting that sweet sweet convergence and avoiding footguns. Most critically: check the data RIGHT before it goes into the model
LLM inference speed of light
“Minimum amount of time it can take to run the inference process from the model configuration, the size of the KV-cache, and the available bandwidth.”
The theoretical bound for LLM inference speed is straightforward to calculate, and it is essentially just effective memory bandwidth: every generated token has to read all the weights plus the KV cache. You can use this number to estimate how close to the theoretical limit your inference is running and how much optimization headroom is left.
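For a concrete feel, here is a back-of-the-envelope sketch of that calculation with made-up numbers (a 7B-parameter model in fp16, a 2 GB KV cache, 1 TB/s of memory bandwidth), assuming a purely bandwidth-bound decode step:

```python
# "Speed of light" for single-stream decoding, assuming memory-bandwidth bound:
# the floor per token is (bytes of weights + bytes of KV cache read) / bandwidth.
# All numbers below are illustrative placeholders, not measurements.

def tokens_per_second_ceiling(
    n_params: float,          # model parameters
    bytes_per_param: float,   # 2 for fp16/bf16, ~0.5 for 4-bit quantization
    kv_cache_bytes: float,    # KV cache read per token at the current context length
    bandwidth_bytes_per_s: float,
) -> float:
    bytes_per_token = n_params * bytes_per_param + kv_cache_bytes
    return bandwidth_bytes_per_s / bytes_per_token

# Hypothetical: 7B params in fp16, 2 GB KV cache, 1 TB/s of bandwidth.
ceiling = tokens_per_second_ceiling(7e9, 2, 2e9, 1e12)
print(f"theoretical ceiling ~{ceiling:.0f} tokens/s")
# Measured throughput divided by this ceiling tells you how close to the limit you are.
```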
Circuits in Neural Networks
Back when we had models that were small enough to understand
A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam)
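As a refresher on what each method actually does, here is a rough numpy sketch of the update rules on a toy ill-conditioned quadratic; the hyperparameters are arbitrary, just enough to see the differences:

```python
import numpy as np

# Minimal versions of the update rules the video compares,
# minimizing f(w) = 0.5 * w @ A @ w, whose gradient is A @ w.
A = np.diag([1.0, 25.0])          # ill-conditioned on purpose
grad = lambda w: A @ w

def run(update, steps=200):
    w, state = np.array([2.0, 2.0]), {}
    for t in range(1, steps + 1):
        w = update(w, grad(w), state, t)
    return w

def sgd(w, g, s, t, lr=0.03):
    return w - lr * g

def momentum(w, g, s, t, lr=0.03, beta=0.9):
    s["v"] = beta * s.get("v", 0) + g          # running velocity
    return w - lr * s["v"]

def adagrad(w, g, s, t, lr=0.3, eps=1e-8):
    s["s"] = s.get("s", 0) + g**2              # accumulated squared gradient
    return w - lr * g / (np.sqrt(s["s"]) + eps)

def rmsprop(w, g, s, t, lr=0.03, beta=0.99, eps=1e-8):
    s["s"] = beta * s.get("s", 0) + (1 - beta) * g**2
    return w - lr * g / (np.sqrt(s["s"]) + eps)

def adam(w, g, s, t, lr=0.03, b1=0.9, b2=0.999, eps=1e-8):
    s["m"] = b1 * s.get("m", 0) + (1 - b1) * g
    s["v"] = b2 * s.get("v", 0) + (1 - b2) * g**2
    m_hat, v_hat = s["m"] / (1 - b1**t), s["v"] / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

for name, upd in [("sgd", sgd), ("momentum", momentum), ("adagrad", adagrad),
                  ("rmsprop", rmsprop), ("adam", adam)]:
    print(f"{name:9s} final w: {run(upd)}")
```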
Deep Learning and the Blessing of Dimensionality
“Deep learning contradicts the basic premise of conventional learning theory - don’t overfit”
Convolution visualization of why everyone uses 3x3:
kernel sizes gotta be odd so there is a center pixel
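A tiny sketch of the padding arithmetic (assuming PyTorch; the shapes are the point, not the numbers):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)

# Odd kernels have a well-defined center pixel, so padding = k // 2
# keeps the output the same size and aligned with the input grid.
for k in (3, 5, 7):
    conv = nn.Conv2d(1, 1, kernel_size=k, padding=k // 2)
    print(k, conv(x).shape)   # all (1, 1, 32, 32)

# Two stacked 3x3 convs see a 5x5 neighborhood with fewer parameters
# than one 5x5 conv (2 * 9 * C^2 vs 25 * C^2) -- part of why 3x3 is everywhere.
```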
Visualization of how skip connections smooth out the loss landscape
Interactive Visualization of the decision boundaries for a handful of small neural networks
ReLU is just breaking the input space into a bunch of linear regions
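A little numpy sketch of that idea: count the distinct on/off patterns of the ReLU units over a grid of 2-D inputs, since each pattern is one linear region:

```python
import numpy as np

# A tiny random ReLU net on 2-D inputs. Every input falls into some on/off
# pattern of the hidden units; within one pattern the network is exactly
# linear, so counting patterns counts linear regions (on this grid).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)

grid = np.linspace(-3, 3, 400)
xs = np.stack(np.meshgrid(grid, grid), -1).reshape(-1, 2)

h1 = np.maximum(xs @ W1.T + b1, 0)
h2 = np.maximum(h1 @ W2.T + b2, 0)
patterns = np.concatenate([h1 > 0, h2 > 0], axis=1)

print("distinct linear regions seen:", len(np.unique(patterns, axis=0)))
```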
Is Cosine-Similarity of Embeddings Really About Similarity?
Stealing Part of a Production Language Model
reverse engineering the size of a model's hidden dimension from its logit outputs
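A toy numpy version of the core trick, as I understand it: logits are a linear function of the hidden state, so the logit matrix collected from many queries has rank equal to the hidden dimension:

```python
import numpy as np

# Final logits are W_out @ h, so no matter how many prompts you send,
# the logit vectors live in a subspace of rank = hidden dim.
rng = np.random.default_rng(0)
vocab, hidden, n_queries = 1000, 64, 512

W_out = rng.normal(size=(vocab, hidden))    # stand-in for the unembedding matrix
H = rng.normal(size=(hidden, n_queries))    # hidden states for many prompts
logits = W_out @ H                          # what an API exposing full logits gives you

s = np.linalg.svd(logits, compute_uv=False)
print("estimated hidden dim:", int(np.sum(s > 1e-6 * s[0])))   # prints 64
```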
Transformers
Introduces the beautiful concept of multiplying a one-hot vector by a matrix as a column lookup.
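The lookup view in a few lines of numpy:

```python
import numpy as np

W = np.arange(12).reshape(3, 4)      # a 3x4 matrix
one_hot = np.eye(4)[2]               # one-hot vector selecting index 2

print(W @ one_hot)                   # equals W[:, 2]: the "multiply" is just a lookup
print(W[:, 2])
```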
Initialization
Interactive diagram showing how initialization affects the forward and backward passes
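A quick numpy sketch of the effect the diagram illustrates: unit-variance weights blow up the activations layer by layer, while 1/sqrt(fan_in) scaling keeps them steady (toy depth and width, no nonlinearity):

```python
import numpy as np

# How the scale of activations evolves through a deep stack of linear layers
# under two initializations.
rng = np.random.default_rng(0)
depth, width = 20, 512
x = rng.normal(size=(width,))

for name, scale in [("std = 1", 1.0), ("std = 1/sqrt(fan_in)", 1 / np.sqrt(width))]:
    h = x.copy()
    for _ in range(depth):
        W = rng.normal(scale=scale, size=(width, width))
        h = W @ h
    print(f"{name:22s} final activation std: {h.std():.3e}")
```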
Linear Algebra
A Beginner’s Guide to Singular Value Decomposition (SVD).
Why reparameterizing using the feature dimensions with the most variance makes sense
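A small numpy sketch of that intuition with made-up correlated 2-D data: the top singular direction carries nearly all the variance, so a rank-1 reparameterization loses almost nothing:

```python
import numpy as np

# Correlated 2-D data: most of the variance lies along one direction.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2)) @ np.array([[3.0, 0.0], [1.5, 0.3]])
X -= X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
print("variance captured per direction:", (s**2 / np.sum(s**2)).round(3))

# Reparameterize: keep only the coordinate along the top singular vector.
X_1d = X @ Vt[0]                     # one number per point instead of two
X_back = np.outer(X_1d, Vt[0])       # best rank-1 reconstruction
print("relative reconstruction error:", np.linalg.norm(X - X_back) / np.linalg.norm(X))
```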
Singular Value Decomposition as Simply as Possible
A Geometric Understanding of Matrices
Special Linear Transformations in R²
Computer Science
Myths Programmers Believe About CPU Caches
A fun reminder for the Leetcode generation that there is more to programming than Big O
Facts and Myths about Python Names and Values
Either this makes you hate Python a tiny bit or you were writing JavaScript before this
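The classic gotcha from the post, in a few lines:

```python
# Names are references to values; assignment never copies.
a = [1, 2, 3]
b = a             # b is another name for the *same* list
b.append(4)
print(a)          # [1, 2, 3, 4] -- a "changed" because there is only one list

b = [9, 9]        # rebinding b points the name elsewhere; a is untouched
print(a)          # [1, 2, 3, 4]
```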
Other
Principles of Vasocomputation: A Unification of Buddhist Phenomenology, Active Inference, and Physical Reflex (Part I)
involuntary smooth muscle contractions are doing all sorts of calculations and state management
Dr. Luca Turin detects quantum clues to consciousness
“The only thing we know about consciousness is that it’s soluble in chloroform”
Bioelectric networks underlie the intelligence of the body
Michael Levin - how genes/proteins interact with environment bioelectrically to solve problems in anatomical space
Qualia Research Institute - Geometric Eigenmodes of Brain Activity
Using different drug phenomenologies to reason about the brain, drugs as a brain ablation test (what happens when you turn some of the neurons off) https://qri.org/blog/5meo-vs-dmt