Good Links
Blogs, Videos, & Visualizations that I like
Machine Learning
You Could Have Invented Rope
Why positional encoding was worth looking at
The type of thinking done when tweaking a transformer
Karpathy Recipe For Training Neural Networks
A classic, with some nice tips for getting that sweet sweet convergence and avoiding footguns. Most critically: check the data RIGHT before it goes into the model
LLM inference speed of light
“Minimum amount of time it can take to run the inference process from the model configuration, the size of the KV-cache, and the available bandwidth.”
The theoretical bound for LLM inference speed is straightforward to calculate, and it is essentially just effective memory bandwidth: every generated token has to read all the weights plus the KV cache. You can use this number to estimate how close to the theoretical limit your inference is running and how much optimization headroom is left.
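For a concrete feel, here is a back-of-the-envelope sketch of that calculation with made-up numbers (a 7B-parameter model in fp16, a 2 GB KV cache, 1 TB/s of memory bandwidth), assuming a purely bandwidth-bound decode step:

```python
# "Speed of light" for single-stream decoding, assuming memory-bandwidth bound:
# the floor per token is (bytes of weights + bytes of KV cache read) / bandwidth.
# All numbers below are illustrative placeholders, not measurements.

def tokens_per_second_ceiling(
    n_params: float,          # model parameters
    bytes_per_param: float,   # 2 for fp16/bf16, ~0.5 for 4-bit quantization
    kv_cache_bytes: float,    # KV cache read per token at the current context length
    bandwidth_bytes_per_s: float,
) -> float:
    bytes_per_token = n_params * bytes_per_param + kv_cache_bytes
    return bandwidth_bytes_per_s / bytes_per_token

# Hypothetical: 7B params in fp16, 2 GB KV cache, 1 TB/s of bandwidth.
ceiling = tokens_per_second_ceiling(7e9, 2, 2e9, 1e12)
print(f"theoretical ceiling ~{ceiling:.0f} tokens/s")
# Measured throughput divided by this ceiling tells you how close to the limit you are.
```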
Circuits in Neural Networks
Back when we had models that were small enough to understand
A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam)
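As a refresher on what each method actually does, here is a rough numpy sketch of the update rules on a toy ill-conditioned quadratic; the hyperparameters are arbitrary, just enough to see the differences:

```python
import numpy as np

# Minimal versions of the update rules the video compares,
# minimizing f(w) = 0.5 * w @ A @ w, whose gradient is A @ w.
A = np.diag([1.0, 25.0])          # ill-conditioned on purpose
grad = lambda w: A @ w

def run(update, steps=200):
    w, state = np.array([2.0, 2.0]), {}
    for t in range(1, steps + 1):
        w = update(w, grad(w), state, t)
    return w

def sgd(w, g, s, t, lr=0.03):
    return w - lr * g

def momentum(w, g, s, t, lr=0.03, beta=0.9):
    s["v"] = beta * s.get("v", 0) + g          # running velocity
    return w - lr * s["v"]

def adagrad(w, g, s, t, lr=0.3, eps=1e-8):
    s["s"] = s.get("s", 0) + g**2              # accumulated squared gradient
    return w - lr * g / (np.sqrt(s["s"]) + eps)

def rmsprop(w, g, s, t, lr=0.03, beta=0.99, eps=1e-8):
    s["s"] = beta * s.get("s", 0) + (1 - beta) * g**2
    return w - lr * g / (np.sqrt(s["s"]) + eps)

def adam(w, g, s, t, lr=0.03, b1=0.9, b2=0.999, eps=1e-8):
    s["m"] = b1 * s.get("m", 0) + (1 - b1) * g
    s["v"] = b2 * s.get("v", 0) + (1 - b2) * g**2
    m_hat, v_hat = s["m"] / (1 - b1**t), s["v"] / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

for name, upd in [("sgd", sgd), ("momentum", momentum), ("adagrad", adagrad),
                  ("rmsprop", rmsprop), ("adam", adam)]:
    print(f"{name:9s} final w: {run(upd)}")
```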
Deep Learning and the Blessing of Dimensionality
“Deep learning contradicts the basic premise of conventional learning theory - don’t overfit”
Convolution visualization of why everyone uses 3x3:
kernel sizes gotta be odd so there is a center pixel
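A tiny sketch of the padding arithmetic (assuming PyTorch; the shapes are the point, not the numbers):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)

# Odd kernels have a well-defined center pixel, so padding = k // 2
# keeps the output the same size and aligned with the input grid.
for k in (3, 5, 7):
    conv = nn.Conv2d(1, 1, kernel_size=k, padding=k // 2)
    print(k, conv(x).shape)   # all (1, 1, 32, 32)

# Two stacked 3x3 convs see a 5x5 neighborhood with fewer parameters
# than one 5x5 conv (2 * 9 * C^2 vs 25 * C^2) -- part of why 3x3 is everywhere.
```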
Visualization of how skip connections smooth out the loss landscape
Interactive Visualization of the decision boundaries for a handful of small neural networks
ReLU is just breaking the input space into a bunch of linear regions
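A little numpy sketch of that idea: count the distinct on/off patterns of the ReLU units over a grid of 2-D inputs, since each pattern is one linear region:

```python
import numpy as np

# A tiny random ReLU net on 2-D inputs. Every input falls into some on/off
# pattern of the hidden units; within one pattern the network is exactly
# linear, so counting patterns counts linear regions (on this grid).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)

grid = np.linspace(-3, 3, 400)
xs = np.stack(np.meshgrid(grid, grid), -1).reshape(-1, 2)

h1 = np.maximum(xs @ W1.T + b1, 0)
h2 = np.maximum(h1 @ W2.T + b2, 0)
patterns = np.concatenate([h1 > 0, h2 > 0], axis=1)

print("distinct linear regions seen:", len(np.unique(patterns, axis=0)))
```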
Is Cosine-Similarity of Embeddings Really About Similarity?
Stealing Part of a Production Language Model
reverse engineering the size of a model's hidden dimension from its logit outputs
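A toy numpy version of the core trick, as I understand it: logits are a linear function of the hidden state, so the logit matrix collected from many queries has rank equal to the hidden dimension:

```python
import numpy as np

# Final logits are W_out @ h, so no matter how many prompts you send,
# the logit vectors live in a subspace of rank = hidden dim.
rng = np.random.default_rng(0)
vocab, hidden, n_queries = 1000, 64, 512

W_out = rng.normal(size=(vocab, hidden))    # stand-in for the unembedding matrix
H = rng.normal(size=(hidden, n_queries))    # hidden states for many prompts
logits = W_out @ H                          # what an API exposing full logits gives you

s = np.linalg.svd(logits, compute_uv=False)
print("estimated hidden dim:", int(np.sum(s > 1e-6 * s[0])))   # prints 64
```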
Transformers
Introduces the beautiful concept of multiplying a one-hot vector by a matrix as a column lookup.
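The lookup view in a few lines of numpy:

```python
import numpy as np

W = np.arange(12).reshape(3, 4)      # a 3x4 matrix
one_hot = np.eye(4)[2]               # one-hot vector selecting index 2

print(W @ one_hot)                   # equals W[:, 2]: the "multiply" is just a lookup
print(W[:, 2])
```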
Initialization
Interactive diagram showing how initialization affects the forward and backward passes
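A quick numpy sketch of the effect the diagram illustrates: unit-variance weights blow up the activations layer by layer, while 1/sqrt(fan_in) scaling keeps them steady (toy depth and width, no nonlinearity):

```python
import numpy as np

# How the scale of activations evolves through a deep stack of linear layers
# under two initializations.
rng = np.random.default_rng(0)
depth, width = 20, 512
x = rng.normal(size=(width,))

for name, scale in [("std = 1", 1.0), ("std = 1/sqrt(fan_in)", 1 / np.sqrt(width))]:
    h = x.copy()
    for _ in range(depth):
        W = rng.normal(scale=scale, size=(width, width))
        h = W @ h
    print(f"{name:22s} final activation std: {h.std():.3e}")
```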
Linear Algebra
A Beginner’s Guide to Singular Value Decomposition (SVD).
Why reparameterizing using the feature dimensions with the most variance makes sense
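A small numpy sketch of that intuition with made-up correlated 2-D data: the top singular direction carries nearly all the variance, so a rank-1 reparameterization loses almost nothing:

```python
import numpy as np

# Correlated 2-D data: most of the variance lies along one direction.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2)) @ np.array([[3.0, 0.0], [1.5, 0.3]])
X -= X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
print("variance captured per direction:", (s**2 / np.sum(s**2)).round(3))

# Reparameterize: keep only the coordinate along the top singular vector.
X_1d = X @ Vt[0]                     # one number per point instead of two
X_back = np.outer(X_1d, Vt[0])       # best rank-1 reconstruction
print("relative reconstruction error:", np.linalg.norm(X - X_back) / np.linalg.norm(X))
```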
Singular Value Decomposition as Simply as Possible
A Geometric Understanding of Matrices
Special Linear Transformations in R²
Computer Science
Myths Programmers Believe About CPU Caches
A fun reminder for the Leetcode generation that there is more to programming than Big O
Facts and Myths about Python Names and Values
Either this makes you hate Python a tiny bit or you were writing JavaScript before this
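The classic gotcha from the post, in a few lines:

```python
# Names are references to values; assignment never copies.
a = [1, 2, 3]
b = a             # b is another name for the *same* list
b.append(4)
print(a)          # [1, 2, 3, 4] -- a "changed" because there is only one list

b = [9, 9]        # rebinding b points the name elsewhere; a is untouched
print(a)          # [1, 2, 3, 4]
```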
Other
Principles of Vasocomputation: A Unification of Buddhist Phenomenology, Active Inference, and Physical Reflex (Part I)
involuntary smooth muscle contractions are doing all sorts of calculations and state management
Dr. Luca Turin detects quantum clues to consciousness
“The only thing we know about consciousness is that it’s soluble in chloroform”
Bioelectric networks underlie the intelligence of the body
Michael Levin - how genes/proteins interact with environment bioelectrically to solve problems in anatomical space
Qualia Research Institute - Geometric Eigenmodes of Brain Activity
Using different drug phenomenologies to reason about the brain, drugs as a brain ablation test (what happens when you turn some of the neurons off) https://qri.org/blog/5meo-vs-dmt