week 02 / 12
Your first neural network
A neural network is ordinary code you can write yourself.
works through ARENA 0.2 · CNNs & ResNets
A network is a function with knobs
Forget the brain metaphor for now. A neural network is a function with many tunable constants. A linear layer multiplies the input by a matrix of weights and adds a bias. A nonlinearity such as ReLU keeps stacked layers from collapsing into one big matrix multiply.
import torch
x = torch.tensor([[2.0, -1.0]]) # one example, two input features
W = torch.tensor([[0.5, 1.0], [-1.0, 0.25], [0.2, 0.2]])
b = torch.tensor([0.1, 0.0, -0.3])
logits = x @ W.T + b
print(logits)
# raw class scores, shape (1, 3)
Those raw class scores are called logits. They are not probabilities yet; they are the numbers the model passes to the loss function.
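To make the step from logits to a loss concrete, here is a minimal sketch. It reuses the logit values that the example above produces and assumes, purely for illustration, that the correct class is index 0. Softmax turns the scores into probabilities, and cross-entropy is the negative log of the probability assigned to the correct class.

```python
import torch
import torch.nn.functional as F

# Logits from the example above: one example, three classes.
logits = torch.tensor([[0.1, -2.25, -0.1]])

# Softmax exponentiates each score and normalizes so the row sums to 1.
probs = F.softmax(logits, dim=-1)
print(probs, probs.sum())

# Assumption for illustration: the correct class is index 0.
target = torch.tensor([0])

# cross_entropy expects raw logits and applies log-softmax internally.
loss = F.cross_entropy(logits, target)

# The same number computed by hand from the softmax probabilities.
manual = -torch.log(probs[0, 0])
print(loss.item(), manual.item())
```

Passing probabilities instead of logits to `F.cross_entropy` is a common mistake, because the function would then apply softmax a second time.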
Why ReLU matters
If you stack only linear layers, the whole stack is still linear: W2 @ (W1 @ x) can be rewritten as (W2 @ W1) @ x. ReLU is just max(0, x), but that one bend makes depth useful.
import torch
hidden = torch.tensor([-2.0, 0.5, 3.0])
print(torch.relu(hidden))
# tensor([0.0000, 0.5000, 3.0000])
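The collapse claim can be checked numerically. The sketch below uses small hand-picked matrices (illustrative values, not from the notebook): two stacked linear maps give the same answer as the single merged matrix, and inserting ReLU between them changes the result.

```python
import torch

x = torch.tensor([1.0, -1.0])
W1 = torch.tensor([[1.0, 2.0], [3.0, -1.0]])   # first layer: 2 -> 2
W2 = torch.tensor([[0.5, 1.0]])                # second layer: 2 -> 1

# Two linear layers in a row ...
two_layers = W2 @ (W1 @ x)
# ... equal one layer whose weight matrix is the product W2 @ W1.
one_layer = (W2 @ W1) @ x
print(two_layers.item(), one_layer.item())     # both are 3.5

# With ReLU in between, the negative hidden value is clipped to zero,
# so the stack no longer matches any single merged matrix on this input.
with_relu = W2 @ torch.relu(W1 @ x)
print(with_relu.item())                        # 4.0
```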
That clipping is also why initialization matters. If the weights start too large, activations grow with every layer and can explode; if they start too small, or most pre-activations are negative, ReLU zeroes them and useful signal disappears.
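One rough way to see the effect is to push random data through a stack of ReLU layers at different weight scales. In the sketch below the depth, width, and the three scales are illustrative choices. The middle scale, sqrt(2 / fan_in), is commonly called He (Kaiming) initialization; this page does not prescribe it.

```python
import torch

def final_activation_scale(weight_std, depth=10, width=256, seed=0):
    """Push random data through `depth` ReLU layers; return the mean absolute activation."""
    gen = torch.Generator().manual_seed(seed)
    h = torch.randn(64, width, generator=gen)
    for _ in range(depth):
        # Fresh random weights for each layer, scaled by weight_std.
        W = torch.randn(width, width, generator=gen) * weight_std
        h = torch.relu(h @ W.T)
    return h.abs().mean().item()

width = 256
scales = {
    "too small": 0.01,
    "sqrt(2 / fan_in)": (2.0 / width) ** 0.5,
    "too large": 0.5,
}
results = {name: final_activation_scale(std, width=width) for name, std in scales.items()}
for name, value in results.items():
    print(f"{name:>18}: {value:.3e}")
```

The small scale should shrink activations toward zero and the large scale should blow them up by many orders of magnitude, while the middle scale keeps them at roughly the size of the input.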
nn.Module is packaging, not magic
PyTorch modules are classes that register parameters and define a forward method. The notebook asks you to rebuild pieces PyTorch normally gives you.
import torch
class Linear(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Uniform init in [-1/sqrt(in_features), 1/sqrt(in_features)]
        bound = in_features ** -0.5
        self.weight = torch.nn.Parameter(torch.empty(out_features, in_features).uniform_(-bound, bound))
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        return x @ self.weight.T + self.bias

layer = Linear(2, 3)
print(layer(torch.randn(5, 2)).shape)
# torch.Size([5, 3])
The important contract is the shape: five examples with two features become five examples with three output scores.
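The "registering parameters" part is easy to observe. The sketch below repeats the Linear class so that it runs on its own, then nests two of them inside a small module. `TinyMLP` is a hypothetical name used only for this illustration. Assigning a `Parameter` or a sub-module as an attribute is enough for `named_parameters()` to find it.

```python
import torch

class Linear(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        bound = in_features ** -0.5
        self.weight = torch.nn.Parameter(
            torch.empty(out_features, in_features).uniform_(-bound, bound)
        )
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        return x @ self.weight.T + self.bias

class TinyMLP(torch.nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.fc1 = Linear(2, 4)   # 2 input features -> 4 hidden units
        self.fc2 = Linear(4, 3)   # 4 hidden units -> 3 class scores

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyMLP()
names = [name for name, _ in model.named_parameters()]
print(names)

# 2*4 + 4 weights and biases in fc1, 4*3 + 3 in fc2: 27 in total.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
print(model(torch.randn(5, 2)).shape)
```

An optimizer built with `model.parameters()` would therefore update all four tensors without any of them being listed by hand.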
The training loop in one breath
A training step is forward -> loss -> zero_grad -> backward -> step. This week you can treat backward() as a black box: it computes the gradient of the loss with respect to each parameter, which tells the optimizer which way to nudge it. Week 4 opens that box.
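The five steps map directly onto five lines of PyTorch. The sketch below is not the notebook's MNIST loop: it assumes a small synthetic two-class dataset so that it runs without downloads, and the model size, learning rate, and step count are arbitrary illustrative choices.

```python
import torch

torch.manual_seed(0)

# Synthetic stand-in for a real dataset: 200 points, 2 features, 2 classes.
X = torch.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).long()   # class 1 when the two features sum to > 0

model = torch.nn.Sequential(
    torch.nn.Linear(2, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

losses = []
for step in range(100):
    logits = model(X)            # forward: compute class scores
    loss = loss_fn(logits, y)    # loss: one number measuring how wrong they are
    optimizer.zero_grad()        # zero_grad: clear gradients from the previous step
    loss.backward()              # backward: fill .grad on every parameter
    optimizer.step()             # step: move each parameter against its gradient
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

Calling `zero_grad()` matters because PyTorch accumulates gradients by default; without it, each step would add the new gradients to the old ones.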
Convolutions add one image-specific idea: a small kernel slides over an image looking for the same pattern everywhere. A residual (skip) connection adds the input back to the output, out = x + f(x), which makes deep networks easier to train and foreshadows the transformer’s residual stream.
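Both ideas fit in a few lines. The block below is a simplified, hypothetical `ResidualBlock`, not the notebook's ResNet block; real ResNet blocks also include batch normalization, which is left out here. The channel count and image size are arbitrary.

```python
import torch

class ResidualBlock(torch.nn.Module):  # simplified sketch, no batch norm
    def __init__(self, channels):
        super().__init__()
        # A 3x3 kernel with padding=1 keeps height and width unchanged,
        # so f(x) has the same shape as x and the two can be added.
        self.conv1 = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        f = self.conv2(torch.relu(self.conv1(x)))
        return x + f   # skip connection: out = x + f(x)

x = torch.randn(8, 16, 28, 28)   # batch of 8, 16 channels, 28x28 pixels
block = ResidualBlock(16)
out = block(x)
print(out.shape)

# The same small kernel is reused at every pixel position, so the weight
# tensor depends on channels and kernel size, not on image size.
print(block.conv1.weight.shape)
```

Because the block only adds a correction to `x`, it can start close to the identity function, which is one common explanation for why residual networks train more easily.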
Pair-session guide
Core work is ARENA 0.2 sections 1 and 2: implement modules, train on MNIST, and watch accuracy climb. Stretch work is the convolution and ResNet assembly sections. Rotate roles between “shape caller” and “driver”; before each module test, say the input and output shapes out loud.
What you should see
You should finish with training loss falling over a few epochs and MNIST test accuracy above roughly 95%. You should see a grid of handwritten digits with predicted labels. The milestone is simple but real: you trained a neural network from parts you understand.
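Accuracy here means the fraction of examples whose highest-scoring class matches the label. A minimal sketch with made-up logits and labels (the `accuracy` helper is a hypothetical name, not a notebook function):

```python
import torch

def accuracy(logits, labels):
    """Fraction of rows whose largest logit is at the labeled class index."""
    preds = logits.argmax(dim=-1)
    return (preds == labels).float().mean().item()

# Four made-up examples with three classes each.
logits = torch.tensor([[2.0, 0.1, -1.0],
                       [0.0, 0.3, 0.2],
                       [1.5, 1.4, 3.0],
                       [0.9, 0.2, 0.1]])
labels = torch.tensor([0, 1, 2, 1])

print(accuracy(logits, labels))  # 3 of 4 predictions match, so 0.75
```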
this week's pair session
core
- Sections 1–2: build modules; train on MNIST
stretch
- Convolutions as modules
- ResNet assembly