week 02 / 12

Your first neural network

A neural network is ordinary code you can write yourself.

works through arena 0.2 · 0.2 CNNs & ResNets

A network is a function with knobs

Forget the brain metaphor for now. A neural network is a function with many tunable constants. A linear layer multiplies the input by a matrix of weights and adds a bias. A nonlinearity such as ReLU keeps stacked layers from collapsing into one big matrix multiply.

import torch

x = torch.tensor([[2.0, -1.0]])       # one example, two input features
W = torch.tensor([[0.5, 1.0], [-1.0, 0.25], [0.2, 0.2]])
b = torch.tensor([0.1, 0.0, -0.3])
logits = x @ W.T + b
print(logits)
# raw class scores, shape (1, 3)

Those raw class scores are called logits. They are not probabilities yet; they are the numbers the model hands to the loss function.
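To make the gap between logits and probabilities concrete, here is a minimal sketch that applies softmax to the same scores and then passes the raw logits to a cross-entropy loss. The target class index 0 is made up for illustration and does not come from the notebook.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[2.0, -1.0]])
W = torch.tensor([[0.5, 1.0], [-1.0, 0.25], [0.2, 0.2]])
b = torch.tensor([0.1, 0.0, -0.3])
logits = x @ W.T + b                      # shape (1, 3), any real values

probs = torch.softmax(logits, dim=-1)     # non-negative, each row sums to 1
target = torch.tensor([0])                # hypothetical correct class for this example

# cross_entropy takes raw logits and applies log-softmax internally,
# so the model does not need to output probabilities itself
loss = F.cross_entropy(logits, target)
print(probs, loss)
```

The loss here equals the negative log of the probability assigned to the target class, which is why the model can stop at logits and leave the normalization to the loss function.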

Why ReLU matters

If you stack only linear layers, the whole stack is still linear: W2 @ (W1 @ x) can be rewritten as (W2 @ W1) @ x. ReLU is just max(0, x), but that one bend makes depth useful.

import torch

hidden = torch.tensor([-2.0, 0.5, 3.0])
print(torch.relu(hidden))
# tensor([0.0000, 0.5000, 3.0000])
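The collapse claim can also be checked numerically. The sketch below uses small hand-picked matrices (illustrative values, not from the notebook) to compare two stacked linear maps with the single merged matrix, and then repeats the comparison with ReLU between the layers.

```python
import torch

x = torch.tensor([1.0, -2.0])
W1 = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # 2 features -> 3 hidden
W2 = torch.tensor([[1.0, 1.0, 1.0]])                      # 3 hidden -> 1 output

two_layers = W2 @ (W1 @ x)
one_layer = (W2 @ W1) @ x                 # a single merged weight matrix
print(torch.allclose(two_layers, one_layer))   # the linear stack collapses

with_relu = W2 @ torch.relu(W1 @ x)       # ReLU zeroes the negative hidden values
print(torch.allclose(with_relu, one_layer))    # no single matrix reproduces this
```

Without ReLU the two results agree; with ReLU the negative hidden values are clipped to zero and the merged matrix no longer gives the same answer.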

That clipping is also why initialization matters. If the initial weights are too large, activations can grow layer after layer until they explode; if the weights are too small, or most pre-activations start out negative, ReLU zeroes them and useful signal disappears.
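A rough way to see the effect of weight scale is to push random inputs through a stack of Linear + ReLU layers and measure the spread of the final activations. This is a sketch under assumptions: the helper name, the width and depth, and the sqrt(2 / width) scaling (a He-style choice) are all picked for illustration and are not taken from the notebook.

```python
import torch

def final_activation_std(weight_std, width=256, depth=10, seed=0):
    """Std of activations after `depth` bias-free Linear + ReLU layers."""
    gen = torch.Generator().manual_seed(seed)
    h = torch.randn(512, width, generator=gen)
    for _ in range(depth):
        W = torch.randn(width, width, generator=gen) * weight_std
        h = torch.relu(h @ W.T)
    return h.std().item()

width = 256
results = {
    "too large": final_activation_std(1.0, width),
    "scaled": final_activation_std((2.0 / width) ** 0.5, width),  # assumed He-style scale
    "too small": final_activation_std(0.01, width),
}
for name, value in results.items():
    print(name, value)
```

With the large scale the activations grow by orders of magnitude, with the small scale they shrink toward zero, and with the width-dependent scale they stay in a moderate range.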

`nn.Module` is packaging, not magic

PyTorch modules are classes that register parameters and define a forward method. The notebook asks you to rebuild pieces PyTorch normally gives you.

import torch

class Linear(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        bound = in_features ** -0.5
        self.weight = torch.nn.Parameter(torch.empty(out_features, in_features).uniform_(-bound, bound))
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        return x @ self.weight.T + self.bias

layer = Linear(2, 3)
print(layer(torch.randn(5, 2)).shape)
# torch.Size([5, 3])

The important contract is the shape: five examples with two features become five examples with three output scores.
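Parameter registration can be inspected directly with named_parameters(). The sketch below composes two layers into a small network; to stay self-contained it uses the built-in torch.nn.Linear rather than the hand-written class, and the name TinyMLP and the layer sizes are hypothetical.

```python
import torch

class TinyMLP(torch.nn.Module):           # hypothetical name, for illustration only
    def __init__(self, in_features, hidden, out_features):
        super().__init__()
        # Assigning a Module as an attribute registers its parameters too
        self.fc1 = torch.nn.Linear(in_features, hidden)
        self.fc2 = torch.nn.Linear(hidden, out_features)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyMLP(2, 4, 3)
for name, param in model.named_parameters():
    print(name, tuple(param.shape))       # weights are (out_features, in_features)
print(model(torch.randn(5, 2)).shape)
```

Every tensor listed by named_parameters() is what an optimizer receives from model.parameters(), which is the main thing the packaging provides.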

The training loop in one breath

A training step is forward -> loss -> zero_grad -> backward -> step. This week you can treat backward() as a black box: it computes the gradient of the loss with respect to each parameter, and the optimizer's step() uses those gradients to update the parameters. Week 4 opens that box.
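The five steps can be sketched on a toy problem. Everything specific here is an assumption for illustration: the synthetic two-class data, the layer sizes, the learning rate, and the step count. The notebook trains on MNIST instead.

```python
import torch

torch.manual_seed(0)
X = torch.randn(256, 2)                    # synthetic inputs
y = (X[:, 0] + X[:, 1] > 0).long()         # synthetic labels: which side of a line

model = torch.nn.Sequential(
    torch.nn.Linear(2, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

losses = []
for step in range(200):
    logits = model(X)              # forward
    loss = loss_fn(logits, y)      # loss
    optimizer.zero_grad()          # clear gradients left over from the previous step
    loss.backward()                # fill .grad on every parameter
    optimizer.step()               # update parameters using those gradients
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The zero_grad call matters because PyTorch accumulates gradients by default; skipping it would add each step's gradients onto the previous ones.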

Convolutions add one image-specific idea: a small kernel slides over an image looking for the same pattern everywhere. A residual (skip) connection adds the input back to the output, out = x + f(x), which makes deep networks easier to train and foreshadows the transformer’s residual stream.
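Both ideas fit in a few lines. The block below is a simplified sketch, not the notebook's ResNet block: the class name and channel count are hypothetical, and it omits the batch normalization that real ResNet blocks typically include.

```python
import torch

class ResidualBlock(torch.nn.Module):      # hypothetical, simplified block
    def __init__(self, channels):
        super().__init__()
        # kernel_size=3 with padding=1 keeps height and width unchanged,
        # so x and f(x) have the same shape and can be added
        self.conv1 = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        f_x = self.conv2(torch.relu(self.conv1(x)))
        return x + f_x                     # skip connection: out = x + f(x)

block = ResidualBlock(channels=8)
x = torch.randn(4, 8, 28, 28)              # batch, channels, height, width
out = block(x)
print(out.shape)
```

If f(x) contributes nothing, the block passes x through unchanged, which is one intuition for why stacking many such blocks stays trainable.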

Pair-session guide

Core work is ARENA 0.2 sections 1 and 2: implement modules, train on MNIST, and watch accuracy climb. Stretch work is the convolution and ResNet assembly sections. Rotate roles between “shape caller” and “driver”; before each module test, say the input and output shapes out loud.

What you should see

You should finish with training loss falling over a few epochs and MNIST test accuracy above roughly 95%. You should see a grid of handwritten digits with predicted labels. The milestone is simple but real: you trained a neural network from parts you understand.
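Accuracy is the fraction of examples whose highest logit lands on the correct class. A minimal sketch, with a hypothetical helper name and made-up logits and labels:

```python
import torch

def accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class matches the label."""
    predictions = logits.argmax(dim=-1)
    return (predictions == labels).float().mean().item()

logits = torch.tensor([[2.0, 0.1, -1.0],
                       [0.0, 0.3, 0.2],
                       [1.5, 1.0, 2.5],
                       [0.2, 0.9, 0.1]])
labels = torch.tensor([0, 1, 2, 0])
print(accuracy(logits, labels))
# 0.75
```

Three of the four predictions match their labels; the last row predicts class 1 while the label is 0.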

this week's pair session

core

Sections 1–2: build modules; train on MNIST

stretch

Convolutions as modules
ResNet assembly