Exercise 3: Classification#

In this exercise, you’ll complete the previous section by performing a linear-logistic fit and a full neural network classification of penguin species by bill length and depth, using PyTorch.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
from torch import nn, optim

Get the data#

The code below is copied from the previous section. It loads the data and defines plotting functions to plot the data points and the 50% thresholds between categories.

penguins_df = pd.read_csv("data/penguins.csv")

categorical_int_df = penguins_df.dropna()[["bill_length_mm", "bill_depth_mm", "species"]]
categorical_int_df["species"], code_to_name = pd.factorize(categorical_int_df["species"].values)

categorical_1hot_df = pd.get_dummies(penguins_df.dropna()[["bill_length_mm", "bill_depth_mm", "species"]])

def plot_categorical_problem(ax, xlow=29, xhigh=61, ylow=12, yhigh=22):
    df_Adelie = categorical_1hot_df[categorical_1hot_df["species_Adelie"] == 1]
    df_Gentoo = categorical_1hot_df[categorical_1hot_df["species_Gentoo"] == 1]
    df_Chinstrap = categorical_1hot_df[categorical_1hot_df["species_Chinstrap"] == 1]

    ax.scatter(df_Adelie["bill_length_mm"], df_Adelie["bill_depth_mm"], color="tab:blue", label="Adelie")
    ax.scatter(df_Gentoo["bill_length_mm"], df_Gentoo["bill_depth_mm"], color="tab:orange", label="Gentoo")
    ax.scatter(df_Chinstrap["bill_length_mm"], df_Chinstrap["bill_depth_mm"], color="tab:green", label="Chinstrap")

    ax.set_xlim(xlow, xhigh)
    ax.set_ylim(ylow, yhigh)
    ax.set_xlabel("bill length (mm)")
    ax.set_ylabel("bill depth (mm)")

    ax.legend(loc="lower left", framealpha=1)

def plot_categorical_solution(ax, model, xlow=29, xhigh=61, ylow=12, yhigh=22):
    # compute the three probabilities for every 2D point in the background
    background_x, background_y = np.meshgrid(np.linspace(xlow, xhigh, 100), np.linspace(ylow, yhigh, 100))
    background_2d = np.column_stack([background_x.ravel(), background_y.ravel()])

    # the model is a PyTorch module: pass it a tensor and detach the output for NumPy
    with torch.no_grad():
        probabilities = model(torch.tensor(background_2d, dtype=torch.float32)).numpy()

    # draw contour lines where the probabilities cross the 50% threshold
    ax.contour(background_x, background_y, probabilities[:, 0].reshape(background_x.shape), [0.5])
    ax.contour(background_x, background_y, probabilities[:, 1].reshape(background_x.shape), [0.5])
    ax.contour(background_x, background_y, probabilities[:, 2].reshape(background_x.shape), [0.5])

fig, ax = plt.subplots()

plot_categorical_problem(ax)

plt.show()
[Figure: scatter plot of bill depth (mm) vs. bill length (mm) for the Adelie, Gentoo, and Chinstrap penguins]

Hints for the exercise#

The linear-logistic fit has no hidden layer (no adaptive basis functions), just a linear fit that feeds into a softmax.

As a suggestion, build the model in two pieces:

model_without_softmax = nn.Sequential(
    nn.Linear(2, 3),         # 2D → 3D linear transformation
)

model_with_softmax = nn.Sequential(
    model_without_softmax,   # same 2D → 3D transformation
    nn.Softmax(dim=1),       # 3D space → 3 probabilities
)

This is because nn.CrossEntropyLoss wants predictions without the softmax applied, while you’ll need to plot the results with the softmax applied.
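
For instance, here’s a minimal sketch of how the split gets used, assuming features and targets are tensors you’ve prepared (more on targets below; the choice of optimizer is just illustrative):

loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(model_without_softmax.parameters())

for epoch in range(1000):
    # train on the raw linear outputs: CrossEntropyLoss applies log-softmax itself
    optimizer.zero_grad()
    loss = loss_function(model_without_softmax(features), targets)
    loss.backward()
    optimizer.step()

# but interpret and plot with the softmax applied, so the outputs are probabilities
probabilities = model_with_softmax(features)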

These two models are connected:

list(model_without_softmax.parameters())
[Parameter containing:
 tensor([[-0.2326,  0.2633],
         [-0.0355,  0.4587],
         [-0.5658, -0.5759]], requires_grad=True),
 Parameter containing:
 tensor([-0.1672, -0.4778, -0.6891], requires_grad=True)]
list(model_with_softmax.parameters())
[Parameter containing:
 tensor([[-0.2326,  0.2633],
         [-0.0355,  0.4587],
         [-0.5658, -0.5759]], requires_grad=True),
 Parameter containing:
 tensor([-0.1672, -0.4778, -0.6891], requires_grad=True)]

They have the same parameter values, so if model_without_softmax’s parameters are changed, you’ll see the same change in model_with_softmax.

I’ve found this linking of two model objects to be a useful way to keep track of whether the softmax has been applied—it’s in the name.
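You can check this sharing directly: the two models don’t merely have equal values, they share the very same Parameter objects.

# the underlying Parameter objects are identical, not merely equal in value
first_weights = list(model_without_softmax.parameters())[0]
second_weights = list(model_with_softmax.parameters())[0]
first_weights is second_weights   # True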

Also, you’ll need to scale the data before it reaches these parameters, which are of order 1. You could either scale the data directly (and keep track of scaled and unscaled datasets) or make it a step in the model:

class ScaleFeatures(nn.Module):
    def __init__(self, means, stds):
        super().__init__()   # let PyTorch do its initialization first

        self.register_buffer("means", torch.tensor(means.reshape(1, 2), dtype=torch.float32))
        self.register_buffer("stds", torch.tensor(stds.reshape(1, 2), dtype=torch.float32))

    def __repr__(self):
        return f"{type(self).__name__}({self.means}, {self.stds})"

    def forward(self, x):
        return (x - self.means) / self.stds

scale_features = ScaleFeatures(
    categorical_int_df.drop(columns=["species"]).mean().values,
    categorical_int_df.drop(columns=["species"]).std().values,
)
scale_features
ScaleFeatures(tensor([[43.9928, 17.1649]]), tensor([[5.4687, 1.9692]]))
scaled_features = scale_features(
    torch.tensor(categorical_int_df.drop(columns=["species"]).values, dtype=torch.float32)
)
scaled_features[:15]
tensor([[-0.8947,  0.7796],
        [-0.8216,  0.1194],
        [-0.6753,  0.4241],
        [-1.3336,  1.0842],
        [-0.8581,  1.7444],
        [-0.9313,  0.3225],
        [-0.8764,  1.2366],
        [-0.5290,  0.2210],
        [-0.9861,  2.0491],
        [-1.7176,  1.9983],
        [-1.3518,  0.3225],
        [-0.9678,  0.9319],
        [-0.2730,  1.7952],
        [-1.7541,  0.6272],
        [ 0.3670,  2.2014]])

If your model works with scaled_features as input, it must not include scale_features as a step; if your model works with the original, unscaled features, then it must include scale_features as a step.
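
For example, the unscaled-input variant might start like this (a sketch; the scaled-input variant would simply omit the first step):

model_without_softmax = nn.Sequential(
    scale_features,    # standardize the raw bill length/depth first
    nn.Linear(2, 3),   # then the 2D → 3D linear transformation
)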

Finally, you’ll need to decide whether to use integer-encoded category labels (from categorical_int_df) as targets with dtype=torch.int64 or category probabilities (from categorical_1hot_df) as targets with dtype=torch.float32. nn.CrossEntropyLoss has wildly different behavior depending on the dtype of its target.
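
For concreteness, the two target options could be built like this (a sketch; note that pd.get_dummies orders its columns alphabetically, which is not necessarily the same order as the integer codes from pd.factorize, so pick one convention and keep it consistent):

# option 1: integer class labels, for nn.CrossEntropyLoss's class-index mode
targets = torch.tensor(categorical_int_df["species"].values, dtype=torch.int64)

# option 2: one-hot probabilities, for nn.CrossEntropyLoss's probability mode
targets = torch.tensor(
    categorical_1hot_df[["species_Adelie", "species_Chinstrap", "species_Gentoo"]].values,
    dtype=torch.float32,
)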

After the linear-logistic fit, add a 5-dimensional hidden layer with nn.ReLU activation functions. Remember that

relu = nn.ReLU()
relu
ReLU()

is an object that you can include in your model and

relu(scaled_features[:15])
tensor([[0.0000, 0.7796],
        [0.0000, 0.1194],
        [0.0000, 0.4241],
        [0.0000, 1.0842],
        [0.0000, 1.7444],
        [0.0000, 0.3225],
        [0.0000, 1.2366],
        [0.0000, 0.2210],
        [0.0000, 2.0491],
        [0.0000, 1.9983],
        [0.0000, 0.3225],
        [0.0000, 0.9319],
        [0.0000, 1.7952],
        [0.0000, 0.6272],
        [0.3670, 2.2014]])

is a function from tensors to tensors.
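
Putting the pieces together, the model with a hidden layer might look something like this (a sketch, assuming it takes scaled features as input):

model_without_softmax = nn.Sequential(
    nn.Linear(2, 5),   # 2 scaled features → 5 hidden components
    nn.ReLU(),         # adaptive basis functions
    nn.Linear(5, 3),   # 5 hidden components → 3 class scores
)

As before, wrap it in a second nn.Sequential with nn.Softmax(dim=1) to get a model_with_softmax for plotting.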

Have fun!