Segmentation backbone layer reference#
This page records how the values for eval.segmentation.layers (see
Configuration) were determined for each model
family, and serves as a reference when adding new models. It complements
the SegmentationProbe and the segmentation-head
classes (LinearHead, FPNHead, DPTHead, …).
Background#
SegmentationProbe hooks into named PyTorch modules via backbone.named_modules(). The layer names in each model config must exactly match these module names. Layers should be listed deepest first (coarsest feature map first) for the FPN head.
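The exact-match requirement can be checked before running an evaluation. The sketch below is a minimal illustration and not part of SegmentationProbe: find_missing_layers is a hypothetical helper that takes the (name, module) pairs yielded by backbone.named_modules() and reports any configured name without an exact match. A hand-written list stands in for a real backbone here.

```python
def find_missing_layers(named_modules, layers):
    """Return the configured layer names that have no exact match.

    `named_modules` is any iterable of (name, module) pairs, such as the
    one produced by `backbone.named_modules()` on a PyTorch model.
    """
    available = {name for name, _ in named_modules}
    return [layer for layer in layers if layer not in available]


# Stand-in for backbone.named_modules(); the module objects are not needed
# for the name check, so None is used in their place.
fake_modules = [
    ("", None), ("conv1", None), ("layer1", None), ("layer1.0", None),
    ("layer2", None), ("layer3", None), ("layer4", None),
]

print(find_missing_layers(fake_modules, ["layer4", "layer3", "layer2", "layer1"]))  # []
print(find_missing_layers(fake_modules, ["layer4", "stages.3"]))  # ['stages.3']
```

With a real model, the call would be find_missing_layers(backbone.named_modules(), layers); an empty list means every configured name can be hooked.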
How Layer Names Were Discovered#
Layer names were found by:
1. Instantiating each timm model with pretrained=False, num_classes=0
2. Running a dummy 224×224 input through the model with forward hooks on candidate layers
3. Recording the output spatial size of each layer
Discovery script:
```python
import timm
import torch

model = timm.create_model("resnet50", pretrained=False, num_classes=0)
model.eval()
x = torch.zeros(1, 3, 224, 224)

shapes = {}
handles = []
modules = dict(model.named_modules())


def make_hook(name):
    # Bind the layer name so each hook records under its own key.
    def fn(module, inputs, output):
        shapes[name] = output.shape
    return fn


for name in ["layer1", "layer2", "layer3", "layer4"]:
    handles.append(modules[name].register_forward_hook(make_hook(name)))

with torch.no_grad():
    model(x)

# Remove the hooks so the model is left unmodified.
for h in handles:
    h.remove()

print(shapes)
```
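The recorded shapes can be turned directly into a config list. The following is a sketch under two assumptions: config_order is a hypothetical helper, and the shapes dict was filled in forward order with one entry per layer, as the discovery script does. It sorts by spatial area so the coarsest map comes first, and keeps the deeper layer first when two layers share a size.

```python
def config_order(shapes):
    """Order hooked layer names coarsest feature map first.

    `shapes` maps layer name -> output shape (N, C, H, W), inserted in
    forward order. Ties in spatial size keep the deeper layer first.
    """
    deepest_first = list(reversed(list(shapes)))
    # sorted() is stable, so layers with equal H*W stay deepest first.
    return sorted(deepest_first, key=lambda n: shapes[n][2] * shapes[n][3])


# Plain tuples stand in for torch.Size, which is a tuple subclass.
example_shapes = {
    "layer1": (1, 256, 56, 56),
    "layer2": (1, 512, 28, 28),
    "layer3": (1, 1024, 14, 14),
    "layer4": (1, 2048, 7, 7),
}
print(config_order(example_shapes))  # ['layer4', 'layer3', 'layer2', 'layer1']
```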
Layer Names by Family#
All spatial sizes are for a 224×224 input image.
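The recurring sizes 56×56, 28×28, 14×14 and 7×7 correspond to cumulative strides of 4, 8, 16 and 32 on a 224×224 input. The short sketch below shows that arithmetic; feature_size is a hypothetical helper, and the floor division is only a rough guide for inputs that are not a multiple of the stride, since the real size then depends on each model's padding.

```python
def feature_size(input_size, stride):
    """Spatial side length of a feature map at a given cumulative stride."""
    return input_size // stride


for stride in (4, 8, 16, 32):
    side = feature_size(224, stride)
    print(f"stride {stride}: {side}×{side}")
```

For a different input resolution, rerunning the discovery script remains the reliable way to confirm the actual sizes.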
ResNet (resnet18, resnet34, resnet50, resnet101)#
| Layer | Spatial size | Channels (resnet50) |
|---|---|---|
| layer4 | 7×7 | 2048 |
| layer3 | 14×14 | 1024 |
| layer2 | 28×28 | 512 |
| layer1 | 56×56 | 256 |
Config: layers: ["layer4", "layer3", "layer2", "layer1"]
DenseNet (densenet121, densenet161)#
| Layer | Spatial size | Channels (densenet121) |
|---|---|---|
| features.denseblock4 | 7×7 | 1024 |
| features.denseblock3 | 14×14 | 1024 |
| features.denseblock2 | 28×28 | 512 |
| features.denseblock1 | 56×56 | 256 |
Config: layers: ["features.denseblock4", "features.denseblock3", "features.denseblock2", "features.denseblock1"]
Note: features.transition1/2/3 downsample between denseblocks. Hooking the denseblock outputs captures each stage's features before the transition layer reduces their channel count and spatial size.
VGG16#
| Layer | Spatial size | Description |
|---|---|---|
| features.30 | 7×7 | After pool5 |
| features.23 | 14×14 | After pool4 |
| features.16 | 28×28 | After pool3 |
| features.9 | 56×56 | After pool2 |
Config: layers: ["features.30", "features.23", "features.16", "features.9"]
VGG19#
| Layer | Spatial size | Description |
|---|---|---|
| features.36 | 7×7 | After pool5 |
| features.27 | 14×14 | After pool4 |
| features.18 | 28×28 | After pool3 |
| features.9 | 56×56 | After pool2 |
Config: layers: ["features.36", "features.27", "features.18", "features.9"]
Note: VGG19 has more conv layers per stage than VGG16, so the MaxPool indices differ.
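The MaxPool indices for both variants can be derived from the number of conv layers per stage. The sketch below assumes the non-batch-norm layout, where features is a flat sequence of Conv2d and ReLU pairs with one MaxPool2d closing each stage; maxpool_indices and its batch_norm flag are hypothetical and only illustrate how the indices shift when an extra layer follows each conv.

```python
def maxpool_indices(convs_per_stage, batch_norm=False):
    """Index of the MaxPool layer that closes each stage in `features`.

    Assumes each conv contributes Conv2d + ReLU (plus BatchNorm2d when
    `batch_norm` is True) and each stage ends with a single MaxPool2d.
    """
    layers_per_conv = 3 if batch_norm else 2
    indices = []
    idx = 0
    for n_convs in convs_per_stage:
        idx += n_convs * layers_per_conv  # skip over this stage's conv layers
        indices.append(idx)               # the pool sits right after them
        idx += 1                          # move past the pool itself
    return indices


vgg16 = maxpool_indices([2, 2, 3, 3, 3])
vgg19 = maxpool_indices([2, 2, 4, 4, 4])
print(vgg16)  # [4, 9, 16, 23, 30]
print(vgg19)  # [4, 9, 18, 27, 36]

# The configs use pool2..pool5, deepest first.
print([f"features.{i}" for i in reversed(vgg19[1:])])
```

The first two stages have the same depth in both models, which is why features.9 is shared while the deeper indices diverge.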
EfficientNet (efficientnet_b0, b1, b2, b3)#
| Layer | Spatial size | Channels (b0) |
|---|---|---|
| blocks.6 | 7×7 | 320 |
| blocks.5 | 7×7 | 192 |
| blocks.3 | 14×14 | 80 |
| blocks.1 | 56×56 | 24 |
Config: layers: ["blocks.6", "blocks.5", "blocks.3", "blocks.1"]
Note: blocks.5 and blocks.6 both operate at 7×7 (EfficientNet uses multiple MBConv stages at the same stride), while blocks.4 still runs at 14×14, like blocks.3. We include blocks.5 to get 4 layers, though it shares its spatial size with blocks.6. If 3 distinct pyramid levels are preferred, use ["blocks.6", "blocks.3", "blocks.1"].
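The 3-level alternative in the note amounts to keeping only the deepest layer at each spatial size. A minimal sketch of that filtering follows; distinct_levels is a hypothetical helper, and the sizes are the side lengths from the table above.

```python
def distinct_levels(layers, sizes):
    """Keep the first (deepest) layer for each spatial size.

    `layers` is listed deepest first; `sizes` maps layer name -> side length.
    """
    seen = set()
    kept = []
    for layer in layers:
        if sizes[layer] not in seen:
            seen.add(sizes[layer])
            kept.append(layer)
    return kept


layers = ["blocks.6", "blocks.5", "blocks.3", "blocks.1"]
sizes = {"blocks.6": 7, "blocks.5": 7, "blocks.3": 14, "blocks.1": 56}
print(distinct_levels(layers, sizes))  # ['blocks.6', 'blocks.3', 'blocks.1']
```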
ConvNeXt (convnext_tiny, small, base, large, large_dinov3)#
| Layer | Spatial size | Channels (tiny) |
|---|---|---|
| stages.3 | 7×7 | 768 |
| stages.2 | 14×14 | 384 |
| stages.1 | 28×28 | 192 |
| stages.0 | 56×56 | 96 |
Config: layers: ["stages.3", "stages.2", "stages.1", "stages.0"]
MobileNetV3-Large (mobilenetv3_large_100)#
| Layer | Spatial size | Channels |
|---|---|---|
| blocks.6 | 7×7 | 960 |
| blocks.5 | 7×7 | 160 |
| blocks.3 | 14×14 | 80 |
| blocks.1 | 56×56 | 24 |
Config: layers: ["blocks.6", "blocks.5", "blocks.3", "blocks.1"]
Note: blocks.5 and blocks.6 share 7×7 (same as EfficientNet note above).
MobileNetV3-Small (mobilenetv3_small_100)#
| Layer | Spatial size | Channels |
|---|---|---|
| blocks.5 | 7×7 | 576 |
| blocks.4 | 7×7 | 96 |
| blocks.2 | 14×14 | 40 |
| blocks.1 | 28×28 | 24 |
Config: layers: ["blocks.5", "blocks.4", "blocks.2", "blocks.1"]
Note: MobileNetV3-Small only has 6 blocks (0–5), unlike the Large variant which has 7 (0–6). The shallowest layer in this config is blocks.1 at 28×28; no 56×56 layer is included.
RegNet (regnetx_002, regnetx_008, regnety_002, regnety_008)#
| Layer | Spatial size | Channels (regnetx_002) |
|---|---|---|
| s4 | 7×7 | 368 |
| s3 | 14×14 | 152 |
| s2 | 28×28 | 56 |
| s1 | 56×56 | 24 |
Config: layers: ["s4", "s3", "s2", "s1"]
Adding a New Model#
1. Instantiate the model with timm.create_model(name, pretrained=False, num_classes=0)
2. Print top-level modules: [name for name, _ in model.named_children()]
3. Run the discovery script above to confirm spatial sizes
4. Add to the model config in coarse-to-fine order (deepest first)
5. Note any stages that share spatial size (common in EfficientNet/MobileNet)
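When documenting a new family on this page, the table rows can be generated from the recorded shapes rather than typed by hand. The sketch below is one possible way to do that; table_rows is a hypothetical helper that expects the layers in config order and shapes as (N, C, H, W) tuples, as recorded by the discovery script.

```python
def table_rows(layers, shapes):
    """Format one table row per layer, in the order given by `layers`."""
    rows = ["| Layer | Spatial size | Channels |", "|---|---|---|"]
    for name in layers:
        _, channels, height, width = shapes[name]
        rows.append(f"| {name} | {height}×{width} | {channels} |")
    return rows


# Values from the RegNet table above, used as sample input.
regnet_shapes = {
    "s1": (1, 24, 56, 56),
    "s2": (1, 56, 28, 28),
    "s3": (1, 152, 14, 14),
    "s4": (1, 368, 7, 7),
}
rows = table_rows(["s4", "s3", "s2", "s1"], regnet_shapes)
print("\n".join(rows))
```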