Segmentation backbone layer reference#

This page records how the values for eval.segmentation.layers (see Configuration) were determined for each model family, and serves as a reference when adding new models. It complements the SegmentationProbe and the segmentation-head classes (LinearHead, FPNHead, DPTHead, …).

Background#

SegmentationProbe hooks into named PyTorch modules via backbone.named_modules(). The layer names in each model config must exactly match these module names. Layers should be listed deepest first (coarsest feature map first) for the FPN head.

How Layer Names Were Discovered#

Layer names were found by:

Instantiating each timm model with pretrained=False, num_classes=0
Running a dummy 224×224 input through the model with forward hooks on candidate layers
Recording the output spatial size of each layer

Discovery script:

import timm, torch

model = timm.create_model("resnet50", pretrained=False, num_classes=0)
model.eval()
x = torch.zeros(1, 3, 224, 224)
shapes = {}
handles = []

for name in ["layer1", "layer2", "layer3", "layer4"]:
    module = dict(model.named_modules())[name]
    def hook(n):
        def fn(m, inp, out): shapes[n] = out.shape
        return fn
    handles.append(module.register_forward_hook(hook(name)))

with torch.no_grad():
    model(x)
for h in handles:
    h.remove()

print(shapes)

Layer Names by Family#

All spatial sizes are for a 224×224 input image.

ResNet (resnet18, resnet34, resnet50, resnet101)#

Layer	Spatial size	Channels (resnet50)
`layer4`	7×7	2048
`layer3`	14×14	1024
`layer2`	28×28	512
`layer1`	56×56	256

Config: layers: ["layer4", "layer3", "layer2", "layer1"]

DenseNet (densenet121, densenet161)#

Layer	Spatial size	Channels (densenet121)
`features.denseblock4`	7×7	1024
`features.denseblock3`	14×14	1024
`features.denseblock2`	28×28	512
`features.denseblock1`	56×56	256

Config: layers: ["features.denseblock4", "features.denseblock3", "features.denseblock2", "features.denseblock1"]

Note: features.transition1/2/3 downsample between denseblocks. Using the denseblocks (before downsampling) gives richer features.

VGG16#

Layer	Spatial size	Description
`features.30`	7×7	After pool5
`features.23`	14×14	After pool4
`features.16`	28×28	After pool3
`features.9`	56×56	After pool2

Config: layers: ["features.30", "features.23", "features.16", "features.9"]

VGG19#

Layer	Spatial size	Description
`features.36`	7×7	After pool5
`features.27`	14×14	After pool4
`features.18`	28×28	After pool3
`features.9`	56×56	After pool2

Config: layers: ["features.36", "features.27", "features.18", "features.9"]

Note: VGG19 has more conv layers per stage than VGG16, so the MaxPool indices differ.

EfficientNet (efficientnet_b0, b1, b2, b3)#

Layer	Spatial size	Channels (b0)
`blocks.6`	7×7	320
`blocks.5`	7×7	192
`blocks.3`	14×14	80
`blocks.1`	56×56	24

Config: layers: ["blocks.6", "blocks.5", "blocks.3", "blocks.1"]

Note: blocks.4, blocks.5, and blocks.6 all operate at 7×7 (EfficientNet uses multiple MBConv stages at the same stride). We include blocks.5 to get 4 layers, though it shares spatial size with blocks.6. If 3 distinct pyramid levels are preferred, use ["blocks.6", "blocks.3", "blocks.1"].

ConvNeXt (convnext_tiny, small, base, large, large_dinov3)#

Layer	Spatial size	Channels (tiny)
`stages.3`	7×7	768
`stages.2`	14×14	384
`stages.1`	28×28	192
`stages.0`	56×56	96

Config: layers: ["stages.3", "stages.2", "stages.1", "stages.0"]

MobileNetV3-Large (mobilenetv3_large_100)#

Layer	Spatial size	Channels
`blocks.6`	7×7	960
`blocks.5`	7×7	160
`blocks.3`	14×14	80
`blocks.1`	56×56	24

Config: layers: ["blocks.6", "blocks.5", "blocks.3", "blocks.1"]

Note: blocks.5 and blocks.6 share 7×7 (same as EfficientNet note above).

MobileNetV3-Small (mobilenetv3_small_100)#

Layer	Spatial size	Channels
`blocks.5`	7×7	576
`blocks.4`	7×7	96
`blocks.2`	14×14	40
`blocks.1`	28×28	24

Config: layers: ["blocks.5", "blocks.4", "blocks.2", "blocks.1"]

Note: MobileNetV3-Small only has 6 blocks (0–5), unlike the Large variant which has 7 (0–6). The shallowest useful stage is 28×28 (no 56×56 stage).

RegNet (regnetx_002, regnetx_008, regnety_002, regnety_008)#

Layer	Spatial size	Channels (regnetx_002)
`s4`	7×7	368
`s3`	14×14	152
`s2`	28×28	56
`s1`	56×56	24

Config: layers: ["s4", "s3", "s2", "s1"]

Adding a New Model#

Instantiate the model with timm.create_model(name, pretrained=False, num_classes=0)
Print top-level modules: [name for name, _ in model.named_children()]
Run the discovery script above to confirm spatial sizes
Add to the model config in coarse-to-fine order (deepest first)
Note any stages that share spatial size (common in EfficientNet/MobileNet)