Segmentation backbone layer reference

This page records how the values for eval.segmentation.layers (see Configuration) were determined for each model family, and serves as a reference when adding new models. It complements the SegmentationProbe and the segmentation-head classes (LinearHead, FPNHead, DPTHead, …).

Background

SegmentationProbe hooks into named PyTorch modules via backbone.named_modules(). The layer names in each model config must exactly match these module names. Layers should be listed deepest first (coarsest feature map first) for the FPN head.
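Because the match must be exact, a typo or wrong prefix in a config will not resolve to a module. The following is a minimal sketch of a pre-flight check, not part of SegmentationProbe itself; `missing_layers` is a hypothetical helper, and the hard-coded name list stands in for `[name for name, _ in backbone.named_modules()]`.

```python
def missing_layers(module_names, layers):
    """Return the configured layer names with no exact match in module_names."""
    available = set(module_names)
    return [name for name in layers if name not in available]


# Stand-in for the names yielded by backbone.named_modules() on a
# ResNet-style backbone (hypothetical, shortened list).
module_names = ["", "conv1", "bn1", "layer1", "layer1.0", "layer2", "layer3", "layer4"]

ok = missing_layers(module_names, ["layer4", "layer3", "layer2", "layer1"])
bad = missing_layers(module_names, ["Layer4", "layer_3"])

print(ok)   # []
print(bad)  # ['Layer4', 'layer_3']
```

The comparison is case-sensitive and covers the full dotted path, so nested names must be written out in full.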

How Layer Names Were Discovered

Layer names were found by:

  1. Instantiating each timm model with pretrained=False, num_classes=0

  2. Running a dummy 224×224 input through the model with forward hooks on candidate layers

  3. Recording the output spatial size of each layer

Discovery script:

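```python
import timm
import torch

# Build the backbone without pretrained weights or a classifier head.
model = timm.create_model("resnet50", pretrained=False, num_classes=0)
model.eval()
x = torch.zeros(1, 3, 224, 224)  # dummy 224×224 RGB batch

modules = dict(model.named_modules())  # name -> module lookup, built once
shapes = {}
handles = []


def make_hook(name):
    """Return a forward hook that records the output shape under `name`."""
    def hook(module, inputs, output):
        shapes[name] = output.shape
    return hook


# Candidate layers to probe; replace with the names for the model at hand.
for name in ["layer1", "layer2", "layer3", "layer4"]:
    handles.append(modules[name].register_forward_hook(make_hook(name)))

with torch.no_grad():
    model(x)

# Remove the hooks so they do not fire on later forward passes.
for handle in handles:
    handle.remove()

print(shapes)
```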
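The recorded shapes can be turned into table rows and a deepest-first layer list. This is a sketch that assumes every hooked layer returns a 4-D (N, C, H, W) tensor and that the backbone runs the hooked layers in shallow-to-deep order; `table_rows` is a hypothetical helper, and plain tuples stand in for `torch.Size`, which unpacks the same way.

```python
def table_rows(shapes):
    """Turn {layer: (N, C, H, W)} into (layer, spatial size, channels) rows, deepest first."""
    rows = []
    # Hooks fire in forward order, so the dict's insertion order runs
    # shallow to deep; reversing it gives deepest first.
    for name in reversed(list(shapes)):
        _, channels, height, width = shapes[name]
        rows.append((name, f"{height}×{width}", channels))
    return rows


# Example values matching the resnet50 numbers listed on this page.
shapes = {
    "layer1": (1, 256, 56, 56),
    "layer2": (1, 512, 28, 28),
    "layer3": (1, 1024, 14, 14),
    "layer4": (1, 2048, 7, 7),
}

rows = table_rows(shapes)
layers = [name for name, _, _ in rows]
for row in rows:
    print(row)
print("layers:", layers)
```

Relying on insertion order rather than sorting by size keeps layers that share a spatial size in their true depth order.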

Layer Names by Family

All spatial sizes are for a 224×224 input image.
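The sizes 7, 14, 28, and 56 correspond to total strides of 32, 16, 8, and 4 on a 224-pixel input. For other resolutions, the expected sizes can be estimated with the sketch below; `pyramid_sizes` is a hypothetical helper, and it assumes a square input whose side is divisible by the largest stride, since rounding behaviour for other sizes depends on each layer's padding.

```python
def pyramid_sizes(input_size, strides=(32, 16, 8, 4)):
    """Feature-map side lengths, coarsest first, for a square input."""
    if input_size % max(strides) != 0:
        # Per-layer rounding differs between architectures, so refuse to guess.
        raise ValueError(f"input size {input_size} is not divisible by {max(strides)}")
    return [input_size // stride for stride in strides]


print(pyramid_sizes(224))  # [7, 14, 28, 56]
print(pyramid_sizes(512))  # [16, 32, 64, 128]
```

These are estimates only; rerunning the discovery script at the target resolution remains the way to confirm them.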

ResNet (resnet18, resnet34, resnet50, resnet101)

| Layer | Spatial size | Channels (resnet50) |
|---|---|---|
| layer4 | 7×7 | 2048 |
| layer3 | 14×14 | 1024 |
| layer2 | 28×28 | 512 |
| layer1 | 56×56 | 256 |

Config: layers: ["layer4", "layer3", "layer2", "layer1"]

DenseNet (densenet121, densenet161)

| Layer | Spatial size | Channels (densenet121) |
|---|---|---|
| features.denseblock4 | 7×7 | 1024 |
| features.denseblock3 | 14×14 | 1024 |
| features.denseblock2 | 28×28 | 512 |
| features.denseblock1 | 56×56 | 256 |

Config: layers: ["features.denseblock4", "features.denseblock3", "features.denseblock2", "features.denseblock1"]

Note: features.transition1/2/3 downsample between denseblocks. Hooking the denseblocks captures each stage's full concatenated output before the following transition layer reduces its channel count and resolution.

VGG16

| Layer | Spatial size | Description |
|---|---|---|
| features.30 | 7×7 | After pool5 |
| features.23 | 14×14 | After pool4 |
| features.16 | 28×28 | After pool3 |
| features.9 | 56×56 | After pool2 |

Config: layers: ["features.30", "features.23", "features.16", "features.9"]

VGG19

| Layer | Spatial size | Description |
|---|---|---|
| features.36 | 7×7 | After pool5 |
| features.27 | 14×14 | After pool4 |
| features.18 | 28×28 | After pool3 |
| features.9 | 56×56 | After pool2 |

Config: layers: ["features.36", "features.27", "features.18", "features.9"]

Note: VGG19 has more conv layers per stage than VGG16, so the MaxPool indices differ.
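The index shift can be reproduced by counting modules. The sketch below assumes the usual VGG layout without batch norm, where each convolution contributes two entries to `features` (conv, then ReLU) and each pooling step contributes one. The configuration lists are the standard VGG16/VGG19 layer specs; `maxpool_indices` is a hypothetical helper, not a timm function.

```python
# Numbers are conv output channels; "M" marks a max-pool layer.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]


def maxpool_indices(cfg, modules_per_conv=2):
    """Return the positions of the pooling layers inside the features sequence."""
    indices = []
    position = 0
    for entry in cfg:
        if entry == "M":
            indices.append(position)
            position += 1
        else:
            # Each conv adds a conv module plus its activation.
            position += modules_per_conv
    return indices


print(maxpool_indices(VGG16_CFG))  # [4, 9, 16, 23, 30]
print(maxpool_indices(VGG19_CFG))  # [4, 9, 18, 27, 36]
```

The last four indices of each result match the layer names in the two tables; the first pooling layer (index 4, 112×112) is not used.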

EfficientNet (efficientnet_b0, b1, b2, b3)

| Layer | Spatial size | Channels (b0) |
|---|---|---|
| blocks.6 | 7×7 | 320 |
| blocks.5 | 7×7 | 192 |
| blocks.3 | 14×14 | 80 |
| blocks.1 | 56×56 | 24 |

Config: layers: ["blocks.6", "blocks.5", "blocks.3", "blocks.1"]

Note: blocks.5 and blocks.6 both operate at 7×7, and blocks.3 and blocks.4 both operate at 14×14 (EfficientNet runs several MBConv stages at the same stride). We include blocks.5 to get 4 layers, though it shares its spatial size with blocks.6. If 3 distinct pyramid levels are preferred, use ["blocks.6", "blocks.3", "blocks.1"].
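Reducing a layer list to distinct pyramid levels can be sketched as follows. `distinct_levels` is a hypothetical helper; it assumes the list is already ordered deepest first and keeps the deepest layer at each spatial size.

```python
def distinct_levels(layers, sizes):
    """Keep the first (deepest) layer at each spatial size.

    layers: layer names ordered deepest first.
    sizes:  mapping from layer name to feature-map side length.
    """
    seen = set()
    kept = []
    for name in layers:
        if sizes[name] not in seen:
            seen.add(sizes[name])
            kept.append(name)
    return kept


# Side lengths from the EfficientNet table for a 224×224 input.
sizes = {"blocks.6": 7, "blocks.5": 7, "blocks.3": 14, "blocks.1": 56}
levels = distinct_levels(["blocks.6", "blocks.5", "blocks.3", "blocks.1"], sizes)
print(levels)  # ['blocks.6', 'blocks.3', 'blocks.1']
```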

ConvNeXt (convnext_tiny, small, base, large, large_dinov3)

| Layer | Spatial size | Channels (tiny) |
|---|---|---|
| stages.3 | 7×7 | 768 |
| stages.2 | 14×14 | 384 |
| stages.1 | 28×28 | 192 |
| stages.0 | 56×56 | 96 |

Config: layers: ["stages.3", "stages.2", "stages.1", "stages.0"]

MobileNetV3-Large (mobilenetv3_large_100)

| Layer | Spatial size | Channels |
|---|---|---|
| blocks.6 | 7×7 | 960 |
| blocks.5 | 7×7 | 160 |
| blocks.3 | 14×14 | 80 |
| blocks.1 | 56×56 | 24 |

Config: layers: ["blocks.6", "blocks.5", "blocks.3", "blocks.1"]

Note: blocks.5 and blocks.6 both output 7×7 feature maps (see the EfficientNet note above).

MobileNetV3-Small (mobilenetv3_small_100)

| Layer | Spatial size | Channels |
|---|---|---|
| blocks.5 | 7×7 | 576 |
| blocks.4 | 7×7 | 96 |
| blocks.2 | 14×14 | 40 |
| blocks.1 | 28×28 | 24 |

Config: layers: ["blocks.5", "blocks.4", "blocks.2", "blocks.1"]

Note: MobileNetV3-Small has only 6 blocks (0–5), whereas the Large variant has 7 (0–6). The shallowest layer used here is blocks.1 at 28×28, so this config has no 56×56 level.

RegNet (regnetx_002, regnetx_008, regnety_002, regnety_008)

| Layer | Spatial size | Channels (regnetx_002) |
|---|---|---|
| s4 | 7×7 | 368 |
| s3 | 14×14 | 152 |
| s2 | 28×28 | 56 |
| s1 | 56×56 | 24 |

Config: layers: ["s4", "s3", "s2", "s1"]

Adding a New Model

  1. Instantiate the model with timm.create_model(name, pretrained=False, num_classes=0)

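  2. Print top-level modules: [name for name, _ in model.named_children()]. For nested names such as features.denseblock4 or blocks.6, list model.named_modules() instead, since named_children() only returns direct children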

  3. Run the discovery script above to confirm spatial sizes

  4. Add to the model config in coarse-to-fine order (deepest first)

  5. Note any stages that share spatial size (common in EfficientNet/MobileNet)
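Steps 4 and 5 can be checked mechanically once the spatial sizes are known. The following is a rough sketch under the assumption that sizes were recorded as feature-map side lengths; `review_config` is a hypothetical helper and not part of the codebase.

```python
def review_config(layers, sizes):
    """Return warnings for a layer list that should run coarse to fine.

    layers: proposed config list, deepest layer first.
    sizes:  mapping from layer name to feature-map side length.
    """
    warnings = []
    for prev, cur in zip(layers, layers[1:]):
        if sizes[cur] < sizes[prev]:
            warnings.append(f"{cur} ({sizes[cur]}) is coarser than {prev} ({sizes[prev]})")
        elif sizes[cur] == sizes[prev]:
            warnings.append(f"{prev} and {cur} share spatial size {sizes[cur]}")
    return warnings


# Side lengths from the MobileNetV3-Small table for a 224×224 input.
small_sizes = {"blocks.5": 7, "blocks.4": 7, "blocks.2": 14, "blocks.1": 28}
small_warnings = review_config(["blocks.5", "blocks.4", "blocks.2", "blocks.1"], small_sizes)
print(small_warnings)  # ['blocks.5 and blocks.4 share spatial size 7']

# A list written shallowest first is flagged at every step.
resnet_sizes = {"layer1": 56, "layer2": 28}
reversed_warnings = review_config(["layer1", "layer2"], resnet_sizes)
print(reversed_warnings)  # ['layer2 (28) is coarser than layer1 (56)']
```

A shared-size warning is informational, since several configs on this page deliberately include two layers at 7×7.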