# Segmentation backbone layer reference

This page records how the values for `eval.segmentation.layers` (see [Configuration](configuration.rst)) were determined for each model family, and serves as a reference when adding new models. It complements the [`SegmentationProbe`](../api/models.rst) and the segmentation-head classes (`LinearHead`, `FPNHead`, `DPTHead`, …).

## Background

`SegmentationProbe` hooks into named PyTorch modules via `backbone.named_modules()`. The layer names in each model config must exactly match these module names. Layers should be listed **deepest first** (coarsest feature map first) for the FPN head.

## How Layer Names Were Discovered

Layer names were found by:

1. Instantiating each timm model with `pretrained=False, num_classes=0`
2. Running a dummy 224×224 input through the model with forward hooks on candidate layers
3. Recording the output spatial size of each layer

Discovery script:

```python
import timm, torch

model = timm.create_model("resnet50", pretrained=False, num_classes=0)
model.eval()
x = torch.zeros(1, 3, 224, 224)
shapes = {}
handles = []

for name in ["layer1", "layer2", "layer3", "layer4"]:
    module = dict(model.named_modules())[name]

    def hook(n):
        def fn(m, inp, out):
            shapes[n] = out.shape
        return fn

    handles.append(module.register_forward_hook(hook(name)))

with torch.no_grad():
    model(x)

for h in handles:
    h.remove()

print(shapes)
```

## Layer Names by Family

All spatial sizes are for a 224×224 input image.

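Because every size in the tables below is tied to a 224×224 input, it can help to think in terms of cumulative stride instead: 56×56, 28×28, 14×14, and 7×7 correspond to strides 4, 8, 16, and 32. The following is a minimal sketch of that conversion, not part of the project code. `stride_of` and `expected_size` are hypothetical helper names, and the arithmetic assumes the input side is an exact multiple of the stride (for other inputs the true size depends on each layer's padding and rounding).

```python
# Hypothetical helpers (not part of this project): convert between the
# feature-map sizes listed on this page and cumulative strides.
REFERENCE_INPUT = 224  # input side length used for every table on this page


def stride_of(size, input_size=REFERENCE_INPUT):
    """Cumulative stride implied by a feature-map side length."""
    if input_size % size != 0:
        raise ValueError(f"{input_size} is not a multiple of {size}")
    return input_size // size


def expected_size(stride, input_size):
    """Feature-map side length for a given stride and input side length."""
    if input_size % stride != 0:
        raise ValueError(f"{input_size} is not a multiple of {stride}")
    return input_size // stride


# Side lengths taken from the ResNet table on this page.
resnet_sizes = {"layer4": 7, "layer3": 14, "layer2": 28, "layer1": 56}
strides = {name: stride_of(size) for name, size in resnet_sizes.items()}
print(strides)

# Sizes the same layers would be expected to produce for a 512x512 input.
print({name: expected_size(s, 512) for name, s in strides.items()})
```

Treat the result as an estimate to confirm with the discovery script, since it does not inspect the model itself.
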
### ResNet (resnet18, resnet34, resnet50, resnet101)

| Layer | Spatial size | Channels (resnet50) |
|---|---|---|
| `layer4` | 7×7 | 2048 |
| `layer3` | 14×14 | 1024 |
| `layer2` | 28×28 | 512 |
| `layer1` | 56×56 | 256 |

Config: `layers: ["layer4", "layer3", "layer2", "layer1"]`

### DenseNet (densenet121, densenet161)

| Layer | Spatial size | Channels (densenet121) |
|---|---|---|
| `features.denseblock4` | 7×7 | 1024 |
| `features.denseblock3` | 14×14 | 1024 |
| `features.denseblock2` | 28×28 | 512 |
| `features.denseblock1` | 56×56 | 256 |

Config: `layers: ["features.denseblock4", "features.denseblock3", "features.denseblock2", "features.denseblock1"]`

Note: `features.transition1/2/3` downsample between denseblocks. Using the denseblocks (before downsampling) gives richer features.

### VGG16

| Layer | Spatial size | Description |
|---|---|---|
| `features.30` | 7×7 | After pool5 |
| `features.23` | 14×14 | After pool4 |
| `features.16` | 28×28 | After pool3 |
| `features.9` | 56×56 | After pool2 |

Config: `layers: ["features.30", "features.23", "features.16", "features.9"]`

### VGG19

| Layer | Spatial size | Description |
|---|---|---|
| `features.36` | 7×7 | After pool5 |
| `features.27` | 14×14 | After pool4 |
| `features.18` | 28×28 | After pool3 |
| `features.9` | 56×56 | After pool2 |

Config: `layers: ["features.36", "features.27", "features.18", "features.9"]`

Note: VGG19 has more conv layers per stage than VGG16, so the MaxPool indices differ.

### EfficientNet (efficientnet_b0, b1, b2, b3)

| Layer | Spatial size | Channels (b0) |
|---|---|---|
| `blocks.6` | 7×7 | 320 |
| `blocks.5` | 7×7 | 192 |
| `blocks.3` | 14×14 | 80 |
| `blocks.1` | 56×56 | 24 |

Config: `layers: ["blocks.6", "blocks.5", "blocks.3", "blocks.1"]`

**Note:** `blocks.5` and `blocks.6` both operate at 7×7 (EfficientNet uses multiple MBConv stages at the same stride). We include `blocks.5` to get 4 layers, though it shares spatial size with `blocks.6`.
If 3 distinct pyramid levels are preferred, use `["blocks.6", "blocks.3", "blocks.1"]`.

### ConvNeXt (convnext_tiny, small, base, large, large_dinov3)

| Layer | Spatial size | Channels (tiny) |
|---|---|---|
| `stages.3` | 7×7 | 768 |
| `stages.2` | 14×14 | 384 |
| `stages.1` | 28×28 | 192 |
| `stages.0` | 56×56 | 96 |

Config: `layers: ["stages.3", "stages.2", "stages.1", "stages.0"]`

### MobileNetV3-Large (mobilenetv3_large_100)

| Layer | Spatial size | Channels |
|---|---|---|
| `blocks.6` | 7×7 | 960 |
| `blocks.5` | 7×7 | 160 |
| `blocks.3` | 14×14 | 80 |
| `blocks.1` | 56×56 | 24 |

Config: `layers: ["blocks.6", "blocks.5", "blocks.3", "blocks.1"]`

**Note:** `blocks.5` and `blocks.6` share 7×7 (same as EfficientNet note above).

### MobileNetV3-Small (mobilenetv3_small_100)

| Layer | Spatial size | Channels |
|---|---|---|
| `blocks.5` | 7×7 | 576 |
| `blocks.4` | 7×7 | 96 |
| `blocks.2` | 14×14 | 40 |
| `blocks.1` | 28×28 | 24 |

Config: `layers: ["blocks.5", "blocks.4", "blocks.2", "blocks.1"]`

Note: MobileNetV3-Small only has 6 blocks (0–5), unlike the Large variant which has 7 (0–6). The shallowest useful stage is 28×28 (no 56×56 stage).

### RegNet (regnetx_002, regnetx_008, regnety_002, regnety_008)

| Layer | Spatial size | Channels (regnetx_002) |
|---|---|---|
| `s4` | 7×7 | 368 |
| `s3` | 14×14 | 152 |
| `s2` | 28×28 | 56 |
| `s1` | 56×56 | 24 |

Config: `layers: ["s4", "s3", "s2", "s1"]`

## Adding a New Model

1. Instantiate the model with `timm.create_model(name, pretrained=False, num_classes=0)`
2. Print top-level modules: `[name for name, _ in model.named_children()]`
3. Run the discovery script above to confirm spatial sizes
4. Add to the model config in coarse-to-fine order (deepest first)
5. Note any stages that share spatial size (common in EfficientNet/MobileNet)
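
Steps 4 and 5 can be checked mechanically once the discovery script has produced the spatial sizes. The sketch below is one possible way to do that, not something the project ships: `check_layer_order` is a hypothetical helper, and it judges "deepest first" purely by spatial size (coarsest first), not by the layer's actual position in the network.

```python
# Hypothetical helper (not part of this project): validate a candidate
# `layers` list against the spatial sizes recorded by the discovery script.
def check_layer_order(layers, sizes):
    """Return notes about adjacent layers that share a spatial size.

    layers: layer names in the order they would appear in the config.
    sizes:  mapping from layer name to its (height, width) feature-map size.
    Raises ValueError if a layer has a smaller map than the one before it,
    i.e. the list is not ordered coarsest first.
    """
    notes = []
    for prev, cur in zip(layers, layers[1:]):
        prev_hw, cur_hw = sizes[prev], sizes[cur]
        if prev_hw[0] > cur_hw[0] or prev_hw[1] > cur_hw[1]:
            raise ValueError(f"{prev} {prev_hw} is listed before {cur} {cur_hw}")
        if prev_hw == cur_hw:
            notes.append(
                f"{prev} and {cur} share spatial size {cur_hw[0]}x{cur_hw[1]}"
            )
    return notes


# Sizes taken from the EfficientNet table on this page.
efficientnet_sizes = {
    "blocks.6": (7, 7),
    "blocks.5": (7, 7),
    "blocks.3": (14, 14),
    "blocks.1": (56, 56),
}
notes = check_layer_order(
    ["blocks.6", "blocks.5", "blocks.3", "blocks.1"], efficientnet_sizes
)
print(notes)
```

Shared sizes are reported as notes and not as errors, because the EfficientNet and MobileNet configs on this page intentionally include two layers at 7×7.
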