Evaluation
The evaluation pipeline lives in torchgeo_bench.main and a few
focused sub-modules. Each evaluation method (KNN-5, linear probe,
segmentation, intrinsic dimension) consumes per-split feature embeddings
or raw images and produces one EvaluationResult row per metric.
Result schema
- class torchgeo_bench.main.EvaluationResult(dataset, method, metric_name, metric_value, ci_lower, ci_upper, feature_dim, best_c, best_lr, best_batch_size, n_train, n_val, n_test, seed, model, name, normalization, image_size, interpolation, partition, bands, c_range_start, c_range_stop, c_range_num, merge_val, bootstrap, fw_iou=None, precision=None, recall=None, f1=None, ece=None, rms_ce=None, mce=None, ece_ts=None, rms_ce_ts=None, mce_ts=None, temperature=None, calibration_n_bins=None)
Bases: object
Container for a single evaluation result row.
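The calibration fields in this schema (ece, rms_ce, mce, computed over calibration_n_bins bins) are conventionally binned gaps between confidence and accuracy. The following is a minimal sketch under the assumption of equal-width bins and the textbook definitions; the helper name calibration_errors is hypothetical and the library's own binning may differ.

```python
import math


def calibration_errors(confidences, correct, n_bins):
    """Return (ece, rms_ce, mce) using equal-width confidence bins on [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    n = len(confidences)
    ece = sq = mce = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        gap = abs(accuracy - avg_conf)
        weight = len(members) / n
        ece += weight * gap        # expected calibration error
        sq += weight * gap * gap   # accumulates the squared gap for RMS
        mce = max(mce, gap)        # maximum calibration error
    return ece, math.sqrt(sq), mce


# Illustrative inputs: top-class confidence and whether the prediction was right.
confidences = [1.0, 1.0, 0.8, 0.8, 0.6, 0.6, 0.6, 0.4]
correct = [1, 1, 1, 0, 1, 1, 0, 0]
ece, rms_ce, mce = calibration_errors(confidences, correct, n_bins=6)
```

With these definitions the three values are always ordered ece ≤ rms_ce ≤ mce, which makes a quick sanity check on stored rows.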
Feature extraction
Bootstrap helpers
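The ci_lower and ci_upper fields of a result row come from bootstrap resampling. As an illustration only, here is a percentile-bootstrap sketch for an accuracy-style metric; the function name bootstrap_ci, the percentile method, and the 95% level are assumptions, not a description of the library's helpers.

```python
import random


def bootstrap_ci(correct, n_bootstrap=1000, seed=0, alpha=0.05):
    """Percentile bootstrap CI for the mean of per-sample 0/1 scores."""
    rng = random.Random(seed)
    n = len(correct)
    means = []
    for _ in range(n_bootstrap):
        # Resample n items with replacement and score the resample.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_bootstrap)]
    upper = means[min(n_bootstrap - 1, int((1 - alpha / 2) * n_bootstrap))]
    return sum(correct) / n, lower, upper


# 80 correct predictions out of 100 test samples.
point, ci_lower, ci_upper = bootstrap_ci([1] * 80 + [0] * 20, n_bootstrap=500, seed=42)
```

Fixing the seed makes the interval reproducible, which matters when rows from different runs are compared.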
KNN-5 evaluation
- torchgeo_bench.main.evaluate_knn(x_train, y_train, x_test, y_test, seed, n_bootstrap, verbose=False, device='cpu', n_neighbors=5, calibration_n_bins=None)
Evaluate KNN classifier. Auto-detects single-label vs multi-label from y shape.
Returns the primary metric with bootstrap CI, a calibration dict (ece/rms_ce/mce) computed from predict_proba, and the n_bins actually used (defaults to n_neighbors + 1).
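To make the label-mode auto-detection and the predict_proba output concrete, here is a brute-force sketch of a k-nearest-neighbour vote in plain Python. It is not the FAISS-backed implementation: the function name knn_predict_proba, the uniform vote weighting, and the nested-list inputs are all assumptions made for illustration.

```python
from collections import Counter


def knn_predict_proba(x_train, y_train, x_query, n_neighbors=5):
    """Brute-force L2 KNN returning per-query vote fractions."""
    k = min(n_neighbors, len(x_train))  # never ask for more neighbours than exist
    multi_label = isinstance(y_train[0], (list, tuple))  # 2-D labels -> multi-label
    out = []
    for q in x_query:
        dists = [sum((a - b) ** 2 for a, b in zip(q, x)) for x in x_train]
        nearest = sorted(range(len(x_train)), key=dists.__getitem__)[:k]
        if multi_label:
            # One vote fraction per label column.
            n_labels = len(y_train[0])
            out.append([sum(y_train[i][j] for i in nearest) / k for j in range(n_labels)])
        else:
            # One vote fraction per class seen among the neighbours.
            counts = Counter(y_train[i] for i in nearest)
            out.append({label: c / k for label, c in counts.items()})
    return out


x_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
single = knn_predict_proba(x_train, [0, 0, 1], [[0.1, 0.1]])
multi = knn_predict_proba(x_train, [[1, 0], [1, 1], [0, 1]], [[0.1, 0.1]])
```

Under uniform voting every fraction is a multiple of 1/k, so there are at most k + 1 distinct confidence levels; this may be the motivation for the n_neighbors + 1 default bin count, though the source does not say so.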
- class torchgeo_bench.knn.KNNClassifier(n_neighbors=5, device='cpu', metric='l2', use_fp16=False)
Bases: object
FAISS-backed KNN classifier with single- and multi-label support.
Multi-label mode is auto-detected from the shape of y during fit(): 1-D labels → single-label, 2-D labels → multi-label.
- Parameters:
n_neighbors (int) – Number of neighbours (k). Clamped to min(k, n_train) on the CPU path; faissknn does not clamp internally.
device (str) – "cpu" (default) → faiss-cuda-cu128 CPU index. Anything else ("cuda", "cuda:0") requires the cuda extra (faissknn); raises ImportError if not installed.
metric (Literal['l2', 'ip', 'cosine']) – Distance metric: "l2" (default), "ip" (inner product), or "cosine" (cosine similarity; auto-normalizes inputs). GPU path only; CPU path always uses L2.
use_fp16 (bool) – Use fp16 for GPU index computation (~30% speedup on Ampere+). GPU path only; ignored on CPU.
Linear probing
- torchgeo_bench.main.evaluate_logistic(x_train, y_train, x_val, y_val, x_test, y_test, c_values, seed, n_bootstrap, merge_val, device, verbose=False, calibration_n_bins=15, temp_scale=True)
Sweep C values, retrain, and evaluate. Auto-detects single/multi-label from y shape.
Returns the primary metric with bootstrap CI, the selected C, a calibration dict from raw predict_proba on the test split, and a second dict with temperature-scaled calibration plus the fitted temperature (all None when temp_scale=False).
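Temperature scaling divides the logits by a single scalar T before the softmax and picks T to minimise the negative log-likelihood on held-out data. The sketch below assumes a simple grid search over T; the function names, the grid, and the choice of fitting split are hypothetical, and the library may use a different optimiser.

```python
import math


def nll(logits, labels, temperature):
    """Mean negative log-likelihood of softmax(logits / temperature)."""
    total = 0.0
    for row, label in zip(logits, labels):
        scaled = [z / temperature for z in row]
        m = max(scaled)  # subtract the max for a stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
        total += log_norm - scaled[label]
    return total / len(labels)


def fit_temperature(logits, labels, grid=None):
    """Pick the temperature on a log-spaced grid that minimises NLL."""
    if grid is None:
        grid = [10 ** (e / 20) for e in range(-20, 41)]  # 0.1 to 100
    return min(grid, key=lambda t: nll(logits, labels, t))


# Over-confident logits: large margins, but one of four predictions is wrong.
logits = [[8.0, 0.0], [8.0, 0.0], [0.0, 8.0], [8.0, 0.0]]
labels = [0, 0, 1, 1]
temperature = fit_temperature(logits, labels)
```

A fitted temperature above 1 softens over-confident probabilities without changing the predicted class, so the primary metric is unaffected while the calibration metrics can improve.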
- class torchgeo_bench.linear.LogisticRegression(C=1.0, max_iter=1000, lr=1.0, batch_size=1024, solver='lbfgs', tol=0.0001, patience=1, random_state=None, device=None, verbose=False, use_tf32=True, multi_label=False)
Bases: object
Logistic regression with identical objective scaling to sklearn.
Supports both single-label (softmax cross-entropy) and multi-label (sigmoid BCE) classification via the multi_label flag.
Objective:
loss = (1/n) * CrossEntropy + (1/n) * 0.5/C * ||W||^2
Differences from the previous version (speed-oriented but same math):
- LBFGS uses its internal iteration loop (one external .step).
- Adam uses on-device manual batching (no DataLoader overhead).
- Inference paths use torch.inference_mode.
- Optional TF32 for CUDA matmul (single linear layer still benefits slightly).
- Coefficients and intercept are exposed via properties (no copying at fit time).
Args match previous class unless noted.
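The objective stated for this class can be evaluated directly. The sketch below assumes CrossEntropy denotes the summed (not averaged) softmax cross-entropy over the n samples and that only W, not the intercept, is penalised; multiplying the result by n * C then gives the familiar sklearn form C * sum(CE) + 0.5 * ||W||^2. The function name objective and the nested-list inputs are hypothetical.

```python
import math


def objective(W, b, X, y, C):
    """(1/n) * summed softmax cross-entropy + (1/n) * 0.5 / C * ||W||^2."""
    n = len(X)
    ce_sum = 0.0
    for x, label in zip(X, y):
        # One logit per class: w_k . x + b_k
        logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b_k for w, b_k in zip(W, b)]
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        ce_sum += log_norm - logits[label]
    l2 = sum(w_j * w_j for w in W for w_j in w)  # the intercept b is not penalised
    return ce_sum / n + 0.5 / C * l2 / n


X = [[1.0, 2.0], [3.0, 4.0]]
y = [0, 1]
zero_loss = objective([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], X, y, C=1.0)
```

With all-zero weights the penalty vanishes and the loss equals log(n_classes), a convenient check that the scaling is right.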
- fit(X, y)
Fit the logistic regression model on training data.
- Parameters:
- Returns:
Self, for method chaining.
- Raises:
TypeError – If X or y is not a torch.Tensor.
ValueError – If shapes are invalid or data is empty.
- Return type:
- property coef_: ndarray
Return learned weight matrix as a NumPy array of shape
(n_classes, n_features).
Segmentation
- torchgeo_bench.main.evaluate_segmentation(model, train_loader, val_loader, test_loader, cfg, num_classes, device, collect_preds=False)
Evaluate segmentation performance using a frozen-backbone segmentation probe.
Trains a lightweight segmentation head on top of the frozen backbone and evaluates mIoU on the test split. Optionally pre-caches backbone features for faster training across epochs.
- Parameters:
model (Module) – Frozen backbone model.
train_loader (DataLoader) – Training DataLoader.
val_loader (DataLoader) – Validation DataLoader.
test_loader (DataLoader) – Test DataLoader.
cfg (DictConfig) – Full Hydra config.
num_classes (int) – Number of segmentation classes.
device (device) – Torch device.
collect_preds (bool) – If True, collect and return test predictions as (N, H, W) tensor.
- Returns:
Tuple of (metrics_dict, feature_dim, None, None, preds_or_None). preds_or_None is None when collect_preds is False.
- Return type:
tuple[dict[str, float], int, float | None, int | None, Tensor | None]
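The mIoU reported on the test split is the per-class intersection-over-union averaged across classes. A minimal sketch on flattened label lists follows; it assumes pixels whose target equals an ignore value (255 here, matching the ignore_index default seen elsewhere in this module) are skipped and that classes absent from both prediction and target are left out of the mean. The function name mean_iou is hypothetical and the library's metric may treat absent classes differently.

```python
def mean_iou(preds, targets, num_classes, ignore_index=255):
    """mIoU over classes that occur in the prediction or the target."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(preds, targets):
        if t == ignore_index:
            continue  # ignored pixels contribute to neither term
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Flattened 2 x 3 prediction and mask; the last pixel is unlabeled.
preds = [0, 0, 1, 1, 2, 2]
targets = [0, 1, 1, 1, 2, 255]
miou = mean_iou(preds, targets, num_classes=3)
```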
- class torchgeo_bench.segmentation_probe.SegmentationProbe(backbone, layer_names, num_classes, freeze_backbone=True, head_type='linear', hidden_dim=None)
Bases: Module
Multi-scale segmentation probe that hooks into backbone feature layers.
Backbone layers are tapped via forward hooks. Features are passed to a decoder head (LinearHead, ConvBlockHead, FPNHead, or DPTHead) that produces per-pixel class logits.
- Layer ordering convention (applies to all head types):
Coarse-to-fine: deepest / lowest-resolution layer first.
Example for ResNet: ["layer4", "layer3", "layer2", "layer1"].
For DPTHead this means index 0 = coarsest, which is also what the DPT cascade expects.
- Parameters:
backbone (Module) – Feature extractor. May be a raw backbone or a BenchModel wrapper (backbone.* prefixes are stripped automatically).
layer_names (list[str]) – Ordered list of layer names to hook (coarse-to-fine).
num_classes (int) – Number of segmentation output classes.
freeze_backbone (bool) – If True (default), backbone parameters are frozen and the backbone runs in eval mode during inference.
head_type (str) – Decoder architecture: one of "linear", "conv_block", "fpn", "dpt", "patch_linear".
hidden_dim (int | None) – Hidden channel dimension for conv_block, fpn, and dpt heads (default 256).
- extract_segmentation_features(dataloader, cache_dtype=torch.float16)
Run the frozen backbone once over dataloader and cache features.
- Parameters:
dataloader (DataLoader) – DataLoader that yields dict or (image, mask) batches.
cache_dtype (dtype) – Storage dtype for cached feature tensors. Use torch.float16 (default) to halve RAM, or torch.float32 for full precision.
- Returns:
A CachedFeaturesDataset with one entry per sample.
- Return type:
CachedFeaturesDataset
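The RAM saving from the float16 default is simple arithmetic: two bytes per element instead of four. The sketch below estimates the cache size for a hypothetical dataset; the sample count and the 768-channel 14 x 14 feature map are illustrative assumptions, not properties of any particular backbone.

```python
def cache_bytes(n_samples, channels, height, width, bytes_per_element):
    """Bytes needed to store one (C, H, W) feature map per sample."""
    return n_samples * channels * height * width * bytes_per_element


# Hypothetical: 10,000 samples, one 768-channel 14 x 14 feature map each.
fp16 = cache_bytes(10_000, 768, 14, 14, 2)  # torch.float16 uses 2 bytes per element
fp32 = cache_bytes(10_000, 768, 14, 14, 4)  # torch.float32 uses 4 bytes per element
print(f"float16: {fp16 / 1e9:.2f} GB, float32: {fp32 / 1e9:.2f} GB")
```

When several layers are hooked, the estimate is the sum over their feature maps, so multi-scale heads can make the cache considerably larger than this single-layer figure.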
- class torchgeo_bench.segmentation_task.SegmentationSolver(model, num_classes, lr=0.001, weight_decay=0.0, device='cuda', criterion=None, lr_scheduler='cosine', ignore_index=255)
Bases: object
A lightweight trainer for the SegmentationProbe.
- fit(train_loader, val_loader=None, epochs=10, verbose=True)
Train the segmentation probe.
- Parameters:
train_loader (DataLoader) – Training data loader.
val_loader (DataLoader | None) – Optional validation data loader for per-epoch mIoU logging.
epochs (int) – Number of training epochs.
verbose (bool) – Whether to show progress bars and epoch logs.
- Returns:
Val mIoU from the final epoch if val_loader is given, else None.
- Return type:
float | None
- evaluate(dataloader, collect_preds=False)
Evaluate the model on a dataloader and return segmentation metrics.
- Parameters:
dataloader (DataLoader) – Evaluation data loader.
collect_preds (bool) – If True, also return predicted class maps (N, H, W) int64.
- Returns:
Dict of metric name → value, or (metrics_dict, preds_tensor) when collect_preds=True.
- Return type:
- fit_cached(train_cache, val_cache=None, batch_size=64, epochs=10, verbose=True, gpu_train=None, gpu_val=None)
Train the segmentation head on pre-cached backbone features.
The backbone is not called during training: cached features are fed directly to self.model.head, which is the only component that runs a forward/backward pass.
The entire feature cache is pre-moved to the GPU as contiguous tensors (GPUTensorCache), eliminating per-batch CPU→GPU DMA transfers and torch.stack calls.
If gpu_train is provided, that pre-built cache is used directly, allowing callers (e.g. an HPO loop) to transfer the cache once and reuse it across many calls.
- Parameters:
train_cache (CachedFeaturesDataset) – Pre-extracted training features from SegmentationProbe.extract_segmentation_features().
val_cache (CachedFeaturesDataset | None) – Optional validation cache for per-epoch mIoU logging.
batch_size (int) – Batch size for iterating over cached data.
epochs (int) – Number of training epochs.
verbose (bool) – Whether to show progress bars and epoch logs.
gpu_train (GPUTensorCache | None) – Optional pre-built GPU cache for training. If provided, the GPU transfer is skipped.
gpu_val (GPUTensorCache | None) – Optional pre-built GPU cache for validation. Used only when gpu_train is also provided.
- Returns:
Val mIoU from the final epoch if val_cache is given, else None.
- Return type:
float | None
- evaluate_cached(cache, batch_size=64, collect_preds=False)
Evaluate on a CachedFeaturesDataset.
The cache is moved to GPU as a GPUTensorCache for zero per-batch host→device transfers.
- Parameters:
- Returns:
Dict of metric name → value, or (metrics_dict, preds_tensor) when collect_preds=True.
- Return type:
Intrinsic dimension
See torchgeo_bench.intrinsic_dim for the standalone module API; the orchestration
function lives in torchgeo_bench.main:
- torchgeo_bench.main.evaluate_intrinsic_dim(splits, estimators, selected_splits, device, max_samples, seed, common_meta, feature_dim, n_counts, verbose=False)
Compute intrinsic-dimension metrics over selected splits and return CSV rows.
Each (split, estimator) yields one row with method="intrinsic_dim" and metric_name=f"id_{estimator}_{split}".
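The row naming scheme can be shown with a small sketch that turns already-computed estimates into rows. The helper intrinsic_dim_rows, the estimator name "twonn", the metadata keys, and the numeric values are hypothetical and chosen only to illustrate the metric_name pattern.

```python
def intrinsic_dim_rows(estimates, common_meta):
    """Build one row per (split, estimator) from a {(split, estimator): value} mapping."""
    rows = []
    for (split, estimator), value in estimates.items():
        rows.append({
            **common_meta,  # metadata shared by every row of the run
            "method": "intrinsic_dim",
            "metric_name": f"id_{estimator}_{split}",
            "metric_value": value,
        })
    return rows


# Hypothetical estimator name and values, for illustration only.
estimates = {("train", "twonn"): 12.3, ("test", "twonn"): 11.8}
rows = intrinsic_dim_rows(estimates, {"dataset": "example", "seed": 0})
```

Encoding both the estimator and the split in metric_name keeps the rows in the same long-format table as the other evaluation methods, so they can be filtered on method alone.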
Result I/O
- torchgeo_bench.main.append_rows_atomic(path, rows)
Append rows to a CSV atomically, with advisory file lock and schema healing.
Behavior:
- Empty/missing file: writes the header derived from rows and the rows.
- Existing file whose header matches rows[0] keys exactly: appends rows without rewriting the header (fast path).
- Existing file with a different schema (e.g. EvaluationResult gained a field since the file was first written): the file is rewritten with the unioned schema so every value lives under a named column instead of being silently stuffed into an unnamed position.
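The three cases above can be sketched with the standard library alone. This is an illustration of the idea, not the library's code: the function name append_rows is hypothetical, the advisory file lock is omitted for brevity, and the atomic step is assumed to be a temp-file write followed by os.replace.

```python
import csv
import os
import tempfile


def append_rows(path, rows):
    """Append dict rows to a CSV, rewriting the file if the schema changed."""
    if not rows:
        return
    new_fields = list(rows[0].keys())
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        old_fields, old_rows = [], []
    else:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            old_fields = list(reader.fieldnames or [])
            # Fast path needs no existing rows; otherwise load them for the rewrite.
            old_rows = None if old_fields == new_fields else list(reader)
    if old_rows is None:
        with open(path, "a", newline="") as f:
            csv.DictWriter(f, fieldnames=new_fields).writerows(rows)
        return
    # Union schema: existing columns first, then any new ones.
    fields = old_fields + [k for k in new_fields if k not in old_fields]
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields, restval="")
        writer.writeheader()
        writer.writerows(old_rows)
        writer.writerows(rows)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem
```

Writing the temp file in the target directory keeps the final rename on one filesystem, which is what makes os.replace atomic; older rows simply get an empty value in any column added later.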