Results format ============== All evaluation runs append rows to a single CSV file (default ``results/all_results.csv``). Each row is a flattened :class:`~torchgeo_bench.main.EvaluationResult` describing a single ``(dataset, method, model, config)`` measurement. Sample rows ----------- .. code-block:: text dataset,method,metric_name,metric_value,ci_lower,ci_upper,feature_dim,best_c,n_train,n_val,n_test,seed,model,name,normalization,image_size,interpolation,partition,bands m-eurosat,knn5,accuracy,0.8234,0.8123,0.8345,512,,21600,5400,5400,0,torchgeo_bench.models.RCFBench,rcf,bandspec_zscore,224,bilinear,default,rgb m-eurosat,linear,accuracy,0.8567,0.8461,0.8673,512,0.1,21600,5400,5400,0,torchgeo_bench.models.RCFBench,rcf,bandspec_zscore,224,bilinear,default,rgb burn_scars,seg-fpn,mIoU,0.6234,0.0,0.0,768,,1000,200,300,0,torchgeo_bench.models.TimmPatchBenchModel,resnet50,bandspec_zscore,224,bilinear,default,rgb Datasets emit unnormalized tensors; each model wrapper normalises inside :meth:`~torchgeo_bench.models.BenchModel.normalize_inputs` according to the strategy selected by ``cfg.dataset.normalization``. Allowed values: .. list-table:: :header-rows: 1 :widths: 25 75 * - Strategy - Behaviour * - ``bandspec_zscore`` - Per-channel z-score using ``BandSpec`` mean/std (default). * - ``model_native`` - Convert to the wrapper's ``expected_input_unit``, then apply any ``pretrain_mean`` / ``pretrain_std`` declared on the class. * - ``minmax`` - Scale each channel to ``[0, 1]`` from BandSpec min/max. * - ``minmax_zscore`` - ``minmax`` then z-score against assumed ``mean=0.5, std=0.25``. * - ``identity`` - No rescaling (for models whose forward owns normalisation). Older snapshots may carry legacy values such as ``raw`` / ``mean_stdev`` / ``percentile_2_98`` — they are kept verbatim for resume safety. Method values ------------- ================== ================================================================================== ``method`` Meaning ================== ================================================================================== ``knn5`` KNN-5 classification (multilabel KNN for ``m-bigearthnet``). ``linear`` L-BFGS logistic regression with C-sweep on the validation set. ``intrinsic_dim`` Optional intrinsic-dimension metrics on extracted embeddings (requires the ``[id]`` extra and ``eval.intrinsic_dim.enabled=true``). ``seg-`` Segmentation probe with the configured head (``linear`` / ``conv_block`` / ``fpn`` / ``dpt``). ================== ================================================================================== CSV schema ---------- ==================== ============================================================ Column Description ==================== ============================================================ ``dataset`` Dataset CLI name (e.g. ``m-eurosat``). ``method`` ``knn5``, ``linear``, ``intrinsic_dim``, or ``seg-``. ``metric_name`` Primary metric (``accuracy``, ``micro_mAP``, ``mIoU``, or ``id__`` for intrinsic dim rows). ``metric_value`` Point estimate. ``ci_lower`` Bootstrap CI lower bound (0.0 when not applicable). ``ci_upper`` Bootstrap CI upper bound (0.0 when not applicable). ``feature_dim`` Embedding dimension produced by the backbone. ``best_c`` Best ``C`` from the logistic-regression sweep (linear probe only, otherwise ``None``). ``best_lr`` Best learning rate (segmentation only). ``best_batch_size`` Best batch size (segmentation only). ``n_train`` Train-split sample count. ``n_val`` Validation-split sample count. ``n_test`` Test-split sample count. ``seed`` RNG seed used for the run. ``model`` Fully-qualified model class (``cfg.model._target_``). ``name`` Human-readable model name (``cfg.model.name``). ``normalization`` Strategy applied by the model wrapper (see table above). ``image_size`` Input resize size (``None`` if no resizing). ``interpolation`` Resize interpolation mode. ``partition`` GeoBench V1 partition name (``default`` for V2). ``bands`` ``rgb`` / ``all`` / a sorted comma-joined list. ``c_range_start`` ``eval.c_range[0]``. ``c_range_stop`` ``eval.c_range[1]``. ``c_range_num`` ``eval.c_range[2]``. ``merge_val`` Whether ``train+val`` was merged before final logistic fit. ``bootstrap`` Number of bootstrap resamples used for CIs. ``fw_iou`` Frequency-weighted IoU (segmentation only). ``precision`` Macro precision (segmentation only). ``recall`` Macro recall (segmentation only). ``f1`` Macro F1 (segmentation only). ==================== ============================================================ Atomic appends -------------- Rows are appended via :func:`~torchgeo_bench.main.append_rows_atomic`, which uses ``fcntl`` advisory file locking. This makes it safe to point multiple parallel jobs (e.g. one per GPU or per dataset) at the same output file without corrupting it. Resume mode ----------- When ``resume=true``, the runner reads the existing CSV at startup and skips any combination that already has a matching row. The de-dup key is: .. code-block:: python (dataset, method, model._target_, model.name, normalization, image_size, interpolation, partition, bands) Note that ``method`` is per-method (``knn5`` / ``linear`` / ``intrinsic_dim`` / ``seg-``), so re-running with ``eval.skip_linear=false`` after a ``skip_linear=true`` run will fill in just the linear-probe rows.