AI · Geospatial Foundation Models

Four winners on GeoBench: Panopticon on KNN, DINOv3-SAT and OlmoEarth on linear, Terramind on multispectral

Across 14,000 measurements on 11 GeoBench classification datasets and 28 frozen-backbone variants, four distinct leaders emerge: Panopticon tops KNN-5 on most datasets, DINOv3-SAT remains the strongest RGB-only linear probe, OlmoEarth reaches 97.8% on eurosat-spatial and 97.6% on m-eurosat, and Terramind wins the multispectral datasets when all MSI bands are available.

By torchgeo-bench Published 19 May 2026 Source: results/all_results.csv 14409 of 14409 observations shown

Across — frozen-backbone variants evaluated on — GeoBench classification datasets, the strongest configuration in this snapshot reaches an accuracy of — on —, while the median model variant clusters around —. Use the controls below to filter the underlying observations; every figure on this page updates accordingly.

Each row of the underlying table records a single (dataset, method, model, normalization) experiment with bootstrapped 95% confidence intervals on accuracy. The four figures below explore the data from different angles — first a per-dataset leaderboard, then a flexible scatter view, a head-to-head comparison of KNN-5 and linear-probe accuracy, and finally a cross-dataset ranking that surfaces variants which generalise.

Customise the analysis

Snapshot

Dataset

Method

Model family

Normalization

Bands

Search by name

0 observations match. Best accuracy in selection: —

Figure 1

Per-dataset leaderboard

The strongest model variants on each GeoBench dataset, ranked by accuracy. Whiskers show the bootstrapped 95% confidence interval.

Method Show top

Source: torchgeo-bench results CSV

Figure 2

Two-axis explorer

Map any numeric column against any other. By default, embedding dimension is plotted against accuracy — points coloured by model family, faceted by dataset.

X-axis Y-axis Colour by Facet by

Source: torchgeo-bench results CSV

Figure 3

KNN-5 versus linear probe

For each (dataset, model) pair, the linear-probe accuracy plotted against the parametric-free KNN-5 baseline. Points above the diagonal are configurations where the linear probe extracts more signal than nearest-neighbour retrieval.

Source: torchgeo-bench results CSV

Figure 4

Mean accuracy across datasets

Variants are ranked by their mean accuracy across the currently selected datasets — the best generalisers within the filter.

Method Show top

Source: torchgeo-bench results CSV

Figure 5

Compute & efficiency

Accuracy alone hides cost. The left panel plots linear-probe accuracy against the backbone's measured throughput on an A100-80GB; up-and-right is the efficient frontier. The right panel extrapolates the $ and kgCO2 of running each model on one million samples on the selected cloud region.

Cloud / region Sort by Samples

Profile rows measured on NVIDIA A100-SXM4-80GB. Prices from cloud on-demand list (snapshot 2026-05-15); carbon intensity from codecarbon's regional grid table. Both inlined into the page.

Figure 6

Intrinsic dimension by dataset

Each frozen backbone produces a feature manifold that — for a given dataset — has an effective dimensionality much smaller than the embedding size. Higher bars mean the backbone uses more of its budget; the spread between estimators is a coarse confidence interval on the estimate.

Estimator

One observation per (model, dataset, estimator, bands). Bars show the cross-model mean; horizontal whiskers the [P10, P90] band.

Figure 7

Does intrinsic dimension predict probe accuracy?

For each (model, dataset, bands) tuple, the linear-probe accuracy is plotted against the intrinsic dimension of the backbone's frozen embeddings. Higher ID does not automatically buy you accuracy — some models produce richly-spread embeddings that aren't linearly separable.

Estimator

Each point is one (name, dataset, bands) trio. Linear-probe accuracy is the matching `method="linear"` row.

Appendix · the underlying data

Click any column to sort. Search the table or use the filters above to narrow the view. Numeric values are rounded for display.