Compute pairwise Earth-mover's distances (Wasserstein-1) between
two or more utilisation distributions on a common raster grid. The
default method uses Sinkhorn entropic regularisation (Cuturi
2013) rather than the original transportation-LP solver used by
move::emd(); this is typically hundreds of times faster at
sub-percent accuracy, and scales to UD stacks that are otherwise
prohibitive. An exact solver is available when accuracy matters
more than speed.
Usage
emd(
x,
method = c("sinkhorn", "exact"),
mask_quantile = 0.999,
reg = 0.05,
max_iter = 500L,
tol = 1e-09,
gc = FALSE,
threshold = NULL
)Arguments
- x
A multi-layer
terra::SpatRasterof UDs on a common grid, or a named list of single-layerSpatRasters with identical geometry. Each layer must sum to approximately 1 (enforced by renormalisation).- method
One of
"sinkhorn"(default) or"exact". Exact mode requires theemdistpackage (Suggests).- mask_quantile
Numeric in
(0, 1]. Cells outside this volume contour are dropped before transport. Default0.999; set to1to disable masking.- reg
Positive numeric. Sinkhorn entropic regularisation (applied to distance scaled to
[0, 1]). Default0.05, which in practice converges in tens of iterations on typical UDs and stays within \(\sim\)0.1 % of exact. Values below0.01approach exact but need many more iterations and risk numerical underflow; values above0.1are looser approximations but extremely fast.- max_iter
Integer. Sinkhorn iteration cap. Default 500.
- tol
Convergence tolerance (max absolute change in the scaling vector between iterations). Default
1e-9.- gc
Logical. If
FALSE(default) Euclidean distances are used between cell centroids; ifTRUEgreat-circle distances via Haversine are used instead (requires thegeospherepackage).gc = TRUEis intended for utilisation distributions on unprojected (longitude/latitude) grids;emd()warns when input is on a longlat CRS andgc = FALSE. Matches thegcargument of the legacymove::emd().- threshold
Optional numeric in the same units as the cost matrix (map units when
gc = FALSE, metres whengc = TRUE). When set, the cost matrix is clipped atthreshold: any pair of cells separated by more thanthresholdcontributes no more thanthresholdto the transport cost. This reproduces the EMD-hat variant (Pele & Werman 2009) thatmove::emd()exposes through the same argument name. DefaultNULL(no cutoff). For most usersmask_quantileis the preferred speed lever;thresholdis provided for parity with themove::emd()interface.
Value
A stats::dist object of pairwise distances between UDs,
with the UD layer names as dimnames. For a two-UD input it is a
length-1 dist — use as.numeric() to extract the scalar.
Details
What changed relative to move::emd()
A volume-based pre-mask discards cells outside the
mask_quantilevolume contour before building the cost matrix, reducing the transportation problem from \(10^4\) or more cells to typically a few hundred. On well-supported UDs this is lossless; on heavy-tailed UDs you can raisemask_quantileto include more tail.Sinkhorn replaces the simplex LP. Default
reg = 0.01on scaled distances converges in a few hundred iterations and gives approximations well within 1 % of exact for typical UDs.Convenient defaults: Sinkhorn + 0.999 mask + Euclidean ground metric on cell centroids reproduces the move-vignette workflow in one call.
Empirical speed
On the canonical move::emd example (dbbmmstack, two UDs on a
21x45 grid, pre-clipped to the 99.9999% volume contour) emd()
returns in 24 ms against 72 ms for move::emd()'s simplex LP, with
a Sinkhorn-vs-exact relative error of 0.12%. On a denser
30x30 field the LP takes ~140 s where Sinkhorn stays under 30 ms;
at 100x100 the simplex LP exceeds ten gigabytes of memory and is
impractical, while Sinkhorn remains sub-second.
References
Cuturi, M. (2013). Sinkhorn Distances: Lightspeed Computation of Optimal Transport. Advances in Neural Information Processing Systems, 26.
Rubner, Y., Tomasi, C., & Guibas, L. J. (2000). The Earth Mover's Distance as a Metric for Image Retrieval. International Journal of Computer Vision, 40(2), 99–121.
See also
ud_volume() for the volume transform used by the
pre-mask; move::emd() for the legacy simplex-LP implementation.
Examples
if (FALSE) { # \dontrun{
library(move2)
library(sf)
fishers <- mt_read(mt_example())
fishers <- fishers[!st_is_empty(fishers), ]
ids <- c("F1", "F2", "M1")
sub <- do.call(
rbind,
lapply(ids, function(i) fishers[mt_track_id(fishers) == i, ][1:300, ])
)
sub_p <- st_transform(sub, mt_aeqd_crs(sub))
stk <- mt_dbbmm_ud(sub_p, location_error = 25,
window_size = 31, margin = 11,
raster = 100, ext = 1.25)
d_sink <- emd(stk) # fast, default
d_sink
## Exact solver — requires the `emdist` package:
if (requireNamespace("emdist", quietly = TRUE)) {
d_exact <- emd(stk, method = "exact")
}
} # }