Skip to contents

Compute pairwise Earth-mover's distances (Wasserstein-1) between two or more utilisation distributions on a common raster grid. The default method uses Sinkhorn entropic regularisation (Cuturi 2013) rather than the original transportation-LP solver used by move::emd(); this is typically hundreds of times faster at sub-percent accuracy, and scales to UD stacks that are otherwise prohibitive. An exact solver is available when accuracy matters more than speed.

Usage

emd(
  x,
  method = c("sinkhorn", "exact"),
  mask_quantile = 0.999,
  reg = 0.05,
  max_iter = 500L,
  tol = 1e-09,
  gc = FALSE,
  threshold = NULL
)

Arguments

x

A multi-layer terra::SpatRaster of UDs on a common grid, or a named list of single-layer SpatRasters with identical geometry. Each layer must sum to approximately 1 (enforced by renormalisation).

method

One of "sinkhorn" (default) or "exact". Exact mode requires the emdist package (Suggests).

mask_quantile

Numeric in (0, 1]. Cells outside this volume contour are dropped before transport. Default 0.999; set to 1 to disable masking.

reg

Positive numeric. Sinkhorn entropic regularisation (applied to distance scaled to [0, 1]). Default 0.05, which in practice converges in tens of iterations on typical UDs and stays within \(\sim\)0.1 % of exact. Values below 0.01 approach exact but need many more iterations and risk numerical underflow; values above 0.1 are looser approximations but extremely fast.

max_iter

Integer. Sinkhorn iteration cap. Default 500.

tol

Convergence tolerance (max absolute change in the scaling vector between iterations). Default 1e-9.

gc

Logical. If FALSE (default) Euclidean distances are used between cell centroids; if TRUE great-circle distances via Haversine are used instead (requires the geosphere package). gc = TRUE is intended for utilisation distributions on unprojected (longitude/latitude) grids; emd() warns when input is on a longlat CRS and gc = FALSE. Matches the gc argument of the legacy move::emd().

threshold

Optional numeric in the same units as the cost matrix (map units when gc = FALSE, metres when gc = TRUE). When set, the cost matrix is clipped at threshold: any pair of cells separated by more than threshold contributes no more than threshold to the transport cost. This reproduces the EMD-hat variant (Pele & Werman 2009) that move::emd() exposes through the same argument name. Default NULL (no cutoff). For most users mask_quantile is the preferred speed lever; threshold is provided for parity with the move::emd() interface.

Value

A stats::dist object of pairwise distances between UDs, with the UD layer names as dimnames. For a two-UD input it is a length-1 dist — use as.numeric() to extract the scalar.

Details

What changed relative to move::emd()

  1. A volume-based pre-mask discards cells outside the mask_quantile volume contour before building the cost matrix, reducing the transportation problem from \(10^4\) or more cells to typically a few hundred. On well-supported UDs this is lossless; on heavy-tailed UDs you can raise mask_quantile to include more tail.

  2. Sinkhorn replaces the simplex LP. Default reg = 0.01 on scaled distances converges in a few hundred iterations and gives approximations well within 1 % of exact for typical UDs.

  3. Convenient defaults: Sinkhorn + 0.999 mask + Euclidean ground metric on cell centroids reproduces the move-vignette workflow in one call.

Empirical speed

On the canonical move::emd example (dbbmmstack, two UDs on a 21x45 grid, pre-clipped to the 99.9999% volume contour) emd() returns in 24 ms against 72 ms for move::emd()'s simplex LP, with a Sinkhorn-vs-exact relative error of 0.12%. On a denser 30x30 field the LP takes ~140 s where Sinkhorn stays under 30 ms; at 100x100 the simplex LP exceeds ten gigabytes of memory and is impractical, while Sinkhorn remains sub-second.

References

Cuturi, M. (2013). Sinkhorn Distances: Lightspeed Computation of Optimal Transport. Advances in Neural Information Processing Systems, 26.

Rubner, Y., Tomasi, C., & Guibas, L. J. (2000). The Earth Mover's Distance as a Metric for Image Retrieval. International Journal of Computer Vision, 40(2), 99–121.

See also

ud_volume() for the volume transform used by the pre-mask; move::emd() for the legacy simplex-LP implementation.

Examples

if (FALSE) { # \dontrun{
library(move2)
library(sf)
fishers <- mt_read(mt_example())
fishers <- fishers[!st_is_empty(fishers), ]

ids <- c("F1", "F2", "M1")
sub <- do.call(
  rbind,
  lapply(ids, function(i) fishers[mt_track_id(fishers) == i, ][1:300, ])
)
sub_p <- st_transform(sub, mt_aeqd_crs(sub))
stk <- mt_dbbmm_ud(sub_p, location_error = 25,
                    window_size = 31, margin = 11,
                    raster = 100, ext = 1.25)

d_sink <- emd(stk)                      # fast, default
d_sink

## Exact solver — requires the `emdist` package:
if (requireNamespace("emdist", quietly = TRUE)) {
  d_exact <- emd(stk, method = "exact")
}
} # }