Flag outliers via Brownian bridge residuals

Detects GPS outliers by scoring each fix against the Brownian bridge interpolated from its temporal neighbours. A point sitting far from the bridge mean relative to the bridge width is spatially implausible given its neighbours and the local sampling cadence. Complements mt_flag_outliers by catching block-shaped errors and faint near-cluster displacements that movement-metric methods miss.

Usage

mt_flag_outliers_bridge(
  x,
  method = c("combined", "isotropic", "directional"),
  threshold_type = c("entropy", "gap"),
  threshold = NULL,
  location_error = NULL,
  residual_floor = 0,
  iterations = 3,
  dedup_neighbours = TRUE,
  pool_by = NULL,
  plot = TRUE,
  remove = FALSE,
  silent = FALSE
)

Arguments

x

A move2 object, single- or multi-track, ideally in a projected (metric) CRS. Longitude/latitude input is auto-projected for the computation (see Details). Multi-track inputs are processed per individual.

method

Character, one of "combined" (default), "isotropic", or "directional". The residual decomposition gives two correlated but non-redundant scores per fix:

\(\eta\) — scalar residual / width ("isotropic"); catches strong-omnidirectional outliers.
\(\eta_\perp\) — orthogonal residual / width ("directional"); catches perpendicular-leverage outliers whose parallel component would otherwise dilute the scalar signal. Concentrates on multipath errors and spoofing clusters, which drift across-track rather than along-track.
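
The two scores above can be illustrated with a minimal base-R sketch. The helper decompose_residual is hypothetical and not part of the package, and it assumes that "parallel" means along the chord joining the previous and next fix; the package's internal reference direction may differ.

```r
# Hypothetical helper (not a move2utils function): split one bridge residual
# into along-track and across-track parts.
# Assumption: the along-track direction is the chord from prev to nxt.
decompose_residual <- function(prev, cur, nxt, dt1, dt2) {
  a <- dt1 / (dt1 + dt2)                  # fractional position of the fix in time
  bridge_mean <- prev + a * (nxt - prev)  # time-weighted interpolation
  res <- cur - bridge_mean                # residual vector (metres)
  chord <- nxt - prev
  u <- chord / sqrt(sum(chord^2))         # unit vector along the chord (NaN if neighbours coincide)
  para <- sum(res * u)                    # signed along-track component
  perp <- u[1] * res[2] - u[2] * res[1]   # signed across-track component (2-D cross product)
  w <- sqrt(dt1 * dt2 / (dt1 + dt2))      # bridge-width factor, sqrt(s)
  c(eta      = sqrt(sum(res^2)) / w,
    eta_para = abs(para) / w,
    eta_perp = abs(perp) / w)
}

# Neighbours 200 m apart on the x axis, 60 s gaps; the fix sits 30 m ahead of
# and 40 m beside the bridge mean.
s <- decompose_residual(prev = c(0, 0), cur = c(130, 40), nxt = c(200, 0),
                        dt1 = 60, dt2 = 60)
```

Because the two components are orthogonal, eta^2 equals eta_para^2 + eta_perp^2 in this sketch, which is why a large parallel component can mask a moderate perpendicular one in the scalar score.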

"combined" applies threshold_type independently to each score and flags a fix if either score trips. It Pareto-dominates the single-score methods on synthetic ground-truth benchmarks and is the recommended default. "isotropic" and "directional" restrict flagging to a single score, either for interpretation or to reproduce the individual primitives in isolation.

Note: earlier versions named these options "dBBMM" and "dBGB" in reference to the dynamic Brownian-bridge framework. The names were misleading — this method uses the bridge-mean construction and the temporal width factor \(\sqrt{\Delta t_1 \Delta t_2 / (\Delta t_1 + \Delta t_2)}\) but deliberately does not invoke the variance-estimation machinery with its window and margin that defines the real dBBMM / dBGB (see mt_dbbmm_variance and mt_dbgb_variance for those). Omitting the variance is intentional: a locally-estimated variance is corrupted by the very outliers it would denominate (the leverage problem).

threshold_type

Character, one of "entropy" (default) or "gap". "entropy" requires a real density valley between outliers and the bulk of \(-\log \eta\); it returns no outliers on clean tracks. "gap" uses the broken-stick and tail-decay inflection detector; it is more sensitive but can over-flag unimodal tails.

threshold

Numeric. Detection strictness. Its meaning depends on threshold_type: "entropy" uses it as the maximum valley-to-peak density ratio (default 0.3); "gap" uses it as the break-size multiplier (default 3). If NULL (the default), the type-specific default given here is used.

location_error

Per-fix observation-error prior, in metres (1-sigma horizontal). Default NULL (disabled, current behaviour). Accepts:

NULL – no obs-error injection.
a positive numeric scalar – uniform sigma applied to every fix. Useful when the device has a quoted nominal accuracy but no per-fix quality column.
a numeric vector of length nrow(x) – per-fix sigma already in metres.
a single character string – name of a column in x containing per-fix sigma in metres (e.g. "eobs_horizontal_accuracy_estimate").
the literal "auto" – probes eobs_horizontal_accuracy_estimate first, then argos_lc (mapped via the standard CLS / Vincent et al. 2002 location-class table). Falls back to no injection (with a message) when neither column is present.

When supplied, the bridge denominator at each fix is augmented with the variance contribution from its anchors only – the target fix's own sigma deliberately does not enter, preserving leverage immunity. The anchor variance is converted into bridge-width-equivalent units via an empirical residual-scale estimator \(\hat S = \mathrm{median}(r_i^2 / w_i^2)\) computed on the active mask in iteration 1. See the diagnostic column bridge_obs_inflation for how strongly each fix's denominator was inflated. The mechanism is most useful for Argos / mixed-mode tracks where per-fix accuracy varies by orders of magnitude; on uniform-quality GPS tracks it is typically a small correction.

residual_floor

Numeric, non-negative. Minimum absolute bridge residual (metres) for a rate-flagged fix to actually be flagged as an outlier. Default 0 (disabled), which reproduces the pre-2026 rate-only behaviour. Set to a positive value (commonly 5–25 m, or your device's nominal accuracy) to add a two-axis criterion: a fix is flagged only where the rate score trips the threshold and the absolute residual exceeds this floor. Recommended for real-world data with burst-mode sampling, where a few metres of GPS jitter over a 1-second dt produces a large \(\eta\) that is not a physical outlier. The floor is off by default so that existing synthetic-ground-truth benchmarks (which can include sub-noise displacement outliers by construction) continue to pass.
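
A minimal sketch of the two-axis criterion on a burst-mode example. The values of eta_threshold and residual_floor are hypothetical and chosen only for illustration; the function derives its own break from the data.

```r
# Burst-mode sampling: 1 s gaps on both sides, so the bridge-width factor is small.
width    <- sqrt(1 * 1 / (1 + 1))   # sqrt(s)
residual <- c(3, 40)                # metres: GPS jitter vs. a real displacement
eta      <- residual / width        # both scores are large because width is small

eta_threshold  <- 2                 # hypothetical break, for illustration only
residual_floor <- 10                # metres, e.g. the device's nominal accuracy

rate_flag  <- eta > eta_threshold                 # rate-only criterion
is_outlier <- rate_flag & residual > residual_floor  # rate AND absolute residual
```

Here both fixes trip the rate criterion, but only the 40 m residual survives the floor.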

iterations

Integer. Maximum number of refinement passes. Default 3 is HEURISTIC – empirically the bulk of refinement gain happens in the first 2–3 iterations on the synthetic CPF benchmark. Iteration stops early when no new points are flagged. Plausible range: 2–5. Higher values increase runtime without meaningful F1 gain on the benchmark cohort.

dedup_neighbours

Logical. If TRUE (default), remove neighbour-smearing by keeping only the peak \(\eta\) in each consecutive run of flagged indices.

pool_by

Optional character vector of length 1 or 2 naming column(s) in mt_track_data(x). Length 1: single column used as both fit set and operating unit. Length 2: c(outer, inner) where outer names the fit-source column (the union of its events supplies the entropy / gap break distribution on bridge_eta and, for "directional"/"combined", bridge_eta_perp) and inner names the operating unit (within which pool-added flags are unioned). Length 2 requires strict nesting: every distinct inner value must map to exactly one outer value. Length \(> 2\) is rejected (pool_by has exactly two semantic roles). Pool flags union into is_outlier: they are additive and never un-flag what per-track iteration caught. The residual_floor gate is respected at the pool level (matches the per-track gate). No dedup at the pool level (per-track dedup already ran inside the iteration). NULL (default) preserves per-track behaviour byte-identically. See ?mt_clean_track for the orchestrator-level walkthrough.
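
The strict-nesting requirement for a length-2 pool_by can be checked before calling. The sketch below uses a hypothetical track-level table with illustrative column names (colony, animal) and a hypothetical helper; it is not the package's own validation code.

```r
# Hypothetical track-level table; the column names are illustrative only.
track_data <- data.frame(
  track  = c("a1", "a2", "b1", "b2"),
  animal = c("A", "A", "B", "B"),
  colony = c("north", "north", "north", "north")
)

# Hypothetical helper: TRUE when every distinct inner value maps to exactly
# one outer value.
is_strictly_nested <- function(outer, inner) {
  n_outer_per_inner <- tapply(outer, inner, function(v) length(unique(v)))
  all(n_outer_per_inner == 1)
}

nested_ok  <- is_strictly_nested(track_data$colony, track_data$animal)
# Counter-example: animal "A" appears in two colonies, so nesting fails.
nested_bad <- is_strictly_nested(c("north", "south", "north", "north"),
                                 track_data$animal)
```

With such a table, pool_by = c("colony", "animal") would name colony as the fit source and animal as the operating unit.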

plot

Logical. If TRUE (default), produce a diagnostic plot (sorted \(\log \eta\) with break + map of flags).

remove

Logical. If TRUE, return the object with flagged rows removed. Default FALSE.

silent

Logical. If FALSE (default) the function prints a brief running narration: per-iteration break and flag counts, input-hygiene notes, projection messages, and a final summary line. Set TRUE to suppress. Errors and warnings are always shown.

Value

A move2 object with columns added:

bridge_residual: Euclidean deviation from the bridge mean, in metres.
bridge_width: The bridge-width factor \(\sqrt{\Delta t_1 \Delta t_2 / (\Delta t_1 + \Delta t_2)}\), in \(\sqrt{s}\).
bridge_eta: The normalised bridge score used for flagging. For method = "isotropic" this is \(\eta_i = r_i / w_i\). For method = "directional" it is the orthogonal component \(\eta_{\perp,i} = r_{\perp,i} / w_i\).
bridge_eta_para, bridge_eta_perp: Gap-normalised parallel and orthogonal residuals. Useful for inspecting whether a flagged point is primarily an along-track or perpendicular anomaly.
bridge_obs_inflation: Ratio of effective to geometric bridge width \(w_{\text{eff}}/w\). Equal to 1 when location_error is NULL or all anchors have unknown sigma; \(>1\) where anchor obs-error contributed meaningfully.
bridge_percentile: Empirical percentile of \(-\log \eta\) (0 = most extreme outlier).
bridge_iteration: Integer iteration at which the point was flagged, or NA if unflagged.
is_outlier: Logical flag.
is_na_prob: TRUE where \(\eta\) is undefined (track endpoints, missing neighbours).

If remove = TRUE, the flagged rows are dropped.

Details

For each fix \(i\) with temporal neighbours \(i-1\) and \(i+1\):

The Brownian-bridge mean at \(i\) is the time-weighted interpolation between the neighbours' coordinates (the midpoint when both time gaps are equal).
The bridge width is \(\sqrt{\Delta t_1 \Delta t_2 / (\Delta t_1 + \Delta t_2)}\) where \(\Delta t_1, \Delta t_2\) are the gaps to the previous and next fix.
The score \(\eta_i\) is the Euclidean residual divided by the bridge width. It is gap-aware by construction and does not depend on a local variance estimate (which would be inflated by the outlier itself – the leverage problem).
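
The three steps above can be sketched in base R. The helper bridge_score is hypothetical and assumes projected coordinates in metres and timestamps as numeric seconds; it is an illustration of the construction, not the package's implementation.

```r
# Hypothetical helper (not a move2utils function): bridge residual, width and
# score for every interior fix of one time-sorted track.
bridge_score <- function(x, y, t) {
  n <- length(t)
  stopifnot(n >= 3, length(x) == n, length(y) == n)
  residual <- width <- rep(NA_real_, n)   # endpoints stay NA (no bridge)
  i <- 2:(n - 1)
  dt1 <- t[i] - t[i - 1]                  # gap to previous fix (s)
  dt2 <- t[i + 1] - t[i]                  # gap to next fix (s)
  a <- dt1 / (dt1 + dt2)                  # fractional position in time
  mean_x <- x[i - 1] + a * (x[i + 1] - x[i - 1])
  mean_y <- y[i - 1] + a * (y[i + 1] - y[i - 1])
  residual[i] <- sqrt((x[i] - mean_x)^2 + (y[i] - mean_y)^2)
  width[i] <- sqrt(dt1 * dt2 / (dt1 + dt2))
  data.frame(residual = residual, width = width, eta = residual / width)
}

# Five fixes at 60 s cadence moving east; fix 3 is displaced 50 m north.
x <- c(0, 10, 20, 30, 40)
y <- c(0, 0, 50, 0, 0)
t <- c(0, 60, 120, 180, 240)
s <- bridge_score(x, y, t)
```

In this example fix 3 has the largest score, and its two neighbours also receive elevated scores because their bridges are anchored on the displaced fix.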

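Outliers are identified by applying the selected threshold_type to \(-\log \eta\): "entropy" looks for a density valley separating outliers from the bulk, and "gap" applies the broken-stick plus tail-decay inflection threshold (same algorithm as mt_flag_outliers). Outliers sit in the lower tail of \(-\log \eta\), equivalently the upper tail of \(\eta\).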

Two post-processing steps clean the result:

Neighbour dedup. An outlier at fix \(i\) distorts the bridges centred on \(i-1\) and \(i+1\), inflating their \(\eta\) too. For each run of consecutively-flagged indices, only the one with the largest \(\eta\) is kept.
Iterative refinement. After flagging, residuals and thresholds are recomputed on the remaining points, and the process repeats until no new flags appear or iterations is reached. This catches outliers that were masked by nearby ones on the first pass.
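
The neighbour-dedup step can be sketched as follows. The helper dedup_runs is hypothetical, and the scores and the cut-off of 3 are made up for illustration; the sketch covers dedup only, not the iterative refinement.

```r
# Hypothetical helper (not a move2utils function): within each run of
# consecutively flagged indices, keep only the fix with the largest eta.
dedup_runs <- function(flagged, eta) {
  idx <- which(flagged)
  if (length(idx) == 0) return(flagged)
  run_id <- cumsum(c(1, diff(idx) != 1))  # new run wherever indices are not adjacent
  keep <- vapply(split(idx, run_id),
                 function(run) run[which.max(eta[run])],
                 integer(1))
  out <- rep(FALSE, length(flagged))
  out[keep] <- TRUE
  out
}

eta     <- c(0.2, 4.6, 9.1, 4.6, 0.3, 0.1, 7.0, 0.2)
flagged <- eta > 3                        # illustrative cut-off only
deduped <- dedup_runs(flagged, eta)
```

Fixes 2 to 4 form one run, of which only fix 3 (the peak) is kept; fix 7 is a run of length one and is kept unchanged.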

The bridge-residual math is Euclidean and therefore requires metric coordinates. If the input is in longitude/latitude, the function auto-projects to a local azimuthal-equidistant CRS (centred on the track's centroid) for the computation and returns the result in the original CRS. Regardless of the input CRS, bridge_residual is always in metres and bridge_width in \(\sqrt{s}\).
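
For readers who prefer to project explicitly, a comparable local CRS can be built by hand. The helper aeqd_crs is hypothetical, and it approximates the centroid by the mean of the longitude/latitude coordinates, which is an assumption and may differ from what the function does internally.

```r
# Hypothetical helper (not a move2utils function): PROJ string for an
# azimuthal-equidistant CRS centred on the mean longitude/latitude.
aeqd_crs <- function(lon, lat) {
  sprintf("+proj=aeqd +lat_0=%.6f +lon_0=%.6f +datum=WGS84 +units=m",
          mean(lat), mean(lon))
}

crs <- aeqd_crs(lon = c(11.4, 11.6), lat = c(47.4, 47.6))
# With sf installed, the string can be passed on, e.g. sf::st_transform(x, crs).
```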

Input preprocessing

The bridge detector assumes each track is a time-sorted sequence of finite-coordinate fixes with no duplicate timestamps. The recommended preprocessing chain on a raw Movebank download is:


x <- mt_filter_gps_quality(x)              # sat/DOP/hacc + empty geoms
x <- move2::mt_filter_unique(x, "first")   # resolve duplicate times
x <- dplyr::arrange(x, mt_time(x))         # ensure time-sorted
x <- mt_flag_outliers_bridge(x)            # auto-projects if lon/lat

Rows with empty geometries or missing timestamps that survive preprocessing are excluded from scoring (with an input-hygiene message) but retained in the output, so row parity with the caller's object is preserved. Duplicate or out-of-order timestamps are a hard error, because their editorial resolution (which of two coincident fixes to keep) is the user's decision, not this function's.
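
Before calling the detector, the timestamp conditions described above can be checked per track. The helper check_times is a hypothetical pre-flight sketch in base R, not part of the package.

```r
# Hypothetical helper (not a move2utils function): report missing, duplicated
# and out-of-order timestamps for one track.
check_times <- function(t) {
  ok <- !is.na(t)
  tt <- as.numeric(t[ok])                 # seconds since epoch
  list(
    n_missing    = sum(!ok),
    has_dupes    = anyDuplicated(tt) > 0,
    out_of_order = any(diff(tt) < 0)
  )
}

t <- as.POSIXct(c("2024-05-01 10:00:00", "2024-05-01 10:05:00",
                  "2024-05-01 10:05:00", "2024-05-01 10:02:00", NA),
                tz = "UTC")
chk <- check_times(t)
```

In this example the missing timestamp would only be excluded from scoring, whereas the duplicate and the backwards step would each trigger the hard error and need to be resolved first.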

References

Safi, K. (in preparation). Self-thresholding hierarchical outlier-detection for animal movement tracks. Companion paper to the move2utils R package. Preprint: bioRxiv (DOI forthcoming).

Examples

if (FALSE) { # \dontrun{
library(move2)
syn <- mt_read(system.file("extdata", "synthetic_tracks.csv.gz",
                            package = "move2utils"))
## optional: project to a metric CRS before calling (lon/lat input is auto-projected)
syn_m <- sf::st_transform(syn,
  "+proj=tmerc +lon_0=11.5 +lat_0=47.5 +ellps=WGS84 +units=m")
result <- mt_flag_outliers_bridge(syn_m)
} # }