Flag outliers via Brownian bridge residuals
Source: R/mt_flag_outliers_bridge.R
Detects GPS outliers by scoring each fix against the Brownian bridge
interpolated from its temporal neighbours. A point sitting far from
the bridge mean relative to the bridge width is spatially implausible
given its neighbours and the local sampling cadence. Complements
mt_flag_outliers by catching block-shaped errors and
faint near-cluster displacements that movement-metric methods miss.
Arguments
- x
A move2 object in a projected CRS. Single- or multi-track. Multi-track inputs are processed per-individual.
- method
Character, one of "combined" (default), "isotropic", or "directional". The residual decomposition gives two correlated but non-redundant scores per fix:
  - \(\eta\): scalar residual / width ("isotropic"); catches strong omnidirectional outliers.
  - \(\eta_\perp\): orthogonal residual / width ("directional"); catches perpendicular-leverage outliers whose parallel component would otherwise dilute the scalar signal. Concentrates on multipath errors and spoofing clusters, which drift across-track rather than along-track.
"combined" applies threshold_type independently to each score and flags a fix if either score trips. It Pareto-dominates the single-score methods on synthetic ground-truth benchmarks and is the recommended default. "isotropic" and "directional" restrict flagging to a single score, for interpretation or for reproducing the individual primitives in isolation.
Note: earlier versions named these options "dBBMM" and "dBGB" in reference to the dynamic Brownian-bridge framework. The names were misleading: this method uses the bridge-mean construction and the temporal width factor \(\sqrt{\Delta t_1 \Delta t_2 / (\Delta t_1 + \Delta t_2)}\) but deliberately does not invoke the variance-estimation machinery, with its window and margin, that defines the real dBBMM / dBGB (see mt_dbbmm_variance and mt_dbgb_variance for those). Omitting the variance is intentional: a locally estimated variance is corrupted by the very outliers it would denominate (the leverage problem).
- threshold_type
Character, one of "entropy" (default) or "gap". "entropy" requires a real density valley between outliers and the bulk of \(-\log \eta\); it returns no outliers on clean tracks. "gap" uses the broken-stick and tail-decay inflection detector; it is more sensitive but can over-flag unimodal tails.
- threshold
Numeric. Detection strictness. Its meaning depends on threshold_type: "entropy" uses it as the maximum valley-to-peak density ratio (default 0.3); "gap" uses it as the break-size multiplier (default 3). If NULL, the default for the chosen threshold_type (0.3 or 3) is used.
- location_error
Per-fix observation-error prior, in metres (1-sigma horizontal). Default NULL (disabled, current behaviour). Accepts:
  - NULL: no observation-error injection.
  - a positive numeric scalar: uniform sigma applied to every fix. Useful when the device has a quoted nominal accuracy but no per-fix quality column.
  - a numeric vector of length nrow(x): per-fix sigma already in metres.
  - a single character string: name of a column in x containing per-fix sigma in metres (e.g. "eobs_horizontal_accuracy_estimate").
  - the literal "auto": probes eobs_horizontal_accuracy_estimate first, then argos_lc (mapped via the standard CLS / Vincent et al. 2002 location-class table). Falls back to no injection (with a message) when neither column is present.
When supplied, the bridge denominator at each fix is augmented with the variance contribution from its anchors only; the target fix's own sigma deliberately does not enter, preserving leverage immunity. The anchor variance is converted into bridge-width-equivalent units via an empirical residual-scale estimator \(\hat S = \mathrm{median}(r_i^2 / w_i^2)\) computed on the active mask in iteration 1. See the diagnostic column bridge_obs_inflation for how strongly each fix's denominator was inflated. The mechanism is most useful for Argos / mixed-mode tracks where per-fix accuracy varies by orders of magnitude; on uniform-quality GPS tracks it is typically a small correction.
- residual_floor
Numeric, non-negative. Minimum absolute bridge residual (metres) for a rate-flagged fix to actually be flagged as an outlier. Default 0 (disabled), which keeps the pre-2026 rate-only behaviour. Set to a positive value (commonly 5–25 m, or your device's nominal accuracy) to add a two-axis criterion: a fix is flagged only where the rate score trips the threshold and the absolute residual also exceeds this floor. Recommended for real-world data with burst-mode sampling, where a few metres of GPS jitter over a 1-second gap produces a large \(\eta\) that is not a physical outlier. Not on by default, so that existing synthetic-ground-truth benchmarks (which can include sub-noise displacement outliers by construction) continue to pass.
- iterations
Integer. Maximum number of refinement passes. The default of 3 is heuristic: empirically, the bulk of the refinement gain happens in the first 2–3 iterations on the synthetic CPF benchmark. Iteration stops early when no new points are flagged. Plausible range: 2–5. Higher values increase runtime without meaningful F1 gain on the benchmark cohort.
- dedup_neighbours
Logical. If TRUE (default), remove neighbour-smearing by keeping only the peak \(\eta\) in each consecutive run of flagged indices.
- pool_by
Optional character vector of length 1 or 2 naming column(s) in mt_track_data(x).
  - Length 1: a single column used as both fit set and operating unit.
  - Length 2: c(outer, inner), where outer names the fit-source column (the union of its events supplies the entropy / gap break distribution on bridge_eta and, for "directional" / "combined", bridge_eta_perp) and inner names the operating unit (within which pool-added flags are unioned). Length 2 requires strict nesting: every distinct inner value must map to exactly one outer value.
  - Length \(> 2\) is rejected (pool_by has exactly two semantic roles).
Pool flags union into is_outlier: they are additive and never un-flag what per-track iteration caught. The residual_floor gate is respected at the pool level (matching the per-track gate). There is no dedup at the pool level (per-track dedup already ran inside the iteration). NULL (default) preserves per-track behaviour byte-identically. See ?mt_clean_track for the orchestrator-level walkthrough.
- plot
Logical. If TRUE (default), produce a diagnostic plot (sorted \(\log \eta\) with break + map of flags).
- remove
Logical. If TRUE, return the object with flagged rows removed. Default FALSE.
- silent
Logical. If FALSE (default), the function prints a brief running narration: per-iteration break and flag counts, input-hygiene notes, projection messages, and a final summary line. Set TRUE to suppress. Errors and warnings are always shown.
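The two-axis criterion described for residual_floor can be illustrated with a small base-R sketch. The numbers, the rate cut-off eta_cut, and the variable names are hypothetical; the function derives its own threshold from the data, so this only shows why a burst-mode fix with a large \(\eta\) can still be rejected by the floor.

```r
# Hypothetical residuals (metres) and gaps (seconds) for three fixes:
# 1) burst-mode jitter, 2) ordinary fix, 3) genuine displacement outlier.
residual <- c(3, 3, 400)
dt1 <- c(1, 600, 600)   # gap to previous fix
dt2 <- c(1, 600, 600)   # gap to next fix

# Bridge width and rate score as defined in this help page
width <- sqrt(dt1 * dt2 / (dt1 + dt2))
eta   <- residual / width

eta_cut        <- 3    # assumed rate threshold, for illustration only
residual_floor <- 10   # metres

rate_flag <- eta > eta_cut                        # rate-only behaviour
flag      <- rate_flag & residual > residual_floor  # two-axis criterion

print(data.frame(residual, width, eta, rate_flag, flag))
```

With these values the 3 m jitter over a 1-second gap trips the rate score but stays below the floor, so only the third fix remains flagged.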
Value
A move2 object with columns added:
- bridge_residual
Euclidean deviation from the bridge mean, in metres.
- bridge_width
The bridge-width factor \(\sqrt{\Delta t_1 \Delta t_2 / (\Delta t_1 + \Delta t_2)}\), in \(\sqrt{s}\).
- bridge_eta
The normalised bridge score used for flagging. For method = "isotropic" this is \(\eta_i = r_i / w_i\). For method = "directional" it is the orthogonal component \(\eta_{\perp,i} = r_{\perp,i} / w_i\).
- bridge_eta_para, bridge_eta_perp
Gap-normalised parallel and orthogonal residuals. Useful for inspecting whether a flagged point is primarily an along-track or perpendicular anomaly.
- bridge_obs_inflation
Ratio of effective to geometric bridge width \(w_{\text{eff}}/w\). Equal to 1 when location_error is NULL or all anchors have unknown sigma; \(>1\) where anchor observation error contributed meaningfully.
- bridge_percentile
Empirical percentile of \(-\log \eta\) (0 = most extreme outlier).
- bridge_iteration
Integer iteration at which the point was flagged, or NA if unflagged.
- is_outlier
Logical flag.
- is_na_prob
TRUE where \(\eta\) is undefined (track endpoints, missing neighbours).
If remove = TRUE, the flagged rows are dropped.
Details
For each fix \(i\) with temporal neighbours \(i-1\) and \(i+1\):
The Brownian-bridge mean at \(i\) is the time-weighted midpoint of the neighbours' coordinates.
The bridge width is \(\sqrt{\Delta t_1 \Delta t_2 / (\Delta t_1 + \Delta t_2)}\) where \(\Delta t_1, \Delta t_2\) are the gaps to the previous and next fix.
The score \(\eta_i\) is the Euclidean residual divided by the bridge width. It is gap-aware by construction and does not depend on a local variance estimate (which would be inflated by the outlier itself – the leverage problem).
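The three steps above can be sketched in base R. This is a minimal illustration, not the package's implementation: bridge_scores is a hypothetical helper, coordinates are assumed to be projected metres and times seconds, and the parallel/orthogonal split is assumed to be taken relative to the chord joining the two neighbours (the help page does not state the reference direction).

```r
# Hypothetical helper: bridge residual, width and scores for a single track.
# x, y: projected coordinates (metres); t: strictly increasing times (seconds).
bridge_scores <- function(x, y, t) {
  n <- length(t)
  stopifnot(length(x) == n, length(y) == n, n >= 3, all(diff(t) > 0))
  i   <- 2:(n - 1)                 # interior fixes; endpoints have no bridge
  dt1 <- t[i] - t[i - 1]           # gap to previous fix
  dt2 <- t[i + 1] - t[i]           # gap to next fix
  a   <- dt1 / (dt1 + dt2)         # time fraction between the neighbours

  # Bridge mean: time-weighted interpolation of the neighbours' coordinates
  rx <- x[i] - (x[i - 1] + a * (x[i + 1] - x[i - 1]))
  ry <- y[i] - (y[i - 1] + a * (y[i + 1] - y[i - 1]))
  r  <- sqrt(rx^2 + ry^2)          # Euclidean residual
  w  <- sqrt(dt1 * dt2 / (dt1 + dt2))  # bridge width

  # Assumed decomposition: project the residual on the neighbour chord
  cx <- x[i + 1] - x[i - 1]
  cy <- y[i + 1] - y[i - 1]
  cl <- sqrt(cx^2 + cy^2)
  cl[cl == 0] <- NA_real_          # direction undefined if neighbours coincide
  r_para <- abs(rx * cx + ry * cy) / cl
  r_perp <- abs(rx * cy - ry * cx) / cl

  pad <- function(v) c(NA_real_, v, NA_real_)  # NA at the track endpoints
  data.frame(residual = pad(r), width = pad(w), eta = pad(r / w),
             eta_para = pad(r_para / w), eta_perp = pad(r_perp / w))
}

# A straight track sampled every 10 s with one fix displaced 50 m sideways
s <- bridge_scores(x = c(0, 10, 20, 30, 40),
                   y = c(0, 0, 50, 0, 0),
                   t = c(0, 10, 20, 30, 40))
print(s)
```

In this toy track the displaced fix has the largest score, while its two neighbours also receive elevated scores because their bridges are anchored on the displaced fix.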
Outliers are identified by applying the threshold selected by
threshold_type to \(-\log \eta\): the density-valley criterion of
"entropy" by default, or, with "gap", the broken-stick plus
tail-decay inflection threshold (same algorithm as
mt_flag_outliers). Outliers sit in the lower tail of
\(-\log \eta\), equivalently the upper tail of \(\eta\).
Two post-processing steps clean the result:
Neighbour dedup. An outlier at fix \(i\) distorts the bridges centred on \(i-1\) and \(i+1\), inflating their \(\eta\) too. For each run of consecutively-flagged indices, only the one with the largest \(\eta\) is kept.
Iterative refinement. After flagging, residuals and thresholds are recomputed on the remaining points, and the process repeats until no new flags appear or iterations is reached. This catches outliers that were masked by nearby ones on the first pass.
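The neighbour-dedup step can be sketched with run-length encoding in base R. dedup_runs is a hypothetical helper written for this illustration, and the scores and the cut-off of 5 are made up; the package's own implementation may differ.

```r
# Hypothetical helper: within each run of consecutively flagged fixes,
# keep only the fix with the largest eta.
dedup_runs <- function(flag, eta) {
  stopifnot(length(flag) == length(eta), !anyNA(flag))
  runs   <- rle(flag)
  run_id <- rep(seq_along(runs$lengths), runs$lengths)  # run index per fix
  keep   <- logical(length(flag))
  for (id in unique(run_id[flag])) {       # visit flagged runs only
    idx <- which(run_id == id)
    keep[idx[which.max(eta[idx])]] <- TRUE # peak eta of this run
  }
  keep
}

# Made-up scores: one outlier at position 3 smears onto positions 2 and 4,
# and an isolated outlier sits at position 7.
eta  <- c(0.2, 11.2, 22.4, 11.2, 0.3, 0.1, 9.0, 0.2)
flag <- eta > 5                            # illustrative cut-off
kept <- dedup_runs(flag, eta)
print(which(kept))
```

Only the peak of the smeared run and the isolated flag survive, which is the behaviour described for dedup_neighbours = TRUE.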
The bridge-residual math is Euclidean and therefore requires metric
coordinates. If the input is in longitude/latitude the function
auto-projects to a local azimuthal-equidistant CRS (centred on the
track's centroid) for the computation, and returns the result in
your original CRS. bridge_residual is always in metres, and bridge_width
in \(\sqrt{s}\), regardless of the input CRS.
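As a rough illustration of what a local azimuthal-equidistant CRS centred on a track looks like, the sketch below builds a PROJ string in base R. local_aeqd is a hypothetical helper, and taking the plain mean of longitude and latitude as the centroid is an assumption that breaks down for tracks crossing the antimeridian; how the function computes its centroid internally is not stated here.

```r
# Hypothetical helper: PROJ string for an azimuthal-equidistant projection
# centred on the mean longitude/latitude of the fixes (a simple centroid).
local_aeqd <- function(lon, lat) {
  stopifnot(length(lon) == length(lat), length(lon) > 0)
  sprintf("+proj=aeqd +lat_0=%.6f +lon_0=%.6f +datum=WGS84 +units=m",
          mean(lat, na.rm = TRUE), mean(lon, na.rm = TRUE))
}

crs <- local_aeqd(lon = c(11.4, 11.5, 11.6), lat = c(47.4, 47.5, 47.6))
print(crs)
```

A string like this could be passed to sf::st_transform() to obtain metric coordinates, with the original CRS (from sf::st_crs()) used to transform back afterwards.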
Input preprocessing
The bridge detector assumes each track is a time-sorted sequence of finite-coordinate fixes with no duplicate timestamps. The recommended preprocessing chain on a raw Movebank download is:
x <- mt_filter_gps_quality(x)                                     # sat/DOP/hacc + empty geoms
x <- move2::mt_filter_unique(x, "first")                          # resolve duplicate times
x <- dplyr::arrange(x, move2::mt_track_id(x), move2::mt_time(x))  # time-sorted within each track
x <- mt_flag_outliers_bridge(x)                                   # auto-projects if lon/lat
Rows with empty geometries or missing timestamps that survive preprocessing are excluded from scoring (with an input-hygiene message) but retained in the output, so row parity with the caller's object is preserved. Duplicate or out-of-order timestamps are a hard error, because their editorial resolution (which of two coincident fixes to keep) is the user's decision, not this function's.
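Before calling the detector it can help to check for the two conditions that trigger the hard error. The following base-R sketch uses a hypothetical helper, check_track_times, on plain vectors of track identifiers and numeric timestamps; it is not part of the package and only mirrors the assumptions stated above.

```r
# Hypothetical helper: per-track report of duplicate and out-of-order times.
# track_id: one identifier per fix; time: numeric timestamps (e.g. seconds).
check_track_times <- function(track_id, time) {
  stopifnot(length(track_id) == length(time), is.numeric(time))
  by_track <- split(time, track_id)
  data.frame(
    track      = names(by_track),
    duplicated = unname(vapply(by_track, function(t) any(duplicated(t)),
                               logical(1))),
    unsorted   = unname(vapply(by_track, function(t) any(diff(t) < 0),
                               logical(1)))
  )
}

# Track "a" has a repeated timestamp; track "b" is out of order.
id     <- c("a", "a", "a", "a", "b", "b", "b")
time   <- c(0, 60, 60, 120, 0, 120, 60)
report <- check_track_times(id, time)
print(report)
```

Any track reported as duplicated or unsorted would need the preprocessing chain above (or a manual decision about which fix to keep) before scoring.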
References
Safi, K. (in preparation). Self-thresholding hierarchical outlier-detection for animal movement tracks. Companion paper to the move2utils R package. Preprint: bioRxiv (DOI forthcoming).
See also
mt_flag_outliers for movement-metric
detection; mt_combined_outliers for composing
detectors via majority vote.
Examples
if (FALSE) { # \dontrun{
  library(move2)
  library(move2utils)
  syn <- mt_read(system.file("extdata", "synthetic_tracks.csv.gz",
                             package = "move2utils"))
  ## project to a metric CRS before calling
  syn_m <- sf::st_transform(syn,
    "+proj=tmerc +lon_0=11.5 +lat_0=47.5 +ellps=WGS84 +units=m")
  result <- mt_flag_outliers_bridge(syn_m)
} # }