Flag outliers via state-aware bridge residuals (dBGB envelope)
Source:R/mt_flag_outliers_dbgb.R
mt_flag_outliers_dbgb.RdState-aware bridge-residual primitive. Estimates per-axis
(parallel / orthogonal) motion variance via
mt_dbgb_variance and flags fixes by an envelope rule
across three Z channels: along-axis (\(Z_{\parallel}\)),
across-axis (\(Z_{\perp}\)), and the joint
\(Z_{\chi^2_2} = \sqrt{Z_{\parallel}^2 + Z_{\perp}^2}\). Under
the constraint \(\sigma^2_{\parallel} = \sigma^2_{\perp}\) (i.e.
isotropic motion variance) the dBGB model reduces to dBBMM and
\(Z_{\chi^2_2}\) reduces to the dBBMM scalar-Z; that submodel is
available via the variance = "dbbmm" option here, or
equivalently via mt_flag_outliers_dbbmm as a
speed-saving wrapper. The leverage-immune static bridge primitive
is mt_flag_outliers_bridge.
Usage
mt_flag_outliers_dbgb(
x,
location_error = 0,
z_threshold = NULL,
z_threshold_method = c("bonferroni", "bh_fdr", "gap"),
z_gap_threshold = NULL,
residual_max = NULL,
residual_entropy_threshold = NULL,
residual_gap_threshold = NULL,
residual_dip_alpha = 0.05,
window_size = NULL,
margin = NULL,
variance = c("dbgb", "dbbmm"),
pre_peel_v_max = NULL,
plot = TRUE,
remove = FALSE,
silent = FALSE
)Arguments
- x
A
move2object. Single- or multi-track. Auto- projected to a per-track local AEQD if input is lon/lat.- location_error
Numeric scalar or vector of length
nrow(x)giving the per-fix GPS noise floor in metres (1-sigma). Default0– no anchor noise floor injected unless the user supplies a calibrated value. Enters the denominator as2 * location_error^2, the variance contribution from imperfect knowledge of both bridge anchors; with the default, that term is zero and the denominator reduces tosqrt(sigma^2(t_i) * w_i^2). For e-obs / Argos-A class hardware,25is a defensible scalar; for other devices, supply a calibration estimate. See the package vocabulary note:location_erroris a user-supplied calibration prior, not a Movebank-exposed device attribute – those are handled separately as quality columns.- z_threshold
Numeric scalar or
NULL. When supplied, overrides only the chisq-channel threshold; per-axis thresholds stay parametric. Incompatible withz_threshold_method = "gap"(which is data-driven).- z_threshold_method
Character, one of
"bonferroni"(default),"bh_fdr", or"gap". Selects how the per-channel thresholds are derived; see Details for the alpha allocation across the three Z channels.- z_gap_threshold
Numeric multiplier controlling strictness of the broken-stick gap detector when
z_threshold_method = "gap".NULL(default) defers to.z_gap_threshold's leaf formal (3). Higher = more conservative (fewer flags). Plausible range 2–5.- residual_max
Numeric scalar,
Inf, orNULL. The absolute-residual axis: a fix is flagged if its bridge residual exceeds this magnitude regardless of \(Z\). Catches geometrically extreme residuals that dBBMM-Z would absorb into local sigma^2. DefaultNULLruns a "last-satisfactory- break" gap detector onlog(residual): it walks gaps from the extreme tail down toward the bulk and picks the last gap that exceeds the broken-stick null. This places the cap at the bulk-to-tail boundary, ensuring all residuals above the bulk are caught even when the outlier set itself is multi-scale (e.g. spoof clusters at multiple distance scales). Track-specific: stork bulk ~100 m gets cap ~ 1 km; swift bulk in km range gets cap scaled accordingly.Infdisables the axis.- residual_entropy_threshold, residual_gap_threshold
Numeric scalars or
NULL. Control the entropy / gap detectors inside theresidual_max = NULLauto path – the gate on|residual|in metres that catches geometrically extreme residuals.NULL(default) defers to the leaf formals. See audits/2026-05-25-parameter-propagation/findings.md §1.2.- residual_dip_alpha
Numeric in (0, 1). Significance level for the Hartigan dip test that validates the broken-stick fallback inside the
residual_maxauto path. Default0.05(Fisher convention).- window_size
Integer or
NULL. dBBMM sliding-window size.NULL(default) auto-selects from the data's motion autocorrelation: window =c* decorrelation lag of \(v^2\), withc = 4. Capped atmax(11, n / 8).- margin
Integer or
NULL. dBBMM margin.NULL(default) setsmargin = (window_size - 1) / 4, rounded to the nearest odd integer.- variance
Character, one of
"dbgb"(default, directional) or"dbbmm"(isotropic special case, faster but discards the directional decomposition). Under"dbbmm"only the chisq channel exists, and the envelope reduces to the legacy scalar Rayleigh-Z test.- pre_peel_v_max
Numeric scalar or
NULL. If supplied, runmt_peel_speedat this cap before sigma^2 estimation; this protects against the leverage problem when physiologically-impossible fixes are present in the data.NULL(default) skips the peel – standalone use of this primitive.- plot
Logical. Default TRUE; diagnostic plot.
- remove
Logical. If
TRUE, drop flagged rows from the returned object. DefaultFALSE.- silent
Logical. Default
FALSE.
Value
The input object with added columns:
bridge_residualEuclidean residual from bridge mean, in metres.
bridge_widthBridge width (sqrt-time scale).
sigma2_motion_para,sigma2_motion_orthPer-axis local \(\sigma^2\) from dBGB in m^2/min. Only emitted under
variance = "dbgb".sigma2_motionIsotropic local \(\sigma^2\) from dBBMM in m^2/min. Only emitted under
variance = "dbbmm".bridge_zThe chisq Z used as the principal flagging statistic (=
bridge_z_chisq). Kept for back-compat with downstream code that reads a singlebridge_zcolumn.bridge_z_chisqThe combined-axis Z, \(\sqrt{Z_{\parallel}^2 + Z_{\perp}^2}\). Rayleigh(1) under the bridge null. Equal to
bridge_zundervariance = "dbbmm".bridge_z_para,bridge_z_orthPer-axis half-normal Z-scores from the dBGB decomposition. Only emitted under
variance = "dbgb".bridge_z_classDiagnostic label classifying which channel(s) fired on flagged fixes (
"para","orth","isotropic","isotropic_para_dom"/"isotropic_orth_dom"sub-classes for chisq-only flags by axis dominance, multi-channel combinations,"residual"for the geometric-impossibility cap, or"none").bridge_iterationInteger iteration at which the fix was flagged (always 0 / 1 here – this primitive is single-pass; provided for API symmetry).
is_outlierLogical. TRUE where flagged.
Plus attributes z_thresholds (list with per-channel
thresholds), z_threshold (the chisq threshold, for back-
compat), z_threshold_method, residual_max,
window_size, location_error.
Details
Three Z channels. Under variance = "dbgb", the
bridge residual at each fix is decomposed onto the local travel
axis (\(r_{\parallel}\)) and its perpendicular
(\(r_{\perp}\)). Each component is normalised by its own
axis-specific bridge variance:
$$Z_{\parallel,i} = r_{\parallel,i} / \sqrt{\sigma^2_{\parallel}(t_i) \cdot w_i^2 + 2 \cdot \mathrm{location\_error}^2}$$
and analogously for \(Z_{\perp,i}\). The chisq channel is \(Z_{\chi^2_2,i} = \sqrt{Z_{\parallel,i}^2 + Z_{\perp,i}^2}\), which under the diagonal-Sigma bridge null is Rayleigh(1)- distributed – the same null distribution as the dBBMM scalar Z.
Envelope flag rule. A fix is flagged when ANY of the three Z channels exceeds its own threshold. This catches three distinct outlier signatures:
\(Z_{\parallel}\)-only: a step-acceleration anomaly along the local travel direction (overshoot / undershoot in the direction of motion).
\(Z_{\perp}\)-only: a sideways anomaly perpendicular to travel (typical of colony-return spikes in migrating animals or azimuthal GPS jitter).
\(Z_{\chi^2_2}\)-only: a combined-axis anomaly where neither component exceeds its per-axis threshold but their quadrature sum does (an isotropic spike with no axis preference).
Threshold methods. Three calibrations are available via
z_threshold_method:
"bonferroni"(default). Per-channel Bonferroni-Z thresholds with joint FWER \(\le 0.05\): \(\alpha/2\) to the chisq channel (Rayleigh-tail), \(\alpha/4\) to each per- axis channel (\(|N(0,1)|\)-tail). Conservative and FWER- correct."bh_fdr". Same alpha allocation but using Benjamini-Hochberg FDR per channel. More sensitive at the same nominal alpha; appropriate when missing a real outlier costs more than flagging a marginal one."gap". Per-channel data-driven break via the package's broken-stick + tail-decay inflection helper. The same self-thresholding family used elsewhere in the package (e.g.mt_flag_outliers_bridge, theresidual_maxgate below). Note: the gap detector was calibrated for log-probability scores – on raw Z (whose null is half-normal / Rayleigh with thin natural tails) it tends to find a "break" at the natural decay edge of the bulk and over-flags clean tracks. Use only when the parametric calibration is too conservative for your application.
Diagnostic labelling. Every flagged fix carries a
bridge_z_class label identifying which channel(s) fired
and (for chisq-only flags) which axis dominated. This is the
principal value-add over the dBBMM-only primitive: the flag set
is typically the same on real tracks (the per-axis envelope
tests rarely fire under proper FWER control), but the directional
attribution lets users explain why each fix was caught.
The leverage caveat
The dBBMM variance estimator is sensitive to outliers (large
residuals inflate the windowed sigma^2 estimate, after which the
offending fix's Z score is bounded – the leverage problem). The
legacy bridge avoids this by not estimating variance. If your
track contains physiologically-impossible-speed events, run
mt_peel_speed first OR pass pre_peel_v_max
(which runs the peel internally before sigma^2 estimation).
mt_clean_track composes the two stages by default.
See also
mt_flag_outliers_dbbmm (isotropic
submodel; speed-saving wrapper around this function with
variance = "dbbmm");
mt_flag_outliers_bridge (leverage-immune static
bridge); mt_peel_speed (Stage 1 of the cascade);
mt_dbgb_variance; mt_clean_track.
Examples
if (FALSE) { # \dontrun{
library(move2)
x <- mt_read(system.file("extdata/Pettstadt1-14053.csv.gz",
package = "move2utils"))
x <- x[!sf::st_is_empty(x), ]
x <- mt_filter_unique(x, criterion = "first")
x <- dplyr::arrange(x, mt_track_id(x), mt_time(x))
## standalone (single-state track or pre-cleaned)
res <- mt_flag_outliers_dbgb(x)
## cascade composition: peel impossible-speed fixes first
res <- mt_flag_outliers_dbgb(x, pre_peel_v_max = 50)
} # }