Flag outliers via state-aware bridge residuals (dBGB envelope)

State-aware bridge-residual primitive. Estimates per-axis (parallel / orthogonal) motion variance via mt_dbgb_variance and flags fixes by an envelope rule across three Z channels: along-axis ($Z_{\parallel}$), across-axis ($Z_{\perp}$), and the joint $Z_{\chi^2_2} = \sqrt{Z_{\parallel}^2 + Z_{\perp}^2}$. Under the constraint $\sigma^2_{\parallel} = \sigma^2_{\perp}$ (i.e. isotropic motion variance) the dBGB model reduces to dBBMM and $Z_{\chi^2_2}$ reduces to the dBBMM scalar-Z; that submodel is available via the variance = "dbbmm" option here, or equivalently via mt_flag_outliers_dbbmm as a speed-saving wrapper. The leverage-immune static bridge primitive is mt_flag_outliers_bridge.

Usage

mt_flag_outliers_dbgb(
  x,
  location_error = 0,
  z_threshold = NULL,
  z_threshold_method = c("bonferroni", "bh_fdr", "gap"),
  z_gap_threshold = NULL,
  residual_max = NULL,
  residual_entropy_threshold = NULL,
  residual_gap_threshold = NULL,
  residual_dip_alpha = 0.05,
  window_size = NULL,
  margin = NULL,
  variance = c("dbgb", "dbbmm"),
  pre_peel_v_max = NULL,
  plot = TRUE,
  remove = FALSE,
  silent = FALSE
)

Arguments

x: A move2 object. Single- or multi-track. Auto- projected to a per-track local AEQD if input is lon/lat.
location_error: Numeric scalar or vector of length nrow(x) giving the per-fix GPS noise floor in metres (1-sigma). Default 0 – no anchor noise floor injected unless the user supplies a calibrated value. Enters the denominator as 2 * location_error^2, the variance contribution from imperfect knowledge of both bridge anchors; with the default, that term is zero and the denominator reduces to sqrt(sigma^2(t_i) * w_i^2). For e-obs / Argos-A class hardware, 25 is a defensible scalar; for other devices, supply a calibration estimate. See the package vocabulary note: location_error is a user-supplied calibration prior, not a Movebank-exposed device attribute – those are handled separately as quality columns.
z_threshold: Numeric scalar or NULL. When supplied, overrides only the chisq-channel threshold; per-axis thresholds stay parametric. Incompatible with z_threshold_method = "gap" (which is data-driven).
z_threshold_method: Character, one of "bonferroni" (default), "bh_fdr", or "gap". Selects how the per-channel thresholds are derived; see Details for the alpha allocation across the three Z channels.
z_gap_threshold: Numeric multiplier controlling strictness of the broken-stick gap detector when z_threshold_method = "gap". NULL (default) defers to .z_gap_threshold's leaf formal (3). Higher = more conservative (fewer flags). Plausible range 2–5.
residual_max: Numeric scalar, Inf, or NULL. The absolute-residual axis: a fix is flagged if its bridge residual exceeds this magnitude regardless of $Z$. Catches geometrically extreme residuals that dBBMM-Z would absorb into local sigma^2. Default NULL runs a "last-satisfactory- break" gap detector on log(residual): it walks gaps from the extreme tail down toward the bulk and picks the last gap that exceeds the broken-stick null. This places the cap at the bulk-to-tail boundary, ensuring all residuals above the bulk are caught even when the outlier set itself is multi-scale (e.g. spoof clusters at multiple distance scales). Track-specific: stork bulk ~100 m gets cap ~ 1 km; swift bulk in km range gets cap scaled accordingly. Inf disables the axis.
residual_entropy_threshold, residual_gap_threshold: Numeric scalars or NULL. Control the entropy / gap detectors inside the residual_max = NULL auto path – the gate on |residual| in metres that catches geometrically extreme residuals. NULL (default) defers to the leaf formals. See audits/2026-05-25-parameter-propagation/findings.md §1.2.
residual_dip_alpha: Numeric in (0, 1). Significance level for the Hartigan dip test that validates the broken-stick fallback inside the residual_max auto path. Default 0.05 (Fisher convention).
window_size: Integer or NULL. dBBMM sliding-window size. NULL (default) auto-selects from the data's motion autocorrelation: window = c * decorrelation lag of $v^2$, with c = 4. Capped at max(11, n / 8).
margin: Integer or NULL. dBBMM margin. NULL (default) sets margin = (window_size - 1) / 4, rounded to the nearest odd integer.
variance: Character, one of "dbgb" (default, directional) or "dbbmm" (isotropic special case, faster but discards the directional decomposition). Under "dbbmm" only the chisq channel exists, and the envelope reduces to the legacy scalar Rayleigh-Z test.
pre_peel_v_max: Numeric scalar or NULL. If supplied, run mt_peel_speed at this cap before sigma^2 estimation; this protects against the leverage problem when physiologically-impossible fixes are present in the data. NULL (default) skips the peel – standalone use of this primitive.
plot: Logical. Default TRUE; diagnostic plot.
remove: Logical. If TRUE, drop flagged rows from the returned object. Default FALSE.
silent: Logical. Default FALSE.

Value

The input object with added columns:

bridge_residual: Euclidean residual from bridge mean, in metres.
bridge_width: Bridge width (sqrt-time scale).
sigma2_motion_para, sigma2_motion_orth: Per-axis local $\sigma^2$ from dBGB in m^2/min. Only emitted under variance = "dbgb".
sigma2_motion: Isotropic local $\sigma^2$ from dBBMM in m^2/min. Only emitted under variance = "dbbmm".
bridge_z: The chisq Z used as the principal flagging statistic (= bridge_z_chisq). Kept for back-compat with downstream code that reads a single bridge_z column.
bridge_z_chisq: The combined-axis Z, $\sqrt{Z_{\parallel}^2 + Z_{\perp}^2}$. Rayleigh(1) under the bridge null. Equal to bridge_z under variance = "dbbmm".
bridge_z_para, bridge_z_orth: Per-axis half-normal Z-scores from the dBGB decomposition. Only emitted under variance = "dbgb".
bridge_z_class: Diagnostic label classifying which channel(s) fired on flagged fixes ("para", "orth", "isotropic", "isotropic_para_dom" / "isotropic_orth_dom" sub-classes for chisq-only flags by axis dominance, multi-channel combinations, "residual" for the geometric-impossibility cap, or "none").
bridge_iteration: Integer iteration at which the fix was flagged (always 0 / 1 here – this primitive is single-pass; provided for API symmetry).
is_outlier: Logical. TRUE where flagged.

Plus attributes z_thresholds (list with per-channel thresholds), z_threshold (the chisq threshold, for back- compat), z_threshold_method, residual_max, window_size, location_error.

Details

Three Z channels. Under variance = "dbgb", the bridge residual at each fix is decomposed onto the local travel axis ($r_{\parallel}$) and its perpendicular ($r_{\perp}$). Each component is normalised by its own axis-specific bridge variance:

$$Z_{\parallel,i} = r_{\parallel,i} / \sqrt{\sigma^2_{\parallel}(t_i) \cdot w_i^2 + 2 \cdot \mathrm{location\_error}^2}$$

and analogously for $Z_{\perp,i}$. The chisq channel is $Z_{\chi^2_2,i} = \sqrt{Z_{\parallel,i}^2 + Z_{\perp,i}^2}$, which under the diagonal-Sigma bridge null is Rayleigh(1)- distributed – the same null distribution as the dBBMM scalar Z.

Envelope flag rule. A fix is flagged when ANY of the three Z channels exceeds its own threshold. This catches three distinct outlier signatures:

$Z_{\parallel}$-only: a step-acceleration anomaly along the local travel direction (overshoot / undershoot in the direction of motion).
$Z_{\perp}$-only: a sideways anomaly perpendicular to travel (typical of colony-return spikes in migrating animals or azimuthal GPS jitter).
$Z_{\chi^2_2}$-only: a combined-axis anomaly where neither component exceeds its per-axis threshold but their quadrature sum does (an isotropic spike with no axis preference).

Threshold methods. Three calibrations are available via z_threshold_method:

"bonferroni" (default). Per-channel Bonferroni-Z thresholds with joint FWER $\le 0.05$: $\alpha/2$ to the chisq channel (Rayleigh-tail), $\alpha/4$ to each per- axis channel ($|N(0,1)|$-tail). Conservative and FWER- correct.
"bh_fdr". Same alpha allocation but using Benjamini-Hochberg FDR per channel. More sensitive at the same nominal alpha; appropriate when missing a real outlier costs more than flagging a marginal one.
"gap". Per-channel data-driven break via the package's broken-stick + tail-decay inflection helper. The same self-thresholding family used elsewhere in the package (e.g. mt_flag_outliers_bridge, the residual_max gate below). Note: the gap detector was calibrated for log-probability scores – on raw Z (whose null is half-normal / Rayleigh with thin natural tails) it tends to find a "break" at the natural decay edge of the bulk and over-flags clean tracks. Use only when the parametric calibration is too conservative for your application.

Diagnostic labelling. Every flagged fix carries a bridge_z_class label identifying which channel(s) fired and (for chisq-only flags) which axis dominated. This is the principal value-add over the dBBMM-only primitive: the flag set is typically the same on real tracks (the per-axis envelope tests rarely fire under proper FWER control), but the directional attribution lets users explain why each fix was caught.

The leverage caveat

The dBBMM variance estimator is sensitive to outliers (large residuals inflate the windowed sigma^2 estimate, after which the offending fix's Z score is bounded – the leverage problem). The legacy bridge avoids this by not estimating variance. If your track contains physiologically-impossible-speed events, run mt_peel_speed first OR pass pre_peel_v_max (which runs the peel internally before sigma^2 estimation). mt_clean_track composes the two stages by default.

Examples

if (FALSE) { # \dontrun{
library(move2)
x <- mt_read(system.file("extdata/Pettstadt1-14053.csv.gz",
                           package = "move2utils"))
x <- x[!sf::st_is_empty(x), ]
x <- mt_filter_unique(x, criterion = "first")
x <- dplyr::arrange(x, mt_track_id(x), mt_time(x))

## standalone (single-state track or pre-cleaned)
res <- mt_flag_outliers_dbgb(x)

## cascade composition: peel impossible-speed fixes first
res <- mt_flag_outliers_dbgb(x, pre_peel_v_max = 50)
} # }