Sequential outlier detection for movement data

Detects outliers by walking through a track from confirmed-good locations, evaluating each candidate step against the same joint probability (Equation 1) used by mt_flag_outliers.

Usage

mt_sequential_outliers(
  x,
  reference = NULL,
  scan = "forward-backward",
  n_random = 10,
  autodiff_alpha = 0.5,
  threshold = NULL,
  max_skip = 100,
  time_normalize = TRUE,
  anchor_corruption_threshold = 0.3,
  plot = FALSE
)

Arguments

x: A move2 object.
reference: Optional move2 object from which the probability surfaces are built. If NULL, built from x.
scan: Scanning strategy: "forward-backward" (default), "greedy", or "random".
n_random: Number of random anchor points for scan = "random". Default: 10.
autodiff_alpha: Exponent on the auto-difference terms (Equation 1). Default: 0.5.
threshold: Minimum joint probability for accepting a step. If NULL (default), set to the 0.5th percentile of the reference joint probability distribution.
max_skip: Maximum consecutive locations to skip before force-advancing. Default: 100.
time_normalize: Logical; if TRUE (default), use speed and angular velocity.
anchor_corruption_threshold: Numeric in (0, 1). If a track's per-track flag rate exceeds this value, the function emits a warning naming the anchor-corruption signature. Default: 0.30. When more than 30% of a track's fixes are flagged by a scan, the anchor is a more likely failure mode than the data. Set to NULL to disable.
plot: Logical; if TRUE, plot the results.

Value

The input move2 object with added columns: is_outlier and seq_joint_prob.

Details

Three scanning strategies are available:

"forward-backward": Scan forward from the first location and backward from the last; combine via max probability.
"greedy": Start from the location with the highest simultaneous joint probability and expand outward in both directions. Requires running mt_flag_outliers first.
"random": Launch scans from multiple random anchor points; each location's probability is the median across scans.

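The combination rules for "forward-backward" and "random" can be illustrated with toy numbers. This is a minimal sketch in base R, not the package's internal code: the probability vectors, the third scan, and the threshold value are invented for illustration, and only the max and median rules are taken from the descriptions above.

```r
# Toy per-location probabilities from two hypothetical scans
# (invented values, not package output).
p_forward  <- c(0.80, 0.60, 0.01, 0.70, 0.02)
p_backward <- c(0.75, 0.65, 0.02, 0.10, 0.90)

# "forward-backward": keep the larger of the two probabilities per location.
p_fb <- pmax(p_forward, p_backward)

# "random": one row per scan, one column per location;
# each location's probability is the median across scans.
p_scans  <- rbind(p_forward, p_backward, c(0.70, 0.50, 0.03, 0.60, 0.80))
p_random <- apply(p_scans, 2, median)

# A step is rejected when its probability falls below the threshold
# (0.05 is an illustrative value only).
threshold <- 0.05
is_outlier_fb     <- p_fb < threshold
is_outlier_random <- p_random < threshold
```

In this toy case a location flagged by only one direction (the fourth and fifth) is kept by the max rule, while the third location is flagged under both rules.
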
Relationship to the four-primitive cascade. The unified cleaner mt_clean_track uses simultaneous thresholding on the joint-probability surface (via mt_flag_outliers) and combines it with three other detectors (bridge, detour, speed-cap) under a class-aware flag rule. mt_sequential_outliers is an alternative strategy on the same probability surface: rather than applying a single threshold to all fixes at once, it walks from confirmed-good locations and evaluates each step as a transition. This catches block-shaped errors whose interior steps look ordinary in transitions (the simultaneous threshold misses them because each interior step is locally plausible). The cascade addresses block-shaped contamination via bridge + detour + the topological block-expansion step, so mt_clean_track is the recommended default. Reach for mt_sequential_outliers when (i) you want an orthogonal cross-check on the cascade results, (ii) you have a tightly-clustered burst of contamination on an otherwise short track where the cascade's iteration loop converges slowly, or (iii) you are calibrating thresholds on the joint-probability surface specifically and want to compare simultaneous vs sequential evaluation.

Anchor-corruption diagnostic

Sequential scanning evaluates each step relative to a confirmed-good anchor. When the anchor is itself corrupt (a tag-deployment glitch on the first or last fix; an Argos bias spike that coincides with the anchor chosen by scan = "greedy"; or all anchors in scan = "random" falling within a contaminated cluster), every legitimate step looks anomalous relative to that anchor, so the scan immediately flags every legitimate fix. The max_skip safety net eventually catches the runaway case, but below that limit the failure mode is silent.
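
The runaway behaviour can be reproduced on a toy track. The sketch below is an assumption-laden illustration, not the package's algorithm: `step_score()` is a hypothetical stand-in for the joint probability, the track is one-dimensional, and the force-advance rule (accept the next fix unflagged once max_skip consecutive fixes have been skipped) is one plausible reading of the max_skip description.

```r
# Toy 1-D track: the first fix is corrupt (far away), the rest are legitimate.
pos <- c(500, 1, 2, 3, 4, 5, 6, 7, 8)

# Hypothetical step score: close to 1 for short steps, near 0 for long ones.
# Stand-in for the joint probability that the package derives from its surfaces.
step_score <- function(from, to) exp(-abs(to - from) / 2)

scan_forward <- function(pos, threshold, max_skip) {
  n <- length(pos)
  flagged <- logical(n)
  anchor  <- 1L   # index of the last accepted location
  skipped <- 0L   # consecutive rejections since the last acceptance
  for (i in seq_len(n)[-1]) {
    ok <- step_score(pos[anchor], pos[i]) >= threshold
    if (!ok && skipped < max_skip) {
      flagged[i] <- TRUE
      skipped <- skipped + 1L
    } else {
      # Accepted, or force-advanced after max_skip consecutive skips
      # (assumed behaviour): this fix becomes the new anchor.
      anchor  <- i
      skipped <- 0L
    }
  }
  flagged
}

n_flagged_runaway <- sum(scan_forward(pos, threshold = 0.1, max_skip = 100))  # 8
n_flagged_reset   <- sum(scan_forward(pos, threshold = 0.1, max_skip = 3))    # 3
```

With max_skip larger than the track, all eight legitimate fixes are flagged; with max_skip = 3 the scan re-anchors on the fifth fix and accepts the rest.
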

This diagnostic surfaces the failure mode by warning when more than anchor_corruption_threshold of a track's fixes are flagged. Real outlier rates are typically below 5%; flag rates above the threshold indicate that the scan is reflecting the anchor's perspective rather than the data's. Recovery: try scan = "forward-backward" (already the default; combines both endpoints), supply a clean reference = object, or use mt_clean_track, which is anchor-free.
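
The per-track flag-rate check can be sketched in base R as follows. This is an illustration under stated assumptions: `flags` is a hypothetical plain data frame rather than a move2 object, the `track_id` column name is invented, and `message()` stands in for the warning that the function itself emits.

```r
# Hypothetical per-fix flags for two tracks (plain data frame, invented values).
flags <- data.frame(
  track_id   = rep(c("A", "B"), each = 10),
  is_outlier = c(rep(FALSE, 9), TRUE,          # track A: 10% flagged
                 rep(TRUE, 6), rep(FALSE, 4))  # track B: 60% flagged
)
anchor_corruption_threshold <- 0.3

# Share of flagged fixes per track.
flag_rate <- tapply(flags$is_outlier, flags$track_id, mean)

# Tracks whose flag rate exceeds the threshold are suspect.
suspect <- names(flag_rate)[flag_rate > anchor_corruption_threshold]
if (length(suspect) > 0) {
  message("Possible anchor corruption in track(s): ",
          paste(suspect, collapse = ", "))
}
```

The same check can be run by hand on the is_outlier column of a result to decide whether to retry with a different scan strategy or a clean reference.
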

Two-threshold calibration (fixed 2026-05-12)

The scan-time scoring formula has two regimes: a partial-formula first step from any anchor (prob = stp, with no autodifference contribution because either there is no prior anchor or a post-max_skip reset wiped prev_v/prev_w) and a full-formula in-scan step (prob = stp * (dsp * dtp)^autodiff_alpha). The algorithm now calibrates two thresholds (threshold_partial and threshold_full), each the 0.5th percentile of its reference distribution, and the scan compares each step's score to its regime-matched threshold. Before the fix of 2026-05-12 the algorithm used a single threshold calibrated on the full formula; on multi-state tracks where the autodifference KDE returned very large densities at the bimodal Delta-step peaks, the full-formula threshold sat orders of magnitude above typical partial-formula first-step scores and the scan flagged nearly all fixes (WH17 over-flag at 99.99%). On single-state synthetic CPF tracks the two thresholds happen to agree numerically and the per-fix flag set is byte-identical. A user-supplied threshold = X sets both thresholds to X (back-compat).
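
The regime-matched comparison can be sketched in base R. This is a hedged illustration rather than the package's calibration code: the names stp, dsp, dtp, autodiff_alpha, threshold_partial and threshold_full follow the text above, but the reference values are random placeholders and `is_first_step` is a hypothetical indicator of which steps use the partial formula.

```r
set.seed(1)
n <- 1000

# Placeholder reference terms (random values; in the package these come
# from the probability surfaces).
stp <- rexp(n)
dsp <- rexp(n)
dtp <- rexp(n)
autodiff_alpha <- 0.5

# Scores under the two regimes described above.
score_partial <- stp
score_full    <- stp * (dsp * dtp)^autodiff_alpha

# One threshold per regime: the 0.5th percentile of the matching scores.
threshold_partial <- quantile(score_partial, probs = 0.005, names = FALSE)
threshold_full    <- quantile(score_full,    probs = 0.005, names = FALSE)

# Hypothetical scan: only the first step from the anchor uses the partial formula.
is_first_step <- c(TRUE, rep(FALSE, n - 1))
score   <- ifelse(is_first_step, score_partial, score_full)
cutoff  <- ifelse(is_first_step, threshold_partial, threshold_full)
flagged <- score < cutoff
```

Comparing a partial-formula score against threshold_full instead would reproduce the mismatch described above whenever the two thresholds differ by orders of magnitude.
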

A per-fix p-value-based threshold (parametric null on the reference log-probability) is the natural improvement direction beyond the 0.5th-percentile cutoff but is deferred (see FUTURE_IMPROVEMENTS.md M-12).

Documented workflow recommendations remain unchanged: on heavily contaminated multi-state tracks, supply a clean reference = object, or use mt_flag_outliers for initial screening, or use mt_clean_track (anchor-free). Deriving a state column from speed and dispatching per-state does NOT help – it fragments the failure AND confounds outlier signal with state signal on synthetic block-shape contamination (CPF_D).

References

Safi, K. (in preparation). Self-thresholding hierarchical outlier-detection for animal movement tracks. Companion paper to the move2utils R package. Preprint: bioRxiv (DOI forthcoming).

Examples

if (FALSE) { # \dontrun{
## Forward-backward scan from the track endpoints:
res <- mt_sequential_outliers(track, scan = "forward-backward")
summary(res$seq_joint_prob)
} # }