Sequential outlier detection for movement data
Source: R/mt_sequential_outliers.R
Detects outliers by walking through a track from confirmed-good
locations, evaluating each candidate step against the same joint
probability (Equation 1) used by mt_flag_outliers.
Usage
mt_sequential_outliers(
x,
reference = NULL,
scan = "forward-backward",
n_random = 10,
autodiff_alpha = 0.5,
threshold = NULL,
max_skip = 100,
time_normalize = TRUE,
anchor_corruption_threshold = 0.3,
plot = FALSE
)
Arguments
- x
A move2 object.
- reference
Optional move2 object from which the probability surfaces are built. If NULL, built from x.
- scan
Scanning strategy: "forward-backward" (default), "greedy", or "random".
- n_random
Number of random anchor points for scan = "random". Default: 10.
- autodiff_alpha
Exponent on the auto-difference terms (Equation 1). Default: 0.5.
- threshold
Minimum joint probability for accepting a step. If NULL (default), set to the 0.5th percentile of the reference joint probability distribution.
- max_skip
Maximum consecutive locations to skip before force-advancing. Default: 100.
- time_normalize
Logical; if TRUE (default), use speed and angular velocity.
- anchor_corruption_threshold
Numeric in (0, 1). If a track's flag rate exceeds this value, the function emits a warning naming the anchor-corruption signature. Default: 0.30; above a 30% flag rate, a corrupt anchor is a more likely failure mode than the data. Set to NULL to disable.
- plot
Logical; if TRUE, plot the results.
Details
Three scanning strategies are available:
- "forward-backward"
Scan forward from the first location and backward from the last; combine via max probability.
- "greedy"
Start from the location with the highest simultaneous joint probability and expand outward in both directions. Requires running mt_flag_outliers first.
- "random"
Launch scans from multiple random anchor points; each location's probability is the median across scans.
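The two combination rules named above (max across the forward and backward scans, median across random-anchor scans) can be illustrated in base R. This is only a sketch on made-up numbers: the vectors p_forward and p_backward and the matrix p_scans are hypothetical stand-ins for per-fix scan probabilities, not objects returned by the package.

```r
# Hypothetical per-fix probabilities from a forward and a backward scan.
p_forward  <- c(0.80, 0.70, 0.01, 0.02, 0.60)
p_backward <- c(0.75, 0.02, 0.01, 0.65, 0.70)

# "forward-backward": keep the larger of the two probabilities per fix.
p_fb <- pmax(p_forward, p_backward)

# "random": one row per scan (each from a different anchor), one column per fix.
p_scans <- rbind(
  c(0.80, 0.70, 0.01, 0.02, 0.60),
  c(0.75, 0.02, 0.01, 0.65, 0.70),
  c(0.78, 0.68, 0.03, 0.60, 0.66)
)

# Each fix gets the median probability across the scans.
p_random <- apply(p_scans, 2, median)
```

In this toy case only the third fix scores low under both rules, since every scan agrees on it.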
Relationship to the four-primitive cascade. The unified
cleaner mt_clean_track uses simultaneous
thresholding on the joint-probability surface (via
mt_flag_outliers) and combines it with three other
detectors (bridge, detour, speed-cap) under a class-aware flag
rule. mt_sequential_outliers is an alternative
strategy on the same probability surface: rather than
applying a single threshold to all fixes at once, it walks from
confirmed-good locations and evaluates each step as a transition.
This catches block-shaped errors whose interior steps look
ordinary in transitions (the simultaneous threshold misses them
because each interior step is locally plausible). The cascade
addresses block-shaped contamination via bridge + detour + the
topological block-expansion step, so mt_clean_track is the
recommended default. Reach for mt_sequential_outliers
when (i) you want an orthogonal cross-check on the cascade
results, (ii) you have a tightly-clustered burst of contamination
on an otherwise short track where the cascade's iteration loop
converges slowly, or (iii) you are calibrating thresholds on the
joint-probability surface specifically and want to compare
simultaneous vs sequential evaluation.
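The difference between simultaneous and sequential evaluation on a block-shaped error can be sketched with a deliberately simplified toy. The assumptions are loud ones: the track is one-dimensional, a fixed step-length cutoff (max_step, hypothetical) replaces the joint probability, and time gaps and max_skip are ignored. It is not the package's algorithm, only the walking idea.

```r
# Toy 1-D track: fixes 5 to 8 sit in a displaced block (hypothetical data).
x <- c(0, 1, 2, 3, 50, 51, 52, 53, 4, 5, 6)
max_step <- 5  # hypothetical plausibility cutoff on step length

# Simultaneous view: flag a fix when the step leading into it is implausible.
step_in <- c(0, abs(diff(x)))
flag_simultaneous <- step_in > max_step

# Sequential view: walk forward from the first fix (assumed good) and measure
# every candidate against the last accepted fix, not its direct predecessor.
flag_sequential <- logical(length(x))
anchor <- 1
for (i in seq_along(x)[-1]) {
  if (abs(x[i] - x[anchor]) > max_step) {
    flag_sequential[i] <- TRUE   # skip the candidate, keep the anchor
  } else {
    anchor <- i                  # accept the candidate as the new anchor
  }
}

which(flag_simultaneous)  # entry into the block and the return step
which(flag_sequential)    # the whole block
```

The simultaneous view marks only the jump into the block and the jump back (the latter landing on a good fix), while the walk from the anchor marks the block interior as well.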
Anchor-corruption diagnostic
Sequential scanning evaluates each step relative to a
confirmed-good anchor. When the anchor is itself corrupt (a
tag-deployment glitch on the first or last fix; an Argos bias
spike that coincides with the anchor chosen by
scan = "greedy"; or all anchors in scan = "random"
falling within a contaminated cluster), every legitimate step
looks anomalous relative to it, so the scan flags the legitimate
fixes instead of the corrupt one.
The max_skip safety net catches the runaway case eventually,
but below that limit the failure mode is silent.
This diagnostic surfaces the failure mode by warning when more
than anchor_corruption_threshold of a track's fixes are
flagged. Real outlier rates are typically below 5%; flag rates
above the threshold indicate that the scan is reflecting the
anchor's perspective rather than the data's. Recovery: try
scan = "forward-backward" (already the default; combines both
endpoints), supply a clean reference = object, or use
mt_clean_track, which is anchor-free.
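The check described here amounts to comparing a per-track flag rate against anchor_corruption_threshold. A minimal base-R sketch follows, assuming a hypothetical logical vector flagged for one track; the package's own flag column and warning text are not shown in this page, and message() stands in for the warning the function emits.

```r
# Hypothetical per-fix flags for one track (TRUE = flagged as an outlier).
flagged <- c(rep(TRUE, 45), rep(FALSE, 55))
anchor_corruption_threshold <- 0.3  # documented default; NULL disables the check

# Share of the track's fixes that were flagged.
flag_rate <- mean(flagged)

# Suspect the anchor when the check is enabled and the rate exceeds the threshold.
anchor_suspect <- !is.null(anchor_corruption_threshold) &&
  flag_rate > anchor_corruption_threshold

if (anchor_suspect) {
  message(sprintf(
    "%.0f%% of fixes flagged; the anchor may be corrupt.",
    100 * flag_rate
  ))
}
```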
Two-threshold calibration (fixed 2026-05-12)
The scan-time scoring formula has two regimes: a partial-formula
first step from any anchor (prob = stp, with no autodifference
contribution, either because there is no prior anchor or because
a post-max_skip reset wiped prev_v/prev_w) and a full-formula
in-scan step (prob = stp * (dsp * dtp)^autodiff_alpha). The
algorithm now calibrates two thresholds (threshold_partial
and threshold_full), each the 0.5th percentile of its own
reference distribution, and the scan compares each step's score
to its regime-matched threshold. Before the fix (prior to
2026-05-12) the algorithm used a single threshold calibrated on
the full formula; on multi-state tracks where the autodifference
KDE returned very large densities at the bimodal Delta-step
peaks, the full-formula threshold sat orders of magnitude above
typical partial-formula first-step scores and the scan flagged
nearly all fixes (WH17 over-flag at 99.99%). On single-state
synthetic CPF tracks the two thresholds happen to agree
numerically and the per-fix flag set is byte-identical.
A user-supplied threshold = X sets both thresholds to
X (back-compat).
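The regime-matched calibration can be sketched in base R. The densities below are simulated stand-ins (hypothetical log-normal draws for stp, dsp and dtp), chosen so that the auto-difference terms are large, as in the multi-state case described above; the real surfaces come from the package, and accept_step is an illustrative helper, not a package function.

```r
set.seed(1)
n <- 1000

# Hypothetical reference densities: step term and two auto-difference terms.
stp <- rlnorm(n, meanlog = -2)
dsp <- rlnorm(n, meanlog = 5)
dtp <- rlnorm(n, meanlog = 5)
autodiff_alpha <- 0.5

# Scores under the two regimes.
score_partial <- stp
score_full    <- stp * (dsp * dtp)^autodiff_alpha

# One threshold per regime, each the 0.5th percentile of its own distribution.
threshold_partial <- quantile(score_partial, probs = 0.005, names = FALSE)
threshold_full    <- quantile(score_full,    probs = 0.005, names = FALSE)

# Illustrative helper: score a step and compare it to the matching threshold.
accept_step <- function(stp, dsp = NULL, dtp = NULL) {
  if (is.null(dsp) || is.null(dtp)) {
    stp >= threshold_partial                               # first step from an anchor
  } else {
    stp * (dsp * dtp)^autodiff_alpha >= threshold_full     # in-scan step
  }
}

# Share of first-step scores rejected under each choice of threshold.
mean(score_partial < threshold_full)     # single full-formula threshold
mean(score_partial < threshold_partial)  # regime-matched threshold
```

With these simulated inputs, applying the full-formula threshold to first-step scores rejects most of them, while the regime-matched threshold rejects roughly the intended 0.5%.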
A per-fix p-value-based threshold (parametric null on the
reference log-probability) is the natural improvement direction
over the 0.5% cutoff but is deferred (see
FUTURE_IMPROVEMENTS.md M-12).
Documented workflow recommendations remain unchanged: on heavily
contaminated multi-state tracks, supply a clean
reference = object, or use mt_flag_outliers
for initial screening, or use mt_clean_track
(anchor-free). Deriving a state column from speed and
dispatching per-state does NOT help – it fragments the failure
AND confounds outlier signal with state signal on synthetic
block-shape contamination (CPF_D).
References
Safi, K. (in preparation). Self-thresholding hierarchical outlier-detection for animal movement tracks. Companion paper to the move2utils R package. Preprint: bioRxiv (DOI forthcoming).
See also
mt_clean_track (recommended unified
cleaner); mt_flag_outliers (probability primitive
with simultaneous thresholding – the basis this function
scans over); mt_combined_outliers
(majority vote across simultaneous + sequential strategies);
mt_persistence_score (multi-scale persistence
annotation – works on the output of any flagger including
this one).
Examples
if (FALSE) { # \dontrun{
## Forward-backward scan from the track endpoints:
res <- mt_sequential_outliers(track, scan = "forward-backward")
summary(res$seq_joint_prob)
} # }