One-stop detector that fuses the three primitives
(mt_flag_outliers_bridge,
mt_flag_outliers,
mt_flag_speed_cap) into a single iterative pipeline
and adds a topological block-expansion step. Each primitive works
at a different grain of analysis (point, segment, step) and catches
a different class of error; combining them via a conjunction rule
and iterating to convergence catches the long outlier trains that
per-fix scoring structurally cannot resolve.
Usage
mt_clean_track(
x,
v_max = NULL,
mass = NULL,
mode = NULL,
state = NULL,
consensus = c("class_aware", "strict", "majority", "speed_trusted", "any", "custom"),
consensus_custom = NULL,
transition_buffer = 1L,
use_detour = TRUE,
detour_k = 1L,
detour_threshold = 8,
iterations = "until_clean",
max_iterations = 100L,
max_flag_fraction = 0.2,
expand_blocks = TRUE,
pre_peel_aux = c("none", "primitives"),
persistence_filter = c("none", "class_aware"),
location_error = NULL,
residual_floor = 0,
step_floor = 0,
pool_by = NULL,
bridge_method = NULL,
bridge_threshold_type = NULL,
bridge_iterations = NULL,
prob_threshold_type = NULL,
detour_threshold_type = NULL,
entropy_threshold = NULL,
gap_threshold = NULL,
persistence_filter_threshold = NULL,
plot = TRUE,
remove = TRUE,
silent = FALSE,
compact = FALSE
)
Arguments
- x
A move2 object. Any CRS (auto-projected if needed).
- v_max
Numeric scalar or NULL. Physiological speed cap in m/s. If NULL (default), the speed detector runs in "auto" mode (dip-test-validated data-driven threshold) on every iteration. Supply a positive scalar for a hard cap override. Mutually exclusive with the (mass, mode) allometric route below.
- mass, mode
Optional pair. When both are supplied (and v_max is NULL), the function derives a principled physiological cap from species body mass and locomotor mode via v_phys_estimate (Hirt et al. 2017 general scaling law) and uses the central estimate as v_max. This is the recommended route for users without a species-specific published maximum. Specialist sprinters (cheetah, pronghorn) and 3D-tracked diving / stooping data should override with a published value instead.
Both can be passed as either a single value applied to every individual, or a per-track named vector whose names match the track ids. Per-track mass is the right choice for multi-individual studies where individuals differ in body mass (juveniles vs adults, sexual dimorphism); per-track mode is rarely needed within a single-species study but supports mixed-locomotion datasets.
mass is in kg. If mass carries a units attribute (e.g. as returned by Movebank's animal_mass in grams) it is auto-converted to kg. A bare scalar mass > 100 triggers a warning that the value looks like grams.
mode is one of "flying", "running", "swimming".
- state
Optional behavioural-state assignment. Default NULL (no segmentation; the cleaner sees one global distribution). Accepted forms:
  - character of length 1: name of a column on x holding per-fix state values.
  - vector of length nrow(x): per-fix state values directly. Numeric, character, factor, or logical are all accepted; NA is treated as its own state.
When supplied, the cleaner partitions each track into contiguous runs of constant state and runs the full pipeline independently on each segment. This is the right behaviour for tracks with kinematically distinct states (rest + flight, perch + migration glide), where pooling the speed / residual distributions across states forces the threshold detectors to choose between over- or under-flagging the smaller mode. The package's contract is to respect user-supplied state labels; segmentation itself (speed-threshold, HMM, manual) is the user's responsibility. See vignette("OUTLIER_3_state_conditional", package = "move2utils").
Segments shorter than 3 fixes pass through unflagged (the per-fix detectors require at least 3 points for a residual / auto-difference); a one-line note is emitted unless silent = TRUE.
- consensus
Character. Which rule to use to combine the four detector outputs into a per-fix outlier flag. Defaults to "class_aware", the empirically validated one-fits-all default (round-4 audit, 2026-05-11). Other values ("strict", "majority", "speed_trusted", "any", "custom") give the user a conservative-vs-liberal knob. See mt_flag_consensus for the detailed specification of each mode. The block-expansion step (when expand_blocks = TRUE) is independent of consensus – it identifies disconnected components directly from the speed distribution, so coherent outlier blocks are caught regardless of the per-fix rule.
- consensus_custom
Function used when consensus = "custom". Receives four logical vectors of equal length (by_bridge, by_prob, by_speed, by_detour) and must return a single logical vector of the same length giving the per-fix outlier decision. Ignored unless consensus = "custom". See mt_flag_consensus for examples.
- transition_buffer
Non-negative integer. Default 1L. Width (in fixes) of the buffer zone around state transitions inside which state-dependent flag classes (state_anomaly, kinematic_confluence) are demoted to state_transition_buffered (kept). Set to 0L to disable. Geometric consensus (geometric_spike, consensus) and block expansion still flag at transitions. Only takes effect when state = is supplied.
- use_detour
Logical. If TRUE (default), include the path-vs-displacement detour ratio (see mt_flag_outliers_detour) as a fourth point-level detector in the conjunction rule. Detour is time-insensitive and scale-invariant, complementing the bridge primitive at sparse sampling rates where bridge \(\sigma\)-scaling loses sensitivity to single-fix spikes whose implied step speed stays below physiological caps. Set to FALSE to reproduce the strict three-detector behaviour from earlier package versions.
- detour_k
Integer. Window radius for the detour primitive. Default 1L (single-fix spikes). See mt_flag_outliers_detour for the multi-k diagnostic form.
- detour_threshold
Numeric, > 1. Detour ratio threshold. Default 8. EMPIRICALLY TUNED – the minimum value at which the synthetic ground-truth set (CPF_A/B/C) is preserved exactly under use_detour = TRUE (no false positive on CPF_B, no recall loss on CPF_A or CPF_C). Plausible range: 5–15; lower is more aggressive at sparse sampling.
- iterations
Either the string "until_clean" (default), Inf, or a positive integer. "until_clean" and Inf are equivalent and mean "iterate until no new flags are produced or max_iterations is reached". A positive integer caps iteration at that number of passes.
- max_iterations
Integer. Hard safety cap on iteration count even with iterations = "until_clean". Default 100 is EMPIRICALLY TUNED (CASCADE_AUDIT_2026-05-11.md Section 3.4 + 8): the iteration-count distribution across all audit tracks tops out at 22 iter for legitimate convergence on CPF / cohort; K02 (block contamination, auto-cap path) is the longest documented legitimate convergence at 61 iter. 100 covers K02 with ~39-iter margin AND caps multi-state failure modes (WH17-class tracks that never converge) at half the previous wallclock. Tracks that hit the cap return with convergence = "max_iterations" – a visible signal the user can opt up via max_iterations = 200L (the previous default) or higher. Plausible range: 50–500.
- max_flag_fraction
Numeric in (0, 1]. If cumulative flags exceed this fraction of the track, abort. Default 0.2 is HEURISTIC – the convention "if cleaning >20% you are misusing the tool" rather than a derived bound. Plausible range: 0.1–0.3. Lower values abort earlier on pathological tracks; higher values give the iteration loop more room.
- expand_blocks
Logical. If TRUE (default), run the topological block-expansion step.
- pre_peel_aux
Character, one of "none" (default) or "primitives". Controls the pre-peel mode when v_max (or (mass, mode)) is supplied. With "none", the pre-peel is symmetric: every fix on either end of an offending edge (step_speed > v_max) is removed. With "primitives", the function first runs mt_flag_outliers_bridge and mt_flag_outliers_detour once on the raw input, builds a per-fix auxiliary score by rank-normalising and summing their magnitudes, and passes it to mt_peel_speed's aux_scores argument; the peel then flags only the higher-scoring endpoint per offending edge.
The asymmetric mode is designed for tracks whose contamination is dominated by 1-fix spikes whose clean neighbours the symmetric default would peel along with the spike. Empirical benchmark (synthetic CPF, audit 2026-05-11): mean F1 +0.036, CPF_A F1 0.885 -> 0.958, CPF_D F1 0.674 -> 0.779, no losses.
Cluster-outlier caveat. On coherent multi-fix contamination (sustained spoofs, deployment confusion blocks like the K02 benchmark) the asymmetric peel can walk inward from one boundary only or oscillate, producing false positives at the cluster boundary. Empirical (K02 benchmark, audit 2026-05-11): F1 0.776 (symmetric) -> 0.572 (asymmetric). Use the default "none" on tracks with known cluster-shape contamination. See also mt_peel_speed -> Asymmetric peel.
- persistence_filter
Character, one of "none" (default) or "class_aware". When "class_aware", runs mt_persistence_score on the cascade output and demotes flagged fixes in the state_anomaly + consensus error classes whose persistence_count < 3 to is_outlier = FALSE.
Empirical motivation (the 2026-05-09 class-conditional analysis): on cascade output, persistence cleanly separates TPs from FPs only on the state_anomaly (+39.3 pp gap at \(p \geq 3\)) and consensus (+37.9 pp) classes. On kinematic_confluence the gap reverses (-14.3 pp); geometric_spike was class-pure on synthetic. Filter is therefore class-aware by design – universal application would actively regress halo-spike detection.
Per-track CPF effect under the CRS-invariant mt_persistence_score (post 2026-05-11 fix): CPF_A 0.958 (unchanged); CPF_C 1.000 (unchanged); CPF_D 0.674 -> 0.675 (+0.001); CPF_E 0.994 -> 0.975 (-0.019); CPF_F 0.533 (unchanged). Mean ΔF1 = -0.003.
Net-near-neutral on synthetic CPF: the class-conditional aggregate finding holds across many fixes but doesn't translate to per-track wins on the small validation set. The filter ships as strict opt-in (round-3 mixed-CPF rule); the user-side case for enabling it is per-dataset. The class taxonomy used by this filter (error_class) is independent of the consensus mode and is therefore always available.
- location_error
Per-fix observation-error prior (1-sigma, m). Default NULL. Forwarded to the bridge detector; see mt_flag_outliers_bridge for accepted forms (NULL, scalar, vector, column name, or "auto" for Movebank quality columns).
- residual_floor
Passed to the bridge detector; see mt_flag_outliers_bridge.
- step_floor
Passed to the probability detector; see mt_flag_outliers.
- pool_by
Optional character vector of length 1 or 2 naming column(s) in mt_track_data(x). pool_by has two semantic roles: a fit set (which tracks' events contribute to the threshold-fitting distribution) and an operating unit (within which pool-added flags are unioned and the post-cascade sweep iterates).
Length 1 (e.g. pool_by = "individual_id"): the same column is used for both roles – pool deployments of one animal, with the union scoped to that animal. This is the legacy single-level behaviour, preserved byte-identically.
Length 2 (e.g. pool_by = c("study_id", "individual_id")): the first element is the outer column (fit source) and the second is the inner column (operating unit). Threshold-fitting primitives draw their reference distribution from the union of events sharing the outer value; the post-cascade flag union acts within the inner value. This lets you, say, fit thresholds from a population-wide distribution but keep the union scoped to each animal.
The length-2 form requires strict nesting: every distinct inner value must map to exactly one outer value. Inputs that violate this (e.g. an individual that appears in two studies) error with the offending value named.
Length \(> 2\) is rejected with a deliberately verbose message: pool_by has exactly two semantic roles; a deeper hierarchy (e.g. species/population/individual/tag) would only earn its keep under hierarchical / partial-pooling threshold estimation, which the cascade does not perform. Users with many nested levels should pick the pair of columns that captures their trust claim (which level's distribution to fit from) and their operating unit.
The cascade itself remains per-track (preserved byte-identically when pool_by = NULL); a post-cascade pool sweep then runs each pool-aware primitive wrapper (bridge, speed-cap, detour) once on the original multi-track input with pool_by set, and unions the pool-added flags into is_outlier. Pool-added flags are tagged error_class = "pool" when the prior class was empty.
NULL (default) preserves per-track behaviour byte-identically. NA values in the named column(s) cause those tracks to fall back to per-track processing with a warning. Errors when supplied with state if any inner pool group's tracks have entirely disjoint state vocabularies (different labelling conventions across deployments) – outer-group state inconsistency is not enforced since outer is a fit source, not a union target.
The prob primitive's contribution to pool semantics is currently integrated (not post-hoc): mt_flag_outliers uses the outer column to fit one reference distribution per outer group and injects it into per-track dispatch. The inner column has no role in prob's pool path – users wanting prob pooling can call mt_flag_outliers standalone with pool_by and merge.
Heterogeneous error regimes (e.g. mixed GPS and Sigfox fixes in one track) violate the per-call homogeneity contract of the primitives and should be split with dplyr or move2 filtering before invoking mt_clean_track. See vignette("OUTLIER_heterogeneous_error_regimes", package = "move2utils") for the pre-split + re-merge pattern.
- bridge_method
Optional character override for the bridge primitive's method. NULL (default) uses the cascade's empirically-tuned value "combined"; valid alternatives are "isotropic" (scalar Rayleigh residual) and "directional" (perpendicular component of the isotropic/directional decomposition, useful for error-morphology classification). See ?mt_flag_outliers_bridge for full method semantics.
- bridge_threshold_type
Optional override for the bridge primitive's threshold_type. NULL (default) uses the cascade's value "entropy" (sweep-validated density-ratio break detector); the alternative is "gap" (broken-stick + tail-decay; more sensitive but can over-flag).
- bridge_iterations
Optional override for the bridge primitive's iterative refinement count. NULL (default) uses 3L; convergence is typically reached in 1–2 passes on real data so the override is rarely needed.
- prob_threshold_type
Optional override for the probability primitive's threshold_type. NULL (default) uses the cascade's empirical default "gap"; alternatives are "entropy", "significance", "percentile" per ?mt_flag_outliers.
- detour_threshold_type
Optional override for the detour primitive's threshold_type. NULL (default) uses "fixed" (the cascade's permissive value gated by the conjunction rule; threshold = detour_threshold); the alternative "auto" adapts to each track's \(-\log(\text{ratio})\) distribution. Note: when pool_by is set, the post-cascade pool sweep always uses threshold_type = "auto" regardless of this argument so pool detour has effect; this argument controls only the per-track cascade detour call.
- entropy_threshold
Numeric in (0, 1) or NULL. Density-ratio threshold used wherever the cascade runs an entropy-valley detector (bridge primitive when bridge_threshold_type = "entropy", prob primitive when prob_threshold_type = "entropy", speed-cap auto path's entropy arm). NULL (default) defers to .entropy_threshold_lower's leaf formal – the package-wide single source of truth (0.3, sweep-validated 2026-05-06). Plausible range 0.3–0.7.
- gap_threshold
Positive numeric or NULL. Break-size multiplier used wherever the cascade runs a broken-stick gap detector (bridge / prob / speed-cap auto path's gap fallback / the class-aware persistence post-filter). NULL (default) defers to .gap_threshold_lower's leaf formal (3, "3-sigma" convention). Plausible range 2–5.
- persistence_filter_threshold
Positive numeric or NULL. Per-scale gap-threshold passed to the class-aware persistence post-filter (active only when persistence_filter = "class_aware"). NULL (default) defers to .gap_threshold_lower's leaf formal. Conceptually a separate knob from gap_threshold because it operates on a per-scale persistence statistic, not the cascade's primary detector outputs.
- plot
Logical. Diagnostic map on return. Default TRUE.
- remove
Logical. If TRUE (default), return only the non-flagged rows – the cleaned track, ready for downstream analysis. Set to FALSE to keep all rows with the flag columns (is_outlier, flagged_by_bridge, flagged_by_prob, flagged_by_speed, flag_iteration, block_id) attached for inspection of what was flagged and why.
- silent
Logical. If FALSE (default) the function prints a brief running narration of its inner workings: per-iteration flag counts, the block-expansion gate's decision and reason, and a final summary line. Set to TRUE to suppress all messages (the same effect as wrapping the call in suppressMessages()). Errors and warnings are always shown.
- compact
Logical. Used together with silent = FALSE to control narration verbosity. Default FALSE (the full per-iteration narration). Set TRUE for a one-line-per-individual summary instead of the full per-iteration trace – the right choice when running on a multi-individual study where the per-iteration output would otherwise scale linearly with the cohort size. Ignored when silent = TRUE.
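The consensus_custom contract described above (four equal-length logical vectors in, one logical vector out) can be illustrated with a minimal sketch. The function name two_of_four and the "at least two detectors agree" rule are hypothetical choices made for illustration; only the four argument names come from the argument description.

```r
# Hypothetical custom rule: flag a fix when at least two of the four
# detectors agree. Argument names follow the documented contract.
two_of_four <- function(by_bridge, by_prob, by_speed, by_detour) {
  # Logical vectors are summed element-wise as 0/1 counts
  (by_bridge + by_prob + by_speed + by_detour) >= 2L
}

# Toy detector outputs for three fixes
decision <- two_of_four(
  by_bridge = c(TRUE,  TRUE,  FALSE),
  by_prob   = c(FALSE, TRUE,  FALSE),
  by_speed  = c(FALSE, FALSE, FALSE),
  by_detour = c(TRUE,  FALSE, FALSE)
)
print(decision)

# Assumed usage, given the documented arguments:
# mt_clean_track(x, consensus = "custom", consensus_custom = two_of_four)
```

Such a function would presumably sit somewhere between "strict" and "any" on the conservative-vs-liberal scale; see mt_flag_consensus for the built-in rules.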
Value
The input move2 object with added columns:
- is_outlier
Logical; TRUE where flagged.
- flagged_by_bridge, flagged_by_prob, flagged_by_speed, flagged_by_detour
Logical; per-detector history (useful for diagnosing which signal caught what). flagged_by_detour is identically FALSE when use_detour = FALSE.
- flag_iteration
Integer; iteration at which the fix was flagged (NA otherwise).
- block_id
Integer; same value for all fixes in an expanded-block flag, NA otherwise.
- error_class
Character; categorical interpretation of why each flagged fix was caught. NA where is_outlier == FALSE. Categories:
  - "block" – fix is a member of a topologically-isolated component (block expansion). The mechanistic interpretation is a coherent multi-fix error cluster (typical of GPS spoofs, timestamp glitches, or systematic tag-data contamination) whose interior is locally consistent and only the boundary transitions are suspect to per-fix scoring.
  - "consensus" – two or more of the per-fix detectors (bridge, probability, speed) agree on the same fix. Highest confidence in the flag; the fix is geometrically, kinematically and/or physically anomalous.
  - "speed_cap" – only the speed-cap detector fired. The implied step speed exceeds the threshold but the fix is geometrically and kinematically plausible – typical of an isolated transition that violates a physiological cap (e.g. a single jump-and-stay error).
  - "jitter" – only the bridge detector fired. The fix is geometrically out of place relative to its temporal neighbours but its step kinematics are not extreme – typical of GPS multipath, point jitter, or a transcription error.
  - "kinematic" – only the probability detector fired. The fix's joint speed / turn / auto-difference signature conflicts with the animal's normal behaviour – typical of a subtle behavioural-state confound or an error that is too small to disturb the bridge or speed-cap detectors.
  - "detour" – only the detour detector fired. The fix's path-vs-displacement ratio is pathological but no kinematic detector agrees – typical of out-and-back GPS spikes at sparse sampling whose implied step speed stays below physiological caps. Only seen when use_detour = TRUE.
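To make the error_class categories above concrete, here is a minimal sketch that maps per-detector flags to the six listed labels. The function classify_flag, its in_block argument, and the precedence order (block first, then consensus, then the single-detector classes) are assumptions for illustration; the package's actual assignment logic is not shown on this page, and combinations the list does not describe are left as NA.

```r
# Hypothetical mapping from detector flags to the listed error classes.
# Assumed precedence: "block" > "consensus" > single-detector classes.
classify_flag <- function(by_bridge, by_prob, by_speed, by_detour, in_block) {
  n_core <- by_bridge + by_prob + by_speed   # bridge / probability / speed
  out <- rep(NA_character_, length(by_bridge))
  out[n_core == 0L & by_detour] <- "detour"                  # only detour fired
  out[n_core == 1L & !by_detour & by_speed]  <- "speed_cap"  # only speed-cap
  out[n_core == 1L & !by_detour & by_bridge] <- "jitter"     # only bridge
  out[n_core == 1L & !by_detour & by_prob]   <- "kinematic"  # only probability
  out[n_core >= 2L] <- "consensus"                           # two or more agree
  out[in_block] <- "block"                                   # block expansion
  out
}

classes <- classify_flag(
  by_bridge = c(TRUE,  TRUE,  FALSE, FALSE, FALSE, FALSE),
  by_prob   = c(TRUE,  FALSE, TRUE,  FALSE, FALSE, FALSE),
  by_speed  = c(FALSE, FALSE, FALSE, TRUE,  FALSE, FALSE),
  by_detour = c(FALSE, FALSE, FALSE, FALSE, TRUE,  FALSE),
  in_block  = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)
print(classes)
```

In practice the same reading can be done directly on the returned columns with remove = FALSE, for example by tabulating error_class against the flagged_by_* columns.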
Details
Per-iteration logic:
- Run each primitive in single-pass mode on the current active (unflagged) set:
  - mt_flag_outliers_bridge() -> point-level bridge residual. Default method = "combined", threshold_type = "entropy" (strict).
  - mt_flag_outliers() -> segment/vertex-level joint probability. Default threshold_type = "gap" (sensitive).
  - mt_flag_speed_cap() -> step-level implied speed. Uses v_max if supplied; otherwise runs the detector in threshold_type = "auto" mode (entropy valley with dip-test-validated broken-stick fallback).
- Apply the conjunction rule for per-fix flagging: fix \(i\) is flagged if the bridge says its point-level residual is suspect AND at least one of (probability, speed-cap) also says an incident transition is suspect. The conjunction rewards agreement across grains and prevents any single score from over-flagging.
- Block expansion. After flagging, partition the kept fixes into connected components: two kept fixes \(i\) and \(j\) are in the same component only if the step \(i \to j\) has an implied speed below the block-speed threshold (either v_max or whatever the speed detector chose in "auto" mode). Small components (size \(< \text{max\_flag\_fraction} \cdot n\)) that are disconnected from the main trajectory through suspect transitions are flagged as blocks. This is the step that dissolves long trains the per-fix detectors structurally cannot catch: the train interior is locally coherent but topologically isolated after stage 1 flags its boundaries.
Stop criteria checked at the end of every iteration:
- No new fixes flagged -> converged.
- Cumulative flags exceed max_flag_fraction of the track -> abort with warning (likely a pipeline misuse).
- Iteration count reaches max_iterations.
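The conjunction rule and the three stop criteria above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: iterate_flags, toy_detect and the status strings "converged" and "aborted" are hypothetical (only convergence = "max_iterations" is named in the argument documentation), and the toy detector works on 1-D positions with arbitrary thresholds.

```r
# Conjunction rule as described: bridge AND (probability OR speed-cap)
conjunction_flag <- function(by_bridge, by_prob, by_speed) {
  by_bridge & (by_prob | by_speed)
}

# Hypothetical driver. `detect(active)` must return three logical vectors
# (by_bridge, by_prob, by_speed), one element per active fix.
iterate_flags <- function(n, detect, max_iterations = 100L,
                          max_flag_fraction = 0.2) {
  flagged <- logical(n)
  flag_iteration <- rep(NA_integer_, n)
  status <- "max_iterations"            # kept if the loop never breaks
  for (it in seq_len(max_iterations)) {
    active <- which(!flagged)
    d <- detect(active)
    new <- active[conjunction_flag(d$by_bridge, d$by_prob, d$by_speed)]
    if (length(new) == 0L) { status <- "converged"; break }
    flagged[new] <- TRUE
    flag_iteration[new] <- it
    if (mean(flagged) > max_flag_fraction) { status <- "aborted"; break }
  }
  list(is_outlier = flagged, flag_iteration = flag_iteration,
       convergence = status)
}

# Toy stand-in for the primitives (assumption): 1-D positions, unit time steps
toy_detect <- function(pos) {
  function(active) {
    p <- pos[active]
    m <- length(p)
    resid <- rep(0, m)
    if (m >= 3L) {
      mid <- 2:(m - 1L)
      # Distance of each interior fix from the midpoint of its neighbours
      resid[mid] <- abs(p[mid] - (p[mid - 1L] + p[mid + 1L]) / 2)
    }
    step <- abs(diff(p))
    incident <- pmax(c(0, step), c(step, 0))  # largest step touching each fix
    list(by_bridge = resid > 30, by_prob = rep(FALSE, m),
         by_speed = incident > 5)
  }
}

pos <- c(0, 1, 2, 50, 4, 5, 6, 7, 8, 9, 10, 11)
res <- iterate_flags(length(pos), toy_detect(pos))
print(which(res$is_outlier))
print(res$convergence)
```

On this toy track the spike at position 4 is flagged in the first pass and the second pass finds nothing new, so the loop stops as converged. The neighbours of the spike exceed the speed threshold but not the bridge threshold, which is the over-flagging the conjunction is meant to prevent.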
CRS handling: like the other detectors, this function auto-projects longitude/latitude input to a local AEQD for the Euclidean math and returns in the original CRS.
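The block-expansion step from the per-iteration logic can be sketched as a connected-components pass over the kept fixes. This is a simplified illustration under assumptions: expand_blocks_sketch is a hypothetical name, coordinates are taken as already projected (metres) with times in seconds, only steps between consecutive kept fixes are considered, and the largest component is treated as the main trajectory.

```r
# Hypothetical sketch of block expansion on projected coordinates.
expand_blocks_sketch <- function(x, y, tm, kept, v_block,
                                 max_flag_fraction = 0.2) {
  n <- length(x)
  block_id <- rep(NA_integer_, n)
  idx <- which(kept)
  if (length(idx) < 2L) return(block_id)
  # Implied speed of each step between consecutive kept fixes
  speed <- sqrt(diff(x[idx])^2 + diff(y[idx])^2) / diff(tm[idx])
  # A step at or above the threshold cuts the chain and opens a new component
  comp <- cumsum(c(TRUE, speed >= v_block))
  sizes <- tabulate(comp)
  main <- which.max(sizes)                    # assumed main trajectory
  small <- setdiff(which(sizes < max_flag_fraction * n), main)
  hit <- comp %in% small
  block_id[idx[hit]] <- match(comp[hit], small)
  block_id
}

# Toy track: 30 fixes at 1 m/s with a displaced 3-fix train at fixes 11 to 13
x  <- c(0:9, 1000:1002, 13:29)
y  <- rep(0, 30)
tm <- 1:30
b  <- expand_blocks_sketch(x, y, tm, kept = rep(TRUE, 30), v_block = 50)
print(which(!is.na(b)))
```

The interior steps of the displaced train are only 1 m/s, so no per-step speed test would flag the middle fix; it is caught here because its component has 3 members, below the 0.2 * 30 = 6 size bound.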
Composition with the individual primitives: the three
primitives remain exported for users who want to inspect which
score caught what. mt_clean_track() is the convenient
one-call entrypoint.
Primitive-knob overrides
The cascade's primitive calls have empirically-tuned defaults
that are exposed for fine-tuning at the orchestrator level
(bridge_method, bridge_threshold_type,
bridge_iterations, prob_threshold_type,
detour_threshold_type). Each NULL-defaulted
argument preserves the cascade's hardcoded value byte-identically;
a non-NULL value forwards to the primitive's
.fn_core call in the iteration loop. Use these when a
paper-replication or diagnostic workflow needs a non-default
value without rebuilding the cascade by hand.
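The NULL-default convention described above can be sketched as follows. The default values are the ones quoted in the argument descriptions; the names cascade_defaults and resolve_knobs are hypothetical and the package's internal forwarding may differ.

```r
# Cascade defaults as quoted in the argument documentation
cascade_defaults <- list(
  bridge_method         = "combined",
  bridge_threshold_type = "entropy",
  bridge_iterations     = 3L,
  prob_threshold_type   = "gap",
  detour_threshold_type = "fixed"
)

# Hypothetical resolver: a NULL argument keeps the cascade default,
# a non-NULL argument replaces it.
resolve_knobs <- function(bridge_method = NULL, bridge_threshold_type = NULL,
                          bridge_iterations = NULL, prob_threshold_type = NULL,
                          detour_threshold_type = NULL) {
  user <- list(
    bridge_method         = bridge_method,
    bridge_threshold_type = bridge_threshold_type,
    bridge_iterations     = bridge_iterations,
    prob_threshold_type   = prob_threshold_type,
    detour_threshold_type = detour_threshold_type
  )
  supplied <- !vapply(user, is.null, logical(1))
  utils::modifyList(cascade_defaults, user[supplied])
}

knobs <- resolve_knobs(prob_threshold_type = "entropy")
str(knobs)
```

With the real function, the equivalent call would be mt_clean_track(x, prob_threshold_type = "entropy"), leaving the other four knobs at their cascade values.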
Two primitive knobs are deliberately NOT exposed because they carry architectural rather than methodological meaning, and overriding them would break the cascade's documented design:
- detour min_leg is hardcoded at 0. Detour's leg gate (min_leg > 0) is a standalone-use gating mechanism that prevents the detour ratio from firing on small-displacement noise wiggles. Inside the cascade, the conjunction rule plays the same gating role: detour contributes to flags only when it agrees with another detector under the class-aware rule. Exposing min_leg on the cascade would create two competing gates running in parallel, with documented failure modes where a fix is gated out by min_leg despite satisfying the conjunction. Users who specifically want standalone leg-gated detour should call mt_flag_outliers_detour directly.
- speed_cap threshold_type is hardcoded at "auto". The cascade decouples two roles for speed-based detection: (i) the conjunction's speed flag, which fires when an implied step speed is anomalous relative to the local distribution (inside a rest segment, 14 m/s is anomalous even though it sits below the gull's 36 m/s physiological cap); and (ii) the block-expansion cap, which uses the user-supplied v_max (or (mass, mode) allometric estimate) as an absolute physiological cap for the boundary-edge component graph. Auto-threshold for (i) catches state-anomalous fixes that an absolute cap misses at sparse sampling; absolute v_max for (ii) anchors block expansion against true physiological impossibility. Collapsing (i) and (ii) into a single user-overridable threshold reintroduces the over-flag failure mode documented in the 2026-04-29 stratified Movebank audit (homing pigeon racing flight over-flagged as anomaly relative to the resting baseline). Users who specifically want the conjunction's speed flag to use the absolute cap should call mt_flag_speed_cap(threshold_type = "hard", v_max = ...) alongside mt_clean_track and combine the outputs as their workflow requires.
See also
The four primitives this function composes:
mt_flag_outliers_bridge,
mt_flag_outliers,
mt_flag_outliers_detour,
mt_flag_speed_cap. Diagnostic helpers:
mt_suggest_speed_cap,
v_phys_estimate. State-aware bridge
primitives (standalone, not currently wired into the
cascade): mt_flag_outliers_bridge's leverage-
immune isotropic/directional decomposition, plus
mt_flag_outliers_dbgb and
mt_flag_outliers_dbbmm for the variance-
estimating variants. Alternative strategies for advanced
users (different voting schemes on the probability surface
or across temporal resolutions):
mt_sequential_outliers,
mt_combined_outliers,
mt_persistence_score for multi-scale persistence
annotation on cascade output (use the score with the
error_class column for class-aware FP filtering on
state_anomaly and consensus flags).
Examples
if (FALSE) { # \dontrun{
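library(move2)
library(move2utils)
## Download, quality-filter and de-duplicate the study
x <- movebank_download_study(study_id = 123, ...)
x <- mt_filter_gps_quality(x)
x <- move2::mt_filter_unique(x, "first")
## Order fixes by track, then by time within each track
x <- dplyr::arrange(x, mt_track_id(x), mt_time(x))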
## Default: return the cleaned track, ready for downstream analysis
x_clean <- mt_clean_track(x, v_max = 50)
## Inspection mode: keep all rows, see which were flagged and why
x_with_flags <- mt_clean_track(x, v_max = 50, remove = FALSE)
table(x_with_flags$is_outlier)
head(x_with_flags[x_with_flags$is_outlier, ])
} # }