State-conditional outlier cleaning
Source: vignettes/OUTLIER_3_state_conditional.Rmd
When you need this
The per-fix detectors in mt_clean_track()
(mt_flag_outliers_bridge(), mt_flag_outliers_detour(),
mt_flag_outliers(), and mt_flag_speed_cap()) each threshold one pooled
distribution (of bridge residuals, detour ratios, joint probabilities,
or step speeds). On a track where the animal has several behavioural
states with very different kinematics (rest at a perch, soaring, active
flight, migration glides), each state contributes its own mode to those
distributions. A single threshold then has only two options:
- Cut between the modes (over-flag the smaller one), or
- Cut beyond the largest mode (under-flag the smaller one’s outliers).
Neither answer is right. The right answer is to segment by state and run the cleaner per segment, so that each detector sees a single mode and can find the outlier tail relative to that one mode.
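The effect can be reproduced on a toy example. The sketch below uses base R only, with synthetic speeds and a stand-in "median + 5 × MAD" rule; that rule is an assumption chosen for illustration and is not one of the package's detectors. A rest-state error at 2 m/s is invisible to the pooled threshold but stands out once each state gets its own threshold.

```r
## Synthetic step speeds (m/s): a slow "rest" mode, a fast "flight" mode,
## and one rest-state error at 2 m/s (fix 51).
rest   <- seq(0.01, 0.10, length.out = 50)
flight <- seq(6, 10, length.out = 50)
v      <- c(rest, 2.0, flight)
state  <- c(rep("rest", 51), rep("flight", 50))

## Stand-in detector (an assumption, not the package's rule):
## flag anything above median + 5 * MAD of the distribution it is given.
upper <- function(s) median(s) + 5 * mad(s)

## Pooled: one threshold for the whole track.
flag_pooled <- v > upper(v)

## Per state: each fix is compared with its own state's threshold.
flag_state <- v > ave(v, state, FUN = upper)

sum(flag_pooled)   # 0: the 2 m/s error hides between the two modes
which(flag_state)  # 51: the error is far outside the rest mode
```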
How to recognise the case
mt_diagnose_clean_track() prints a clear note when this
signature is present:
Panel 1: 3 substantive modes detected at 0.01, 0.14, 6.91 m/s – bimodal behaviour. The per-fix detectors threshold against a single distribution; consider state-conditional analysis or filtering to one mode before cleaning.
Combined with a sustained elevated flag rate over a contiguous time window (Panel 2), this signature is the most common problem new users run into on multi-state species.
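A valley between two modes can also be located by hand. The following is a rough base-R sketch on synthetic speeds. It assumes the two mode locations (`mode_lo`, `mode_hi`, both hypothetical names) are already known, for example from the diagnostic note, and it is not how mt_diagnose_clean_track() detects modes.

```r
## Synthetic step speeds (m/s) with two well-separated modes (an assumption).
v <- c(seq(0.01, 0.10, length.out = 50), seq(6, 10, length.out = 50))

## Assumed mode locations in m/s, e.g. read off the diagnostic note.
mode_lo <- 0.05
mode_hi <- 8

## Kernel density on log10 speed, where slow and fast modes are comparable.
d <- density(log10(v))

## Lowest density between the two modes is a candidate cut-off.
between <- d$x > log10(mode_lo) & d$x < log10(mode_hi)
valley  <- 10^(d$x[between][which.min(d$y[between])])
valley  # candidate speed threshold in m/s
```

The resulting value is only a starting point; it is worth checking it against the diagnostic plot before using it to assign states.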
The state = parameter
mt_clean_track() accepts a state = argument
that partitions each track into contiguous runs of constant state and
runs the full pipeline independently on each segment. This respects the
user’s state assignment as a first-class input — segmentation itself
(speed-threshold, HMM, BCPA, manual annotation) is the user’s
responsibility, not the package’s.
Three accepted forms:
- Column name on the move2 object:
  x_clean <- mt_clean_track(x, v_max = 50, state = "behaviour")
- Per-fix vector of length nrow(x):
  x_clean <- mt_clean_track(x, v_max = 50, state = my_state_vec)
- NULL (default): single global distribution; current behaviour.
NA values in the state column or vector are treated as
their own state (segments of NAs are cleaned in their own
context). Segments shorter than 3 fixes pass through unflagged because
the per-fix detectors require at least 3 points for a residual /
auto-difference.
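The partitioning rule described above can be sketched in base R with run-length encoding. This is an illustration of the stated behaviour (contiguous runs, NA as its own state, runs under 3 fixes skipped) on hypothetical labels, not the package's internal code.

```r
## Hypothetical per-fix labels for one track; NA marks unclassified fixes.
state <- c("rest", "rest", "rest", "flight", "flight",
           NA, NA, NA, NA, "rest", "rest", "rest")

## rle() would count every NA as a separate run, so give NA its own label.
key <- ifelse(is.na(state), "<NA>", state)

## Contiguous runs of constant state, and a segment id for every fix.
runs   <- rle(key)
seg_id <- rep(seq_along(runs$lengths), runs$lengths)

## Segments with fewer than 3 fixes would pass through unflagged.
data.frame(segment = seq_along(runs$lengths),
           state   = runs$values,
           n_fixes = runs$lengths,
           cleaned = runs$lengths >= 3)
```

Here the track splits into four segments of 3, 2, 4 and 3 fixes. The two "rest" runs are separate segments because they are not contiguous, and only the two-fix "flight" run is too short to be cleaned.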
Where to get the state assignment from
From a column already on the move2
The simplest case. If your data already carries a behavioural classification (annotated by the field team, derived from accelerometer, imported from Movebank’s annotation columns), pass the column name:
x_clean <- mt_clean_track(x, v_max = 50, state = "behaviour")
From a speed threshold
The cheapest user-side segmentation. Pick a valley between speed
modes (the diagnostic plot in mt_diagnose_clean_track()
shows you where the valleys are), classify each fix as active or
stationary, and pass the resulting vector:
library(move2)
## Per-fix step speed in m/s, via move2's exported helper.
v <- as.numeric(suppressWarnings(mt_speed(x, units = "m/s")))
## Smooth to suppress single-fix glitches affecting the assignment.
v_smooth <- stats::filter(v, rep(1/9, 9), sides = 2, circular = FALSE)
## Cut at a valley reported by mt_diagnose_clean_track() (~ 1 m/s for
## stork-class species; species- and study-specific in practice).
state <- ifelse(is.na(v_smooth) | v_smooth > 1, "active", "stationary")
x_clean <- mt_clean_track(x, v_max = 50, state = state)
From an HMM (or BCPA, or any other principled segmenter)
When you need a more rigorous decomposition (e.g. soaring vs flapping
flight, or three states), fit a Hidden Markov Model with
momentuHMM or moveHMM and pass the decoded
state sequence:
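library(moveHMM)
## stepPar0 / anglePar0 are moveHMM::fitHMM() arguments;
## momentuHMM::fitHMM() takes dist = and Par0 = lists instead.
## prepData() expects a plain data frame with ID, x and y columns
## (projected coordinates for type = "UTM"). x_df is a hypothetical
## data-frame version of the move2 object x, in the same row order.
prep <- prepData(x_df, type = "UTM", coordNames = c("x", "y"))
m_hmm <- fitHMM(prep, nbStates = 3,
                stepPar0 = c(...), anglePar0 = c(...))
state <- viterbi(m_hmm)
x_clean <- mt_clean_track(x, v_max = 50, state = state)
The package treats whatever you pass as authoritative: its only job is to respect the segmentation and produce per-segment cleaning.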
A worked example with synthetic data
We synthesise a two-state assignment on the package’s
synthetic_tracks fixture (CPF_A, the densely-sampled OUF
track with 23 graded outliers) and show that state =
returns the same row count and a sensible flag pattern.
library(move2)
library(move2utils)
m <- read.csv(gzfile(system.file("extdata",
                                 "synthetic_tracks.csv.gz",
                                 package = "move2utils")),
              stringsAsFactors = FALSE)
m$timestamp <- as.POSIXct(m$timestamp, tz = "UTC")
mA <- mt_as_move2(m[m$individual.local.identifier == "CPF_A", ],
                  coords = c("location.long", "location.lat"),
                  time_column = "timestamp",
                  track_id_column = "individual.local.identifier",
                  crs = 4326)
## Synthetic two-state assignment: first half "rest", second half "flight".
state <- c(rep("rest", floor(nrow(mA) / 2)),
           rep("flight", nrow(mA) - floor(nrow(mA) / 2)))
out <- mt_clean_track(mA, v_max = 30, state = state,
                      plot = FALSE, remove = FALSE, silent = TRUE)
table(state, flagged = out$is_outlier)
#>         flagged
#> state    FALSE TRUE
#>   flight   851   23
#>   rest     868    6
Compare against the no-state baseline:
out_pooled <- mt_clean_track(mA, v_max = 30,
                             plot = FALSE, remove = FALSE, silent = TRUE)
sum(out_pooled$is_outlier)
#> [1] 29
sum(out$is_outlier)
#> [1] 29
On clean ground-truth data the two should be similar; on a real multi-state track the segmented call typically catches outliers the pooled call misses (in the smaller-mode tail) and avoids over-flagging the smaller-mode body.
Acknowledged limitations
- Segmentation is a research problem in itself. A naive density-valley cut on log-speed can mis-classify intermediate states (“soaring” vs “active flight” on a stork). HMM-based segmentation with momentuHMM is the more rigorous alternative; manual annotation is the gold standard.
- Endpoint effects. Fixes near segment boundaries are scored against neighbours that may belong to a different state; the bridge primitive in particular is sensitive to this. Smoothing the state assignment (as in the speed-threshold recipe above) damps it.
- Boundary outliers can slip through. Block-shaped errors that straddle a state boundary may not be caught by either segment’s per-segment cleaning. A final whole-track pass with a hard v_max and expand_blocks = TRUE (the default) is a reasonable belt-and-braces step:
  out <- mt_clean_track(x, v_max = 50, state = state, expand_blocks = TRUE)
  ## then a final whole-track sweep with state = NULL to catch
  ## anything that straddles a boundary
  out <- mt_clean_track(out, v_max = 50, expand_blocks = TRUE)

Multi-individual cohorts are handled transparently: if the state vector spans multiple track ids, each (track id × state run) becomes its own segment. State labels do not need to be unique across individuals.
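The (track id × state run) rule can be illustrated with a short base-R sketch. It uses hypothetical track and state vectors, assumes rows are already grouped by track and ordered in time, and is not the package's internal implementation.

```r
## Hypothetical two-individual cohort that shares state labels.
track <- c("A", "A", "A", "A", "B", "B", "B", "B")
state <- c("flight", "flight", "rest", "rest",
           "rest", "rest", "flight", "flight")

## A new segment starts whenever the track id or the state changes.
key    <- paste(track, state, sep = " | ")
runs   <- rle(key)
seg_id <- rep(seq_along(runs$lengths), runs$lengths)

## Fix indices belonging to each segment.
split(seq_along(seg_id), seg_id)
```

Fixes 3 to 6 all carry the label "rest", yet they fall into two segments because the track id changes between fix 4 and fix 5.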
Further reading
- vignette("OUTLIER_1_getting_started", package = "move2utils"): the unified mt_clean_track() workflow and a brief tour of all four primitives.
- vignette("OUTLIER_2_diagnose_clean_track", package = "move2utils"): the post-run health check.
- vignette("OUTLIER_4_outlier_bridge", package = "move2utils"): the bridge primitive and the directional error-morphology classifier; for users who want fine-grained control over just one detector.
- vignette("OUTLIER_5_persistence_score", package = "move2utils"): multi-scale annotation that scores how confidently each flag is an outlier; useful as a post-cleaning confidence filter.
- vignette("OUTLIER_heterogeneous_error_regimes", package = "move2utils"): outlier detection with heterogeneous error regimes, one sensor at a time.
- vignette("OUTLIER_example_outlier_whitestork", package = "move2utils"): a full narrated cleaning pipeline on a real high-frequency stork track.
- vignette("OUTLIER_example_leo_migration", package = "move2utils"): outlier detection on irregular, large-scale satellite data.