State-conditional outlier cleaning

When you need this

The per-fix detectors in mt_clean_track() (mt_flag_outliers_bridge(), mt_flag_outliers_detour(), mt_flag_outliers(), and mt_flag_speed_cap()) each threshold a single pooled distribution: bridge residuals, detour ratios, joint probabilities, or step speeds, respectively. On a track where the animal has several behavioural states with very different kinematics (rest at a perch, soaring, active flight, migration glides), each state contributes its own mode to those distributions. A single threshold then has only two options:

Cut between the modes (over-flag the smaller one), or
Cut beyond the largest mode (under-flag the smaller one’s outliers).

Neither answer is right. The right answer is to segment by state and run the cleaner per segment, so that each detector sees a single mode and can find the outlier tail relative to that one mode.
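The trade-off can be reproduced on simulated speeds. The sketch below uses base R only; the flagging rule (median plus k MADs on log-speed) is an illustrative stand-in chosen for this example, not the detector that mt_clean_track() uses, and flag_upper_tail() is a hypothetical helper.

```r
set.seed(1)

## Simulated step speeds (m/s): a large resting mode and a smaller flight mode.
rest   <- rlnorm(900, meanlog = log(0.05), sdlog = 0.4)
flight <- rlnorm(100, meanlog = log(8),    sdlog = 0.2)

## Plant one outlier per state: implausibly fast *for that state*.
rest[1]   <- 2     # 40x the typical resting speed
flight[1] <- 40    # 5x the typical flight speed

speed <- c(rest, flight)
state <- rep(c("rest", "flight"), c(length(rest), length(flight)))

## Illustrative robust rule (an assumption, not the package's detector):
## flag log-speeds more than k MADs above the median.
flag_upper_tail <- function(v, k = 5) {
  z <- log(v)
  z > median(z) + k * mad(z)
}

## One threshold for the whole track ...
pooled <- flag_upper_tail(speed)

## ... versus one threshold per state.
per_state <- logical(length(speed))
for (s in unique(state)) {
  idx <- state == s
  per_state[idx] <- flag_upper_tail(speed[idx])
}

table(state, pooled)
table(state, per_state)
```

With these settings the pooled rule is expected to place its cut between the two modes and flag the flight fixes wholesale, while the per-state rule should isolate the two planted values. Exact counts depend on the simulated draw.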

How to recognise the case

mt_diagnose_clean_track() prints a clear note when this signature is present:

Panel 1: 3 substantive modes detected at 0.01, 0.14, 6.91 m/s – bimodal behaviour. The per-fix detectors threshold against a single distribution; consider state-conditional analysis or filtering to one mode before cleaning.

This note, combined with a sustained elevated flag rate over a contiguous time window (Panel 2), is the signature of the multi-state case. It is the most common pain point for first-time users working with multi-state species.

The `state =` parameter

mt_clean_track() accepts a state = argument that partitions each track into contiguous runs of constant state and runs the full pipeline independently on each segment. This respects the user’s state assignment as a first-class input — segmentation itself (speed-threshold, HMM, BCPA, manual annotation) is the user’s responsibility, not the package’s.

Three accepted forms:

Column name on the move2 object:

x_clean <- mt_clean_track(x, v_max = 50, state = "behaviour")

Per-fix vector of length nrow(x):

x_clean <- mt_clean_track(x, v_max = 50, state = my_state_vec)

NULL (default): single global distribution; current behaviour.

NA values in the state column or vector are treated as their own state (segments of NAs are cleaned in their own context). Segments shorter than 3 fixes pass through unflagged because the per-fix detectors require at least 3 points for a residual / auto-difference.
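The partitioning rule described above can be sketched in base R. This is only an illustration of the stated behaviour (contiguous runs of constant state within a track, NA as its own state, runs under 3 fixes too short to clean); state_runs() is a hypothetical helper and is not claimed to be how the package implements it.

```r
## Hypothetical helper: number the contiguous runs of constant state per track.
state_runs <- function(track_id, state) {
  ## Give NA an explicit label so that consecutive NAs form one run
  ## (rle() would otherwise treat every NA as a run of its own).
  lab <- ifelse(is.na(state), "<NA>", as.character(state))
  ## A run also ends wherever the track id changes.
  r <- rle(paste(track_id, lab, sep = "|"))
  data.frame(
    track_id = track_id,
    state    = state,
    segment  = rep(seq_along(r$lengths), r$lengths),
    n_fixes  = rep(r$lengths, r$lengths)
  )
}

## Toy input: track A ends in "rest" and track B starts in "rest".
track_id <- rep(c("A", "B"), c(9, 4))
state <- c("rest", "rest", "rest", "flight", "flight", NA, NA, NA, "rest",
           "rest", "rest", "rest", "rest")

runs <- state_runs(track_id, state)
runs

## Segments with fewer than 3 fixes, which would pass through unflagged.
short_segments <- unique(runs$segment[runs$n_fixes < 3])
short_segments
```

Tabulating run lengths like this before cleaning is a quick way to see how much of a track falls into segments too short for the detectors to score.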

Where to get the state assignment from

From a column already on the move2

The simplest case. If your data already carries a behavioural classification (annotated by the field team, derived from accelerometer, imported from Movebank’s annotation columns), pass the column name:

x_clean <- mt_clean_track(x, v_max = 50, state = "behaviour")

From a speed threshold

The cheapest user-side segmentation. Pick a valley between speed modes (the diagnostic plot in mt_diagnose_clean_track() shows you where the valleys are), classify each fix as active or stationary, and pass the resulting vector:

library(move2)

## Per-fix step speed in m/s, via move2's exported helper.
v <- as.numeric(suppressWarnings(mt_speed(x, units = "m/s")))

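## 9-fix centred moving average, so that a single-fix glitch cannot flip
## the assignment. Windows that run past the ends of the series or contain
## a missing speed come back NA.
v_smooth <- as.numeric(stats::filter(v, rep(1/9, 9), sides = 2, circular = FALSE))

## Cut at a valley reported by mt_diagnose_clean_track() (~ 1 m/s for
## stork-class species; species- and study-specific in practice).
## Fixes without a smoothed speed default to "active".
state <- ifelse(is.na(v_smooth) | v_smooth > 1, "active", "stationary")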

x_clean <- mt_clean_track(x, v_max = 50, state = state)
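If you would rather locate the valley numerically than read it off the diagnostic plot, a kernel density on log-speed is one option. The sketch below runs on simulated speeds with base R only; the two-highest-peaks heuristic is an assumption made for this example and has nothing to do with how mt_diagnose_clean_track() detects modes.

```r
set.seed(42)

## Simulated step speeds (m/s) with a stationary and an active mode.
v <- c(rlnorm(700, meanlog = log(0.05), sdlog = 0.5),
       rlnorm(300, meanlog = log(8),    sdlog = 0.3))

## Kernel density of log10 speed; modes that differ by orders of magnitude
## separate more cleanly on a log scale.
d <- density(log10(v[is.finite(v) & v > 0]))

## Local maxima of the density curve: the slope changes from up to down.
peaks <- which(diff(sign(diff(d$y))) == -2) + 1

## Keep the two highest peaks and search for the lowest point between them.
top2    <- sort(peaks[order(d$y[peaks], decreasing = TRUE)][1:2])
between <- top2[1]:top2[2]
cut_speed <- 10^d$x[between][which.min(d$y[between])]
cut_speed

## Classify each fix against the valley.
state_sim <- ifelse(v > cut_speed, "active", "stationary")
table(state_sim)
```

On real tracks the density can show more than two modes or shallow valleys, so treat the returned cut as a starting point to check against the diagnostic plot.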

From an HMM (or BCPA, or any other principled segmenter)

When you need a more rigorous decomposition (e.g. soaring vs flapping flight, or three states), fit a Hidden Markov Model with momentuHMM or moveHMM and pass the decoded state sequence:

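## stepPar0 / anglePar0 are moveHMM's argument names; momentuHMM::fitHMM()
## takes dist = and Par0 = lists instead.
library(moveHMM)

## prepData() expects a plain data frame with coordinate columns, so pull
## them out of the move2 object first (assumes x is in a projected CRS).
xy     <- sf::st_coordinates(x)
dat    <- data.frame(ID = as.character(move2::mt_track_id(x)),
                     x = xy[, 1], y = xy[, 2])
prep   <- prepData(dat, type = "UTM", coordNames = c("x", "y"))
m_hmm  <- fitHMM(prep, nbStates = 3,
                 stepPar0 = c(...), anglePar0 = c(...))
state  <- viterbi(m_hmm)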

x_clean <- mt_clean_track(x, v_max = 50, state = state)

The package treats whatever you pass as authoritative — its only job is to respect the segmentation and produce per-segment cleaning.

A worked example with synthetic data

We synthesise a two-state assignment on the package’s synthetic_tracks fixture (CPF_A, the densely-sampled OUF track with 23 graded outliers) and show that state = returns the same row count and a sensible flag pattern.

library(move2)
library(move2utils)

m <- read.csv(gzfile(system.file("extdata",
                                  "synthetic_tracks.csv.gz",
                                  package = "move2utils")),
              stringsAsFactors = FALSE)
m$timestamp <- as.POSIXct(m$timestamp, tz = "UTC")
mA <- mt_as_move2(m[m$individual.local.identifier == "CPF_A", ],
                  coords = c("location.long", "location.lat"),
                  time_column = "timestamp",
                  track_id_column = "individual.local.identifier",
                  crs = 4326)

## Synthetic two-state assignment: first half "rest", second half "flight".
state <- c(rep("rest",   floor(nrow(mA) / 2)),
           rep("flight", nrow(mA) - floor(nrow(mA) / 2)))

out <- mt_clean_track(mA, v_max = 30, state = state,
                       plot = FALSE, remove = FALSE, silent = TRUE)

table(state, flagged = out$is_outlier)
#>         flagged
#> state    FALSE TRUE
#>   flight   851   23
#>   rest     868    6

Compare against the no-state baseline:

out_pooled <- mt_clean_track(mA, v_max = 30,
                              plot = FALSE, remove = FALSE, silent = TRUE)
sum(out_pooled$is_outlier)
#> [1] 29
sum(out$is_outlier)
#> [1] 29

Here the two calls flag the same number of fixes (29), which is the expected result when the state assignment is arbitrary rather than behavioural. On a real multi-state track the segmented call typically catches outliers the pooled call misses (in the smaller-mode tail) and avoids over-flagging the smaller-mode body.

Acknowledged limitations

Segmentation is a research problem in itself. A naive density-valley cut on log-speed can mis-classify intermediate states (“soaring” vs “active flight” on a stork). HMM-based segmentation with momentuHMM is the more rigorous alternative; manual annotation is the gold standard.

Endpoint effects. Fixes near segment boundaries are scored against neighbours that may belong to a different state; the bridge primitive in particular is sensitive to this. Smoothing the state assignment (as in the speed-threshold recipe above) damps it.

Boundary outliers can slip through. Block-shaped errors that straddle a state boundary may not be caught by either segment’s per-segment cleaning. A final whole-track pass with a hard v_max and expand_blocks = TRUE (the default) is a reasonable belt-and-braces step:

out <- mt_clean_track(x, v_max = 50, state = state,
                       expand_blocks = TRUE)
## then a final whole-track sweep with state = NULL to catch
## anything that straddles a boundary
out <- mt_clean_track(out, v_max = 50, expand_blocks = TRUE)

Multi-individual cohorts are handled transparently — if the state vector spans multiple track ids, each (track id × state run) becomes its own segment. State labels do not need to be unique across individuals.
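For the endpoint-effects item above, an alternative to smoothing the speeds is to smooth the labels directly by absorbing very short runs into a neighbouring run. The sketch below is one possible approach in base R. absorb_short_runs() is a hypothetical helper, not part of the package, and it assumes a single track with no NA labels.

```r
## Hypothetical helper: relabel runs shorter than min_len with the label of
## the preceding run (or the following run, if the short run comes first).
absorb_short_runs <- function(state, min_len = 3) {
  state <- as.character(state)
  repeat {
    r <- rle(state)
    short <- which(r$lengths < min_len)
    ## Stop when nothing is short, or when only one run is left.
    if (length(short) == 0 || length(r$lengths) == 1) break
    j <- short[1]
    r$values[j] <- if (j > 1) r$values[j - 1] else r$values[j + 1]
    ## Rebuilding the vector merges the relabelled run with its neighbour,
    ## so every pass removes at least one run and the loop terminates.
    state <- inverse.rle(r)
  }
  state
}

raw <- c(rep("rest", 5), "flight", rep("rest", 4),
         rep("flight", 6), "rest", "rest", rep("flight", 3))
smoothed <- absorb_short_runs(raw, min_len = 3)

rle(raw)$lengths
rle(smoothed)$lengths
```

Setting min_len to 3 matches the minimum segment length the per-fix detectors need, so no fix is left in a segment that would pass through unflagged. The cost is that genuinely brief behavioural bouts are relabelled.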