Getting started with move2utils

move2utils cleans GPS and satellite tracks by composing four outlier-detection primitives — bridge residual (geometric), path-vs-displacement detour ratio (geometric, time-insensitive), movement-metric probability (kinematic), and step-level speed cap (physical) — under a single iterative entry-point, mt_clean_track(). This vignette walks through that one-call workflow on a bundled synthetic track with known outliers, then briefly tours each primitive for readers who want finer control.

Load data

This vignette uses the CPF_A track from the package’s bundled synthetic data — a 1748-fix central-place-forager simulation with 23 graded outliers injected at known positions. The same track is the demonstration fixture in vignettes 3, 4 and 5, so cascade behaviour can be compared across them. For narrated cleaning on real GPS tracks see the OUTLIER_example_* vignettes listed in “Further reading”.

library(move2)
library(sf)
library(move2utils)

## CPF_A synthetic track (1748 fixes, 23 known outliers)
path <- system.file("extdata/synthetic_tracks.csv.gz",
                    package = "move2utils")
tracks <- mt_read(path)
cpf_a <- filter_track_data(tracks, .track_id = "CPF_A")
cat(nrow(cpf_a), "locations for CPF_A\n")
#> 1748 locations for CPF_A

If you’re pulling your data from Movebank, jump to the appendix at the end of this vignette for the recommended download pattern — it matters which attribute columns you ask for, because the cleaning pipeline can use Movebank’s per-fix quality information to sharpen its decisions.

The one-call workflow

mt_clean_track() runs the four primitives in turn, applies the class-aware consensus rule by default, expands topologically isolated blocks, and iterates until no new fixes are flagged. By default it returns the cleaned track ready for downstream analysis. The default consensus can be changed via consensus = — see ?mt_flag_consensus for the available modes (class_aware, strict, majority, speed_trusted, any, or custom).

clean <- mt_clean_track(cpf_a)
#> No physiological speed cap supplied -- running with a data-driven cap chosen from your track.  This works well for most cases.  If your animal has multiple behavioural states (e.g. perched and flying) or you expect sustained-spoof errors, supplying `v_max =` (a published top speed in m/s) or `(mass = ..., mode = ...)` for the allometric estimate gives sharper results.  See `?v_phys_estimate` for the allometric helper; `?mt_clean_track` documents the failure modes of the auto-cap in detail.
#> Auto-cap landed at 60.2 m/s -- above the Hirt 2017 95% upper CI of the maximum biological speed (~52.6 m/s).  The gap finder is detecting a structural break within the outlier tail. Supply `(mass, mode)` or a hard `v_max` for a principled physiological cap.  See `?v_phys_estimate`.
#> Iter 1: bridge=20 prob=5 speed=23 detour=11 (v_max=60.2) | conjunction=19 | new=19 cumulative=19
#> Iter 2: bridge=4 prob=17 speed=6 detour=5 (v_max=25.5) | conjunction=6 | new=6 cumulative=25
#> Iter 3: bridge=0 prob=12 speed=0 detour=2 (v_max=-) | conjunction=0 | new=0 cumulative=25
#> === mt_clean_track: 25 flagged (1.430% of 1748); stopped: no_new_flags ===
#>     Returning the cleaned track (1723 rows). To inspect what was flagged, re-run with remove = FALSE.

cat("Before:", nrow(cpf_a), " After:", nrow(clean),
    " Removed:", nrow(cpf_a) - nrow(clean), "\n")
#> Before: 1748  After: 1723  Removed: 25

The diagnostic plot shows the kept track and the removed fixes by detector colour. If you want to inspect which primitive caught what, set remove = FALSE to keep all rows with the per-detector flag columns attached:

flagged <- mt_clean_track(cpf_a, remove = FALSE, plot = FALSE)
#> No physiological speed cap supplied -- running with a data-driven cap chosen from your track.  This works well for most cases.  If your animal has multiple behavioural states (e.g. perched and flying) or you expect sustained-spoof errors, supplying `v_max =` (a published top speed in m/s) or `(mass = ..., mode = ...)` for the allometric estimate gives sharper results.  See `?v_phys_estimate` for the allometric helper; `?mt_clean_track` documents the failure modes of the auto-cap in detail.
#> Auto-cap landed at 60.2 m/s -- above the Hirt 2017 95% upper CI of the maximum biological speed (~52.6 m/s).  The gap finder is detecting a structural break within the outlier tail. Supply `(mass, mode)` or a hard `v_max` for a principled physiological cap.  See `?v_phys_estimate`.
#> Iter 1: bridge=20 prob=5 speed=23 detour=11 (v_max=60.2) | conjunction=19 | new=19 cumulative=19
#> Iter 2: bridge=4 prob=17 speed=6 detour=5 (v_max=25.5) | conjunction=6 | new=6 cumulative=25
#> Iter 3: bridge=0 prob=12 speed=0 detour=2 (v_max=-) | conjunction=0 | new=0 cumulative=25
#> === mt_clean_track: 25 flagged (1.430% of 1748); stopped: no_new_flags ===
#>     Returning all rows with flag columns attached. To drop flagged rows, either re-run with remove = TRUE (the default) or subset: x[!x$is_outlier, ].
table(flagged$is_outlier)
#> 
#> FALSE  TRUE 
#>  1723    25
table(bridge = flagged$flagged_by_bridge,
       detour = flagged$flagged_by_detour,
       prob   = flagged$flagged_by_prob,
       speed  = flagged$flagged_by_speed)
#> , , prob = FALSE, speed = FALSE
#> 
#>        detour
#> bridge  FALSE TRUE
#>   FALSE  1695    2
#>   TRUE      0    4
#> 
#> , , prob = TRUE, speed = FALSE
#> 
#>        detour
#> bridge  FALSE TRUE
#>   FALSE    20    2
#>   TRUE      0    0
#> 
#> , , prob = FALSE, speed = TRUE
#> 
#>        detour
#> bridge  FALSE TRUE
#>   FALSE     4    0
#>   TRUE      7    3
#> 
#> , , prob = TRUE, speed = TRUE
#> 
#>        detour
#> bridge  FALSE TRUE
#>   FALSE     1    0
#>   TRUE      9    1

The columns flagged_by_bridge, flagged_by_detour, flagged_by_prob, flagged_by_speed, flag_iteration, and block_id track the provenance of each flag. The error_class column records which class-aware rule (consensus, geometric_spike, state_anomaly, kinematic_confluence, block, physiological, state_transition_buffered) caused the flag — see vignette("OUTLIER_2_diagnose_clean_track", package = "move2utils") for the taxonomy.
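The per-detector flag columns can also be summarised row by row. The sketch below uses a small invented data frame that only borrows the four flagged_by_* column names; the values are made up, and the voting rules at the end are generic illustrations, not the package's consensus modes (those are documented in ?mt_flag_consensus and may be defined differently).

```r
## Toy stand-in for the flag columns; every value here is invented.
flags <- data.frame(
  flagged_by_bridge = c(TRUE,  TRUE,  FALSE, FALSE, TRUE),
  flagged_by_detour = c(TRUE,  FALSE, FALSE, TRUE,  TRUE),
  flagged_by_prob   = c(FALSE, FALSE, FALSE, TRUE,  TRUE),
  flagged_by_speed  = c(TRUE,  TRUE,  FALSE, FALSE, TRUE)
)

## Number of detectors that fired on each fix
n_detectors <- rowSums(flags)

## Total number of flags raised by each detector
detector_totals <- colSums(flags)

## Generic voting rules for illustration only (NOT the package's modes)
vote_any      <- n_detectors >= 1
vote_majority <- n_detectors >= 3
vote_all      <- n_detectors == ncol(flags)

print(detector_totals)
print(table(n_detectors))
```

On real output the same two calls, rowSums() and colSums(), applied to the four flagged_by_* columns of the remove = FALSE result give a quicker overview than the four-way table.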

When to supply a physiological speed cap

The default call leaves v_max = NULL, in which case the speed-cap detector chooses a data-driven threshold from a dip-test-validated entropy valley. That is safe on clean data and on most species, but it has a known sensitivity floor: long, internally coherent error clusters (typical of GPS spoofs or multi-hour position jumps) have small internal step speeds, so they are visible only at their boundaries. A single fixed physiological cap, peeled to convergence, dissolves them cleanly.
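To see why a coherent cluster is visible only at its boundaries, and why peeling to convergence removes it, consider the toy example below. It is a minimal base-R sketch on invented one-dimensional data; peel_speed_sketch() is a hypothetical greedy peel written for this illustration and is not the algorithm inside mt_peel_speed(). It also assumes the first fix of the track is good.

```r
## Toy 1-D track, one fix per second: an ordinary walk interrupted by a
## coherent three-fix cluster displaced by ~1000 m (all values invented).
tm <- 1:10
x  <- c(0, 1, 2, 3, 1000, 1001, 1002, 7, 8, 9)
v_cap <- 5  # hypothetical physiological cap in m/s

## Step speeds: only the two boundary steps exceed the cap;
## the steps inside the cluster look perfectly ordinary.
step_speed <- abs(diff(x)) / diff(tm)
over_cap_steps <- which(step_speed > v_cap)

## Greedy forward peel: drop the arrival fix of the first over-cap step,
## recompute speeds on the remaining fixes, repeat until none exceed the cap.
peel_speed_sketch <- function(tm, x, v_cap) {
  keep <- seq_along(x)
  repeat {
    v <- abs(diff(x[keep])) / diff(tm[keep])
    bad <- which(v > v_cap)
    if (length(bad) == 0L) break
    keep <- keep[-(bad[1] + 1L)]
  }
  keep
}

kept <- peel_speed_sketch(tm, x, v_cap)
removed <- setdiff(seq_along(x), kept)
print(over_cap_steps)
print(removed)
```

A single pass flags only the entry and exit steps, while the repeated peel removes the whole displaced cluster because each removal exposes the next cluster fix to the cap.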

In addition, the block-expansion step that follows the per-fix detectors uses v_max as a connectivity threshold to identify topologically-isolated outlier clusters. Block expansion only fires when the resulting component partition shows the expected “trajectory + isolated clusters” signature (a clearly dominant component plus a clean log-gap to the rest); if the data-driven cap sits inside the species’ normal flight range, block expansion would sever continuous trajectory and is automatically declined. The gate prints its decision in the running narration. To enable block expansion on a fast-flying species, supply a physiological cap via v_max = ... or the (mass, mode) allometric route below.

## Eagle physiological cap is around 30 m/s sustained;
## vulture around 25 m/s; stork around 50 m/s.
## Supply v_max when species biology gives you the number.
clean <- mt_clean_track(track, v_max = 30)

When v_max is supplied, mt_clean_track() first runs mt_peel_speed() to convergence at that cap, and then runs the remaining primitives on what survives. Use mt_suggest_speed_cap() as a diagnostic helper — it reports the entropy and gap candidates and warns when they disagree, but it deliberately does not pick v_max for you.

mt_suggest_speed_cap(track)

A principled default from body mass and locomotor mode

If you do not have a published species-specific speed cap, you can derive a principled v_max from body mass and locomotor mode using the Hirt et al. (2017) general scaling law. The helper v_phys_estimate(mass, mode) returns the predicted maximum speed in m/s (with a parameter-uncertainty confidence interval); mt_clean_track() accepts the same (mass, mode) pair directly.

## Direct call: returns a printable estimate with CI
v_phys_estimate(mass = 5, mode = "flying")   # eagle-class

## Use in the cleaning pipeline:
clean <- mt_clean_track(track, mass = 5, mode = "flying")

The diagnostic mt_suggest_speed_cap() accepts the same arguments and overlays the allometric prediction on the step-speed distribution alongside any user estimate, so all three perspectives (empirical break, allometric prediction, user-supplied v_max) appear on one plot. Disagreement between them is itself diagnostic: an empirical break far above the allometric prediction is a contamination signal, while an empirical break far below it reflects within-distribution structure (behavioural state changes, gait transitions) and is not a place to cut.

mt_suggest_speed_cap(track, mass = 5, mode = "flying", v_max = 30)

If you are uncertain about the locomotor mode (e.g. a stork that flies and walks, a seal that swims and walks), pass mass alone and the diagnostic shows all three mode-specific lines so you can read the implied cap for each.

## Compare flying / running / swimming predictions for the same mass
mt_suggest_speed_cap(track, mass = 1)

Where to find body mass for your species

Body mass is the only species-level input the allometric helper needs. A few practical routes:

Movebank reference data. If you downloaded the track from Movebank, per-individual mass is often already in the metadata table:
```
ref <- move2::mt_track_data(track)
ref$animal_mass   # when populated; check the units (Movebank's attribute dictionary specifies grams) before passing it as `mass =`
```
EltonTraits 1.0 (Wilman et al. 2014) for birds and mammals; PanTHERIA (Jones et al. 2009) for mammals; Amniote (Myhrvold et al. 2015) for birds, mammals and reptiles. All published as ESA Data Paper supplements; download the CSV once and look up by Latin name.
traitdataform R package (github.com/EcologicalTraitData/traitdataform) aggregates the above under a single schema and is the closest thing to a programmatic API.
Wikidata SPARQL endpoint is the lightest live route — query wdt:P2067 (body mass) for a species’ Wikidata Q-id. No file management, but coverage and curation vary by taxonomic group.

Curate the metadata once, save it alongside the track, and the allometric prediction follows.

Touring the primitives

Each primitive is exported on its own for users who want to inspect a single signal or build their own composition.

Bridge residual — geometric

mt_flag_outliers_bridge() scores each fix by its deviation from the time-weighted Brownian-bridge mean of its temporal neighbours, normalised by bridge width. The width depends only on timestamps, so an outlier cannot inflate its own denominator. The directional variant decomposes the residual into along-track and across-track components and classifies the kind of error at each flag. See vignette("OUTLIER_4_outlier_bridge", package = "move2utils").

br <- mt_flag_outliers_bridge(cpf_a, plot = FALSE)
#> Input is in longitude/latitude.  Auto-projecting to a local AEQD for Euclidean bridge math; output is returned in the original CRS.
#> Running bridge-residual detection (method = combined) on 1748 locations...
#>   Iter 1: flagged 6 (break at eta = 1591.91 / eta_perp = 406.83).
#>   Iter 2: flagged 7 (break at eta = 1472.59 / eta_perp = 56.83).
#>   Iter 3: flagged 7 (break at eta = 1254.78 / eta_perp = 505.75).
#> === 20 outliers (1.14% of 1748) ===
cat("bridge flagged:", sum(br$is_outlier), "\n")
#> bridge flagged: 20
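The idea behind the bridge residual can be sketched in a few lines of base R. The version below is a simplification under stated assumptions: it uses only the two immediate temporal neighbours, a textbook Brownian-bridge width with the diffusion constant dropped, and invented coordinates in projected metres. bridge_residual_sketch() is a hypothetical helper, not the code inside mt_flag_outliers_bridge().

```r
## Toy track in projected metres with irregular timestamps (values invented).
tm <- c(0, 10, 20, 30, 50, 60)
x  <- c(0, 10, 20, 530, 50, 60)
y  <- c(0,  0,  0,   0,  0,  0)

bridge_residual_sketch <- function(tm, x, y) {
  n <- length(tm)
  i <- 2:(n - 1)                        # interior fixes only
  dt_prev  <- tm[i] - tm[i - 1]
  dt_next  <- tm[i + 1] - tm[i]
  dt_total <- dt_prev + dt_next
  w <- dt_prev / dt_total               # time weight toward the next fix
  ## Time-weighted bridge mean between the two temporal neighbours
  mx <- x[i - 1] + w * (x[i + 1] - x[i - 1])
  my <- y[i - 1] + w * (y[i + 1] - y[i - 1])
  ## Bridge width (up to a constant): a function of timestamps only,
  ## so a displaced fix cannot inflate its own denominator
  width <- sqrt(dt_prev * dt_next / dt_total)
  score <- sqrt((x[i] - mx)^2 + (y[i] - my)^2) / width
  c(NA, score, NA)                      # endpoints have no bridge
}

eta <- bridge_residual_sketch(tm, x, y)
print(round(eta, 1))
```

The displaced fix gets the largest score, but its two neighbours are also pulled up because their bridges are anchored on it, which is one reason an iterative detect-and-remove loop is useful.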

Movement-metric probability — kinematic

mt_flag_outliers() scores each fix by the joint empirical probability of its step length, turning angle, and their gap-aware auto-differences. It catches fixes whose movement kinematics conflict with the animal’s usual behaviour even when the geometry alone looks plausible.

pr <- mt_flag_outliers(cpf_a, plot = FALSE)
#> Calculating movement metrics...
#> ACF-derived alpha: 0.215 (r_speed=0.816, r_angvel=0.057)
#> Calculating probability distributions...
#> Note: step-length range/IQR = 8231 is extreme;
#>   teleport-class GPS errors are better handled by
#>   mt_filter_gps_quality() (drop fixes with <5 satellites)
#>   and mt_flag_outliers_bridge() (geometric, leverage-immune).
#>   step_transform = "log" is available but can hide
#>   physiologically-plausible joint turn/step outliers.
#> Calculating joint probabilities...
#> Identifying outliers...
#> 
#> 3 locations (0.2%) have NA probabilities --will be kept.
#> === 11 outliers (0.63% of 1748) ===
cat("probability flagged:", sum(pr$is_outlier), "\n")
#> probability flagged: 11

The default threshold_type = "gap" uses a broken-stick null model on the log-probability tail; "entropy" is the conservative alternative (no-op on clean data); "significance" and "percentile" retain the older z-score and quantile thresholds.
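For readers new to the underlying metrics, the step lengths and turning angles that the probability detector scores can be computed as follows. This is a base-R sketch on an invented planar track in metres; it shows only the two raw metrics, not the package's joint-probability scoring or its gap-aware differences.

```r
## Toy planar track in metres (values invented): three steps east,
## then two steps north.
x <- c(0, 1, 2, 3, 3, 3)
y <- c(0, 0, 0, 0, 1, 2)

dx <- diff(x)
dy <- diff(y)

## Step length between consecutive fixes
step_length <- sqrt(dx^2 + dy^2)

## Heading of each step, then turning angle as the heading change,
## wrapped to (-pi, pi] so that left and right turns are symmetric
heading <- atan2(dy, dx)
turn <- diff(heading)
turn <- atan2(sin(turn), cos(turn))

print(step_length)
print(round(turn, 3))
```

A track with n fixes yields n - 1 step lengths and n - 2 turning angles, which is why the first and last fixes of a track typically lack a full set of metrics.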

Step-level speed cap — physical

mt_flag_speed_cap() flags individual steps whose implied speed exceeds a threshold. By default the threshold is data-driven (auto, dip-test-validated). Pass threshold_type = "hard" with v_max for a fixed cap; call mt_peel_speed(x, v_max) for the iterative peel that dissolves coherent clusters.

sc <- mt_flag_speed_cap(cpf_a, plot = FALSE)
#> Auto-cap landed at 60.1 m/s -- above 55.0 m/s (Hirt 2017 universal upper-CI (~52.6 m/s, fastest flier)).  The gap finder is detecting a structural break within the outlier tail rather than between bulk and outliers.  Supply `(mass, mode)` to `mt_clean_track()` (or pass a hard `v_max`) for a principled physiological cap.  See `?v_phys_estimate`.
#> Speed cap: 60.1041 m/s (auto) -- 23 fix(es) flagged.  Total is_outlier = 23 (1.316%).
#> === 23 outliers (1.32% of 1748) ===
cat("speed flagged:", sum(sc$is_outlier), "\n")
#> speed flagged: 23

Multi-scale persistence annotation and other variants

mt_persistence_score() annotates any flagger’s output with a per-flag confidence score derived from how anomalous each flagged fix looks when viewed over wider temporal windows (scales = c(2, 4, 8) by default). It does not modify is_outlier; it adds a persistence_count column ranging from 1 (flagged only at native resolution) to 4 (flagged at every validation scale). Empirical work (see vignette("OUTLIER_5_persistence_score", package = "move2utils")) shows that, on cleaning-cascade output, the score is informative only for some error classes: high persistence on state_anomaly and consensus flags is supporting evidence, whereas geometric_spike flags are already empirically pure on the synthetic data, so the score adds nothing there.

clean <- mt_clean_track(cpf_a, plot = FALSE, remove = FALSE)
#> No physiological speed cap supplied -- running with a data-driven cap chosen from your track.  This works well for most cases.  If your animal has multiple behavioural states (e.g. perched and flying) or you expect sustained-spoof errors, supplying `v_max =` (a published top speed in m/s) or `(mass = ..., mode = ...)` for the allometric estimate gives sharper results.  See `?v_phys_estimate` for the allometric helper; `?mt_clean_track` documents the failure modes of the auto-cap in detail.
#> Auto-cap landed at 60.2 m/s -- above the Hirt 2017 95% upper CI of the maximum biological speed (~52.6 m/s).  The gap finder is detecting a structural break within the outlier tail. Supply `(mass, mode)` or a hard `v_max` for a principled physiological cap.  See `?v_phys_estimate`.
#> Iter 1: bridge=20 prob=5 speed=23 detour=11 (v_max=60.2) | conjunction=19 | new=19 cumulative=19
#> Iter 2: bridge=4 prob=17 speed=6 detour=5 (v_max=25.5) | conjunction=6 | new=6 cumulative=25
#> Iter 3: bridge=0 prob=12 speed=0 detour=2 (v_max=-) | conjunction=0 | new=0 cumulative=25
#> === mt_clean_track: 25 flagged (1.430% of 1748); stopped: no_new_flags ===
#>     Returning all rows with flag columns attached. To drop flagged rows, either re-run with remove = TRUE (the default) or subset: x[!x$is_outlier, ].
annotated <- mt_persistence_score(clean, silent = TRUE)
flagged <- which(annotated$is_outlier)
print(table(annotated$persistence_count[flagged]))
#> 
#>  3  4 
#>  1 24

mt_sequential_outliers() and mt_combined_outliers() provide sequential-scan and majority-vote variants of the probability detector; see their reference pages for details.

Suggested workflow

1. Load your move2 object; drop empty geometries and duplicates.
2. Run mt_filter_gps_quality() if your tag exposes satellite/DOP/horizontal-accuracy columns.
3. Run mt_clean_track() with default settings.
4. If you have a species physiological speed cap, call mt_clean_track(x, v_max = ...); alternatively, supply (mass, mode) for an allometric prior.
5. Inspect the diagnostic plot; refine with the per-primitive functions only if the unified call misses something specific.
6. Run mt_diagnose_clean_track() on the result to check whether the run was healthy. The diagnostic surfaces patterns that the single-figure plot of the cleaned track does not (bimodal behaviour, sustained migration bands, non-converging iteration) and tells you when to escalate to state-conditional analysis.
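For the first step, duplicate fixes are usually defined as rows sharing a track identifier and a timestamp. The base-R sketch below shows that check on an invented table; the column names track_id and timestamp are hypothetical stand-ins, and on a real move2 object you would apply the same logic to its own track and time columns.

```r
## Toy fix table; column names and values are invented for illustration.
fixes <- data.frame(
  track_id  = c("A", "A", "A", "B", "B"),
  timestamp = as.POSIXct(c("2024-01-01 00:00:00", "2024-01-01 00:10:00",
                           "2024-01-01 00:10:00", "2024-01-01 00:00:00",
                           "2024-01-01 00:10:00"), tz = "UTC"),
  x = c(0, 5, 5, 100, 105)
)

## A row is a duplicate if an earlier row has the same track and timestamp
is_dup  <- duplicated(fixes[, c("track_id", "timestamp")])
deduped <- fixes[!is_dup, ]

cat(sum(is_dup), "duplicate fix(es) dropped;", nrow(deduped), "remain\n")
```

Keeping the first occurrence is the simplest policy; if duplicated timestamps carry different coordinates, inspect them before deciding which copy to keep.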

Health-check after cleaning — `mt_diagnose_clean_track()`

res  <- mt_clean_track(track, mass = 3.4, mode = "flying",
                         plot = FALSE, remove = FALSE)
diag <- mt_diagnose_clean_track(res)

The diagnostic draws six panels for a single track, or six panels plus a cohort overview for multiple tracks, that let you eyeball whether the run was healthy. Each panel that trips a concern prints a one-line interpretive note pointing at the appropriate remedy. See vignette("OUTLIER_2_diagnose_clean_track", package = "move2utils") for a full walkthrough on real data.

Appendix: downloading your data from Movebank

If you’re pulling your track from Movebank, a little care at the download step pays off later: the cleaning pipeline can use Movebank’s per-fix quality information (how many satellites the fix used, how confident the tag was, what kind of fix it was) to make sharper decisions about which fixes to drop. The good news is you don’t need to do anything fancy — you just need to ask for those columns explicitly when you download.

Principle: if Movebank has the per-fix quality information, ask for it. The pipeline uses every quality column it can find — satellite count, DOP (a precision-of-fix indicator), fix type, device-reported horizontal accuracy, Argos location class. Each one independently sharpens the decision on borderline fixes. If a column is missing, the pipeline still runs — mt_filter_gps_quality() skips any criterion whose column it can’t find, and the bridge detector falls back to a user-supplied location_error — but quality-informed flags are always more reliable than geometry-only flags. The cost of the extra columns is small in bytes and zero in latency since the API streams them in the same request.

Two things to ask for at download time:

Restrict to GPS only via sensor_type_id = "gps". Many tags (e-obs in particular) emit accelerometer bursts as separate records with empty geometries; without this restriction, the download for a small bird can be ~5× larger than the GPS data alone, with about 80% of the rows empty.
List the quality columns explicitly in attributes = c(...). While movebank_download_study() returns a sensible default attribute set, naming the quality columns by hand makes the data dependency auditable and protects against studies whose default attributes have been customised away from Movebank’s canonical schema.

The columns move2utils recognises out of the box:

| Movebank canonical column | What it represents | Used by |
|---|---|---|
| `gps_satellite_count` / `gnss_satellite_count` | number of satellites used in the fix | `mt_filter_gps_quality()`, `mt_flag_outliers(quality_columns = ...)` |
| `gps_hdop` / `gnss_hdop` | horizontal dilution of precision | filter + probability weight |
| `gps_pdop` / `gnss_pdop` | 3D positional dilution of precision | filter + probability weight |
| `gps_dop` / `gnss_dop` | unspecified DOP (for tags that emit a single number) | filter + probability weight |
| `gps_fix_type` / `gnss_fix_type` | 2D vs 3D fix | filter (drops 2D) + weight |
| `eobs_horizontal_accuracy_estimate` | per-fix 1-σ horizontal accuracy in metres (e-obs tags) | filter + weight + bridge anchor sigma |
| `argos_lc` | Argos location class (G, 3, 2, 1, 0, A, B, Z) | filter + weight + bridge sigma via the CLS table |
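Before running the pipeline it can be useful to check which of these columns actually arrived in your download. The sketch below is a base-R check on a plain vector of column names; the slot names and the helper resolve_quality_columns() are hypothetical, and the only rule taken from this vignette is that the gnss_* name is preferred when both prefixes are present.

```r
## Candidate column names per quality slot, GNSS listed first so it wins.
## Slot names (left-hand side) are invented for this illustration.
known_slots <- list(
  satellite_count = c("gnss_satellite_count", "gps_satellite_count"),
  hdop            = c("gnss_hdop", "gps_hdop"),
  pdop            = c("gnss_pdop", "gps_pdop"),
  dop             = c("gnss_dop", "gps_dop"),
  fix_type        = c("gnss_fix_type", "gps_fix_type"),
  eobs_accuracy   = "eobs_horizontal_accuracy_estimate",
  argos_lc        = "argos_lc"
)

## Return the first candidate found for each slot, or NA if none is present
resolve_quality_columns <- function(col_names, slots) {
  vapply(slots, function(cands) {
    hit <- cands[cands %in% col_names]
    if (length(hit) > 0L) hit[1] else NA_character_
  }, character(1))
}

## Example column names as they might appear in a downloaded track
cols  <- c("timestamp", "gps_hdop", "gnss_hdop",
           "gps_satellite_count", "argos_lc")
found <- resolve_quality_columns(cols, known_slots)
print(found)
```

On a real object, names(track) can be passed as col_names; slots that come back NA are the criteria the pipeline would have to skip.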

The recommended download pattern:

## download all attributes available in the study
track <- move2::movebank_download_study(
  study_id,
  sensor_type_id = "gps",
  attributes = "all")

## or select specific attributes to download
track <- move2::movebank_download_study(
  study_id,
  sensor_type_id = "gps",
  attributes = c(
    ## minimum core (always retrieved, listed for clarity)
    "timestamp", "location_lat", "location_long",
    "individual_local_identifier",
    ## quality columns consumed by mt_filter_gps_quality() and
    ## the quality_columns weighting in mt_flag_outliers()
    "gps_satellite_count",
    "gps_hdop",
    "gps_pdop",
    "gps_dop",
    "gps_fix_type",
    ## per-fix horizontal accuracy in metres (e-obs tags); also
    ## consumed by the bridge detector as its anchor sigma
    "eobs_horizontal_accuracy_estimate",
    ## Argos location class (when relevant)
    "argos_lc"
  )
)

## For modern multi-constellation tags that emit gnss_* names, swap
## the gps_* prefix above for gnss_*.  Where both prefixes are
## present in a study, GNSS is preferred.

## If you just want a quick look without quality information, pass
## `attributes = NULL` for the absolute minimum (timestamp,
## location, track id).  The pipeline will then run in degraded
## mode -- mt_filter_gps_quality() becomes a no-op and the bridge
## falls back to your `location_error` argument.

Heads-up on the first download from a study. Movebank requires you to accept each study’s licence before it returns data. On a first call to a study you haven’t downloaded before, the request above will fail and the error message will include the full licence terms, ending with a line like:

'license-md5'='306ac0a2292eb02b9b42d1b5faeca786'

(the exact hash depends on the study). Copy that string back into your call as a named argument:

track <- movebank_download_study(
  study_id, sensor_type_id = "gps", attributes = c(...),
  "license-md5" = "306ac0a2292eb02b9b42d1b5faeca786"
)

The quoting around "license-md5" is there because the name contains a hyphen. You only need to do this once per study; future calls remember the acceptance for that login.

If you can’t restrict the attributes at download time and the object arrives with empty rows, drop them before passing to the cleaning pipeline: x <- x[!sf::st_is_empty(x), ].

Non-Movebank vendor data

If your data comes from a vendor whose column names differ from the Movebank canonical schema (Vectronic, Lotek, Telonics, Sirtrack, custom feeds), the auto-detection above won’t find your quality columns and the cleaning pipeline runs in degraded mode (it’ll still work, just without the extra information). A unified quality-column resolver — slot-based, with a user-supplied alias map — is in design (see DESIGN_quality_columns.md at the package root); for the current release the workaround is to either rename the vendor columns to the Movebank canonical names before running mt_clean_track(), or pass per-fix quality functions directly via mt_flag_outliers(quality_columns = list(...)).
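The renaming workaround can be done with a small alias map. The sketch below is base R on an invented data frame: the vendor column names (SatCount, HDOP, FixDim) are hypothetical and stand in for whatever your export uses, while the target names are the Movebank canonical ones listed in the table above. It renames columns only; whether the values (for example the fix-type coding) also need translating depends on the vendor and is not covered here.

```r
## Toy vendor export; column names and values are invented.
vendor <- data.frame(
  SatCount = c(7, 4, 9),
  HDOP     = c(1.2, 6.5, 0.9),
  FixDim   = c("3D", "2D", "3D"),
  Temp     = c(21.5, 21.7, 21.6)
)

## Hypothetical vendor name -> Movebank canonical name
alias_map <- c(
  SatCount = "gps_satellite_count",
  HDOP     = "gps_hdop",
  FixDim   = "gps_fix_type"
)

## Rename only the columns that appear in the map; leave the rest untouched
hits <- names(vendor) %in% names(alias_map)
names(vendor)[hits] <- unname(alias_map[names(vendor)[hits]])

print(names(vendor))
```

Keeping the map in a script next to the data makes the renaming reproducible and easy to audit when the vendor changes its export format.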