Cleaning a white stork GPS track: a narrated pipeline
This vignette walks through a full outlier-cleaning pipeline on a real high-frequency GPS track: Pettstadt1, a juvenile white stork (Ciconia ciconia) from the LifeTrack White Stork Bavaria study on Movebank (study ID 24442409). The track has 49,263 fixes between June 2024 and April 2026, with several classes of error: teleport-class GPS failures, low-satellite fixes, borderline movement-metric outliers within Europe, and the usual crop of empty geometries and duplicate timestamps from download artefacts.
The pipeline is four stages, and each stage eats a distinct class of error:
- Structural cleaning: empty geometries, duplicate timestamps.
- GPS-quality pre-filter: mt_filter_gps_quality() removes fixes with too few satellites, high DOP, or poor eObs horizontal accuracy. The canonical first line of defence.
- Geometric outlier detection: mt_flag_outliers_bridge() scores each fix against its Brownian-bridge neighbours. Immune to the leverage problem of locally-estimated variance; catches teleport-class errors that survive the quality filter.
- Movement-metric outlier detection: mt_flag_outliers() scores each fix by the joint probability of its step length, turning angle, and their gap-aware auto-differences. Catches borderline fixes that are geometrically plausible but move inconsistently with the animal’s normal kinematics.
Each stage is timed so the computational cost is visible alongside
the ecological side. Stages 3 and 4 are also wrapped in the unified
mt_clean_track() entry-point, demonstrated at the end of
this vignette.
Meet the bird
Pettstadt1 (AHH16, eObs tag 14053) is a white stork
nestling that fledged from a nest near Pettstadt, Bavaria. The tag was
deployed on 2024-06-13 by Wolfgang Fiedler (Max Planck Institute of
Animal Behavior). A representative subset of the track is bundled with
move2utils as
inst/extdata/Pettstadt1-14053.csv.gz. The full track can be
downloaded live with:
library(move2)
movebank_store_credentials()  # run once, interactive
# Stored as `raw` because Stage 1 below starts from `raw`.
raw <- movebank_download_study(
  study_id = 24442409,
  sensor_type_id = "gps",
  individual_local_identifier = "Pettstadt1 (AHH16, 14053)"
)
# raw now carries the Movebank columns we need for quality filtering:
# gps_satellite_count, gps_dop, eobs_horizontal_accuracy_estimate
White storks from Bavaria are partial migrants. Expected geography: central Europe in summer and autumn, with some individuals wintering in Iberia, North Africa, or the Sahel. Anything far outside that envelope is not a stork; it is a fix error.
Stage 1 — Structural cleaning
Empty geometries (GPS timestamp written without a position lock) and duplicate timestamps (Movebank download resume artefact) must be removed before any downstream step can work. Nothing clever here — just mechanical cleanup.
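As a package-free illustration of the bookkeeping, the sketch below applies the same two rules to a toy table in base R. The data frame `toy` and its columns are made up for illustration; `NA` coordinates stand in for empty geometries, and keeping the first record per timestamp is assumed to mirror `criterion = "first"`.

```r
# Toy track: NA coordinates play the role of empty geometries.
toy <- data.frame(
  timestamp = as.POSIXct("2024-06-13 12:00:00", tz = "UTC") +
    c(0, 300, 300, 600, 900, 900),
  lon = c(10.90, 10.91, 10.91, NA, 10.93, 10.94),
  lat = c(49.83, 49.84, 49.84, NA, 49.86, 49.87)
)

toy_n0 <- nrow(toy)

# Rule 1: drop fixes without a position.
has_position <- !is.na(toy$lon) & !is.na(toy$lat)
toy_pos <- toy[has_position, ]
toy_n_empty <- toy_n0 - nrow(toy_pos)

# Rule 2: keep only the first record for each timestamp.
toy_clean <- toy_pos[!duplicated(toy_pos$timestamp), ]
toy_n_dup <- toy_n0 - toy_n_empty - nrow(toy_clean)

cat(sprintf("dropped %d empty + %d duplicate; kept %d\n",
            toy_n_empty, toy_n_dup, nrow(toy_clean)))
#> dropped 1 empty + 2 duplicate; kept 3
```

The duplicate count is obtained by subtraction, exactly as in the pipeline, so the three numbers always sum to the input row count.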
library(sf)  # st_is_empty()
n0 <- nrow(raw)
x <- raw[!st_is_empty(raw), ]                  # drop fixes without a position
n_empty <- n0 - nrow(x)
x <- mt_filter_unique(x, criterion = "first")  # keep the first of each duplicate set
n_dup <- n0 - n_empty - nrow(x)
cat(sprintf("dropped %d empty + %d duplicate; kept %d\n",
            n_empty, n_dup, nrow(x)))
#> dropped 872 empty + 152 duplicate; kept 48239
Stage 2 — GPS-quality pre-filter
GPS positioning needs at least four satellites for a full 3D fix, and
fixes at the four-satellite boundary have pathological error geometry:
they frequently land thousands of kilometres from the true position.
The standard telemetry-practice threshold is
sat >= 5. DOP and the horizontal-accuracy estimate add
independent evidence when the columns are available.
xq <- mt_filter_gps_quality(x)
A quick before/after map shows how much these low-quality fixes matter in practice.
library(ggplot2)
library(rnaturalearth)  # ne_countries()
world <- ne_countries(scale = "small", returnclass = "sf")
before_sf <- x
after_sf <- xq
world_x <- c(-100, 80)
world_y <- c(-25, 80)
europe_x <- c(-15, 45)
europe_y <- c(30, 55)
p_before <- ggplot() +
  geom_sf(data = world, fill = "grey95", colour = "grey80",
          linewidth = 0.2) +
  geom_sf(data = before_sf, size = 1, alpha = 0.9,
          colour = "firebrick") +
  coord_sf(xlim = world_x, ylim = world_y) +
  labs(title = "Before filter", x = NULL, y = NULL) +
  theme_minimal(base_size = 10)
p_after <- ggplot() +
  geom_sf(data = world, fill = "grey95", colour = "grey80",
          linewidth = 0.2) +
  geom_sf(data = after_sf, size = 0.5, alpha = 0.8,
          colour = "steelblue4") +
  geom_rect(aes(xmin = europe_x[1], xmax = europe_x[2],
                ymin = europe_y[1], ymax = europe_y[2]),
            fill = NA, colour = "black", linewidth = 0.3) +
  coord_sf(xlim = world_x, ylim = world_y) +
  labs(title = "After sat>=5, DOP<=10, hacc<=100m",
       x = NULL, y = NULL) +
  theme_minimal(base_size = 10)
p_after_zoom <- ggplot() +
  geom_sf(data = world, fill = "grey95", colour = "grey80",
          linewidth = 0.25) +
  geom_sf(data = after_sf, size = 0.5, alpha = 0.5,
          colour = "steelblue4") +
  coord_sf(xlim = europe_x, ylim = europe_y) +
  labs(title = "After filter, zoomed to Europe",
       x = NULL, y = NULL) +
  theme_minimal(base_size = 10)
if (requireNamespace("patchwork", quietly = TRUE)) {
  patchwork::wrap_plots(p_before, p_after, p_after_zoom, ncol = 3)
} else {
  print(p_before); print(p_after); print(p_after_zoom)
}
Coordinate extent before and after mt_filter_gps_quality(). Left: raw fixes span three continents from a handful of low-satellite teleports. Middle: same extent, after filtering, dots have collapsed onto Europe. Right: the same retained fixes zoomed to the European flyway, showing the actual cleaned-track geography the reader can check against known stork distribution.
The extreme geographic range collapses onto the expected European and Mediterranean envelope; the zoomed panel resolves the structure that the world-scale view compresses into dots and is the first view in which the track’s shape is actually legible. The filter is doing most of its work on low-satellite fixes; DOP and hacc contribute smaller increments.
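The thresholds named in the middle panel title (sat >= 5, DOP <= 10, hacc <= 100 m) amount to a row-wise conjunction. A minimal base-R sketch on a made-up table follows. It assumes that a missing quality value should not by itself cause a drop; that NA policy is an assumption of this sketch, not a documented behaviour of mt_filter_gps_quality().

```r
# Made-up quality columns for five fixes.
toy_q <- data.frame(
  gps_satellite_count = c(9, 4, 7, 6, NA),
  gps_dop = c(1.8, 2.1, 14.0, 3.0, 2.5),
  eobs_horizontal_accuracy_estimate = c(6, 12, 20, 250, 15)
)

# TRUE when a test passes or its input is missing (assumed NA policy).
passes <- function(ok) is.na(ok) | ok

keep <- passes(toy_q$gps_satellite_count >= 5) &
  passes(toy_q$gps_dop <= 10) &
  passes(toy_q$eobs_horizontal_accuracy_estimate <= 100)

toy_q[keep, ]  # rows 1 and 5 survive
```

Rows 2, 3, and 4 each fail exactly one test (satellites, DOP, and horizontal accuracy respectively), which is a convenient way to count how much each criterion contributes on its own.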
Stage 3 — Geometric outlier detection (bridge)
mt_flag_outliers_bridge() scores each fix by its
deviation from the time-weighted Brownian-bridge mean of its temporal
neighbours, normalised by the bridge width (which depends only on
timestamps, not on positions). Because the width is leverage-immune, an
outlier cannot inflate its own denominator — the primitive finds
geometric outliers that survive the quality filter.
The default method = "combined" computes both the scalar
residual (dBBMM) and the orthogonal-component residual (dBGB) in one
pass, applies the threshold to each score independently, and unions the
flags. On synthetic ground-truth benchmarks this Pareto-dominates the
single-score variants.
x_utm <- st_transform(xq, 32632)  # UTM 32N, central Europe
r_br <- mt_flag_outliers_bridge(x_utm,
                                threshold_type = "entropy",
                                iterations = 1,
                                plot = FALSE)
cat(sprintf("bridge flagged %d of %d fixes\n",
            sum(r_br$is_outlier), nrow(r_br)))
#> bridge flagged 6 of 40579 fixes
The bridge_eta, bridge_eta_para, and
bridge_eta_perp columns carry the scalar, parallel, and
perpendicular scores for every fix, which is useful for diagnosing the
error morphology later.
r_br_ll <- st_transform(r_br, 4326)
r_br_ll$log_eta <- log10(pmax(r_br_ll$bridge_eta, 1e-10) + 1)
ggplot() +
  geom_sf(data = world, fill = "grey95", colour = "grey80",
          linewidth = 0.15) +
  geom_sf(data = r_br_ll, aes(colour = log_eta),
          size = 0.3, alpha = 0.6) +
  geom_sf(data = r_br_ll[r_br_ll$is_outlier, ],
          shape = 1, colour = "black", size = 2.2, stroke = 0.4) +
  scale_colour_viridis_c(option = "viridis",
                         name = expression(log[10](1 + eta))) +
  coord_sf(xlim = c(-15, 45), ylim = c(10, 60)) +
  labs(title = "mt_flag_outliers_bridge() — combined method, entropy",
       x = NULL, y = NULL) +
  theme_minimal(base_size = 10)
Bridge-residual score, computed in UTM 32N and plotted in longitude/latitude. Flagged fixes (black circles) concentrate on points spatially inconsistent with their neighbours given the local sampling cadence.
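To make the idea of a timestamp-only bridge width concrete, here is a minimal three-point sketch in base R. It is a textbook Brownian-bridge residual under stated assumptions (only the two adjacent fixes as neighbours, unit diffusion, no thresholding) and not the implementation inside mt_flag_outliers_bridge(). The helper `bridge_scores()` and its output column names are hypothetical.

```r
# Hypothetical helper: three-point Brownian-bridge residuals for planar
# coordinates (metres) and timestamps (seconds). Unit diffusion is assumed.
bridge_scores <- function(x, y, t) {
  n <- length(t)
  i <- 2:(n - 1)                      # interior fixes only
  span  <- t[i + 1] - t[i - 1]        # time between the two neighbours
  alpha <- (t[i] - t[i - 1]) / span   # relative position in that interval

  # Time-weighted bridge mean between the neighbours.
  mu_x <- x[i - 1] + alpha * (x[i + 1] - x[i - 1])
  mu_y <- y[i - 1] + alpha * (y[i + 1] - y[i - 1])

  # Bridge width: a function of timestamps only, never of positions.
  width <- sqrt(span * alpha * (1 - alpha))

  # Residual, split along and across the neighbour-to-neighbour chord.
  rx <- x[i] - mu_x
  ry <- y[i] - mu_y
  chord <- sqrt((x[i + 1] - x[i - 1])^2 + (y[i + 1] - y[i - 1])^2)
  ux <- (x[i + 1] - x[i - 1]) / chord
  uy <- (y[i + 1] - y[i - 1]) / chord

  data.frame(
    fix      = i,
    eta      = sqrt(rx^2 + ry^2) / width,
    eta_para = abs(rx * ux + ry * uy) / width,
    eta_perp = abs(rx * uy - ry * ux) / width
  )
}

# Straight eastward track at 10 m/s, sampled every 300 s, with fix 4
# displaced 50 km to the north.
secs  <- seq(0, by = 300, length.out = 7)
east  <- 10 * secs
north <- rep(0, 7)
north[4] <- 50000

scores <- bridge_scores(east, north, secs)
scores
```

Fix 4 receives the largest score and its residual is entirely perpendicular to the chord. Its two neighbours also score highly in this sketch, because each of them uses the displaced fix as a bridge endpoint; the denominator, however, is unchanged by the displacement, which is the leverage-immunity property described above.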
Stage 4 — Movement-metric outlier detection
mt_flag_outliers() complements the bridge primitive by
scoring each fix against the joint empirical distribution of step
length, turning angle, and their gap-aware auto-differences. It catches
fixes whose movement kinematics (speeds, turns, accelerations) are
incompatible with the animal’s usual behaviour, even when the point is
geometrically plausible.
Dropping the bridge-flagged fixes before running the probability primitive is good practice: they distort the empirical distribution the histogram is built from.
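The quantities involved can be sketched in base R. The example below computes step lengths and turning angles from planar coordinates, then scores each step by the relative frequency of its histogram bin. It is a deliberately reduced illustration: it uses only the step-length marginal with ten equal-width bins, whereas the joint distribution and gap-aware auto-differences used by mt_flag_outliers() are not reproduced. The helper `step_turn()` is hypothetical.

```r
# Hypothetical helper: step lengths and turning angles from planar
# coordinates. Turning angle is the change in heading, wrapped to [-pi, pi].
step_turn <- function(x, y) {
  dx <- diff(x)
  dy <- diff(y)
  heading <- atan2(dy, dx)
  turn <- diff(heading)
  list(step = sqrt(dx^2 + dy^2),
       turn = atan2(sin(turn), cos(turn)))
}

# Simulated track: steady eastward drift with one displaced fix.
set.seed(1)
n <- 200
east  <- cumsum(rnorm(n, mean = 100, sd = 10))
north <- cumsum(rnorm(n, mean = 0, sd = 10))
east[120] <- east[120] + 5000

m <- step_turn(east, north)

# Empirical probability of each step: relative frequency of its bin.
bins   <- cut(m$step, breaks = 10, include.lowest = TRUE)
counts <- tabulate(as.integer(bins), nbins = nlevels(bins))
p_step <- counts[as.integer(bins)] / length(m$step)

# Steps into and out of the displaced fix fall in a sparsely filled bin.
which(p_step < 0.02)
```

The two low-probability steps are 119 and 120, the steps arriving at and leaving fix 120. This also shows why extreme fixes are worth removing first: a single 5 km step stretches the bin range so far that all ordinary steps collapse into one bin.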
xp <- r_br[!r_br$is_outlier, ]  # bridge-cleaned
xp <- st_transform(xp, 4326)    # probability primitive is CRS-agnostic
r_prob <- mt_flag_outliers(xp, threshold_type = "entropy",
                           iterations = 1, plot = FALSE)
cat(sprintf("probability flagged %d of %d fixes\n",
            sum(r_prob$is_outlier), nrow(r_prob)))
#> probability flagged 3 of 40573 fixes
Cleaned track
The union of the bridge and probability flags defines the removed set. Plotting the kept track alongside the removed fixes on a European extent shows the scope of what the pipeline caught.
## Fixes dropped by the quality filter (stage 2), recovered by matching
## geometries. The mapping is approximate, because fixes that share
## identical coordinates are matched only once. It is not drawn on the map
## below; per-stage counts come from row-count accounting in the summary.
quality_removed <- x[!seq_len(nrow(x)) %in%
                       match(st_geometry(xq), st_geometry(x)), ]
kept <- r_prob[!r_prob$is_outlier, ]
p_kept <- ggplot() +
  geom_sf(data = world, fill = "grey95", colour = "grey80",
          linewidth = 0.2) +
  geom_sf(data = kept, size = 0.2, alpha = 0.4,
          colour = "steelblue4") +
  coord_sf(xlim = c(-15, 40), ylim = c(25, 60)) +
  labs(title = sprintf("Cleaned track (n = %d)", nrow(kept)),
       x = NULL, y = NULL) +
  theme_minimal(base_size = 10)
removed_bridge <- r_br[r_br$is_outlier, ]
removed_prob <- r_prob[r_prob$is_outlier, ]
p_removed <- ggplot() +
  geom_sf(data = world, fill = "grey95", colour = "grey80",
          linewidth = 0.2) +
  geom_sf(data = st_transform(removed_bridge, 4326),
          size = 1.2, colour = "firebrick") +
  geom_sf(data = removed_prob,
          size = 1.2, colour = "orange") +
  coord_sf(xlim = c(-15, 40), ylim = c(25, 60)) +
  labs(title = sprintf("Removed by bridge (red, n=%d) and probability (orange, n=%d)",
                       nrow(removed_bridge), nrow(removed_prob)),
       x = NULL, y = NULL) +
  theme_minimal(base_size = 10)
if (requireNamespace("patchwork", quietly = TRUE)) {
  patchwork::wrap_plots(p_kept, p_removed, ncol = 2)
} else {
  print(p_kept); print(p_removed)
}
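Left: retained fixes after the full pipeline. Right: fixes removed by stages 3 and 4, coloured by the stage that caught them (bridge in red, probability in orange). Fixes removed by the stage 2 quality filter are not drawn.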
The one-call alternative — mt_clean_track()
The pipeline above narrates each primitive explicitly so the reader
can see what is being done and why. For routine cleaning, the unified
mt_clean_track() entry-point composes all four primitives
behind a single call. It runs the bridge, detour, probability, and
speed-cap detectors, applies the class-aware consensus rule by default
(configurable via consensus =; see
?mt_flag_consensus), expands topologically isolated blocks,
and iterates until convergence.
No white stork sustains ground speeds anywhere near 50 m/s (180 km/h), so supplying that value as a generous physiological cap lets the speed primitive peel off any spoof- or teleport-class clusters cleanly before the bridge and probability detectors run on the survivors.
clean_one_call <- mt_clean_track(xq, v_max = 50, plot = FALSE)
cat(sprintf("mt_clean_track kept %d of %d fixes\n",
            nrow(clean_one_call), nrow(xq)))
#> mt_clean_track kept 40244 of 40579 fixes
The kept set should be close to what the staged pipeline retains (40,244 versus 40,570 fixes here). It is not identical: the one-call version also runs the detour and speed-cap detectors, and the class-aware consensus and block expansion give it slightly different recall characteristics on edge cases. Use whichever style matches your taste: the staged version when you want to inspect each signal, the one-call version when you trust the defaults and want to move on.
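The physiological speed cap can be illustrated in a few lines of base R. The sketch flags a fix only when both the speed arriving at it and the speed leaving it exceed v_max, which is one common convention for isolating a single displaced fix. How mt_clean_track() applies v_max internally is not shown in this vignette, so the rule and the helper `flag_speed_cap()` are assumptions made for illustration.

```r
# Hypothetical helper: flag fixes whose incoming AND outgoing speeds both
# exceed v_max (m/s). Coordinates in metres, timestamps in seconds.
flag_speed_cap <- function(x, y, t, v_max) {
  v <- sqrt(diff(x)^2 + diff(y)^2) / diff(t)  # speed of each step
  v_in  <- c(NA, v)                           # speed arriving at each fix
  v_out <- c(v, NA)                           # speed leaving each fix
  too_fast <- function(s) !is.na(s) & s > v_max
  too_fast(v_in) & too_fast(v_out)
}

# Steady 12 m/s eastward track, 300 s cadence, with fix 3 thrown 100 km east.
secs  <- seq(0, by = 300, length.out = 6)
east  <- 12 * secs
north <- rep(0, 6)
east[3] <- east[3] + 100000

flag <- flag_speed_cap(east, north, secs, v_max = 50)
flag
#> [1] FALSE FALSE  TRUE FALSE FALSE FALSE
```

Requiring both speeds to exceed the cap keeps the honest neighbours of a teleport from being flagged along with it. The cost is that the first and last fix of a track can never be flagged by this rule, and duplicate timestamps must already have been removed to avoid a zero time difference.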
Composition summary
tab <- data.frame(
  stage = c("1. structural cleaning",
            "2. GPS-quality filter",
            "3. bridge (combined)",
            "4. probability (entropy)"),
  n_in  = c(n0, nrow(x), nrow(xq), nrow(xp)),
  n_out = c(nrow(x), nrow(xq), sum(!r_br$is_outlier),
            sum(!r_prob$is_outlier))
)
tab$n_dropped <- tab$n_in - tab$n_out
print(tab, row.names = FALSE)
#>                     stage  n_in n_out n_dropped
#>    1. structural cleaning 49263 48239      1024
#>     2. GPS-quality filter 48239 40579      7660
#>      3. bridge (combined) 40579 40573         6
#>  4. probability (entropy) 40573 40570         3
Each stage handles a distinct class of error:
- Structural cleaning is O(n) and mechanical.
- The GPS-quality filter is O(n) and kills the most pathological fixes at source — often the difference between a usable and unusable track.
- The bridge primitive is O(n) with a one-pass neighbour scan.
- The probability primitive is O(n) with a fully vectorised KDE
lookup; a cap on the 2D turn/step histogram bin count keeps the
terra raster operations linear in n even on heavy-tailed step distributions.
Notes on reproducibility
- The bundled CSV was produced by filtering the Movebank download to a stable subset of columns. All three GPS-quality columns (gps-satellite-count, gps-dop, eobs-horizontal-accuracy-estimate) are preserved.
- The full Movebank study can be re-downloaded with move2::movebank_download_study(24442409, individual_local_identifier = "Pettstadt1 (AHH16, 14053)") after running move2::movebank_store_credentials() once.
- Study acknowledgement: LifeTrack White Stork Bavaria, data contributed by Wolfgang Fiedler and colleagues, Max Planck Institute of Animal Behavior.
Further reading
- vignette("OUTLIER_1_getting_started", package = "move2utils"): the unified mt_clean_track() workflow and a brief tour of all four primitives.
- vignette("OUTLIER_2_diagnose_clean_track", package = "move2utils"): the post-run health check.
- vignette("OUTLIER_3_state_conditional", package = "move2utils"): when the diagnostic flags bimodal behaviour, the recipe for cleaning each behavioural state separately.
- vignette("OUTLIER_4_outlier_bridge", package = "move2utils"): the bridge primitive and the directional error-morphology classifier; for users who want fine-grained control over just one detector.
- vignette("OUTLIER_5_persistence_score", package = "move2utils"): multi-scale annotation that scores how confidently each flag is an outlier; useful as a post-cleaning confidence filter.
- vignette("OUTLIER_heterogeneous_error_regimes", package = "move2utils"): outlier detection with heterogeneous error regimes, one sensor at a time.
- vignette("OUTLIER_example_leo_migration", package = "move2utils"): outlier detection on irregular, large-scale satellite data.