
This vignette walks through a full outlier-cleaning pipeline on a real high-frequency GPS track: Pettstadt1, a juvenile white stork (Ciconia ciconia) from the LifeTrack White Stork Bavaria study on Movebank (study ID 24442409). The track has 49,263 fixes between June 2024 and April 2026, with several classes of error: teleport-class GPS failures, low-satellite fixes, borderline movement-metric outliers within Europe, and the usual crop of empty geometries and duplicate timestamps from download artefacts.

The pipeline is four stages, and each stage eats a distinct class of error:

  1. Structural cleaning: empty geometries, duplicate timestamps.
  2. GPS-quality pre-filter. mt_filter_gps_quality() removes fixes with too few satellites, high DOP, or poor eObs horizontal accuracy. This is the canonical first line of defence.
  3. Geometric outlier detection. mt_flag_outliers_bridge() scores each fix against its Brownian-bridge neighbours. It is immune to the leverage problem of locally estimated variance and catches teleport-class errors that survive the quality filter.
  4. Movement-metric outlier detection. mt_flag_outliers() scores each fix by the joint probability of its step length, turning angle, and their gap-aware auto-differences. It catches borderline fixes that are geometrically plausible but move inconsistently with the animal's normal kinematics.

The composition summary at the end notes the computational cost of each stage alongside its ecological role. Stages 3 and 4 are also wrapped in the unified mt_clean_track() entry-point, demonstrated at the end of this vignette.

Meet the bird

Pettstadt1 (AHH16, eObs tag 14053) is a white stork nestling that fledged from a nest near Pettstadt, Bavaria. The tag was deployed on 2024-06-13 by Wolfgang Fiedler (Max Planck Institute of Animal Behavior). The track, reduced to a stable subset of columns, is bundled with move2utils as inst/extdata/Pettstadt1-14053.csv.gz. The full record can be downloaded live with:

library(move2)
movebank_store_credentials("your_username")  # run once, interactive; prompts for the password
x <- movebank_download_study(
  study_id = 24442409,
  sensor_type_id = "gps",
  individual_local_identifier = "Pettstadt1 (AHH16, 14053)"
)
# x now carries the Movebank columns we need for quality filtering:
# gps_satellite_count, gps_dop, eobs_horizontal_accuracy_estimate

White storks from Bavaria are partial migrants. Expected geography: central Europe in summer and autumn, with some individuals wintering in Iberia, North Africa, or the Sahel. Anything far outside that envelope is not a stork — it is a fix error.

Setup

library(move2)
library(sf)
library(ggplot2)
library(rnaturalearth)
library(move2utils)

f <- system.file("extdata", "Pettstadt1-14053.csv.gz",
                  package = "move2utils")
raw <- mt_read(f)
cat(sprintf("raw: %d fixes, %d columns\n", nrow(raw), ncol(raw)))
#> raw: 49263 fixes, 10 columns

Stage 1 — Structural cleaning

Empty geometries (GPS timestamp written without a position lock) and duplicate timestamps (Movebank download resume artefact) must be removed before any downstream step can work. Nothing clever here — just mechanical cleanup.

n0 <- nrow(raw)

x <- raw[!st_is_empty(raw), ]
n_empty <- n0 - nrow(x)

x <- mt_filter_unique(x, criterion = "first")
n_dup <- n0 - n_empty - nrow(x)

cat(sprintf("dropped %d empty + %d duplicate; kept %d\n",
             n_empty, n_dup, nrow(x)))
#> dropped 872 empty + 152 duplicate; kept 48239

Stage 2 — GPS-quality pre-filter

GPS trilateration needs at least four satellites to produce any fix at all, but fixes at the four-satellite boundary have pathological error geometry and frequently land thousands of kilometres from the true position. A common threshold in telemetry practice is sat >= 5. Dilution of precision (DOP) and the eObs horizontal-accuracy estimate add independent evidence when those columns are available.
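The three checks combine as a simple row mask. The sketch below is a base-R illustration, not mt_filter_gps_quality() itself: the column names follow the move2 download comment above, the default thresholds mirror the sat >= 5, DOP <= 10, hacc <= 100 m used for the maps in this section, and treating a missing value as a pass is an assumption.

```r
# Sketch of a GPS-quality row mask (illustrative; not the package function).
# Assumptions: move2-style column names, thresholds sat >= 5, DOP <= 10,
# hacc <= 100 m, checks skipped when a column is absent, NA counted as a pass.
quality_mask <- function(d, min_sat = 5, max_dop = 10, max_hacc = 100) {
  keep <- rep(TRUE, nrow(d))
  pass <- function(v, ok) is.na(v) | ok   # missing value does not fail a check
  if ("gps_satellite_count" %in% names(d)) {
    v <- as.numeric(d$gps_satellite_count)
    keep <- keep & pass(v, v >= min_sat)
  }
  if ("gps_dop" %in% names(d)) {
    v <- as.numeric(d$gps_dop)
    keep <- keep & pass(v, v <= max_dop)
  }
  if ("eobs_horizontal_accuracy_estimate" %in% names(d)) {
    # as.numeric() drops any units class; metres are assumed
    v <- as.numeric(d$eobs_horizontal_accuracy_estimate)
    keep <- keep & pass(v, v <= max_hacc)
  }
  keep
}

d <- data.frame(
  gps_satellite_count = c(4, 5, 8, NA),
  gps_dop             = c(1.2, 12, 2.0, 3.1),
  eobs_horizontal_accuracy_estimate = c(10, 10, 150, 20)
)
quality_mask(d)  # FALSE FALSE FALSE TRUE
```

Each row fails on a different check (too few satellites, high DOP, poor accuracy) and the last row passes despite a missing satellite count. Whether NA should pass or fail is a design choice worth checking against ?mt_filter_gps_quality.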

A quick before/after map shows how much these low-quality fixes matter in practice.

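world <- ne_countries(scale = "small", returnclass = "sf")

# Stage 2 proper: drop low-quality fixes. The map title below states the
# thresholds (sat >= 5, DOP <= 10, hacc <= 100 m); see ?mt_filter_gps_quality
# for the arguments that set them.
xq <- mt_filter_gps_quality(x)
cat(sprintf("quality filter kept %d of %d fixes\n", nrow(xq), nrow(x)))
#> quality filter kept 40579 of 48239 fixes

before_sf <- x
after_sf  <- xq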

world_x <- c(-100, 80)
world_y <- c(-25, 80)
europe_x <- c(-15, 45)
europe_y <- c(30, 55)

p_before <- ggplot() +
  geom_sf(data = world, fill = "grey95", colour = "grey80",
          linewidth = 0.2) +
  geom_sf(data = before_sf, size = 1, alpha = 0.9,
          colour = "firebrick") +
  coord_sf(xlim = world_x, ylim = world_y) +
  labs(title = "Before filter", x = NULL, y = NULL) +
  theme_minimal(base_size = 10)

p_after <- ggplot() +
  geom_sf(data = world, fill = "grey95", colour = "grey80",
          linewidth = 0.2) +
  geom_sf(data = after_sf, size = 0.5, alpha = 0.8,
          colour = "steelblue4") +
  annotate("rect", xmin = europe_x[1], xmax = europe_x[2],
           ymin = europe_y[1], ymax = europe_y[2],
           fill = NA, colour = "black", linewidth = 0.3) +
  coord_sf(xlim = world_x, ylim = world_y) +
  labs(title = "After sat>=5, DOP<=10, hacc<=100m",
       x = NULL, y = NULL) +
  theme_minimal(base_size = 10)

p_after_zoom <- ggplot() +
  geom_sf(data = world, fill = "grey95", colour = "grey80",
          linewidth = 0.25) +
  geom_sf(data = after_sf, size = 0.5, alpha = 0.5,
          colour = "steelblue4") +
  coord_sf(xlim = europe_x, ylim = europe_y) +
  labs(title = "After filter, zoomed to Europe",
       x = NULL, y = NULL) +
  theme_minimal(base_size = 10)

if (requireNamespace("patchwork", quietly = TRUE)) {
  patchwork::wrap_plots(p_before, p_after, p_after_zoom, ncol = 3)
} else {
  print(p_before); print(p_after); print(p_after_zoom)
}
Coordinate extent before and after mt_filter_gps_quality(). Left: the raw fixes span three continents because of a handful of low-satellite teleports. Middle: the same extent after filtering; the dots have collapsed onto Europe. Right: the retained fixes zoomed to the European flyway, showing the cleaned-track geography that can be checked against the known stork distribution.

The extreme geographic range collapses onto the expected European and Mediterranean envelope. The zoomed panel resolves structure that the world-scale view compresses into dots; it is the first view in which the shape of the track is actually legible. Most of the filter's work is done on low-satellite fixes; DOP and hacc contribute smaller increments.

Stage 3 — Geometric outlier detection (bridge)

mt_flag_outliers_bridge() scores each fix by its deviation from the time-weighted Brownian-bridge mean of its temporal neighbours, normalised by the bridge width (which depends only on timestamps, not on positions). Because the width is leverage-immune, an outlier cannot inflate its own denominator, so the primitive can find the geometric outliers that survive the quality filter.

The default method = "combined" computes both the scalar residual (dBBMM) and the orthogonal-component residual (dBGB) in one pass, applies the threshold to each score independently, and unions the flags. On synthetic ground-truth benchmarks this Pareto-dominates the single-score variants.
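A minimal base-R sketch of the idea, under stated assumptions: planar coordinates in metres, a fixed diffusion coefficient sigma2 (so the width depends only on timestamps), and only the two immediate temporal neighbours as bridge endpoints. It is not the move2utils implementation, and the hypothetical flag_combined() with a hand-picked threshold stands in for the entropy threshold, whose rule is not reproduced here. It returns the scalar score with its parallel and perpendicular parts, then unions two thresholded scores in the way the combined method is described.

```r
# Sketch: leave-one-out Brownian-bridge residuals with a time-only width.
# Assumptions (illustrative, not the package code): planar metres, fixed
# sigma2, bridge endpoints = the two immediate temporal neighbours.
bridge_scores <- function(x, y, t, sigma2 = 1) {
  n <- length(x)
  eta <- eta_para <- eta_perp <- rep(NA_real_, n)
  for (i in seq_len(n)[-c(1, n)]) {
    span  <- t[i + 1] - t[i - 1]
    alpha <- (t[i] - t[i - 1]) / span
    # Bridge mean: time-weighted interpolation between the neighbours
    mx <- (1 - alpha) * x[i - 1] + alpha * x[i + 1]
    my <- (1 - alpha) * y[i - 1] + alpha * y[i + 1]
    # Bridge variance uses timestamps only, so fix i cannot inflate it
    v  <- sigma2 * span * alpha * (1 - alpha)
    dx <- x[i] - mx
    dy <- y[i] - my
    eta[i] <- (dx^2 + dy^2) / v
    # Split the residual along and across the neighbour-to-neighbour axis
    len <- sqrt((x[i + 1] - x[i - 1])^2 + (y[i + 1] - y[i - 1])^2)
    if (len > 0) {
      ux <- (x[i + 1] - x[i - 1]) / len
      uy <- (y[i + 1] - y[i - 1]) / len
      eta_para[i] <- (dx * ux + dy * uy)^2 / v
      eta_perp[i] <- (dy * ux - dx * uy)^2 / v
    }
  }
  data.frame(eta = eta, eta_para = eta_para, eta_perp = eta_perp)
}

# Hypothetical "combined" rule: threshold the scalar and the perpendicular
# score independently, then take the union; endpoints are never flagged
flag_combined <- function(s, thr) {
  f <- (s$eta > thr) | (s$eta_perp > thr)
  f[is.na(f)] <- FALSE
  f
}

t <- c(0, 60, 120, 180, 240)   # seconds
x <- c(0, 10, 20, 30, 40)      # metres, steady eastward drift
y <- c(0, 0, 5000, 0, 0)       # fix 3 jumps 5 km sideways
s <- bridge_scores(x, y, t)
flag_combined(s, thr = 5e5)    # FALSE FALSE TRUE FALSE FALSE
```

The displaced fix scores entirely in the perpendicular component. Its two neighbours also get elevated scores, a quarter of the outlier's, because the outlier is an endpoint of their bridges, so the threshold has to sit between the two. Duplicate timestamps would make the variance zero, one more reason structural cleaning comes first.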

x_utm <- st_transform(xq, 32632)  # UTM 32N, central Europe

r_br <- mt_flag_outliers_bridge(x_utm,
                                 threshold_type = "entropy",
                                 iterations = 1,
                                 plot = FALSE)
cat(sprintf("bridge flagged %d of %d fixes\n",
             sum(r_br$is_outlier), nrow(r_br)))
#> bridge flagged 6 of 40579 fixes

The bridge_eta, bridge_eta_para, and bridge_eta_perp columns carry the scalar, parallel, and perpendicular scores for every fix — useful for diagnosing the error morphology later.

r_br_ll <- st_transform(r_br, 4326)
r_br_ll$log_eta <- log10(pmax(r_br_ll$bridge_eta, 1e-10) + 1)

ggplot() +
  geom_sf(data = world, fill = "grey95", colour = "grey80",
          linewidth = 0.15) +
  geom_sf(data = r_br_ll, aes(colour = log_eta),
          size = 0.3, alpha = 0.6) +
  geom_sf(data = r_br_ll[r_br_ll$is_outlier, ],
          shape = 1, colour = "black", size = 2.2, stroke = 0.4) +
  scale_colour_viridis_c(option = "viridis",
                         name = expression(log[10](1 + eta))) +
  coord_sf(xlim = c(-15, 45), ylim = c(10, 60)) +
  labs(title = "mt_flag_outliers_bridge() — combined method, entropy",
       x = NULL, y = NULL) +
  theme_minimal(base_size = 10)
Bridge-residual score, computed in UTM 32N and plotted in longitude/latitude. Flagged fixes (black circles) concentrate on points that are spatially inconsistent with their neighbours given the local sampling cadence.

Stage 4 — Movement-metric outlier detection

mt_flag_outliers() complements the bridge primitive by scoring each fix against the joint empirical distribution of step length, turning angle, and their gap-aware auto-differences. It catches fixes whose movement kinematics (speeds, turns, accelerations) are incompatible with the animal’s usual behaviour, even when the point is geometrically plausible.

Dropping the bridge-flagged fixes before running the probability primitive is good practice: they distort the empirical distribution the histogram is built from.
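The histogram idea can be sketched in base R. This is an illustration under assumptions, not mt_flag_outliers() itself: it bins log step length and turning angle into a 2-D histogram and scores each interior fix by the surprisal, -log p, of its (incoming step, turn) bin. It omits the gap-aware auto-differences and any KDE smoothing, and the 20-bin default is arbitrary.

```r
# Sketch: joint step/turn surprisal from an empirical 2-D histogram.
# Assumptions (illustrative, not the package code): planar coordinates,
# equal-width bins on log1p(step) and on turn, no gap-aware differences.
step_turn_surprisal <- function(x, y, n_bins = 20) {
  n  <- length(x)
  dx <- diff(x)
  dy <- diff(y)
  step <- sqrt(dx^2 + dy^2)              # step[k]: fix k -> k + 1
  turn <- diff(atan2(dy, dx))            # turn[k]: turning angle at fix k + 1
  turn <- (turn + pi) %% (2 * pi) - pi   # wrap to [-pi, pi)
  # Interior fix i (2..n-1) uses incoming step[i - 1] and turn[i - 1]
  s_in <- step[seq_len(n - 2)]
  sb <- cut(log1p(s_in), breaks = n_bins, labels = FALSE)
  tb <- cut(turn, breaks = seq(-pi, pi, length.out = n_bins + 1),
            labels = FALSE, include.lowest = TRUE)
  counts <- table(factor(sb, levels = seq_len(n_bins)),
                  factor(tb, levels = seq_len(n_bins)))
  p <- unclass(counts)[cbind(sb, tb)] / length(sb)  # empirical joint probability
  c(NA, -log(p), NA)                                # rare combinations score high
}

i <- 1:50
x <- (i - 1) * 100    # 100 m eastward steps
y <- 5 * (i %% 2)     # small zig-zag
y[25] <- 5000         # one spike 5 km north
s <- step_turn_surprisal(x, y)
which(s == max(s, na.rm = TRUE))  # 24 25 26
```

In the toy track the displaced fix and both of its neighbours land in singleton bins and tie for the top score, because the spike creates two long steps and three sharp turns. An outlier left in place therefore raises the scores of good fixes as well.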

xp <- r_br[!r_br$is_outlier, ]  # bridge-cleaned
xp <- st_transform(xp, 4326)    # back to lon/lat for plotting; the probability primitive is CRS-agnostic

r_prob <- mt_flag_outliers(xp, threshold_type = "entropy",
                            iterations = 1, plot = FALSE)
cat(sprintf("probability flagged %d of %d fixes\n",
             sum(r_prob$is_outlier), nrow(r_prob)))
#> probability flagged 3 of 40573 fixes

Cleaned track

The union of the bridge and probability flags defines the removed set. Plotting the kept track alongside the removed fixes on a European extent shows the scope of what the pipeline caught.

kept <- r_prob[!r_prob$is_outlier, ]

p_kept <- ggplot() +
  geom_sf(data = world, fill = "grey95", colour = "grey80",
          linewidth = 0.2) +
  geom_sf(data = kept, size = 0.2, alpha = 0.4,
          colour = "steelblue4") +
  coord_sf(xlim = c(-15, 40), ylim = c(25, 60)) +
  labs(title = sprintf("Cleaned track (n = %d)", nrow(kept)),
       x = NULL, y = NULL) +
  theme_minimal(base_size = 10)

removed_bridge <- r_br[r_br$is_outlier, ]
removed_prob   <- r_prob[r_prob$is_outlier, ]

p_removed <- ggplot() +
  geom_sf(data = world, fill = "grey95", colour = "grey80",
          linewidth = 0.2) +
  geom_sf(data = st_transform(removed_bridge, 4326),
          size = 1.2, colour = "firebrick") +
  geom_sf(data = removed_prob,
          size = 1.2, colour = "orange") +
  coord_sf(xlim = c(-15, 40), ylim = c(25, 60)) +
  labs(title = sprintf("Removed by bridge (red, n=%d) and probability (orange, n=%d)",
                         nrow(removed_bridge), nrow(removed_prob)),
       x = NULL, y = NULL) +
  theme_minimal(base_size = 10)

if (requireNamespace("patchwork", quietly = TRUE)) {
  patchwork::wrap_plots(p_kept, p_removed, ncol = 2)
} else {
  print(p_kept); print(p_removed)
}
Left: retained fixes after the full pipeline. Right: fixes removed by stage 3 (bridge, red) and stage 4 (probability, orange); fixes removed by the stage-2 quality filter are not shown.

The one-call alternative — mt_clean_track()

The pipeline above narrates each primitive explicitly so the reader can see what is being done and why. For routine cleaning, the unified mt_clean_track() entry-point composes the detection primitives behind a single call. It runs the bridge, detour, probability, and speed-cap detectors, applies the class-aware consensus rule by default (configurable via consensus =; see ?mt_flag_consensus), expands topologically isolated blocks, and iterates until convergence.

A ground speed of 50 m/s (180 km/h) is far above anything a white stork sustains in flight, which makes it a safe physiological cap. Supplying it lets the speed primitive peel any spoof- or teleport-class clusters cleanly before the bridge and probability detectors run on the survivors.
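What a speed cap checks can be sketched in a few lines of base R. The rule here, flagging a fix when both the speed into it and the speed out of it exceed v_max, is a common heuristic and an assumption; it is not necessarily how mt_clean_track() applies v_max, and the helper name speed_spikes() is hypothetical.

```r
# Sketch: flag isolated speed spikes against a physiological cap.
# Assumption: a fix is a spike when both its incoming and outgoing speeds
# exceed v_max; planar coordinates in metres, times in seconds.
speed_spikes <- function(x, y, t, v_max = 50) {
  v <- sqrt(diff(x)^2 + diff(y)^2) / diff(t)  # m/s between consecutive fixes
  v_in  <- c(NA, v)                           # speed arriving at each fix
  v_out <- c(v, NA)                           # speed leaving each fix
  !is.na(v_in) & !is.na(v_out) & v_in > v_max & v_out > v_max
}

t <- seq(0, 600, by = 60)   # one fix per minute
x <- (0:10) * 600           # steady 10 m/s flight
y <- rep(0, 11)
x[6] <- x[6] + 60000        # fix 6 teleports 60 km ahead
which(speed_spikes(x, y, t, v_max = 50))  # 6
```

This pairwise rule only catches isolated spikes: two consecutive teleported fixes are joined by a slow step, so neither would be flagged. Peeling whole clusters, as described for the speed primitive above, needs more than this.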

clean_one_call <- mt_clean_track(xq, v_max = 50, plot = FALSE)
cat(sprintf("mt_clean_track kept %d of %d fixes\n",
             nrow(clean_one_call), nrow(xq)))
#> mt_clean_track kept 40244 of 40579 fixes

On this track the one-call result is noticeably more aggressive than the staged pipeline: it keeps 40244 fixes against 40570 after stages 3 and 4, removing 326 more. The extra detour and speed-cap detectors, the class-aware consensus, and block expansion give it different recall characteristics, so it is worth inspecting the fixes on which the two approaches disagree. Use whichever style matches your taste: the staged version when you want to inspect each signal, the one-call version when you trust the defaults and want to move on.

Composition summary

tab <- data.frame(
  stage = c("1. structural cleaning",
            "2. GPS-quality filter",
            "3. bridge (combined)",
            "4. probability (entropy)"),
  n_in = c(n0, nrow(x), nrow(xq), nrow(xp)),
  n_out = c(nrow(x), nrow(xq), sum(!r_br$is_outlier),
             sum(!r_prob$is_outlier))
)
tab$n_dropped <- tab$n_in - tab$n_out
print(tab, row.names = FALSE)
#>                     stage  n_in n_out n_dropped
#>    1. structural cleaning 49263 48239      1024
#>     2. GPS-quality filter 48239 40579      7660
#>      3. bridge (combined) 40579 40573         6
#>  4. probability (entropy) 40573 40570         3

Each stage handles a distinct class of error, and each scales linearly with the number of fixes:

  • Structural cleaning is O(n) and mechanical.
  • The GPS-quality filter is O(n) and kills the most pathological fixes at source, which is often the difference between a usable and an unusable track.
  • The bridge primitive is O(n) with a one-pass neighbour scan.
  • The probability primitive is O(n) with a fully vectorised KDE lookup; a cap on the 2D turn/step histogram bin count keeps the terra raster operations linear in n even on heavy-tailed step distributions.

Notes on reproducibility

  • The bundled CSV was produced by filtering the Movebank download to a stable subset of columns. All three GPS-quality columns (gps-satellite-count, gps-dop, eobs-horizontal-accuracy-estimate) are preserved.
  • The full Movebank GPS track can be re-downloaded with move2::movebank_download_study(24442409, sensor_type_id = "gps", individual_local_identifier = "Pettstadt1 (AHH16, 14053)") after storing credentials once with move2::movebank_store_credentials().
  • Study acknowledgement: LifeTrack White Stork Bavaria, data contributed by Wolfgang Fiedler and colleagues, Max Planck Institute of Animal Behavior.

Further reading