Bridge-residual outlier detection and error-morphology classification
Source: vignettes/OUTLIER_4_outlier_bridge.Rmd
You can use mt_flag_outliers_bridge() on its own when
you want to spot GPS fixes that geometrically don’t belong — a single
point that jumps far from where its neighbours sit, then comes right
back. This function is the “geometric” detector in the four-detector
cascade that mt_clean_track() orchestrates; running it
standalone lets you look at just the geometric signal in isolation, or
build your own composition on top of it.
This vignette walks through what the detector does, the directional variant that additionally tells you what kind of geometric anomaly you found, and how to read the outputs.
If you’re new to the package, you probably want
mt_clean_track() instead — it calls this function for you
alongside three other detectors and a consensus rule that combines their
outputs. Come back here when you want fine-grained control over just one
signal or when you’re building a custom pipeline.
The companion mt_flag_outliers() (probability-based,
movement-metric) is covered in
vignette("OUTLIER_1_getting_started", package = "move2utils").
Both detectors share the same threshold machinery; they target different
kinds of error and can be composed.
How it works — intuition first
If a GPS fix doesn’t belong, the simplest test is: given where the animal was before and where it was just after, where would you expect the middle fix to sit? If the actual fix is far from that expectation — further than the time gap on either side could plausibly account for — it’s probably wrong.
That’s the bridge construction. For each fix, we draw a “bridge” between its two temporal neighbours, weighted by how much time sits on either side, and measure how far the actual fix is from that bridge. A small distance means the fix is roughly where it should be; a large one means something’s off.
Crucially, the width of the bridge — how much wandering is plausible over a given time gap — depends only on the timestamps, never on the fix’s own position. That’s the key design choice: an outlier can’t inflate the denominator that would have flagged it. Methods that normalise by a locally-estimated variance suffer from this “leverage problem” precisely because a bad fix pollutes its own scale.
The formal picture (for the curious)
For each fix i with projected coordinates p_i and time t_i, the temporal neighbours i-1 and i+1 define a Brownian-bridge expectation
m_i = α p_{i-1} + (1 − α) p_{i+1}, α = Δt₂ / (Δt₁ + Δt₂)
where Δt₁ = t_i − t_{i-1} and Δt₂ = t_{i+1} − t_i.
The residual is r_i = p_i − m_i. The natural scaling for that residual is the bridge’s own standard-deviation width,
w_i = √(Δt₁ · Δt₂ / (Δt₁ + Δt₂)).
The bridge score is η_i = ||r_i|| / w_i. The key property described in plain language above: w_i depends only on timestamps, never on positions. An outlier therefore cannot inflate its own denominator.
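The formulas above can be sketched in a few lines of base R. This is a minimal illustration on a made-up seven-fix track, not the package's internal code; the helper name bridge_eta() and the toy coordinates are assumptions introduced here.

```r
# Toy track: straight line sampled every 10 s, with fix 4 displaced 25 m sideways.
x <- c(0, 10, 20, 30, 40, 50, 60)   # metres (projected)
y <- c(0,  0,  0, 25,  0,  0,  0)
t <- c(0, 10, 20, 30, 40, 50, 60)   # seconds

# Hypothetical helper: bridge score for every interior fix (NA at the two ends).
bridge_eta <- function(x, y, t) {
  n   <- length(t)
  i   <- 2:(n - 1)
  dt1 <- t[i] - t[i - 1]
  dt2 <- t[i + 1] - t[i]
  alpha <- dt2 / (dt1 + dt2)                     # weight on the earlier neighbour
  mx <- alpha * x[i - 1] + (1 - alpha) * x[i + 1]
  my <- alpha * y[i - 1] + (1 - alpha) * y[i + 1]
  resid <- sqrt((x[i] - mx)^2 + (y[i] - my)^2)   # ||r_i||
  width <- sqrt(dt1 * dt2 / (dt1 + dt2))         # w_i: timestamps only
  eta <- rep(NA_real_, n)
  eta[i] <- resid / width
  eta
}

eta <- bridge_eta(x, y, t)
round(eta, 2)
which.max(eta)
```

The displaced fix gets the largest score, and its two neighbours pick up a score half as large because the bad fix is one endpoint of their bridges. Note that width is computed from t alone, so moving fix 4 further away changes resid but never the denominator.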
Three methods
- method = "isotropic" uses the scalar residual magnitude. General-purpose; answers “is this fix geometrically far from where it should be?”.
- method = "directional" decomposes the residual onto the local travel axis into parallel (η_para, along-track) and perpendicular (η_perp, across-track) components. It flags on η_perp alone, but the diagnostic value is in the (η_para, η_perp) pair, which classifies the kind of error at each flag.
- method = "combined" (default) applies the threshold to both scores (η and η_perp) independently and flags a fix if either one trips.
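The directional decomposition can be illustrated with a small base-R sketch. It assumes the local travel axis is the chord from the previous fix to the next one; the package may define the axis differently, and decompose_residual() is a hypothetical name used only for this example.

```r
# Hypothetical helper: split one bridge residual into along-track and
# across-track parts, each scaled by the bridge width.
# ASSUMPTION: the travel axis is the chord from the previous to the next fix.
decompose_residual <- function(prev, cur, nxt, t_prev, t_cur, t_nxt) {
  dt1 <- t_cur - t_prev
  dt2 <- t_nxt - t_cur
  alpha <- dt2 / (dt1 + dt2)
  m <- alpha * prev + (1 - alpha) * nxt        # bridge expectation
  r <- cur - m                                 # residual vector
  axis <- nxt - prev
  u <- axis / sqrt(sum(axis^2))                # unit vector (undefined if prev == nxt)
  para <- sum(r * u)                           # projection onto the axis
  perp <- u[1] * r[2] - u[2] * r[1]            # signed across-track component
  w <- sqrt(dt1 * dt2 / (dt1 + dt2))
  c(eta_para = abs(para) / w, eta_perp = abs(perp) / w)
}

out <- decompose_residual(prev = c(0, 0), cur = c(13, 4), nxt = c(20, 0),
                          t_prev = 0, t_cur = 10, t_nxt = 20)
out
sqrt(sum(out^2))   # recovers the isotropic score for the same fix
```

Because the two components are orthogonal, their squares sum to the squared isotropic score, which is why the isotropic and directional methods are built on the same residual yet can rank fixes differently.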
The outlier bridge method here inherits the bridge-mean construction
from the dynamic Brownian bridge (dBBMM) and the perpendicular
decomposition from the dynamic bivariate Gaussian bridge (dBGB), but
deliberately does not invoke their variance-estimation
machinery — that avoids the leverage problem where an outlier inflates
the very variance that would denominate it. See
mt_dbbmm_variance() / mt_dbgb_variance() for
the full dynamic models.
Requirements
- The function accepts longitude/latitude or projected input; it auto-projects internally to a local AEQD for the Euclidean math and returns the result in your original CRS.
- Enough context. With fewer than a handful of locations the bridge neighbours are undefined; in practice, tracks of at least a few dozen locations are needed for the threshold estimator to be stable.
A worked example on synthetic data
inst/extdata/synthetic_tracks.csv.gz contains three
central-place-forager tracks with ground-truth outliers for validation.
CPF_A has 23 injected outliers; CPF_B is clean; CPF_C has 4.
library(move2)
library(sf)
library(move2utils)
path <- system.file("extdata/synthetic_tracks.csv.gz",
package = "move2utils")
tracks <- mt_read(path)
tracks <- tracks[!st_is_empty(tracks), ]
cpfA <- tracks[mt_track_id(tracks) == "CPF_A", ]
cpfA_p <- st_transform(cpfA, mt_aeqd_crs(cpfA))
nrow(cpfA_p)
#> [1] 1748
Default run (combined method, entropy threshold)
res <- mt_flag_outliers_bridge(cpfA_p, plot = FALSE)
#> Running bridge-residual detection (method = combined) on 1748 locations...
#> Iter 1: flagged 6 (break at eta = 1591.86 / eta_perp = 407.45).
#> Iter 2: flagged 4 (break at eta = 1472.40 / eta_perp = 1182.96).
#> Iter 3: flagged 7 (break at eta = 1255.72 / eta_perp = 1022.76).
#> === 17 outliers (0.97% of 1748) ===
table(res$is_outlier)
#>
#> FALSE TRUE
#> 1731 17
coords <- sf::st_coordinates(res)
par(mar = c(4, 4, 3, 1))
plot(coords, type = "l", col = "grey80", asp = 1,
xlab = "x (m)", ylab = "y (m)",
main = sprintf("CPF_A — %d flags (combined, entropy)", sum(res$is_outlier)))
points(coords[!res$is_outlier, ], pch = 16, cex = 0.25, col = "grey50")
points(coords[res$is_outlier, ], pch = 1, cex = 1.6,
col = "firebrick", lwd = 1.4)
legend("topright", pch = c(16, 1), col = c("grey50", "firebrick"),
legend = c("kept", "flagged"), bty = "n")
CPF_A synthetic track in projected coordinates. Grey line traces the full track in order; firebrick circles mark the fixes mt_flag_outliers_bridge() flagged under the default combined-entropy configuration. Flagged points sit visibly off the otherwise smooth central-place-forager geometry.
The columns added to the input are:
| column | meaning |
|---|---|
| bridge_residual | scalar residual magnitude |
| bridge_width | bridge-width normalisation w_i |
| bridge_eta | score η_i = residual / width (scalar / isotropic) |
| bridge_percentile | empirical quantile of η within the track |
| bridge_iteration | iteration number on which the fix was flagged (0 = not flagged) |
| is_outlier | logical flag |
The "entropy" threshold (default) is the deepest valley
in the kernel-density estimate of log(η). On clean data the density is
unimodal, there is no valley, and the function returns zero flags — a
no-op guarantee that makes it safe to apply
universally. The "gap" threshold is more sensitive: it uses
a broken-stick null model plus tail-decay inflection, and will often
pick up borderline points that entropy leaves alone.
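To make the "deepest valley" idea concrete, here is a rough base-R sketch on simulated scores. It is not the package's estimator: it simply takes the lowest interior local minimum of a default density() fit, with no test of whether the valley is deep enough to be real. The function name valley_threshold() and the simulated values are assumptions.

```r
set.seed(1)
# Simulated log-scores: a bulk of ordinary fixes plus a small far-out cluster.
log_eta <- c(rnorm(500, mean = 0, sd = 0.5),
             rnorm(20,  mean = 6, sd = 0.3))

# Hypothetical helper: location of the lowest local minimum of the KDE.
valley_threshold <- function(z) {
  d <- density(z)                       # Gaussian kernel, default bandwidth
  turn <- diff(sign(diff(d$y)))         # > 0 where the slope turns upward
  minima <- which(turn > 0) + 1
  if (length(minima) == 0) return(NA_real_)   # no valley found
  d$x[minima[which.min(d$y[minima])]]
}

thr <- valley_threshold(log_eta)
flagged <- log_eta > thr
sum(flagged)
```

On a sample with no second mode a raw KDE can still show small wiggles, so a usable version would need some criterion for valley depth before accepting a threshold; that step is what allows a detector to return zero flags on clean data.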
res_gap <- mt_flag_outliers_bridge(
cpfA_p,
threshold_type = "gap",
plot = FALSE
)
#> Running bridge-residual detection (method = combined) on 1748 locations...
#> Iter 1: flagged 17 (break at eta = 14.95 / eta_perp = 17.63).
#> Iter 2: flagged 12 (break at eta = 14.89 / eta_perp = 12.14).
#> Iter 3: flagged 8 (break at eta = 14.70 / eta_perp = 14703.05).
#> === 37 outliers (2.12% of 1748) ===
table(res_gap$is_outlier)
#>
#> FALSE TRUE
#> 1711 37
Use "gap" when you want sensitivity and expect tolerable
false-positive rates; use "entropy" (the default) when
false positives cost more than missed outliers, or when the track might
be clean.
Directional variant — error-morphology classification
res_d <- mt_flag_outliers_bridge(
cpfA_p,
method = "directional",
plot = FALSE
)
#> Running bridge-residual detection (method = directional) on 1748 locations...
#> Iter 1: flagged 5 (break at eta_perp = 407.45).
#> Iter 2: flagged 2 (break at eta_perp = 1138.77).
#> Iter 3: flagged 3 (break at eta_perp = 1022.82).
#> === 10 outliers (0.57% of 1748) ===
grep("bridge", names(res_d), value = TRUE)
#> [1] "bridge_residual" "bridge_width" "bridge_eta"
#> [4] "bridge_eta_para" "bridge_eta_perp" "bridge_obs_inflation"
#> [7] "bridge_percentile" "bridge_iteration"
Two columns carry the directional decomposition:
- bridge_eta_para: residual parallel to the local travel axis
- bridge_eta_perp: residual perpendicular to it
Flags are placed on bridge_eta_perp (the across-track
signal that is typically the more diagnostic of genuine measurement
error). But the interesting thing is the pair of values for
each flag.
Reading the (η_para, η_perp) plane
flags <- res_d[res_d$is_outlier, ]
n_flag <- nrow(flags)
plot(log10(flags$bridge_eta_para + 1),
log10(flags$bridge_eta_perp + 1),
xlab = expression(log[10](eta["para"] + 1)),
ylab = expression(log[10](eta["perp"] + 1)),
pch = 16, col = "firebrick", cex = 1.2,
main = sprintf("error morphology on %d flagged points", n_flag))
abline(0, 1, lty = 2, col = "grey50")
Interpretation of the plane:
- Points near the diagonal — η_para ≈ η_perp. Isotropic scatter; classic jitter-type error.
- Points below the diagonal (high η_para, low η_perp) — along-track jumps. Ghost reports, clock glitches, GNSS fold-back.
- Points above the diagonal (low η_para, high η_perp) — across-track drift. Characteristic of multipath, reflective environments, or constellation changes that bias one axis.
This is why the isotropic and directional flag sets can look very different on a real dataset even though both are built on the same residual: the isotropic score answers magnitude, the directional decomposition answers morphology. Together, they separate error classes that a single detector would merge.
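The reading of the plane described above can be turned into a rough labelling rule. The sketch below is illustrative only: classify_morphology() and its ratio_tol cut-off are hypothetical, and the factor of 2 is an arbitrary choice rather than anything the package prescribes.

```r
# Hypothetical helper: label each flag by where it sits relative to the diagonal.
# ratio_tol is an ASSUMED tolerance; points within a factor of ratio_tol of the
# diagonal are treated as isotropic.
classify_morphology <- function(eta_para, eta_perp, ratio_tol = 2) {
  ratio <- eta_perp / eta_para
  ifelse(ratio > ratio_tol, "across-track",
         ifelse(ratio < 1 / ratio_tol, "along-track", "isotropic"))
}

labels <- classify_morphology(eta_para = c(10, 50, 2),
                              eta_perp = c(12,  3, 40))
labels
```

With a real result you could pass flags$bridge_eta_para and flags$bridge_eta_perp and tabulate the labels to see which error class dominates a track.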
Composing bridge with the other primitives
mt_flag_outliers_bridge() is one of four primitives in
the move2utils outlier-detection framework; the others are
mt_flag_outliers() (probability-based),
mt_flag_outliers_detour() (geometric, time-insensitive
path-vs-displacement ratio), and mt_flag_speed_cap()
(step-level physiological cap). They are complementary:
- mt_flag_outliers() is probability-based; it excels on multi-state, behaviourally heterogeneous tracks where the gap-aware auto-difference normalisation carries the signal.
- mt_flag_outliers_bridge() is geometric and leverage-immune; it excels on long clean tracks and on detecting drift or spoofing.
- mt_flag_outliers_detour() is time-insensitive and scale-invariant; it catches out-and-back GPS spikes at sparse sampling rates where the bridge primitive’s σ-scaling loses sensitivity.
- mt_flag_speed_cap() is step-level; it catches the boundaries of coherent multi-fix error blocks that per-fix detectors structurally cannot reach.
The unified mt_clean_track() cleaner composes all four
under class-aware flagging. A pragmatic standalone pipeline flags the
union (or the intersection, depending on your cost function) of two of
the four; the example below shows bridge + probability, which is the
most common pairing for moderately-sampled tracks:
prob <- mt_flag_outliers(cpfA_p)
bridge <- mt_flag_outliers_bridge(cpfA_p, method = "directional", plot = FALSE)
both_flags <- prob$is_outlier | bridge$is_outlier
cat("probability:", sum(prob$is_outlier), "\n")
cat("bridge :", sum(bridge$is_outlier), "\n")
cat("union :", sum(both_flags), "\n")
For routine use, mt_clean_track() performs this
composition for you and adds a step-level speed cap, a per-fix consensus
rule (configurable via consensus =; see
?mt_flag_consensus), and a topological block-expansion pass
that catches coherent multi-fix clusters which per-fix scoring
structurally cannot resolve. Supply v_max when species
biology gives you a physiological cap — that lets the pipeline peel
spoof- or teleport-class clusters at their boundaries before the bridge
and probability detectors run on the survivors.
clean <- mt_clean_track(cpfA_p)
clean <- mt_clean_track(eagle_track, v_max = 30) # eagle ~30 m/s ceiling
Iterative refinement
A single pass can miss outliers that were themselves used as
neighbours by other outliers. iterations = n re-runs the
detector on the currently-clean subset up to n times
(default 3), stopping early on convergence. Adjust when your tracks have
long runs of consecutive bad fixes.
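A toy base-R loop shows why a second pass can find what the first one missed. This sketch is an assumption-laden simplification: it uses a fixed cutoff instead of a data-driven threshold, removes only the single worst fix per pass, and the names bridge_eta() and flag_iteratively() are made up for the example.

```r
# Hypothetical helper: bridge score for interior fixes (NA at the ends).
bridge_eta <- function(x, y, t) {
  n <- length(t)
  eta <- rep(NA_real_, n)
  if (n < 3) return(eta)
  i <- 2:(n - 1)
  dt1 <- t[i] - t[i - 1]
  dt2 <- t[i + 1] - t[i]
  a <- dt2 / (dt1 + dt2)
  rx <- x[i] - (a * x[i - 1] + (1 - a) * x[i + 1])
  ry <- y[i] - (a * y[i - 1] + (1 - a) * y[i + 1])
  eta[i] <- sqrt(rx^2 + ry^2) / sqrt(dt1 * dt2 / (dt1 + dt2))
  eta
}

# Hypothetical helper: re-score the surviving fixes up to `iterations` times.
# Returns the pass on which each fix was flagged (0 = never flagged).
flag_iteratively <- function(x, y, t, cutoff, iterations = 3) {
  iter_flagged <- integer(length(t))
  for (k in seq_len(iterations)) {
    keep <- which(iter_flagged == 0)
    eta <- bridge_eta(x[keep], y[keep], t[keep])
    worst <- which.max(eta)                       # ignores NA end points
    if (length(worst) == 0 || eta[worst] <= cutoff) break   # converged
    iter_flagged[keep[worst]] <- k
  }
  iter_flagged
}

# Straight-line track with two adjacent bad fixes: a large one and a small one.
x <- seq(0, 80, by = 10)
t <- x
y <- c(0, 0, 0, 100, 20, 0, 0, 0, 0)
iter <- flag_iteratively(x, y, t, cutoff = 5)
iter
```

On the first pass the smaller error at fix 5 is scored against a bridge that ends on the larger error at fix 4. Once fix 4 is removed, fix 5 is re-scored against clean neighbours on the second pass, and the third pass finds nothing left and stops early.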
Disabling neighbour smearing
Near a real outlier, its two immediate neighbours will also look anomalous,
because the corrupted point is one endpoint of each neighbour's bridge.
dedup_neighbours = TRUE (default) suppresses this
neighbour-smearing artefact via a small peak-picking pass. Turn it off
only if you have a reason to suspect it is masking real clustered
errors.
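One simple way to picture a peak-picking pass is to keep only the highest-scoring fix within each run of consecutive candidates. The sketch below is a guess at the general idea, not the package's implementation; pick_peaks() and the fixed cutoff are hypothetical.

```r
# Hypothetical helper: within each run of consecutive above-cutoff scores,
# keep only the fix with the highest score.
pick_peaks <- function(eta, cutoff) {
  cand <- !is.na(eta) & eta > cutoff
  runs <- rle(cand)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  keep <- logical(length(eta))
  for (j in which(runs$values)) {
    idx <- starts[j]:ends[j]
    keep[idx[which.max(eta[idx])]] <- TRUE
  }
  keep
}

# Scores for one bad fix (position 4) and its two smeared neighbours.
eta <- c(NA, 0, 5.59, 11.18, 5.59, 0, NA)
peaks <- which(pick_peaks(eta, cutoff = 3))
peaks
```

The same rule also explains the caveat above: a run of genuinely bad consecutive fixes would be reduced to its single peak, so real clustered errors can be hidden by this kind of suppression.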
Further reading
- vignette("OUTLIER_1_getting_started", package = "move2utils"): the unified mt_clean_track() workflow and a brief tour of all four primitives.
- vignette("OUTLIER_2_diagnose_clean_track", package = "move2utils"): the post-run health check.
- vignette("OUTLIER_3_state_conditional", package = "move2utils"): when the diagnostic flags bimodal behaviour, the recipe for cleaning each behavioural state separately.
- vignette("OUTLIER_5_persistence_score", package = "move2utils"): multi-scale annotation that scores how confidently each flag is an outlier; useful as a post-cleaning confidence filter.
- vignette("OUTLIER_heterogeneous_error_regimes", package = "move2utils"): outlier detection with heterogeneous error regimes, one sensor at a time.
- vignette("OUTLIER_example_outlier_whitestork", package = "move2utils"): a full narrated cleaning pipeline on a real high-frequency stork track.
- vignette("OUTLIER_example_leo_migration", package = "move2utils"): outlier detection on irregular, large-scale satellite data.
- ?mt_flag_outliers_bridge: full argument reference and parameter tuning notes.