Per-detector flag audit for a cleaned move2 object

Reads the per-detector flag columns that mt_clean_track (or any of the four primitives) attaches to its output, and returns four diagnostic tables, a consensus-mode comparison, and a map showing where each detector fired versus where the consensus rule produced the final is_outlier decision. Use it to check whether the cascade is doing what you expect, and where to tune it.

Usage

mt_diagnose_flags(x, print_tables = TRUE)

Arguments

x: A move2 object with the per-detector flag columns that mt_clean_track or the mt_flag_* primitives attach. Must contain at minimum is_outlier, plus a subset of flagged_by_bridge, flagged_by_prob, flagged_by_detour, flagged_by_speed. error_class is required for the bottom map panel and is produced by mt_clean_track.
print_tables: Logical. If TRUE (default), print the four diagnostic tables plus the consensus-comparison table to the console. Set FALSE to silence the console output and work from the returned list.

Value

Invisibly, a list with components:

error_class: Frequency table of error_class among flagged fixes.
detector_fires: Data frame: for each detector column present, the count and percentage of flagged fixes that detector fired on.
co_fire: Two-way table of (n detectors fired) x (is_outlier) over all events.
near_miss: Data frame: per detector, the number of fixes where it fired but is_outlier stayed FALSE (consensus did not trip).
consensus_comparison: Data frame: under each built-in consensus mode ("class_aware", "strict", "majority", "speed_trusted", "any"), how many fixes would be flagged, expressed in absolute counts and as a delta from the current is_outlier column.
map: A patchwork object combining the five map panels. Use print(result$map) to draw it.

Details

The four tables answer four questions:

error_class – where in the cascade hierarchy do the flags land? A flag-heavy kinematic_confluence bucket means the per-fix detectors are agreeing 2-of-3. A heavy block bucket means block-expansion is doing the catching. A heavy pool bucket means cohort pool_by is supplying flags the per-track cascade missed.
per-detector fires among flagged – who is doing the work? If one detector fires on >80% of flagged fixes while another fires on <20%, the two are probably calibrated poorly relative to one another.
co-fire histogram – how many detectors typically agree? Most flagged fixes should have 2 or more agreeing. Many one-detector fires that the consensus rejected means the loudest voice is being silenced.
near-miss zone – per detector, how many fixes fired but were NOT flagged because no other detector corroborated? A detector with thousands of near-misses is the candidate for either tighter thresholds (so other detectors catch up) or a looser consensus rule (so its verdict is accepted more often).
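As a rough illustration of what three of these tables contain, the sketch below rebuilds them in base R from a toy data frame. It is not the package's implementation: the toy values are invented, only the flag column names come from the Arguments section, and the output column names (n, pct, n_near_miss) are assumptions made for this example.

```r
# Toy data: one row per fix, one logical column per detector, plus the
# final consensus decision. All values are invented for illustration.
flags <- data.frame(
  flagged_by_bridge = c(TRUE,  FALSE, FALSE, TRUE,  FALSE, FALSE),
  flagged_by_prob   = c(TRUE,  TRUE,  FALSE, FALSE, FALSE, FALSE),
  flagged_by_detour = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  flagged_by_speed  = c(TRUE,  FALSE, TRUE,  TRUE,  FALSE, FALSE),
  is_outlier        = c(TRUE,  FALSE, FALSE, TRUE,  FALSE, FALSE)
)
det_cols <- grep("^flagged_by_", names(flags), value = TRUE)

# Per-detector fires among flagged fixes: count and percentage.
n_flagged <- sum(flags$is_outlier)
fires <- vapply(det_cols,
                function(d) sum(flags[[d]] & flags$is_outlier),
                integer(1))
detector_fires <- data.frame(detector = det_cols,
                             n = fires,
                             pct = 100 * fires / n_flagged,
                             row.names = NULL)

# Co-fire table: number of detectors that fired, crossed with is_outlier.
n_fired <- rowSums(flags[det_cols])
co_fire <- table(n_fired = n_fired, is_outlier = flags$is_outlier)

# Near-miss zone: the detector fired but is_outlier stayed FALSE.
near <- vapply(det_cols,
               function(d) sum(flags[[d]] & !flags$is_outlier),
               integer(1))
near_miss <- data.frame(detector = det_cols,
                        n_near_miss = near,
                        row.names = NULL)

detector_fires
co_fire
near_miss
```

In this toy case the one-detector fires all sit in the is_outlier = FALSE column of co_fire, which is the pattern the co-fire and near-miss readings above describe.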

The consensus comparison re-applies the consensus rule alone (no re-running the per-fix detectors) under each built-in mode of mt_flag_consensus and reports how many fixes each mode would flag. This is the cheap interactive piece: switching consensus mode is post-hoc, so you see the cost of every choice without paying the cascade's compute cost.
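The idea of a post-hoc comparison can be sketched in base R as follows. The three rules here (at_least_one, at_least_two, all_detectors) are hypothetical stand-ins chosen for illustration; they are not the definitions of the built-in modes of mt_flag_consensus, which this page does not spell out. The toy flag values are invented.

```r
# Toy per-detector flags and the current consensus decision (invented values).
flags <- data.frame(
  flagged_by_bridge = c(TRUE,  FALSE, FALSE, TRUE,  FALSE, FALSE),
  flagged_by_prob   = c(TRUE,  TRUE,  FALSE, FALSE, FALSE, FALSE),
  flagged_by_detour = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  flagged_by_speed  = c(TRUE,  FALSE, TRUE,  TRUE,  FALSE, FALSE),
  is_outlier        = c(TRUE,  FALSE, FALSE, TRUE,  FALSE, FALSE)
)
det_cols <- grep("^flagged_by_", names(flags), value = TRUE)
n_fired  <- rowSums(flags[det_cols])

# Hypothetical consensus rules, each a function of the number of detectors
# that fired on a fix. These are assumptions, not the package's modes.
rules <- list(
  at_least_one  = function(n) n >= 1,
  at_least_two  = function(n) n >= 2,
  all_detectors = function(n) n == length(det_cols)
)

# Re-apply each rule to the stored flags; no detector is re-run.
current <- sum(flags$is_outlier)
comparison <- data.frame(
  rule      = names(rules),
  n_flagged = vapply(rules, function(f) sum(f(n_fired)), integer(1)),
  row.names = NULL
)
comparison$delta <- comparison$n_flagged - current
comparison
```

Because each rule only reads the stored flag columns, the whole comparison costs a few vector operations per rule.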

The map has five panels. Top row: four small panels, one per detector, showing where that detector fired. Within each panel, fires that survived the consensus are filled, fires that did NOT survive (the near-miss zone) are hollow. Bottom row: one large panel showing the kept trajectory plus the flagged fixes coloured by error_class. All panels share one azimuthal-equidistant projection centred on the object's centroid, so across-detector visual comparison is honest.
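For readers who want to reproduce a comparable projection outside the function, here is a minimal sketch using sf. It assumes longitude/latitude input in WGS84 and uses the plain mean of the coordinates as a stand-in for the centroid; how mt_diagnose_flags computes its centre internally is not documented here, so treat this as an approximation.

```r
library(sf)

# Toy fixes in longitude/latitude (WGS84); values are invented.
pts <- data.frame(lon = c(10.0, 10.2, 10.4),
                  lat = c(50.0, 50.1, 50.2))

# Assumption: the mean coordinate serves as the projection centre.
centre <- c(lon = mean(pts$lon), lat = mean(pts$lat))

# PROJ string for an azimuthal-equidistant projection centred on that point.
aeqd <- sprintf("+proj=aeqd +lat_0=%f +lon_0=%f +datum=WGS84 +units=m",
                centre["lat"], centre["lon"])

pts_sf   <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)
pts_aeqd <- st_transform(pts_sf, crs = aeqd)

# Coordinates are now metres east/north of the centre point.
xy <- st_coordinates(pts_aeqd)
xy
```

Projecting every panel with the same string keeps distances from the centre comparable from one panel to the next.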

Comparing diagnostics across pool_by configurations

The diagnostic is most informative when run on the same data with different pool_by settings, to evaluate whether cohort-level pooling is doing useful work or pushing the cleaning past outlier-detection into within-animal behavioural anomaly territory.

Pattern: re-run mt_clean_track with a narrower pool_by (e.g. pool_by = "individual_id" instead of pool_by = c("study_id", "individual_id")) and compare the two diagnostic outputs. Three readings to look for:

Same total flag count and same per-detector fire pattern: the cohort distribution and the per-individual distribution agree; the pool_by choice is making no difference and either is fine.
Narrower pool_by flags more on some individuals: those animals have movement patterns narrower than the cohort. The extra flags may be real outliers the cohort distribution missed (in which case narrower is better) OR normal-for-this-animal behaviour that fell outside its own narrow distribution (in which case broader is better and the extra flags are false positives).
Broader pool_by flags more: the cohort has a heavier tail than the individuals (unusual; usually means a mix of distinct movement modes across animals). Cohort pooling is then aggregating the wrong thing; consider stratifying by movement type before pooling.
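One way to see which of the three readings applies is to tabulate flags per individual for both runs. The sketch below uses plain vectors in place of the two cleaned objects; the individual labels and flag values are invented, and in practice the two logical vectors would be the is_outlier columns from the two mt_clean_track runs on the same fixes in the same order.

```r
# Toy stand-ins for two runs over the same six fixes (invented values).
ids            <- c("a", "a", "a", "b", "b", "b")
broad_outlier  <- c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
narrow_outlier <- c(TRUE, FALSE, FALSE, TRUE,  TRUE,  FALSE)

# Flag counts per individual under each configuration.
per_id <- data.frame(
  individual = sort(unique(ids)),
  broad      = as.integer(tapply(broad_outlier,  ids, sum)),
  narrow     = as.integer(tapply(narrow_outlier, ids, sum))
)
per_id$delta <- per_id$narrow - per_id$broad
per_id

# Fixes flagged only under the narrower pooling: the ones to inspect on a map.
newly_flagged <- which(narrow_outlier & !broad_outlier)
newly_flagged
```

A delta of zero for every individual corresponds to the first reading; positive deltas concentrated in a few animals correspond to the second.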

A visual check is decisive when narrower pooling flags more. Map the newly flagged fixes and check their step speeds against the physiological cap. Speeds well below the physiological cap, combined with a spatial position inside the kept cluster, indicate within-animal behavioural variability, not outliers. Speeds near or above the cap, combined with spatial isolation from the home-range bulk, indicate real outliers that the broader pooling missed.
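The speed half of that check can be sketched in base R as below. The track, the newly flagged fixes, and the cap of 30 m/s are all invented for illustration; the coordinates are assumed to be already projected to metres, and the speed attached to each fix is that of the step arriving at it.

```r
# Toy track in projected metres with one fix per minute (invented values).
track <- data.frame(
  x = c(0, 100, 200, 50000, 300),
  y = c(0, 0, 0, 0, 0),
  t = as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + c(0, 60, 120, 180, 240)
)
newly_flagged <- c(FALSE, FALSE, TRUE, TRUE, FALSE)
cap_ms <- 30  # hypothetical physiological cap in m/s; set this per species

# Step length (m) and duration (s) between consecutive fixes.
step_m <- sqrt(diff(track$x)^2 + diff(track$y)^2)
step_s <- as.numeric(diff(track$t), units = "secs")

# Speed of the step arriving at each fix; the first fix has none.
speed_in <- c(NA, step_m / step_s)

check <- data.frame(
  fix       = which(newly_flagged),
  speed_ms  = speed_in[newly_flagged],
  above_cap = speed_in[newly_flagged] > cap_ms
)
check
```

Here fix 3 arrives at under 2 m/s, far below the cap, while fix 4 arrives at hundreds of m/s; combined with the map, the first looks like ordinary behaviour and the second like a measurement error.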

On datasets where the cohort step-speed distribution is unimodal and the species shows a single dominant movement mode, cohort-level pooling is usually the principled outlier-detection choice. Narrower pooling crosses into behavioural-anomaly detection, which is a different operation from removing measurement errors.

Examples

if (FALSE) { # \dontrun{
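library(move2)
# Read the synthetic example tracks referenced by the package path below
x <- mt_read(system.file("extdata/synthetic_tracks.csv.gz",
                         package = "move2utils"))
# Drop fixes with empty geometries
x <- x[!sf::st_is_empty(x), ]
# Run the cleaning cascade without plotting
cleaned <- mt_clean_track(x, plot = FALSE, remove = FALSE)
diag <- mt_diagnose_flags(cleaned)
print(diag$map)             # draw the five-panel map
diag$consensus_comparison   # flag counts under each consensus mode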
} # }