Forecast Engine Changelog

Every major change to DSW's prediction system, documented for transparency. Our post-mortem grades are tied to the engine version that made the forecast.

Current: v5 — Base-rate calibration, deployed June 11, 2026

v5 June 11, 2026 CURRENT

Base-Rate Calibration — "The RRV Is Not Oklahoma"

Deep analysis revealed the system was over-forecasting by 37x: it predicted severe weather on 24% of days when it actually occurs on 0.65%. The scoring engine treated Grand Forks like Oklahoma City. Three scoring fixes and one AI prompt change were applied.

CIN now multiplies the SCORE — Previously CIN only reduced probability percentages, not the threat score. A capped day with CAPE of 5000 J/kg got the same score as an uncapped one. Now a CIN of -75 J/kg cuts the score by 45%. The June 9 scenario drops from a score of 95 to 39.

RRV latitude prior — A 35% haircut on scores above L0, reflecting that severe weather at 48°N occurs roughly 1/3 as often as raw parameters suggest. Parameters alarming at 35°N are routine non-events here.

Stricter model initiation gates — 0 models firing storms = cap at L1 (was L2). 1 model = cap at L1/L2 boundary. 2 models = cap at low L2. Prevents "impressive environment but no storms" scenarios.

AI base-rate anchoring — AI prompt now states the 0.65% severe weather rate explicitly and instructs L0/L1 as the default. L2+ requires uncapped environment + model agreement + clear trigger mechanism.
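
The three scoring changes above can be sketched as a small pipeline. This is an illustration only, not DSW's actual code: the function names, the linear CIN ramp, the order of the steps, the L0 cutoff of 20, and the numeric cap values are all assumptions made for the example. Only the anchor points come from the entries above (a 45% cut at CIN -75 J/kg, a 35% haircut above L0, and tighter caps when 0, 1, or 2 models initiate storms).

```python
# Hypothetical sketch of the v5 score adjustments.
# Every constant marked ASSUMED is invented for illustration.

L0_MAX_SCORE = 20.0  # ASSUMED: highest score that still maps to L0

# ASSUMED score ceilings keyed by number of models that initiate storms.
# The changelog gives only levels (L1, L1/L2 boundary, low L2), not numbers.
INITIATION_CAPS = {0: 35.0, 1: 40.0, 2: 45.0}


def cin_multiplier(cin_jkg: float) -> float:
    """Score multiplier for a given CIN (J/kg, negative means capped).

    ASSUMED shape: a linear ramp pinned so that CIN -75 gives a 45% cut,
    with the cut limited to 90% for very strong caps.
    """
    cut = min(0.90, max(0.0, -cin_jkg) * (0.45 / 75.0))
    return 1.0 - cut


def latitude_prior(score: float) -> float:
    """Apply the 35% haircut to scores above the L0 band."""
    return score * 0.65 if score > L0_MAX_SCORE else score


def initiation_cap(score: float, models_firing: int) -> float:
    """Cap the score when few models initiate storms; 3+ models = no cap."""
    cap = INITIATION_CAPS.get(models_firing)
    return score if cap is None else min(score, cap)


def adjusted_score(raw: float, cin_jkg: float, models_firing: int) -> float:
    """Chain the three adjustments in an ASSUMED order."""
    score = raw * cin_multiplier(cin_jkg)
    score = latitude_prior(score)
    return initiation_cap(score, models_firing)
```

With these assumed constants, the output will not match the 95 to 39 figure quoted above, which depends on DSW's real curve and step ordering. The point of the sketch is the structure: CIN scales the score itself, and the initiation gate acts as a ceiling applied last.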

Trigger: 93% false alarm rate over 153 days. System forecast L2+ on 24% of days; actual rate was 0.65%. Data analysis showed CIN penalty was only on probabilities (cosmetic), not on the threat score that determines the forecast level.
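
The headline numbers can be checked with a few lines of arithmetic. The sketch below uses the standard false alarm ratio (false alarms divided by all "yes" forecasts). The helper names are illustrative, and the 13-of-14 count is an assumption taken from the v1-alpha verification record (1/14 correct), which rounds to the 93% quoted here.

```python
def over_forecast_factor(forecast_rate: float, observed_rate: float) -> float:
    """How many times more often an event is forecast than observed."""
    return forecast_rate / observed_rate


def false_alarm_ratio(hits: int, false_alarms: int) -> float:
    """FAR = false alarms / (hits + false alarms)."""
    return false_alarms / (hits + false_alarms)


# 24% forecast frequency vs. 0.65% observed frequency.
print(round(over_forecast_factor(0.24, 0.0065), 1))          # 36.9

# ASSUMED counts: 1 hit and 13 false alarms out of 14 forecasts.
print(round(false_alarm_ratio(hits=1, false_alarms=13), 3))  # 0.929
```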

v4 June 10, 2026 1 DAY

Multi-Model Expansion + Radar Nowcast

Massively expanded the data foundation, added live radar nowcasting with smart vector storm tracking, and fixed critical unit conversion bugs that were undermining accuracy.

7+ forecast models — Added HRRR (3km), NAM (12km), NBM (National Blend of Models), best_match, and GFS-GraphCast (AI) via Open-Meteo API alongside existing GFS/ECMWF/GEM/ICON.

173 ensemble members — GEFS (31) + ECMWF ENS (51) + ICON EPS (40) + ECMWF AIFS (51). Real probabilities from 4 independent systems.

Radar nowcast with smart vectors — Live storm cell tracking from IEM NEXRAD, with projected positions computed using closest-approach geometry (not an angular approximation). 29 RRV communities tracked. A storm heading NE toward Grand Forks correctly does not alert Fargo.

DSW Threat Alerts — Threat corridor polygons on the radar map for confirmed rotating supercells (meso 5+) and severe storms approaching RRV communities. Strict criteria to avoid false alarms. Max 1 alert per community.

New threat scale — QUIET / LOW / ELEVATED / SIGNIFICANT / DANGEROUS / EXTREME replaces confusing SPC terminology. SPC equivalents preserved for cross-reference.

Model initiation gate — When 0 of 7 models fire storms, threat score caps at ELEVATED regardless of parameters. Prevents the "extreme CAPE but no storms" false alarm pattern.

Wind unit bug fixed — All wind profile SRH/shear calculations were 60% too low due to an mph vs. km/h mismatch. Also fixed STP/SCP conversion factors and steering speed units.

Storm classification calibrated — IEM meso strength 1-3 (algorithmic noise) was being classified as "ROTATING." Now requires strength 5+ for that label, matching NWS operational thresholds.

Radar page: fullscreen toggle, dBZ legend with values, data freshness indicator, 15-minute CAPE monitor, automated post-mortems.
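
The closest-approach projection mentioned in the radar nowcast entry can be sketched as follows. This is a generic version of the technique, not DSW's implementation: it assumes straight-line storm motion on a local flat grid in kilometers, and the function names, the 25 km alert radius, the 2-hour horizon, and the town coordinates are all hypothetical.

```python
import math


def closest_approach(storm_xy, velocity_xy, town_xy):
    """Return (hours_until_closest, miss_distance_km) for straight-line motion.

    Positions are in km on a local flat grid; velocity is in km/h.
    """
    px, py = storm_xy
    vx, vy = velocity_xy
    dx, dy = town_xy[0] - px, town_xy[1] - py
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        return 0.0, math.hypot(dx, dy)  # stationary cell
    # Project the storm-to-town vector onto the motion vector. A negative
    # value means the town is behind the storm, so clamp to "now".
    t = max(0.0, (dx * vx + dy * vy) / speed_sq)
    return t, math.hypot(dx - vx * t, dy - vy * t)


def should_alert(storm_xy, velocity_xy, town_xy, radius_km=25.0, horizon_h=2.0):
    """ASSUMED criteria: passes within radius_km inside the next horizon_h hours."""
    t, miss = closest_approach(storm_xy, velocity_xy, town_xy)
    return t <= horizon_h and miss <= radius_km


FARGO = (0.0, 0.0)
GRAND_FORKS = (-18.0, 116.0)  # rough offset from Fargo in km, illustrative only
storm = (-60.0, 70.0)
heading_ne = (35.0, 35.0)     # km/h

print(should_alert(storm, heading_ne, GRAND_FORKS))  # True
print(should_alert(storm, heading_ne, FARGO))        # False
```

Clamping the projection at zero is what keeps a storm that is moving away from a town from alerting it, which is the Grand Forks versus Fargo behavior described above.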

v3 June 10, 2026 SAME DAY

Post-Bust Calibration

After the June 9 bust where DSW (and NWS) over-forecast a severe event that never materialized in the RRV, we applied four major corrections based on verification data.

CIN/cap penalty increased — Moderate CIN now reduces scores by 50% (was 15%). The STP formula's CIN term was also fixed (was mathematically inverted).

Ensemble reality check — When 0% of GEFS ensemble members produce severe gusts, the threat score is now penalized. Previously this signal was noted but not acted on.

Distance-filtered severe warnings — Tornado warnings 200+ miles away no longer inflate the local threat score. Local (<75mi) = full weight, nearby = half, distant = minimal.

Conditional CI risk reduced — When HRRR doesn't fire storms, we no longer override it with surface boundary detection. The model was right; the cap held.

AI calibration warning — AI now receives DSW's over-forecast bias history and weighs initiation probability as heavily as environment quality.

Automated post-mortems — Every L2+ forecast now gets an after-action review with AI-written analysis.

Public/deep analysis toggle — Plain-language summary by default, technical deep analysis expandable.

Trigger: June 9, 2026 bust — MODERATE forecast, MARGINAL actual. Cap held, zero severe in RRV.
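
The distance filter in the v3 list above amounts to a weight that falls off with range. A minimal sketch follows. Only the 75-mile "local" radius and the 200-mile "distant" mark come from that entry; the function names, the use of 200 miles as the nearby/distant boundary, the 0.1 weight standing in for "minimal", and the point values are assumptions.

```python
def warning_weight(distance_mi: float) -> float:
    """Weight applied to a severe warning's contribution to the local score."""
    if distance_mi < 75:
        return 1.0  # local: full weight
    if distance_mi < 200:
        return 0.5  # nearby: half weight (ASSUMED upper bound of 200 mi)
    return 0.1      # distant: ASSUMED value for "minimal"


def weighted_warning_score(warnings):
    """Sum base_points * weight over (base_points, distance_mi) pairs."""
    return sum(points * warning_weight(dist) for points, dist in warnings)


# One local warning and one 250 miles away, each worth 10 hypothetical points.
print(weighted_warning_score([(10, 30), (10, 250)]))  # 11.0
```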

v2 June 9, 2026 1 DAY

Major Rebuild — Original Analysis Platform

Complete overhaul from a data aggregator to an original analysis platform. 13 bugs fixed, 15 new analysis engines built. This version was deployed mid-event on June 9.

HRRR expanded to 26+ parameters — Updraft helicity, model-computed shear/SRH/storm motion, VIL, reflectivity, LCL/LFC, 0-3km CAPE.

15 analysis engines — HRRR storm simulation, run-to-run trending, multi-model CI timing, hodograph SRH, storm mode predictor, ensemble severe probabilities, surface mesoanalysis, NAM Nest comparison, nowcast bridge, verification bias correction, and more.

Fixed silent AI failure — AI enhancement had been broken for all outlooks due to an f-string escaping bug. Every outlook was running "algorithmic" only.

Fixed SRH underestimation — Was using a crude "shear x 4" approximation giving 92 m²/s² when the real value was 300+. Now uses multi-hour HRRR peak values.

Data-driven storm motion — Replaced hardcoded SW-to-NE assumption with 500 hPa steering flow.

Auto AI model selection — Sonnet for high-threat days, Haiku for quiet days.
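
The changelog does not show the f-string escaping bug itself. One common way an f-string breaks a prompt is an unescaped literal brace in an embedded JSON template, and the sketch below illustrates that general failure mode together with a broad fallback that would hide it. The prompt text, function names, and fallback logic are hypothetical, not DSW's code.

```python
def build_prompt_broken(cape: int) -> str:
    # Bug: the JSON braces are parsed as an f-string replacement field,
    # so this raises at runtime instead of producing a prompt.
    return f'Reply as {"level": 0}. CAPE is {cape} J/kg.'


def build_prompt_fixed(cape: int) -> str:
    # Doubled braces are emitted as literal { and }.
    return f'Reply as {{"level": 0}}. CAPE is {cape} J/kg.'


def outlook_mode(build_prompt) -> str:
    """Hypothetical caller whose broad fallback hides the failure."""
    try:
        build_prompt(3200)
        return "ai-enhanced"
    except Exception:
        return "algorithmic"  # silent fallback: the prompt is never sent


print(outlook_mode(build_prompt_broken))  # algorithmic
print(outlook_mode(build_prompt_fixed))   # ai-enhanced
```

Logging the exception inside the fallback, or narrowing the `except` clause, is the usual way to keep this kind of failure from going unnoticed.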

Known issue: CIN weighting too lenient, ensemble probs not yet moderating scores. Fixed in v3.
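
For reference on the SRH fix in this version, storm-relative helicity is usually computed by summing over the hodograph rather than by scaling bulk shear. The sketch below uses the standard discrete layer sum. The function name, the sample wind profile, and the storm motion are made up for illustration; DSW's values come from HRRR output, not from this code.

```python
def storm_relative_helicity(u, v, storm_u, storm_v):
    """SRH in m^2/s^2 from winds (m/s) listed bottom-up through the layer."""
    sr_u = [ui - storm_u for ui in u]  # storm-relative u component
    sr_v = [vi - storm_v for vi in v]  # storm-relative v component
    # Each term is twice the signed area swept between adjacent hodograph
    # points, positive when the wind veers (turns clockwise) with height.
    return sum(
        sr_u[i + 1] * sr_v[i] - sr_u[i] * sr_v[i + 1]
        for i in range(len(u) - 1)
    )


# Made-up veering profile at 0, 1, 2, 3 km with storm motion (10, 0) m/s.
u = [-5.0, 0.0, 10.0, 20.0]
v = [5.0, 10.0, 12.0, 10.0]
print(storm_relative_helicity(u, v, 10.0, 0.0))  # 340.0
```

Because the sum depends on how the wind turns with height and on the storm motion, a single shear magnitude multiplied by a constant cannot reproduce it in general.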

v1-alpha January — June 8, 2026 RETIRED

Alpha Prediction Engine

Initial system focused on data aggregation with basic threshold scoring. Limited original analysis.

Basic HRRR integration (10 parameters, analysis time only)

Rule-based probabilities (hardcoded lookup tables)

No AI enhancement (broken since deployment)

Crude SRH estimation (shear x 4 approximation)

Hardcoded SW-to-NE storm motion assumption

No distance filtering on warnings

No surface mesoanalysis or multi-CAM comparison

Verification record: 1/14 correct (7% accuracy). High false alarm rate driven by over-sensitive thresholds and broken AI.

Our Approach

DSW is an experimental platform that aims to produce genuinely original severe weather analysis for the Red River Valley — not just repackage NWS data. We use HRRR 3km storm simulations, multi-model comparison, surface boundary detection, and ensemble probabilities to generate predictions you can't get from weather.gov.

We publish our mistakes as prominently as our hits. Every significant forecast gets an automated post-mortem with an honest grade. When we bust, we document why and build the fix into the next version. That's how the system improves.