Zeus AI awarded DOE Phase II grant to build a kilometer-scale severe weather model
We're excited to share that Zeus AI has been awarded a Department of Energy Small Business Innovation Research (SBIR) Phase II grant to develop a high-resolution AI foundation model for severe weather forecasting over the Continental United States (CONUS).
The project, "A Meteorological Foundation Model for Gap-filled High-Resolution Data in Urban Environments," extends our global foundation model, EarthNet, into a multi-resolution system. It combines kilometer-scale regional observations, including convection-resolving Multi-Radar/Multi-Sensor (MRMS) radar composites, with satellite observations and point measurements from weather stations and radiosondes.
The grant supports development across the full pipeline, from data infrastructure to validation:
- ML-ready data pipelines for historical and real-time severe weather observations, including MRMS radar, microwave sounders, and surface stations, at 1-2 km resolution
- Kilometer-scale data assimilation and nowcasting over CONUS at 15-minute temporal resolution, without relying on numerical weather prediction (NWP) systems
- Continuous forecast validation through an extended WeatherBench framework adapted for high-temporal-resolution regional evaluation
- Independent atmospheric profile verification in collaboration with MIT Lincoln Laboratory
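WeatherBench scores gridded forecasts with metrics such as RMSE, weighting each grid cell by the cosine of its latitude so that cells count in proportion to their area. The minimal NumPy sketch below shows how such a score might carry over to 15-minute lead times on a regional grid. It is not our extended framework: the function names, the latitude band, and the synthetic data are hypothetical. It also assumes a regular latitude/longitude grid. On an equal-area projected grid, the weights would simply be uniform.

```python
import numpy as np

def lat_weights(lats_deg):
    """cos(latitude) area weights, normalized to mean 1 (WeatherBench-style)."""
    w = np.cos(np.deg2rad(lats_deg))
    return w / w.mean()

def weighted_rmse_by_lead(forecast, truth, lats_deg):
    """
    Area-weighted RMSE for each lead time.
    forecast, truth: arrays shaped (lead, lat, lon) on a regular lat/lon grid.
    lats_deg: latitude of each grid row, shape (lat,).
    """
    w = lat_weights(lats_deg)[None, :, None]  # broadcast over lead and lon
    sq_err = (forecast - truth) ** 2
    # Weights have mean 1, so a plain mean of w * error is the weighted mean.
    return np.sqrt((w * sq_err).mean(axis=(1, 2)))

# Toy example: 8 lead times at 15-minute spacing (15 to 120 minutes) over a
# hypothetical CONUS-like latitude band, with error growing linearly with lead.
rng = np.random.default_rng(0)
lats = np.linspace(25.0, 50.0, 64)
lead_minutes = 15 * np.arange(1, 9)
truth = rng.gamma(0.5, 2.0, size=(8, 64, 128))
noise_scale = 0.1 * np.arange(1, 9)[:, None, None]
forecast = truth + noise_scale * rng.standard_normal(truth.shape)
rmse = weighted_rmse_by_lead(forecast, truth, lats)
for minutes, score in zip(lead_minutes, rmse):
    print(f"lead {minutes:3d} min: RMSE {score:.3f}")
```

Reporting one score per lead time, rather than a single pooled number, shows how quickly skill decays at the short horizons where nowcasting matters.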
Early Results
We've already begun producing ML-ready datasets at resolutions suited to severe weather. Our MRMS radar and GOES-R satellite pipelines now cover the full CONUS domain at 1.6 km spatial and 15-minute temporal resolution, a 10-20x increase in spatial resolution over our previous global datasets.
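For a sense of scale, assume the 10-20x figure refers to linear grid spacing. That reading is our assumption. Under it, the previous global datasets had a spacing of roughly 16-32 km. The number of grid cells covering a fixed area grows with the square of the refinement, so each snapshot now holds roughly 100-400 times as many cells:

$$
\Delta x_{\text{prev}} \approx (10 \text{ to } 20) \times 1.6\ \text{km} = 16 \text{ to } 32\ \text{km},
\qquad
\frac{N_{\text{new}}}{N_{\text{prev}}} = \left(\frac{\Delta x_{\text{prev}}}{\Delta x_{\text{new}}}\right)^{2} \approx 10^{2} \text{ to } 20^{2} = 100 \text{ to } 400.
$$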
Before multi-modal training, we compress most data sources into a compact latent representation, a process we call tokenization. This reduces GPU memory requirements and lets the foundation model ingest many sensors simultaneously. Tokenizing radar-derived variables such as precipitation and hail fields is tricky for plain autoencoders. These fields are zero-inflated, meaning most pixels hold no precipitation or hail at any given time. They are also heavily right-skewed, with a long tail of rare, intense values. Our solution is a discrete-continuous tokenizer that jointly models the probability that precipitation occurs and its intensity given that it occurs. It achieves a correlation of R = 0.990 against observed MRMS precipitation rates.
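Jointly modeling occurrence and conditional intensity is the idea behind classic hurdle (two-part) models. The sketch below illustrates that idea and is not our tokenizer. It assumes a decoder that emits two values per pixel: an occurrence logit and the log-space mean of a log-normal intensity. The log-normal choice, the fixed `sigma`, and the function names are all illustrative assumptions. Training would minimize the combined negative log-likelihood, and a point reconstruction can be taken as the expected rate.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hurdle_nll(y, occ_logit, log_mu, sigma=1.0, eps=1e-6):
    """
    Mean negative log-likelihood of a hurdle (zero-inflated log-normal) model.
    y: observed rates, >= 0; exact zeros mean "no precipitation".
    occ_logit: per-pixel logit of P(y > 0)           (discrete part, hypothetical)
    log_mu, sigma: mean/std of log(y) given y > 0     (continuous part, hypothetical)
    """
    p = np.clip(sigmoid(occ_logit), eps, 1.0 - eps)
    wet = y > 0
    # Discrete part: Bernoulli likelihood of occurrence vs. non-occurrence.
    nll = np.where(wet, -np.log(p), -np.log(1.0 - p))
    # Continuous part: log-normal density of the intensity, only where y > 0.
    # The placeholder 1.0 keeps log() finite on dry pixels; those terms are masked out.
    log_y = np.log(np.where(wet, y, 1.0))
    lognormal_nll = (log_y + np.log(sigma) + 0.5 * np.log(2.0 * np.pi)
                     + 0.5 * ((log_y - log_mu) / sigma) ** 2)
    nll = nll + np.where(wet, lognormal_nll, 0.0)
    return nll.mean()

def hurdle_mean(occ_logit, log_mu, sigma=1.0):
    """Expected rate: P(y > 0) * E[y | y > 0], using the log-normal mean."""
    return sigmoid(occ_logit) * np.exp(log_mu + 0.5 * sigma ** 2)

# Toy data: 30% of pixels are wet, and wet intensities are log-normal(0, 1).
rng = np.random.default_rng(0)
n = 10_000
y = np.where(rng.random(n) < 0.3, rng.lognormal(0.0, 1.0, n), 0.0)
nll_true = hurdle_nll(y, np.log(0.3 / 0.7), 0.0)   # parameters that generated y
nll_wrong = hurdle_nll(y, np.log(0.9 / 0.1), 2.0)  # mis-specified parameters
```

Splitting the likelihood this way means dry pixels never pull the intensity estimate toward zero, and wet pixels never blur the occurrence map. If R above denotes a Pearson correlation, a comparable score for reconstructions `pred` against observations `obs` would be `np.corrcoef(pred.ravel(), obs.ravel())[0, 1]`.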
We're also exploring hyperspherical VAE representations, in which latent vectors are constrained to lie on a hypersphere rather than anywhere in unconstrained Euclidean space. These latents separate precipitation regimes in a structured way, a property we expect to be useful within the multi-modal foundation model.
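In practice, a common way to put VAE latents on the unit hypersphere is the von Mises-Fisher (vMF) distribution, used by hyperspherical VAEs (Davidson et al., 2018). Its samples concentrate around a mean direction `mu`, and the concentration parameter `kappa` controls how tightly. The minimal NumPy sketch below draws vMF samples with Wood's (1994) rejection sampler. It illustrates the geometry only and is not our implementation. The function name and the two toy "regime" directions are hypothetical.

```python
import numpy as np

def sample_vmf(mu, kappa, n, rng=None):
    """
    Draw n samples from a von Mises-Fisher distribution on the unit sphere
    in R^m using Wood's (1994) rejection sampler.
    mu: mean direction, shape (m,), m >= 2 (normalized internally).
    kappa: concentration (> 0); larger values cluster samples around mu.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    mu = mu / np.linalg.norm(mu)
    m = mu.shape[0]

    # Step 1: sample w = <x, mu> by rejection.
    b = (-2.0 * kappa + np.sqrt(4.0 * kappa**2 + (m - 1) ** 2)) / (m - 1)
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (m - 1) * np.log(1.0 - x0**2)
    w = np.empty(n)
    for i in range(n):
        while True:
            z = rng.beta((m - 1) / 2.0, (m - 1) / 2.0)
            w_i = (1.0 - (1.0 + b) * z) / (1.0 - (1.0 - b) * z)
            u = rng.uniform()
            if kappa * w_i + (m - 1) * np.log(1.0 - x0 * w_i) - c >= np.log(u):
                w[i] = w_i
                break

    # Step 2: a uniform direction orthogonal to e1, combined with w,
    # gives a sample from vMF(e1, kappa).
    v = rng.standard_normal((n, m - 1))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    radial = np.sqrt(np.clip(1.0 - w**2, 0.0, None))[:, None]
    x = np.concatenate([w[:, None], radial * v], axis=1)

    # Step 3: Householder reflection that maps e1 onto mu (norm-preserving).
    e1 = np.zeros(m)
    e1[0] = 1.0
    h = e1 - mu
    h_sq = h @ h
    if h_sq > 1e-12:
        x = x - 2.0 * np.outer(x @ h, h) / h_sq
    return x

# Toy example: two hypothetical "regimes" with orthogonal mean directions.
rng = np.random.default_rng(0)
dim = 8
mu_a, mu_b = np.eye(dim)[0], np.eye(dim)[1]
z_a = sample_vmf(mu_a, kappa=50.0, n=500, rng=rng)
z_b = sample_vmf(mu_b, kappa=50.0, n=500, rng=rng)
within = np.mean(z_a @ mu_a)   # mean cosine similarity to its own regime direction
across = np.mean(z_a @ mu_b)   # mean cosine similarity to the other regime direction
```

Because every latent has unit norm, similarity reduces to the angle between vectors, which is what makes regime separation easy to read off. In a trained hyperspherical VAE, the encoder predicts `mu` and `kappa` per input, and sampling must also be differentiable with respect to them. The rejection step above does not address that requirement.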
What's Next?
With our data pipelines and tokenization models in place, we're moving into the core modeling phase: training the multi-modal foundation model that brings together radar, satellite, and surface observations into a unified forecasting system.
We're grateful for DOE's continued support, and we look forward to sharing results as the work progresses.