Seasonal hydrologic forecasting: Do multimodel ensemble averages always yield improvements in forecast skill?


Bohn, T.J., Sonessa, M.Y., and Lettenmaier, D.P. 2012. To appear in Journal of Hydrometeorology.


Multi-model techniques have proven useful in improving forecast skill in many applications, including hydrology. Seasonal hydrologic forecasting in large basins represents a special case of hydrologic modeling, in which post-processing techniques such as temporal aggregation and time-varying bias correction are often employed to improve forecast skill. To investigate the effects that these techniques have on the performance of multi-model averaging, we compared the performance of three hydrological models (VIC, Sacramento/Snow-17, and NOAH) and two multi-model averages, a simple model average (SMA) and a multiple linear regression (MLR) average with monthly-varying model weights, in three snowmelt-dominated basins in the western U.S. We evaluated both retrospective simulations and forecasts of monthly discharge, the latter made with the Ensemble Streamflow Prediction (ESP) method, with and without monthly bias corrections. The single best bias-corrected model outperformed the multi-model averages of raw models in both retrospective simulations and ensemble mean forecasts in terms of RMSE. Forming an MLR multi-model average from bias-corrected models added only slight improvements over the best bias-corrected model. Differences in performance among all bias-corrected models and multi-model averages were small. For ESP forecasts, both bias correction and multi-model averaging generally reduced the RMSE of the ESP ensemble means at lead times of up to 6 months in months when flow is dominated by snowmelt, with the reduction increasing as lead time decreased. In other months, both methods reduced RMSE at all lead times, with little dependence on lead time in most cases. The primary reason that multi-model averaging added little beyond bias correction is that aggregating simulated streamflows from daily to monthly timescales increases model cross-correlation, which in turn reduces the effectiveness of multi-model averaging in reducing those components of model error that bias correction cannot address. This effect may be stronger in snowmelt-dominated basins, because the inter-annual variability of winter precipitation is a common input to all models. We also found that both bias correction and multi-model averaging using monthly-varying parameters yielded much greater error reductions than methods using time-invariant parameters.
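
The post-processing steps compared above (monthly bias correction, SMA, and MLR with monthly-varying weights) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the study's actual procedure or data: the bias correction is a simple additive shift of each calendar month's mean error, the three "models" are synthetic series that share one error component to mimic cross-correlated model errors, the function names are hypothetical, and the regression is fit and scored on the same record, whereas a real skill assessment would normally hold data out.

```python
import numpy as np

def rmse(sim, obs):
    """Root-mean-square error between two 1-D series."""
    return float(np.sqrt(np.mean((np.asarray(sim) - np.asarray(obs)) ** 2)))

def monthly_bias_correct(sim, obs, months):
    """Subtract each calendar month's mean error (assumed additive correction)."""
    corrected = np.array(sim, dtype=float)
    for m in np.unique(months):
        sel = months == m
        corrected[sel] -= np.mean(corrected[sel] - obs[sel])
    return corrected

def simple_model_average(sims):
    """SMA: equal-weight mean over the model axis (rows are models)."""
    return np.mean(sims, axis=0)

def mlr_monthly(sims, obs, months):
    """MLR average with a separate intercept and weight set per calendar month."""
    combined = np.empty(obs.shape, dtype=float)
    for m in np.unique(months):
        sel = months == m
        # Design matrix: a column of ones plus one column per model.
        X = np.column_stack([np.ones(sel.sum()), sims[:, sel].T])
        coef, *_ = np.linalg.lstsq(X, obs[sel], rcond=None)
        combined[sel] = X @ coef
    return combined

# Synthetic monthly "observed" discharge: 20 years with a seasonal cycle.
rng = np.random.default_rng(0)
n_years = 20
months = np.tile(np.arange(1, 13), n_years)
seasonal = 100.0 + 80.0 * np.sin(2.0 * np.pi * (months - 4) / 12.0)
obs = seasonal + rng.normal(0.0, 10.0, months.size)

# Three hypothetical models: a shared error term (source of cross-correlation),
# plus a model-specific bias and model-specific noise.
shared = rng.normal(0.0, 8.0, months.size)
sims = np.stack([
    obs + shared + bias + rng.normal(0.0, 6.0, months.size)
    for bias in (15.0, -10.0, 5.0)
])

bc = np.stack([monthly_bias_correct(s, obs, months) for s in sims])
sma_raw = simple_model_average(sims)
sma_bc = simple_model_average(bc)
mlr_bc = mlr_monthly(bc, obs, months)

for name, series in [("SMA of raw models", sma_raw),
                     ("SMA of bias-corrected models", sma_bc),
                     ("MLR of bias-corrected models", mlr_bc)]:
    print(f"{name}: RMSE = {rmse(series, obs):.2f}")
```

In this toy setup the shared error term is common to every model, so averaging cannot cancel it; only the model-specific noise is reduced. Raising the variance of `shared` relative to the model-specific noise is one way to explore how stronger cross-correlation shrinks the gain from averaging.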