The Madden–Julian Oscillation (MJO) is one of the main sources of sub-seasonal atmospheric predictability in the tropics. The MJO affects precipitation over highly populated areas, especially around southern India. Predicting its phase and intensity therefore has a high societal impact. Indices of the MJO can be derived from the first principal components of zonal wind and outgoing longwave radiation (OLR) in the tropics (the RMM1 and RMM2 indices). The amplitude and phase of the MJO are derived from those indices. Our goal is to forecast these two indices on a sub-seasonal timescale. This study aims to provide an ensemble forecast of MJO indices using a stochastic weather generator (SWG) based on analogs of the atmospheric circulation, computed from the geopotential at 500 hPa (Z500). We generate an ensemble of 100 members for the MJO amplitude for sub-seasonal lead times (from 2 to 4 weeks). We then evaluate the skill of the ensemble forecast using probabilistic scores and the skill of the ensemble mean using deterministic skill scores. According to score-based criteria, we find that a reasonable forecast of the MJO index can be achieved for lead times of up to 40 d for the different seasons. We compare our SWG forecast with other forecasts of the MJO. The comparison shows that the SWG forecast has skill relative to ECMWF forecasts for lead times above 20 d and better skill than machine learning forecasts for short lead times.

Forecasting the Madden–Julian Oscillation (MJO) is a crucial scientific endeavor as the MJO represents one of the most important sources of sub-seasonal predictability in the tropics. The Madden–Julian Oscillation controls tropical convection, with a life cycle going from 30 to 60 d

The improvement of the forecast skill of the MJO is the subject of several studies. Numerical models have shown an ability to forecast the MJO index

Statistical models, such as stochastic weather generators (SWGs), have been used for this purpose. SWGs are designed to mimic the behavior of climate variables

Analogs of circulation were designed to provide forecasts assuming that similar situations in the atmospheric circulation could lead to similar local weather conditions

The goal of this study is to forecast a daily MJO index for a sub-seasonal lead time (

The paper is divided as follows: Sect.

The MJO has been described by various indices that are obtained from different atmospheric variables

The RMM1 and RMM2 allow the computation of the amplitude and the phase of the MJO

To simplify notations in the equations, we note that

The amplitude and the phase describe the evolution of the MJO and its position along the Equator, respectively. The amplitude is related to the intensity of the MJO activity. There are different classifications related to the intensity of the active-MJO events
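As an illustration of how the amplitude and the phase follow from the two indices, the sketch below uses the usual Wheeler–Hendon convention: the amplitude is the norm of the (RMM1, RMM2) vector and the phase is the octant of the phase diagram in which that vector lies. The function names, the octant numbering formula, and the amplitude threshold of 1 for an active MJO are assumptions made for this example, not definitions taken from the text.

```python
import numpy as np

def mjo_amplitude(rmm1, rmm2):
    """Amplitude as the Euclidean norm of the (RMM1, RMM2) vector."""
    return np.hypot(rmm1, rmm2)

def mjo_phase_angle(rmm1, rmm2):
    """Angle of the (RMM1, RMM2) vector in radians, in [-pi, pi]."""
    return np.arctan2(rmm2, rmm1)

def mjo_phase_number(rmm1, rmm2):
    """Octant number 1..8 of the phase diagram.

    Assumed layout: phases increase counter-clockwise and phase 1
    starts at an angle of -pi (RMM1 < 0, RMM2 slightly negative).
    """
    angle = mjo_phase_angle(rmm1, rmm2)
    octant = np.floor((angle + np.pi) / (np.pi / 4)).astype(int)
    return octant % 8 + 1  # the modulo folds angle == +pi back into phase 1

def is_active(rmm1, rmm2, threshold=1.0):
    """Flag active-MJO days; the threshold of 1 is an assumed convention."""
    return mjo_amplitude(rmm1, rmm2) >= threshold
```

With this layout, a day with RMM1 = 1.5 and RMM2 = -2.0 has an amplitude of 2.5 and falls in phase 3, in the lower part of the diagram.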

We obtained daily time series of RMMs, amplitude (

We used the geopotential at 500 hPa (Z500) and 300 hPa (Z300) and outgoing longwave radiation (OLR) daily data to compute the analogs. The data are available from 1948 to 2020 with a horizontal resolution of

In this paper, we predict the daily amplitude

Wheeler–Hendon phase diagram of the MJO event for the period between 3 March and 9 April 1986, for observations. The diagram shows the eight areas of activity of MJO starting from the Indian Ocean.

We start by building a database of analogs. For a day

Hence the distance that is optimized to find analogs of the

We compute separate analogs of Z500, Z300, and OLR following the same procedure over the Indian Ocean as represented in Fig.

The optimal domain of computation of analogs. We computed analogs over the Indian Ocean, in the geographic areas indicated by the dashed black rectangle with coordinates 15

The stochastic weather generator (SWG) aims to generate ensembles of random trajectories that yield physically consistent features. Our SWG is based on circulation analogs that are computed in advance with the procedure described in Sect.

Illustration of the SWG process. The first step goes from a given day to the next day. The second step explains how we randomly select a

For a given day

Weights

Weights

We then replace
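The iteration can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the table `analogs` is assumed to be precomputed, the optional `weights` array stands in for the selection weights, and averaging the index along each trajectory is an assumption of the sketch. All names are hypothetical.

```python
import numpy as np

def swg_forecast(index, analogs, t0, lead, n_members=100, weights=None, rng=None):
    """Simulate an ensemble of analog-based trajectories starting at day t0.

    index   : 1-D array with the observed daily index (e.g. MJO amplitude)
    analogs : 2-D int array, analogs[t, k] is the day index of the k-th analog of day t
    weights : optional 2-D array with the same shape, non-negative selection weights
    Returns one value per member: the index averaged along the trajectory
    (the averaging is an assumption of this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_days = analogs.shape[0]
    members = np.empty(n_members)
    for m in range(n_members):
        t = t0
        values = []
        for _ in range(lead):
            p = None  # uniform selection unless weights are supplied
            if weights is not None:
                p = weights[t] / weights[t].sum()
            chosen = rng.choice(analogs[t], p=p)
            # The day following the selected analog becomes the simulated next day.
            t = min(int(chosen) + 1, n_days - 1)
            values.append(index[t])
        members[m] = np.mean(values)
    return members
```

Calling `swg_forecast(amplitude, analogs, t0, lead=20)` would return 100 simulated values whose distribution forms the ensemble forecast for a 20 d lead time.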

To evaluate our forecasts, the predictions made with the SWG are compared to the persistence and climatological forecasts. The persistence forecast consists of using the average value between

We assess the skill of the SWG to forecast the

As the CRPS value depends on the unit of the variable to be predicted, it is useful to normalize it with the CRPS value of a reference forecast, which can be obtained by a persistence or a climatology hypothesis. The continuous ranked probability skill score (CRPSS) is defined as a percentage of improvement over such a reference forecast
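A minimal numerical sketch of this normalization is given below. It assumes the standard empirical estimator of the CRPS for a finite ensemble and the usual skill-score form, one minus the ratio of the mean CRPS of the forecast to that of the reference. The function names are hypothetical.

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS of one ensemble forecast against a scalar observation.

    Uses the estimator E|X - y| - 0.5 * E|X - X'|, where X and X' are
    independent draws from the ensemble and y is the observation.
    """
    members = np.asarray(members, dtype=float)
    term_obs = np.mean(np.abs(members - obs))
    term_spread = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term_obs - term_spread

def crpss(crps_forecast, crps_reference):
    """Skill score relative to a reference (persistence or climatology)."""
    return 1.0 - np.mean(crps_forecast) / np.mean(crps_reference)
```

For a single-member ensemble the CRPS reduces to the absolute error, which is why the score carries the unit of the predicted variable.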

The CRPSS values vary between

We also computed the rank (temporal) correlation between the observations and the median of the 100 simulations

A robust forecast requires good discrimination skill, i.e., the ability to distinguish events from non-events. We measure the skill of the SWG in discriminating between situations leading to the occurrence of an MJO event (active MJO) and those leading to the non-occurrence of the event (inactive MJO). To do so, we use the relative operating characteristic (ROC) score. The ROC is used for binary events

If

If

The ROC curve is a plot of the success rate versus the false alarm rate

An increase in AUC indicates that the model discriminates better between outcomes, i.e., that it more often predicts a negative outcome as negative and a positive outcome as positive. An AUC of 0.5 is non-informative: the forecast then discriminates no better than a random classifier.
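The construction of the ROC curve and of the AUC from an ensemble forecast can be sketched as follows. The sketch assumes that an active MJO corresponds to an amplitude of at least 1 and that the forecast probability of the event is the fraction of members above that threshold. Both choices, as well as the function names and the set of probability thresholds, are assumptions made for this example.

```python
import numpy as np

def roc_points(ens, obs, event_threshold=1.0, n_levels=11):
    """Success rate and false alarm rate for a range of probability thresholds.

    ens : array (n_forecasts, n_members) of forecast amplitudes
    obs : array (n_forecasts,) of observed amplitudes
    """
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    prob = np.mean(ens >= event_threshold, axis=1)  # forecast probability of the event
    event = obs >= event_threshold                  # observed binary outcome
    success, false_alarm = [0.0], [0.0]             # start the curve at the origin
    # Go from the strictest to the most lenient warning threshold.
    for p in np.linspace(0.0, 1.0, n_levels)[::-1]:
        warn = prob >= p
        success.append(np.sum(warn & event) / max(np.sum(event), 1))
        false_alarm.append(np.sum(warn & ~event) / max(np.sum(~event), 1))
    return np.array(false_alarm), np.array(success)

def auc(false_alarm, success):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(false_alarm) * (success[1:] + success[:-1]) / 2.0))
```

A forecast that always assigns probability 1 to observed events and 0 to non-events gives an AUC of 1 with this construction, whereas a forecast with the same probability for every day gives 0.5.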

Finally, we evaluate the ensemble-mean forecast of RMM1 and RMM2 using the usual scalar metrics for MJO forecasts

We compare the RMSE to the ensemble spread in order to evaluate the forecast accuracy. The ensemble spread measures the difference between the members of the ensemble forecast. The ensemble spread

We compute the average amplitude error (

The value of

This formulation stems from the ratio of the cross product (numerator) and dot product (denominator) of the vectors of forecasts
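The scalar metrics for the ensemble-mean forecast of (RMM1, RMM2) can be illustrated with the short sketch below. It assumes the bivariate forms of the correlation and the RMSE that are commonly used for MJO forecasts, an amplitude error defined as forecast minus observation, and a phase error obtained from the cross and dot products of the observed and forecast vectors. The sketch uses `arctan2(cross, dot)`, which equals the arctangent of the ratio when the dot product is positive and stays defined when it is zero. The sign conventions and function names are assumptions of this example.

```python
import numpy as np

def bivariate_cor(obs, fcst):
    """Bivariate correlation; obs and fcst have shape (n, 2) = (RMM1, RMM2)."""
    num = np.sum(obs * fcst)
    den = np.sqrt(np.sum(obs ** 2)) * np.sqrt(np.sum(fcst ** 2))
    return num / den

def bivariate_rmse(obs, fcst):
    """Bivariate root mean square error over both RMM components."""
    return np.sqrt(np.mean(np.sum((obs - fcst) ** 2, axis=1)))

def amplitude_error(obs, fcst):
    """Mean difference of amplitudes (forecast minus observation)."""
    return np.mean(np.hypot(fcst[:, 0], fcst[:, 1]) - np.hypot(obs[:, 0], obs[:, 1]))

def phase_error(obs, fcst):
    """Mean angle (radians) from the observed to the forecast vector.

    Positive values mean the forecast vector is rotated counter-clockwise
    with respect to the observation (assumed sign convention).
    """
    cross = obs[:, 0] * fcst[:, 1] - obs[:, 1] * fcst[:, 0]
    dot = obs[:, 0] * fcst[:, 0] + obs[:, 1] * fcst[:, 1]
    return np.mean(np.arctan2(cross, dot))
```

For instance, a forecast equal to the observations rotated by a quarter turn counter-clockwise gives a bivariate correlation of 0, a phase error of π/2, and no amplitude error.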

We explore the skill of a SWG in forecasting the

Then, we adjusted the geographical region and the window search of analogs (Fig.

We search for analogs within 30 calendar days. This duration corresponds to the life cycle of the MJO.
In addition, we adjust the SWG in order to select analogs from the same phase, as described in Sect.
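The calendar-window restriction on the analog search can be sketched as follows. Apart from the 30 d window, the ingredients of this example are assumptions: a Euclidean distance between gridded fields, 20 analogs per day, exclusion of analogs from the same year as the target day, and a 365 d year for the wrap-around. The phase-based selection is not included. All names are hypothetical.

```python
import numpy as np

def find_analogs(fields, doy, years, t, n_analogs=20, window=30):
    """Return the closest analog days of day t and their distances.

    fields : array (n_days, n_grid), e.g. flattened Z500 maps over the domain
    doy    : array (n_days,), day of year of each date
    years  : array (n_days,), calendar year of each date
    """
    delta = np.abs(doy - doy[t])
    delta = np.minimum(delta, 365 - delta)  # wrap around the turn of the year
    # Candidates: within the calendar window, and not in the year of day t.
    candidates = np.where((delta <= window) & (years != years[t]))[0]
    dist = np.sqrt(np.sum((fields[candidates] - fields[t]) ** 2, axis=1))
    order = np.argsort(dist)[:n_analogs]
    return candidates[order], dist[order]
```

Excluding the year of the target day keeps the analogs independent of the days being forecast in this sketch.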

To evaluate the skill score of our forecasts, we used two approaches. We used the probabilistic scores such as CRPS, correlation, and ROC score (Sect.

We show results of the forecast of

The first reason is related to the composition of the RMM index. Indeed, the OLR is used as a proxy for organized moist convection

Another reason is related to our forecast approach. The composites of OLR and wind speed highly depend on the phase of the MJO

Indeed, choosing a “large” region to compute analogs yields rather large distances or low correlations for analogs. This implies that the analog SWG gets lower skill scores because the analogs are not very informative. The OLR or zonal wind analogs were computed on the optimal window obtained for Z500 or Z300 as mentioned in Fig.

We tested the forecast of

COR

As an illustration, we show the time series of the simulations and observations of the MJO amplitude for 1986. This year has an unusually long period with RMM amplitude above 1, indicating strong MJO activity.
Figure

Time series of observations and simulations of the MJO amplitude for lead times of 3

We evaluate the forecast of amplitude

The CRPSS was computed using as a reference the forecast made from climatology and persistence. We note that the CRPSS vs. persistence reference decreases with time. It has higher values for

We used the ROC diagram to determine the discrimination between active and inactive events of the MJO. We associated

Area under ROC curve (AUC) for the different lead times

Using three probabilistic metrics (CRPSS, correlation, and ROC), we show that the SWG is able to skillfully forecast the MJO amplitude from analogs of Z500. The CRPSS shows an improvement of the forecast over the reference up to 40 d, whereas the correlation is significant only up to 20 d. The ROC curve shows that the forecast retains discrimination skill up to 40 d.

The difference between the lead times that we found using the CRPSS, the correlation, and the ROC results from the differences between the skill scores. The CRPS is used for different categories of events, while the ROC is used for binary events, which makes it more suitable for our case study.

Skill scores for the MJO amplitude for lead times going from 3 to 40 d for DJF (blue) and JJA (red) for analogs computed from Z500. Squares indicate CRPSS where the persistence is the reference, triangles indicate CRPSS where the climatology is the reference, and boxplots indicate the probability distribution of correlation between observation and the median of 100 simulations for the period from 1979 to 2020.

ROC curve for all lead times. The plot represents the sensitivity versus the specificity. The diagonal line represents the random classifier obtained when the forecast has no skill. A ROC curve below the diagonal line indicates poor skill; a curve above it indicates good skill, i.e., the forecast has the potential to distinguish between successes and false alarms.

In this part, we evaluate the performance of the SWG in forecasting the RMMs (

The COR

In order to verify the forecast skill, we computed the ensemble spread, and we compared it to the RMSE values for the different lead times going from 3 to 40 d (Fig.

We explored the sensitivity of the forecast to seasons as shown in Fig.

We also computed the amplitude and phase errors (Fig.

The assessment of the forecast of MJO amplitude with SWG and analogs of Z500 shows good skill until 40 d using probabilistic scores (CRPSS vs. climatology is 0.2, and CRPSS vs. persistence is 0.4) and scalar scores (

The COR

We compared the forecast skill of the SWG with that of other forecasts. We selected two models, POAMA (the Australian Bureau of Meteorology coupled ocean–atmosphere seasonal prediction system) and the ECMWF model, which provide probabilistic and deterministic forecasts of the MJO, respectively. We mainly compared the maximum lead time of the MJO amplitude forecast. The POAMA model provides a 10-member ensemble. In hindcast mode, the POAMA model has skill up to 21 d

The average amplitude error (

In addition, we compared quantitatively the SWG forecast with the ECMWF forecast (Fig.

We also compared the SWG forecast skill with a machine learning forecast of MJO indices (RMM1 and RMM2)

To sum up, the comparison of SWG forecasts to ECMWF and

Comparison of the values of COR

We performed an ensemble forecast of the MJO amplitude using analogs of the atmospheric circulation and a stochastic weather generator. We used the Z500 as a driver of the circulation (Fig.

We assessed the skill of the MJO forecast by evaluating the ensemble members and the ensemble mean using probabilistic and scalar verification methods, respectively. This allowed us to evaluate the forecast and also to explore the difference between the two verification methods.

We used probabilistic skill scores such as the CRPSS and the AUC of the ROC curve (Table

We found that the forecast is sensitive to seasons (Fig.

This paper hence confirms the skill of the SWG in generating ensembles of MJO index forecasts from analogs of circulation. Such information would be useful to forecast impact variables such as precipitation and temperature.

We did the forecast of RMM1 and RMM2 using analogs of Z300 (Fig.

In this part, we also show the time series for the forecast at different lead times

COR values for different lead times of forecasts from 3 to 90 d over the period from 1979 to 2020 for the SWG forecast based on analogs of Z500 and Z300 for different seasons (DJF, JJA, MAM, and SON).

RMSE values for different lead times of forecasts from 3 to 90 d over the period from 1979 to 2020 for the SWG forecast based on analogs of Z500 and Z300 for different seasons (DJF, JJA, MAM, and SON).

Time series of observations and simulations of the MJO amplitude computed from analogs of OLR for lead times of 3

Time series of observations and simulations of the MJO amplitude computed from analogs of Z300 for lead times of 3

We show in Fig.

The COR reaches the threshold of 0.5 at

Domains of computation of analogs. We computed analogs over the Indian Ocean with coordinates 15

Comparison between the COR

We checked the dependence of the SWG forecast skill of the amplitude of the MJO and the MJO phases. We verified the relationship between the CRPS at

CRPS values above the 75th quantile (Fig.

CRPS values below the 25th quantile (Fig.

Relationship between CRPS and MJO phases.

The code and data files are available at

MK designed and performed the analyses and wrote the manuscript. PY co-designed the analyses. RS provided data for comparison.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is part of the EU International Training Network (ITN) Climate Advanced Forecasting of sub-seasonal Extremes (CAFE). The project receives funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Actions (grant agreement no. 813844). The authors would like to thank Alvaro Corral and Monica Minjares for the discussions.

This research has been supported by the European Union’s Horizon 2020 research and innovation program (grant agreement no. 813844).

This paper was edited by Yun Liu and reviewed by two anonymous referees.