We consider the problem of estimating the ensemble sizes required to characterize the forced component and the internal variability of a number of extreme metrics. While we exploit existing large ensembles, our perspective is that of a modeling center wanting to estimate such sizes a priori on the basis of an existing small ensemble (we assume the availability of only five members here). We therefore ask whether such a small ensemble is sufficient to accurately estimate the population variance (i.e., the ensemble internal variability), and then apply a well-established formula that quantifies the expected error in the estimation of the population mean (i.e., the forced component) as a function of the sample size.

Recently, much attention and resources have been dedicated to running and analyzing large ensembles of climate model simulations under perturbed initial conditions

In this methodological study, we adopt the point of view of a modeling center that is interested in estimating current and future behavior of several metrics of extremes and has to decide on the size of a large ensemble. Such a decision, we assume, needs to be reached on the basis of a limited number of initial-condition members, which the center would run as a standard experiment. We choose a size of five, which is a fairly common choice for future projection experiments, and use the statistics derived from such a small ensemble to estimate the optimal size of a larger ensemble, according to performance standards that we specify. We test our estimate of the optimal size in a perfect-model setting, taking what the full large ensemble gives us as “the truth”. We use two large ensembles available through the Climate Variability and Predictability (CLIVAR) single-model initial-condition large ensemble (SMILE) initiative

Our metrics of interest are indices describing the tail behavior of daily temperature and precipitation. We conduct the analysis in parallel for extremes of temperature and precipitation because we expect our answers to be dependent on the signal-to-noise ratio affecting these two atmospheric quantities, which we know to be different in both space and time

We consider the goal of identifying the forced change in the behavior of extremes over the course of the 21st century. We seek an answer in terms of the ensemble size for which we expect the estimate of the forced component to approximate the truth within a given tolerance, or for which our estimate does not change significantly with additional ensemble members. We also consider the complementary problem of identifying the ensemble size that fully characterizes the variability around the forced component. After all, future changes in extremes are usually of interest for impact and risk analysis, and any risk-oriented framework will be better served by characterizing both the expected outcomes (i.e., the central estimates) and the uncertainties surrounding them. Both types of questions can be formulated on a wide range of geographic scales, as the information that climate model experiments provide is used for the evaluation of hazards at local scales and for the assessment of risk and adaptation options, all the way to globally aggregated metrics, usually most relevant for mitigation policies. The time horizon of interest may vary as well. Therefore, we present results from grid-point scales all the way to global averages, and for mid-century and late-century projections, specific years or decades along the simulations, or whole century-long trajectories.

The consideration of two models, two atmospheric quantities and several extreme metrics, each analyzed at a range of spatial and temporal scales, helps our conclusions to be robust and – we hope – applicable beyond the specifics of our study.

The CESM1-CAM5 LENS (CESM ensemble from now on) has been the object of significant interest and many published studies, as the more-than-1300 citations of

We use daily output of minimum and maximum temperature at the surface (TASMIN and TASMAX) and average precipitation (PR) and compute a number of extreme metrics, all of them part of the Expert Team on Climate Change Detection and Indices (ETCCDI) suite

TXx – highest value over the year of daily maximum temperature (interpretable as the warmest day of the year);

TXn – lowest value over the year of daily maximum temperature (interpretable as the coldest day of the year);

TNx – highest value over the year of daily minimum temperature (interpretable as the warmest night of the year);

TNn – lowest value over the year of daily minimum temperature (interpretable as the coldest night of the year);

Rx1Day – precipitation amount falling on the wettest day of the year;

Rx5Day – average daily amount of precipitation during the wettest 5 consecutive days (i.e., the wettest pentad) of the year.

an estimate of the ensemble variability that we compute on the basis of the five members available; and

the variable size

It is a well-known result of descriptive statistics that the standard error of the sample mean around the true mean decreases as a function of
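For reference, the result alluded to here can be written out as follows, in our own notation and assuming independent members with population mean $\mu$ and population standard deviation $\sigma$; $\bar{x}_n$ is the mean of $n$ members and $\hat{\sigma}$ the standard deviation estimated from the five available members. The second relation assumes the sample mean is approximately normal and uses 2 as a rounded 95 % multiplier:

```latex
\mathrm{SE}(\bar{x}_n) = \frac{\sigma}{\sqrt{n}}, \qquad
\Pr\!\left( \lvert \bar{x}_n - \mu \rvert \le \frac{2\hat{\sigma}}{\sqrt{n}} \right) \approx 0.95,
\qquad
\frac{2\hat{\sigma}}{\sqrt{n}} \le \varepsilon \;\Longleftrightarrow\; n \ge \left( \frac{2\hat{\sigma}}{\varepsilon} \right)^{2}.
```

Here $\varepsilon$ is a user-chosen error tolerance; the last relation turns the bound into a required ensemble size.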

Since we are considering extreme metrics that can be modeled by a generalized extreme value (GEV) distribution, we also derive a range of return levels at a set of individual locations.

If a random variable

If

We estimate the parameters of the GEVs, and therefore the quantities that are function of them, like


Because of the availability of multiple ensemble members, we can choose a narrow window along the simulations (we choose 11 years) to satisfy the requirement of stationarity that the standard GEV fit postulates. We perform separate GEV fits centered around several dates along the simulation, i.e., 2000, 2050 and 2095 (the last chosen to allow extracting a symmetric window at the end of the simulations; since some of the simulations end at 2099, this becomes 2094 in such cases). The GEV parameters are estimated separately for a range of ensemble sizes. We perform the analysis for a set of individual locations (i.e., grid points), as for most extreme quantities there would be little value in characterizing very rare events as means of large geographical regions. Figure
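A minimal sketch of such a fit at one grid point, assuming annual extremes stored as an array of shape (members, years): the first n members are pooled over an 11-year window and a stationary GEV is fitted with `scipy.stats.genextreme`, whose shape parameter `c` has the opposite sign of the ξ commonly used in climate studies. The helper names and the synthetic data are hypothetical, not the authors' code.

```python
import numpy as np
from scipy.stats import genextreme

RETURN_PERIODS = (2, 5, 10, 20, 50, 100)  # years, as in the return-level figures

def pooled_window(data, center_idx, half_width=5):
    """Pool an 11-year window (center +/- 5 years) across all members given.

    data: array (n_members, n_years) of annual extremes at one grid point.
    Pooling assumes quasi-stationarity within the window.
    """
    return data[:, center_idx - half_width:center_idx + half_width + 1].ravel()

def gev_return_levels(sample, periods=RETURN_PERIODS):
    """Fit a stationary GEV by maximum likelihood and compute return levels.

    scipy's shape parameter c equals -xi in the convention common in climate studies.
    """
    c, loc, scale = genextreme.fit(sample)
    # The T-year return level is exceeded with probability 1/T in any given year.
    levels = genextreme.isf(1.0 / np.asarray(periods, dtype=float), c, loc, scale)
    return (c, loc, scale), levels

# Hypothetical synthetic stand-in: 40 members x 151 years (1950-2100) of annual maxima.
rng = np.random.default_rng(0)
true_c, true_loc, true_scale = 0.2, 30.0, 1.0  # xi = -0.2, i.e. bounded upper tail
ens = genextreme.rvs(true_c, loc=true_loc, scale=true_scale,
                     size=(40, 151), random_state=rng)

center = 2000 - 1950  # column index of the year 2000
for n in (5, 10, 20, 40):
    _, levels = gev_return_levels(pooled_window(ens[:n], center))
    print(n, np.round(levels, 2))
```

Looping over n mirrors estimating the GEV parameters separately for a range of ensemble sizes; confidence intervals (e.g., from bootstrapping members) are omitted from this sketch.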

Recognizing the importance of characterizing variability besides the signal of change, we ask how many ensemble members are required to fully characterize the size of internal variability and its possible changes over the course of the simulation due to increasing anthropogenic forcing.
Process-based studies are suited to tackle the question of how and why changes in internal variability manifest themselves in transient scenarios

In the following presentation of our main findings, we choose two representative metrics, TNx (warmest night of the year) and Rx5Day (average daily rainfall amount during the wettest 5 consecutive days of the year), using the 40-member CESM ensemble. In the Appendix, we include the same type of results for the additional metrics considered and for the 50-member CanESM ensemble. We will discuss if and when the results presented in this section differ from those shown in the Appendix.

We start from time series of annual values of globally averaged TNx and Rx5Day (Fig.

Time series for TNx (warmest night of the year, left) and Rx5Day (average daily amount during the 5 consecutive wettest days of the year, right) showing how the estimate of the forced component of their global mean trajectories over the period 1950–2100 changes when averaging an ensemble of increasing size. The top row shows the entire time series. The middle row zooms into the relatively flatter period of 1950–2000, so that the

As Fig.

Global mean of TNx as simulated by the CESM ensemble: values of the RMSE in approximating the full ensemble mean by the individual runs (first row,

This behavior is to be expected, as we know the RMSE of a mean behaves in inverse proportion to the square root of the size of the sample from which the mean is computed, but the actual behavior shown in the plots and Tables

Table

Same as Table
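The inverse-square-root behavior of the RMSE can be checked on synthetic data with a short sketch; the forced trajectory, σ and ensemble layout below are hypothetical. One detail is worth noting: because “the truth” is itself the mean of a finite N-member ensemble, the expected RMSE of the first n members' mean against it is σ√(1/n − 1/N), slightly below σ/√n, so the simple formula is conservative in this perfect-model comparison.

```python
import numpy as np

def subsample_rmse(ens, sizes):
    """RMSE over time between the mean of the first n members and the full-ensemble mean.

    ens: array of shape (n_members, n_years), e.g. annual global means of one index.
    """
    truth = ens.mean(axis=0)  # full-ensemble mean plays the role of "the truth"
    return {n: float(np.sqrt(np.mean((ens[:n].mean(axis=0) - truth) ** 2)))
            for n in sizes}

# Hypothetical synthetic ensemble: forced trend plus independent Gaussian noise.
rng = np.random.default_rng(1)
n_members = 40
years = np.arange(1950, 2101)
sigma = 0.3                                          # internal variability (K), assumed
forced = 0.02 * np.clip(years - 1980, 0, None)       # forced trajectory (K), assumed
ens = forced + sigma * rng.standard_normal((n_members, years.size))

rmse = subsample_rmse(ens, (5, 10, 20, 30))
for n, r in rmse.items():
    naive = sigma / np.sqrt(n)                       # sigma / sqrt(n)
    finite = sigma * np.sqrt(1 / n - 1 / n_members)  # truth is a finite-ensemble mean
    print(f"n={n:2d}  RMSE={r:.3f}  sigma/sqrt(n)={naive:.3f}  finite-N={finite:.3f}")
```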

The lessons learned here are as follows:

for both metrics, an accurate estimate of

if the formula for computing the RMSE on the basis of a given sample size is adopted, and that estimate for

We assess how the results of the formula compare to the actual error by considering the difference between the smaller-size ensemble means and the truth (the full ensemble mean), year by year, and comparing that difference to twice the expected RMSE derived by the formula, i.e.,

In each plot, for each year, the height of the bar gives the error in the estimate of the forced component (defined as the mean of the entire ensemble) as a percentage of the expected 95 % probability bound, estimated by the formula
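The year-by-year comparison can be sketched as follows. We assume (our choice, not necessarily the authors') that σ̂ for each year is the within-year spread of the first five members, pooled over a 5-year window centered on that year and truncated at the ends of the record; the data are synthetic and hypothetical.

```python
import numpy as np

def pooled_sigma(ens, t, n_est=5, half_width=2):
    """Hypothetical helper: ensemble std at year index t from the first n_est members,
    pooling deviations from each year's mean over a (2*half_width+1)-year window."""
    lo, hi = max(t - half_width, 0), min(t + half_width + 1, ens.shape[1])
    block = ens[:n_est, lo:hi]
    dev = block - block.mean(axis=0, keepdims=True)  # removes each year's (forced) mean
    dof = block.shape[1] * (n_est - 1)              # one degree of freedom lost per year
    return np.sqrt((dev ** 2).sum() / dof)

def error_vs_bound(ens, n):
    """Actual error of the n-member mean, in % of the a priori bound 2*sigma_hat/sqrt(n)."""
    truth = ens.mean(axis=0)
    err = np.abs(ens[:n].mean(axis=0) - truth)
    sigma_hat = np.array([pooled_sigma(ens, t) for t in range(ens.shape[1])])
    return 100.0 * err / (2.0 * sigma_hat / np.sqrt(n))

# Hypothetical synthetic 40-member ensemble of a global-mean index, 1950-2100.
rng = np.random.default_rng(2)
years = np.arange(1950, 2101)
ens = 0.02 * np.clip(years - 1980, 0, None) + 0.3 * rng.standard_normal((40, years.size))

ratio = error_vs_bound(ens, n=10)
print(f"fraction of years above 100 %: {np.mean(ratio > 100):.1%}")
```

Values of `ratio` correspond to the bar heights described in the caption; years above 100 % are those where the actual error exceeds the a priori bound.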

In the Appendix, we report the results of applying the same analysis to the rest of the indices. We cannot show all results, but we tested country averages, zonal averages, land- and ocean-area averages separately, confirming that the qualitative behavior we assess here is common to all these other scales of aggregation.

Here, we show how the same type of analysis can be applied at the grid-point scale and still deliver an accurate bound for the error in approximating the forced component. For the grid-scale analysis, we define the forced component as the anomalies at mid- and end-of-century (relative to a baseline), obtained as differences between 5-year averages: 2048–2052 and 2096–2100 vs. 2000–2005. We use only five members (and, as before, a 5-year window for each to increase the sample size) to estimate the ensemble standard deviation of the two anomalies at each grid point (separately, as that standard deviation may differ between mid-century and the end of the century). We then compare the actual error made when approximating the “true” anomalies (i.e., those obtained on the basis of the full ensemble) with increasingly large ensembles to the 95 % confidence bound, calculated by the formula

Error in the estimation of anomalies in TNx by mid-century (top two rows) and end of century (bottom two rows) from the CESM ensemble. In each plot, for increasing ensemble sizes, the color of each grid point indicates the ratio (as a percentage) between actual error and the 95 % confidence bound. Values of less than 100 % indicate that the actual error in estimating the anomaly at that location is contained within the bound. The color scale highlights in dark red the values above 100 %, whose total fraction is reported in Table

Error in the estimation of anomalies in Rx5Day by mid-century (top two rows) and end of century (bottom two rows) from the CESM ensemble. In each plot, for increasing ensemble sizes, the color of each grid point indicates the ratio (as a percentage) between actual error and the 95 % confidence bound. Values of less than 100 % indicate that the actual error in estimating the anomaly at that location is contained within the bound. The color scale highlights in dark red the values above 100 %, whose total fraction is reported in Table
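A minimal sketch of the grid-point comparison, under stated assumptions: the surface fraction is weighted by cos(latitude), and σ̂ at each grid point is simply the across-member standard deviation of the window-mean anomaly in the first five members (the additional 5-year pooling described in the text is not reproduced, since its exact form is not specified here). Function names and data are hypothetical.

```python
import numpy as np

def window_mean(ens, years, start, end):
    """Average over years start..end (inclusive); ens: (members, years, lat, lon)."""
    sel = (years >= start) & (years <= end)
    return ens[:, sel].mean(axis=1)

def exceedance_fraction(ens, years, lat, n, target, base, n_est=5):
    """Per-grid-point ratio (%) of actual error to 2*sigma_hat/sqrt(n), and the
    cos(latitude)-weighted fraction of the surface where the ratio exceeds 100 %."""
    anom = window_mean(ens, years, *target) - window_mean(ens, years, *base)
    truth = anom.mean(axis=0)                       # full-ensemble anomaly
    sigma_hat = anom[:n_est].std(axis=0, ddof=1)    # spread from five members only
    err = np.abs(anom[:n].mean(axis=0) - truth)
    ratio = 100.0 * err / (2.0 * sigma_hat / np.sqrt(n))
    w = np.broadcast_to(np.cos(np.deg2rad(lat))[:, None], ratio.shape)
    return ratio, float((w * (ratio > 100)).sum() / w.sum())

# Hypothetical 40-member ensemble on a coarse 10-degree grid.
rng = np.random.default_rng(3)
years = np.arange(2000, 2053)
lat = np.arange(-85.0, 90.0, 10.0)  # 18 latitude band centers
ens = (0.03 * (years - 2000)[None, :, None, None]
       + rng.standard_normal((40, years.size, lat.size, 36)))

# Windows as listed in the text (inclusive bounds).
ratio, frac = exceedance_fraction(ens, years, lat, n=10, target=(2048, 2052), base=(2000, 2005))
print(f"area fraction where the error exceeds the bound: {frac:.1%}")
```

With σ̂ taken from only five values per grid point, the normalized error follows a Student t distribution with four degrees of freedom, so exceedances are more frequent than the nominal 5 %; this is one reason to enlarge the sample with a time window, as the text does.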

Overall, these results attest to the fact that we can use a small ensemble of five members to estimate the population standard deviation and plug it into the formula for the standard error of the sample mean as a function of sample size. Imposing a ceiling for this error allows us then to determine how large an ensemble should be, in order to approximate the forced component to the desired level of accuracy. This holds true across the range of spatial scales afforded by these models, from global means all the way to grid-point values.
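Turning the bound into a number is a one-line inversion of the standard-error formula. The sketch below uses a hypothetical helper, with z = 2 matching the “twice the expected RMSE” criterion, and returns the smallest ensemble size that meets a chosen tolerance.

```python
import math

def required_members(sigma_hat, tolerance, z=2.0):
    """Smallest ensemble size n such that z * sigma_hat / sqrt(n) <= tolerance.

    sigma_hat: standard deviation estimated from the small (e.g., five-member) ensemble.
    tolerance: maximum acceptable error in the ensemble mean, in the units of sigma_hat.
    z: multiplier of the standard error (2 is roughly a 95 % bound under normality).
    """
    if sigma_hat <= 0 or tolerance <= 0:
        raise ValueError("sigma_hat and tolerance must be positive")
    n = max(1, math.ceil((z * sigma_hat / tolerance) ** 2))
    # Guard against floating-point round-off right at an integer boundary.
    while n > 1 and z * sigma_hat / math.sqrt(n - 1) <= tolerance:
        n -= 1
    while z * sigma_hat / math.sqrt(n) > tolerance:
        n += 1
    return n

# Hypothetical numbers: sigma_hat = 0.3 K from five members, tolerance 0.07 K.
print(required_members(0.3, 0.07))  # -> 74
```

Because n grows with the square of σ̂/ε, halving the tolerance roughly quadruples the required ensemble size.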

As explained in Sect.

How many ensemble members are needed for the estimates to stabilize and the size of the confidence interval not to change in a substantial way?

Is there any gain in applying GEV fitting rather than simply “counting” rare events across the ensemble?

Return levels for TNx at (row-wise) 2000, 2050 and 2100 for (column-wise) 2-, 5-, 10-, 20-, 50- and 100-year return periods, based on estimating a GEV by using 11-year windows of data around each date. In each plot, for increasing ensemble sizes along the

Return levels for Rx5Day at (row-wise) 2000, 2050 and 2100 for (column-wise) 2-, 5-, 10-, 20-, 50- and 100-year return periods, based on estimating a GEV by using 11-year windows of data around each date. In each plot, for increasing ensemble sizes along the

Figures

Further statistical precision could be attained by relaxing the quasi-stationarity assumption and extending the analysis to a longer window of years. Trading ensemble members for time beyond a decade's worth of data, however, in most cases requires the inclusion of temporal covariates: for example, indicators of the phase and magnitude of major modes of variability known to affect the behavior of the atmospheric variables in question on multi-decadal scales. The inclusion of covariates, of course, adds another source of fitting uncertainty.

After concerning ourselves with the characterization of the forced component, we turn to the complementary problem of characterizing internal variability. Rather than aiming at eliminating the effects of internal variability as we have done so far in the estimation of a forced signal, we take here the opposite perspective, wanting to fully characterize its size and behavior over space and time. After all, the real world realization will not be akin to the mean of the ensemble but to one of its members, and we want to be sure to estimate the range of variations such members may display. Thus, we ask how large the ensemble needs to be to fully characterize the variations that the full-size ensemble produces, which once again we take as the truth (as mentioned, the answer to this question can be seen as a systematic confirmation that five members are sufficient for the estimation of

Estimating the ensemble variance for TNx: each plot corresponds to a year along the simulation length (1950, 1975, 2000, 2025, 2050, 2075, 2100). The color indicates the number of ensemble members needed to estimate an ensemble variance at that location that is statistically indistinguishable from that computed on the basis of the full 40-member ensemble, using an

Estimating the ensemble variance for Rx5Day: each plot corresponds to a year along the simulation length (1950, 1975, 2000, 2025, 2050, 2075, 2100). The color indicates the number of ensemble members needed to estimate an ensemble variance at that location that is statistically indistinguishable from that computed on the basis of the full 40-member ensemble, using an
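The test used here is not fully specified in this section, so the sketch below assumes a two-sided F test for equality of variances at the 5 % level and a hypothetical criterion: the required size is the smallest n from which every larger subset (among the sizes tried) has a variance indistinguishable from that of the full ensemble. Because each subset is part of the full ensemble, the two samples are not independent and the test is only approximate.

```python
import numpy as np
from scipy.stats import f as f_dist

def f_test_pvalue(x, y):
    """Two-sided F test for equality of the variances of samples x and y."""
    x, y = np.ravel(x), np.ravel(y)
    ratio = np.var(x, ddof=1) / np.var(y, ddof=1)
    dfx, dfy = x.size - 1, y.size - 1
    p = 2.0 * min(f_dist.cdf(ratio, dfx, dfy), f_dist.sf(ratio, dfx, dfy))
    return float(min(1.0, p))

def members_needed(values, sizes, alpha=0.05):
    """values: (n_members, n_window_years) of one index at one grid point and date.

    Hypothetical criterion: smallest size n such that, for every size >= n in `sizes`,
    the subset variance is not distinguishable from the full-ensemble variance.
    """
    full = values.ravel()
    needed = values.shape[0]
    for n in sorted(sizes, reverse=True):
        if f_test_pvalue(values[:n], full) < alpha:
            break
        needed = n
    return needed

# Hypothetical 40-member ensemble, 5-year window around one date, at one grid point.
rng = np.random.default_rng(4)
values = rng.normal(loc=25.0, scale=1.5, size=(40, 5))
print(members_needed(values, sizes=(2, 3, 5, 10, 15, 20, 30)))
```

Applied at every grid point and date, the returned size would play the role of the color in the maps; the criterion and test should be aligned with the actual procedure before any such use.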

The two columns on the left-hand side of Fig.

Detecting changes in the size of the variance over time by comparing two dates over the simulation is a problem that we expect to require more statistical power than the problem of characterizing the size of the variance at a given point, as the difference between stochastic quantities is affected by larger uncertainty than the quantities individually considered, unless those are strongly correlated. Figure

Estimating changes in ensemble variance for TNx: each plot corresponds to a pair of years along the simulation (same set of years as depicted in Figs.

Estimating changes in ensemble variance for Rx5Day: each plot corresponds to a pair of years along the simulation (same set of years as depicted in Figs.

Blank areas are regions where the full ensemble has not detected any changes in the ensemble variance at that location when comparing the two dates. Colored areas are regions where such change has been detected by the full ensemble, and the color indicates what (smaller) ensemble size is sufficient to detect the same change. Here, as in the previous analysis, a significant change is detected when the
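The change-detection logic described in the caption can be sketched in the same way. Again we assume a two-sided F test at the 5 % level (the exact test is truncated in this section), and we require a smaller ensemble to detect a change of the same sign as the full ensemble; both choices, and the names below, are hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist

def f_test_pvalue(x, y):
    """Two-sided F test for equality of the variances of samples x and y."""
    x, y = np.ravel(x), np.ravel(y)
    ratio = np.var(x, ddof=1) / np.var(y, ddof=1)
    p = 2.0 * min(f_dist.cdf(ratio, x.size - 1, y.size - 1),
                  f_dist.sf(ratio, x.size - 1, y.size - 1))
    return float(min(1.0, p))

def members_to_detect_change(a, b, sizes, alpha=0.05):
    """a, b: (n_members, n_window_years) samples at two dates, one grid point.

    Returns None if the full ensemble detects no change in variance (a blank area);
    otherwise the smallest size n such that every size >= n in `sizes` detects a
    change of the same sign (hypothetical criterion).
    """
    if f_test_pvalue(a, b) >= alpha:
        return None
    sign_full = np.sign(np.var(b, ddof=1) - np.var(a, ddof=1))
    needed = a.shape[0]
    for n in sorted(sizes, reverse=True):
        an, bn = a[:n], b[:n]
        same_sign = np.sign(np.var(bn, ddof=1) - np.var(an, ddof=1)) == sign_full
        if not same_sign or f_test_pvalue(an, bn) >= alpha:
            break
        needed = n
    return needed

# Hypothetical example: ensemble spread grows by 50 % between two dates.
rng = np.random.default_rng(6)
a = rng.normal(0.0, 1.0, size=(40, 5))
b = rng.normal(0.0, 1.5, size=(40, 5))
print(members_to_detect_change(a, b, sizes=(5, 10, 15, 20, 25, 30, 35)))
```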

We do not show it explicitly here, as it is not the focus of our analysis, but for both model ensembles, when the change is significant, the ensemble variance increases over time for both precipitation metrics, indicating that the ensemble spread increases with the strength of external forcing under RCP8.5. This is expected, as the variance of precipitation increases in step with its mean. For the temperature-based metrics, the changes, when significant, are mostly towards an increase in variance (ensemble spread) with forcing for hot extremes (TNx and TXx, the hottest night and day of the year), for which the significant changes are mostly located in the Arctic region. The ensemble spread decreases instead for cold extremes (TNn and TXn, the coldest night and day of the year), for which the significant changes are mostly located in the Southern Ocean.

Another aspect that is implicitly relevant to establishing a required ensemble size, if the estimation is concerned with the emergence of the forced component or, more generally, with “detection and attribution”-type analysis, is the signal-to-noise ratio of the quantity of interest.
Assuming as we have done in our study that the quantity of interest can be regarded as the mean

Ensemble size

In this study, we have addressed the need for deciding a priori the size of a large ensemble, using an existing five-member ensemble as guidance. Aware that the optimal size ultimately depends on the purpose the ensemble is used for, and in order to cover a wide range of possible uses, we chose metrics of temperature and precipitation extremes and we considered output from grid-point scale to global averages. We tackled the problem of characterizing forced changes along the length of a transient scenario simulation and that of characterizing the system's internal variability and its possible changes. By using a high emission scenario like RCP8.5, but considering behaviors all along the length of the simulations, we are also implicitly addressing a wide range of signal-to-noise magnitudes. Using the availability of existing large ensembles with two different models, CESM1-CAM5 and CanESM2, we could compare our estimates of the expected errors that a given ensemble size would generate with actual errors, obtained using the full ensembles' estimates as our “truth”.

First, we find that for the many uses that we explored, it is possible to put a ceiling on the expected error associated with a given ensemble size by exploiting a small ensemble of five members. We estimate the ensemble variance at a given simulation date (e.g., 2000, 2050 or 2095), which underlies all our error computations, from five members, “borrowing strength” by using a window of 5 years around that date. The results we assess are consistent with assuming that the quantities of interest are normally distributed with standard deviation

In all cases considered, a much smaller ensemble size of 5 to 10 members, if enriched by sampling along the time dimension (that is, using a 5-year window around the date of interest) is sufficient to characterize the ensemble variability, while its changes along the course of the simulations under increasing greenhouse gases, when found significant using the full ensemble size, can be detected using 15 or 20 ensemble members.

Some caveats are in order.
Obviously, the question of how many ensemble members are needed is fundamentally ill-posed, as the answer ultimately and always depends on the most exacting use to which the ensemble is put. One can always find a higher-frequency, smaller-scale metric and a tighter error bound to satisfy, requiring a larger ensemble size than any previously identified. As tropical-cyclone-permitting and eventually convection-permitting climate model simulations become available, such metrics will be analyzed more commonly. Even for a specific use, the answer depends on the characteristics of internal variability. The fact that, for both models considered here, five ensemble members are sufficient to obtain an accurate estimate of it is promising, but it does not guarantee that five are sufficient for all models. In fact, this conclusion could also be invalidated by a different experimental exploration of internal variability: new work is adopting different types of initialization, involving ocean states, which could uncover a dimension of internal variability that has so far been underappreciated

With this work, however, we have shown a way to attack the problem “bottom up”, starting from a smaller ensemble and building estimates of what would be required for a given problem. One can imagine a more sophisticated setup where an ensemble is recursively augmented (rather than assuming a fixed five-member ensemble as we have done here) in order to approximate the full variability incrementally better. We have also shown that, for a large range of questions, the size needed is actually well below what we have come to associate with “large ensembles”. Other important sources of uncertainty exist in climate modeling, one of which is beyond the reach of any single modeling center, having to do with structural uncertainty (e.g.,

Global mean of TNn as simulated by the CESM ensemble: values of the RMSE in approximating the full ensemble mean by the individual runs (first row,

Global mean of TXx as simulated by the CESM ensemble: values of the RMSE in approximating the full ensemble mean by the individual runs (first row,

Global mean of TXn as simulated by the CESM ensemble: values of the RMSE in approximating the full ensemble mean by the individual runs (first row,

Global mean of Rx1Day as simulated by the CESM ensemble: values of the RMSE in approximating the full ensemble mean by the individual runs (first row,

Global mean of TNx as simulated by the CanESM ensemble: values of the RMSE in approximating the full ensemble mean by the individual runs (first row,

Global mean of Rx5Day as simulated by the CanESM ensemble: values of the RMSE in approximating the full ensemble mean by the individual runs (first row,

Global mean of TNn as simulated by the CanESM ensemble: values of the RMSE in approximating the full ensemble mean by the individual runs (first row,

Global mean of TXx as simulated by the CanESM ensemble: values of the RMSE in approximating the full ensemble mean by the individual runs (first row,

Global mean of TXn as simulated by the CanESM ensemble: values of the RMSE in approximating the full ensemble mean by the individual runs (first row,

Global mean of Rx1Day as simulated by the CanESM ensemble: values of the RMSE in approximating the full ensemble mean by the individual runs (first row,

Percentage of the global, land or ocean surface where the actual errors exceed the errors estimated on the basis of the formula “a priori” using five ensemble members to estimate

Percentage of the global, land or ocean surface where the actual errors exceed the errors estimated on the basis of the formula “a priori” using five ensemble members to estimate

Percentage of the global, land or ocean surface where the actual errors exceed the errors estimated on the basis of the formula “a priori” using five ensemble members to estimate

Percentage of the global, land or ocean surface where the actual errors exceed the errors estimated on the basis of the formula “a priori” using five ensemble members to estimate

Like Fig.

Like Fig.

Like Fig.

As Fig.

As Fig.

As Fig.

As Fig.

As Fig.

The 15 locations at which we fit GEV distributions to the various quantities.

Return levels for TNn from the CESM ensemble at (row-wise) 2000, 2050 and 2100 for (column-wise) 2-, 5-, 10-, 20-, 50- and 100-year return periods, based on estimating a GEV by using 11-year windows of data around each date. In each plot, for increasing ensemble sizes along the

Return levels for TXx from the CESM ensemble at (row-wise) 2000, 2050 and 2100 for (column-wise) 2-, 5-, 10-, 20-, 50- and 100-year return periods, based on estimating a GEV by using 11-year windows of data around each date. In each plot, for increasing ensemble sizes along the

Return levels for TXn from the CESM ensemble at (row-wise) 2000, 2050 and 2100 for (column-wise) 2-, 5-, 10-, 20-, 50- and 100-year return periods, based on estimating a GEV by using 11-year windows of data around each date. In each plot, for increasing ensemble sizes along the

Return levels for Rx1Day from the CESM ensemble at (row-wise) 2000, 2050 and 2100 for (column-wise) 2-, 5-, 10-, 20-, 50- and 100-year return periods, based on estimating a GEV by using 11-year windows of data around each date. In each plot, for increasing ensemble sizes along the

Return levels for TNx from the CanESM ensemble at (row-wise) 2000, 2050 and 2100 for (column-wise) 2-, 5-, 10-, 20-, 50- and 100-year return periods, based on estimating a GEV by using 11-year windows of data around each date. In each plot, for increasing ensemble sizes along the

Return levels for TNn from the CanESM ensemble at (row-wise) 2000, 2050 and 2100 for (column-wise) 2-, 5-, 10-, 20-, 50- and 100-year return periods, based on estimating a GEV by using 11-year windows of data around each date. In each plot, for increasing ensemble sizes along the

Return levels for TXx from the CanESM ensemble at (row-wise) 2000, 2050 and 2100 for (column-wise) 2-, 5-, 10-, 20-, 50- and 100-year return periods, based on estimating a GEV by using 11-year windows of data around each date. In each plot, for increasing ensemble sizes along the

Return levels for TXn from the CanESM ensemble at (row-wise) 2000, 2050 and 2100 for (column-wise) 2-, 5-, 10-, 20-, 50- and 100-year return periods, based on estimating a GEV by using 11-year windows of data around each date. In each plot, for increasing ensemble sizes along the

Return levels for Rx1Day from the CanESM ensemble at (row-wise) 2000, 2050 and 2100 for (column-wise) 2-, 5-, 10-, 20-, 50- and 100-year return periods, based on estimating a GEV by using 11-year windows of data around each date. In each plot, for increasing ensemble sizes along the

Return levels for Rx5Day from the CanESM ensemble at (row-wise) 2000, 2050 and 2100 for (column-wise) 2-, 5-, 10-, 20-, 50- and 100-year return periods, based on estimating a GEV by using 11-year windows of data around each date. In each plot, for increasing ensemble sizes along the

Estimating the ensemble variance for TNn in the CESM ensemble: each plot corresponds to a year along the simulation length (1950, 1975, 2000, 2025, 2050, 2075, 2100). The color indicates the number of ensemble members needed to estimate a variance at that location that is statistically indistinguishable from that computed on the basis of the full 40-member ensemble. The results of the first two columns use only the specific year for each ensemble member. The results of the third and fourth columns enrich the samples by using 5 years around the specific date.

Estimating the ensemble variance for TXx in the CESM ensemble: each plot corresponds to a year along the simulation length (1950, 1975, 2000, 2025, 2050, 2075, 2100). The color indicates the number of ensemble members needed to estimate a variance at that location that is statistically indistinguishable from that computed on the basis of the full 40-member ensemble. The results of the first two columns use only the specific year for each ensemble member. The results of the third and fourth columns enrich the samples by using 5 years around the specific date.

Estimating the ensemble variance for TXn in the CESM ensemble: each plot corresponds to a year along the simulation length (1950, 1975, 2000, 2025, 2050, 2075, 2100). The color indicates the number of ensemble members needed to estimate a variance at that location that is statistically indistinguishable from that computed on the basis of the full 40-member ensemble. The results of the first two columns use only the specific year for each ensemble member. The results of the third and fourth columns enrich the samples by using 5 years around the specific date.

Estimating the ensemble variance for Rx1Day in the CESM ensemble: each plot corresponds to a year along the simulation length (1950, 1975, 2000, 2025, 2050, 2075, 2100). The color indicates the number of ensemble members needed to estimate a variance at that location that is statistically indistinguishable from that computed on the basis of the full 40-member ensemble. The results of the first two columns use only the specific year for each ensemble member. The results of the third and fourth columns enrich the samples by using 5 years around the specific date.

Estimating the ensemble variance for TNx in the CanESM ensemble: each plot corresponds to a year along the simulation length (1950, 1975, 2000, 2025, 2050, 2075, 2100). The color indicates the number of ensemble members needed to estimate a variance at that location that is statistically indistinguishable from that computed on the basis of the full 50-member ensemble. The results of the first two columns use only the specific year for each ensemble member. The results of the third and fourth columns enrich the samples by using 5 years around the specific date.

Estimating the ensemble variance for TNn in the CanESM ensemble: each plot corresponds to a year along the simulation length (1950, 1975, 2000, 2025, 2050, 2075, 2100). The color indicates the number of ensemble members needed to estimate a variance at that location that is statistically indistinguishable from that computed on the basis of the full 50-member ensemble. The results of the first two columns use only the specific year for each ensemble member. The results of the third and fourth columns enrich the samples by using 5 years around the specific date.

Estimating the ensemble variance for TXx in the CanESM ensemble: each plot corresponds to a year along the simulation length (1950, 1975, 2000, 2025, 2050, 2075, 2100). The color indicates the number of ensemble members needed to estimate a variance at that location that is statistically indistinguishable from that computed on the basis of the full 50-member ensemble. The results of the first two columns use only the specific year for each ensemble member. The results of the third and fourth columns enrich the samples by using 5 years around the specific date.

Estimating the ensemble variance for TXn in the CanESM ensemble: each plot corresponds to a year along the simulation length (1950, 1975, 2000, 2025, 2050, 2075, 2100). The color indicates the number of ensemble members needed to estimate a variance at that location that is statistically indistinguishable from that computed on the basis of the full 50-member ensemble. The results of the first two columns use only the specific year for each ensemble member. The results of the third and fourth columns enrich the samples by using 5 years around the specific date.

Estimating the ensemble variance for Rx1Day in the CanESM ensemble: each plot corresponds to a year along the simulation length (1950, 1975, 2000, 2025, 2050, 2075, 2100). The color indicates the number of ensemble members needed to estimate a variance at that location that is statistically indistinguishable from that computed on the basis of the full 50-member ensemble. The results of the first two columns use only the specific year for each ensemble member. The results of the third and fourth columns enrich the samples by using 5 years around the specific date.

Estimating the ensemble variance for Rx5Day in the CanESM ensemble: each plot corresponds to a year along the simulation length (1950, 1975, 2000, 2025, 2050, 2075, 2100). The color indicates the number of ensemble members needed to estimate a variance at that location that is statistically indistinguishable from that computed on the basis of the full 50-member ensemble. The results of the first two columns use only the specific year for each ensemble member. The results of the third and fourth columns enrich the samples by using 5 years around the specific date.

Histograms of required ensemble sizes at grid boxes where significant change in variability is detected: each plot corresponds to a map in Fig.

Histograms of required ensemble sizes at grid boxes where significant change in variability is detected: each plot corresponds to a map in Fig.

Estimating changes in ensemble variance for TNn in the CESM ensemble: each plot corresponds to a pair of years in the simulation. Colored areas are regions where a significant change in variance was detected on the basis of the full 40-member ensemble. The colors indicate the size of the smaller ensemble needed to detect the same change. Here, too, the sample size is increased by pooling 5 years around each date.

Estimating changes in ensemble variance for TXx in the CESM ensemble: each plot corresponds to a pair of years in the simulation. Colored areas are regions where a significant change in variance was detected on the basis of the full 40-member ensemble. The colors indicate the size of the smaller ensemble needed to detect the same change. Here, too, the sample size is increased by pooling 5 years around each date.

Estimating changes in ensemble variance for TXn in the CESM ensemble: each plot corresponds to a pair of years in the simulation. Colored areas are regions where a significant change in variance was detected on the basis of the full 40-member ensemble. The colors indicate the size of the smaller ensemble needed to detect the same change. Here, too, the sample size is increased by pooling 5 years around each date.

Estimating changes in ensemble variance for Rx1Day in the CESM ensemble: each plot corresponds to a pair of years in the simulation. Colored areas are regions where a significant change in variance was detected on the basis of the full 40-member ensemble. The colors indicate the size of the smaller ensemble needed to detect the same change. Here, too, the sample size is increased by pooling 5 years around each date.

Estimating changes in ensemble variance for TNx in the CanESM ensemble: each plot corresponds to a pair of years in the simulation. Colored areas are regions where a significant change in variance was detected on the basis of the full 50-member ensemble. The colors indicate the size of the smaller ensemble needed to detect the same change. Here, too, the sample size is increased by pooling 5 years around each date.

Estimating changes in ensemble variance for TNn in the CanESM ensemble: each plot corresponds to a pair of years in the simulation. Colored areas are regions where a significant change in variance was detected on the basis of the full 50-member ensemble. The colors indicate the size of the smaller ensemble needed to detect the same change. Here, too, the sample size is increased by pooling 5 years around each date.

Estimating changes in ensemble variance for TXx in the CanESM ensemble: each plot corresponds to a pair of years in the simulation. Colored areas are regions where a significant change in variance was detected on the basis of the full 50-member ensemble. The colors indicate the size of the smaller ensemble needed to detect the same change. Here, too, the sample size is increased by pooling 5 years around each date.

Estimating changes in ensemble variance for TXn in the CanESM ensemble: each plot corresponds to a pair of years in the simulation. Colored areas are regions where a significant change in variance was detected on the basis of the full 50-member ensemble. The colors indicate the size of the smaller ensemble needed to detect the same change. Here, too, the sample size is increased by pooling 5 years around each date.

Estimating changes in ensemble variance for Rx1Day in the CanESM ensemble: each plot corresponds to a pair of years in the simulation. Colored areas are regions where a significant change in variance was detected on the basis of the full 50-member ensemble. The colors indicate the size of the smaller ensemble needed to detect the same change. Here, too, the sample size is increased by pooling 5 years around each date.

Estimating changes in ensemble variance for Rx5Day in the CanESM ensemble: each plot corresponds to a pair of years in the simulation. Colored areas are regions where a significant change in variance was detected on the basis of the full 50-member ensemble. The colors indicate the size of the smaller ensemble needed to detect the same change. Here, too, the sample size is increased by pooling 5 years around each date.
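One possible reading of "the size of the smaller ensemble needed to detect the same change" in these captions is the smallest n whose two-sided F-test for equal variances between the two dates rejects at level `alpha`, with the variance changing in the same direction as in the full ensemble. The sketch below assumes that reading; it is not the authors' code. It further assumes the two dates can be treated as independent, roughly Gaussian samples, takes the first n members rather than random draws, and uses hypothetical function names and synthetic data. The F CDF is computed from the regularized incomplete beta function through the standard modified-Lentz continued fraction, so the block needs only the Python standard library.

```python
import math
import random
import statistics

_FPMIN = 1e-300  # guard against division by zero in the continued fraction


def _beta_cf(a, b, x, max_iter=500, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _FPMIN else _FPMIN)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _FPMIN else _FPMIN)
            c = 1.0 + aa / c
            c = c if abs(c) > _FPMIN else _FPMIN
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _beta_cf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """CDF of the F distribution with (d1, d2) degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return _betai(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def pool(members):
    """Flatten members (each a list of yearly values) into one sample."""
    return [v for m in members for v in m]


def variance_change_test(before, after):
    """Two-sided F-test of H0: Var(before) == Var(after); returns (p, ratio)."""
    ratio = statistics.variance(after) / statistics.variance(before)
    cdf = f_cdf(ratio, len(after) - 1, len(before) - 1)
    return min(1.0, 2.0 * min(cdf, 1.0 - cdf)), ratio


def size_to_detect_change(members_t1, members_t2, alpha=0.05, n_start=5):
    """Smallest n whose first n members detect the full-ensemble variance change.

    Returns None when the full ensemble detects no significant change
    (such grid boxes would be left uncolored in the maps).
    """
    p_full, ratio_full = variance_change_test(pool(members_t1), pool(members_t2))
    if p_full >= alpha:
        return None
    for n in range(n_start, len(members_t1) + 1):
        p, ratio = variance_change_test(pool(members_t1[:n]), pool(members_t2[:n]))
        # "Same change": significant and in the same direction as the full ensemble.
        if p < alpha and (ratio > 1.0) == (ratio_full > 1.0):
            return n
    return len(members_t1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic grid box: 40 members x 5 pooled years, spread tripling between dates.
    t1 = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(40)]
    t2 = [[rng.gauss(0.0, 3.0) for _ in range(5)] for _ in range(40)]
    print(size_to_detect_change(t1, t2))
```

Because the scan ends at the full ensemble, any grid box with a significant full-ensemble change always receives a size no larger than the full ensemble size, which matches the colored-versus-blank convention of the maps.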


The large-ensemble output is available through the CLIVAR Large Ensemble Working Group web page, in the archive maintained through the NCAR CESM community project.

CT conceived the study, analyzed the data and wrote the paper. KD and MW provided data pre-processing and co-wrote the paper. RL advised and co-wrote the paper.

The contact author has declared that neither they nor their co-authors have any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors would like to thank Flavio Lehner and Chengzhu (Jill) Zhang for help with data access, and two anonymous reviewers and the editor for comments that helped improve the study significantly.

This study was supported by the Energy Exascale Earth System Model (E3SM) project, funded by the US Department of Energy, Office of Science, Office of Biological and Environmental Research. Michael Wehner was supported by the CASCADE project, also funded by the US Department of Energy, Office of Science, Office of Biological and Environmental Research. The Pacific Northwest National Laboratory is operated by Battelle for the US Department of Energy (contract no. DE-AC05-76RL01830). Lawrence Berkeley National Laboratory is operated by the US Department of Energy (contract no. DE-AC02-05CH11231).

This paper was edited by Christian Franzke and reviewed by two anonymous referees.