Climate model output emulation has long been attempted to support impact research, mainly to fill in gaps in the scenario space. Given the computational cost of running coupled earth system models (ESMs), which are usually the domain of supercomputers and require on the order of days to weeks to complete a century-long simulation, only a handful of different scenarios are usually chosen to externally force ESM simulations. An effective emulator, able to run on standard computers in times of the order of minutes rather than days, could therefore be used to derive climate information under scenarios that were not run by ESMs. Lately, the necessity of accounting for internal variability has also made the availability of initial-condition ensembles, under a specific scenario, important, further increasing the computational demand. At least so far, emulators have been limited to simplified ESM-like output, either seasonal, annual, or decadal averages of basic quantities, like temperature and precipitation, often emulated independently of one another. With this work, we propose a more comprehensive solution to ESM output emulation. Our emulator, STITCHES, uses existing archives of ESM scenario experiments to construct ESM-like output under new scenarios or enrich existing initial-condition ensembles, which is what other emulators also aim to do. Importantly, however, STITCHES' output has the same characteristics as the ESM output it sets out to emulate: multivariate, spatially resolved, and high frequency, representing both the forced component and the internal variability around it. 
STITCHES extends the idea of time sampling – according to which climate outcomes are stratified by the global warming level at which they manifest themselves, irrespective of the scenario and time at which they occur – to the construction of a continuous history of ESM-like output over the whole 21st century, consistent with a 21st-century trajectory of global surface air temperature (GSAT) derived from the scenario that has been chosen as the target of the emulation. STITCHES does so by first splitting the target GSAT trajectory into decade-long windows, then matching each window in turn to a decade-long window within an existing model simulation from the available scenario runs according to its proximity to the target in absolute size of the temperature anomaly and its rate of change. A look-up table is therefore created of a sequence of existing experiment–time-window combinations that, when stitched together, create a GSAT trajectory “similar” to the target. Importantly, we can then stitch together much more than GSAT from these windows, i.e., any output that the ESM has saved for these existing experiment–time-window combinations, at any frequency and spatial scale available in its archive. We show that the stitching does not introduce artifacts in the great majority of cases (we look at temperature and precipitation at monthly frequency and on the native grid of the ESM and at an index of ENSO activity, the Southern Oscillation Index). This is true even if the criteria for the identification of the decades to be stitched together are chosen to work for a smoothed time series of annual GSAT, a result we expect given the larger amount of noise affecting most other variables at finer spatial scales and higher frequencies, which therefore are more “forgiving” of the stitching. We successfully test the method's performance over many ESMs and scenarios. 
Only a few exceptions surface, but these less-than-optimal outcomes are always associated with a scarcity of the archived simulations from which we can gather the decade-long windows that form the building blocks of the emulated time series. In the great majority of cases, STITCHES' performance is satisfactory according to metrics that reward consistency in trends, interannual and inter-ensemble variance, and autocorrelation structure of the time series stitched together. The method therefore can be used to create ESM-like output according to new scenarios, on the basis of a trajectory of GSAT produced according to that scenario, which could be easily obtained by a simple climate model. It can also be used to increase the size of existing initial-condition ensembles. There are aspects of our emulator that will immediately disqualify it for specific applications, such as those needing climate information whose characteristics result from quantities accumulated over windows of time longer than those used as pieces by STITCHES (droughts longer than a decade, for example). But for many applications, we argue that a stitched product can satisfy the climate information needs of impact researchers. STITCHES cannot emulate ESM output from scenarios that result in GSAT trajectories outside of the envelope available in the archive, nor can it emulate trajectories with shapes different from existing ones (overshoots with negative derivative, for example). Therefore, the size and characteristics of the available archives of ESM output are the principal limitations for STITCHES' deployment. Thus, we argue for the possibility of designing scenario experiments within, for example, the next phase of the Coupled Model Intercomparison Project according to new principles, relieved of the need to produce a number of similar trajectories that vary only in radiative forcing strength but more strategically covering the space of temperature anomalies and rates of change.

In this paper, we introduce a novel and comprehensive solution to climate model emulation. Our principal motivation is to support the climate information needs of the impact research community under arbitrary future scenarios of anthropogenic forcings, but we believe that our proposal may also benefit the scenario development, integrated assessment, and climate modeling communities.

The overarching problem that our method seeks to resolve stems from the computational and human labor costs of running climate model experiments
according to plausible future scenarios (as opposed to idealized forcings, e.g., 1 %

The latest phase of the Coupled Model Intercomparison Project, Phase 6 (CMIP6;

The range of radiative forcing in 2100 covered by the experiments in Tier 1 of ScenarioMIP, when complemented by the Paris-inspired low-warming
scenario reaching only 1.9

Thus far, the need for additional scenarios not available in ESM output archives has been addressed – if at all – by simple emulators of ESM
output, usually producing multidecadal averages of temperature and – separately – precipitation change fields. Most popular has been simple pattern
scaling, starting from its initial conception

Our approach, STITCHES, emulates an ESM by using its own output as building blocks, thus reproducing by construction the high dimensionality, complexity, and multiple frequencies of original ESM output. Working with existing scenario experiments run by an individual ESM, we stitch together output from experiment–time-window combinations that we extract from the available archive on the basis of the corresponding value of global average temperature in those experiment–time-window combinations.

The idea of using existing simulations' output over a window when global average temperature reaches a given warming level of interest, often called
time sampling, has been frequently and prominently used in recent years

In the next sections, we first describe our method in detail (Sect.

We here describe the emulator rationale and its main aspects and discuss our validation approach.

Many applications have in the recent past focused on a window, along the length of an ESM simulation, when global average temperature change conforms
to a given criterion (e.g., is on average 1.5

Our method, which we suggestively call STITCHES, extends the time sampling approach to an entire century-long global average temperature trajectory
rather than just individual and discrete global average temperature levels. Our hypothesis is that we can devise stringent enough criteria in matching
successive pieces of a time series of global temperature (GSAT) generated under a target scenario to pieces chosen from available GSAT time series
generated by ESMs according to the scenarios run and archived in community databases (e.g., through the CMIP6
database (

Our algorithm is applied separately to each individual ESM, as stitching together different models' lengths of simulations would almost certainly
introduce spurious behavior. Within a single ESM universe, we can envision two distinct types of application of our algorithm, both of which would
build from existing simulations under future scenarios by that model. In one case, the goal is to minimize the number of scenarios run by that ESM,
supplementing the existing ones with stitched ones. To demonstrate the utility of STITCHES in this case, we show the effectiveness of the method
in emulating ESM output under scenarios intermediate to existing ones. This application benefits impact research, enriching the choice of scenarios
whose impacts can be evaluated and compared; it also translates into saving resources by lowering the number of scenarios to be simulated by the ESMs,
in no small measure when considering the large effort involved in preparing forcing inputs. (We repeat here, however, that by construction our
algorithm does not allow extrapolation to levels of warming above those of the highest scenario available in the archive or below the lowest. We will elaborate further on the limiting factors of the archive characteristics for the creation of new scenarios.) In the other case, the goal is to enrich
the number of ensemble members available for existing scenarios. To this effect, STITCHES can be deployed on available simulations of the target
scenario and neighboring scenarios, all potential sources of usable time samples. In this context, however, we also see promising complementarity with
recently developed emulators that focus specifically on estimating the statistical characteristics of an ESM's internal variability and randomly
generating new realizations of it

GSAT archive content, plotted in the space of (

We now describe the steps of the STITCHES algorithm. See also Fig.

Time series of annual GSAT from all available simulations of the 21st century by a given model (all scenarios and initial-condition ensemble
members) are computed; the time series are made into anomalies with respect to a baseline period of 1995–2014 (we refer to GSAT time series in the
following for brevity, but in all cases what we mean is GSAT

An

For each available piece

The same smoothing and splitting procedure is applied to the trajectory of GSAT for the target scenario to be emulated; we call the result
“target pieces”. Note that in the examples of this paper, we derive the target GSAT trajectory from the same ESM, run under a scenario that we
choose as the target of the emulation. Therefore, we apply the smoothing procedure to the target GSAT time series as well. Often the real application of
the algorithm will target a time series of GSAT that is produced by a simple model, like MAGICC (

Each target piece and each available piece can now be represented by a point in the two-dimensional space
(

For each of the target pieces in the sequence spanning the 21st century, one of the neighbors within its

So far the algorithm has produced a new GSAT trajectory, emulating the target one. Importantly, however, the algorithm delivers in essence an ordered series of pointers to the specific experiment–time-window combinations in the archived output from which the chosen neighboring pieces were extracted. Any output from the model (any variable, in isolation or jointly, at any archived frequency and on the native grid of the ESM) can be stitched together according to this sequence, recreating the climate outcome of the desired variable(s) consistent with the emulated scenario.
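The matching and stitching steps described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the STITCHES implementation: the function names (`split_windows`, `match_windows`, `stitch`), the use of the window mean and least-squares slope as the two coordinates, the Euclidean distance with the slope scaled by the window length, the choice of the single nearest neighbor, and the default tolerance are all hypothetical choices made for the example. Only the 9-year window length is taken from the paper.

```python
import math

WINDOW = 9  # window length in years (the value quoted in the paper's summary)


def split_windows(years, gsat, window=WINDOW):
    """Split an (already smoothed) annual GSAT anomaly series into consecutive windows.

    Each piece records its first and last year, its mean anomaly (T) and its
    least-squares slope per year (dT). Trailing years that do not fill a
    whole window are dropped.
    """
    pieces = []
    for i in range(0, len(years) - window + 1, window):
        ys = years[i:i + window]
        ts = gsat[i:i + window]
        y_mean = sum(ys) / window
        t_mean = sum(ts) / window
        slope = (sum((y - y_mean) * (t - t_mean) for y, t in zip(ys, ts))
                 / sum((y - y_mean) ** 2 for y in ys))
        pieces.append({"start": ys[0], "end": ys[-1], "T": t_mean, "dT": slope})
    return pieces


def match_windows(target_pieces, archive, tol=0.1, window=WINDOW):
    """Build the look-up table ("recipe") of archive windows matching the target.

    `archive` maps a run label, e.g. (experiment, member), to its list of pieces.
    Assumption: dT is multiplied by the window length so that both coordinates
    are expressed in the same temperature units before taking the distance.
    """
    recipe = []
    for tp in target_pieces:
        best, best_dist = None, math.inf
        for run, pieces in archive.items():
            for ap in pieces:
                dist = math.hypot(ap["T"] - tp["T"], window * (ap["dT"] - tp["dT"]))
                if dist < best_dist:
                    best, best_dist = (run, ap["start"], ap["end"]), dist
        if best_dist > tol:
            # Failing loudly warns the user that the archive is too sparse here.
            raise ValueError(f"no archive window within tolerance for "
                             f"{tp['start']}-{tp['end']}")
        recipe.append({"target": (tp["start"], tp["end"]), "source": best})
    return recipe


def stitch(recipe, data):
    """Concatenate any archived variable following the recipe.

    `data[run]` maps a year to the value (or field) archived for that year.
    """
    out = []
    for step in recipe:
        run, start, end = step["source"]
        out.extend(data[run][year] for year in range(start, end + 1))
    return out
```

Because the recipe stores only pointers (run label, first year, last year), the same `stitch` call can be reused for any variable that the archive holds for those runs, which is the point made in the paragraph above.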

As pointed out in the description of the algorithm, its parameters (

The ESMs, experiments from ScenarioMIP

At the time of writing, STITCHES is built to integrate with (and depends on) the PANGEO CMIP6 archive of
results (

We now show results for several test cases. Table

Our first goal is to test the ability of STITCHES to reconstruct ESM-like output for new scenarios using ESM output from existing scenarios. We do so
for all available ESMs in the PANGEO CMIP6 archive that provide at least one member under SSP1-2.6 and one member under SSP5-8.5, targeting the two
intermediate scenarios SSP2-4.5 and SSP3-7.0 (see Table

Table

The number of emulated trajectories produced to assess the performance of STITCHES in recreating intermediate scenarios (SSP2-4.5 and SSP3-7.0) from the two “bracketing” scenarios (SSP1-2.6 and SSP5-8.5).

As mentioned in Sect.

For all the models used in our emulation of ESM output under SSP2-4.5 and SSP3-7.0, we report the number of “seams” at which annual GSAT presents a jump that is larger than twice the interannual standard deviation. The latter is computed from either the interannual variations in the archive simulations used in the stitching (in practice, the interannual standard deviations of the stitched trajectories without including the seams in their computation) or the target experiments (the interannual standard deviations of the real series that we are emulating). We also show the total number of seams from which the percentages discussed are computed.
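The seam check summarized in this caption can be illustrated with a short sketch. It assumes that "interannual standard deviation" means the standard deviation of year-to-year differences computed away from the seams; the text does not pin down the estimator, so this definition, the function name `count_large_seams`, and the `seam_indices` convention are assumptions made for illustration.

```python
import statistics


def count_large_seams(gsat, seam_indices, factor=2.0):
    """Count seams whose jump exceeds `factor` times the interannual SD.

    `seam_indices` lists the positions i at which gsat[i] is the first year of
    a newly stitched window, so the jump at the seam is gsat[i] - gsat[i - 1].
    The SD is estimated from year-to-year differences that do not cross a seam
    (an assumed definition).
    """
    seams = set(seam_indices)
    diffs = [gsat[i] - gsat[i - 1] for i in range(1, len(gsat))]
    within = [d for i, d in enumerate(diffs, start=1) if i not in seams]
    sd = statistics.stdev(within)
    flagged = [i for i in seams if abs(diffs[i - 1]) > factor * sd]
    return len(flagged), len(seams)
```

The function returns both the number of flagged seams and the total number of seams, so the percentage of problematic seams follows directly from the two values.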

Our first concern is to not

We then compare linear trends fitted to the stitched trajectories to linear trends fitted to the target series by separately fitting a linear trend
to the historical period (1850–2014) and the future period (2015–2100). The trends are defined as the slope of a linear regression of
annual mean values of GSAT onto years, and we consider central estimates (by ordinary least squares) and 95 % confidence intervals. We find that
in all cases (109 stitched trajectories across the models and the two scenarios) historical trend central estimates for the stitched series fall
comfortably within the confidence intervals of the historical trends of the target series. For the future trends, the confidence intervals of the
stitched series overlap with the confidence intervals of the trends from the target series in all cases. There are 21 trajectories out of the 109 for
which the central estimates fall outside those confidence intervals. In all these cases, the difference between the central estimate and the closest
bound of the confidence interval is a very small value: in one single case, the central estimate is outside the confidence intervals by
0.056
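The trend comparison used in this validation can be sketched as follows. The sketch fits an ordinary least squares line and builds an approximate 95 % confidence interval with the normal quantile 1.96, ignoring serial correlation in the residuals; both simplifications, and the function names `ols_trend_ci`, `intervals_overlap`, and `estimate_inside`, are assumptions made for the example rather than the procedure used in the paper.

```python
import math


def ols_trend_ci(years, values, z=1.96):
    """Return (slope, lower, upper) for an OLS fit of values onto years.

    The interval uses the normal quantile z (about 95 % for z = 1.96); a
    Student-t quantile would be slightly wider for short series.
    """
    n = len(years)
    y_mean = sum(years) / n
    v_mean = sum(values) / n
    sxx = sum((y - y_mean) ** 2 for y in years)
    slope = sum((y - y_mean) * (v - v_mean) for y, v in zip(years, values)) / sxx
    intercept = v_mean - slope * y_mean
    rss = sum((v - (intercept + slope * y)) ** 2 for y, v in zip(years, values))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, slope - z * se, slope + z * se


def intervals_overlap(fit_a, fit_b):
    """True if the confidence intervals of two (slope, lower, upper) fits overlap."""
    return fit_a[1] <= fit_b[2] and fit_b[1] <= fit_a[2]


def estimate_inside(fit_stitched, fit_target):
    """True if the stitched central estimate lies within the target's interval."""
    return fit_target[1] <= fit_stitched[0] <= fit_target[2]
```

`estimate_inside` corresponds to the stricter check applied to central estimates, while `intervals_overlap` corresponds to the overlap criterion applied to the future trends.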

We also compute interannual standard deviations for target and stitched trajectories, finding that once again, historical simulations remain within
the ranges of the target trajectories in all cases. For the future period, in 78 % of cases, the stitched series show interannual variability
within 20 % of that of the target series. The remaining 24 cases, out of the 109 tested, whose interannual variations fall outside the range of
the target series show discrepancies that amount to less than 0.2

Examples of target (black lines) and stitched (colored) GSAT time series for three ESMs in the PANGEO archive that ran at least one trajectory along the Tier 1 experiments of ScenarioMIP (SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP5-8.5). We choose these three models as they provide differing ensemble sizes (see Table 1) and are characterized by different values of equilibrium climate sensitivity (4.73, 3.40, and 2.72

Even if for a large majority of cases the performance of the emulator seems acceptable, and in many cases indistinguishable from the target cases, we
underline that some model–experiment combinations appear to be challenging for this uniform setup. Most of these cases coincide with models providing
only one ensemble member per scenario, and the spurious behavior is often found at the higher end of the warming range within the scenario emulated,
where the only possible matches come from the model's only available SSP5-8.5 trajectory. It is not unlikely that the matches from the higher scenario
result in less-than-optimal windows, given the limited choice available for the higher temperature levels. Likely, fixing the tolerance parameter to a
tighter value could improve these specific emulation cases or simply fail to create an emulated trajectory so that the user would have an outright
warning of the difficulty in matching. Here we remain within a generic setting in order to show the trade-offs at play and identify lessons. We show
in Fig.

Absolute difference in future trends of monthly temperature variability (TAS) and precipitation (PR) between stitched and target realizations. The value of the difference is expressed by the color scale, and we marked with black crosses those locations where the trends computed from target and stitched time series do not overlap in their 95 % confidence intervals, indicating statistically significant differences. Emulation of CAMS-CM1-0 monthly time series for 2015–2100 under SSP2-4.5 and SSP3-7.0.

Ratio of monthly variability (standard deviation of residuals from trends) in future temperature (TAS) and precipitation (PR) between stitched (at the numerator) and target (at the denominator) realizations. The value of the ratio is expressed by the color scale, which highlights the transition at 0.8 and 1.2. Emulation of CAMS-CM1-0 monthly time series for 2015–2100 under SSP2-4.5 and SSP3-7.0.

For all cases in which the emulation of GSAT time series (made of annual average values) does not present inconsistencies, our hypothesis is that noisier
quantities would not suffer from detectable discontinuities either. We have tested this expectation for a range of quantities (temperature,
precipitation, and sea level pressure) and scales (from subcontinental to local, i.e., grid-point level). Here, as examples, we compare trends and
variability (computed as the standard deviations of the residuals from the trend) between stitched and target time series under the two scenarios
(over the 2015–2100 period) for temperature (TAS) and precipitation (PR). All metrics here are computed using time series of gridded output at
monthly frequency, covering the entire annual cycle, for the length of the emulated output (2015–2100). In the Appendix we show similar results for
month-specific output sampling behavior during boreal winter (January) and boreal summer (July), addressing the possibility that the emulation could
be differently challenged by stronger or weaker forced trends. We use results from the emulation of two models that represent extremes in the PANGEO
dataset, in terms of availability of archive trajectories: CAMS-CM1-0 (with only two ensemble members each for SSP1-2.6 and SSP5-8.5), for which we
have derived one emulated trajectory per scenario (SSP2-4.5 and SSP3-7.0), and MIROC6 (with 50 ensemble members for each), for which we have emulated
three trajectories per scenario. In the trend figures we blacken grid points where the trends computed from the stitched trajectories are
significantly different from those computed from the target trajectory. We use here the same criterion that we applied to the validation of GSAT:
trends are significantly different when their 95 % confidence intervals do not overlap. For the analysis of monthly variability we show maps of
the ratio of the two variances computed from the stitched and target time series, after removing the linear trends. We consider two variances
substantially different when they are not within 20 % of one another, i.e., when their ratio is either less than 0.8 or more than 1.2. The color bar is chosen to
highlight these two thresholds. Figure

Performance in terms of monthly variability in temperature is within 20 % of the true variability practically over all the land regions and over the large majority of the oceans' areas, with the exception of a systematic bias over the western Pacific cold tongue. Rainfall variability appears less homogeneously accurate, until one realizes that the areas where variability appears inconsistent (i.e., areas where the value of the ratio is smaller than 0.8 or larger than 1.2) coincide with climatologically very dry areas of both the Northern Hemisphere and Southern Hemisphere. In these regions variability is low, and therefore small differences in the numerator and denominator may cause large variation in the ratio, without implying meaningful differences in rainfall behavior.
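The variability check at a single grid point can be sketched as below. Following the figure caption, the sketch measures variability as the standard deviation of residuals from a linear trend and flags ratios outside 0.8 to 1.2; the function names `detrended_sd` and `variability_ratio` are hypothetical, and applying the function over a full grid is left out.

```python
import math


def detrended_sd(values):
    """Standard deviation of the residuals from a linear (OLS) trend in time."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
             / sum((t - t_mean) ** 2 for t in range(n)))
    resid = [v - (v_mean + slope * (t - t_mean)) for t, v in enumerate(values)]
    return math.sqrt(sum(r * r for r in resid) / (n - 2))


def variability_ratio(stitched, target, low=0.8, high=1.2):
    """Return the stitched-over-target SD ratio and whether it lies in [low, high]."""
    ratio = detrended_sd(stitched) / detrended_sd(target)
    return ratio, low <= ratio <= high
```

As the paragraph above notes, a ratio outside the thresholds is not necessarily meaningful where the denominator is very small, as in climatologically dry regions, so flagged grid points are best read together with the underlying variability.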

Last, still concerned with time series behavior, we consider a different quantity altogether: the Southern Oscillation Index (SOI), describing
the evolution of the El Niño–Southern Oscillation (ENSO) mode of variability. The SOI is defined as the standardized difference between sea level
pressure (SLP) monthly anomalies at Tahiti and Darwin,
Australia (
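One way to compute an index of this kind from monthly SLP series at the two locations is sketched below. Several SOI conventions exist, and the exact one used here is not recoverable from the text; the sketch assumes that anomalies are taken with respect to each calendar month's mean over the whole series and that the Tahiti-minus-Darwin difference is divided by its own standard deviation. The function names are hypothetical.

```python
import statistics


def monthly_anomalies(series):
    """Remove the mean annual cycle from a monthly series that starts in January."""
    clim = [statistics.fmean(series[m::12]) for m in range(12)]
    return [x - clim[i % 12] for i, x in enumerate(series)]


def soi(slp_tahiti, slp_darwin):
    """Standardized Tahiti-minus-Darwin difference of monthly SLP anomalies."""
    diff = [t - d for t, d in zip(monthly_anomalies(slp_tahiti),
                                  monthly_anomalies(slp_darwin))]
    sd = statistics.pstdev(diff)
    return [x / sd for x in diff]
```

Applying the same function to the target and to the stitched SLP series yields two index time series whose autocorrelation structure can then be compared.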

Figure

Examples of target (left) and stitched (right) SOI time series for three 20-year windows along the length of the simulation: 2015–2034 in the top four panels, 2035–2054 in the middle four panels, 2081–2100 in the bottom four panels. Results from emulation of SSP2-4.5 and SSP3-7.0 for CAMS-CM1-0.

Auto-correlation functions (ACFs) and partial auto-correlation functions (PACFs) for real and stitched SOI time series. Top two rows: SSP2-4.5 ACF for target and stitched series and respective PACFs. Bottom two rows: SSP3-7.0 ACF for target and stitched series and respective PACFs. (Our software – R function

These results confirm our expectation that, once the statistical characteristics of a large-scale, low-frequency quantity like annual GSAT have been validated, emulated variables at grid-point scale and higher temporal frequency do not present larger challenges. The higher noise of these quantities indeed accommodates the discontinuities introduced by their emulation.

Our emulator can also be used to provide multiple ensemble members under the same scenario, akin to initial-condition ensembles. For this type of
application, besides the necessary validation of the individual members according to the above-described metrics, we want to validate the properties
of the synthetic ensembles as such, comparing their mean behavior and their spread to those of real initial-condition ensembles from the same ESM.
Figures

We adopt the two-dimensional metric of performance introduced by

The two components of the

Its first component (which we indicate below as

Several outcomes can be gleaned from Table

We have performed the same exercise by limiting the archive to the two bracketing scenarios, SSP1-2.6 and SSP5-8.5, and trying to construct ensembles
for SSP2-4.5 and SSP3-7.0. In this case STITCHES is significantly challenged: its performance, as measured by the

The size of a stitched ensemble targeting a given experiment is directly related to the number of ESM ensemble members present in the archive, as well
as the tolerance for matching,

To identify

For each ESM and the two scenarios targeted by the emulation, we show the size of the archive, the number of trajectories used as target, and the number of stitched trajectories obtained from them for the value of

By comparing the generated ensemble size from Table

We have proposed an algorithm, STITCHES, that exploits available simulations of future scenarios to deliver fully consistent and complete ESM-like output according to a new scenario, based on the trajectory of global temperature that the new scenario produces. STITCHES works by stitching together decade-long windows (we use 9 years to be precise, but the length of the window is a tunable parameter) of existing 21st-century ESM simulation output. These windows are chosen on the basis of their corresponding GSAT absolute value and derivative, identified to match those of subsequent windows of the GSAT time series derived from the scenario to be emulated. The same algorithm can also be used to enrich the size of existing initial-condition ensembles. We have demonstrated the algorithm performance using the PANGEO CMIP6/ScenarioMIP archive of the four Tier 1 experiments, SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5, targeting the emulation of the two intermediate scenarios.

Our numerous validation tests have shown that, in the great majority of cases, the stitched time series do not reveal spurious behavior, even when the
matching criteria are set without being specifically tailored to the internal variability in the ESM to be emulated. We have shown that jumps or
discontinuities are seldom created at the global scale, when considering surface temperature. Since surface temperature is the smoothest quantity among
the variables commonly used to drive impact models, our hypothesis has been that any other variable at the global or regional scale, and for yearly
frequencies or higher, would be even better behaved at the seams, since the larger internal variability would even more easily overwhelm
discontinuities introduced by STITCHES. We have confirmed this hypothesis with case studies for gridded temperature and precipitation at the monthly
frequency. We have also shown that for ENSO, a salient mode of variability for many natural and human systems, a 9-year window does not introduce odd
frequency artifacts in the SOI time series. This should reassure modelers of impacts sensitive to ENSO teleconnections. Synthetic “large ensembles”
created to enrich initial-condition experiments show an ensemble behavior within a small neighborhood of the truth (in most cases much narrower than

Our exploration of the performance of the algorithm as a function of the available archive size suggests that five 21st-century trajectories ensure an acceptable performance (according to our metrics), and even smaller archive sizes often – if not always – deliver acceptable stitched new trajectories. Thus, for modeling centers choosing to invest resources in future scenario simulations, running a well-chosen small set of trajectories that span what the community considers the plausible range of GSAT absolute change and rates of change, or radiative forcing, could suffice, and the center could be better served by focusing on running a few initial-condition ensemble members for each trajectory rather than investing in multiple similarly shaped scenarios. This also entails savings for the community that provides the direct forcing inputs to ESMs by translating IAM output into spatially and temporally resolved forcing fields for scenario simulations. Resources in post-processing of model output, extending to the need of downscaling and bias correcting, will be saved as well, as the emulated scenarios can be built from those post-processed ones.

Of course, our proposal does come with caveats. ENSO frequencies are right around the timescale that is preserved by 9-year windows, but there exist
slower modes of variability in the climate system whose single phases may instead align with such a time span and whose coherent behavior would be
broken by our window splitting and stitching together. Thus, any investigation of impacts that are known to be sensitive to low-frequency variability
at decadal timescales needs to proceed with caution, try lengthening the window

There are more subtle aspects of stitched scenarios that may pose questions of fidelity and representativeness. We have not addressed the challenges
that short but intense forcing episodes, like volcanic eruptions, may pose, since we have focused the application of STITCHES on future scenarios,
which do not represent them. A careful look at Fig.

Last, some technical aspects of our algorithm will benefit from further analysis and considerations: possibly some applications may be able to relax the
tolerance parameter and thus set the conditions for easier matching and more numerous stitched realizations. This might be true of applications that
would not be too sensitive to interannual differences. In contrast, tightening the tolerance to match specific ESMs' internal variability will be
beneficial in eliminating spurious behavior that we have documented in some cases, especially when the archive of available runs is poor. More
generally we could choose a different distance measure in the (

We would have liked to make more than just a rule-of-thumb recommendation for the number of ensemble members that modeling centers should run and
link that formally to the number of expected trajectories created by STITCHES. That said, the last phases of CMIP have shown that, ultimately,
modeling centers will commit what they can to running future scenarios. Our proposal shifts those energies and resources away from running a number of
scenarios of similar shapes. One additional possibility that we have not explored is utilizing idealized experiments like 1 %

In addition to stabilized scenarios, which were not systematically explored by the last set of simulations and therefore would pose a challenge to STITCHES, there is another type of scenario that STITCHES cannot emulate at this time and that is becoming more and more prominent in the policy discourse: the overshoot, i.e., a scenario that presents a peak and decline in forcing and therefore in global average temperature. If a range of overshoots is sought, ESMs need to run some cases with different steepness and length in order to provide building blocks of decreasing temperature at different rates.

Despite the warranted caveats, we believe that our proposal has desirable outcomes for the research communities occupied with climate, scenario, and
impact modeling. Impact and IAM modelers who want to assess impacts for scenarios other than those that have been generated by ESMs, including
endogenously generated forcing pathways within IAMs, could rely on STITCHES to fill the gaps, acquiring the same type of output, in all its complexity
and refinement, that an ESM would provide. An “online” application of STITCHES within an IAM simulation could allow climate impacts to be modeled within
the evolving system that the IAM is modeling and therefore represent fully consistent feedback loops between climate change drivers (emissions) and
climate change impacts. The wider impact research community could choose from a larger set of trajectories and possibly a larger set of initial-condition ensembles than the ESM ran. Climate modelers can reduce the effort devoted to preparing inputs for, setting up, running, and post-processing
future scenarios. We acknowledge here the richness of climate model output archives already at our disposal (CMIP5, CMIP6, SMILEs), which right now
provide a wide variety of building blocks. The next phases of CMIP could complement what is available now by deliberately exploring types of scenarios
that are not well represented in the current archives, like stabilized trajectories and overshoots. The challenge would lie in choosing the best set
of runs to optimally populate the (

Numbers to the side of the boxes refer to the algorithm steps detailed in Sect.

Examples of target (black lines) and stitched (colored) GSAT time series for ESMs in the PANGEO archive that ran at least one trajectory along the Tier 1 experiments of ScenarioMIP (SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP5-8.5). We use the two bracketing scenarios and emulate trajectories that follow the two intermediate scenarios.

Absolute difference in decadal trends of temperature (TAS) and precipitation (PR) between stitched and target realizations. The value of the difference is expressed by the color scale, and we marked as significant with black crosses those locations where the 95 % confidence intervals of the trends computed from target and stitched time series do not overlap, indicating statistically significant differences. Emulation of MIROC6, monthly time series over 2015–2100, for SSP2-4.5 and SSP3-7.0. First realization.

Absolute difference in decadal trends of temperature (TAS) and precipitation (PR) between stitched and target realizations. The value of the difference is expressed by the color scale, and we marked as significant with black crosses those locations where the 95 % confidence intervals of the trends computed from target and stitched time series do not overlap, indicating statistically significant differences. Emulation of MIROC6, monthly time series over 2015–2100, for SSP2-4.5 and SSP3-7.0. Second realization.

Absolute difference in decadal trends of temperature (TAS) and precipitation (PR) between stitched and target realizations. The value of the difference is expressed by the color scale, and we marked as significant with black crosses those locations where the 95 % confidence intervals of the trends computed from target and stitched time series do not overlap, indicating statistically significant differences. Emulation of MIROC6, monthly time series over 2015–2100, for SSP2-4.5 and SSP3-7.0. Third realization.

Absolute difference in decadal trends of January temperature (TAS) and precipitation (PR) between stitched and target realizations. The value of the difference is expressed by the color scale, and we marked as significant with black crosses those locations where the 95 % confidence intervals of the trends computed from target and stitched time series do not overlap, indicating statistically significant differences. Emulation of CAMS, January time series over 2015–2100, for SSP2-4.5 and SSP3-7.0.

Absolute difference in decadal trends of July temperature (TAS) and precipitation (PR) between stitched and target realizations. The value of the difference is expressed by the color scale, and we marked as significant with black crosses those locations where the 95 % confidence intervals of the trends computed from target and stitched time series do not overlap, indicating statistically significant differences. Emulation of CAMS, July time series over 2015–2100, for SSP2-4.5 and SSP3-7.0.

Absolute difference in decadal trends of January temperature (TAS) and precipitation (PR) between stitched and target realizations. The value of the difference is expressed by the color scale, and we marked as significant with black crosses those locations where the 95 % confidence intervals of the trends computed from target and stitched time series do not overlap, indicating statistically significant differences. Emulation of MIROC6, January time series over 2015–2100, for SSP2-4.5 and SSP3-7.0. First realization.

Absolute difference in decadal trends of July temperature (TAS) and precipitation (PR) between stitched and target realizations. The value of the difference is expressed by the color scale, and we marked as significant with black crosses those locations where the 95 % confidence intervals of the trends computed from target and stitched time series do not overlap, indicating statistically significant differences. Emulation of MIROC6, July time series over 2015–2100, for SSP2-4.5 and SSP3-7.0. First realization.
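The significance test described in the trend-difference captions above can be illustrated at a single grid cell. The following is a minimal sketch, not the authors' code: it assumes an ordinary least-squares linear trend and a normal-approximation 95 % interval (slope plus or minus 1.96 standard errors) that ignores residual autocorrelation. The function names `ols_trend_ci` and `trends_differ` are hypothetical.

```python
import math


def ols_trend_ci(y, z=1.96):
    """Return (slope, lower, upper) for an OLS linear trend of y against its index.

    The slope is per time step. The interval is slope +/- z * standard error,
    a normal approximation that ignores autocorrelation (assumption of this sketch).
    """
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((v - (intercept + slope * t)) ** 2 for t, v in enumerate(y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope - z * se, slope + z * se


def trends_differ(target, stitched):
    """True when the two trend confidence intervals do not overlap."""
    _, lo_t, hi_t = ols_trend_ci(target)
    _, lo_s, hi_s = ols_trend_ci(stitched)
    return hi_t < lo_s or hi_s < lo_t


# Synthetic series standing in for one grid cell (illustrative values only).
target = [0.02 * t + 0.1 * math.sin(t) for t in range(86)]
steeper = [0.10 * t + 0.1 * math.sin(t) for t in range(86)]
print(trends_differ(target, target), trends_differ(target, steeper))
```

Applied cell by cell, a `True` result corresponds to a location that would be marked with a black cross. Multiplying the slope by the number of time steps per decade converts it to a decadal trend.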

Ratio of monthly variability (standard deviation of residuals from trends) in temperature (TAS) and precipitation (PR) between stitched (at the numerator) and target (at the denominator) time series. The value of the ratio is expressed by the color scale, which highlights the transitions at 0.8 and 1.2. Emulation of MIROC6, monthly time series over 2015–2100, for SSP2-4.5 and SSP3-7.0. First realization.

Ratio of monthly variability (standard deviation of residuals from trends) in temperature (TAS) and precipitation (PR) between stitched (at the numerator) and target (at the denominator) time series. The value of the ratio is expressed by the color scale, which highlights the transitions at 0.8 and 1.2. Emulation of MIROC6, monthly time series over 2015–2100, for SSP2-4.5 and SSP3-7.0. Second realization.

Ratio of monthly variability (standard deviation of residuals from trends) in temperature (TAS) and precipitation (PR) between stitched (at the numerator) and target (at the denominator) time series. The value of the ratio is expressed by the color scale, which highlights the transitions at 0.8 and 1.2. Emulation of MIROC6, monthly time series over 2015–2100, for SSP2-4.5 and SSP3-7.0. Third realization.
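The variability ratio in the captions above (standard deviation of residuals from trends, stitched over target) can be sketched for one grid cell as follows. This is an illustration under stated assumptions: the trend is taken to be linear and fitted by least squares, and no seasonal cycle is removed, neither of which is specified in the captions. The names `detrended_std` and `variability_ratio` are hypothetical.

```python
import math


def detrended_std(y):
    """Standard deviation of residuals around an OLS linear trend (assumed trend form)."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sxx
    resid = [v - (y_mean + slope * (t - t_mean)) for t, v in enumerate(y)]
    # n - 2 degrees of freedom: intercept and slope were estimated.
    return math.sqrt(sum(r * r for r in resid) / (n - 2))


def variability_ratio(stitched, target):
    """Stitched variability divided by target variability, as in the captions."""
    return detrended_std(stitched) / detrended_std(target)


# Synthetic example: the stitched series has 10 % larger fluctuations.
target = [0.02 * t + 0.10 * math.sin(t) for t in range(120)]
stitched = [0.02 * t + 0.11 * math.sin(t) for t in range(120)]
ratio = variability_ratio(stitched, target)
print(round(ratio, 3))
```

A ratio between 0.8 and 1.2, the transitions highlighted by the color scale, indicates that the stitched series reproduces the target variability to within 20 %.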

Examples of target (left) and stitched (right) SOI time series for three 20-year windows along the length of the simulation: 2015–2034 in the top four panels, 2035–2054 in the middle four panels, 2081–2100 in the bottom four panels. Results from emulation of SSP2-4.5 and SSP3-7.0 for one of three ensemble members emulated under each scenario for MIROC6.

Auto-correlation functions (ACFs) and partial auto-correlation functions (PACFs) for real and stitched SOI time series. Top two rows: SSP2-4.5 ACF for target and stitched series and respective PACFs. Bottom two rows: SSP3-7.0 ACF for target and stitched series and respective PACFs. Results from emulation of one of three ensemble members emulated under each scenario for MIROC6. (Our software – R function

Spectral densities computed from target (solid lines) and stitched (dashed lines) SOI time series, for both models and one (for CAMS-CM-0, top four plots) and three (for MIROC6, bottom four plots) realizations.
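As a rough illustration of how spectral densities of target and stitched SOI series could be compared, the sketch below computes a raw periodogram with a direct discrete Fourier transform. It is an assumption-laden stand-in: the estimator actually used for the figure is not stated here and may involve tapering or smoothing, and the function name `periodogram` is hypothetical.

```python
import cmath
import math


def periodogram(x):
    """Raw periodogram of a mean-removed series.

    Returns (freqs, power), with freqs in cycles per time step for
    k = 1 .. n // 2 and power = |DFT_k|^2 / n.
    """
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coef = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(xc))
        freqs.append(k / n)
        power.append(abs(coef) ** 2 / n)
    return freqs, power


# Synthetic monthly series with a 24-month cycle (illustrative, not SOI data).
series = [math.sin(2 * math.pi * t / 24) for t in range(120)]
freqs, power = periodogram(series)
peak_freq = freqs[power.index(max(power))]
print(peak_freq)
```

Computing the periodogram for a target series and for its stitched counterpart, and overlaying the two curves, gives a comparison in the spirit of the figure: peaks at similar frequencies with similar power suggest that stitching preserved the dominant time scales of variability.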

Examples of enriched ensembles of GSAT time series for ESMs in the PANGEO archive that have at least five trajectories available over the 21st century. As in the figures in Appendix A, warmer colors indicate a larger number of stitched trajectories in the figure, as the title also describes.

The two components of the

The STITCHES software is available via GitHub (

CT conceived the general approach. AS and KD significantly refined it, troubleshot it, and implemented it in Python. All authors collaborated in testing the results. CT led the writing of the paper, with contributions from AS and KD.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We acknowledge the World Climate Research Programme, which, through its Working Group on Coupled Modelling, coordinated and promoted CMIP6. We thank the climate modeling groups for producing and making available their model output, the Earth System Grid Federation (ESGF) for archiving the data and providing access, and the multiple funding agencies who support CMIP6 and ESGF. We would like to thank three anonymous reviewers and the journal editor. We also thank Brian O’Neill for substantial feedback in the early stage of the manuscript preparation.

This work was conducted with the support of the US Department of Energy, Office of Science, as part of the GCIMS project within the MultiSector Dynamics program area of the Earth and Environmental System Modeling program. Claudia Tebaldi was also supported by the CASCADE project, funded by the US Department of Energy, Office of Science, Office of Biological and Environmental Research, as part of the Regional and Global Modeling and Analysis program area. The Pacific Northwest National Laboratory is operated by Battelle for the US Department of Energy (contract no. DE-AC05-76RLO1830). Lawrence Berkeley National Laboratory is operated by the US Department of Energy (contract no. DE-AC02-05CH11231).

This paper was edited by Gabriele Messori and reviewed by three anonymous referees.