Climate–groundwater dynamics inferred from GRACE and the role of hydraulic memory

. Groundwater is the largest store of freshwater on Earth after the cryosphere and provides a substantial proportion of the water used for domestic, irrigation and industrial purposes. Knowledge of this essential resource remains incomplete, in part, because of observational challenges of scale and accessibility. Here we examine a 14-year period (2002–2016) of Gravity Recovery and Climate Experiment (GRACE) observations to investigate climate–groundwater dynamics of 14 tropical and sub-tropical aquifers selected from WHYMAP’s (Worldwide Hydrogeological Mapping and Assessment Programme) 37 large aquifer systems of the world. GRACE-derived changes in groundwater storage resolved using GRACE Jet Propulsion Laboratory (JPL) mascons and the Community Land Model’s land surface model are related to precipitation time series and regional-scale hydrogeology. We show that aquifers in dryland environments exhibit long-term hydraulic memory through a strong correlation between groundwater storage changes and annual precipitation anomalies integrated over the time series; aquifers in humid environments show short-term memory through strong correlation with monthly precipitation. This classiﬁcation is consistent with estimates of groundwater response times calculated from the hydrogeological properties of each system, with long (short) hydraulic memory associated with slow (rapid) response times. The results suggest that groundwater systems in dryland environments may be less sensitive to seasonal climate variability but vulnerable to long-term trends from which they will be slow to recover. In contrast, aquifers in humid regions may be more sensitive to climate disturbances such as drought related to the El Niño–Southern Oscillation but may also be relatively quick to recover. Exceptions to this general pattern are traced to human interventions through groundwater abstraction. Hydraulic memory is an important factor in the management of groundwater resources, particularly under climate change.

Thi s v e r sio n is b ei n g m a d e a v ail a bl e in a c c o r d a n c e wit h p u blis h e r p olici e s. S e e h t t p://o r c a . cf. a c. u k/ p olici e s. h t ml fo r u s a g e p olici e s. Co py ri g h t a n d m o r al ri g h t s fo r p u blic a tio n s m a d e a v ail a bl e in ORCA a r e r e t ai n e d by t h e c o py ri g h t h ol d e r s .

Introduction
The availability of freshwater is essential for sustaining human life, economic security and access to the benefits of a wide range of ecosystem services (Taylor et al., 2013a). After the cryosphere, groundwater is the second largest store of freshwater on the planet, supplying 36 % of domestic water, 42 % of irrigation for agriculture and 27 % of industrial water use (Döll et al., 2012). Bidirectional flows be-tween surface water and groundwater are fundamentally important to the ecology of semi-arid and arid regions (drylands) where surface water often recharges groundwater and baseflow from groundwater can sustain rivers and wetlands in the absence of rainfall (Alley et al., 2002;de Graaf et al., 2019). Climate change in which anthropogenic emissions of greenhouse gases transform patterns of natural variability, together with substantial socio-economic change, predi-cates that management of freshwater resources has and will increasingly become a critical task (Famiglietti, 2014). In a climate where it is broadly predicted that "wet gets wetter, dry gets drier" (Trenberth, 2011), water storage at and below the land surface will be a vital tool in enabling successful adaptation to the changing global environment (Damkjaer and Taylor, 2017;Wada, 2016). Despite the importance of groundwater, there are considerable gaps in current knowledge and understanding (Güntner et al., 2007). Direct observations of groundwater are sparse in relation to its geographical scale so most global or regional groundwater data are based on output from large-scale models. These include global hydrological models (GHMs) (Sood and Smakhtin, 2015) and land surface models (LSMs) (Bierkens, 2015;Overgaard et al., 2006;Wood et al., 2011) for which there are often insufficient data available to constrain or calibrate (Döll et al., 2016). Model simulation of key processes such as soil hydrodynamics and groundwater recharge is therefore based on theoretical frameworks rather than field data (Scanlon et al., 2002). As a result, there is also considerable uncertainty about climate-groundwater dynamics. Recent work in this area has either focused on localised observations of changes in groundwater storage ( GWS) from piezometry (Cuthbert et al., 2019a) or occurred adjacent to large centres of population where human intervention, through the extraction of groundwater by pumping, can greatly influence observational measurements (Scanlon et al., 2018). In the context of an ∼ 85 % increase in global groundwater abstraction from 1979 to 2010 (Wada et al., 2014), an understanding of climate-groundwater dynamics, supported by large-scale observational data, is required to inform sustainable access to groundwater resources (Taylor et al., 2009).
In response to the lack of in situ field observations, remote sensing by satellite is increasingly being utilised to expand the scope of observational data available to Earth Sciences (Acker and Leptoukh, 2007). An important advance in the quality of global data for hydrological studies has come from the Gravity Recovery and Climate Experiment (GRACE), a collaboration between the National Aeronautics and Space Administration (NASA) in the USA and the German Aerospace Centre (DLR) launched in March 2002 (Tapley et al., 2004). Completed sets of approximately monthly measurements are used to derive the changes in mass at the Earth's surface, and from these data mass fluxes can be extracted that directly relate to the hydrosphere. Over land, the flux is expressed as a change in total water storage ( TWS) at a spatial resolution of ∼ 300 km and with an expected accuracy of better than 2 cm equivalent water height (EWH) (Tapley et al., 2004). GRACE ceased operation due to battery failure in mid-2016 having created a record of 163 monthly gravity solutions (Tapley et al., 2019). Although GRACE operated for 10 years longer than anticipated at its launch, it is a relatively brief dataset in relation to large-scale climate patterns impacting the global hydrologi-cal system with frequencies of several years or decades (e.g. Pacific Decadal Oscillation -PDO, Atlantic Multidecadal Oscillation -AMO). Nevertheless, inter-annual periodicities associated with the El Niño-Southern Oscillation (ENSO) and the Antarctic Circumpolar Wave (ACW) have been detected (Mémin et al., 2015;Ni et al., 2018;Phillips et al., 2012).
Intrinsic parameters of GRACE data effectively define the spatial and temporal dimensions of this study, but there are additional constraints related to the derivation of GWS data from GRACE TWS that also need to be considered. The sub-division of GRACE TWS into its component parts, including GWS, requires the application of GHM or LSM output that is itself subject to associated uncertainty, as already noted (Döll et al., 2014). It has been demonstrated that there is a relatively poor correlation between GRACE and GHMs and LSMs in the evaluation of TWS, with significant discrepancies at the basic level of whether storage trends are increasing or decreasing (Scanlon et al., 2018). These findings have been confirmed with reference to regional piezometric groundwater measurements from tropical aquifers in Africa (Bonsor et al., 2018). Thus, the application of GRACE data to GWS implies three distinct areas of uncertainty: in the processing of the GRACE signal, accuracy of GHM and LSM model projections and mutual consistency of the observed (GRACE) and modelled (GHM and LSM) data (Long et al., 2015).
This study investigates the spatio-temporal properties of climate-groundwater dynamics using a subset of the 37 large aquifer systems of the world (LASW) as defined by the Worldwide Hydrogeological Mapping and Assessment Programme (WHYMAP) (Anon, 2008) and shown in Fig. S1 in the Supplement. This subset comprises aquifers that lie broadly within the tropics and sub-tropics where climate variability is mostly defined by rainfall (Shepherd, 2014). The 14 aquifers selected are listed in Table 1, together with their key characteristics including aridity index (AI) calculated from the Consultative Group for International Agriculture Research's Consortium for Spatial Information (CGIAR-CSI) Global Aridity Index dataset (Trabucco and Zomer, 2019) shown in Fig. 1. Following the work of Shamsudduha and Taylor (2020), the groundwater storage response to regional climate variability for these 14 largescale aquifer systems is investigated using GWS data extracted from the whole of the available GRACE TWS time series (August 2002-July 2016), together with climate data that are defined by the areal extent of each of the aquifer systems.
Several studies have used GRACE data to examine storage changes within a particular groundwater (GW) system (e.g. Becker et al., 2010;Bonsor et al., 2018;Chen et al., 2010Chen et al., , 2016Henry et al., 2011;Z. Huang et al., 2015;Ramillien et al., 2014;Shamsudduha et al., 2012Shamsudduha et al., , 2017Tiwari et al., 2009;Xavier et al., 2010;Yeh et al., 2006). Here, we examine the dynamics of climate-groundwater interactions in- ferred from the underlying patterns of large-scale GWS in response to extremes of precipitation. We find that hydraulic memory (HM) is a key component in the classification of groundwater responses to climate variability. We then seek to reconcile the results with reference to the physical characteristics of individual aquifer systems (Cuthbert et al., 2019b) whilst accounting for anomalous responses in GWS to climate variability.

GWS derived from GRACE data
Mass fluxes relating to the hydrosphere contained in the GRACE land-signal measurement of changes in the Earth's gravitational field are defined as TWS. In order to obtain information relating specifically to groundwater, this signal is separated into the component parts that comprise TWS, generally represented as where SWS is surface water storage, SMS is soil moisture storage and SNS is snow-water equivalent storage. GWS is then derived from TWS according to the following equation: The locations of the 14 aquifers are outside areas where changes in snow-water equivalent substantially impact TWS (Getirana et al., 2017). SNS can consequently be omitted so that Eq. (2) can be rewritten for the purposes of this particular study as Since GRACE started transmitting, several solutions have been developed for analysing and producing GRACE TWS data to increasing levels of accuracy with the intention that the data be readily and freely available for research (Landerer and Swenson, 2012). In this instance, three different products were drawn from Shamsudduha and Taylor (2020), two of which are spherical harmonic (SH) solutions comprising CSR Land (version RL05.DSTvSCS1409) from the Centre for Space Research (CSR) at the University of Texas at Austin and CNES/GRGS (version RL03-v1) from the French Centre National d'Etudes Spatiale (CNES) and the Groupe de Recherche de Géodésie Spatiale (GRGS) and one of which is JPL-mascon (version RL05M 1.MSCNv01) from the Jet Propulsion Laboratory (JPL) NASA. To derive TWS, all GRACE solutions require additional processing that includes corrections for glacial isostatic rebound and atmospheric mass variation (Landerer and Swenson, 2012). SH solutions also require spatial filtering (or "de-striping"), whereas JPL-mascon does not as it directly converts the GRACE signal into mass concentration blocks (mascons), rendering monthly gravitational fields directly as 3 • × 3 • gridded spatial components to reduce errors (Watkins et al., 2015).
On inspection, the divergence between the three TWS datasets was significant when summed over the time series. The relatively large coefficient of variance, −104 %, even though derived from a small sample size, calls into question the use of an ensemble mean for this study. Such an approach may be appropriate for the use of SH products alone (Sakumura et al., 2014), but it is preferable not to combine SH products and mascons (Felix W. Landerer, personal communication, 2019). Consequently we rely solely on the JPLmascon dataset possessing a better signal-to-noise ratio and potentially less error Watkins et al., 2015;Xie et al., 2018). The employed JPL-mascon dataset has been spatially sampled at a 0.5 • grid using dimensionless scaling factors provided as 0.5 • × 0.5 • bins derived from the CLM4.0 LSM (Long et al., 2015;Wiese et al., 2016). Table 1. Characteristics of the 14 aquifer systems selected for the study according to the WHYMAP and CGIAR-CSI databases with statistics giving (L to R): total number of resident population, aquifer area, proportion of groundwater-fed irrigation, mean aridity index classification (Trabucco and Zomer, 2019), mean annual rainfall and mean variability in annual rainfall.  (Wahr et al., 1998), and in the standard datasets all anomalies are given with respect to a baseline which is the mean over the period January 2004 to December 2009 (JPL NASA, 2019). However, we examine the completed available GRACE TWS time series with respect to climate anomalies over the consistent timeframe of the entire series. Consequently, the employed JPL-mascon TWS dataset has been rescaled with respect to a time mean taken over the whole period of GRACE operation (August 2002-July 2016), which is the study reference period (SRP) (JPL NASA, 2019). As set out in Eq. (3), datasets for SMS and SWS derived from LSMs are required to determine GWS from TWS since observational data at the spatio-temporal scales of this study do not exist. Datasets for the 14 aquifer systems were drawn from NASA's Global Land Data Assimilation System (GLDAS) (Rodell et al., 2004) comprising the output from four different LSMs (Shamsudduha and Taylor, 2020): the Common (Community) Land Model (CLM, version 2.0), Noah (version 2.7.1), the Variable Infiltration Capacity (VIC) model (version 1.0) and Mosaic (version 1.0) (Rui and Beaudoing, 2019). As with TWS, analysis of the four LSM datasets for SWS + SMS shows that their divergence summed over the entire time series is substantial, with a coefficient of variance of 258 %, suggesting that a LSM-ensemble mean approach may also not be appropriate for this analysis even given the restricted sample size. Further, the inter-and intra-model variability of SWS in the LSM datasets , assessed as surface runoff (e.g. Shamsudduha and Taylor, 2020;Thomas et al., 2017), is much less substantial than that of SMS (inter-model coefficient of variance 378 %). In the absence of consideration of SWS, groundwater recharge is primarily determined by the effect of evapotranspiration on moisture in the soil zone (Long and Mahler, 2013). Therefore, for this study, the modelling of SMS is a key determinant of the outcomes for GWS computed using Eq. (3) (de Vries and Simmers, 2002). Modelled soil profiles vary substantially in each of the four LSMs ranging in depth from 3.5 m (Mosaic) to 1.9 m (VIC) and, in vertical layers, from 10 m (CLM) to 3 m (VIC & Mosaic) (Rodell et al., 2004). CLM 2.0 (Bonan et al., 2002;Dai et al., 2003), with 3.4 m depth and 10 vertical layers, features the most well-developed soil model (Scanlon et al., 2018) and has been shown to perform well in comparative testing (Scanlon et al., 2018;Spennemann et al., 2014). In addition, CLM has demonstrated appropriate variability in initial ensemble model runs undertaken here, meaning that SMS is almost always less than the magnitude of TWS, thereby ensuring that GWS estimates derived from Eq. (3) are not arbitrarily high or low (Shamsudduha and Taylor, 2020). Therefore, this study employs a single model, CLM, for SMS and SWS rather than adopting a LSM ensemble mean approach.

Climatology
Individual aquifer system shapefiles from the WHYMAP LASW were prepared as ASCII files and uploaded to KNMI Climate Explorer (KNMI Climate Explorer, 2018). This allowed a range of climate data to be extracted for the precise spatial boundaries of each system. In particular, precipitation (PCP) data from the CRU TS4.03 dataset at 0.5 • resolution (Climate Research Unit, University of East Anglia, 2019) were obtained, together with anomalies (PCPA) normalised for the SRP (2002SRP ( -2016. The CRU TS4.03 datasets, together with the GWS derived from JPL-mascon TWS and CLM 2.0 SMS and SWS, in accordance with Eq. (3), were used to create time series analyses to explore correlations over different time and volume components through integration. In this respect, the use of "annual" in this study implies the appropriate hydrological year (HY).
In order to calibrate the time series for each aquifer system prior to further analysis, the lag between monthly PCP, as the primary climate-groundwater index, and monthly GRACE TWS was set by maximising the Pearson correlation coefficient (PCC) between the two datasets, validated by pointwise verification of alignment of the time series. In the majority of cases, this comparison showed TWS lagging PCP by 2 months. The lag for the PCPA time series were set in the same way with relation to GWS but with the already determined PCP time series lag set as a minimum. In the case of all aquifer systems except for the Congo, Canning and Indus River basins, this procedure resulted in a consistent lag being applied to all of the time series investigations of each aquifer. Initial investigations also established that only relatively weak first-order correlations exist between TWS and other monthly observational climate data such as the self-calibrating Palmer Drought Severity Index (PDSI-sc) (Wells et al., 2004) and mean temperature anomalies (CPC GHCN/CAMS t2m analysis) (Fan and van den Dool, 2008). By a comparison with both these measures, it appeared that PCPA carried a stronger climate variability signal due to the tropical or sub-tropical location of the selected aquifers (Allan et al., 2010;Shepherd, 2014). An analysis was then conducted to test for correlations between GWS and a series of measures of precipitation. Three separate time series of precipitation were developed to examine the temporal response of the study LASW with respect to the process by which precipitation at the land surface contributes to GWS.
2. PCPA equals monthly precipitation anomalies with respect to the consistently applied study reference period time-mean baseline, 2002-2016.
3. PCPA equals cumulative monthly rainfall anomalies derived by integrating the PCPA time series.
These monthly series were also summed to provide annual time series for each aquifer system. Correlation was measured using the PCC with statistical significance determined by a t test with α = 0.05 (Spearman, 1904). In addition, as previously stated, the CGIAR-CSI Global Aridity Index dataset (Trabucco and Zomer, 2019) was obtained, and a numerical AI for each aquifer was extracted as a spatial mean value using quantum geographic information system (QGIS). AI was used to place each aquifer into the climate zone classification specified by the dataset as set out in Table 1. Of the climate zones relating to the 14 aquifer systems, three are arid, five are semi-arid and one is dry semi-humid, giving nine in total in dryland zones (Corvalán et al., 2005), and five are humid.

Hydraulic memory
In using cumulative rainfall anomalies, this study invokes the concept of system memory (Weber and Stewart, 2004). Several studies have considered the question of hydraulic or hydrologic memory, as it impacts both soil moisture, including land-atmosphere dynamics (Castro et al., 2009;Lo and Famiglietti, 2010;Wu et al., 2002), and groundwater (Currell et al., 2016;Cuthbert et al., 2019b;Güntner et al., 2007;Rodell and Famiglietti, 2001). Central to the definition of this "memory" is that it represents the time taken for a system to re-equilibrate following a change in boundary conditions (Downing et al., 1974). In the case of an aquifer system, approximated to a one-dimensional flow of uniform diffusivity, the groundwater response time (GRT) is given by Eq. (4): where L is a measure of the scale of the system, S is the storativity, β is a dimensionless constant and T is transmissivity. Qualitatively, Eq. (4) implies that long response times are characterised by large-scale systems and/or low hydraulic diffusivity (i.e. combination of high S and low T ) (e.g. Kooi and Groen, 2003). An alternative approach to quantifying memory may be needed in more complex -and realisticmultidimensional flow situations (see Cuthbert et al., 2019b). Nevertheless, Eq. (4) still provides a useful order of magnitude approximation. Here, it is helpful to consider the response time as a delay between system input and system output whereby the output state H , at time t, is given by where p(τ ) is the input state or function at time τ , (t − τ ) is the delay between output and input, and θ is an impulse response function (IRF), also known as a transfer function (Long and Mahler, 2013). The IRF is a multi-parameter function that is intended to model the properties of the system so that the output of the IRF determines the time t at which the state H is reached. The hydraulic memory is quantified by the length of time that the effect of the input persists in the system. As the IRF is commonly exponential, making the equilibrium state asymptotic, system memory can be defined as the time interval at which the IRF is 95 % complete. This approach has been successfully applied to modelling aquifer responses to precipitation validated by piezometry in both the USA (Long and Mahler, 2013) and the Netherlands (von Asmuth and Knotters, 2004). Alternatively, system memory may be defined as the length of time taken for the effect of the anomalous input to decay to 1/e of its starting value when this can be explicitly measured (Cuthbert et al., 2019b;Lo and Famiglietti, 2010). In relation to Eq. (5), chosen precipitation measures are p(τ ) input functions, and GWS represents H (t), the output measure. The time step τ for each of the precipitation time series used is as shown in Table 2. Correlation between GWS (output) and a particular precipitation dataset (input) can be considered to be a measure of the persistence of the effect of that input integrated over the time step. The degree of correlation between GWS and annual PCPA is thus indicative of the duration of HM in the aquifer system.

Regional-scale hydrogeology
In an exploration of climate-groundwater dynamics using GRACE data, the lack of direct physical observational data means that it is necessary to demonstrate that results are not simply artefacts of modelling and signal processing (Rodell et al., 2009). The role of hydrogeology in determining groundwater dynamics is widely acknowledged (Befus et al., 2017;Cuthbert et al., 2019b;de Vries and Simmers, 2002;van Lanen et al., 2013). Here, we seek to validate results inferred from GRACE data with reference to the physical characteristics of specific aquifer systems. In order to categorise the hydrogeology of each aquifer system, a number of available global datasets were sourced as raster files and interrogated in QGIS using the aquifer vector files from WHYMAP LASW. Examined datasets include the following: As defined above, the GRT is a temporal measure of the latency of aquifer systems that is derived from their scale and physical properties via Eq. (4). This measure relies on the other datasets listed for its calculation (Cuthbert et al., 2019b). K and are high-resolution datasets derived from recently developed lithological maps of the Earth's surface (Hartmann and Moosdorf, 2012) and their computation uses established geological parameters (Gleeson et al., 2014). However, K is based on permeability mapping from hydrolithologies that have a standard deviation of ∼ 2 orders of magnitude (Gleeson et al., 2011), and this variance underlies the uncertainty in each of these datasets used. WTD is a 30 arcsec (∼ 1 km) resolution dataset compiled from available observational data extended by modelled interpolation with both of these data sources being subject to considerable sampling bias and model uncertainty respectively (Fan et al., 2013). All of these datasets are global and derived from combinations of observations and modelled data.

Results
The main results for each aquifer system are given as a monthly time series of TWS and GWS vs. PCP and an annual time series of GWS vs. PCPA and PCPA, shown as Fig. 2a-r for dryland systems and Fig. 3a-j for humid systems. The outcomes are summarised in Table 3. As a general result, all time series plots show a qualitative relationship between GWS and PCP that exhibits interesting and potentially important spatio-temporal variations. The quantitative results show that for GWS there is a strong correlation with annual PCPA for aquifer systems in dryland environments, whereas in humid environments, the strongest correlation is with monthly PCP. Three aquifers -Guarani aquifer, Indus River Basin and Canning Basin -do not follow this general classification, and anomalies are discussed further in Sect. 4.3 below and in the Supplement. GRT, shown in Fig. 4, is a measure of the time it takes for an aquifer system to equilibrate after a change in boundary conditions, as discussed above. For the 14 studied aquifers, it extends from centennial to millennial timescales as indicated from median values reported in Tables 4 and S1 (in the Supplement). For humid aquifers, GRT ranges from 100 to 350 years, whereas for dryland systems GRT then escalates to values well in excess of 1,000 years for semi-arid and arid basins; the sub-humid North China Plain aquifer has a GRT of ∼ 550 years. This order of magnitude point of transition can be identified as the threshold between sensitive (rapid) and insensitive (slow) aquifer response times (Cuthbert et al., 2019b), which show a broad global relationship with aridity. This observation helps to explain groundwater storage responses to climate variability through the memory of the aquifer system defined by both physical characteristics and geographical location. The role of HM is discussed further in Sect. 4.1.
Presented results represent the outcome of a detailed analysis of the available datasets and, as such, contain important assumptions that need to be acknowledged here. Firstly, the allocation of lag time has been done on a "best fit to the TWS data" basis. It is therefore not derived from analysis of intrinsic physical characteristics of the aquifer systems but is consistent with the range of theoretical values derived from hydrodynamic first principles that anticipate a maximum lag time of 3 months for systems with a large GRT (Townley, 1995), as has been observed by Ahmed et al. (2011). Time lags have been tested for consistency through the alignment of specific events in the various time series (von Storch and Zwiers, 2001). The evident anomaly of a 7-month lag time for the Karoo Basin is discussed in the Supplement. Secondly, the restricted duration of the GRACE dataset should be acknowledged, particularly with regard to the annual time series. In mitigation, statistical significance appears to be robust when tested using the methods described by Zwiers and von Storch (Zwiers and von Storch, 1995), and the use of PCPA and PCPA datasets is designed to minimise the effect of seasonal climate and short-term trends in GWS (Craddock, 1965). Thirdly, the use of Eq. (3) to derive GWS from GRACE TWS data represents a temporal and spa-tial approximation in representing sub-surface hydrological processes. Simply put, all water below the soil zone neither necessarily comprises GWS nor will it all eventually reach GWS due to lateral flow processes. However, on the scale of the aquifer systems considered here, the use of Eq. (3) is a reasonable approximation (de Vries and Simmers, 2002).

Role of hydraulic memory
A key finding of this study is that GRACE-derived GWS correlates most strongly with annual PCPA for large-scale aquifers in dryland environments of the tropics and subtropics, whereas GRACE-derived GWS correlates most strongly with monthly PCP in humid environments at these latitudes. Further, we show that there is correspondence between the annual PCPA vs. GWS correlations and GRTs of large-scale aquifer systems (Table 4); the latter is a measure derived in accordance with Eq. (4) (Cuthbert et al., 2019b). HM ultimately derives from the physical properties of the saturated portion of the aquifer system (Townley, 1995), and system memory as measured by Eq. (5) is representative of the physical properties of an aquifer system and its climate. Von Asmuth and Knotters (2004) use four parameters to describe groundwater dynamics in their transfer function (τ in Eq. 5) that they argue represents a more accurate description of the physical system than previously used parametric methods (von Asmuth and Knotters, 2004). Further, their description of groundwater dynamics is capable of accommodating non-stationary elements such as climate change and groundwater abstraction (von Asmuth and Knotters, 2004). HM as measured by Eq. (5) is therefore representative of both spatial and temporal variability in aquifer systems, but HM itself can vary spatio-temporally. Indeed the response time to a given boundary change can vary according to the physical circumstances, with persistence lasting from months to hundreds of thousands of years (Cuthbert et al., 2019b).
In this study, the GRACE dataset is not long enough to allow detailed IRF modelling of aquifer systems based on GWS data, which would require an observational record longer than the system memory (Long and Mahler, 2013). An extended GRACE series, together with reduced uncertainty in the permeability dataset from which GRT is derived, may generate closer numerical matches between GRT (Eq. 4) and  HM as measured by the method of this study (Eq. 5). Nevertheless, we show that aquifer responses to anomalous precipitation, discussed below, exhibit long HM in dryland environments and relatively short HM in humid environments. The correspondence with GRT extends the classification to two broad categories: dryland environment/long HM/slow GRT and humid environment/short HM/rapid GRT. Note that these categories represent a simplification of the correspondence between HM derived from the study datasets and GRT, which in fact exhibits a spectrum in which Umm Ruwaba (dryland), Congo Basin and Maranhão (both humid) occupy an intermediate position in terms of the correlation between GWS and annual PCPA, as can be seen in Table 3. Aquifers in humid environments, with the exception of the Congo Basin, generally exhibit less HM in this study than expected from GRT values. These humid aquifers, as can be seen in Fig. 4, have some of their area with GRTs in the order of years to tens of years, perhaps meaning that a disproportionate amount of groundwater processes may be moving through these lower GRT areas. This may explain why humid regions have less HM overall than is implied by their median GRT.  Table 3, where TWS ( GWS) lags PCP (PCPA) by the specified number of months. Y -axis units are equivalent water height (EWH) in centimetres. Note the variation in the y-axis scales. Seven of the annual PCPA data series have been scaled by a factor of 10 for clarity where indicated.

Aquifer responses to anomalous precipitation
The annual time series of GWS vs. PCPA for each aquifer have been examined to identify years in which the maximum annual increase in GWS occurred, as identified by the steepest positive gradient of the GWS line (Table 5). These years of extreme recharge, inferred from the increase in GWS, are then further categorised by whether (1) prior to the event PCPA was negative, indicating anomalously dry conditions when soil moisture deficits (SMDs) are likely to be widespread, and (2) the PCPA is concurrently shifting from a negative to positive cumulative anomaly associated with an extreme rainfall event. Finally, the NINO3.4 index for 2002(B. Huang et al., 2015 has been examined (KNMI Climate Explorer, 2018) to indicate the state of ENSO, the dominant control on equatorial precipitation, at the time of the recharge. Nearly all recharge events in dryland aquifer systems take place at a time of negative PCPA (likely SMD) with most coinciding with extreme rainfall as recently observed in a pan-African study by Cuthbert et al. (2019a). Extreme recharge events also generally coincide with El Niño and La Niña events indicating an association with large-scale modes of climate variability identified previously in tropical Africa (Kolusu et al., 2019;Taylor et al., 2013a, b). In contrast, extreme recharge in humid aquifer systems is not consistently associated with either negative PCPA (likely SMD) or anomalous rainfall, though the latter is correlated with ENSO state.  Table 3, where TWS lags PCP by the specified number of months. Y -axis units are equivalent water height (EWH) in centimetres. Note the variation in the y-axis scales. The Congo Basin annual PCPA data series has been scaled by a factor of 10 for clarity.

Anomalous trends in groundwater storage
Over the SRP determined by the availability of GRACE data, six aquifer systems show a net decline in groundwater storage: California Central Valley, North China Plain, and the Maranhão, Ganges, Indus and Canning basins. Of these, two aquifer systems (Indus River and Canning basins) do not show a strong correlation between GWS and any of the precipitation data series. Table 4 shows that these same two aquifers do not fit the general classification of the 14 aquifer systems into either dryland/slow GRT/long HM or humid/rapid GRT/short HM systems. These anomalous characteristics may reflect groundwater storage decline through the escalation of groundwater abstraction referenced previously (Wada et al., 2014), and this hypothesis was tested through further analysis as follows below and in further detail in the Supplement.
The Indus River and Canning basins superficially present similar stories of groundwater storage decline, yet contextual analysis of their respective GRACE/CLM GWS datasets reveals two quite different realities. The Indus River Basin supports a population of ∼ 210 million people (Immerzeel et al., 2010), and its hydrology is strongly influenced by water supply from upstream of the basin, much of it intended for irrigation (Immerzeel et al., 2010). Surface water is augmented by groundwater abstraction, which supplies ∼ 31 % of the total irrigation demand, but it has been estimated that ∼ 84 % of the groundwater abstracted returns to the aquifer system as leakage from canals and intensively irrigated fields (Cheema et al., 2014). A net calculation of these effects on GWS, which is detailed in the Supplement, shows that the underlying climate-groundwater dynamics are consistent with the GRT derived from the regional-scale hydrogeology of the aquifer system. In contrast, the Canning Basin is sparsely populated and is not a centre of agriculture (Richey et al., 2015). It is, however, a source of freshwater for iron-ore extraction in adjacent areas (Western Australia Department of Water, 2011), and very little of the abstracted groundwater is returned to the aquifer system as its use in mining causes it to become contaminated (Western Australia Department of Water, 2013). This contaminated groundwater is subsequently disposed in the sea or evaporation ponds (Prosser et al., 2011). The Canning Basin has a very slow GRT and, situated in an arid environment, is subject to low rates of groundwater recharge so that the physically sustainable rate of groundwater abstraction is expected to be very low (Scan-lon et al., 2006). The analysis of the Indus and Canning basins is evidence of how groundwater depletion, which has been reported elsewhere (e.g. Famiglietti, 2014;Rodell et al., 2009), impacts relationships between precipitation and GWS.

Conclusions
Strong correlations are found between GRACE-derived annual GWS and PCPA for large-scale aquifer systems in dryland environments. This correlation is much weaker for large-scale aquifer systems in humid zones where a stronger correlation generally exists between monthly GWS and monthly PCP. We propose that the correlation between annual GWS and PCPA demonstrates the existence of hydraulic memory which is central to large-scale climategroundwater dynamics. For the studied aquifer systems, the measure of correlation between annual GWS and PCPA also shows very good correspondence with the groundwater response time, a measure of the hydraulic memory of an aquifer system derived from its regional-scale hydrogeological and catchment properties (Cuthbert et al., 2019a). The 14 aquifer systems can be broadly categorised into two groups with each group listed in ascending order of groundwater response time: -Group 2: humid/short HM/rapid GRT: the Amazon, Ganges, Congo, Guarani and Maranhão basins.
Aquifer systems in Group 1 may be less sensitive to seasonal climate variability but also vulnerable to long-term trends from which they will be slow to recover. In contrast, aquifers  (7) 0.25 (7) 0.07 (7) 0.21 (7) 0.71 (7) 0.88 (7)  in Group 2 may be more sensitive to seasonal climate disturbances such as ENSO-related drought but may also be relatively quick to recover. These characteristics can be applied to anticipate the groundwater response to present conditions and to future pressures that can be expected from anthropogenic climate change (Taylor et al., 2013a). The results from the analysis of GRACE data are reconciled to regionalscale hydrogeological conditions, which gives confidence in their validity (Beven and Germann, 2013) albeit with the caveat regarding the uncertainties inherent in all the datasets used (Wilks, 2016).
The new GRACE Follow-On (GRACE-FO) project has now been launched (Frappart and Ramillien, 2018;Tapley et al., 2019), providing an opportunity to augment the existing GRACE TWS dataset without recourse to modelling (Ahmed et al., 2019) and to give greater certainty in linking climate-groundwater dynamics to decadal and longer timescale climate systems including the Pacific Decadal Oscillation and Atlantic Multidecadal Oscillation (Wunsch ,  Table 4. Relationship between aridity index, climate and regional-scale hydrogeology: data linking climate and regional-scale hydrogeology to GW dynamics. Italicised results fall below t-test threshold.  1999). An extended dataset will improve the calibration of HM as it relates to specific aquifer systems, providing a robust context for monitoring GWS, including groundwater decline, in real time and protecting fundamentally important groundwater resources.
Data availability. Data generated and used in this study can be made available upon request to the corresponding author.
Author contributions. SO led the analysis of datasets originally compiled by MS and MC, supplemented by datasets developed by SO. RT, CB and MS contributed to the original design of the study with key modifications made by SO. SO drafted the paper with input from RT. All authors contributed to, and commented on, revisions to the submitted paper.
Competing interests. The authors declare that they have no conflict of interest.
Review statement. This paper was edited by Daniel Kirk-Davidoff and reviewed by Bridget Scanlon and one anonymous referee.