Improving terrestrial carbon flux simulations with machine learning and global Earth observations

Seiler, Christian

doi:10.5194/esd-17-651-2026

Articles | Volume 17, issue 3

https://doi.org/10.5194/esd-17-651-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.


Research article | 01 Jun 2026


Final revised paper (published on 01 Jun 2026)
Preprint (discussion started on 16 Jun 2025)

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-2517', Anonymous Referee #1, 01 Aug 2025

This study applies a machine learning-based Genetic Algorithm (GA) and multiple global Earth observation datasets to systematically optimize poorly constrained parameters in the CLASSIC land surface model. The optimization is conducted over a long historical period (1701–2020), simultaneously targeting multiple variables and using multiple observational data streams, aiming to improve historical simulation performance and assess future terrestrial carbon fluxes under the SSP5-8.5 scenario. Despite these strengths, several issues may limit the scientific impact and clarity of the manuscript. My detailed comments are as follows:
L233: The global representativeness of the randomly selected 160 grid cells should be evaluated. These cells may not capture regional differences or small-scale processes, and if the selected grids differ substantially from the target regions, the optimized parameters may not be suitable for local applications. While the 160 grids were randomly selected, it is not stated whether multiple random samplings were performed to test the stability of results. Different random seeds could lead to different optimal parameter sets.
Using the same set of observational data for both fitness evaluation and parameter optimization lacks an independent validation set or cross-validation. This may result in good performance on the training data but poor generalization capability.
L235: The computation time of two weeks is substantial, yet the manuscript does not specify the convergence criteria, number of iterations, or early stopping strategy, raising concerns about potential waste of computational resources. If the solution space is large, GA may still remain trapped in suboptimal solutions.
L253–258: Are the six land surface variables (ALBS, GPP, HFLS, HFSS, LAI, LST) weighted equally in the cost function? Different variables may differ greatly in importance (e.g., GPP is more critical for the carbon cycle), but the manuscript does not explain how weights were assigned.
L270–272: The robustness analysis was conducted with fewer grid cells, a shorter time period, and fewer generations. The representativeness of these reduced settings should be discussed in the manuscript.
L299: The finding that model performance stops improving after 25 generations may be due to GA parameter settings. This should be considered and discussed.
L315: The statement that “some variables did not improve” is made without analyzing the possible causes. This could be due to structural model errors rather than parameter settings, or uncertainties in the observational datasets. The discussion should include potential reasons and possible future improvements.
L338: Although the optimized simulation is slightly better than the default in some statistical metrics, the differences are described as “too minor to be considered meaningful.” The manuscript should discuss why optimizing 28 parameters results in only limited improvement in NBP, which may be related to observation errors, insufficient parameter representativeness, or model structural deficiencies.
L385: While two GA configurations were found to perform better than the default, the manuscript does not analyze their characteristics (e.g., differences in selection/crossover/mutation strategies) or why they perform better. Such analysis would help in better understanding the influence of GA settings on optimization results.
In the main text, some figures and tables could be moved to the supplementary materials to improve readability, such as Figures 1, 2, 7 and Tables 1, 2.

Citation: https://doi.org/10.5194/egusphere-2025-2517-RC1
- AC1: 'Reply on RC1', Christian Seiler, 20 Aug 2025
  
  I thank the reviewer for their thoughtful and constructive feedback on my manuscript. Please find my point-by-point responses below.
  REVIEWER: This study applies a machine learning-based Genetic Algorithm (GA) and multiple global Earth observation datasets to systematically optimize poorly constrained parameters in the CLASSIC land surface model. The optimization is conducted over a long historical period (1701–2020), simultaneously targeting multiple variables and using multiple observational data streams, aiming to improve historical simulation performance and assess future terrestrial carbon fluxes under the SSP5-8.5 scenario. Despite these strengths, several issues may limit the scientific impact and clarity of the manuscript. My detailed comments are as follows:
  L233: The global representativeness of the randomly selected 160 grid cells should be evaluated. These cells may not capture regional differences or small-scale processes, and if the selected grids differ substantially from the target regions, the optimized parameters may not be suitable for local applications. While the 160 grids were randomly selected, it is not stated whether multiple random samplings were performed to test the stability of results. Different random seeds could lead to different optimal parameter sets.
  ANSWER: I completed the optimization for a single set of randomly selected grid cells. Whether a different selection of grid cells would lead to substantially different parameter values depends on how representative the sample is. The sample size is based on computational limits rather than representativeness. I will address this comment by conducting additional optimizations using a different selection of grid cells. Given the computational expense, I will only be able to provide a few additional optimization experiments.
  REVIEWER: Using the same set of observational data for both fitness evaluation and parameter optimization lacks an independent validation set or cross-validation. This may result in good performance on the training data but poor generalization capability.
  ANSWER: The optimization is performed for 160 grid cells, while the evaluation shown in Figure 6 includes all 2,444 grid cells. Thus, only about 7% of the grid cells used in the evaluation were also included in the tuning process. Therefore, the evaluation results are largely driven by grid cells that were not part of the optimization.
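The split described in this answer can be illustrated with a minimal sketch. It assumes grid cells are identified by integer indices and drawn uniformly at random; the function name `sample_grid_cells` and the seeds are hypothetical and do not describe the actual selection procedure. It also shows how a second draw with a different seed, as raised by the referee, would overlap with the first.

```python
import random

N_TOTAL = 2444   # grid cells in the evaluation (from the reply above)
N_SAMPLE = 160   # grid cells used for tuning (from the reply above)

def sample_grid_cells(seed):
    """Draw N_SAMPLE distinct cell indices (hypothetical stand-in for the real selection)."""
    rng = random.Random(seed)
    return set(rng.sample(range(N_TOTAL), N_SAMPLE))

# Cells used for tuning versus cells seen only during evaluation.
tuned = sample_grid_cells(seed=0)
held_out = set(range(N_TOTAL)) - tuned
tuned_fraction = len(tuned) / N_TOTAL

# A second draw with a different seed, to see how many cells the two samples share.
alternative = sample_grid_cells(seed=1)
shared = len(tuned & alternative)

print(f"tuned fraction: {tuned_fraction:.3f}")
print(f"held-out cells: {len(held_out)}")
print(f"cells shared between the two draws: {shared}")
```

The tuned fraction is 160/2444, roughly 6.5%, which the reply rounds to about 7%. Two independent draws of this size share only a small number of cells, so repeating the optimization with another seed would test a largely different sample.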
  REVIEWER: L235: The computation time of two weeks is substantial, yet the manuscript does not specify the convergence criteria, number of iterations, or early stopping strategy, raising concerns about potential waste of computational resources. If the solution space is large, GA may still remain trapped in suboptimal solutions.
  ANSWER: This information is shown in Figure 4 and described in the text (L304). The figure indicates that I used 25 generations with a population size of 100 chromosomes. This corresponds to 25 × 100 = 2,500 simulations, each covering the 160 grid cells. I will add this information to the text to make it more explicit.
  
  The improvement in performance decreases from generation to generation, and Figure 4 illustrates that very little gain can be expected after generation 25. One might argue that computational time could have been saved by stopping the optimization after generation 15. However, this is not evident unless additional simulations are conducted that demonstrate diminishing progress. While I am confident that the solution could be improved by adding more iterations, I believe that the cost–benefit ratio would become too large.
  
  It is possible that the solution represents a local rather than a global optimum. However, I would like to emphasize that the method I chose is less prone to being trapped in local optima because it works with populations of candidate solutions. Even if the result does reflect a local optimum, it is still superior to the default solution. Finally, if systematic parameter optimization is not conducted, parameter values must be hand-tuned, a cumbersome approach that is far more likely to result in a suboptimal solution.
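The simulation budget discussed in this answer can be made concrete with a toy genetic algorithm. This is a sketch under stated assumptions, not the configuration used in the manuscript: the cost function `toy_cost` is a made-up stand-in for a model run, and the operators (tournament selection, uniform crossover, Gaussian mutation, elitism) and their rates are illustrative choices. Only the 25 generations, the population of 100, and the 28 parameters come from the discussion.

```python
import random

N_GENERATIONS = 25   # from the reply above
POP_SIZE = 100       # from the reply above
N_PARAMS = 28        # number of optimized parameters mentioned by the referee

def toy_cost(params):
    """Made-up stand-in for one model run: squared distance from an arbitrary optimum."""
    return sum((p - 0.3) ** 2 for p in params)

def tournament(scored, rng, k=3):
    """Return the parameters of the lowest-cost individual among k random picks."""
    return min(rng.sample(scored, k), key=lambda pair: pair[0])[1]

def run_ga(seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(N_PARAMS)] for _ in range(POP_SIZE)]
    n_evals = 0
    best_per_gen = []
    for _ in range(N_GENERATIONS):
        # One "simulation" per chromosome per generation.
        scored = [(toy_cost(ind), ind) for ind in pop]
        n_evals += len(scored)
        scored.sort(key=lambda pair: pair[0])
        best_per_gen.append(scored[0][0])
        # Elitism (an assumption): carry the best individual over unchanged.
        next_pop = [scored[0][1][:]]
        while len(next_pop) < POP_SIZE:
            a, b = tournament(scored, rng), tournament(scored, rng)
            # Uniform crossover, then Gaussian mutation clipped to [0, 1].
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [
                min(1.0, max(0.0, g + rng.gauss(0.0, 0.05))) if rng.random() < 0.1 else g
                for g in child
            ]
            next_pop.append(child)
        pop = next_pop
    return best_per_gen, n_evals

best_per_gen, n_evals = run_ga()
print(f"cost evaluations: {n_evals}")
print(f"best cost, first vs. last generation: {best_per_gen[0]:.4f} vs. {best_per_gen[-1]:.4f}")
```

Plotting `best_per_gen` against the generation index gives the kind of convergence curve that a stopping decision would be based on. Because a whole population is evaluated in each generation, the search explores many regions of parameter space at once, which is the property the answer refers to when discussing local optima.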
  REVIEWER: L253–258: Are the six land surface variables (ALBS, GPP, HFLS, HFSS, LAI, LST) weighted equally in the cost function? Different variables may differ greatly in importance (e.g., GPP is more critical for the carbon cycle), but the manuscript does not explain how weights were assigned.
  ANSWER: Yes, I assign all variables equal weight. I have considered weighting them differently, but that immediately raises the question of which criteria should determine the weights. One could argue that GPP is more critical for the carbon cycle, but the carbon, energy, and water cycles are all coupled and must remain consistent. It could also be argued that larger weights should be assigned to variables with lower observational uncertainty, but such uncertainties are difficult to quantify. In my view, defining weights opens the door to very subjective discussions that I would prefer to avoid. From my perspective, all aspects of the carbon, water, and energy fluxes should be considered equally important. I will add this argument to the text.
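A minimal sketch of what equal weighting means in practice is shown below. It assumes each variable has already been reduced to a single skill score between 0 and 1; the function `aggregate_score` and the score values are hypothetical and do not reproduce the manuscript's cost function.

```python
VARIABLES = ["ALBS", "GPP", "HFLS", "HFSS", "LAI", "LST"]

def aggregate_score(scores, weights=None):
    """Weighted mean of per-variable scores; equal weights when none are given."""
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(weights[name] * scores[name] for name in scores) / total_weight

# Made-up per-variable scores, for illustration only.
example_scores = dict(zip(VARIABLES, [0.70, 0.60, 0.65, 0.55, 0.50, 0.80]))

equal = aggregate_score(example_scores)

# An alternative that doubles the weight of GPP, as the referee's comment might suggest.
gpp_weights = {name: (2.0 if name == "GPP" else 1.0) for name in VARIABLES}
gpp_heavy = aggregate_score(example_scores, gpp_weights)

print(f"equal weights: {equal:.3f}")
print(f"GPP weighted double: {gpp_heavy:.3f}")
```

Any non-uniform choice such as `gpp_weights` changes the aggregate and therefore the ranking of candidate parameter sets, which is why the criteria for choosing weights would need to be justified.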
  REVIEWER: L270–272: The robustness analysis was conducted with fewer grid cells, a shorter time period, and fewer generations. The representativeness of these reduced settings should be discussed in the manuscript.
  ANSWER: I agree. I will either raise this limitation in the discussion section or replace this part of the analysis with the additional experiments using a different selection of grid cells, as outlined above.
  REVIEWER: L299: The finding that model performance stops improving after 25 generations may be due to GA parameter settings. This should be considered and discussed.
  ANSWER: Optimizing the optimization process is challenging given the large number of different possible combinations of selection, crossover, and mutation functions and corresponding hyperparameters. I briefly raise the issue in Line 425 and will expand on this in the revised version of the manuscript.
  REVIEWER: L315: The statement that “some variables did not improve” is made without analyzing the possible causes. This could be due to structural model errors rather than parameter settings, or uncertainties in the observational datasets. The discussion should include potential reasons and possible future improvements.
  ANSWER: Agree, I will include this in the revisions.
  REVIEWER: L338: Although the optimized simulation is slightly better than the default in some statistical metrics, the differences are described as “too minor to be considered meaningful.” The manuscript should discuss why optimizing 28 parameters results in only limited improvement in NBP, which may be related to observation errors, insufficient parameter representativeness, or model structural deficiencies.
  ANSWER: Please note that the optimization significantly improves model performance, particularly for gross primary productivity, leaf area index, and sensible heat flux. The model was not optimized for NBP, as no reliable globally gridded observational NBP data sets are available. My hope was that improving other surface variables would lead to global NBP values more consistent with global observations (i.e. globally accumulated NBP, which is reasonably well constrained). This expectation was only partly met. Interestingly, the NBP from the optimized run differs considerably from that of the default run, but the overall improvement is too minor to be meaningful. The limited improvement arises because NBP was not included in the optimization. I address how this limitation could be overcome in future studies (L419), namely by replacing the model with a much faster statistical emulator and by optimizing this emulator for global NBP. I will expand on this point in the revised manuscript.
  REVIEWER: L385: While two GA configurations were found to perform better than the default, the manuscript does not analyze their characteristics (e.g., differences in selection/crossover/mutation strategies) or why they perform better. Such analysis would help in better understanding the influence of GA settings on optimization results.
  ANSWER: I agree. I will either discuss the differences or replace this part of the analysis with the additional experiments using a different selection of grid cells, as outlined above. The analysis shown in Figure 11 is based on a much shorter optimization period and a smaller sample of grid cells. If I conduct multiple optimizations with the same settings as in the final optimization, then the results from that analysis will be directly comparable.
  REVIEWER: In the main text, some figures and tables could be moved to the supplementary materials to improve readability, such as Figures 1, 2, 7 and Tables 1, 2.
  ANSWER: I will carefully revisit what figures and tables should be in the main text.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2517-AC1
RC2:
'Comment on egusphere-2025-2517', Anonymous Referee #2, 20 Sep 2025

The paper proposes a Genetic Algorithm-based framework for optimizing parameters in the CLASSIC land surface model, using multiple global Earth observation datasets. It finds that the optimized parameters significantly improve key variables including GPP, LAI, and sensible heat fluxes. The paper is generally well-written and is suitable for publication after addressing the following comments.
Major comments
1. The author notes that multiple datasets are used per variable "to reduce the risk of overfitting" and "help account for observational uncertainty". However, it seems like the paper does not rigorously incorporate observational uncertainties into the optimization. A more rigorous treatment, or discussion on this, of observational uncertainty would strengthen the robustness of the conclusions.
2. I am particularly concerned about the generalizability of the optimized parameters, which the paper does not fully address. Since the optimization uses Earth observations from the modern climate, it remains unclear whether these parameter values will remain valid under future climate conditions, potentially limiting the robustness of the projections. A discussion of this limitation can strengthen the manuscript.
3. The author acknowledges that the optimization is evaluated only in offline mode, with prescribed CO2 and meteorological forcing, and notes that a fully coupled setup would alter NBP feedbacks. It would strengthen the paper if this limitation can be emphasized more clearly in the conclusions, with a brief discussion of how coupled feedbacks might influence the results.
Minor comments
L300: Figure 5a
L304: Figure 5b
Figure 10: caption does not mention (g) and (h)
Maybe Figures 2 and 7 can be moved to supplementary materials.

Citation: https://doi.org/10.5194/egusphere-2025-2517-RC2
- AC2: 'Reply on RC2', Christian Seiler, 29 Sep 2025
  
  I thank the reviewer for their thoughtful and constructive feedback on my manuscript. Please find my point-by-point responses below.
  REVIEWER: The paper proposes a Genetic Algorithm-based framework for optimizing parameters in the CLASSIC land surface model, using multiple global Earth observation datasets. It finds that the optimized parameters significantly improve key variables including GPP, LAI, and sensible heat fluxes. The paper is generally well-written and is suitable for publication after addressing the following comments.
  ANSWER: Thank you for your positive evaluation of the manuscript.
  Major comments
  REVIEWER: 1. The author notes that multiple datasets are used per variable "to reduce the risk of overfitting" and "help account for observational uncertainty". However, it seems like the paper does not rigorously incorporate observational uncertainties into the optimization. A more rigorous treatment, or discussion on this, of observational uncertainty would strengthen the robustness of the conclusions.
  ANSWER: There are several sources of uncertainty relevant to this study, including uncertainties in the model forcing data, model configuration, parameter ranges, grid-cell selection, optimization period, optimization algorithm, and hyperparameters. Observational uncertainty is therefore only one of many contributing factors. A particular challenge is that the uncertainty of observation-based products is often poorly documented, as highlighted in many parameter optimization studies. Moreover, there is no community-wide consensus on how best to represent observational error. To address your comment, I propose to discuss the different methods that have been used and the strengths and weaknesses of the approach adopted in my manuscript.
  REVIEWER: 2. I am particularly concerned about the generalizability of the optimized parameters, which the paper does not fully address. Since the optimization uses Earth observations from the modern climate, it remains unclear whether these parameter values will remain valid under future climate conditions, potentially limiting the robustness of the projections. A discussion of this limitation can strengthen the manuscript.
  ANSWER: I think it is important to acknowledge that whenever a new parameterization is introduced in a model, developers typically select parameter values within an uncertainty range so that the model output matches observations from the modern climate. This kind of ad hoc tuning is common practice, and your criticism applies equally to it. Replacing ad hoc tuning with a more systematic approach is not different in principle - it is simply far more effective. I suggest emphasizing this point more strongly in the discussion section.
  REVIEWER: 3. The author acknowledges that the optimization is evaluated only in offline mode, with prescribed CO2 and meteorological forcing, and notes that a fully coupled setup would alter NBP feedbacks. It would strengthen the paper if this limitation can be emphasized more clearly in the conclusions, with a brief discussion of how coupled feedbacks might influence the results.
  ANSWER: I agree and will elaborate on this in the Discussion section.
  Minor comments
  REVIEWER: L300: Figure 5a
  ANSWER: Yes, thank you. I will change 4a to 5a.
  REVIEWER: L304: Figure 5b
  ANSWER: Yes, thank you. I will change 4b to 5b.
  REVIEWER: Figure 10: caption does not mention (g) and (h)
  ANSWER: Yes, I will add the description of (g) and (h) in the caption.
  REVIEWER: Maybe Figures 2 and 7 can be moved to supplementary materials.
  ANSWER: I will carefully revisit the selection of the figures that will go into the main text.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2517-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Reconsider after major revisions (13 Oct 2025) by Anping Chen

AR by Christian Seiler on behalf of the Authors (24 Jan 2026) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (13 Feb 2026) by Anping Chen

RR by Anonymous Referee #1 (26 Feb 2026)

RR by Anonymous Referee #2 (04 Apr 2026)

ED: Publish subject to minor revisions (review by editor) (10 Apr 2026) by Anping Chen

AR by Christian Seiler on behalf of the Authors (13 Apr 2026) Author's response Author's tracked changes Manuscript

ED: Publish as is (15 May 2026) by Anping Chen

AR by Christian Seiler on behalf of the Authors (15 May 2026)

Short summary

This study shows how machine learning combined with global Earth observations can improve simulations of the land carbon cycle. Optimizing key model parameters enhances the accuracy of historical carbon fluxes, while machine-learning tools help assess the robustness of these results in the presence of compensating parameter effects. The findings demonstrate that parameter optimization strongly influences simulated carbon fluxes, highlighting its importance for improving climate models.
