Significant loss of soil inorganic carbon at the continental scale

Abstract Widespread soil acidification due to atmospheric acid deposition and agricultural fertilization may greatly accelerate soil carbonate dissolution and CO2 release. However, to date, few studies have addressed these processes. Here, we use meta-analysis and nationwide-survey datasets to investigate changes in soil inorganic carbon (SIC) stocks in China. We observe an overall decrease in SIC stocks in topsoil (0–30 cm) (11.33 g C m–2 yr–1) from the 1980s to the 2010s. Total SIC stocks have decreased by ∼8.99 ± 2.24% (1.37 ± 0.37 Pg C). The average SIC losses across China (0.046 Pg C yr–1) and in cropland (0.016 Pg C yr–1) account for ∼17.6%–24.0% of the terrestrial C sink and 57.1% of the soil organic carbon sink in cropland, respectively. Nitrogen deposition and climate change have profound influences on SIC cycling. We estimate that ∼19.12%–19.47% of SIC stocks will be further lost by 2100. The consumption of SIC may offset a large portion of global efforts aimed at ecosystem carbon sequestration, which emphasizes the importance of achieving a better understanding of the indirect coupling mechanisms of nitrogen and carbon cycling and of effective countermeasures to minimize SIC loss.


Soil data
Soil samples were gathered from 13769 sites across mainland China in the 1980s (n=7299), 2000s (n=774) and 2010s (n=5696) (Supplementary Fig. S1). Soil data in the 1980s were manually digitalized from 30 hard-copy monographs (Supplementary Table   S5) of the Second National Soil Survey (1979)(1980)(1981)(1982)(1983)(1984). This subset might be the most comprehensive legacy soil data [1][2][3], in which detailed pedological information and physicochemical properties were recorded. Soil data for the islands of the South China Sea, Macao, Hong Kong and Taiwan were not available.
We conducted the national field resampling campaign from 2009 to 2019 across different ecosystems and collected 23535 soil samples [4]. For the soil sampling design, the soil pedons in the 1980s were taken as control plots. The sampling locations were mainly guided by the sites and soil-forming information of the 1980s' points to detect the spatiotemporal soil variation. Therefore, the sampling sites in the 2010s were located adjacent to the sampling sites in the 1980s. A 1×2 m plot was laid out before digging, and soil profiles were dug to 1 m soil depth or to fresh bedrock (< 1 m). Soil samples (~1 kg) of each horizon were taken. Morphologic features of genetic horizons, as well as photos of soil profiles and corresponding landscapes were recorded in the field according to the Soil Taxonomy guidelines. The soil samples collected across different ecosystems differed in their edaphic properties.
We compiled soil inorganic carbon (SIC) data from publications from 2000 to 2010 via "China National Knowledge Infrastructure" (https://www.cnki.net/) and "Web of Science" (http://www.isiknowledge.com/). Various keywords related to this study were used, such as "China", "cropland", "grassland", "forest", "soil inorganic carbon", "carbonate", "deep soil", "arid soil" and "soil taxonomy". A total of 145 articles and dissertations were screened from 75 publications, and 774 average values were obtained from 3,612 sampling sites (Supplementary Table S6). If the soil sampling time was not given, the manuscript submission time was used. The median time was employed if the sampling campaign was conducted over multiple years. Soil data in graphs were digitalized with ScanIt 2.0.7 (AmsterCHEM, Netherlands). Since carbonate was not always found in noncalcareous soils, SIC data were rarely reported in publications.
Therefore, a linear function (R 2 =0.88) was used to convert SIC data measured by different methods [5].
After air-drying, soil samples were sieved through a 2-mm mesh. The carbonates include MgCO3, CaCO3 and other carbonate minerals, and CaCO3 is usually dominant [6]. CaCO3 concentrations of the 1980s and 2010s subsets were measured with acid dissolution (pressure calcimeter method) [2]. A coefficient of 0.12 was used to convert the CaCO3 concentration to SIC. Bulk density (BD) was analyzed with the core method. SIC density (SICD; kg C m -2 ) was computed by aggregating SIC concentration (g kg -1 ), BD (g cm -3 ), the volume of coarse fragment (Gr) (%), and thickness (Th, m) in each soil layer: The sum of SICD in the 0-30 cm soil was used to represent topsoil SICD at a site.
The BD data were unavailable for many samples, and several published linear and nonlinear pedotransfer functions were fitted. For soils with soil organic carbon (SOC) greater than 10 g kg -1 , the most accurate formula, with an R 2 of 0.45, was used: BD = 1/(0.665+0.00719×SOC). The study area was divided into six agroecological zones ( Supplementary Fig. S1) that are widely used in soil carbon pool studies in China [7,8].
The mean values of BD under different land use types were calculated for each zone and were used for soils with SOC less than 10 g kg -1 . There were few data on exchangeable calcium (Exch. Ca) in the 1980s and 2010s, most of which were from locations in Hubei, Henan, Sichuan and Guangdong provinces, in the south-central and southwestern zones.

Data analysis
The repeated sampling sites in the 2010s were not perfectly matched to those in the 1980s due to human disturbance (land use change) and the lack of accurate longitudinal and latitudinal information on legacy points. We established the paired contrasts from samples in the 1980s and 2010s, followed by the paired t-test to evaluate their difference.
Only samples in cropland, forestland and grassland were considered. First, samples in the 1980s were taken as control plots. Samples in which carbonate was not found were excluded from the analysis. We conducted a spatial analysis of the location in ArcGIS 10.2 (ESRI Inc., USA) to calculate the buffer areas of sites in the 1980s using a given distance (50 km). This step generated a polygon map with several circles whose centers were the points in the 1980s. Then, the points in the 2010s within these circles were taken as paired points if their pedological information (e.g., land use and parent material) was the same as that of the centered points in the 1980s. The maximum distance between paired points was less than 50 km, and the mean distance between all pairs was 22.6 km. This distance threshold was sufficient to perform the paired t-test analysis at the national scale, as all the paired sites were characterized with the same land management practices. We also calculated the spatial autocorrelation ranges (~300 km) [9], which were much greater than 22.6 km. Finally, a total of 2299 paired samples were grouped in the local area ( Fig. 1). For comparison, we also performed paired t-tests based on paired samples between which the maximal distance ranged from 10 km to 50 km ( Supplementary Fig. S2).
In addition, changes in SICD in the 1980s, 2000s and 2010s were also assessed by unpaired t-tests due to the different numbers and locations of soil data. The skewed data ( Fig. 1) were square-root transformed to a normal distribution prior to the t-test. Since the data on Exch. Ca were few, we used an unpaired t-test to examine whether there was a significant increase in Exch. Ca.
For the pathway analysis (Fig. 2), the information on erosion rates, leaching and accumulation rates, and NHx and NOy deposition from 1970 to 2010 was derived from [10], [11] and [12], respectively. According to the Third National Soil Erosion Survey [10], the rate of annual soil erosion was approximately equal to a decrease in soil depth of 0.3~1.0 cm. The overall changes in soil pH for cropland, forest and grassland were obtained from [13], [14] and [15], respectively. The national fertilizer consumption data between 1980 and 2010 were provided by the National Bureau of Statistics of China (http://www.stats.gov.cn/). The average wind erosion in China significantly declined from 1990 (12.32 t ha -1 yr -1 ) to 2015 (6.60 t ha -1 yr -1 ) [16], most of which occurred in arid and semiarid regions, half of which fell out beyond [17]. Therefore, we recalculated the average wind erosion rates according to the SIC values, soil wind erosion modulus [16] and the proportions of land use in different regions.

Estimation of soil inorganic carbon stock
Widely used digital soil mapping techniques were adopted to model the spatiotemporal pattern of SICD. It was assumed that soil variation can be modeled as a function of soilforming factors, such as climate and terrain variables [18]. Four algorithms were considered to establish predictive models: geographically weighted regression, random forest (RF), quantile regression forest and extreme gradient boosting. An ensemble prediction was adopted if several techniques achieve similar accuracy (e.g., the difference in R 2 values of two models was less than 5%). In different predictive cases, RF obviously outperformed other algorithms ( Supplementary Fig. S6) and was used to generate SICD maps (Supplementary Figs. S5 and S8). The RF algorithm was employed with the R package "randomForest" [19].
The significance of deep soil C should not be neglected. Approximately 47% of SOC is located in subsoils (30-100 cm) [20], and considerable SIC is stored deeper than 1 m [21,22]. The SIC stocks in the topmost meter in China are estimated to be 53.3-77.9 Pg C [2,3,23], which are slightly less than those of SOC, but the SIC stocks of soil profiles (up to 2-3 m) could be 234.2 Pg C, which are much more than the SOC pool (147.9 Pg C) [2]. Here, we also had interest in the SIC pool in subsoils (>30 cm). Since many soil pits were dug at soil depths less than 2 m, we fitted the depth functions for samples missing values down to 3 m depth. In addition to compiling SIC data from publications published during the 2000s, we synthesized SIC data for which the sampling depths were deeper than 2 m from publications published between 2011 and 2020 (Supplementary Table S7). Soil profiles with sampling depths deeper than 1 m were used to fit the depth functions of SIC mass density as follows: where SICMD is the SIC mass density (kg C m -3 ), dep is the depth of the soil sample (m), and a1, b1, a2, b2, a3, b3, a4, b4, and c4 are the parameters of the exponential function (fexp), power function (fpower), logarithmic function (floga) and second-order polynomial function (fpoly). The depth functions were separately fitted in each land use of China's zones divided by geography, climate and agroecology [2,3,7]. After a thousand rounds of fitting, the models fitted regarding geographic zones [2] were the most accurate. In general, topsoils are very sensitive to environmental changes, and therefore, many studies on SIC dynamics have been performed on topsoil [1,24]. The depth function may involve uncertainty ( Supplementary Fig. S7). Therefore, topsoil (0-30 cm) SICD maps were generated for the 1980s and 2010s, and SICD at soil depths of 0-1 m, 1-2 m and 2-3 m was interpolated for the 2010s. The total SIC stocks greatly depended on soil depth. Hence, the ensemble means of three published soil depth maps [25][26][27] were used to mask the produced soil maps. The soil mapping for each case was performed 100 times, in which resampling without replacement was adopted to select 90% of samples for prediction, and the rest was used for validation during each run.
The mean values and standard deviations (STDs) of predictions were used to assess the uncertainty in soil mapping. Regarding the large computation requirement, we developed a new package of "ParallelDSM" [28] for parallel computing of soil mapping.

Data-driven future projection
To quantify the long-term SIC changes, trained predictive models were used to perform projections over the 21st century by incorporating dynamic environmental conditions.
The predictive model in the 2010s could be extrapolated to future scenarios while holding the model's parameters constant and incorporating covariates between 2020 and 2100 [29,30]. According to the identified spatial controls (Supplementary Figs. S9 and S10), the cumulative effect of future anthropogenic activity (i.e., land use change), climate change and N deposition on SIC dynamics was evaluated.
The environmental variables included climatic variables, N deposition, land use and terrain attributes. Variables that were significantly correlated with SICD (P < 0.05) were selected by stepwise regression in both directions. Variables with a variance inflation factor greater than 5 were removed. The remote sensing variables were not considered because they were unavailable under future scenarios. The Shuttle Radar Topography Mission digital elevation model [31] at 90 m resolution was employed to generate terrain attributes using SAGA GIS (http://saga-gis.org/): elevation, plan curvature, profile curvature, slope, slope position, multiresolution index of valley bottom flatness, topographic wetness index, and terrain ruggedness index.
Three kinds of future dynamic variables that were derived mainly from Coupled Model Intercomparison Project Phase 6 (CMIP6) models were used, including climatic variables (mean annual precipitation (MAP) and mean annual temperature (MAT)), land use (cropland, forest and grassland) and N deposition variables (NHx and NOy).
SSP1-2.6 was a new version of representative concentration pathways (RCP) 2.6 in CMIP5. The climatic variables at 2.5-minute resolution for 2021-2100 were obtained from WorldClim 2.0 (https://worldclim.org) [33]. For each SSP, climatic variables were produced from 9 global climate models (GCMs). Only the MAP and MAT were used to avoid a multicollinearity problem. Global N deposition data were acquired from [34].
This dataset was produced specifically in support of CMIP6 at a resolution of 1.9°×2°.
The land use maps at a 1-km resolution for 2010-2100 were obtained from [35]. These land use maps were downscaled from future land use scenarios for 2 RCPs in CMIP5: RCP 2.6 and RCP 6.0, which approximately correspond to SSP1-2.6 and SSP3-7.0, respectively [32]. These maps might be the finest-scale and thus were adopted, as the future land use maps within CMIP6 models were characterized with coarser resolutions.
To produce soil maps at a finer resolution, all the environmental variables were resampled to 250 m. Due to the large computing cost of running the predictive model 100 times, the prediction uncertainty for each future scenario was not assessed. The minimal and maximal projections were presented to provide information on the variation in SIC stocks.

Supplementary Figures
Supplementary Figure S1.  models. The changes in total N deposition (NTot) are illustrated in insets, which were obtained from [34]. The time step for SIC stocks changes is modeled in the decades from 2010 to 2100, aiming to better quantify the cumulative effect of global change, including land use, climate change and N deposition, on SIC. Supplementary Table S1. Comparisons with related studies on the SIC stocks in China. Geo-statistics  Topsoil (10 cm) SIC stocks decreased significantly (26.8 g C m -2 yr -1 ) in grassland.

No. Sources
 Great SIC loss was generally found in areas with strong soil acidification.
18 [47] Qingyang City (NW) 28 1 m 2013 Isotopes method  After a 60-year vegetation restoration period, grasslands stored less SIC than that of forestland but generated more pedogenic carbonate. provinces, which were located in the south-central and southwestern zones (Fig. S1), were used. Due to limited soil samples (N<20), the analysis was not performed for grassland.