1 Overview
2 Introduction
3 Data setup
4 Exploratory data analyses
- 4.1 Compositional data transforms
- 4.2 Ionomic profiles of cereal crops
5 Spatial predictions
6 Main takeaways
7 Definitions and key literature by GPT 4.0

1 Overview

This notebook examines the ionomic (elemental) composition and spatial distribution of essential nutrient minerals in cereal grains. We employed machine learning methods to generate fully replicable, high-resolution ionomic maps of cereal crops sampled across Ethiopian and Malawian croplands.

The outlined workflow merges georeferenced laboratory measurements with various spatial datasets, such as satellite imagery, land cover, soil properties, climate, and topography measurements, to establish connections between these factors and the ionomic nutrient composition of cereal grains. The combination of diverse data sources with multiple machine-learning algorithms and stacked generalizations generate accurate predictions of cereal crop ionomes on national and food system scales. The resulting spatial predictions are useful for:

Identifying nutrient deficiencies: By understanding the elemental composition of crops in different regions and food systems, researchers, policymakers, private sector operators and community knowledge workers can identify areas with nutrient deficiencies hindering crop yields. Targeted interventions can be developed to address these deficiencies and improve crop productivity.
Enhancing food security: Mapping crop ionomes can contribute to efforts to enhance food security in Africa by providing data on the nutrient content of crops. This information can be used to develop strategies to improve the nutritional quality of food produced in the region.
Guiding agricultural practices: Spatial predictions of crop ionomes can help farmers make informed decisions about the best management practices for their crops, such as selecting appropriate fertilizers and adjusting application rates to optimize nutrient availability and uptake.
Environmental monitoring: Understanding the distribution of nutrient (and anti-nutrient) elements in crops can also help to monitor the potential environmental impacts of agricultural activities, such as the accumulation of toxic elements in soil and water resources.

2 Introduction

Models of trophic dynamics in spatial ecology generally assume no interactions between the essential nutrient elements and describe the associated transfer processes as a function of only the potentially most limiting parts (i.e., Liebig’s law). However, the “Law of the Minimum” does not hold consistently at landscape scales (e.g., Sinclair and Park, 1993), nor at different trophic levels. Ecological theory has not yet been able to accurately characterize and predict the underlying spatial nutrient transfer relationships, and their associated fluxes across food system boundaries. So, understanding the statistical nature of both survey and experimental results remain important undertakings for managing crop, animal, and human nutrition, as well as for the conservation of essentially non-renewable resources such as productive soils and potable water.

Fig. 1: The main trophic pathways in agricultural systems within a pixel or voxel. Note that the dashed arrow (from Food crops to Livestock) flags one potentially pathological trophic pathway that can occur in some food systems.

Ionomics is the study of the ionome that encompasses the quantitative and simultaneous measurement of the elemental composition of a particular organism or its tissues (see Salt, Baxter and Lahner, 2008). In agricultural systems, ionomes are associated with both crop yields as well as the overall nutritional quality of the plant products that are subsequently consumed by animals, including humans. “What this means is that for people consuming food sourced locally, as is the case for many smallholder farming communities, the location of residence will be a major, sometimes the largest, influencing factor in determining the dietary intake of micronutrients from cereals (Kumssa et al., 2022).”

$**Fig. 2**: Are supplements readily available in your neighborhood? ... in rural Malawi and Ethiopia they are generally not! Also note the big list of ingredients that go into the supplement, which can generally not be produced locally in Africa.$

Fig. 2: Are supplements readily available in your neighborhood? … in rural Malawi and Ethiopia they are generally not! Also note the big list of ingredients that go into the supplement, which can generally not be produced locally in Africa.

Compositional data profiles such as those shown in Fig. 2 present a special class of multivariate labeling problems for data mining, machine learning, prediction, monitoring and recommender systems because they contain multiple labels that may be interdependent. Another important characteristic of compositional data is that they are usually defined as vectors with strictly positive components whose sum is constant (e.g, 1, 100%, ppm). Both these characteristics are solvable from a machine learning perspective, but they have not been adequately tested in terms of their potential influence in describing and predicting the associated soil-crop trophic interactions and their potential nutritional outcomes for people (and livestock).

This notebook generates spatial predictions associated with mineral nutrient deficiency risks for humans using cereal grain data from Ethiopia and Malawi. The data we shall be using were generated by the GeoNutrition Project. Nationally representative cropland sampling frames and the laboratory methods that were used in the data collections are described in Gashu et al., (2021). An overall data descriptor article provides additional information and the raw data downloads at: Kumssa et al., (2022). The notebook itself is maintained on GitHub (here). Feel free to alter and use it as you see fit, but also please cite it when you do so. It is copyrighted as CC BY 4.0.

3 Data setup

To actually run this notebook, you will need to load the packages indicated in the chunk below.

# Package names
packages <- c("tidyverse", "rgdal", "sp", "raster", "leaflet", "htmlwidgets", "DT", "compositions",
              "doParallel", "caret", "caretEnsemble", "quantreg")

# Install packages
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages])
}

# Load packages
invisible(lapply(packages, library, character.only = TRUE))

The primary data used in this notebook include Inductively Coupled Plasma Mass Spectrometry (ICP-MS) mineral nutrient concentration measurements of cereal grain samples (in mg kg^-1), including: Na, Mg, P, K, Ca, V, Cr, Mn, Fe, Co, Cu, Zn, Se, Mo. They also include the georeference of where the samples were collected (lon/lat, EPSG:3857), the cereal crop that was sampled at that location, and the source of the grain samples (i.e., from the standing crop, a field stack or a closely located grain store). They also include the gridded spatial variables we shall be using for predictive modeling (see their short description and raw data links in the table below).

You can download all of the data needed to run this notebook from our GeoNutrition OSF repository using the following links to: MW_ET_grain_ICP.csv, ET_grids_250m.zip and MW_grids_250m.zip. Place the 3 files into your working directory. Note that the ET_grids_250.zip download is relatively large (> 1 Gb), and may take a bit of time to transfer, depending on your connection. The next chunks load and merge the GeoNutrition survey data with the spatial (gridded) features.

icpdat <- read.table("MW_ET_grain_ICP.csv", header=T, sep=",") ## ICP-MS data for ET & MW
et_icp <- icpdat[which(icpdat$survey == 'ET'), ]

# Unzip Ethiopia grid files into a sub-directory called "ET_grids"
dir.create("ET_grids", showWarnings=F)
unzip("ET_grids_250m.zip", exdir = "ET_grids", overwrite = T)

# Load rasters
et_lis <- list.files(path="./ET_grids", pattern="tif", full.names = T)
et_gri <- stack(et_lis)

# Ethiopia survey projection
et_pro <- as.data.frame(project(cbind(et_icp$lon, et_icp$lat), "+proj=laea +ellps=WGS84 +lon_0=20 +lat_0=5
                                +units=m +no_defs"))

colnames(et_pro) <- c("x","y")
et_icp <- cbind(et_icp, et_pro)
coordinates(et_icp) <- ~x+y
projection(et_icp) <- projection(et_gri)

# Extract gridded variables at Ethiopia survey locations
et_grid <- extract(et_gri, et_icp)
et_icp <- as.data.frame(cbind(et_icp, et_grid))

Apply the same procedure to the Malawi GeoNutrition data.

mw_icp <- icpdat[which(icpdat$survey == 'MW'), ]

# Unzip Malawi grid files into a sub-directory called "MW_grids"
dir.create("MW_grids", showWarnings=F)
unzip("MW_grids_250m.zip", exdir = "MW_grids", overwrite = T)

# Load rasters
mw_lis <- list.files(path="./MW_grids", pattern="tif", full.names = T)
mw_gri <- stack(mw_lis, quick = TRUE)

# Malawi survey projection
mw_pro <- as.data.frame(project(cbind(mw_icp$lon, mw_icp$lat), "+proj=laea +ellps=WGS84 +lon_0=20 +lat_0=5
                                +units=m +no_defs"))

colnames(mw_pro) <- c("x","y")
mw_icp <- cbind(mw_icp, mw_pro)
coordinates(mw_icp) <- ~x+y
projection(mw_icp) <- projection(mw_gri)

# Extract gridded variables at Malawi survey locations
mw_grid <- extract(mw_gri, mw_icp)
mw_icp <- as.data.frame(cbind(mw_icp, mw_grid))

This chunk then merges and writes out a clean, combined data frame for Ethiopia and Malawi that can be reused should you wish to process it in software other than R. You can also download the file directly from our OSF site here.

# Merge the Ethiopia and Malawi data
icpdat <- rbind(et_icp, mw_icp)

# Write-out main data frame
dir.create("Results", showWarnings=F)
write.csv(icpdat, "./Results/ET_MW_icpdat.csv", row.names = F)
glimpse(icpdat)

## Rows: 3,163
## Columns: 47
## $ survey  <chr> "ET", "ET", "ET", "ET", "ET", "ET", "ET", "ET", "ET", "ET", "E…
## $ sid     <chr> "ETH0077", "ETH0078", "ETH0079", "ETH0296", "ETH0080", "ETH029…
## $ icp_run <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1,…
## $ lat     <dbl> 11.86226, 11.86383, 11.81106, 11.86824, 11.71414, 11.66689, 11…
## $ lon     <dbl> 39.28545, 39.21890, 39.09897, 39.28904, 38.95456, 38.86983, 39…
## $ sdate   <chr> "04/11/17", "04/11/17", "04/11/17", "04/11/17", "05/11/17", "0…
## $ crop    <chr> "barl", "barl", "barl", "barl", "whea", "whea", "barl", "barl"…
## $ source  <chr> "crop", "stack", "stack", "crop", "crop", "crop", "stack", "st…
## $ ssamp   <chr> "close", "main", "main", "close", "main", "close", "main", "ma…
## $ Ca      <dbl> 570.08284, 541.31048, 467.03425, 490.94460, 468.21753, 316.983…
## $ Co      <dbl> 0.028669911, 0.025005454, 0.035791448, 0.031038052, 0.02523686…
## $ Cr      <dbl> 0.01733333, 0.01733333, 0.01733333, 0.01733333, 0.01733333, 0.…
## $ Cu      <dbl> 4.223454, 3.988301, 3.389143, 2.870418, 3.815869, 2.630814, 2.…
## $ Fe      <dbl> 85.80682, 80.01296, 94.08756, 77.68504, 41.99542, 34.44408, 57…
## $ K       <dbl> 6990.930, 5756.031, 5725.569, 5681.951, 2930.936, 3311.315, 57…
## $ Mg      <dbl> 1375.2316, 1172.4774, 1088.2649, 1189.0942, 876.4320, 908.9308…
## $ Mn      <dbl> 12.732099, 11.646607, 11.249660, 10.378853, 34.554311, 38.0722…
## $ Mo      <dbl> 5.367878909, 3.276685885, 3.620112203, 5.288268721, 2.54115444…
## $ Na      <dbl> 129.894114, 137.342082, 101.605533, 138.704873, 17.952582, 12.…
## $ P       <dbl> 3325.314, 3649.858, 2895.515, 2755.102, 1434.128, 1741.645, 34…
## $ Se      <dbl> 0.000666667, 0.000666667, 0.000666667, 0.000666667, 0.00066666…
## $ V       <dbl> 0.172698288, 0.177443069, 0.217143669, 0.160310287, 0.07822827…
## $ Zn      <dbl> 26.08365, 21.77466, 15.12363, 16.15510, 12.36118, 13.37781, 16…
## $ albe    <dbl> 153, 152, 149, 149, 124, 151, 145, 133, 115, 115, 119, 124, 12…
## $ bio1    <dbl> 108, 111, 118, 107, 130, 135, 120, 126, 148, 148, 147, 180, 16…
## $ bio12   <dbl> 969, 1007, 1068, 977, 1201, 1285, 1108, 1132, 1284, 1491, 1447…
## $ bio15   <dbl> 123, 131, 115, 123, 134, 129, 127, 130, 92, 120, 117, 79, 124,…
## $ bio7    <dbl> 167, 168, 168, 167, 169, 169, 168, 168, 167, 168, 168, 170, 16…
## $ bp      <dbl> 0.7018412, 0.2944334, 0.5880405, 0.6142320, 0.4378009, 0.35177…
## $ cec     <dbl> 39.00, 40.25, 48.75, 40.75, 47.75, 42.50, 46.25, 44.25, 55.25,…
## $ cp      <dbl> 0.9441356, 0.9677382, 0.9692430, 0.9602199, 0.9578317, 0.97414…
## $ dows    <dbl> 12.2570591, 9.0583858, 16.6818695, 12.2084780, 18.7506886, 20.…
## $ lstd    <dbl> 30.69744, 31.36294, 30.69071, 30.79954, 31.78205, 31.93278, 31…
## $ lstn    <dbl> 1.461406, 3.228888, 2.350411, 1.491514, 6.096198, 6.604501, 2.…
## $ mdem    <dbl> 3415, 3325, 3251, 3436, 3028, 2911, 3214, 3110, 2743, 2538, 26…
## $ mlat    <dbl> 11.86177, 11.86437, 11.81084, 11.86837, 11.71354, 11.66752, 11…
## $ mlon    <dbl> 39.28527, 39.21801, 39.09854, 39.28800, 38.95331, 38.86930, 39…
## $ para    <dbl> 31.74879, 26.29791, 25.68277, 29.74396, 18.51530, 24.75845, 34…
## $ parv    <dbl> 190.7053, 173.5824, 182.5577, 224.5906, 111.2490, 129.7058, 20…
## $ ph      <dbl> 6.200, 6.250, 6.775, 6.050, 7.125, 7.000, 6.825, 6.550, 7.025,…
## $ slope   <dbl> 1.7088853, 5.7218022, 0.6022949, 2.8924010, 1.5036777, 1.58801…
## $ snd     <dbl> 40.75, 35.75, 27.50, 36.25, 28.50, 26.00, 31.25, 25.00, 21.25,…
## $ soc     <dbl> 42.25, 29.25, 23.00, 27.75, 22.50, 24.00, 21.50, 23.00, 20.75,…
## $ twi     <dbl> 8.238656, 13.537657, 8.084811, 9.573053, 7.702112, 9.655262, 8…
## $ wp      <dbl> 0.01817295, 0.01554691, 0.01551769, 0.01695270, 0.01592344, 0.…
## $ x       <dbl> 2094549, 2087383, 2074832, 2094896, 2059921, 2051103, 2069622,…
## $ y       <dbl> 799943.1, 799830.4, 793415.2, 800627.2, 781964.0, 776324.5, 79…

This next chunk shows where the grain samples were collected. You can zoom into the map to display the distribution of the individual survey locations. Clicking on individual markers will indicate which crop species were collected at that location. The red points are maize (the most frequent and designated reference crop in these surveys); blue dots indicate other crops.

col <- ifelse(icpdat$crop == "maiz", "red", "blue")

# Plot sample locations
w <- leaflet() %>%
  setView(lng = mean(icpdat$lon), lat = mean(icpdat$lat), zoom = 4) %>%
  addProviderTiles(providers$OpenStreetMap.Mapnik) %>%
  addCircleMarkers(icpdat$lon, icpdat$lat,
                   popup = icpdat$crop,
                   color = col,
                   stroke = FALSE,
                   fillOpacity = 0.8,
                   radius = 6,
                   clusterOptions = markerClusterOptions())

saveWidget(w, 'MW_ET_ICP_sample_locs.html', selfcontained = T) ## save widget
w ## plot widget

4 Exploratory data analyses

4.1 Compositional data transforms

Compositional Data Analysis (Aitchison, 1986) refers to the analyses of compositional data (CoDa). CoDa are defined as vectors with strictly positive components whose sum is constant (e.g, 1 or 100%). The terms also cover all other parts of a whole (so-called subcompositions), which carry only relative information (van den Boogaart and Tolosana-Delgado, 2013). Because a change in any proportion of a whole changes at least one other proportion, CoDa are interdependent and should be log ratio transformed (see e.g., Parent et al., 2013).

The next chunk closes the CoDa and calculates centered log ratios (clr). These can be analyzed, summarized, and interpreted with conventional univariate, multivariate, data and machine learning modeling methods (Greenacre, 2021). Note that you could also use other log ratios such as the alr, or the ilr, but we shall leave those possibilities for you to explore.

# Calculate the clr
varc <- c("Na", "Mg", "P", "K", "Ca", "V", "Cr", "Mn", "Fe", "Co", "Cu", "Zn", "Se", "Mo")
comps <- icpdat[varc]
comps <- comps / rowSums(comps) ## close the composition
comps <- as.data.frame(clr(comps)) ## centered log ratio (clr) transform

# Attach survey data info
locv <- c("survey", "x", "y", "crop", "source", "albe", "bio1", "bio12", "bio15", "bio7", "bp", 
          "cec", "cp", "dows", "lstd", "lstn", "mdem", "mlat", "mlon", "para", "parv", "ph",
          "slope", "snd", "soc", "twi", "wp") 
locs <- icpdat[locv]
coda <- cbind(locs, comps)
coda <- na.omit(coda) ## includes only complete cases

# Scrub extraneous objects in memory
rm(list=setdiff(ls(), c("icpdat", "coda", "et_gri", "mw_gri")))

Fig. 3: Boxplots of ionomic differences between cereal grains from Ethiopia and Malawi (on a centered log ratio scale).

4.2 Ionomic profiles of cereal crops

We run a compositional main effects regression (van den Boogaart and Tolosana-Delgado, 2013) using maize as the control crop. This quantifies the average ionomic differences between crop species (and their variability), which are shown in Fig. 3 and that can be used as a baseline for the spatial models that are presented in the subsequent sections. Section 4.5 augments this initially simple model with spatial predictions.

icpdat$crop <- factor(icpdat$crop, levels=c('maiz','sorg','pmil','fmil','teff','rice','whea','barl','trit'))
y = acomp(icpdat[,10:23])
covars <- icpdat[,c("survey", "crop")]
survey <- covars$survey
crop <- covars$crop

# Ionomic differences between surveys and crop species
survey.lm <- lm(clr(y) ~ survey + crop)
survey.lm

## 
## Call:
## lm(formula = clr(y) ~ survey + crop)
## 
## Coefficients:
##              Ca         Co         Cr         Cu         Fe         K        
## (Intercept)   2.867820  -5.993741  -3.800809  -0.904741   1.661129   6.858373
## surveyMW     -0.250721   0.009139  -0.111807   0.095386  -0.782710   0.012955
## cropsorg      0.385953  -0.249903  -0.254472  -0.213382   0.441947  -0.474549
## croppmil      0.290662   1.388626  -1.406933  -0.141376   1.015741  -0.725633
## cropfmil      2.805897   1.537981   0.539302  -0.113399   0.175297  -1.060162
## cropteff      1.613555   0.994415  -0.179782  -0.072903   0.456102  -1.272507
## croprice     -0.023757   0.963942   0.574384  -0.286107  -0.131289  -0.949804
## cropwhea      1.218947  -0.054914  -0.743851   0.307839   0.118265  -0.391986
## cropbarl      1.041296  -0.172688   0.371640   0.362296   0.377183  -0.394719
## croptrit      1.150063   0.379483  -0.922923   0.623262   0.303206  -0.138020
##              Mg         Mn         Mo         Na         P          Se       
## (Intercept)   5.589228   0.273935  -3.208795  -1.242322   6.511895  -6.068581
## surveyMW      0.074284   0.020851   0.104644  -0.330769   0.018572   0.069508
## cropsorg     -0.225981   0.326927   0.595187  -0.431896  -0.445968   1.094237
## croppmil     -0.710540  -0.047461   0.660002  -0.611542  -0.822899   1.918352
## cropfmil     -0.768928   2.286442  -1.280192  -0.133939  -1.288824  -1.245186
## cropteff     -0.842513   1.328401  -0.811611   0.469935  -1.072517   0.467493
## croprice     -0.513727   0.744090   0.517804  -0.315510  -0.592588   0.642667
## cropwhea     -0.464242   1.388205  -0.672200   0.326082  -0.421077   0.056739
## cropbarl     -0.564796   0.325906  -0.658686   1.359082  -0.476126  -1.337267
## croptrit     -0.418029   1.461776  -1.608849   1.471482  -0.296011  -0.800891
##              V          Zn       
## (Intercept)  -4.131234   1.587845
## surveyMW      1.006245   0.064423
## cropsorg      0.062995  -0.611094
## croppmil     -0.386692  -0.420306
## cropfmil     -0.174739  -1.279551
## cropteff     -0.002637  -1.075431
## croprice     -0.098950  -0.531154
## cropwhea     -0.355467  -0.312339
## cropbarl      0.149539  -0.382659
## croptrit     -0.898160  -0.306388

5 Spatial predictions

“The Rashomon effect is a storytelling and writing method in cinema in which an event is given contradictory interpretations or descriptions by the individuals involved, thereby providing different perspectives and points of view of the same incident. The term, derived from the 1950 film Rashomon (directed by Akira Kurosawa), is used to describe the phenomenon of the unreliability of eyewitnesses.” – also see: Anderson (2016).

We will be using a stacking workflow to generate spatial predictions. Stacking is an ensemble machine learning technique that combines multiple base models to achieve better overall performance in predictive modeling tasks. The primary aim of stacking is to improve the generalization capability of individual models by leveraging the strengths of diverse models to reduce prediction errors on unseen/test data (Wolpert, 1992).

The core idea of stacking is to train multiple base models on the same training dataset and then to use their predictions as input features for a second-level model, called the stacked or meta-model. The stacked model is then trained to make the final prediction by combining the outputs of the base models (Breiman, 1996). This two-level prediction process can also be extended to additional levels where needed. The basic stacking process is as follows:

Data splitting: The original training dataset is split into a training set and a validation set. This split can be done using techniques like k-fold cross-validation.
Base model training: Multiple base models are trained on the training set, which could include different algorithms, different configurations of the same algorithm, or feature data.
Base model predictions: The trained base models make predictions on the validation set, generating a new set of features for the stacked model.
Stacked model training: The ensemble model is trained on the new set of features generated by the base model predictions, learning to combine them optimally.
Final predictions: The ensemble makes its final predictions by feeding the test data through the base models and then through the stacked model.

Stacking is especially effective when the base models have complementary strengths and weaknesses. By combining diverse models, the ensemble can exploit their individual strengths and mitigate their weaknesses, minimizing overfitting (or underfitting), resulting in improved overall performance on unseen data. This is very similar to guarding against bias and combining the predictions of a roomful of witnesses all of whom have very different recollections, opinions, and mental models about the same incident … hence, the “Rashomon Effect” analogy above.

5.1 Training / testing setup

We initially set a 90/10% calibration and validation partition. The function createDataPartition in the caret package can be used to generate balanced splits of the data. If the argument to this function is a factor, random sampling occurs within each class and will preserve the overall class balance of the data. We partition on the survey variable here.

# Set calibration/validation set randomization seed
seed <- 12358
set.seed(seed)
coda <- coda[sample(1:nrow(coda)), ] # randomize observations

# Split data into calibration and validation sets by survey
gnIndex <- createDataPartition(coda$survey, p = 9/10, list = F, times = 1)
gn_cal <- coda[ gnIndex,]
gn_val <- coda[-gnIndex,]

# Calibration labels
labs <- c("Ca") ## insert other labels here 
lcal <- as.vector(t(gn_cal[labs]))

# Raster calibration features
fcal <- gn_cal[ ,c(6:27)]

5.2 Spatial base learner training and prediction

This chunk fits 2 base learner models (one bagging, one boosting), which use gridded training data with 10-fold cross-validation. You can learn more about how these algorithms work by looking up the links at: randomForest and gbm. Change these as you see fit. There are about 160 different algorithms available via the caret package.

You can also use caretEnsemble package instead of caret as long as the feature variables (et_gri and mw_gri in this case), and the trainControl methods are the same for each model in the caretList function. This shortens the script-length of this notebook but does not otherwise affect the overall caret functionality. Note however that the calculations take a bit of time to run on a normal 8-core, 16 Gb memory computer. We fit these models with default-tuning of the relevant hyperparameters.

# Start doParallel to parallelize model fitting
set.seed(seed)
mc <- makeCluster(detectCores())
registerDoParallel(mc)

# Specify model training controls
tc <- trainControl(method="cv", number=10, allowParallel = TRUE, savePredictions="final")

# Fit 2 base-learners using the specified ionomic labels
blist <- caretList(fcal, lcal,
                   trControl = tc,
                   tuneList = NULL,
                   methodList = c("rf", "gbm"),
                   preProcess = c("center","scale"),
                   metric = "RMSE")

stopCluster(mc)
fname <- paste("./Results/", labs, "_blist.rds", sep = "")
saveRDS(blist, fname)

# Generate spatial base-learner predictions for Ethiopia
et_rf_pred <- predict(et_gri, blist$rf)
et_gb_pred <- predict(et_gri, blist$gbm)
et_spreds <- stack(et_rf_pred, et_gb_pred)
names(et_spreds) <- c("rf","gb")
fname <- paste("./Results/", labs, "_ET_base.tif", sep = "")
writeRaster(et_spreds, filename=fname, datatype="FLT4S", options="INTERLEAVE=BAND", overwrite=T)

# Generate spatial base-learner predictions for Malawi
mw_rf_pred <- predict(mw_gri, blist$rf)
mw_gb_pred <- predict(mw_gri, blist$gbm)
mw_spreds <- stack(mw_rf_pred, mw_gb_pred)
names(mw_spreds) <- c("rf","gb")
fname <- paste("./Results/", labs, "_MW_base.tif", sep = "")
writeRaster(mw_spreds, filename=fname, datatype="FLT4S", options="INTERLEAVE=BAND", overwrite=T)

5.3 Stacked learner training and prediction

This next chunk then fits and predicts linear ensemble regressions based on the initial two base-learner models, using the training data.

# Extract base-learner predictions at the Ethiopia training locations
et_cal <- gn_cal[which(gn_cal$survey == 'ET'), ]
coordinates(et_cal) <- ~x+y
projection(et_cal) <- projection(et_spreds)
et_base <- extract(et_spreds, et_cal)
et_cal <- as.data.frame(cbind(et_cal, et_base))

# Extract base-learner predictions at the Malawi training locations
mw_cal <- gn_cal[which(gn_cal$survey == 'MW'), ]
coordinates(mw_cal) <- ~x+y
projection(mw_cal) <- projection(mw_spreds)
mw_base <- extract(mw_spreds, mw_cal)
mw_cal <- as.data.frame(cbind(mw_cal, mw_base))

# Row bind the Ethiopia and Malawi data frames
gn_cal <- rbind(et_cal, mw_cal)

# Raster calibration features (the base-learners)
lcal <- as.vector(t(gn_cal[labs]))
fcal <- gn_cal[ ,c(40:41)]

# Start doParallel
mc <- makeCluster(detectCores())
registerDoParallel(mc)

# Control setup
set.seed(1385321)
tc <- trainControl(method = "cv", number = 10, allowParallel = TRUE, savePredictions="final")

# Model training
en <- train(fcal, lcal,
            method = "lm",
            metric = "RMSE",
            trControl = tc)

# Model predictions
et_en_pred <- predict(et_spreds, en) ## ET ensemble spatial predictions
mw_en_pred <- predict(mw_spreds, en) ## MW ensemble spatial predictions
stopCluster(mc)
fname <- paste("./Results/", labs, "_en.rds", sep = "")
saveRDS(en, fname)

# Write out the ensemble-learner prediction grids
et_fname <- paste("./Results/", labs, "_ET_ens.tif", sep = "")
writeRaster(et_en_pred, filename=et_fname, datatype="FLT4S", overwrite=T)
mw_fname <- paste("./Results/", labs, "_MW_ens.tif", sep = "")
writeRaster(mw_en_pred, filename=mw_fname, datatype="FLT4S", overwrite=T)
unlink("./Results/*.xml")

To save time we have pre-trained all 28 base and 14 ensemble learners we’ll be using for the model testing and prediction rule generation steps below. This is a bit of hack! However, you can download and check all of the associated .rds model objects and .gtif files from our OSF repository here. Figures 4 & 5 below, show the stacked-learner predictions, after applying a few GIS cosmetics in GRASS. You can download, examine, and reuse the pre-trained model objects and predictions at https://osf.io/eyda2. Unzip and place those into your ./Results sub-directory.

Fig. 4: Compositional mineral nutrient predictions for Malawi (on a centered log ratio scale). Note that the numbers run from red (low), to yellow, to green (high) and have been histogram equalized to emphasize potentially anomalous spatial patterns.

Fig. 5: Compositional mineral nutrient predictions for Ethiopia (on a centered log ratio scale). The color scheme is as in Fig. 4.

5.4 Prediction uncertainties

The models that have been developed have not seen any of the test-set (evaluation) data up to this point. While there are many ways to quantify the uncertainty inherent in the ensemble predictions, we take a simple but quite robust approach here using quantile regression with (quantreg). We are mainly interested in the initial spread of the ROI-wide predictions (sensu, their 90% probable intervals). Once new data are collected, we shall be amending these and will also point out some additional (more spatially explicit) approaches.

# Stack Ethiopia prediction grids
et_lis <- list.files(path="./Results", pattern="ET_ens.tif", full.names = T)
et_sgrd <- stack(et_lis)
names(et_sgrd) <- c("sCa","sCo","sCr","sCu","sFe","sK","sMg","sMn","sMo","sNa","sP",
                   "sSe","sV","sZn")

# Extract ensemble predictions at the Ethiopia testing locations
et_val <- gn_val[which(gn_val$survey == 'ET'), ]
coordinates(et_val) <- ~x+y
projection(et_val) <- projection(et_sgrd)
et_ens <- extract(et_sgrd, et_val)
et_val <- as.data.frame(cbind(et_val, et_ens))

# Stack Malawi prediction grids
mw_lis <- list.files(path="./Results", pattern="MW_ens.tif", full.names = T)
mw_sgrd <- stack(mw_lis)
names(mw_sgrd) <- c("sCa","sCo","sCr","sCu","sFe","sK","sMg","sMn","sMo","sNa","sP",
                   "sSe","sV","sZn")

# Extract ensemble predictions at the Malawi testing locations
mw_val <- gn_val[which(gn_val$survey == 'MW'), ]
coordinates(mw_val) <- ~x+y
projection(mw_val) <- projection(mw_sgrd)
mw_ens <- extract(mw_sgrd, mw_val)
mw_val <- as.data.frame(cbind(mw_val, mw_ens))

# Row bind the Ethiopia and Malawi test data frames
gn_val <- rbind(et_val, mw_val)

# Quantile regression test-set fits
qNa <- rq(Na~sNa, tau=c(0.05,0.5,0.95), data = gn_val)
qMg <- rq(Mg~sMg, tau=c(0.05,0.5,0.95), data = gn_val)
qP <- rq(P~sP, tau=c(0.05,0.5,0.95), data = gn_val)
qK <- rq(K~sK, tau=c(0.05,0.5,0.95), data = gn_val)
qCa <- rq(Ca~sCa, tau=c(0.05,0.5,0.95), data = gn_val)
qV <- rq(V~sV, tau=c(0.05,0.5,0.95), data = gn_val)
qCr <- rq(Cr~sCr, tau=c(0.05,0.5,0.95), data = gn_val)
qMn <- rq(Mn~sMn, tau=c(0.05,0.5,0.95), data = gn_val)
qFe <- rq(Fe~sFe, tau=c(0.05,0.5,0.95), data = gn_val)
qCo <- rq(Co~sCo, tau=c(0.05,0.5,0.95), data = gn_val)
qCu <- rq(Cu~sCu, tau=c(0.05,0.5,0.95), data = gn_val)
qZn <- rq(Zn~sZn, tau=c(0.05,0.5,0.95), data = gn_val)
qSe <- rq(Se~sSe, tau=c(0.05,0.5,0.95), data = gn_val)
qMo <- rq(Mo~sMo, tau=c(0.05,0.5,0.95), data = gn_val)

Fig. 6: Quantile regression plots of predicted vs measured test-set values (on a centered log ratio scale). The red lines are estimated at the median, the blue lines are estimated at the 5% and 95% quantiles.

5.5 Ionomic prediction rules

These final chunks are intended to generate a foundation (at least from the computing side) for the development of a recommender system that would allow users to make evidence/content based decisions about crop selections, biofortification and/or mineral nutrient supplementation practices in different environments. This is a spatially augmented version of the model that was presented in section 3.2. Also considered now are the dependencies between the individual ionomic components.

# Extract ensemble predictions at the Ethiopia survey locations
et_icp <- icpdat[which(icpdat$survey == 'ET'), ]
coordinates(et_icp) <- ~x+y
projection(et_icp) <- projection(et_sgrd)
et_ens <- extract(et_sgrd, et_icp)
et_icp <- as.data.frame(cbind(et_icp, et_ens))

# Extract ensemble predictions at the Malawi survey locations
mw_icp <- icpdat[which(icpdat$survey == 'MW'), ]
coordinates(mw_icp) <- ~x+y
projection(mw_icp) <- projection(mw_sgrd)
mw_ens <- extract(mw_sgrd, mw_icp)
mw_icp <- as.data.frame(cbind(mw_icp, mw_ens))

# Row bind the Ethiopia and Malawi icp data frames
icpdat <- rbind(et_icp, mw_icp)
write.csv(icpdat, "./Results/ET_MW_icpdat.csv", row.names = F)

icpdat$crop <- factor(icpdat$crop, levels=c('maiz','sorg','pmil','fmil','teff','rice','whea','barl','trit'))
icpdat <- na.omit(icpdat) ## includes only complete cases
y = acomp(icpdat[,10:23])

# Select covariates
covars <- icpdat[,c("survey", "crop", "sNa", "sMg", "sP", "sK", "sCa", "sV", "sCr", "sMn", "sFe",
                    "sCo", "sCu", "sZn", "sSe", "sMo")]
survey <- covars$survey
crop <- covars$crop
sNa <- covars$sNa
sMg <- covars$sMg
sP <- covars$sP
sK <- covars$sK
sCa <- covars$sCa
sV <- covars$sV
sCr <- covars$sCr
sMn <- covars$sMn
sFe <- covars$sFe
sCo <- covars$sCo
sCu <- covars$sCu
sZn <- covars$sZn
sSe <- covars$sSe
sMo <- covars$sMo

# Fit compositional regression
rules.lm <- lm(clr(y) ~ survey + crop + sNa + sMg + sP + sK + sCa + sV + sCr + sMn + sFe + sCo
                        + sCu + sZn + sSe + sMo)
rules.lm

## 
## Call:
## lm(formula = clr(y) ~ survey + crop + sNa + sMg + sP + sK + sCa + 
##     sV + sCr + sMn + sFe + sCo + sCu + sZn + sSe + sMo)
## 
## Coefficients:
##              Ca          Co          Cr          Cu          Fe        
## (Intercept)  -0.8097129  -0.2403896  -0.4228927   0.1522619   0.0605697
## surveyMW     -0.0114673  -0.0183197   0.0085906   0.0027045  -0.0296458
## cropsorg      0.2455106   0.0278857   0.1013634  -0.0332349   0.0955683
## croppmil      0.2180395   0.1407845   0.0577203  -0.0119189   0.0996921
## cropfmil      1.4247442   0.3389458   0.1731577  -0.0822123   0.1062598
## cropteff      0.7910583   0.2951233   0.2101580  -0.0633064   0.1696278
## croprice      0.2517039   0.0891165   0.1221628  -0.0451928   0.0358361
## cropwhea      0.4723967   0.1627923   0.1680034  -0.0311299   0.0661462
## cropbarl      0.3715431   0.0660817   0.2061388  -0.0552415   0.0134064
## croptrit      0.4385948   0.2365370   0.0904757   0.0677227   0.0500758
## sNa          -0.0371093  -0.0489598  -0.1412270  -0.0182880  -0.0641815
## sMg          -0.0520173  -0.1054818  -0.0269826  -0.0164426  -0.1521321
## sP           -0.0555744  -0.0752736  -0.2803216  -0.0137499  -0.0523554
## sK            0.1655473   0.0852523  -0.0544128  -0.0464507   0.0131098
## sCa           0.7535204  -0.1170983  -0.2212442  -0.0071347  -0.1156185
## sV           -0.0510624  -0.0238315  -0.1132952  -0.0231553  -0.0485970
## sCr          -0.0324801  -0.0525894   0.8278860  -0.0208342  -0.0703662
## sMn          -0.1034826  -0.0528661  -0.1535974  -0.0310895  -0.0594948
## sFe          -0.0357364  -0.0574911  -0.1443202  -0.0217405   0.9184754
## sCo          -0.0400805   0.9562756  -0.1494185  -0.0231742  -0.0517543
## sCu          -0.0432852  -0.1149480  -0.2610733   1.0102811  -0.0958008
## sZn           0.0524011  -0.0049446  -0.1685245  -0.0488449  -0.0415650
## sSe          -0.0366844  -0.0522191  -0.1541457  -0.0196910  -0.0704192
## sMo          -0.0388370  -0.0419150  -0.1688457  -0.0153223  -0.0647572
##              K           Mg          Mn          Mo          Na        
## (Intercept)   0.7675142   0.4665913  -0.6025815  -0.0096210   0.0577397
## surveyMW     -0.0124404  -0.0004094   0.0547214   0.0538346   0.0034738
## cropsorg     -0.1854290  -0.0954630   0.1725629  -0.1059281   0.0049602
## croppmil     -0.1335657  -0.1376194   0.1435461  -0.1926099   0.0730115
## cropfmil     -0.6634768  -0.5045956   1.0248911  -0.6894862   0.3863795
## cropteff     -0.6061299  -0.3903427   0.5467260  -0.4293237   0.1359727
## croprice     -0.2651192  -0.1477737   0.2126557  -0.0761231   0.0471181
## cropwhea     -0.3032382  -0.2760876   0.3960894  -0.2269801   0.0699022
## cropbarl     -0.2367721  -0.2346181   0.1801942  -0.0826164   0.1371026
## croptrit     -0.1784433  -0.2315447   0.3882819  -0.2724452   0.2600153
## sNa          -0.0403989  -0.0368171  -0.0469544  -0.1427372   0.8631255
## sMg           0.0522375   0.8574368  -0.0103724  -0.2053037  -0.0873806
## sP            0.0256199   0.0336223  -0.0391449  -0.0427652  -0.1001164
## sK            0.6519407  -0.1257078   0.0312865  -0.2170750  -0.2005785
## sCa           0.0756725   0.0521480  -0.1203245  -0.0308132  -0.1898527
## sV           -0.0614129  -0.0544245  -0.0477144  -0.1618030  -0.1293387
## sCr          -0.0459002  -0.0353683  -0.0348341  -0.1317991  -0.1361581
## sMn          -0.0127553  -0.0004591   0.8561576  -0.0702361  -0.1627400
## sFe          -0.0470581  -0.0390096  -0.0364884  -0.1359375  -0.1360797
## sCo          -0.0535521  -0.0441500  -0.0375221  -0.1452258  -0.1444412
## sCu          -0.0122458  -0.0081296  -0.0346627  -0.1204234  -0.1179189
## sZn          -0.0927927  -0.0537481   0.0408378  -0.2093173  -0.1342996
## sSe          -0.0473062  -0.0386883  -0.0390171  -0.1322788  -0.1368610
## sMo          -0.0402727  -0.0335929  -0.0416637   0.8595164  -0.1384309
##              P           Se          V           Zn        
## (Intercept)   0.4843776  -0.5502767   0.3605072   0.2859128
## surveyMW     -0.0063556  -0.0440631   0.0033148  -0.0039384
## cropsorg     -0.1689514   0.1153775   0.0286026  -0.2028249
## croppmil     -0.1378572   0.0406930  -0.0865635  -0.0733524
## cropfmil     -0.7488344  -0.0541172   0.0440597  -0.7557154
## cropteff     -0.5076221   0.2602347   0.1120822  -0.5242584
## croprice     -0.1654165   0.0924034   0.0165379  -0.1679091
## cropwhea     -0.2931278   0.1100118  -0.0529311  -0.2618473
## cropbarl     -0.2184501   0.1102285  -0.0273508  -0.2296461
## croptrit     -0.2279518  -0.1495011  -0.2431852  -0.2286319
## sNa          -0.0446258  -0.1835283  -0.0164849  -0.0418132
## sMg           0.0890469  -0.1894542  -0.2187989   0.0656451
## sP            0.8310791  -0.2245585   0.0069145  -0.0133760
## sK           -0.1516348   0.0040139   0.0549725  -0.2102635
## sCa           0.0671859  -0.1717190  -0.0482001   0.0734785
## sV           -0.0660023  -0.1546085   0.9870323  -0.0517866
## sCr          -0.0442412  -0.1605306  -0.0251121  -0.0376724
## sMn          -0.0060648  -0.1662600  -0.0265273  -0.0105848
## sFe          -0.0473955  -0.1741277  -0.0025784  -0.0405123
## sCo          -0.0524496  -0.1522539  -0.0126276  -0.0496258
## sCu          -0.0307018  -0.1531166  -0.0264607   0.0084855
## sZn          -0.0895469  -0.1284841   0.0491272   0.8297015
## sSe          -0.0444327   0.8337229  -0.0228714  -0.0391081
## sMo          -0.0401992  -0.1646829  -0.0365644  -0.0344327

The next chunk transforms the model coefficients back to the original compositional scale.

coefs <- clrInv(coef(rules.lm), orig=y)
coefs

##             Ca           Co           Cr           Cu           Fe          
## (Intercept) "0.02884916" "0.05097851" "0.04247438" "0.07549420" "0.06887984"
## surveyMW    "0.07059005" "0.07010799" "0.07202023" "0.07159756" "0.06931842"
## cropsorg    "0.09049330" "0.07279527" "0.07834551" "0.06847922" "0.07789280"
## croppmil    "0.08815347" "0.08159959" "0.07509545" "0.07004380" "0.07831443"
## cropfmil    "0.23622800" "0.07975816" "0.06757316" "0.05234419" "0.06320055"
## cropteff    "0.14451819" "0.08801176" "0.08084268" "0.06150022" "0.07763162"
## croprice    "0.09090013" "0.07725982" "0.07985562" "0.06754978" "0.07325112"
## cropwhea    "0.11111418" "0.08152860" "0.08195457" "0.06715690" "0.07401796"
## cropbarl    "0.10182207" "0.07502080" "0.08629939" "0.06644950" "0.07117134"
## croptrit    "0.10742681" "0.08777283" "0.07584491" "0.07413871" "0.07284186"
## sNa         "0.06621976" "0.06543965" "0.05967189" "0.06747790" "0.06445109"
## sMg         "0.06507413" "0.06168635" "0.06672381" "0.06743080" "0.05887476"
## sP          "0.06507020" "0.06380091" "0.05197275" "0.06784944" "0.06528000"
## sK          "0.08211749" "0.07578163" "0.06590345" "0.06643028" "0.07050710"
## sCa         "0.14696490" "0.06153309" "0.05544709" "0.06868554" "0.06162422"
## sV          "0.06444324" "0.06622220" "0.06055500" "0.06626700" "0.06460231"
## sCr         "0.06676730" "0.06543807" "0.15783965" "0.06754941" "0.06428507"
## sMn         "0.06198699" "0.06520532" "0.05895708" "0.06664084" "0.06477452"
## sFe         "0.06592687" "0.06450814" "0.05914324" "0.06685606" "0.17118723"
## sCo         "0.06537482" "0.17706082" "0.05860378" "0.06648947" "0.06461609"
## sCu         "0.06468137" "0.06020831" "0.05202296" "0.18549716" "0.06137224"
## sZn         "0.07253390" "0.06849141" "0.05815597" "0.06554966" "0.06602860"
## sSe         "0.06645405" "0.06542968" "0.05908927" "0.06759298" "0.06424963"
## sMo         "0.06612855" "0.06592532" "0.05806667" "0.06770197" "0.06443651"
##             K            Mg           Mn           Mo           Na          
## (Intercept) "0.13967358" "0.10337728" "0.03548863" "0.06421089" "0.06868519"
## surveyMW    "0.07052139" "0.07137496" "0.07542040" "0.07535355" "0.07165266"
## cropsorg    "0.05881143" "0.06434777" "0.08412705" "0.06367788" "0.07114539"
## croppmil    "0.06202106" "0.06177015" "0.08182524" "0.05846509" "0.07625258"
## cropfmil    "0.02927039" "0.03431071" "0.15837161" "0.02851890" "0.08363254"
## cropteff    "0.03573809" "0.04434520" "0.11319059" "0.04264983" "0.07506240"
## croprice    "0.05421395" "0.06096401" "0.08741905" "0.06549242" "0.07408222"
## cropwhea    "0.05115823" "0.05256623" "0.10295078" "0.05521206" "0.07429650"
## cropbarl    "0.05541840" "0.05553790" "0.08408918" "0.06465512" "0.08054261"
## croptrit    "0.05796108" "0.05496355" "0.10215557" "0.05276087" "0.08985797"
## sNa         "0.06600227" "0.06623911" "0.06557101" "0.05958185" "0.16291257"
## sMg         "0.07222469" "0.16157692" "0.06784136" "0.05582606" "0.06281311"
## sP          "0.07057395" "0.07114097" "0.06614810" "0.06590906" "0.06223544"
## sK          "0.13355914" "0.06136845" "0.07180041" "0.05600991" "0.05694154"
## sCa         "0.07461532" "0.07288052" "0.06133489" "0.06707827" "0.05721526"
## sV          "0.06377966" "0.06422694" "0.06465936" "0.05768772" "0.05959124"
## sCr         "0.06587727" "0.06657474" "0.06661032" "0.06045471" "0.06019176"
## sMn         "0.06787391" "0.06871366" "0.16183295" "0.06408248" "0.05842051"
## sFe         "0.06518467" "0.06571143" "0.06587731" "0.05964111" "0.05963263"
## sCo         "0.06450003" "0.06510932" "0.06554229" "0.05885000" "0.05889619"
## sCu         "0.06672052" "0.06699573" "0.06524150" "0.05987955" "0.06002971"
## sZn         "0.06273129" "0.06522905" "0.07170000" "0.05583136" "0.06018080"
## sSe         "0.06575192" "0.06632102" "0.06629921" "0.06039560" "0.06011949"
## sMo         "0.06603368" "0.06647625" "0.06594189" "0.16238239" "0.05985989"
##             P            Se           V            Zn          
## (Intercept) "0.10523242" "0.03739426" "0.09297225" "0.08628940"
## surveyMW    "0.07095181" "0.06832620" "0.07164127" "0.07112352"
## cropsorg    "0.05978853" "0.07945118" "0.07284748" "0.05779720"
## croppmil    "0.06175547" "0.07382761" "0.06500578" "0.06587028"
## cropfmil    "0.02687560" "0.05383566" "0.05938923" "0.02669130"
## cropteff    "0.03943781" "0.08499410" "0.07329037" "0.03878714"
## croprice    "0.05989787" "0.07751418" "0.07185106" "0.05974876"
## cropwhea    "0.05167809" "0.07733707" "0.06570865" "0.05332015"
## cropbarl    "0.05644313" "0.07840692" "0.06832891" "0.05581472"
## croptrit    "0.05516139" "0.05966311" "0.05432746" "0.05512389"
## sNa         "0.06572388" "0.05720034" "0.06759968" "0.06590899"
## sMg         "0.07493277" "0.05671792" "0.05507774" "0.07319957"
## sP          "0.15792500" "0.05495324" "0.06926610" "0.06787482"
## sK          "0.05979780" "0.06986869" "0.07352137" "0.05639273"
## sCa         "0.07398477" "0.05826225" "0.06592207" "0.07445180"
## sV          "0.06348762" "0.05810425" "0.18197687" "0.06439659"
## sCr         "0.06598664" "0.05874247" "0.06726106" "0.06642152"
## sMn         "0.06832955" "0.05821523" "0.06694556" "0.06802140"
## sFe         "0.06516269" "0.05740634" "0.06814952" "0.06561276"
## sCo         "0.06457117" "0.05843785" "0.06719441" "0.06475376"
## sCu         "0.06550042" "0.05795355" "0.06577881" "0.06811816"
## sZn         "0.06293523" "0.06053180" "0.07229682" "0.15780412"
## sSe         "0.06594113" "0.15868450" "0.06737834" "0.06629318"
## sMo         "0.06603853" "0.05830890" "0.06627900" "0.06642045"
## attr(,"class")
## [1] "acomp"

6 Main takeaways

This GeoNutrition prediction challenge shares features with other environmental risks to humans: (1) nutritional risks from consuming cereal staples are geographically dispersed but have strongly expressed spatial patterns; (2) readily available information exists to predict risky areas; (3) while it is possible to perform staple crop measurements to identify the risks at individual locations, it would be prohibitively expensive and logistically impossible to measure every location and rural household to apply specific mitigation measures such as staple cropping system changes, biofortification and/or supplementation. These need to be modeled, predicted, and monitored.
“Predictive accuracy is substantially improved by blending multiple predictors. Our experience is that most efforts should be concentrated in deriving substantially different approaches, rather than refining a single technique. Consequently, our solution is an ensemble of many methods.” (Bell, Koren and Volinsky, 2007). This successful strategy by the Netflix prize winners applies here as well using “out of the box” feature transformation approaches and machine learning algorithms.
With monitoring, these types of relatively simple prediction models could be deployed for e.g., estimating nutrient amounts per serving of a given crop and the % daily values similar to those shown on the supplement bottle in Fig. 2 … but with the respective labels placed and updated on a national (GeoNutrition) map or information service.
One additional thing that would be needed in this context would be accurate mixed staple crop and livestock distribution data, predictions and maps … see the notebook for Tanzania and Rwanda, which covers that topic in depth here.
Because the current data set is fairly small (by machine learning standards), there are caveats with regard to how well the current predictions generalize to below analytical detection limits, unseen, or unmeasured regions of interest (see e.g., Figs. 5 & 6). Our main intent for this notebook was to provide a computational example of how ionomic information might be gathered more cost-effectively in the future. However, it is also quite encouraging to be able to show that for two countries as environmentally different as Ethiopia and Malawi common, spatial, ionomic relationships and spatial patterns can be learned reliably using our proposed approach.
Generating more field and laboratory data are not expected to be cost-prohibitive for either national governments or international donors and could be accomplished quickly. In fact, the models presented here should be further exposed to georeferenced validation data. We will provide more information about this and will also suggest “GeoNutrition” standard operating procedures in a subsequent notebook at some point.

Any questions, comments or corrections related to this notebook are most welcome via AFSIS, and we will attempt to include them in any updates.

7 Definitions and key literature by GPT 4.0

7.1 Food Systems

Prompt: in 300 words summarize the state of knowledge about food systems including in text citations

Food systems are complex networks of activities, organizations, and actors involved in the production, processing, distribution, and consumption of food. Over the past few decades, the study of food systems has gained significant attention as it is critical for addressing global challenges such as hunger, malnutrition, food insecurity, and environmental sustainability.

One important area of research in food systems is understanding the impact of agriculture on the environment. According to a study by Foley et al. (2011), agriculture is responsible for 80% of global deforestation, 70% of freshwater use, and 30% of greenhouse gas emissions. Furthermore, food production is a significant contributor to soil erosion, water pollution, and loss of biodiversity. As such, there is a growing interest in sustainable agriculture practices such as agroecology, which aims to promote the use of ecological principles and biodiversity to enhance agricultural productivity and resilience (Altieri, 2002).

Another area of research in food systems is the impact of globalization on food production and consumption patterns. As argued by McMichael (2004), globalization has led to the emergence of a global food system characterized by the concentration of power and resources in the hands of a few transnational corporations, the standardization of food production and consumption, and the displacement of small-scale farmers and local food systems. This has resulted in a number of negative consequences, including the loss of food sovereignty, increased food insecurity, and the erosion of cultural diversity (Clapp & Fuchs, 2009).

Finally, there is a growing recognition of the need to address food system challenges through a holistic and integrated approach that takes into account the social, economic, and environmental dimensions of food systems. This is reflected in the concept of food system transformation, which aims to create more sustainable, equitable, and resilient food systems through a range of interventions such as policy reforms, institutional innovations, and community-led initiatives (HLPE, 2021).

In conclusion, the study of food systems is critical for addressing some of the most pressing global challenges. While there have been significant advances in our understanding of food systems, there is still much to be learned and achieved in terms of building more sustainable, equitable, and resilient food systems for all.

Some key references for food systems (to 2021):

Altieri, M. A. (2002). Agroecology: the science of natural resource management for poor farmers in marginal environments. Agriculture, ecosystems & environment, 93(1-3), 1-24.
McMichael, P. (2004). Globalization, development and food sovereignty. Third World Quarterly, 25(6), 1079-1088.
Clapp, J., & Fuchs, D. (2009). Corporate power in the agro-food system. In: The political economy of food and nutrition (pp. 67-92). Routledge.
Foley, J. A., Ramankutty, N., Brauman, K. A., Cassidy, E. S., Gerber, J. S., Johnston, M., … & Zaks, D. (2011). Solutions for a cultivated planet. Nature, 478(7369), 337-342.
HLPE. (2021). Food Systems and Nutrition: Report of the High Level Panel of Experts on Food Security and Nutrition of the Committee on World Food Security. Rome: FAO.

7.2 Ionomics

Prompt: define ionomics in response to environmental factors including key citations

Ionomics is an interdisciplinary research field that investigates the composition, distribution, and function of ions (charged atoms or molecules) in organisms, primarily plants (Baxter, 2010). This field provides a holistic view of the ionome - the complete set of mineral nutrients, trace elements, and other ions in an organism (Lahner et al., 2003). Ionomics aims to understand how the ionome is regulated and affected by genetic and environmental factors, such as soil conditions, nutrient availability, and stress (Salt et al., 2008). It also seeks to identify genes, proteins, and regulatory networks involved in ion homeostasis, transport, and storage (Baxter et al., 2008). This knowledge has implications for agriculture, environmental science, and human health.

In agriculture, ionomics can lead to the development of crops with improved nutrient use efficiency, resistance to mineral toxicity, and enhanced nutrient content in edible tissues (White & Broadley, 2009). This can help address global food security and malnutrition issues, especially in regions with poor soil quality or limited access to mineral fertilizers.

Ionomics also plays a crucial role in understanding plant adaptation to environmental stress. For instance, ionomic profiling has revealed how plants adjust their ionome in response to conditions such as drought, salinity, and heavy metal contamination (Baxter, 2010; Ziegler et al., 2013). This information can be used to breed or genetically engineer plants with improved resilience to these challenges, helping to maintain agricultural productivity in the face of climate change and other environmental pressures.

In human health, ionomic studies have uncovered links between ionome composition and various diseases, including cardiovascular disease, cancer, and neurodegenerative disorders (Krämer, 2010). Further exploration of these relationships could lead to novel diagnostic tools, preventive measures, and therapeutic strategies.

One of the key challenges in ionomics is to develop high-throughput and cost-effective techniques for ionomic profiling (Baxter, 2010). Advances in analytical chemistry, such as inductively coupled plasma mass spectrometry (ICP-MS) and X-ray fluorescence microscopy (XRF), have allowed for rapid and precise measurements of ion concentrations in various biological samples (Salt et al., 2008). These technologies, combined with bioinformatics tools, enable researchers to analyze large datasets and … .

Overall, ionomics has the potential to transform our understanding of the roles of ions in food systems. By leveraging advances in technology and interdisciplinary research, ionomics can contribute to developing innovative solutions for some of the world’s most pressing challenges in agriculture, environmental science, and human health.

Some key references for ionomics (to 2021):

Baxter, I. (2010). Ionomics: studying the social network of mineral nutrients. Current Opinion in Plant Biology, 13(3), 381-386.
Baxter, I., Ouzzani, M., Orcun, S., Kennedy, B., Jandhyala, S. S., & Salt, D. E. (2007). Purdue ionomics information management system. An integrated functional genomics platform. Plant Physiology, 143(2), 600-611.
Krämer, U. (2010). Metal hyperaccumulation in plants. Annual Review of Plant Biology, 61, 517-534.
Lahner, B., Gong, J., Mahmoudian, M., Smith, E. L., Abid, K. B., Rogers, E. E., … & Salt, D. E. (2003). Genomic scale profiling of nutrient and trace elements in Arabidopsis thaliana. Nature Biotechnology, 21(10), 1215-1221.
Salt, D. E., Baxter, I., & Lahner, B. (2008). Ionomics and the study of the plant ionome. Annual Review of Plant Biology, 59, 709-733.
White, P. J., & Broadley, M. R. (2009). Biofortification of crops with seven mineral elements often lacking in human diets – iron, zinc, copper, calcium, magnesium, selenium, and iodine. New Phytologist, 182(1), 49-84.
Ziegler, G., Terauchi, A., Becker, A., Armstrong, P., Hudson, K., & Baxter, I. (2013). Ionomic screening of field-grown soybean identifies mutants with altered seed elemental composition. The Plant Genome, 6(2), 1-9.

7.3 Compositional data

Prompt: define compositional data analysis including key citations

Compositional Data Analysis (CoDA) is a statistical methodology used to analyze data that consists of proportions, percentages, or other relative measures. It is primarily applied to situations where the data represents parts of a whole or a constrained system, and the sum of the individual components is constant (e.g., 100% or 1). CoDA is particularly useful for analyzing data that exhibits compositional behavior, such as dealing with closure effects and spurious correlations.

Some key references for CoDA:

Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society: Series B (Methodological), 44(2), 139-160.
Aitchison, J. (1986). The statistical analysis of compositional data. Monographs on statistics and applied probability. Chapman & Hall Ltd.
Pawlowsky-Glahn, V., & Buccianti, A. (2011). Compositional Data Analysis: Theory and Applications. John Wiley & Sons.
Egozcue, J. J., Pawlowsky-Glahn, V., Mateu-Figueras, G., & Barceló-Vidal, C. (2003). Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35(3), 279-300.
Templ, M., Hron, K., & Filzmoser, P. (2011). robCompositions: An R-package for robust statistical analysis of compositional data. In Pawlowsky-Glahn, V., & Buccianti, A. (Eds.), Compositional Data Analysis: Theory and Applications (pp. 341-355). John Wiley & Sons.

The first two citations, both by John Aitchison, established the foundation of CoDA and introduced the log-ratio approach to address the unique properties of compositional data. The third citation is a comprehensive book by Pawlowsky-Glahn and Buccianti that covers both the theory and applications of CoDA. Egozcue et al.’s paper (citation 4) introduced the isometric logratio (ilr) transformation as an alternative to the clr (centered logratio) transformation. Finally, the fifth citation describes an R-package called “robCompositions,” which provides tools for robust statistical analysis of compositional data.

Spatial predictions of cereal grain ionomes in Ethiopia and Malawi

M.G. Walsh, D.B. Kumssa, A.R. Mossa, T. Amede, P.C. Nalivata, D. Gashu, S.M. Haefele, S. Gameda, M.R. Broadley, E.J.M. Joy, E.H. Bailey, R.M. Lark, R. Rodríguez Iglesias, B.A. Walsh, J. Chamberlin and S. Snapp

09, May, 2023