Staple foods supply the dominant share of food energy and nutrient intake in a population’s diet. While there may be more than 50,000 edible plants in the world, just 15 of these provide ~90% of the world’s gross food energy (GFE). Rice, maize, and wheat alone make up ~60% of the world’s GFE. Other, more regional or local staples include: millet and sorghum; pulses such as beans, lentils and chickpeas; roots and tubers such as potatoes, sweet potatoes, cassava, yams, and taro; and animal foods such as meat, fish, eggs and dairy products (see FAO). In turn, the combinations of staple foods that people consume are thought to have specific population-level nutritional and environmental outcomes, which are currently not well described locally.
Association rules (AR) are used to answer common exploratory questions like: If someone walks into a supermarket and buys ground beef and french fries, what is the probability that this person would also buy burger buns? This is why AR mining is also often referred to as market basket analysis. Market basket analysis is a method of discovering customer purchasing patterns by extracting interesting co-occurrences from databases. For example, an if-then rule \(\{ground\ beef,\ french\ fries\} \Rightarrow \{burger\ buns\}\) in the sales data of a supermarket might indicate that customers who buy ground beef and french fries together may also wish to buy burger buns. If that AR occurs frequently enough within a set of purchases at the store, the information might be used as a basis for store management decisions such as targeted advertisement, promotional pricing and/or product placement within the store.
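The "probability" in the question above is usually summarized by three standard rule measures: support (how often all items in the rule occur together), confidence (the estimated conditional probability of the consequent given the antecedent), and lift (confidence relative to the baseline frequency of the consequent). The following is a minimal sketch in plain Python, assuming a small, made-up set of transactions; the function names `support` and `rule_metrics` are illustrative and do not come from any particular library.

```python
# Hypothetical toy transactions; not data from any real store.
transactions = [
    {"ground beef", "french fries", "burger buns"},
    {"ground beef", "french fries", "burger buns", "milk"},
    {"ground beef", "french fries"},
    {"milk", "bread"},
    {"burger buns", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence and lift for the rule antecedent => consequent."""
    a, c = set(antecedent), set(consequent)
    s_ac = support(a | c, transactions)  # all rule items bought together
    s_a = support(a, transactions)       # antecedent items bought
    s_c = support(c, transactions)       # consequent items bought
    confidence = s_ac / s_a if s_a > 0 else float("nan")
    lift = confidence / s_c if s_c > 0 else float("nan")
    return {"support": s_ac, "confidence": confidence, "lift": lift}


metrics = rule_metrics(
    {"ground beef", "french fries"}, {"burger buns"}, transactions
)
print(metrics)
```

In this toy example, a lift above 1 would suggest that burger buns are bought more often alongside ground beef and french fries than their overall frequency would predict; with so few transactions, the numbers are purely illustrative.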
Many supermarket chains in the US and elsewhere use these types of rule-based approaches for managing stores. This is why, for example, aisles in US supermarkets are arranged the way that they are: milk (a frequently bought item) is nearly always located toward the rear of the store, so that shoppers pass other products on the way, which encourages impulse buying. Impulse purchase items, such as baked goods and chewing gum, are placed toward the front of the store. Tonic is often close to gin, jam is close to bread and peanut butter, and chips are close to salsa. The story about placing diapers next to beer to target young fathers is a parable, but it’s funny. There are also usually coupons or other offers for items that are likely to be purchased together. When browsing what is on offer, look for products on either the top or bottom shelves, as the eye-level shelves tend to hold the pricier items.
Items that are purchased together as part of a shopping list present a special class of labeling problems for machine learning and various statistical applications, because each list carries multiple labels, which may be interdependent. The figure below illustrates the structure of multi-label data relative to the binary and multi-class data types that occur much more frequently in data mining and spatial prediction tasks.
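To make the distinction concrete, the sketch below contrasts the three label structures using made-up shopping data. It assumes that multi-label observations are stored as sets of items and converts them to a 0/1 indicator matrix with one column per label, which is a common input format for multi-label methods. The helper `to_indicator_matrix` and all values are hypothetical.

```python
# Hypothetical examples of the three label structures, one entry per shopper.
binary_labels = [1, 0, 1]                            # one yes/no label each
multiclass_labels = ["bakery", "dairy", "bakery"]    # exactly one of several classes
multilabel_sets = [                                  # any number of labels each
    {"gin", "tonic"},
    {"bread"},
    {"gin", "tonic", "salsa"},
]


def to_indicator_matrix(label_sets):
    """Convert a list of label sets to (sorted label names, 0/1 rows)."""
    labels = sorted(set().union(*label_sets))
    matrix = [[int(label in s) for label in labels] for s in label_sets]
    return labels, matrix


labels, matrix = to_indicator_matrix(multilabel_sets)
print(labels)
for row in matrix:
    print(row)
```

Each row of the indicator matrix can contain several 1s at once, and columns such as `gin` and `tonic` tend to switch on together, which is the interdependence that binary and multi-class encodings cannot represent.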
From a data mining and machine learning perspective, the majority of East African food production systems resemble an itemized receipt from a supermarket: fields (the shopping carts) tend to contain a mix of several staple food crops (the items), grown simultaneously or in seasonal sequence, rather than being monocropped as they are in, e.g., Iowa, Ukraine and South Africa. Even quite small cropland areas (0.25 ha) are commonly planted with several staple food crops during any given season (see the pictures below). However, which specific crops are grown together in a given small area, and how these combinations may be evolving due to internal and environmental forces, remain uncertain and largely unquantified. We’ll use ARs and multi-label classification in this notebook to explore and map cropping systems based on the occurrence of 20 georeferenced staple crop indicator species; the results can subsequently be used for crop diversity monitoring applications.