1 Introduction

Fire is a natural and often beneficial process in many landscapes. Fires recycle nutrients into the soil and play an essential role in the primary and secondary succession of plant communities in rangelands, forests, and wetlands/peatlands. In managed rangelands and forests, prescribed burns are used to control woody and understory vegetation. However, burning vegetation emits greenhouse gases (GHGs) and various other materials into the atmosphere, and wildfires can be extremely destructive when they impinge on urban areas. In many regions of the world, the increasing frequency and severity of wildfires, exacerbated by climate change, underscore the need for near-real-time spatial fire forecasting for land management and disaster response.

In this notebook, I develop a predictive Bayesian repeated measures approach on a Discrete Global Grid (DGG). The approach uses openly available fire detection time series from the Visible Infrared Imaging Radiometer Suite on the Suomi National Polar-orbiting Partnership satellite (VIIRS-SNPP) for Uganda (2012 – 2023). VIIRS-SNPP provides global coverage of fire-affected areas and is equipped with multiple spectral bands that are sensitive to the wavelengths of light emitted by fires. Specifically, it uses the middle-infrared (MIR) and thermal-infrared (TIR) bands to detect the heat signatures of fires. The MIR band is particularly useful for identifying hotspots that indicate active fires, while the TIR band helps in estimating the temperature of the detected fires.

The goal is to uncover spatial patterns and temporal trends that are often overlooked. By doing so, the associated models can forecast potential hotspots in Uganda for the upcoming year, and they can also be used for fire risk mapping and management. This data-driven approach could enable more strategic efforts to mitigate wildfire risks and impacts while harnessing the ecological benefits of fire, and it provides a basis for better-informed land management policies. The aim is not just to react to fires, but to anticipate them. Such a proactive stance in fire management could significantly reduce the associated risks and their environmental impacts.

This R markdown document is maintained and versioned on the Uganda OSF repository here, from where it can be downloaded and modified. Please cite this work as: M.G. Walsh (2024). Uganda wildfires – A Bayesian approach for forecasting VIIRS-SNPP fires in a space-time cube. https://doi.org/10.17605/OSF.IO/4RMKP.

2 Example use cases

2.1 Prescribed burns for vegetation and soil management

Prescribed burns, also known as controlled burns, are an essential tool for managing vegetation, reducing wildfire risks, and enhancing soil fertility. Using the wildfire forecasting system, land management agencies can identify optimal times and locations for prescribed burns that minimize risk and maximize ecological benefit. Controlled fires help to clear underbrush, dead trees, and other flammable materials, reducing the fuel available for wildfires. They also play a crucial role in maintaining ecosystem health by promoting the growth of native plant species, improving the availability of soil nutrients, and reducing soil acidity where it is present.

The forecasting system assesses various factors, including historical patterns such as those used in this notebook, as well as ground survey investigations, weather conditions, vegetation types, and fire behavior models, to determine the ideal periods for conducting prescribed burns. This data-driven approach ensures that burns are conducted safely, with minimal impact on air quality and surrounding areas. The process involves:

  • Planning: Identifying target areas for vegetation management based on fire risk assessments and ecological goals.

  • Timing: Selecting the optimal time for burns, considering weather patterns, humidity levels, and wind conditions to control the burn effectively and safely.

  • Execution: Implementing the prescribed burn under controlled conditions with prepared firefighting resources on standby to manage any unforeseen spread of fire.

  • Monitoring: Continuously monitoring the burn area and surrounding regions for any signs of unintended fire spread and assessing the effectiveness and ecological impact of the burn.

  • Post-burn analysis: Evaluating the impact on vegetation control, soil fertility enhancement, and the reduction in wildfire risk to inform future prescribed burn plans and adjust strategies as necessary.

Incorporating prescribed burns into wildfire management strategies, guided by precise forecasting and environmental data, ensures that these activities are both beneficial for land management and instrumental in mitigating the threat of uncontrolled wildfires. This proactive approach supports biodiversity, aids in the restoration of natural fire regimes, and contributes to the health and resilience of forests, rangelands, and wetlands/peatlands.

2.2 Integrated wildfire management system for emergency response and public health

This integrated approach combines early warning systems, firefighting strategy optimization, and public health advisories into a comprehensive wildfire management system. Leveraging advanced forecasting technologies, the system enhances emergency preparedness, directs firefighting resources efficiently, and minimizes public health risks from wildfire smoke. By predicting wildfire outbreaks and behavior, it enables timely evacuation warnings, strategic deployment of firefighting resources, and proactive public health measures to safeguard communities and ecosystems including:

  • Early warning and evacuation planning: The system uses predictive modeling to identify areas at high risk of wildfires, issuing early warnings to those regions. It aids in devising evacuation plans, ensuring safety protocols are in place for vulnerable communities.

  • Resource allocation and firefighting strategy: Based on forecasted fire behavior and spread patterns, the system allocates firefighting resources strategically. It prioritizes areas for intervention, optimizes the use of aerial and ground firefighting assets, and identifies safe and effective firefighting lines.

  • Public health advisory: By predicting the direction and dispersion of wildfire smoke, the system issues health advisories to affected communities, recommending precautions to reduce exposure to smoke and safeguarding public health. It also guides the implementation of air quality management strategies in real-time.

  • Operational execution: Operational teams use the system’s insights to execute evacuation plans, firefighting operations, and public health advisories. Continuous monitoring and real-time data feed into the system, allowing for dynamic adjustments to strategies as conditions evolve.

  • Impact assessment and feedback loop: Post-event analysis evaluates the effectiveness of the response strategies, the accuracy of forecasts, and the health outcomes. This feedback is used to refine models, improve prediction accuracy, and enhance response strategies for future incidents.

This use case emphasizes a coordinated and data-driven approach to managing wildfires, integrating emergency response, firefighting efforts, and public health considerations. By leveraging advanced forecasting and analytics, the system ensures that responses are timely, resources are utilized efficiently, and communities are protected against the myriad risks posed by wildfires.

2.3 Wildland-urban interface management system

The Wildland-Urban Interface (WUI) Management System is designed specifically to address the unique challenges of managing wildfires in areas where urban development meets or intermingles with wildland or vegetative fuels. This system focuses on reducing the vulnerability of these areas through strategic planning, community engagement, and the integration of fire-adapted building and landscaping practices. Utilizing predictive analytics and risk assessment tools, the system aids in identifying high-risk zones, optimizing mitigation efforts, and ensuring rapid response capabilities to protect lives, properties, and ecosystems. Implementing a comprehensive approach involves collaboration between fire management agencies, urban planners, and the community to develop and enforce strategies tailored to the WUI’s unique conditions:

  • Risk assessment and mapping: Employing advanced spatial analysis and modeling to identify high-risk areas within the WUI, considering factors such as vegetation type, topography, climate conditions, and urban density. This assessment informs the development of detailed risk maps that guide mitigation and response efforts.

  • Mitigation strategies: Based on risk assessments, the system facilitates the planning and implementation of targeted mitigation strategies, such as creating defensible spaces around properties, enforcing fire-resistant building codes, and managing vegetation to reduce fuel loads.

  • Community engagement and education: Developing and deploying educational programs aimed at residents and property owners in the WUI to promote awareness of wildfire risks and encourage proactive measures to reduce vulnerability, such as adopting fire-smart landscaping practices and participating in community fire preparedness programs.

  • Emergency planning and response coordination: Integrating real-time data from the forecasting system to support emergency planning, including evacuation routes, shelter locations, and communication plans. Enhancing coordination among firefighting units, emergency services, and community response teams to ensure rapid and effective response to wildfires.

  • Recovery and resilience building: Post-fire assessments to evaluate the effectiveness of mitigation measures and response strategies, with a focus on learning and adaptation. Initiatives to rebuild affected areas with enhanced resilience, incorporating fire-adapted designs and materials in reconstruction efforts, and restoring natural landscapes to reduce future wildfire risks.

The Wildland-Urban Interface Management System represents a holistic approach to reducing the impact of wildfires in vulnerable urban-edge communities. By combining predictive analytics with strategic planning and community engagement, this system aims to create more resilient WUI areas that can withstand and recover from wildfire events, thereby safeguarding human life and property.

2.4 Climate change adaptation for rural communities

This use case focuses on leveraging the wildfire forecasting system to aid rural communities in adapting to the challenges posed by climate change. As these areas often face heightened risks of wildfires due to changing climate conditions, the system provides insights for long-term planning and resilience building. It assesses how shifting weather patterns, increased temperatures, and altered precipitation levels contribute to wildfire risks, offering rural communities evidence-based strategies for adaptation and mitigation. The adaptation process involves a series of coordinated steps, integrating the forecasting system’s capabilities with community planning and environmental management practices:

  • Vulnerability assessment: Utilizing the forecasting system to analyze historical and predictive data on climate change impacts specific to rural regions. This includes identifying areas most at risk of experiencing increased wildfire frequency and intensity.

  • Adaptation strategy development: Based on the vulnerability assessment, developing comprehensive adaptation strategies that address both immediate and long-term needs. This may include altering land use practices, enhancing vegetation management to reduce fuel loads, and implementing water conservation measures to mitigate drought conditions.

  • Community capacity building: Engaging with local communities through workshops, training sessions, and educational programs to raise awareness about climate change impacts on wildfire risks. Empowering residents with the knowledge and tools to participate in mitigation efforts, such as creating defensible spaces and adopting fire-resistant building materials and techniques.

  • Emergency response planning: Incorporating climate-adapted wildfire forecasts into emergency preparedness plans, ensuring that rural communities have updated evacuation plans, emergency supplies, and communication systems in place to respond effectively to wildfire threats.

  • Monitoring: Establishing a feedback loop where ongoing climate and wildfire risk assessments inform adjustments to adaptation strategies, ensuring they remain effective as conditions change.

This use case emphasizes the role of advanced fire forecasting in supporting rural communities’ efforts to adapt to climate change. By providing actionable intelligence on future wildfire risks, the system enables these communities to implement targeted adaptation measures, build resilience, and reduce their vulnerability to the increasing threats posed by wildfires in a changing climate.

3 Setup

To run this notebook, you will need to install and load the R-packages indicated in the chunk directly below.

# Package names
packages <- c("osfr", "tidyverse", "patchwork", "rgdal", "sf", "raster",
              "zoo", "brms", "mgcv", "ggridges", "DT")

# Install packages
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
    install.packages(packages[!installed_packages])
}

# Load packages
invisible(lapply(packages, library, character.only = TRUE))

3.1 Read VIIRS fire detections

The archived VIIRS-SNPP fire data are openly available at FIRMS. Do note that you have to place a request for the newer near-real-time (NRT) data for 2022 – 2023. You can get more information on the level to which those data have been processed from the FIRMS FAQ. I have put a clean, concatenated file in the Uganda OSF repository here, and you can download it manually from there. You can also use the following osfr download option (recommended):

# download VIIRS data from OSF
osf_retrieve_node("np3wy") %>%
osf_ls_files(n_max = Inf) %>%
osf_download(path = "./fires", conflicts = "overwrite")
viirs <- read.table("./fires/viirs_ts.csv", header = TRUE, sep = ",")

The following is an animation of the VIIRS fire detection data for the years 2012 – 2023 on a monthly time step. Each VIIRS active fire/thermal hotspot location represents the center of a 375 meter pixel. Wildfires are commonplace in Uganda, and the distinctive space-time oscillation pattern that repeats every year synchronizes with the latitudinal movement of the Intertropical Convergence Zone (ITCZ) near the equator.


You can download the .gif from the Uganda OSF repository here. Stepping through the .gif manually (144 frames) can help with identifying potentially anomalous months. If you have a Google Maps API key, you can also recreate the animation from the markdown document that is available here.
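
Alternatively, a basic animation can be assembled entirely in R. The following is a minimal sketch that assumes the gganimate and gifski packages (not part of the package list above) are installed; it will not reproduce the basemap of the original animation, but it is sufficient for stepping through the monthly frames.

# minimal animation sketch (assumes the 'gganimate' and 'gifski' packages are installed)
library(gganimate)

# one frame per month; point locations are plotted in raw lon/lat
viirs_anim <- viirs %>%
  mutate(year_month = format(as.Date(acq_date), "%Y-%m"))

p_anim <- ggplot(viirs_anim, aes(x = longitude, y = latitude)) +
  geom_point(size = 0.2, color = "red", alpha = 0.3) +
  coord_equal() +
  labs(title = "VIIRS fire detections: {current_frame}", x = "Longitude", y = "Latitude") +
  theme_minimal() +
  transition_manual(year_month)

animate(p_anim, nframes = length(unique(viirs_anim$year_month)), fps = 4)
anim_save("./fires/UG_viirs_monthly.gif")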

4 ETL

# convert 'acq_date' to Date and extract components
viirs <- viirs %>%
  mutate(time = as.Date(acq_date),
         year_month = format(time, "%Y-%m"),
         year = year(time),
         month = month(time))

4.1 Raw data national time series

# aggregate data by year and month
monthly_counts <- viirs %>%
  group_by(year_month) %>%
  summarize(count = n(), lat = mean(latitude), lon = mean(longitude))
# convert to a zoo time series
count_data <- zoo(monthly_counts$count, order.by = as.yearmon(monthly_counts$year_month))
lat_data <- zoo(monthly_counts$lat, order.by = as.yearmon(monthly_counts$year_month))
lon_data <- zoo(monthly_counts$lon, order.by = as.yearmon(monthly_counts$year_month))

# dataframes
count_df <- data.frame(
  date = index(count_data),
  count = coredata(count_data))

lat_df <- data.frame(
  date = index(lat_data),
  lat = coredata(lat_data))

lon_df <- data.frame(
  date = index(lon_data),
  lon = coredata(lon_data))

# subplots
p1 <- ggplot(count_df, aes(x = date, y = count)) +
  geom_line(color = "black") +
  labs(y = "VIIRS fire frequency", x = NULL) +
  theme_minimal() +
  theme(plot.margin = margin(5.5, 5.5, 0, 5.5, "pt"),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank())

p2 <- ggplot(lat_df, aes(x = date, y = lat)) +
  geom_line(color = "black") +
  labs(y = "Latitude", x = NULL) +
  theme_minimal() +
  theme(plot.margin = margin(12, 5.5, 0, 5.5, "pt"),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank())

p3 <- ggplot(lon_df, aes(x = date, y = lon)) +
  geom_line(color = "black") +
  labs(y = "Longitude", x = "Date") +
  theme_minimal() +
  theme(plot.margin = margin(12, 5.5, 5.5, 5.5, "pt"))

nat_ts <- p1 / p2 / p3

ggsave("./fires/results/UG_VIIRS_time_series.png", plot = last_plot(),
       width = 6.5, height = 7, units = "in")

Figure 1. Uganda VIIRS-SNPP monthly fire frequencies, average latitudinal and longitudinal positions of fires at the national level (2012 - 2023).

Figure 1 shows the repeating space-time pattern of fire detections in Uganda. The peak of the fire season occurs in December – January, centered on the north-eastern part of the country. The pattern subsequently attenuates in a south-westerly direction during the second and third quarters of each year, as the ITCZ moves closer to the equator and because of the presence of large water bodies such as Lakes Victoria, Kyoga, and Albert.

4.2 Aggregate to DGG

In the next sections, I aggregate the data on a discrete grid with a Lambert azimuthal equal-area projection (CRS: +proj=laea +ellps=WGS84 +lon_0=20 +lat_0=5 +units=m +no_defs). This transformation is intended to simplify the upcoming modeling and forecasting efforts by converting the original data, which have a resolution of 375 meters, into larger, equal-area grid cells. This approach not only facilitates easier data handling and analysis, but also allows for integration with other datasets (e.g., MODIS) and models that have been reverse geocoded to the same grid.

I initially filter the data to exclude the low confidence fire pixels in the data. Confidence values are set to low, nominal, and high. “Low confidence daytime fire pixels are typically associated with areas of Sun glint and lower relative temperature anomaly (<15 K) in the mid-infrared channel I4. Nominal confidence pixels are those free of potential Sun glint contamination during the day and marked by strong (>15 K) temperature anomaly in either day or nighttime data. High confidence fire pixels are associated with day or nighttime saturated pixels” (consult the FIRMS FAQ).

# select high and nominal confidence fires
viirs <- viirs %>% 
  filter(confidence %in% c("h", "n"))

viirs_agg <- viirs %>%
  group_by(latitude, longitude, year, month) %>%
  summarize(viirs = n(), .groups = 'drop')

The next chunks project the data to the LAEA grid and calculate the reverse geocodes at different, nested levels of resolution.

# project VIIRS coords to grid CRS
viirs.proj <- as.data.frame(project(cbind(viirs_agg$longitude, viirs_agg$latitude), 
"+proj=laea +ellps=WGS84 +lon_0=20 +lat_0=5 +units=m +no_defs"))
colnames(viirs.proj) <- c("x","y")
fire <- cbind(viirs_agg, viirs.proj)
xy <- fire[ ,c(6:7)]

# specify vector of resolution values (in meters) for each grid level
resolution <- c(120000, 60000, 30000, 15000, 7500)

# initialize an empty list to store the results
gids <- list()

# loop over each resolution value
for (i in 1:length(resolution)) {
    res.pixel <- resolution[i]
    xgid <- ceiling(abs(xy$x)/res.pixel)
    ygid <- ceiling(abs(xy$y)/res.pixel)
    gidx <- ifelse(xy$x < 0, paste("W", xgid, sep=""), paste("E", xgid, sep=""))
    gidy <- ifelse(xy$y < 0, paste("S", ygid, sep=""), paste("N", ygid, sep=""))
    gids[[i]] <- paste(gidx, gidy, sep="")
}

# append GIDs to the fire data frame
gids <- data.frame(gids)
names(gids) <- c("L1", "L2", "L3", "L4", "LN")
fire <- cbind(gids, fire)
str(fire)
## 'data.frame':    1071063 obs. of  12 variables:
##  $ L1       : chr  "E10S6" "E10S6" "E10S6" "E10S6" ...
##  $ L2       : chr  "E19S12" "E19S12" "E19S12" "E19S12" ...
##  $ L3       : chr  "E37S24" "E37S24" "E38S24" "E38S24" ...
##  $ L4       : chr  "E74S48" "E74S48" "E75S47" "E75S47" ...
##  $ LN       : chr  "E148S95" "E148S95" "E150S94" "E150S94" ...
##  $ latitude : num  -1.44 -1.44 -1.42 -1.42 -1.42 ...
##  $ longitude: num  29.9 30 30.1 30.1 29.9 ...
##  $ year     : num  2016 2016 2023 2023 2021 ...
##  $ month    : num  7 8 8 8 6 6 5 8 7 7 ...
##  $ viirs    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ x        : num  1107460 1108399 1120869 1120623 1104132 ...
##  $ y        : num  -706471 -705571 -703641 -703569 -703609 ...

4.3 Set model GID-level

There are some decisions to be made here regarding the level of spatial and temporal resolution at which to aggregate the time series for modeling. This represents a kind of “Goldilocks” problem, where setting a high resolution incurs a significant computational cost, making it difficult to explore and test different model scenarios on a standard computer. Conversely, setting the resolution too low may be insufficiently informative for realistic space-time analyses, as well as for national and/or local-level wildfire management purposes. The choice of satellite system (e.g., Sentinel-1 at 5×20 meters every 6 days, VIIRS-SNPP at 375 meters daily, and MODIS Terra & Aqua at 1 km four times per day) establishes the upper limit for resolution. The land area to be covered determines the extent to which one should aggregate the data without losing the spatial or temporal detail that is necessary for accurate modeling and forecasting.

I have chosen 30,000 meters on a quarterly time step for the types of Bayesian models that I am training here. This appears to be near the upper limit for a standard computer with 16 cores and 32 GB of RAM. A 15,000 meter resolution on a monthly time step works well for smaller areas of Uganda. I expect that 60,000 meters on a quarterly time step would be feasible for modeling the East African region or large countries such as the Democratic Republic of the Congo. Anything beyond that is likely to require GPUs and/or cloud computing services. The script is easily adapted to accommodate those possibilities. The next chunks set this up.

# set gid level at which to group. Choose between L1, L2, L3 or L4 resolution
level <- "L3"

fire_grid <- fire %>%
  group_by(across(all_of(level)), year, month) %>%
  summarize(viirs = n(), .groups = 'drop')

4.4 Fill time series

Because some GIDs have zero VIIRS detections during certain months of the year, the next chunk expands each series to the full set of GID × month combinations, fills the missing months with zeros, and writes the completed series out as model inputs.

# set time index
fire_grid$time <- as.Date(paste(fire_grid$year, fire_grid$month, "01", sep = "-"))

# convert level column to factor
fire_grid[[level]] <- as.factor(fire_grid[[level]])

# calculate time range
start_date <- as.Date("2012-01-01")
end_date <- as.Date("2023-12-01")
full_time_range <- seq(from = start_date, to = end_date, by = "month")

# create a grid with all combinations of level and time
fire_time <- expand.grid(gid = unique(fire_grid[[level]]),
                         time = full_time_range)

# rename 'gid' to the value of level
names(fire_time)[names(fire_time) == "gid"] <- level

# merge expanded grid with the fire_grid df
fire_time <- merge(fire_time, fire_grid, by = c(level, "time"), all.x = TRUE)

# replace NA with 0 in the 'viirs' column
fire_time$viirs[is.na(fire_time$viirs)] <- 0

# merge with gid area files
if (level == "L1") {
    fire_time <- merge(fire_time, L1_area, by = "L1")
} else if (level == "L2") {
    fire_time <- merge(fire_time, L2_area, by = "L2")
} else if (level == "L3") {
    fire_time <- merge(fire_time, L3_area, by = "L3")
} else if (level == "L4") {
    fire_time <- merge(fire_time, L4_area, by = "L4")
}

# select only the necessary columns
fire_time <- fire_time[, c(level, "area", "time", "viirs")]

# write time series at GID-levels (1-4)
write.csv(fire_time, paste0("./fires/", level, "_viirs_ts.csv"), row.names = FALSE)

4.5 L3-GID VIIRS space-time cube

The next chunk generates the needed space-time cube for models at L3-GID level of aggregation. Make sure that you have the L3_viirs_ts.csv in your ./fires directory.

# read the filled time series written in section 4.4 (here, the L3 level)
L3 <- read.table("./fires/L3_viirs_ts.csv", header = TRUE, sep = ",")

# convert 'time' to Date and extract components
L3 <- L3 %>%
  mutate(time = as.Date(time),
         year = year(time),
         month = month(time),
         quarter = ceiling(month / 3))

# aggregate data by year and quarter
L3_qag <- L3 %>%
  group_by(L3, year, quarter) %>%
  summarize(area = mean(area), viirs = sum(viirs), .groups = "drop")
L3_qag <- as.data.frame(L3_qag)

# append next year for prediction
next_year <- L3_qag %>%
  select(L3, area) %>%
  distinct() %>%
  crossing(quarter = 1:4) %>%
  mutate(year = "2024", viirs = 0)

next_year <- next_year %>%
  select(L3, year, quarter, area, viirs)

# bind the next year data to the aggregated data
L3_qag <- rbind(L3_qag, next_year)

# calculate lagged VIIRS values
L3 <- L3_qag %>%
  arrange(L3, year, quarter) %>%
  mutate(
    lag_4 = lag(viirs, 4)) %>%
  ungroup()
L3 <- na.omit(L3)

# attach mean fire coordinates
coords <- fire %>%
  group_by(L3) %>%
  summarize(x = mean(x), y = mean(y), .groups = "drop")

# merge the aggregated data with coordinates
L3 <- left_join(L3, coords, by = "L3")

# prepare 'last' (for model calibration) and 'next' (to be forecast) data frames
L3$year <- as.numeric(L3$year)
# max_year <- max(L3$year)
L3_cal <- L3 %>% filter(year < 2023)
L3_test <- L3 %>% filter(year == 2023)
L3_new <- L3 %>% filter(year == 2024)
L3_all <- rbind(L3_cal, L3_test)

5 Models

This section illustrates the use of the brms package (Bürkner, P., 2017) in developing a spatial repeated measures fire forecasting approach. The package provides a comprehensive interface for Bayesian regression modeling using Stan, a platform for statistical modeling and high-performance statistical computation. Spatial repeated measures data require models that can account for the inherent correlations within each unit of observation, i.e., multilevel data. brms allows for such specifications through its versatile formula syntax and its ability to include random effects and correlation structures. Four models are fit to the calibration data (2013 – 2022) in increasing order of complexity and computing time.

Note that you do not have to fit any of the models yourself, which takes a bit of time. Instead, you can download all of the prefitted .rds model objects from the Uganda OSF repository manually, or (preferably) with the following:

# download prefitted model objects from OSF
osf_retrieve_node("dma5p") %>%
osf_ls_files(n_max = Inf) %>%
osf_download(path = "./fires/results", conflicts = "overwrite")

5.1 L3-GID average quarterly wildfire frequencies

This initial section is primarily intended to ensure that the data are forecast-ready, and to demonstrate that the outputs can be rasterized and easily mapped to the discrete grid. I initially run a relatively simple model that computes the 10-year average quarterly fire frequencies at the L3-GID level, as follows:

frL3 <- brm(data = L3_cal,
           family = gaussian(),
           formula = log(viirs + 1) ~ 1 + (1 | L3/quarter),
           # prior = priors,
           iter = 4000, warmup = 1000, chains = 4, cores = 8,
           control = list(adapt_delta = .975, max_treedepth = 20),
           seed = 1235813)

saveRDS(frL3, "./fires/results/frL3_frisk.rds")
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: log(viirs + 1) ~ 1 + (1 | L3/quarter) 
##    Data: L3_cal (Number of observations: 12536) 
##   Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup draws = 12000
## 
## Group-Level Effects: 
## ~L3 (Number of levels: 285) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.74      0.07     0.59     0.88 1.00     1488     2645
## 
## ~L3:quarter (Number of levels: 1140) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.62      0.04     1.54     1.70 1.00     3790     7236
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     2.52      0.07     2.39     2.66 1.00    11088     8973
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.90      0.01     0.89     0.91 1.00    12070     8170
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The next chunk generates the posterior predictions and their 95% credible intervals for quarters nested within 285 L3-GIDs.

frisk <- predict(frL3)
frisk <- cbind(L3_cal, frisk)

frisk <- frisk %>%
  group_by(L3, quarter) %>%
  summarize(viirs = mean(viirs), x = mean(x), y = mean(y), .groups = "drop")

frisk_pred <- predict(frL3, newdata = frisk)
frisk_pred <- cbind(frisk, frisk_pred)

Spatial point (sf) objects of the posterior predictions by quarter can then be generated with the sf package (Pebesma and Bivand, 2023) as follows:

# select by quarter
frisk_Q1 <- frisk_pred %>% filter(quarter == 1)
frisk_Q2 <- frisk_pred %>% filter(quarter == 2)
frisk_Q3 <- frisk_pred %>% filter(quarter == 3)
frisk_Q4 <- frisk_pred %>% filter(quarter == 4)

# convert to sf point objects in the grid CRS
Q1_sf <- st_as_sf(frisk_Q1, coords = c("x", "y"), crs = st_crs(L3_grid))
Q2_sf <- st_as_sf(frisk_Q2, coords = c("x", "y"), crs = st_crs(L3_grid))
Q3_sf <- st_as_sf(frisk_Q3, coords = c("x", "y"), crs = st_crs(L3_grid))
Q4_sf <- st_as_sf(frisk_Q4, coords = c("x", "y"), crs = st_crs(L3_grid))

The predictions are rasterized, and written out as a .tif file for post-processing and reuse.

# create rasters
Q1_est <- raster(L3_grid)
Q1_est <- rasterize(Q1_sf, Q1_est, field = "Estimate") 

Q2_est <- raster(L3_grid)
Q2_est <- rasterize(Q2_sf, Q2_est, field = "Estimate") 

Q3_est <- raster(L3_grid)
Q3_est <- rasterize(Q3_sf, Q3_est, field = "Estimate") 

Q4_est <- raster(L3_grid)
Q4_est <- rasterize(Q4_sf, Q4_est, field = "Estimate") 

# write raster stack
spreds <- brick(Q1_est, Q2_est, Q3_est, Q4_est)
fname <- paste("./fires/results/", "fr_quarters.tif", sep = "")
writeRaster(spreds, filename = fname, format = "GTiff", options = "INTERLEAVE=BAND", overwrite = TRUE)
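
Before handing the .tif off for cartographic post-processing, a quick sanity check of the quarterly prediction rasters can be made directly in R, for example:

# quick look at the quarterly prediction rasters
names(spreds) <- c("Q1", "Q2", "Q3", "Q4")
plot(spreds)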

A little post-processing of the rasters (in GRASS GIS) produces Figure 2. The figure clarifies the distinctive space-time fire frequency pattern, with the main fire season occurring in the 1st and 4th quarters of each year. The question now is how reliably this pattern can be used for forecasting next year’s fire season.


Figure 2. Average quarterly fire frequency maps at the L3-GID level (2013 - 2022).

The next four sections describe the models that are specifically tasked with generating fire forecasts for Uganda based solely on the observed historical patterns. For the models presented here, I assume that fire frequency, i.e., the dependent variable, is log-normally distributed over the 10-year period of observation. This can be changed to use other GLM families (e.g., Poisson, negative binomial, or the various zero-inflated varieties of these) and link (e.g., log) functions. I’ll leave those options for you to explore, but a sketch of one such alternative follows below. In practice, the log-normal assumption works reasonably well and also ensures that the model fits converge relatively quickly on a standard computer.
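
For example, the fixed effects structure of the next section could be re-specified as a negative binomial count model on the raw frequencies. The following is a minimal, untested sketch (the family choice, priors, and sampler settings are my assumptions and would need tuning and validation):

# sketch of a negative binomial alternative (not fitted or validated in this notebook)
fcL3_nb <- brm(data = L3_cal,
               family = negbinomial(),
               formula = viirs ~ log(lag_4 + 1) + s(x,y),
               iter = 4000, warmup = 1000, chains = 4, cores = 8,
               control = list(adapt_delta = .975, max_treedepth = 20),
               seed = 1235813)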

5.2 Fixed effects model

\[ \log(y_i + 1) = \beta_0 + \beta_1\log(x_{4,i} + 1) + s(x,y) + \epsilon_i \]

Where,

  • \(y_i\) is the fire count for observation \(i\),
  • \(x_{4,i}\) is the lagged fire count at time \(i\) for 4 quarters back,
  • \(s(x,y)\) is a smooth term for the L3-grid center point coordinates x and y,
  • \(\beta_0\) is the overall intercept,
  • \(\beta_1\) is the coefficient for the log-transformed lagged values, and
  • \(\epsilon_i\) is the normally distributed error term for observation \(i\) with mean 0 and some standard deviation \(\sigma\).
fcL3 <- brm(data = L3_cal,
           family = gaussian(),
           formula = log(viirs + 1) ~ log(lag_4 + 1) + s(x,y),
           # prior = priors,
           iter = 4000, warmup = 1000, chains = 4, cores = 8,
           control = list(adapt_delta = .975, max_treedepth = 20),
           seed = 1235813)

saveRDS(fcL3, "./fires/results/fcL3_cal.rds")
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: log(viirs + 1) ~ log(lag_4 + 1) + s(x, y) 
##    Data: L3_cal (Number of observations: 12536) 
##   Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup draws = 12000
## 
## Smooth Terms: 
##            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sds(sxy_1)     1.95      0.29     1.45     2.60 1.00     1596     2684
## 
## Population-Level Effects: 
##            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept      1.06      0.02     1.02     1.10 1.00    17447     8873
## loglag_4P1     0.63      0.01     0.62     0.65 1.00    18039     9246
## sxy_1          4.77      0.83     3.15     6.42 1.00     6796     7953
## sxy_2         -0.69      0.83    -2.35     0.92 1.00     5683     7593
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.34      0.01     1.33     1.36 1.00    19608     8598
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

5.3 Random intercepts model

\[ \log(y_i + 1) = \beta_0 + \beta_1\log(x_{4,i} + 1) + u_{\text{quarter}[i]} + u_{\text{year}[i]} + u_{\text{L3}[i]} + \epsilon_i \]

Where,

  • \(y_i\) is the fire count for observation \(i\),
  • \(x_{4,i}\) is the lagged fire count at time \(i\) for 4 quarters back,
  • \(u_{\text{quarter}[i]}\), \(u_{\text{year}[i]}\), and \(u_{\text{L3}[i]}\) are random intercepts for quarter, year, and L3 GID, respectively,
  • \(\beta_0\) is the overall intercept,
  • \(\beta_1\) is the coefficient for the log-transformed lagged values, and
  • \(\epsilon_i\) is the normally distributed error term for observation \(i\) with mean 0 and some standard deviation \(\sigma\).
fcL3a <- brm(data = L3_cal,
           family = gaussian(),
           formula = log(viirs + 1) ~ log(lag_4 + 1) + (1 | quarter) + (1 | year) 
                                      + (1 | L3),
           # prior = priors,
           iter = 4000, warmup = 1000, chains = 4, cores = 8,
           control = list(adapt_delta = .975, max_treedepth = 20),
           seed = 1235813)

saveRDS(fcL3a, "./fires/results/fcL3a_cal.rds")
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: log(viirs + 1) ~ log(lag_4 + 1) + (1 | quarter) + (1 | year) + (1 | L3) 
##    Data: L3_cal (Number of observations: 12536) 
##   Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup draws = 12000
## 
## Group-Level Effects: 
## ~L3 (Number of levels: 285) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.54      0.03     0.49     0.59 1.00     2683     5183
## 
## ~quarter (Number of levels: 4) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.95      0.59     0.37     2.55 1.00     4463     5873
## 
## ~year (Number of levels: 11) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.53      0.14     0.33     0.89 1.00     3612     6334
## 
## Population-Level Effects: 
##            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept      1.29      0.54     0.17     2.39 1.00     3758     5120
## loglag_4P1     0.53      0.01     0.52     0.55 1.00    19416     9676
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.14      0.01     1.12     1.15 1.00    24784     9258
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

5.4 Quarters nested in years model

\[ \log(y_i + 1) = \beta_0 + \beta_1\log(x_{4,i} + 1) + u_{\text{year:quarter}[i]} + u_{\text{L3}[i]} + \epsilon_i \]

Where,

  • \(y_i\) is the fire count for observation \(i\),
  • \(x_{4,i}\) is the lagged fire count at time \(i\) for 4 quarters back,
  • \(u_{\text{year:quarter}[i]}\), and \(u_{\text{L3}[i]}\) are random intercepts for quarters nested in years, and L3 GID, respectively,
  • \(\beta_0\) is the overall intercept,
  • \(\beta_1\) is the coefficient for the log-transformed lagged values, and
  • \(\epsilon_i\) is the normally distributed error term for observation \(i\) with mean 0 and some standard deviation \(\sigma\).
fcL3b <- brm(data = L3_cal,
           family = gaussian(),
           formula = log(viirs + 1) ~ log(lag_4 + 1) + (1 | year/quarter) + (1 | L3),
           # prior = priors,
           iter = 4000, warmup = 1000, chains = 4, cores = 8,
           control = list(adapt_delta = .975, max_treedepth = 20),
           seed = 1235813)

saveRDS(fcL3b, "./fires/results/fcL3b_cal.rds")
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: log(viirs + 1) ~ log(lag_4 + 1) + (1 | year/quarter) + (1 | L3) 
##    Data: L3_cal (Number of observations: 12536) 
##   Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup draws = 12000
## 
## Group-Level Effects: 
## ~L3 (Number of levels: 285) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.42      0.02     0.38     0.46 1.00     2797     5132
## 
## ~year (Number of levels: 11) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.39      0.23     0.03     0.90 1.00      895     1627
## 
## ~year:quarter (Number of levels: 44) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.83      0.10     0.66     1.06 1.00     2651     4996
## 
## Population-Level Effects: 
##            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept      1.02      0.19     0.66     1.40 1.00     3371     5081
## loglag_4P1     0.65      0.01     0.64     0.67 1.00    13163     9356
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.00      0.01     0.99     1.01 1.00    23207     8935
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

5.5 Combined model

\[ \log(y_i + 1) = \beta_0 + \beta_1\log(x_{4,i} + 1) + s(x,y) + u_{\text{year:quarter}[i]} + u_{\text{L3}[i]} + \epsilon_i \]

Where,

  • \(y_i\) is the fire count for observation \(i\),
  • \(x_{4,i}\) is the lagged fire count at time \(i\) for 4 quarters back,
  • s(x,y) is a smooth term for the L3-grid center point coordinates x and y,
  • \(u_{\text{year:quarter}[i]}\), and \(u_{\text{L3}[i]}\) are random intercepts for quarters nested in years, and L3 GID, respectively,
  • \(\beta_0\) is the overall intercept,
  • \(\beta_1\) is the coefficient for the log-transformed lagged values, and
  • \(\epsilon_i\) is the normally distributed error term for observation \(i\) with mean 0 and some standard deviation \(\sigma\).
fcL3c <- brm(data = L3_cal,
           family = gaussian(),
           formula = log(viirs + 1) ~  log(lag_4 + 1) + s(x,y) + (1 | year/quarter) + (1 | L3),
           # prior = priors,
           iter = 4000, warmup = 1000, chains = 4, cores = 8,
           control = list(adapt_delta = .975, max_treedepth = 20),
           seed = 1235813)

saveRDS(fcL3c, "./fires/results/fcL3c_cal.rds")
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: log(viirs + 1) ~ log(lag_4 + 1) + s(x, y) + (1 | year/quarter) + (1 | L3) 
##    Data: L3_cal (Number of observations: 12536) 
##   Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup draws = 12000
## 
## Smooth Terms: 
##            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sds(sxy_1)     1.72      0.30     1.21     2.39 1.00     2408     4562
## 
## Group-Level Effects: 
## ~L3 (Number of levels: 285) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.25      0.02     0.22     0.28 1.00     4036     6772
## 
## ~year (Number of levels: 11) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.39      0.22     0.03     0.87 1.01      813     1972
## 
## ~year:quarter (Number of levels: 44) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.84      0.11     0.66     1.07 1.00     2465     4506
## 
## Population-Level Effects: 
##            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept      1.02      0.19     0.65     1.40 1.00     3057     4645
## loglag_4P1     0.65      0.01     0.64     0.67 1.00    13531     8958
## sxy_1          4.14      1.15     1.92     6.39 1.00     4521     6778
## sxy_2         -0.49      1.13    -2.70     1.72 1.00     4361     6790
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.00      0.01     0.99     1.01 1.00    23189     9242
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

5.6 Posterior distributions

Posterior distributions are a fundamental concept in Bayesian statistics, serving as the basis for making predictions and for understanding the uncertainty in those predictions. The posterior predictive distribution is a probability distribution that represents what we expect for new, unseen data points, given the data we have already observed. In Bayesian terms, the prior beliefs (the prior distribution) and the evidence from the observed data (the likelihood) are combined to yield the posterior distribution. The posterior predictive distribution takes this one step further by integrating over all possible parameter values, weighted by how likely each parameter value is according to the posterior distribution. The next chunks illustrate these for the respective model median predictions on the log-transformed scale, which I will also refer to as a fire (frequency) index.

# set calibration set fire frequencies
viirs_data <- log(L3_cal$viirs + 1)

# extract posterior predictions
fcL3_pred <- predict(fcL3)
fcL3a_pred <- predict(fcL3a)
fcL3b_pred <- predict(fcL3b)
fcL3c_pred <- predict(fcL3c)

# convert to data frames
fcL3_pred <- as.data.frame(fcL3_pred)
fcL3a_pred <- as.data.frame(fcL3a_pred)
fcL3b_pred <- as.data.frame(fcL3b_pred)
fcL3c_pred <- as.data.frame(fcL3c_pred)

# combine data frames
comb_pred <- bind_cols(viirs_data, fcL3_pred[,1], fcL3a_pred[,1], fcL3b_pred[,1],
                       fcL3c_pred[,1])
names(comb_pred) <- c("VIIRS", "fcL3", "fcL3a", "fcL3b", "fcL3c")
write.csv(comb_pred, "./fires/fc_preds.csv", row.names = F)

# convert to long format
comb_long <- comb_pred %>%
  pivot_longer(
    cols = everything(),
    names_to = "Distribution",
    values_to = "Value")
ppred <- ggplot(comb_long, aes(x = Value, y = Distribution, fill = Distribution)) +
  geom_density_ridges() +
  theme_ridges() +
  labs(x = "Fire frequency index", y = "No. of observations") +
  xlim(-1, 7.5)

Figure 3. Comparison of the VIIRS data to the posterior predictive distributions of the four forecasting models, fit to the log-transformed fire frequency calibration set (2013 - 2022).

5.7 Model comparisons

Starting with the idea that “all models are wrong, but some are useful” (Box and Draper, 1987), it’s not immediately clear which model in Figure 3 most accurately captures the VIIRS data. Bayesian models using the same response data but with different features and model parameters can be compared in terms of their predictive accuracy using the leave-one-out (LOO) method, as implemented in the loo package (Vehtari et al., 2024). LOO is a cross-validation technique in which a model is fit to all data points except one, and the omitted point is then used to test the model. This process is repeated for each data point in the dataset. The aim is to assess the model’s predictive performance based on its Expected Log Predictive Density (ELPD). Higher values of ELPD indicate better predictive performance.

fcL3_loo <- loo(fcL3)
fcL3a_loo <- loo(fcL3a)
fcL3b_loo <- loo(fcL3b)
fcL3c_loo <- loo(fcL3c)

loo_compare(fcL3_loo, fcL3a_loo, fcL3b_loo, fcL3c_loo)
##       elpd_diff se_diff
## fcL3c     0.0       0.0
## fcL3b   -18.7       6.2
## fcL3a -1616.4      58.8
## fcL3  -3570.1      94.1

The comparison shows that the combined model (fcL3c) substantially outperforms the three alternative models in the cross-validation runs considered here.

5.8 fcL3c model testing

In this section I test the accuracy of the forecast generated by the combined model on the hold-out test set for 2023. I initially calculate a post-prediction regression adjustment to convert the predicted log fire frequencies back to the original frequency scale.

# model prediction
test <- as.data.frame(predict(fcL3c, newdata = L3_test, allow_new_levels = TRUE))
test <- test[, c(1, 3:4)] 
test <- test %>% mutate(across(everything(), exp))
fcL3c_test <- cbind(L3_test, test)
test <- fcL3c_test[, c(9, 5)]
colnames(test) <- c("est", "viirs")
  
# post-prediction adjustment
post_adj <- lm(test$viirs ~ test$est)
test$adj_est <- predict(post_adj)
## 
## Call:
## lm(formula = test$viirs ~ test$est)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -255.37  -16.34    5.35   13.67  555.51 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.14354    1.99331  -8.601   <2e-16 ***
## test$est      2.38520    0.02689  88.689   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 58.97 on 1138 degrees of freedom
## Multiple R-squared:  0.8736, Adjusted R-squared:  0.8735 
## F-statistic:  7866 on 1 and 1138 DF,  p-value: < 2.2e-16
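
As a quick check, the accuracy statistics cited below can be computed directly from the adjustment fit, for example:

# RMSE and R-squared of the adjusted predictions on the 2023 test set
rmse <- sqrt(mean(residuals(post_adj)^2))
rsq <- summary(post_adj)$r.squared
round(c(RMSE = rmse, R2 = rsq), 2)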

The summary indicates a strong correlation between the predicted and actual fire frequencies in the test set, as evidenced by a Root Mean Square Error (RMSE) of 59.0 and an R-squared value of 0.87. Nonetheless, it’s important to note the presence of substantial gain and offset factors on the original count scale. These factors are the result of compressing the range of the counts by transforming the data into their logarithmic form. Figure 4 presents this information, depicting both the initial, back-transformed predictions (shown as black points) and the adjusted predictions (shown as grey points) on the original frequency (count) scale.

mtest <- ggplot(test) + 
  geom_point(aes(x = adj_est, y = viirs), color = "grey") + 
  geom_point(aes(x = est, y = viirs)) + 
  labs(x = "Predicted fire frequencies (2023)", 
       y = "Observed fire frequencies (2023)") + 
  coord_fixed(ratio = 1) +
  scale_x_continuous(limits = c(-20, 1500)) +
  scale_y_continuous(limits = c(-20, 1500)) +
  geom_abline(intercept = 0, slope = 1, color = "red") +
  theme_minimal()

Figure 4. Observed vs predicted Uganda VIIRS-SNPP fire frequencies for 2023, based on the fcL3c model.

6 Forecast 2024

To use the combined model for generating a 2024 fire forecast, one final step remains: updating the combined model fcL3c with the 2023 data that were initially used for testing. The updated .rds model file should already be in your ./fires/results folder if you followed the osfr download option in section 5 (Models).
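
If you would rather refit than download, the update step can be sketched with brms’ update() method applied to the combined calibration and test data (L3_all). Note that this is a full refit and takes roughly as long as the original fcL3c fit:

# refit the combined model on the 2013 - 2023 data (slow); the prefitted
# fcL3d_all.rds from OSF should give an equivalent model
fcL3d <- update(fcL3c, newdata = L3_all,
                iter = 4000, warmup = 1000, chains = 4, cores = 8,
                control = list(adapt_delta = .975, max_treedepth = 20),
                seed = 1235813)
saveRDS(fcL3d, "./fires/results/fcL3d_all.rds")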

# load forecast model
fcL3d <- readRDS(file = "./fires/results/fcL3d_all.rds")
summary(fcL3d, warning = FALSE)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: log(viirs + 1) ~ log(lag_4 + 1) + s(x, y) + (1 | year/quarter) + (1 | L3) 
##    Data: L3_all (Number of observations: 13676) 
##   Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup draws = 12000
## 
## Smooth Terms: 
##            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sds(sxy_1)     1.71      0.30     1.20     2.39 1.00     2000     3709
## 
## Group-Level Effects: 
## ~L3 (Number of levels: 285) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.24      0.02     0.21     0.28 1.00     3784     6815
## 
## ~year (Number of levels: 12) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.37      0.20     0.03     0.82 1.01      708     1677
## 
## ~year:quarter (Number of levels: 48) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.82      0.10     0.65     1.04 1.00     1997     3771
## 
## Population-Level Effects: 
##            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept      0.99      0.17     0.65     1.33 1.00     2473     3946
## loglag_4P1     0.66      0.01     0.64     0.67 1.00    11572     9206
## sxy_1          4.79      1.15     2.56     7.11 1.00     3459     6174
## sxy_2          0.64      0.98    -1.25     2.56 1.00     3200     5743
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.99      0.01     0.97     1.00 1.00    20054     8417
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The forecast is generated and mapped with:

# predict 2024
ffore <- as.data.frame(predict(fcL3d, newdata = L3_new, allow_new_levels = TRUE))
L3_new <- cbind(L3_new, ffore)
L3_for <- L3_new[, c(1, 3:4, 9, 11:12)]
# select by quarter
ffore_Q1 <- L3_new %>% filter(quarter == 1)
ffore_Q2 <- L3_new %>% filter(quarter == 2)
ffore_Q3 <- L3_new %>% filter(quarter == 3)
ffore_Q4 <- L3_new %>% filter(quarter == 4)

# convert to sf point objects in the grid CRS
Q1_sf <- st_as_sf(ffore_Q1, coords = c("x", "y"), crs = st_crs(L3_grid))
Q2_sf <- st_as_sf(ffore_Q2, coords = c("x", "y"), crs = st_crs(L3_grid))
Q3_sf <- st_as_sf(ffore_Q3, coords = c("x", "y"), crs = st_crs(L3_grid))
Q4_sf <- st_as_sf(ffore_Q4, coords = c("x", "y"), crs = st_crs(L3_grid))
# create rasters
Q1_est <- raster(L3_grid)
Q1_est <- rasterize(Q1_sf, Q1_est, field = "Estimate") 

Q2_est <- raster(L3_grid)
Q2_est <- rasterize(Q2_sf, Q2_est, field = "Estimate") 

Q3_est <- raster(L3_grid)
Q3_est <- rasterize(Q3_sf, Q3_est, field = "Estimate") 

Q4_est <- raster(L3_grid)
Q4_est <- rasterize(Q4_sf, Q4_est, field = "Estimate") 

# write raster stack for post-processing
spreds <- brick(Q1_est, Q2_est, Q3_est, Q4_est)
fname <- paste("./fires/results/", "forecast_quarters.tif", sep = "")
writeRaster(spreds, filename = fname, format = "GTiff", options = "INTERLEAVE=BAND", overwrite = TRUE)

Again, following a bit of GIS post-processing, these are the quarterly 2024 forecast maps.


Figure 5. Quarterly fire (frequency) index forecasts at the L3-GID level for 2024.

However, the forecasts are more suitably documented for reference in a searchable tabular format that also shows the 95% credible prediction intervals (sketched below). An addendum to this notebook in the first quarter of 2025 will show how well the 2024 fires were predicted, and will also serve as a near-real-time model validation step. Please take another look at this notebook in early 2025 for an update.
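
A minimal sketch of such a table, using the DT package loaded in the setup chunk and the L3_for data frame created above (with the Estimate and credible interval columns still on the log fire index scale), is:

# searchable forecast table; numeric values are rounded for display
L3_for %>%
  mutate(across(where(is.numeric), ~ round(.x, 2))) %>%
  datatable(rownames = FALSE, options = list(pageLength = 10))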


7 Main takeaways

  • Thanks to the hard work of the Fire Information for Resource Management System (FIRMS) team at NASA, this notebook is able to demonstrate that producing fire forecasts at very low additional effort and cost, in near-real-time, and at a national scale is feasible. The main value added by the workflow developed here is the Bayesian modeling approach to representing the historical space-time patterns on a DGG, which is primarily intended to further stimulate, enable, and accelerate the development of the types of use cases outlined in section 2. Note that other approaches to generating the forecasts, such as machine learning with e.g., Recurrent Neural Networks and/or Long Short-Term Memory networks, are now “low hanging fruit” and should also be tested in this context.

  • The combined fire forecast model (fcL3c) for Uganda fits the fire frequency data, aggregated to a 30 km resolution DGG on a quarterly time step, well. There is no evidence of either model under- or over-fitting, which was controlled for with both LOO cross-validation and a separate, unseen test set of data from 2023. Model validation is in progress with the VIIRS data being collected this year. What are not modeled here are detection probabilities, fire densities, and durations, all of which can be derived from the frequency data. Also not represented are the associated fire brightness temperature (in deg. K) and fire radiative power (in MW) variables. I plan to look at those in a separate analysis … later.

  • Since there is no ground segment for the VIIRS detections in Uganda, it would also be useful to check for additional fire evidence, such as fire scars, charcoal, and ash deposits, shortly after fires are detected, as part of any land surveys being conducted in the country. Using the currently available data and model results, it would be easy to set up a spatially representative sampling plan for ground validation surveys.

  • Conceptually, fitting, testing, and predicting with the Bayesian models is straightforward. However, these steps already require considerable computational power and careful memory management on standard computers. As shown in this notebook, which was tailored for use on standard computers, the entire workflow can be completed in about 2.5 hours on a MacBook Pro. The code can also be run using GPUs or in cloud environments, which could be crucial for large-scale projects like forecasting fires on e.g. a monthly time-step, at medium spatial resolution across all of Africa. The same would be true for the routine use and updating of machine learning models in this context. Space-time data such as the VIIRS fire frequencies are just inherently big and ever growing.


“I got a little too close-for-comfort to this one. Tete Province, Mozambique, 2006.”

Session info

## R version 4.3.1 (2023-06-16)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Big Sur 11.7.10
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Denver
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] DT_0.30         ggridges_0.5.4  mgcv_1.8-42     nlme_3.1-162   
##  [5] brms_2.20.4     Rcpp_1.0.11     zoo_1.8-12      raster_3.6-23  
##  [9] sf_1.0-14       rgdal_1.6-7     sp_2.1-0        patchwork_1.2.0
## [13] lubridate_1.9.3 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.3    
## [17] purrr_1.0.2     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1   
## [21] ggplot2_3.4.4   tidyverse_2.0.0 osfr_0.2.9     
## 
## loaded via a namespace (and not attached):
##   [1] tensorA_0.36.2       rstudioapi_0.15.0    jsonlite_1.8.7      
##   [4] magrittr_2.0.3       farver_2.1.1         rmarkdown_2.25      
##   [7] ragg_1.2.5           fs_1.6.3             vctrs_0.6.3         
##  [10] memoise_2.0.1        base64enc_0.1-3      terra_1.7-46        
##  [13] htmltools_0.5.6      distributional_0.3.2 curl_5.1.0          
##  [16] sass_0.4.7           StanHeaders_2.26.28  KernSmooth_2.23-21  
##  [19] bslib_0.5.1          htmlwidgets_1.6.2    plyr_1.8.9          
##  [22] cachem_1.0.8         igraph_1.5.1         mime_0.12           
##  [25] lifecycle_1.0.3      pkgconfig_2.0.3      colourpicker_1.3.0  
##  [28] Matrix_1.5-4.1       R6_2.5.1             fastmap_1.1.1       
##  [31] shiny_1.7.5.1        digest_0.6.33        colorspace_2.1-0    
##  [34] ps_1.7.5             textshaping_0.3.6    crosstalk_1.2.0     
##  [37] labeling_0.4.3       fansi_1.0.5          timechange_0.2.0    
##  [40] httr_1.4.7           abind_1.4-5          compiler_4.3.1      
##  [43] proxy_0.4-27         withr_2.5.1          backports_1.4.1     
##  [46] inline_0.3.19        shinystan_2.6.0      DBI_1.1.3           
##  [49] QuickJSR_1.0.7       pkgbuild_1.4.2       classInt_0.4-10     
##  [52] gtools_3.9.4         loo_2.6.0            tools_4.3.1         
##  [55] units_0.8-4          httpuv_1.6.11        threejs_0.3.3       
##  [58] glue_1.6.2           callr_3.7.3          promises_1.2.1      
##  [61] grid_4.3.1           checkmate_2.3.0      reshape2_1.4.4      
##  [64] generics_0.1.3       gtable_0.3.4         tzdb_0.4.0          
##  [67] class_7.3-22         hms_1.1.3            utf8_1.2.3          
##  [70] pillar_1.9.0         markdown_1.9         posterior_1.5.0     
##  [73] later_1.3.1          splines_4.3.1        lattice_0.21-8      
##  [76] tidyselect_1.2.0     miniUI_0.1.1.1       knitr_1.44          
##  [79] gridExtra_2.3        V8_4.4.0             crul_1.4.0          
##  [82] stats4_4.3.1         xfun_0.40            bridgesampling_1.1-2
##  [85] matrixStats_1.0.0    rstan_2.32.3         stringi_1.7.12      
##  [88] yaml_2.3.7           evaluate_0.22        codetools_0.2-19    
##  [91] httpcode_0.3.0       cli_3.6.1            RcppParallel_5.1.7  
##  [94] systemfonts_1.0.4    shinythemes_1.2.0    xtable_1.8-4        
##  [97] munsell_0.5.0        processx_3.8.2       jquerylib_0.1.4     
## [100] coda_0.19-4          parallel_4.3.1       rstantools_2.3.1.1  
## [103] ellipsis_0.3.2       prettyunits_1.2.0    dygraphs_1.1.1.6    
## [106] bayesplot_1.10.0     Brobdingnag_1.2-9    mvtnorm_1.2-3       
## [109] scales_1.2.1         xts_0.13.1           e1071_1.7-13        
## [112] crayon_1.5.2         rlang_1.1.1          shinyjs_2.1.0

Uganda wildfires – A Bayesian approach for forecasting VIIRS-SNPP fire occurrences on a discrete global grid. © 2024 by Markus Walsh is licensed under CC BY 4.0