Smooth Air. Sharp Change? - Farouk Ghodhbane

Context

Let's say a city decides to monitor air quality across its region to ensure the well-being of its population. Sensors are installed as uniformly as possible, collecting Air Quality Index (AQI) measurements across different neighborhoods. These readings can be organized into a spatial grid and analyzed to extract information that supports the city's public health goals.

Now, this is where "Smooth Air. Sharp change?" comes in. It is a C++ application that models the collected AQI data as a raster grid. The user provides an input dataset containing AQI values arranged in rows and columns. As the name suggests, the app applies spatial filtering by first smoothing the data to reduce sensor noise and second detecting regions of sharp, rapid change. Finally, it outputs a summary of the analysis findings.

This experiment focuses on spatial relationships between cells without relying on georeferenced data. Each AQI value is associated with a location defined by its row and column indices within the grid, rather than real-world geographic coordinates.

Implementation

AQI measurements are represented as a 2D raster grid, implemented using a matrix of doubles, where each cell corresponds to an AQI value at a specific spatial location.

Sensors may produce noisy or erroneous readings. To reduce this noise in the dataset, the app applies a 3x3 mean filter. This algorithm replaces each cell with the average of its local 3x3 neighborhood (including itself), producing a smoothed grid with a more consistent distribution.

Edge case: the mean filter algorithm dynamically shrinks the 3x3 neighborhood at grid boundaries by only including valid neighboring cells, ensuring consistent averaging even at corners and edges.

To detect regions of rapid spatial change, the application computes the gradient magnitude grid. It does so by approximating the rate of change in the horizontal and vertical directions at each cell using finite differences. These approximations are then combined to compute the gradient magnitude, which replaces the original cell value.

Edge case: at the borders, the gradient magnitude algorithm switches from central differences to using forward and backward differences. Special cases such as single-row (1 x N) and single-column (N x 1) grids are also handled to avoid invalid neighbor access (out-of-bounds).

The implementation also includes a performance benchmarking process, where a simple timer utility (built around the C++ clock) measures the runtime of both algorithms and shows how it scales with increasing grid sizes (more on that under the "discussion" section below).

Example input:

# rows cols
5 5
10 12 14 13 11
9  11 15 14 10
8  10 13 12 9
7  8  11 10 8
6  7  9  8  7

Results

For each dataset, the application produces a smoothed grid, a gradient grid, and outputs a summary report aggregating key statistics such as minimum and average AQIs, along with the sharpest spatial change and its location. The latter may, for instance, call for a follow-up action to research why the air quality drastically changes at that city's location, potentially pointing to localized environmental factors such as traffic congestions or industrial activity.

Example output:

Summary Report:
-------------------------------------------------
Dataset Information:
  Grid dimensions: 500 x 500

Air Quality (After Smoothing):
  Minimum AQI: 36.833
  Maximum AQI: 62.667
  Average AQI: 49.504

Spatial Pollution Gradient Analysis:
  Mean gradient magnitude: 0.697
  Maximum gradient magnitude: 7.763
  Location of Steepest Change: (row 499, col 318)

Interpretation:
  Overall air quality is classified as GOOD.
  A significant pollution transition zone is detected at the
  reported location with coordinates (499, 318)
  indicating possible localized environmental change.

Testing

Unit tests validate core raster operations: mean filtering and gradient magnitude computation. For example (see below), if a grid is uniform, meaning all of its values are the same, then there is no change and the gradient magnitude should be effectively zero.
Golden tests compare program output against expected results for a predefined small dataset, namely "air_quality_5.txt". This verifies the file parsing, filtering, and gradient computation, and can help detect regressions during future changes.

Sample gradient unit test

// Constant grid: all zeros
TEST(GradientTest, ConstantGridAllZero) {
  RasterGrid grid = MakeGrid(3, 3, {
    {7.0, 7.0, 7.0},
    {7.0, 7.0, 7.0},
    {7.0, 7.0, 7.0}
  });

  RasterGrid result = ComputeGradientMagnitude(grid);

  for (int i = 0; i < 3; ++i) {
    for (int j = 0; j < 3; ++j) {
      ASSERT_NEAR(result.data[i][j], 0.0, kEpsilon);
    }
  }
}

Sample mean filter unit test

// Non uniform 3 x 3
TEST(MeanFilterTest, NonUniformThreeByThree) {
  RasterGrid grid = MakeGrid(3, 3, {
    {1.0, 1.0, 1.0},
    {1.0, 7.0, 1.0},
    {1.0, 1.0, 1.0}
  });

  RasterGrid result = ApplyMeanFilter(grid);

  // Center = (1+1+1+1 + 7 + 1+1+1+1) / 9.0 = 15.0 / 9.0
  ASSERT_NEAR(result.data[1][1], 15.0 / 9.0, kEpsilon);

  // Top left corner = (1+1+1+7) / 4.0 = 2.5
  ASSERT_NEAR(result.data[0][0], 2.5, kEpsilon);
}

Visual

Open in a new tab

Mean Filter and Gradient Magnitude algorithms visual explanation.

Discussion & Resources

Both the mean filtering and gradient magnitude algorithms run in O(n), where n represents the number of cells in the grid. More precisely, this corresponds to O(rows * cols), since each cell is processed once with a constant amount of work. As demonstrated in the measurements below, doubling grid dimensions results in approximately quadrupling the runtime, reflecting the quadrupling of the number of cells.

Sample performance comparison run

rows,cols,mean_filter_ms,gradient_ms
500,500,19,10
1000,1000,71,40
2000,2000,291,157

One limitation of this implementation is the dependency on the grid's resolution. A coarse grid, where there are relatively few cells with each cell covering a relatively large spatial area, may smooth over significant local transitions. On the other hand, a finer grid, where there are more cells with each cell covering a smaller spatial area, can capture more detailed spatial transitions, improving the accuracy of gradient-based change detection, but at the cost of increased computation time as observed in the runtime analysis above.

Another limitation is the assumption of complete data coverage. Each grid cell is assigned a value, implying that measurements exist uniformly across the entire region. In practice, a city may have clusters of sensors localized in one area, while having few, if any, in another. In such cases, preprocessing techniques such as interpolation may be useful in estimating missing values, though this may introduce accuracy tradeoffs.

This experiment is framed as "exp2-raster-model" because it represents spatial data as a continuous surface using a grid of uniformly distributed cells. This approach is commonly used for modeling continuous phenomena such as temperature, wildfire spread, elevation change, and satellite imagery.