Reimagining RSEI Values

Dylan Hamme, 2022

Introduction

RSEI values are unitless scores that help quantify the potential for risk posed by chemical releases from a given location. Several factors influence an RSEI score, and they fall under four categories: the amount of chemical released, the fate and transport of the chemical through the environment, the size and location of the exposed population, and the toxicity of the chemical. The scores are proportional: if one RSEI value is twice as great as another, the potential risk is twice as high. Different chemicals are associated with different effects on the human body, so RSEI values can reflect cancer or noncancer risks. There are separate RSEI scores for these, and an overall score that combines them is also important in weighing potential risk.

From https://www.epa.gov/rsei/understanding-rsei-results

Overview

This is an attempt to more accurately predict the risk of adverse effects on human health posed by industrial contaminant releases in Baltimore. RSEI values are reported for point locations (the facilities) and say nothing about how risk spreads over the surrounding area. I intend to use available data to add that spatial component and show who is affected by the releases, and where.

Analysis

Beginning with the RSEI data table (source) and the table of point locations for each facility, the tables were joined in R to attach the point locations to the RSEI values:

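library(dplyr)  # inner_join(), %>%
library(sf)     # st_as_sf(), st_set_crs(), st_transform()

#Join facility locations to the RSEI records on the shared facility ID
RSEI_geo <- inner_join(Locations, RSEI, by = "TRI.Facility.ID")
#Build an sf point layer from the longitude/latitude columns
RSEI_geo_df <- data.frame(RSEI_geo) %>%
  st_as_sf(coords = c("X13..LONGITUDE", "X12..LATITUDE"))
RSEI_geo_df

#The coordinates are longitude/latitude, so label them as EPSG:4326
RSEI_geo_df <- st_set_crs(RSEI_geo_df, "EPSG:4326")
st_is_longlat(RSEI_geo_df)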

Next, the census data was downloaded using the tidycensus package in R:

library(tidycensus)  # get_acs(); requires a Census API key

balt_census_tract_20 <- get_acs(geography = "tract", 
                           variables = c("pop" = "B03002_001"),
                           year = 2020,
                           survey = "acs5",
                           state = c(24), 
                           county = c(510), 
                           geometry = TRUE, 
                           output = "wide",
                           cache = TRUE)
#Check the CRS of the downloaded tract geometries (sf objects use st_crs())
st_crs(balt_census_tract_20)
#Make sure to complete this when the variables of interest are added

Both datasets were then transformed to the same coordinate reference system (EPSG:6318). To make further calculations possible in QGIS, the RSEI score fields also had to be converted to a numeric format. This required removing the commas from the numbers, changing the data type to numeric, then setting all non-numeric values to zero:

#First change both datasets to a matching CRS
crsuggest::suggest_crs(balt_census_tract_20)

balt_tract_proj <- balt_census_tract_20 %>%
  st_transform(6318)
RSEI_geo_df_proj <- RSEI_geo_df %>%
  st_transform(6318)

#Convert the RSEI fields to numeric; the commas must be removed first or as.numeric() cannot parse the values
RSEI_geo_df_proj$RSEI.Score <- gsub(",","", RSEI_geo_df_proj$RSEI.Score)
RSEI_geo_df_proj$RSEI.Score <- as.numeric(RSEI_geo_df_proj$RSEI.Score)
#as.numeric() returns NA (not NaN) for text it cannot parse, so test with is.na()
RSEI_geo_df_proj$RSEI.Score[is.na(RSEI_geo_df_proj$RSEI.Score)] <- 0

RSEI_geo_df_proj$RSEI.Score.Cancer <- gsub(",","", RSEI_geo_df_proj$RSEI.Score.Cancer)
RSEI_geo_df_proj$RSEI.Score.Cancer <- as.numeric(RSEI_geo_df_proj$RSEI.Score.Cancer)
RSEI_geo_df_proj$RSEI.Score.Cancer[is.na(RSEI_geo_df_proj$RSEI.Score.Cancer)] <- 0

RSEI_geo_df_proj$RSEI.Score.Noncancer <- gsub(",","", RSEI_geo_df_proj$RSEI.Score.Noncancer)
RSEI_geo_df_proj$RSEI.Score.Noncancer <- as.numeric(RSEI_geo_df_proj$RSEI.Score.Noncancer)
RSEI_geo_df_proj$RSEI.Score.Noncancer[is.na(RSEI_geo_df_proj$RSEI.Score.Noncancer)] <- 0

#Save these files to use in QGIS

st_write(balt_tract_proj, "File_location")
st_write(RSEI_geo_df_proj, "File_location")
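
The three score conversions above follow the same pattern, so they could be wrapped in a small helper. This is only a minimal sketch in base R, assuming the scores arrive as character strings such as "1,234.5"; clean_score is a hypothetical name and not part of the original workflow:

```r
# Hypothetical helper: convert comma-formatted score strings to numbers
clean_score <- function(x) {
  # Remove thousands separators, e.g. "1,234.5" -> "1234.5"
  x <- gsub(",", "", x, fixed = TRUE)
  # Text that cannot be parsed becomes NA (the warning is suppressed here)
  x <- suppressWarnings(as.numeric(x))
  # Treat missing or unparseable scores as zero, as in the workflow above
  x[is.na(x)] <- 0
  x
}

# Made-up example values, including an empty string and a non-number
scores <- c("1,234.5", "87", "", "N/A")
cleaned <- clean_score(scores)
cleaned
```

Applied to the real data, the helper would be called once per score column, for example on RSEI_geo_df_proj$RSEI.Score.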

Now the data were ready to be analyzed in QGIS…

Creating RSEI maps before integrating distance

After loading the files saved in R into a QGIS project, the Count Points in Polygon tool was used to total the RSEI values in each tract. This was run three times, each with a different score in the weight field. With a weight field set, the tool sums the RSEI values of the points in each tract, which yields an overall RSEI score per tract that can be interpreted as potential risk.
The table of RSEI values consisted of twenty-two locations split into just under one thousand records, because the values are listed by chemical contaminant. This analysis is concerned with total RSEI values, so the table was aggregated in QGIS, leaving the summed RSEI values per point source for the overall score, cancer-risk score, and noncancer-risk score.
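
The aggregation was done in QGIS, but the same step could also be sketched in R. The example below is an assumption-laden illustration rather than the method used here: the data frame rsei_records and its values are made up, and only the column names follow the ones used earlier in this document.

```r
# Hypothetical per-chemical records: one row per facility and chemical
rsei_records <- data.frame(
  TRI.Facility.ID      = c("A", "A", "B", "B", "B"),
  RSEI.Score           = c(10, 5, 2, 0, 8),
  RSEI.Score.Cancer    = c(4, 1, 0, 0, 3),
  RSEI.Score.Noncancer = c(6, 4, 2, 0, 5)
)

# Sum every score column within each facility, leaving one row per facility
rsei_by_facility <- aggregate(
  cbind(RSEI.Score, RSEI.Score.Cancer, RSEI.Score.Noncancer) ~ TRI.Facility.ID,
  data = rsei_records,
  FUN = sum
)
rsei_by_facility
```

The result has one row per point source, which is the shape needed for the weight field in the tract-level sums.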

The first calculation displays the overall RSEI score per tract:

The second calculation displays only the cancer-related RSEI values per tract:

The third and final calculation displays only the noncancer-related RSEI values per tract:

These values are a good indicator of potential risk, but each score is tied only to the tract that contains the facility; nothing accounts for how close neighboring tracts are to a source. To integrate this, the distance between each point source and each tract must be calculated.

Calculating distances

To begin the distance calculations, each tract was assigned a point in the form of its polygon centroid. The centroids were calculated in R using the st_centroid() function, and the result was checked visually using the tmap package:

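library(tmap)  # tm_shape(), tm_dots(), tm_borders()

#Add each tract's centroid as an extra geometry column named cent
tract_centroids <- balt_census_tract_20 %>%
  mutate(cent = st_centroid(geometry))

#Plot the points over the tract borders as a visual check
cent_test <- tm_shape(tract_centroids) +
  tm_dots() +
  tm_shape(balt_census_tract_20) +
  tm_borders()
cent_test

#extract the points
cent_points_only <- tract_centroids$cent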

Next, a table is calculated for the distances between each point source and the centroid of each tract. The result is a matrix of distances that can be used to adjust the RSEI values at each point source and estimate the potential risk at each tract.

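#Make the centroid column the active geometry; otherwise st_transform() and
#st_distance() would work on the tract polygons rather than the centroids
centroids_proj <- tract_centroids %>%
  st_set_geometry("cent") %>%
  st_transform(6318)

RSEI_condensed <- st_read("point_source_shapefile") %>%
  st_transform(6318)

#Both layers are now in the same CRS (EPSG:6318)

#Rows are tract centroids, columns are point sources
RSEI_distance <- st_distance(centroids_proj, RSEI_condensed)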

Now that there are values for distance, the matrix can be joined back to the centroids shapefile and exported to QGIS.
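
The join itself is not shown, so the following is a hedged sketch of one way it might be done in base R. It relies on row i of the distance matrix belonging to centroid i, and it uses made-up distances and hypothetical names (GEOID values, facility labels A and B, the dist_ prefix). With real st_distance() output the units would likely need to be dropped first, for example with units::drop_units().

```r
# Hypothetical distance matrix: 3 tract centroids (rows) x 2 facilities (columns)
RSEI_distance <- matrix(c(1200, 3400,
                           560, 2100,
                          4300,  800),
                        nrow = 3, byrow = TRUE)

# Hypothetical attribute table for the centroids, in the same row order
centroids <- data.frame(GEOID = c("T1", "T2", "T3"))

# Name the columns after the facilities so each distance stays identifiable
dist_df <- as.data.frame(RSEI_distance)
names(dist_df) <- paste0("dist_", c("A", "B"))

# Row i of the matrix belongs to centroid i, so a column bind is enough
centroids_dist <- cbind(centroids, dist_df)
centroids_dist
```

Because the bind depends on row order, neither table should be sorted or filtered between computing the distances and joining them.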

Three new fields are then created, one for each of the new RSEI values for each tract. These fields are filled with the RSEI value of each point source multiplied by (1/distance), giving distance-adjusted RSEI values at each tract. Because RSEI values are unitless and the weighting depends only on relative distance, the choice of distance units is arbitrary: changing units rescales every adjusted value by the same constant.
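
The (1/distance) weighting can be illustrated with a small worked example. This sketch assumes the contributions from all point sources are summed for each tract, which the later per-tract totals suggest but the text does not state outright; the scores, distances, and names below are made up.

```r
# Hypothetical facility scores (one per point source)
scores <- c(A = 100, B = 40)

# Hypothetical distances in metres: rows = tracts, columns = facilities
dist_m <- matrix(c(1000, 4000,
                   2000, 2000,
                   5000,  500),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("T1", "T2", "T3"), c("A", "B")))

# Weight every facility score by 1/distance and sum across facilities
adjusted <- as.vector((1 / dist_m) %*% scores)
names(adjusted) <- rownames(dist_m)
adjusted

# Same calculation with distances in kilometres: every value is scaled by
# the same factor, so the ranking of tracts does not change
adjusted_km <- as.vector((1 / (dist_m / 1000)) %*% scores)
adjusted_km / adjusted
```

The constant ratio between the two results is why the distance units are arbitrary for ranking tracts. A distance of zero would make the weight infinite, so a centroid that coincides with a facility would need special handling.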

To quickly return the joined data to polygons rather than points, a join by location was completed in QGIS. The data are now complete and ready to be visualized.

Now the analysis can continue in QGIS…

Creating the maps for new RSEI values

Three new maps are created in QGIS using the distance-adjusted RSEI values for each tract. Again, this requires a Count Points in Polygon analysis, with the weight field populated by the distance-adjusted RSEI values.
Following the workflow used for the previous three maps, the same three are recreated:

The first calculation displays the overall RSEI score per tract:

The second calculation displays only the cancer-related RSEI values per tract:

The third and final calculation displays only the noncancer-related RSEI values per tract:

How does this look if we use a Hex grid instead of tracts?

Comparing RSEI values to census variables

Next, the RSEI values per tract can be compared to different census variables. I will look at a few different variables from the census download, and compare them to the Overall RSEI values.
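
One way such a comparison could be quantified is with a rank correlation. The sketch below uses made-up numbers; the column names pop and rsei are hypothetical, and both the choice of Spearman correlation and the exclusion of zero-population tracts are assumptions rather than steps taken in this analysis.

```r
# Hypothetical tract table: population and distance-adjusted overall score
tracts <- data.frame(
  pop  = c(0, 1500, 2300, 3100, 4000, 5200),
  rsei = c(9.0, 1.2, 0.8, 2.5, 0.4, 1.9)
)

# Drop tracts with zero population before comparing
tracts_pop <- subset(tracts, pop > 0)

# Spearman rank correlation is less sensitive to extreme scores than Pearson
rho <- cor(tracts_pop$pop, tracts_pop$rsei, method = "spearman")
rho
```

A value of rho near zero would indicate no monotonic relationship between the census variable and the RSEI values.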

Disappointingly, no clear correlation could be found between the RSEI values and any of the census variables examined; further trials are needed to locate any patterns.

This does lead to some thoughts. The data may be spatially limited, and the results may be more conclusive if a larger area is included. If Baltimore County and the point sources within its boundaries were included, perhaps the community data would show more of a trend. A second observation is the presence of a few outliers. Most noticeably, one tract in the southeastern corner of the city has a population of 0. This skews all of the graphs slightly, as it is the tract with the second-highest RSEI score. The other outlier does have a population count, but its RSEI score is much higher than that of any other tract. These outliers would again be better weighted if the Baltimore County data were included.

The attempt at giving RSEI scores a spatial component works in some respects but falls short in others. It does well at conceptualizing which areas are most affected by contaminant releases, and it can be used to warn interested parties about potential risk. However, this study falls short in integrating social data. The study boundaries were set too small for the findings to be well supported, and increasing the study area would likely lead to more convincing correlations.

References

“EasyRSEI Dashboard Version 2.3.10.” EPA. Environmental Protection Agency. Accessed May 20, 2022. https://edap.epa.gov/public/extensions/EasyRSEI/EasyRSEI.html#.

“Toxics Release Inventory (TRI) Program.” EPA. Environmental Protection Agency, n.d. https://www.epa.gov/toxics-release-inventory-tri-program/tri-basic-data-files-calendar-years-1987-present.