Aligning with EIA’s metrics

Assembling the tools

This post is the second in a series exploring the DOE’s Eagle-I dataset on energy grid outages. See previous posts here: first

First, I want to pause and discuss differences in the data available from Eagle-I and the Energy Information Administration (EIA), the latter appearing in this article.

The main EIA metrics of interest include multiple “duration” indices, and one “frequency” index:

System Average Interruption Duration Index (SAIDI),
System Average Interruption Frequency Index (SAIFI), and
Customer Average Interruption Duration Index (CAIDI).

In the article’s second figure (copied below), SAIDI and SAIFI show up on the horizontal and vertical axes, respectively.

For a quick-hit understanding, check the EIA’s video guide

All this background to say that SAIDI divides the number of customers affected by an outage by the number of customers in the system. We do not have the latter available to us in Eagle-I, so for the sake of this exercise, I will pull in county-level population data from the tidycensus package, and see how closely we can replicate the EIA’s published values.

As always, there are important assumptions embedded in our assumptions. Here, we’re assuming everyone in every county in the US is connected to the grid, or “in the system”, to use the EIA’s language. Not sure I like the sound of “in the system”, but alas, we’re stuck with the term.

Calculating SAIDI

Here’s an example of a function (calc_saidi) that takes our 15min-interval Eagle-I data and calculates a SAIDI metric. Along the way, it creates a variable for the numerator in the SAIDI formula (total number of customer hours without power), joins the necessary census data for the denominator, then converts time intervals into the proper units for SAIDI in the concluding mutate call. Giving the function a meaningful name will make it easier to understand what the targets pipeline is doing.

calc_saidi <- function(eaglei_df,
                       census_df,
                       # Time intervals are 15min per documentation, or 0.25h
                       data_interval = as.numeric(0.25, units = "hours"),
                       summ_by) {
  
  eaglei_df %>%
    summarise(
      # Within each outage interval, how many customers were affected
      tot_cust_hrs = sum(customers_out),
      # whether grouped by state, state+county, etc.
      .by = all_of(summ_by)
    ) %>% 
    # Join the relevant population data (may need to add year to possible vars)
    join_eaglei_census(census_df,
                       join_spec = intersect(summ_by,
                                             c("state", "county"))) %>% 
    # Turn number of intervals into units of time (hours by default)
    mutate(
      across(c(tot_cust_hrs), \(x) { x * data_interval}),
      # Then calculate SAIDI
      saidi = tot_cust_hrs / pop
    )
}

In _targets.R, we call calc_saidi in whichever targets require it, using the feature-enhanced Eagle-I dataset in the preceding target, named add_features, and specifying different variables to summarize by. See the screengrab below for example function output from the county_saidi target.

list(
  # Targets before...,
  tar_target(
    name = add_features,
    command = add_outage_id(states_eaglei)
  ),
  tar_target(
    name = county_saidi,
    command = calc_saidi(add_features,
                         states_census,
                         summ_by = c("state", "county"))
  ),
  tar_target(
    name = state_saidi,
    command = calc_saidi(add_features,
                         states_census,
                         summ_by = c("state"))
  )
  # Targets after...
)

The state SAIDI target produces similar results, but fewer rows (one per state). They don’t exactly line up with the EIA chart, but states’ relative performances are close enough to justify further investigation (read: next post).

SAIFI & CAIDI

I discussed metrics which I have not yet quantified. To recap, SAIFI is the number of interruptions the average customer experienced, and CAIDI (Customer Average Interruption Duration Index), the average length of each interruption.

At first I hoped to tackle multiple metrics with one function, but for now I’m opting to keep them separate. Calculating SAIFI (or CAIDI) may not end up looking too different from the SAIDI function, but I am leaving that (and potential consolidation of the functions) to future posts. Gotta stretch out the material ;)

I will continue to document my work here, but superfans (lol) or future contributors can follow commits to the underlying repo as well!

Aligning with EIA’s metrics

Assembling the tools

Calculating SAIDI

SAIFI & CAIDI

Support my work with a cafecito

Share