Animal Shelter Data Analysis Software Technical

Austin & Sacramento Open Data #3: Intake

In this third post in the series, we’ll see how to use R and open data to calculate and visualize some basic numbers about intake at the Austin and Sacramento city shelters (Austin Animal Center and City of Sacramento Animal Care Services).

Intake

We’ll look at the open data from 2014 to the time of this writing, approximately two and a half years of data (i.e., thirty-one months; from January 2014 to July 2016). The toolkit provides a function that filters impoundment data for a given range of intake dates, breaks the data into three groups (cats and dogs, cats only, and dogs only), and returns three corresponding data frames. Each data frame has columns for city, group, number of days held, year, month, and day of intake, which makes it easy to pick out rows or combine rows from any of these data frames. By default the function filters out intake events for animals that are not cats or dogs (the Sacramento open data does not include records for other kinds of animals in the first place) and animals being returned from foster care.

atxOpenData <- atxLoadOpenData()
sacOpenData <- sacLoadOpenData()

StartDate <- "2014-01-01"
EndDate <- "2016-07-31"
    
atxIntakeList <- atxCreateIntakeList(atxOpenData, StartDate, EndDate)
sacIntakeList <- sacCreateIntakeList(sacOpenData, StartDate, EndDate)

Intake Volume

Once we have our data organized, we can readily calculate the intake volume of cats and dogs by year for each shelter:

> # Austin intake volume by year to date
> table(atxIntakeList$cats_dogs$year)
 2014  2015  2016 
17505 17815  9747

> # Austin intake volume by animal kind and year to date
> with(atxIntakeList$cats_dogs, table(kind, year))
     year
kind   2014  2015  2016
  Cat  6937  7486  3776
  Dog 10568 10329  5971

> # Sacramento intake volume by year to date
> table(sacIntakeList$cats_dogs$year)
 2014  2015  2016 
 9788 10716  6715

> # Sacramento intake volume by animal kind and year to date
> with(sacIntakeList$cats_dogs, table(kind, year))
     year
kind  2014 2015 2016
  CAT 4212 5314 3387
  DOG 5576 5402 3328

Last year Austin took in 7,486 cats and 10,329 dogs (total 17,815). Sacramento took in 5,314 cats and 5,402 dogs (total 10,716). The Austin ratio of cat to dog intake was 0.725, meaning that for every five cats taken in, seven dogs were taken in. By contrast, the Sacramento ratio was 0.984, meaning that cat and dog intake was one-for-one.

Intake Type

Animal shelters typically categorize intake and outcome events by type and sub-type. Unfortunately, the Austin and Sacramento open data fails to include all of this information. The Austin open data is lacking sub-type for intake events while the Sacramento open data is lacking sub-type for all events. For now, we will look at intake and outcome types only.

Here are the frequency tables of Austin and Sacramento intake types by year:

Austin intake type counts
> # Austin intake volume by animal kind, intake type and year to date
> lapply(atxIntakeList, function (data) { with(data, table(year, intake_type)) })
$cats_dogs
      intake_type
year   Euthanasia Request Owner Surrender Public Assist Stray
  2014                105            3316          1113 12971
  2015                 74            3176          1043 13522
  2016                 18            1944           582  7203

$cats
      intake_type
year   Euthanasia Request Owner Surrender Public Assist Stray
  2014                 22            1297           188  5430
  2015                 14            1267           131  6074
  2016                  6             770            75  2925

$dogs
      intake_type
year   Euthanasia Request Owner Surrender Public Assist Stray
  2014                 83            2019           925  7541
  2015                 60            1909           912  7448
  2016                 12            1174           507  4278
Austin intake type proportions
> # Austin intake volume, same as above, in proportions
> lapply(atxIntakeList, function (data) { round(prop.table(with(data, table(year, intake_type)), 1), 2) })
$cats_dogs
      intake_type
year   Euthanasia Request Owner Surrender Public Assist Stray
  2014               0.01            0.19          0.06  0.74
  2015               0.00            0.18          0.06  0.76
  2016               0.00            0.20          0.06  0.74

$cats
      intake_type
year   Euthanasia Request Owner Surrender Public Assist Stray
  2014               0.00            0.19          0.03  0.78
  2015               0.00            0.17          0.02  0.81
  2016               0.00            0.20          0.02  0.77

$dogs
      intake_type
year   Euthanasia Request Owner Surrender Public Assist Stray
  2014               0.01            0.19          0.09  0.71
  2015               0.01            0.18          0.09  0.72
  2016               0.00            0.20          0.08  0.72
Sacramento intake type counts
> # Sacramento intake volume by animal kind, intake type and year to date
> lapply(sacIntakeList, function (data) { with(data, table(year, intake_type)) })
$cats_dogs
      intake_type
year   CONFISCATE EUTH REQ OWNER SUR PR CUSTODY QUARANTINE RETURN STRAY TRANSFER
  2014        406      271       645          0         35    288  8077       66
  2015        399      248       513          4         32    327  9163       30
  2016        195       70       273          3         26    236  5892       20

$cats
      intake_type
year   CONFISCATE EUTH REQ OWNER SUR PR CUSTODY QUARANTINE RETURN STRAY TRANSFER
  2014         21       59       205          0          6    127  3776       18
  2015         39       80       212          0          4    161  4803       15
  2016         19       27       146          3          0     90  3099        3

$dogs
      intake_type
year   CONFISCATE EUTH REQ OWNER SUR PR CUSTODY QUARANTINE RETURN STRAY TRANSFER
  2014        385      212       440          0         29    161  4301       48
  2015        360      168       301          4         28    166  4360       15
  2016        176       43       127          0         26    146  2793       17

$dogs
      intake_type
year   CONFISCATE EUTH REQ OWNER SUR PR CUSTODY QUARANTINE RETURN STRAY TRANSFER
  2014        385      212       440          0         29    161  4301       48
  2015        360      168       301          4         28    166  4359       15
  2016        151       39       100          0         23    126  2357       17
Sacramento intake type proportions
> # Sacramento intake volume, same as above, in proportions
> lapply(sacIntakeList, function (data) { round(prop.table(with(data, table(year, intake_type)), 1), 2) })
$cats_dogs
      intake_type
year   CONFISCATE EUTH REQ OWNER SUR PR CUSTODY QUARANTINE RETURN STRAY TRANSFER
  2014       0.04     0.03      0.07       0.00       0.00   0.03  0.83     0.01
  2015       0.04     0.02      0.05       0.00       0.00   0.03  0.86     0.00
  2016       0.03     0.01      0.04       0.00       0.00   0.04  0.88     0.00

$cats
      intake_type
year   CONFISCATE EUTH REQ OWNER SUR PR CUSTODY QUARANTINE RETURN STRAY TRANSFER
  2014       0.00     0.01      0.05       0.00       0.00   0.03  0.90     0.00
  2015       0.01     0.02      0.04       0.00       0.00   0.03  0.90     0.00
  2016       0.01     0.01      0.04       0.00       0.00   0.03  0.91     0.00

$dogs
      intake_type
year   CONFISCATE EUTH REQ OWNER SUR PR CUSTODY QUARANTINE RETURN STRAY TRANSFER
  2014       0.07     0.04      0.08       0.00       0.01   0.03  0.77     0.01
  2015       0.07     0.03      0.06       0.00       0.01   0.03  0.81     0.00
  2016       0.05     0.01      0.04       0.00       0.01   0.04  0.84     0.01

We can visualize intake volume broken down by intake type as a Cleveland dot plot.[1] Below is the 2015 plot of intake of cats and dogs (n.b., infrequent intake types are omitted). Stray intake dominates, which is typical. Last year in Austin 76% of cats and dogs taken in were strays, whereas 86% were strays in Sacramento.

Intake type frequency, Austin and Sacramento, 2015
Plot of animal shelter intake type frequency for 2015
R source code
#
#   Function: atxSacCreateIntakeTypesPlot
#
#       Creates a Cleveland dot plot of intake type frequency for Austin
#       and Sacramento intake data.
#
#   Parameters:
#
#       atxIntakeData - Data frame of Austin intake data
#       sacIntakeData - Data frame of Sacramento intake data
#       threshold     - Intake types appearing in the data fewer times than this
#                       will be dropped
#       palette       - List of colors to use for drawing
#       xSpace        - Space between data point and label in units of the x-axis;
#                       also used to compute the right margin
#
#       Both input data frames must contain intake_type, group, and city columns.
#
#   Returns:
#
#       Plot of intake type frequency for both cities.
#

atxSacCreateIntakeTypesPlot <- function (atxIntakeData, sacIntakeData, threshold = 150,
                                         palette = hmColorPalette(), xSpace = 500)
{
    # Build a data frame that has intake type frequency (column n) by city
    # and group. Filter out frequencies less than the threshold.

    intakeTypes <- rbind(count(atxIntakeData, city, group, intake_type),
                         count(sacIntakeData, city, group, intake_type)) %>%
                   filter(n > threshold)

    xLimit <- max(intakeTypes$n) + (xSpace * 4)

    # Build a Cleveland dot plot of the intake type frequencies, faceted by city.

    ggplot(intakeTypes, aes(x = n, y = reorder(intake_type, n))) +
        geom_segment(aes(yend = intake_type, color = city), xend = 0) +
        geom_point(size = 4, aes(color = city)) +
        geom_text(aes(label = n), nudge_x = xSpace, hjust = "left", color = palette$PointLabel) +
        scale_x_continuous(limits = c(0, xLimit)) +
        scale_color_manual(values = c(palette$Atx, palette$Sac), labels = c("Atx", "Sac"), guide = FALSE) +
        xlab("Frequency") + ylab("Intake Type") + hmAxisTheme() +
        facet_grid(city ~ ., scales = "free_y", space = "free_y")
}

> # Plot Austin and Sacramento intake volume by intake type, 2015
> atxSacCreateIntakeTypesPlot(filter(atxIntakeList$cats_dogs, year == 2015), filter(sacIntakeList$cats_dogs, year == 2015))

R provides built-in functions for graphing data, as well as two widely used add-on packages: ggplot2[2] and Lattice[3]. We are using ggplot2, and, for those interested, the R source code for each graph is viewable in a drop-down panel.

Daily Intake

In Austin, the shelter is open to the public every day. Heavier intake is seen on Saturday and the first days of the week, and lighter intake on Sunday and the final days of the week. The Sacramento shelter is closed to the public on Mondays and Tuesdays. Consequently, Wednesday has the heaviest intake.

Here is median daily intake for each day of the week at each shelter:

Austin day-of-week intake
> # Austin median daily intake by day of week, 2014 to July 2016
> lapply (atxIntakeList, function (data) { aggregate(n ~ weekday, count(data, intake_date, weekday), median) })
$cats_dogs
    weekday  n
1  Saturday 48
2    Sunday 42
3    Monday 47
4   Tuesday 47
5 Wednesday 47
6  Thursday 45
7    Friday 43

$cats
    weekday    n
1  Saturday 18.0
2    Sunday 15.0
3    Monday 19.0
4   Tuesday 17.5
5 Wednesday 16.0
6  Thursday 16.0
7    Friday 16.0

$dogs
    weekday    n
1  Saturday 29.0
2    Sunday 25.0
3    Monday 28.0
4   Tuesday 27.5
5 Wednesday 31.0
6  Thursday 27.0
7    Friday 27.0
Sacramento day-of-week intake
> # Sacramento median daily intake by day of week, 2014 to July 2016
> lapply (sacIntakeList, function (data) { aggregate(n ~ weekday, count(data, intake_date, weekday), median) })
$cats_dogs
    weekday  n
1  Saturday 35
2    Sunday 27
3    Monday 11
4   Tuesday 13
5 Wednesday 45
6  Thursday 32
7    Friday 31

$cats
    weekday    n
1  Saturday 15.0
2    Sunday 13.0
3    Monday  3.0
4   Tuesday  4.0
5 Wednesday 21.5
6  Thursday 14.0
7    Friday 15.5

$dogs
    weekday  n
1  Saturday 17
2    Sunday 15
3    Monday  8
4   Tuesday  9
5 Wednesday 23
6  Thursday 17
7    Friday 16

In Austin, the median daily intake is 45 dogs and 16 cats, versus Sacramento's 15 dogs and 12 cats. Cat intake is seasonal, however; a winter day might see ten cats taken in at either shelter, compared to an early-summer day that might see 40 or more cats taken in by Austin and 30 or more cats taken in by Sacramento.

Austin daily intake distributions
> # Austin distributions of daily intake, 2014 to July 2016
> lapply(atxIntakeList, function (data) { summary(count(data, intake_date)$n) })
$cats_dogs
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   8.00   37.00   45.00   47.79   56.00  137.00 

$cats
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00   10.00   16.00   19.34   26.00   95.00 

$dogs
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   7.00   23.00   28.00   28.49   34.00   82.00 
Sacramento daily intake distributions
> # Sacramento distributions of daily intake, 2014 to July 2016
> lapply(sacIntakeList, function (data) { summary(count(data, intake_date)$n) })
$cats_dogs
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00   17.00   28.00   28.89   39.00   89.00 

$cats
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    5.00   12.00   14.32   20.00   68.00 

$dogs
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0    10.0    15.0    15.2    19.0    44.0
Distribution shapes
Daily intake distributions, Austin and Sacramento, 2014 to July 2016
AtxSacDailyIntakeVolumeDistributions

Here are plots of the time series and a Loess curve model[4] of the Austin daily cat intake and daily dog intake:

Daily cat intake volume, Austin, 2014 to July 2016
AtxDailyIntakeVolumeCats
Daily dog intake volume, Austin, 2014 to July 2016
AtxDailyIntakeVolumeDogs
R source code
#
#   Function: atxCreateDailyVolumePlot
#
#       Creates a plot of volume against date for Austin, overlaying the
#       line plot with a smoothed mean curve.
#
#   Parameters:
#
#       atxData - Data frame of Austin data
#       smooth  - How much to smooth the curve model
#       palette - List of colors to use for drawing
#       yLabel  - Label for the y axis
#       yLimit  - Use this maximum for y-axis instead of the computed maximum
#
#       Both input data frames must contain intake_date, group, and city columns.
#
#   Returns:
#
#       Line plot of daily intake volume
#

atxCreateDailyVolumePlot <- function (atxData,
                                      smooth = 0.2,
                                      palette = hmColorPalette(),
                                      yLabel = "Volume",
                                      yLimit = NULL)
{
    # Get data frame containing (city, group, intake_date, n)
    # columns (i.e., a count for each city, group, date combination).
    
    atxVolumeData <- count(atxData, city, group, intake_date)

    # Scale the y-axis either directly or according to the max data.
    
    if (is.null(yLimit))
        yLimit = max(atxVolumeData$n)
    
    # Build a plot of the daily intake, overlaid with a curve
    # of the smoothed mean.
    
    ggplot(atxVolumeData, aes(x = intake_date, y = n)) +
        geom_line(color = palette$AtxPoint, alpha = 0.6, lineend = "round") +
        geom_smooth(color = palette$Atx, se = FALSE, span = smooth) +
        ylab(yLabel) + xlab("Date") + hmAxisTheme() + theme(legend.position = "none") +
        scale_y_continuous(limits = c(0, yLimit))
}

> # Plot Austin daily intake volume, 2014 to July 2016
> atxCreateDailyVolumePlot(atxIntakeList$cats, yLabel = "Daily Intake Volume  \u2013  Cats", yLimit = 85)
> atxCreateDailyVolumePlot(atxIntakeList$dogs, yLabel = "Daily Intake Volume  \u2013  Dogs", yLimit = 85)

Cat intake typically has a sinusoidal shape, following the ebb and flow that comes from the seasonal mating of intact cats, whose offspring flow into animal shelters. More adult cats enter animal shelters during the warmer months, as well. The high point for cat intake is in late spring or summer, and the low point is in winter.

Here are the same plots for Sacramento:

Daily cat intake volume, Sacramento, 2014 to July 2016
SacDailyIntakeVolumeCats
Daily dog intake volume, Sacramento, 2014 to July 2016
SacDailyIntakeVolumeDogs

Monthly Intake

We can also look at the intake data aggregated by month. Here is the plot for cats, showing the monthly data points, a Loess curve model, and a linear model (n.b., the aspect ratio differs from the graphs above):

Monthly cat intake volume, Austin and Sacramento, 2014 to July 2016
AtxSacMonthlyIntakeVolumeCats
R source code
#
#   Function: atxSacCreateMonthlyVolumePlot
#
#       Creates a plot of volume against month for each city, overlaying the
#       point plot with a smoothed mean curve and a trend line.
#
#   Parameters:
#
#       atxData - Data frame of Austin data
#       sacData - Data frame of Sacramento data
#       smooth  - How much to smooth the line graph of the mean.
#       palette - List of colors to use for drawing
#       yLabel  - Label for the y axis
#       yLimit  - Use this maximum for y-axis instead of the computed maximum
#
#       Both input data frames must contain month, group, and city columns.
#
#   Returns:
#
#       Box plot of monthly volume by city.group
#

atxSacCreateMonthlyVolumePlot <- function (atxData, sacData,
                                           smooth = 0.2,
                                           palette = hmColorPalette(),
                                           yLabel = "Volume",
                                           yLimit = NULL)
{
    # Get data frame for each city containing (city, group, month, n)
    # columns (i.e., a count for each city, group, month combination).
    
    atxVolumeData <- count(atxData, city, group, month)
    sacVolumeData <- count(sacData, city, group, month)
    
    # Scale y-axis directly or by finding maximum data value.
    
    if (is.null(yLimit))
    {  
        yLimit = max(max(atxVolumeData$n), max(sacVolumeData$n))
    }

    # Build a plot of the monthly intake for each city, overlaid with a curve
    # of the smoothed mean and a trend line.

    ggplot(sacVolumeData, aes(x = month, y = n)) +
        geom_smooth(data = sacVolumeData, alpha = 0.1, color = palette$Sac, se = FALSE, method = lm, linetype = "dotted") +
        geom_smooth(aes(color = "sac"), data = sacVolumeData, se = FALSE, span = smooth) +
        geom_smooth(data = atxVolumeData, color = palette$Atx, se = FALSE, method = lm, linetype = "dotted") +
        geom_smooth(aes(color = "atx"), data = atxVolumeData, se = FALSE, span = smooth) +
        geom_point(data = atxVolumeData, color = palette$AtxPoint, stroke = 0, alpha = 0.6, size = 3) +
        geom_point(data = sacVolumeData, color = palette$SacPoint, stroke = 0, alpha = 0.5, size = 3) +
        ylab(yLabel) + xlab("Month") + hmAxisTheme() + labs(color = "City") +
        scale_color_manual(values = c("sac" = palette$Sac, "atx" = palette$Atx), labels = c("Atx", "Sac")) +
        scale_y_continuous(limits = c(0, yLimit))
}

> # Plot Austin and Sacramento monthly intake volume for cats, 2014 to July 2016
> atxSacCreateMonthlyVolumePlot(atxIntakeList$cats, sacIntakeList$cats, yLabel = "Intake Volume  \u2013  Cats", smooth = 0.32, yLimit = 1200)

Austin intake is affected by storm season in central Texas. Flash flooding, which displaces people and animals from their homes, is most common in April, May, and June, and causes spikes in animal shelter intake. The conjunction of kitten season and storm season is a special challenge for Austin, since cat intake can surge past 1,100 cats in a single month (versus lows around 300 cats per month in the winter).

Here is the plot of monthly intake over the past two and a half years for dogs:

Monthly dog intake volume, Austin and Sacramento, 2014 to July 2016
AtxSacMonthlyIntakeVolumeDogs

Sacramento monthly dog intake generally ranges between 400 and 520. Austin monthly dog intake generally ranges between 800 and 900, but has exceeded 1,000 in May and June. Combined with surging cat intake, the Austin shelter can come up against early summer months in which over two thousand animals are taken in.

Austin monthly intake distributions
> # Austin distributions of monthly intake, 2014 to July 2016
> lapply(atxIntakeList, function (data) { summary(count(data, month)$n) })
$cats_dogs
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1079    1204    1436    1454    1626    2117 

$cats
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  266.0   337.5   573.0   587.1   781.0  1103.0 

$dogs
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  765.0   818.0   857.0   866.7   893.5  1020.0 
Sacramento monthly intake distributions
> # Sacramento distributions of monthly intake, 2014 to July 2016
> lapply(sacIntakeList, function (data) { summary(count(data, month)$n) })
$cats_dogs
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    538     751     845     878    1000    1255 

$cats
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  114.0   261.5   430.0   416.5   515.5   805.0 

$dogs
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  398.0   430.5   450.0   461.5   495.0   543.0

The monthly intake data can also be represented and manipulated in R using a time series class of object.[5] The organization of a time series object is set by its frequency, which is the integral number of data points in each time cycle. In our case, there are twelve data points per year, yielding a numeric matrix in which each row is a year and each column is a recurring month. Here are the time series objects for monthly Austin intake:

> # Austin time series of monthly intake volume, 2014 to July 2016 (31 points)
> print( atxTsList <- lapply(atxIntakeList, function (data) { count(data, group, month)$n %>% ts(start = c(2014, 1), end = c(2016, 7), frequency = 12) }) )
$cats_dogs
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
2014 1212 1087 1248 1424 1867 1699 1807 1520 1621 1458 1329 1233
2015 1161 1079 1197 1436 1997 2117 1573 1630 1538 1638 1353 1096
2016 1152 1089 1196 1474 1941 1557 1338                         

$cats
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
2014  335  269  353  566  901  821  881  679  704  605  482  341
2015  295  266  340  537 1009 1103  778  812  754  784  488  320
2016  304  279  333  663  921  703  573                         

$dogs
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
2014  877  818  895  858  966  878  926  841  917  853  847  892
2015  866  813  857  899  988 1014  795  818  784  854  865  776
2016  848  810  863  811 1020  854  765                         

And here are the time series objects for Sacramento monthly intake:

> # Sacramento time series of monthly intake volume, 2014 to July 2016 (31 points)
> print( sacTsList <- lapply(sacIntakeList, function (data) { count(data, group, month)$n %>% ts(start = c(2014, 1), end = c(2016, 7), frequency = 12) }) )
$cats_dogs
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
2014  584  538  777  742  944  840 1011 1018  845  905  763  821
2015  741  733  749  923 1100  989 1145  931  891 1023  829  662
2016  779  660  753  921 1255 1221 1126                         

$cats
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2014 145 114 263 319 485 430 497 546 433 416 286 278
2015 247 255 301 493 656 591 642 517 485 514 398 215
2016 260 222 257 487 805 750 606                    

$dogs
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2014 439 424 514 423 459 410 514 472 412 489 477 543
2015 494 478 448 430 444 398 503 414 406 509 431 447
2016 519 438 496 434 450 471 520                    

Time series can be decomposed to extract trend and seasonal components. The basic idea is to estimate the trend component, by smoothing the data or by fitting a regression model, and then estimate the seasonal component, by averaging the de-trended seasonal data points (e.g., the December seasonal effect comes from the data points for all Decembers in the series). What is left over is called the "random" or the "remainder" component. Summing the three components yields the original data. Here is a quick plot of the Austin monthly cat intake series, decomposed by STL:[6]

Decomposition of Austin cat intake time series, 2014 to July 2016
AtxMonthlyCatIntakeStl
R source code
> # Austin decomposition of monthly cat intake time series, 2014 to July 2016 (31 points)
> plot(stl(atxTsList$cats, s.window = "periodic"), col.range = "orange")

In this plot, the orange range bars give a relative sense of scale, as each bar represents the same distance along the y-axis. Because two and a half years is not much data, the trend we observe is a local trend only. Still, we can plainly see that cat intake was higher during the summer of 2015 than the previous summer, and then was appreciably lower during the summer of 2016.

Another way of visualizing the same data is by subtracting out the seasonal effect and observing the trend plus the random component. Here are the seasonally adjusted time series for monthly cat intake and dog intake at both shelters:

Trend and seasonally adjusted cat intake, Austin and Sacramento, 2014 to July 2016
AtxSacSeasonallyAdjustedMonthlyIntakeCats
Trend and seasonally adjusted dog intake, Austin and Sacramento, 2014 to July 2016
AtxSacSeasonallyAdjustedMonthlyIntakeDogs

Austin dog intake has been trending down for the past year and a half (at an approximate rate of three fewer dogs per month). Sacramento cat intake has been trending up for two and a half years (at an approximate rate of six more cats per month).

Distinct Animals

So far we have been analyzing intake events, but, when an animal has been taken in and discharged several times, multiple intake events will pertain to the same animal. Generally, animal shelters try to assign the identical animal identification number whenever they know an animal coming in has been impounded before. Therefore, we can get some idea about recurring intakes by looking at unique animal IDs.

Austin intake events vs. distinct animals
> # Austin counts of intake events and distinct animals, 2014 to July 2016 
> lapply(atxIntakeList, function(data) { c(intake_events = nrow(data), animals = length(unique(data$animal_id))) })
$cats_dogs
intake_events       animals 
        45067         41306 

$cats
intake_events       animals 
        18199         17612 

$dogs
intake_events       animals 
        26868         23694 
Sacramento intake events vs. distinct animals
> # Sacramento counts of intake events and distinct animals, 2014 to July 2016 
> lapply(sacIntakeList, function(data) { c(intake_events = nrow(data), animals = length(unique(data$animal_id))) })
$cats_dogs
intake_events       animals 
        27219         25736 

$cats
intake_events       animals 
        12913         12415 

$dogs
intake_events       animals 
        14306         13321
Austin intake recurrence counts and proportions
> # Austin counts and proportions of animals taken up once, twice, etc.
> lapply(atxIntakeList, calcIntakeRecurrence)
$cats_dogs
  times_impounded num_animals proportion
1               1       38216      0.925
2               2        2584      0.063
3               3         395      0.010
4               4          75      0.002
5               5          26      0.001
6               6           6      0.000
7               7           2      0.000
8               8           1      0.000
9              10           1      0.000

$cats
  times_impounded num_animals proportion
1               1       17069      0.969
2               2         501      0.028
3               3          40      0.002
4               4           2      0.000

$dogs
  times_impounded num_animals proportion
1               1       21147      0.893
2               2        2083      0.088
3               3         355      0.015
4               4          73      0.003
5               5          26      0.001
6               6           6      0.000
7               7           2      0.000
8               8           1      0.000
9              10           1      0.000
Sacramento intake recurrence counts and proportions
> # Sacramento counts and proportions of animals taken up once, twice, etc.
> lapply(sacIntakeList, calcIntakeRecurrence)
$cats_dogs
  times_impounded num_animals proportion
1               1       24432      0.949
2               2        1162      0.045
3               3         117      0.005
4               4          18      0.001
5               5           4      0.000
6               6           1      0.000
7               7           2      0.000

$cats
  times_impounded num_animals proportion
1               1       11951      0.963
2               2         433      0.035
3               3          28      0.002
4               4           3      0.000

$dogs
  times_impounded num_animals proportion
1               1       12481      0.937
2               2         729      0.055
3               3          89      0.007
4               4          15      0.001
5               5           4      0.000
6               6           1      0.000
7               7           2      0.000
R source code
#
#   Function: calcIntakeRecurrence
#
#       Create a data frame describing recurring intake events in the
#       given data.
#
#   Parameters:
#
#       intakeData  - Intake data frame
#
#   Returns:
#
#       Data frame describing recurring intake events, with columns
#       for number of times taken in, count of animals taken in that
#       many times, and the proportion of animals.
#

calcIntakeRecurrence <- function (intakeData)
{
    # Group the intake events by animal ID and then count the number
    # of intake events for each distinct animal. Create a frequency table
    # (really a data frame) from the results.

    intakeRecurrenceData <- as.data.frame(table(summarize(group_by(intakeData, animal_id), count = n())$count))
    colnames(intakeRecurrenceData) <- c("times_impounded", "num_animals")
    
    # Add the column that states the proportion of animals taken in
    # that many times.

    intakeRecurrenceData$proportion <- with(intakeRecurrenceData, round(num_animals / sum(num_animals), digits = 3))

    return(intakeRecurrenceData)
}

In Austin, during the past two and a half years, about 11% of distinct dogs impounded were impounded more than once, and 3% of distinct cats. In Sacramento, the proportions were 6% of distinct dogs and 4% of distinct cats. The table below shows the breakdown:

Austin Sacramento
Cats Dogs Cats Dogs
Once 97 % 89 % 96 % 94 %
More than once 3 % 11 % 4 % 6 %
More than twice < 1 % 2 % < 1 % 1 %
Proportions of cat and dog recurring intake, 2014 to July 2016

Adoption Churn

We might also ask how many distinct animals were adopted and then returned one or more times. The best approach is to identify return intake events by intake sub-type, but that information is not available in the open data. Consequently, we will estimate adoption churn by simply calculating the proportions of dogs and cats with multiple adoption outcome events.

Austin adoption recurrence counts and proportions
> # Austin counts and proportions of animals adopted once, twice, etc.
> lapply(atxOutcomeList, calcAdoptionRecurrence)
$cats_dogs
  times_adopted num_animals proportion
1             1       16431      0.933
2             2        1057      0.060
3             3         111      0.006
4             4          14      0.001

$cats
  times_adopted num_animals proportion
1             1        6567      0.964
2             2         229      0.034
3             3          19      0.003

$dogs
  times_adopted num_animals proportion
1             1        9864      0.914
2             2         828      0.077
3             3          92      0.009
4             4          14      0.001
Sacramento adoption recurrence counts and proportions
> # Sacramento counts and proportions of animals adopted once, twice, etc.
> lapply(sacOutcomeList, calcAdoptionRecurrence)
$cats_dogs
  times_adopted num_animals proportion
1             1       10904      0.938
2             2         669      0.058
3             3          50      0.004
4             4           5      0.000

$cats
  times_adopted num_animals proportion
1             1        5566      0.942
2             2         320      0.054
3             3          19      0.003
4             4           3      0.001

$dogs
  times_adopted num_animals proportion
1             1        5338      0.933
2             2         349      0.061
3             3          31      0.005
4             4           2      0.000

In Austin, during the past two and a half years, about 9% of distinct dogs that were adopted out had multiple adoption events, and 4% of distinct cats. In Sacramento, the proportions were 7% of distinct dogs adopted out and 6% of distinct cats. The table below shows the breakdown:

Austin Sacramento
Cats Dogs Cats Dogs
Once 96 % 91 % 94 % 93 %
More than once 4 % 9 % 6 % 7 %
More than twice < 1 % 1 % < 1 % < 1 %
Proportions of cat and dog recurring adoptions, 2014 to July 2016

To Be Continued

The basic numbers describing intake tell us a little bit, but our best awareness will come from investigating outcomes. Axiomatically, life-saving performance can only be improved by looking at outcomes, by scrutinizing what animals are still being killed and understanding the circumstances and forces at work.

Austin Pets Alive! has for many years employed a data-driven strategy to identify and save classes of shelter animals with high probabilities of losing their lives because of lethal injection.[7] As a result, and while there are still some classes of animals that are not well-served, the Austin Animal Center with the help of Austin Pets Alive! has developed uncommon expertise in saving, caring for, and re-homing a range of animals, including those with injuries or illnesses.[8]

In the next post in this series, we'll explore the outcome data for the Austin and Sacramento shelters, and see what we can learn.

Photo "Indie!!!!!!" by Eileen McFall is licensed under CC BY-NC 2.0, cropped from original.

References

  1. Cleveland, William S. 1994. The Elements of Graphing Data. Hobart Press. (Amazon)
  2. Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer. (Amazon)
  3. Sarkar, Deepayan. 2008. Lattice: Multivariate Data Visualization with R. Springer. (Amazon)
  4. Cleveland, William S. 1979. Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association 74, no. 368: 829-836.
  5. Cowpertwait, Paul S.P., and Andrew Metcalfe. 2009. Introductory Time Series with R. Springer. (Amazon)
  6. Cleveland, Robert B., William Cleveland, Jean McRae, and Irma Terpenning. 1990. STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics 6, no. 1: 3-73.
  7. Jefferson, Ellen. 2012. Using Data to Make Austin a No-Kill City.
  8. Auerbach, Kristen. 2015. How Austin Became America's Largest No-Kill City.

Leave a Reply

Your email address will not be published. Required fields are marked *