In this third post in the series, we’ll see how to use R and open data to calculate and visualize some basic numbers about intake at the Austin and Sacramento city shelters (Austin Animal Center and City of Sacramento Animal Care Services).
Intake
We’ll look at the open data from 2014 to the time of this writing, approximately two and a half years of data (i.e., thirty-one months; from January 2014 to July 2016). The toolkit provides a function that filters impoundment data for a given range of intake dates, breaks the data into three groups (cats and dogs, cats only, and dogs only), and returns three corresponding data frames. Each data frame has columns for city, group, number of days held, year, month, and day of intake, which makes it easy to pick out rows or combine rows from any of these data frames. By default the function filters out intake events for animals that are not cats or dogs (the Sacramento open data does not include records for other kinds of animals in the first place) and animals being returned from foster care.
atxOpenData <- atxLoadOpenData() sacOpenData <- sacLoadOpenData() StartDate <- "2014-01-01" EndDate <- "2016-07-31" atxIntakeList <- atxCreateIntakeList(atxOpenData, StartDate, EndDate) sacIntakeList <- sacCreateIntakeList(sacOpenData, StartDate, EndDate)
Intake Volume
Once we have our data organized, we can readily calculate the intake volume of cats and dogs by year for each shelter:
> # Austin intake volume by year to date > table(atxIntakeList$cats_dogs$year) 2014 2015 2016 17505 17815 9747 > # Austin intake volume by animal kind and year to date > with(atxIntakeList$cats_dogs, table(kind, year)) year kind 2014 2015 2016 Cat 6937 7486 3776 Dog 10568 10329 5971 > # Sacramento intake volume by year to date > table(sacIntakeList$cats_dogs$year) 2014 2015 2016 9788 10716 6715 > # Sacramento intake volume by animal kind and year to date > with(sacIntakeList$cats_dogs, table(kind, year)) year kind 2014 2015 2016 CAT 4212 5314 3387 DOG 5576 5402 3328
Last year Austin took in 7,486 cats and 10,329 dogs (total 17,815). Sacramento took in 5,314 cats and 5,402 dogs (total 10,716). The Austin ratio of cat to dog intake was 0.725, meaning that for every five cats taken in, seven dogs were taken in. By contrast, the Sacramento ratio was 0.984, meaning that cat and dog intake was one-for-one.
Intake Type
Animal shelters typically categorize intake and outcome events by type and sub-type. Unfortunately, the Austin and Sacramento open data fails to include all of this information. The Austin open data is lacking sub-type for intake events while the Sacramento open data is lacking sub-type for all events. For now, we will look at intake and outcome types only.
Here are the frequency tables of Austin and Sacramento intake types by year:
Austin intake type counts
> # Austin intake volume by animal kind, intake type and year to date > lapply(atxIntakeList, function (data) { with(data, table(year, intake_type)) }) $cats_dogs intake_type year Euthanasia Request Owner Surrender Public Assist Stray 2014 105 3316 1113 12971 2015 74 3176 1043 13522 2016 18 1944 582 7203 $cats intake_type year Euthanasia Request Owner Surrender Public Assist Stray 2014 22 1297 188 5430 2015 14 1267 131 6074 2016 6 770 75 2925 $dogs intake_type year Euthanasia Request Owner Surrender Public Assist Stray 2014 83 2019 925 7541 2015 60 1909 912 7448 2016 12 1174 507 4278
Austin intake type proportions
> # Austin intake volume, same as above, in proportions > lapply(atxIntakeList, function (data) { round(prop.table(with(data, table(year, intake_type)), 1), 2) }) $cats_dogs intake_type year Euthanasia Request Owner Surrender Public Assist Stray 2014 0.01 0.19 0.06 0.74 2015 0.00 0.18 0.06 0.76 2016 0.00 0.20 0.06 0.74 $cats intake_type year Euthanasia Request Owner Surrender Public Assist Stray 2014 0.00 0.19 0.03 0.78 2015 0.00 0.17 0.02 0.81 2016 0.00 0.20 0.02 0.77 $dogs intake_type year Euthanasia Request Owner Surrender Public Assist Stray 2014 0.01 0.19 0.09 0.71 2015 0.01 0.18 0.09 0.72 2016 0.00 0.20 0.08 0.72
Sacramento intake type counts
> # Sacramento intake volume by animal kind, intake type and year to date > lapply(sacIntakeList, function (data) { with(data, table(year, intake_type)) }) $cats_dogs intake_type year CONFISCATE EUTH REQ OWNER SUR PR CUSTODY QUARANTINE RETURN STRAY TRANSFER 2014 406 271 645 0 35 288 8077 66 2015 399 248 513 4 32 327 9163 30 2016 195 70 273 3 26 236 5892 20 $cats intake_type year CONFISCATE EUTH REQ OWNER SUR PR CUSTODY QUARANTINE RETURN STRAY TRANSFER 2014 21 59 205 0 6 127 3776 18 2015 39 80 212 0 4 161 4803 15 2016 19 27 146 3 0 90 3099 3 $dogs intake_type year CONFISCATE EUTH REQ OWNER SUR PR CUSTODY QUARANTINE RETURN STRAY TRANSFER 2014 385 212 440 0 29 161 4301 48 2015 360 168 301 4 28 166 4360 15 2016 176 43 127 0 26 146 2793 17 $dogs intake_type year CONFISCATE EUTH REQ OWNER SUR PR CUSTODY QUARANTINE RETURN STRAY TRANSFER 2014 385 212 440 0 29 161 4301 48 2015 360 168 301 4 28 166 4359 15 2016 151 39 100 0 23 126 2357 17
Sacramento intake type proportions
> # Sacramento intake volume, same as above, in proportions > lapply(sacIntakeList, function (data) { round(prop.table(with(data, table(year, intake_type)), 1), 2) }) $cats_dogs intake_type year CONFISCATE EUTH REQ OWNER SUR PR CUSTODY QUARANTINE RETURN STRAY TRANSFER 2014 0.04 0.03 0.07 0.00 0.00 0.03 0.83 0.01 2015 0.04 0.02 0.05 0.00 0.00 0.03 0.86 0.00 2016 0.03 0.01 0.04 0.00 0.00 0.04 0.88 0.00 $cats intake_type year CONFISCATE EUTH REQ OWNER SUR PR CUSTODY QUARANTINE RETURN STRAY TRANSFER 2014 0.00 0.01 0.05 0.00 0.00 0.03 0.90 0.00 2015 0.01 0.02 0.04 0.00 0.00 0.03 0.90 0.00 2016 0.01 0.01 0.04 0.00 0.00 0.03 0.91 0.00 $dogs intake_type year CONFISCATE EUTH REQ OWNER SUR PR CUSTODY QUARANTINE RETURN STRAY TRANSFER 2014 0.07 0.04 0.08 0.00 0.01 0.03 0.77 0.01 2015 0.07 0.03 0.06 0.00 0.01 0.03 0.81 0.00 2016 0.05 0.01 0.04 0.00 0.01 0.04 0.84 0.01
We can visualize intake volume broken down by intake type as a Cleveland dot
R source code
# # Function: atxSacCreateIntakeTypesPlot # # Creates a Cleveland dot plot of intake type frequency for Austin # and Sacramento intake data. # # Parameters: # # atxIntakeData - Data frame of Austin intake data # sacIntakeData - Data frame of Sacramento intake data # threshold - Intake types appearing in the data fewer times than this # will be dropped # palette - List of colors to use for drawing # xSpace - Space between data point and label in units of the x-axis; # also used to compute the right margin # # Both input data frames must contain intake_type, group, and city columns. # # Returns: # # Plot of intake type frequency for both cities. # atxSacCreateIntakeTypesPlot <- function (atxIntakeData, sacIntakeData, threshold = 150, palette = hmColorPalette(), xSpace = 500) { # Build a data frame that has intake type frequency (column n) by city # and group. Filter out frequencies less than the threshold. intakeTypes <- rbind(count(atxIntakeData, city, group, intake_type), count(sacIntakeData, city, group, intake_type)) %>% filter(n > threshold) xLimit <- max(intakeTypes$n) + (xSpace * 4) # Build a Cleveland dot plot of the intake type frequencies, faceted by city. ggplot(intakeTypes, aes(x = n, y = reorder(intake_type, n))) + geom_segment(aes(yend = intake_type, color = city), xend = 0) + geom_point(size = 4, aes(color = city)) + geom_text(aes(label = n), nudge_x = xSpace, hjust = "left", color = palette$PointLabel) + scale_x_continuous(limits = c(0, xLimit)) + scale_color_manual(values = c(palette$Atx, palette$Sac), labels = c("Atx", "Sac"), guide = FALSE) + xlab("Frequency") + ylab("Intake Type") + hmAxisTheme() + facet_grid(city ~ ., scales = "free_y", space = "free_y") } > # Plot Austin and Sacramento intake volume by intake type, 2015 > atxSacCreateIntakeTypesPlot(filter(atxIntakeList$cats_dogs, year == 2015), filter(sacIntakeList$cats_dogs, year == 2015))
R provides built-in functions for graphing data, as well as two widely used add-on packages:
Daily Intake
In Austin, the shelter is open to the public every day. Heavier intake is seen on Saturday and the first days of the week, and lighter intake on Sunday and the final days of the week. The Sacramento shelter is closed to the public on Mondays and Tuesdays. Consequently, Wednesday has the heaviest intake.
Here is median daily intake for each day of the week at each shelter:
Austin day-of-week intake
> # Austin median daily intake by day of week, 2014 to July 2016 > lapply (atxIntakeList, function (data) { aggregate(n ~ weekday, count(data, intake_date, weekday), median) }) $cats_dogs weekday n 1 Saturday 48 2 Sunday 42 3 Monday 47 4 Tuesday 47 5 Wednesday 47 6 Thursday 45 7 Friday 43 $cats weekday n 1 Saturday 18.0 2 Sunday 15.0 3 Monday 19.0 4 Tuesday 17.5 5 Wednesday 16.0 6 Thursday 16.0 7 Friday 16.0 $dogs weekday n 1 Saturday 29.0 2 Sunday 25.0 3 Monday 28.0 4 Tuesday 27.5 5 Wednesday 31.0 6 Thursday 27.0 7 Friday 27.0
Sacramento day-of-week intake
> # Sacramento median daily intake by day of week, 2014 to July 2016 > lapply (sacIntakeList, function (data) { aggregate(n ~ weekday, count(data, intake_date, weekday), median) }) $cats_dogs weekday n 1 Saturday 35 2 Sunday 27 3 Monday 11 4 Tuesday 13 5 Wednesday 45 6 Thursday 32 7 Friday 31 $cats weekday n 1 Saturday 15.0 2 Sunday 13.0 3 Monday 3.0 4 Tuesday 4.0 5 Wednesday 21.5 6 Thursday 14.0 7 Friday 15.5 $dogs weekday n 1 Saturday 17 2 Sunday 15 3 Monday 8 4 Tuesday 9 5 Wednesday 23 6 Thursday 17 7 Friday 16
In Austin, the median daily intake is 45 dogs and 16 cats, versus Sacramento's 15 dogs and 12 cats. Cat intake is seasonal, however; a winter day might see ten cats taken in at either shelter, compared to an early-summer day that might see 40 or more cats taken in by Austin and 30 or more cats taken in by Sacramento.
Austin daily intake distributions
> # Austin distributions of daily intake, 2014 to July 2016 > lapply(atxIntakeList, function (data) { summary(count(data, intake_date)$n) }) $cats_dogs Min. 1st Qu. Median Mean 3rd Qu. Max. 8.00 37.00 45.00 47.79 56.00 137.00 $cats Min. 1st Qu. Median Mean 3rd Qu. Max. 1.00 10.00 16.00 19.34 26.00 95.00 $dogs Min. 1st Qu. Median Mean 3rd Qu. Max. 7.00 23.00 28.00 28.49 34.00 82.00
Sacramento daily intake distributions
> # Sacramento distributions of daily intake, 2014 to July 2016 > lapply(sacIntakeList, function (data) { summary(count(data, intake_date)$n) }) $cats_dogs Min. 1st Qu. Median Mean 3rd Qu. Max. 1.00 17.00 28.00 28.89 39.00 89.00 $cats Min. 1st Qu. Median Mean 3rd Qu. Max. 1.00 5.00 12.00 14.32 20.00 68.00 $dogs Min. 1st Qu. Median Mean 3rd Qu. Max. 1.0 10.0 15.0 15.2 19.0 44.0
Here are plots of the time series and a Loess curve
R source code
# # Function: atxCreateDailyVolumePlot # # Creates a plot of volume against date for Austin, overlaying the # line plot with a smoothed mean curve. # # Parameters: # # atxData - Data frame of Austin data # smooth - How much to smooth the curve model # palette - List of colors to use for drawing # yLabel - Label for the y axis # yLimit - Use this maximum for y-axis instead of the computed maximum # # Both input data frames must contain intake_date, group, and city columns. # # Returns: # # Line plot of daily intake volume # atxCreateDailyVolumePlot <- function (atxData, smooth = 0.2, palette = hmColorPalette(), yLabel = "Volume", yLimit = NULL) { # Get data frame containing (city, group, intake_date, n) # columns (i.e., a count for each city, group, date combination). atxVolumeData <- count(atxData, city, group, intake_date) # Scale the y-axis either directly or according to the max data. if (is.null(yLimit)) yLimit = max(atxVolumeData$n) # Build a plot of the daily intake, overlaid with a curve # of the smoothed mean. ggplot(atxVolumeData, aes(x = intake_date, y = n)) + geom_line(color = palette$AtxPoint, alpha = 0.6, lineend = "round") + geom_smooth(color = palette$Atx, se = FALSE, span = smooth) + ylab(yLabel) + xlab("Date") + hmAxisTheme() + theme(legend.position = "none") + scale_y_continuous(limits = c(0, yLimit)) } > # Plot Austin daily intake volume, 2014 to July 2016 > atxCreateDailyVolumePlot(atxIntakeList$cats, yLabel = "Daily Intake Volume \u2013 Cats", yLimit = 85) > atxCreateDailyVolumePlot(atxIntakeList$dogs, yLabel = "Daily Intake Volume \u2013 Dogs", yLimit = 85)
Cat intake typically has a sinusoidal shape, following the ebb and flow that comes from the seasonal mating of intact cats, whose offspring flow into animal shelters. More adult cats enter animal shelters during the warmer months, as well. The high point for cat intake is in late spring or summer, and the low point is in winter.
Here are the same plots for Sacramento:
Monthly Intake
We can also look at the intake data aggregated by month. Here is the plot for cats, showing the monthly data points, a Loess curve model, and a linear model (n.b., the aspect ratio differs from the graphs above):
R source code
# # Function: atxSacCreateMonthlyVolumePlot # # Creates a plot of volume against month for each city, overlaying the # point plot with a smoothed mean curve and a trend line. # # Parameters: # # atxData - Data frame of Austin data # sacData - Data frame of Sacramento data # smooth - How much to smooth the line graph of the mean. # palette - List of colors to use for drawing # yLabel - Label for the y axis # yLimit - Use this maximum for y-axis instead of the computed maximum # # Both input data frames must contain month, group, and city columns. # # Returns: # # Box plot of monthly volume by city.group # atxSacCreateMonthlyVolumePlot <- function (atxData, sacData, smooth = 0.2, palette = hmColorPalette(), yLabel = "Volume", yLimit = NULL) { # Get data frame for each city containing (city, group, month, n) # columns (i.e., a count for each city, group, month combination). atxVolumeData <- count(atxData, city, group, month) sacVolumeData <- count(sacData, city, group, month) # Scale y-axis directly or by finding maximum data value. if (is.null(yLimit)) { yLimit = max(max(atxVolumeData$n), max(sacVolumeData$n)) } # Build a plot of the monthly intake for each city, overlaid with a curve # of the smoothed mean and a trend line. ggplot(sacVolumeData, aes(x = month, y = n)) + geom_smooth(data = sacVolumeData, alpha = 0.1, color = palette$Sac, se = FALSE, method = lm, linetype = "dotted") + geom_smooth(aes(color = "sac"), data = sacVolumeData, se = FALSE, span = smooth) + geom_smooth(data = atxVolumeData, color = palette$Atx, se = FALSE, method = lm, linetype = "dotted") + geom_smooth(aes(color = "atx"), data = atxVolumeData, se = FALSE, span = smooth) + geom_point(data = atxVolumeData, color = palette$AtxPoint, stroke = 0, alpha = 0.6, size = 3) + geom_point(data = sacVolumeData, color = palette$SacPoint, stroke = 0, alpha = 0.5, size = 3) + ylab(yLabel) + xlab("Month") + hmAxisTheme() + labs(color = "City") + scale_color_manual(values = c("sac" = palette$Sac, "atx" = palette$Atx), labels = c("Atx", "Sac")) + scale_y_continuous(limits = c(0, yLimit)) } > # Plot Austin and Sacramento monthly intake volume for cats, 2014 to July 2016 > atxSacCreateMonthlyVolumePlot(atxIntakeList$cats, sacIntakeList$cats, yLabel = "Intake Volume \u2013 Cats", smooth = 0.32, yLimit = 1200)
Austin intake is affected by storm season in central Texas. Flash flooding, which displaces people and animals from their homes, is most common in April, May, and June, and causes spikes in animal shelter intake. The conjunction of kitten season and storm season is a special challenge for Austin, since cat intake can surge past 1,100 cats in a single month (versus lows around 300 cats per month in the winter).
Here is the plot of monthly intake over the past two and a half years for dogs:
Sacramento monthly dog intake generally ranges between 400 and 520. Austin monthly dog intake generally ranges between 800 and 900, but has exceeded 1,000 in May and June. Combined with surging cat intake, the Austin shelter can come up against early summer months in which over two thousand animals are taken in.
Austin monthly intake distributions
> # Austin distributions of monthly intake, 2014 to July 2016 > lapply(atxIntakeList, function (data) { summary(count(data, month)$n) }) $cats_dogs Min. 1st Qu. Median Mean 3rd Qu. Max. 1079 1204 1436 1454 1626 2117 $cats Min. 1st Qu. Median Mean 3rd Qu. Max. 266.0 337.5 573.0 587.1 781.0 1103.0 $dogs Min. 1st Qu. Median Mean 3rd Qu. Max. 765.0 818.0 857.0 866.7 893.5 1020.0
Sacramento monthly intake distributions
> # Sacramento distributions of monthly intake, 2014 to July 2016 > lapply(sacIntakeList, function (data) { summary(count(data, month)$n) }) $cats_dogs Min. 1st Qu. Median Mean 3rd Qu. Max. 538 751 845 878 1000 1255 $cats Min. 1st Qu. Median Mean 3rd Qu. Max. 114.0 261.5 430.0 416.5 515.5 805.0 $dogs Min. 1st Qu. Median Mean 3rd Qu. Max. 398.0 430.5 450.0 461.5 495.0 543.0
The monthly intake data can also be represented and manipulated in R using a time series class of
> # Austin time series of monthly intake volume, 2014 to July 2016 (31 points) > print( atxTsList <- lapply(atxIntakeList, function (data) { count(data, group, month)$n %>% ts(start = c(2014, 1), end = c(2016, 7), frequency = 12) }) ) $cats_dogs Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 2014 1212 1087 1248 1424 1867 1699 1807 1520 1621 1458 1329 1233 2015 1161 1079 1197 1436 1997 2117 1573 1630 1538 1638 1353 1096 2016 1152 1089 1196 1474 1941 1557 1338 $cats Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 2014 335 269 353 566 901 821 881 679 704 605 482 341 2015 295 266 340 537 1009 1103 778 812 754 784 488 320 2016 304 279 333 663 921 703 573 $dogs Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 2014 877 818 895 858 966 878 926 841 917 853 847 892 2015 866 813 857 899 988 1014 795 818 784 854 865 776 2016 848 810 863 811 1020 854 765
And here are the time series objects for Sacramento monthly intake:
> # Sacramento time series of monthly intake volume, 2014 to July 2016 (31 points) > print( sacTsList <- lapply(sacIntakeList, function (data) { count(data, group, month)$n %>% ts(start = c(2014, 1), end = c(2016, 7), frequency = 12) }) ) $cats_dogs Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 2014 584 538 777 742 944 840 1011 1018 845 905 763 821 2015 741 733 749 923 1100 989 1145 931 891 1023 829 662 2016 779 660 753 921 1255 1221 1126 $cats Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 2014 145 114 263 319 485 430 497 546 433 416 286 278 2015 247 255 301 493 656 591 642 517 485 514 398 215 2016 260 222 257 487 805 750 606 $dogs Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 2014 439 424 514 423 459 410 514 472 412 489 477 543 2015 494 478 448 430 444 398 503 414 406 509 431 447 2016 519 438 496 434 450 471 520
Time series can be decomposed to extract trend and seasonal components. The basic idea is to estimate the trend component, by smoothing the data or by fitting a regression model, and then estimate the seasonal component, by averaging the de-trended seasonal data points (e.g., the December seasonal effect comes from the data points for all Decembers in the series). What is left over is called the "random" or the "remainder" component. Summing the three components yields the original data. Here is a quick plot of the Austin monthly cat intake series, decomposed by
R source code
> # Austin decomposition of monthly cat intake time series, 2014 to July 2016 (31 points) > plot(stl(atxTsList$cats, s.window = "periodic"), col.range = "orange")
In this plot, the orange range bars give a relative sense of scale, as each bar represents the same distance along the y-axis. Because two and a half years is not much data, the trend we observe is a local trend only. Still, we can plainly see that cat intake was higher during the summer of 2015 than the previous summer, and then was appreciably lower during the summer of 2016.
Another way of visualizing the same data is by subtracting out the seasonal effect and observing the trend plus the random component. Here are the seasonally adjusted time series for monthly cat intake and dog intake at both shelters:
Austin dog intake has been trending down for the past year and a half (at an approximate rate of three fewer dogs per month). Sacramento cat intake has been trending up for two and a half years (at an approximate rate of six more cats per month).
Distinct Animals
So far we have been analyzing intake events, but, when an animal has been taken in and discharged several times, multiple intake events will pertain to the same animal. Generally, animal shelters try to assign the identical animal identification number whenever they know an animal coming in has been impounded before. Therefore, we can get some idea about recurring intakes by looking at unique animal IDs.
Austin intake events vs. distinct animals
> # Austin counts of intake events and distinct animals, 2014 to July 2016 > lapply(atxIntakeList, function(data) { c(intake_events = nrow(data), animals = length(unique(data$animal_id))) }) $cats_dogs intake_events animals 45067 41306 $cats intake_events animals 18199 17612 $dogs intake_events animals 26868 23694
Sacramento intake events vs. distinct animals
> # Sacramento counts of intake events and distinct animals, 2014 to July 2016 > lapply(sacIntakeList, function(data) { c(intake_events = nrow(data), animals = length(unique(data$animal_id))) }) $cats_dogs intake_events animals 27219 25736 $cats intake_events animals 12913 12415 $dogs intake_events animals 14306 13321
Austin intake recurrence counts and proportions
> # Austin counts and proportions of animals taken up once, twice, etc. > lapply(atxIntakeList, calcIntakeRecurrence) $cats_dogs times_impounded num_animals proportion 1 1 38216 0.925 2 2 2584 0.063 3 3 395 0.010 4 4 75 0.002 5 5 26 0.001 6 6 6 0.000 7 7 2 0.000 8 8 1 0.000 9 10 1 0.000 $cats times_impounded num_animals proportion 1 1 17069 0.969 2 2 501 0.028 3 3 40 0.002 4 4 2 0.000 $dogs times_impounded num_animals proportion 1 1 21147 0.893 2 2 2083 0.088 3 3 355 0.015 4 4 73 0.003 5 5 26 0.001 6 6 6 0.000 7 7 2 0.000 8 8 1 0.000 9 10 1 0.000
Sacramento intake recurrence counts and proportions
> # Sacramento counts and proportions of animals taken up once, twice, etc. > lapply(sacIntakeList, calcIntakeRecurrence) $cats_dogs times_impounded num_animals proportion 1 1 24432 0.949 2 2 1162 0.045 3 3 117 0.005 4 4 18 0.001 5 5 4 0.000 6 6 1 0.000 7 7 2 0.000 $cats times_impounded num_animals proportion 1 1 11951 0.963 2 2 433 0.035 3 3 28 0.002 4 4 3 0.000 $dogs times_impounded num_animals proportion 1 1 12481 0.937 2 2 729 0.055 3 3 89 0.007 4 4 15 0.001 5 5 4 0.000 6 6 1 0.000 7 7 2 0.000
R source code
# # Function: calcIntakeRecurrence # # Create a data frame describing recurring intake events in the # given data. # # Parameters: # # intakeData - Intake data frame # # Returns: # # Data frame describing recurring intake events, with columns # for number of times taken in, count of animals taken in that # many times, and the proportion of animals. # calcIntakeRecurrence <- function (intakeData) { # Group the intake events by animal ID and then count the number # of intake events for each distinct animal. Create a frequency table # (really a data frame) from the results. intakeRecurrenceData <- as.data.frame(table(summarize(group_by(intakeData, animal_id), count = n())$count)) colnames(intakeRecurrenceData) <- c("times_impounded", "num_animals") # Add the column that states the proportion of animals taken in # that many times. intakeRecurrenceData$proportion <- with(intakeRecurrenceData, round(num_animals / sum(num_animals), digits = 3)) return(intakeRecurrenceData) }
In Austin, during the past two and a half years, about 11% of distinct dogs impounded were impounded more than once, and 3% of distinct cats. In Sacramento, the proportions were 6% of distinct dogs and 4% of distinct cats. The table below shows the breakdown:
Austin | Sacramento | |||
---|---|---|---|---|
Cats | Dogs | Cats | Dogs | |
Once | 97 % | 89 % | 96 % | 94 % |
More than once | 3 % | 11 % | 4 % | 6 % |
More than twice | < 1 % | 2 % | < 1 % | 1 % |
Adoption Churn
We might also ask how many distinct animals were adopted and then returned one or more times. The best approach is to identify return intake events by intake sub-type, but that information is not available in the open data. Consequently, we will estimate adoption churn by simply calculating the proportions of dogs and cats with multiple adoption outcome events.
Austin adoption recurrence counts and proportions
> # Austin counts and proportions of animals adopted once, twice, etc. > lapply(atxOutcomeList, calcAdoptionRecurrence) $cats_dogs times_adopted num_animals proportion 1 1 16431 0.933 2 2 1057 0.060 3 3 111 0.006 4 4 14 0.001 $cats times_adopted num_animals proportion 1 1 6567 0.964 2 2 229 0.034 3 3 19 0.003 $dogs times_adopted num_animals proportion 1 1 9864 0.914 2 2 828 0.077 3 3 92 0.009 4 4 14 0.001
Sacramento adoption recurrence counts and proportions
> # Sacramento counts and proportions of animals adopted once, twice, etc. > lapply(sacOutcomeList, calcAdoptionRecurrence) $cats_dogs times_adopted num_animals proportion 1 1 10904 0.938 2 2 669 0.058 3 3 50 0.004 4 4 5 0.000 $cats times_adopted num_animals proportion 1 1 5566 0.942 2 2 320 0.054 3 3 19 0.003 4 4 3 0.001 $dogs times_adopted num_animals proportion 1 1 5338 0.933 2 2 349 0.061 3 3 31 0.005 4 4 2 0.000
In Austin, during the past two and a half years, about 9% of distinct dogs that were adopted out had multiple adoption events, and 4% of distinct cats. In Sacramento, the proportions were 7% of distinct dogs adopted out and 6% of distinct cats. The table below shows the breakdown:
Austin | Sacramento | |||
---|---|---|---|---|
Cats | Dogs | Cats | Dogs | |
Once | 96 % | 91 % | 94 % | 93 % |
More than once | 4 % | 9 % | 6 % | 7 % |
More than twice | < 1 % | 1 % | < 1 % | < 1 % |
To Be Continued
The basic numbers describing intake tell us a little bit, but our best awareness will come from investigating outcomes. Axiomatically, life-saving performance can only be improved by looking at outcomes, by scrutinizing what animals are still being killed and understanding the circumstances and forces at work.
Austin Pets Alive! has for many years employed a data-driven strategy to identify and save classes of shelter animals with high probabilities of losing their lives because of lethal
In the next post in this series, we'll explore the outcome data for the Austin and Sacramento shelters, and see what we can learn.
References
-
Cleveland, William S. 1994. The Elements of Graphing Data. Hobart Press. (Amazon)
-
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer. (Amazon)
-
Sarkar, Deepayan. 2008. Lattice: Multivariate Data Visualization with R. Springer. (Amazon)
-
Cleveland, William S. 1979. Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association 74, no. 368: 829-836.
-
Cowpertwait, Paul S.P., and Andrew Metcalfe. 2009. Introductory Time Series with R. Springer. (Amazon)
-
Cleveland, Robert B., William Cleveland, Jean McRae, and Irma Terpenning. 1990. STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics 6, no. 1: 3-73.
-
Jefferson, Ellen. 2012. Using Data to Make Austin a No-Kill City.
-
Auerbach, Kristen. 2015. How Austin Became America's Largest No-Kill City.
- Cleveland, William S. 1994. The Elements of Graphing Data. Hobart Press. (Amazon)
- Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer. (Amazon)
- Sarkar, Deepayan. 2008. Lattice: Multivariate Data Visualization with R. Springer. (Amazon)
- Cleveland, William S. 1979. Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association 74, no. 368: 829-836.
- Cowpertwait, Paul S.P., and Andrew Metcalfe. 2009. Introductory Time Series with R. Springer. (Amazon)
- Cleveland, Robert B., William Cleveland, Jean McRae, and Irma Terpenning. 1990. STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics 6, no. 1: 3-73.
- Jefferson, Ellen. 2012. Using Data to Make Austin a No-Kill City.
- Auerbach, Kristen. 2015. How Austin Became America's Largest No-Kill City.