Think, Know, Prove is a regular Saturday feature, where a topic with both mystery and importance is posted for community discussion. The title is a shortened version of the Investigative Mantra: What do we think, what do we know, what can we prove? Everything from wild speculation to resource-backed fact is welcome here.
I know that last week’s post was a little scattered; I hope you don’t expect differently from this week’s, given that I’m, again, writing it in a bit of a rush, and all jacked up on cold medicine. (Digression: Maybe this will be the year that I start keeping longitudinal data about when I get sick–I swear that I’ve been sick in the fifth week for at least the last six semesters in a row, and I don’t know if it’s the increase in students interacting with students and then interacting with me (lots of them seem to be getting sick in weeks three and four, or so they say), or if it’s my kids being back in school and rubbing up against their little Petri dish friends, or if it’s the fact that I pretty much stop exercising, eating well, sleeping reasonably, and the rest. It’s annoying, though. I hate having a cold.)
So, a little more on KPIs (Key Performance Indicators) before we move on to PBF next week. In last week’s post I talked a little about what they are and a little about what we might learn from the story of how putting some emphasis on them led to both resistance and opportunity in professional baseball. I did not mean, however, to overemphasize the connections between education and baseball, and I don’t want to let the conversation get derailed by the emphasis on sport in general, or baseball in particular, or even on Moneyball for that matter–the book and the story described therein have plenty of critics, too. We don’t have to come up with KPIs because of Moneyball, nor because of our corporate overlords.
We have to develop them because our new Mayor promised, while running for election, to develop institutional report cards for all of the city and sister agencies that would show how they were doing in key areas of responsibility (see #9, #34, and #36). He has promised to post these report cards for all of the agencies on the city web site (a.k.a., the data portal), but allowed the institutions themselves, at least initially, to develop the measures. In other words, we, along with everyone else—CTA, Park District, Fire Department, etc.—have to come up with a way of telling the public (by way of the Mayor) what our won-loss record is (to bring it back to sports terms).
This has to be done with care, obviously, for the same reason that it has to be done with care by hospitals: simply adding up how many patients died and how many lived likely provides a misleading picture of a hospital’s quality. The challenge for the hospital is that, to the general public, that may well be exactly the stat they want to see, owing to their own desires and poor or careless thinking about the topic. So, assuming they don’t have the resources or time to embark on a massive health literacy campaign, they would have to define what a Win looks like through the metrics they choose. In other words, their stats have to tell a story. If they don’t, then the public/Mayor won’t accept them and will likely demand the commonly accepted and utterly misleading measures.
When the public thinks about the fire department, they might want to know how many fires the department puts out and how many people are saved, and guess that seeing those numbers will tell them how the fire department is doing. But those numbers are out of the fire department’s control, and if the department considers fire prevention a key part of its job, it might say that its goal is to keep those numbers low, while the public is thinking that high numbers equal excellence. The department would say that just counting the number of fires we put out doesn’t tell you what we do, and so might focus its self-measurement on responsiveness, or fire hydrant readiness, or building inspections, or something like that.
So, as the Realist pointed out in comments under last week’s TKP post, defining what a “win” is (or, to drop the sports analogy, an effective interaction of our institution with the members of the public who avail themselves of it) has to be the first step. I am writing here to urge all of those involved to think big and creatively, rather than merely follow the path of least resistance or be tempted to “juke the stats,” as they say in The Wire, in order to merely keep feeding at the trough.
I have a lot more to say about all of that, but not the time to develop it. A few key ideas, though–all influenced by, but not entirely explained by, Moneyball–are these:
- Not all stats are KPIs, and the most dangerous ones are often the familiar stats: widely accepted as KPIs, but empirically unrelated to the aim of the organization. Statistics had long been a part of baseball, and a long time ago (back in the late 19th century) a handful of statistics became the standard KPIs. But not every statistic or data point is a KPI, even ones that have long been thought to be. For example, the RBI (Runs Batted In) was a long valued and highly prized indicator of hitting prowess. It turns out, though, that it is a very poor predictor of anything. Why? Because the statistic relies on so many variables out of the batter’s control–that other people were on base, in scoring position, etc.–there turned out to be no correlation between a batter’s RBI totals and offensive production. For decades, it seems, baseball people had been operating under the assumption that runs were produced by hits, when it turns out that runs are produced by avoiding outs, and so a much better KPI for a hitter is the measure of how often the hitter avoids making an out. (A team that makes no outs scores an infinite number of runs.) Batting average is another important and slightly misleading stat. A batter who hits .333 and never draws a walk gets on base one time for every two outs the batter makes. Another batter who hits only .250 (two hits per eight at bats) but draws a walk in every nine trips to the plate also ends up avoiding an out three times in every nine plate appearances. In other words, though their stats look very different, their value to the offense is about the same. Pointing this out doesn’t mean that knocking in a run (earning an RBI) isn’t valuable–it’s still a good thing! It just isn’t very helpful in distinguishing between highly valuable hitters and average hitters. The kicker is that most people think it IS exactly that–a means of evaluating a hitter’s excellence.
In other words, we should be on the lookout for statistics and measures that have been traditionally accepted as obvious and important that aren’t (because they are based on false assumptions).
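For the quantitatively inclined, the batting-average arithmetic above can be checked in a few lines. This is just a sketch of the two hypothetical hitters from the example–not real players, and it ignores plate-appearance wrinkles like sacrifices and hit-by-pitches:

```python
def on_base_rate(hits, walks, at_bats):
    """Fraction of plate appearances in which the hitter avoids an out.

    Plate appearances are simplified here to at bats plus walks
    (ignoring sacrifices, hit-by-pitches, etc.).
    """
    return (hits + walks) / (at_bats + walks)

# Hitter A: bats .333 and never walks -- on base 3 times per 9 plate appearances.
hitter_a = on_base_rate(hits=3, walks=0, at_bats=9)

# Hitter B: bats .250 (2 hits in 8 at bats) and draws 1 walk per 9 plate
# appearances -- also on base 3 times per 9.
hitter_b = on_base_rate(hits=2, walks=1, at_bats=8)

print(round(hitter_a, 3), round(hitter_b, 3))  # both 0.333
```

Despite an 83-point gap in batting average, both hitters avoid an out at the same rate, which is the point: the familiar stat and the meaningful stat can disagree.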
- Inputs are key. Another lesson of the story is that some things are very hard to measure. In baseball, that hard thing is defense. It’s hard to determine how a team’s defense contributes to a team’s victory (and so how much value to put on a player’s defensive skills, as well as how to measure those skills). The traditional statistic for measuring defense was the “error,” which Bill James pointed out is completely misleading. Players get charged an error for not making a play that the “official scorer” thought they should have made. In other words, it’s a moral judgment as much as a performance one, and it’s likely that bad defenders could still have very few errors while good ones end up with more. Take two second basemen–one who covers a lot of ground and one who covers very little. The former is likely to come near, or even get to, many balls that the other one can’t get near. Some of those might get turned into outs, but some of them might require difficult throws or more challenging glove work and so look like errors, whereas if the other fielder were playing they would simply have been scored as hits. And if that doesn’t make much sense, don’t worry about it. The point here is not errors or second basemen. The point is that good stats require good, meaningful inputs, and non-meaningful inputs tell lies.
- Models matter. Applying data analysis in a meaningful way requires having some testable hypotheses about what matters and then testing those hypotheses. As those hypotheses are developed and tested, they can be combined in more and more complex ways in order to build a model, or a formulaic understanding, of what is happening and why. No model is perfect, nor is it supposed to be. One way to test a model is to come up with statistical formulae and then apply them retroactively to years’ worth of historical data, to see if they accurately predict the outcomes that actually occurred. If they don’t, then one of three things must be the case: either we haven’t included all of the meaningful variables, or we’ve included a variable that is not meaningful (and so skews the prediction), or we have the values wrong. I’m dreaming of a formula built from national statistics that could accurately predict some outcome or other (course completion, degree completion, successful transfer, whatever–more on that in a minute)–one that takes into account various meaningful factors (preparation, first-generation student status, percentage part time, incoming skills, and whatever others are suggested as meaningful in the literature) and produces a prediction that matches (roughly) the actual numbers of students who achieved those outcomes. If we had a model like that, we could build the formula, punch in our data, and see if we had more, fewer, or the same as the prediction. If more of our students achieve, consistently, maybe we’re outperforming the average school. If fewer, maybe we’re underperforming. If we’re the same, then we’re at least as good as everyone else. Or even just a model that provides some sort of baseline.
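The fit-then-backtest idea above can be sketched with a toy model. Everything here is invented for illustration–the single predictor (fraction of part-time students), the peer-institution numbers, and our own figures are all hypothetical; a real model would draw on national statistics and several factors from the literature, not one made-up variable:

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical historical data from peer institutions:
# fraction of part-time students vs. observed completion rate.
part_time = [0.30, 0.45, 0.55, 0.60, 0.70]
completion = [0.42, 0.36, 0.31, 0.29, 0.24]

slope, intercept = fit_line(part_time, completion)

def predicted_completion(frac_part_time):
    """The model's baseline: expected completion rate for a given mix of students."""
    return slope * frac_part_time + intercept

# Backtest: how far off is the model on the historical data it was built from?
residuals = [actual - predicted_completion(x)
             for x, actual in zip(part_time, completion)]

# Compare our (hypothetical) institution against the baseline.
our_part_time, our_completion = 0.50, 0.37
gap = our_completion - predicted_completion(our_part_time)
print(f"expected ~{predicted_completion(our_part_time):.2f}, "
      f"actual {our_completion:.2f}, gap {gap:+.2f}")
```

A consistently positive gap would suggest outperforming the baseline; consistently negative, underperforming; and large residuals in the backtest would mean the model itself is missing a meaningful variable, carrying a meaningless one, or weighting something wrong–the three failure modes listed above.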
A model like that could be transformative for community college research and funding, and though it would take a lot more work than looking up how many students were awarded a certificate last semester, it would tell us a whole lot more. And while there is risk in such a scheme, if we really are about students, and excellence, and social justice, then we have a professional, and I would say moral, obligation to avoid the temptation to cherry-pick a few stats that make us look good, and instead to find some meaningful data to report.
Now with all of that said, I am the first to admit that I am in way over my head when it comes to stats and quantitative research. I can just barely calculate a batting average, which I admit to as a point of shame, not pride, and perhaps what I’m dreaming is simply not possible. Fine. Still, I say, as I said last week, this is a spectacular opportunity to have some crucial conversations and explore and explain our professional expertise to the general public, their legislators, and the world. Aim big is all I’m saying; swing for the fences as it were.
For lots more on Moneyball, you can go HERE.