Summarizing Performance Samples with Extrema rather than Averages: A Lesson from Practitioners
January 18th, 14:00
LINCS seminar room (23 avenue d’Italie, metro Place d'Italie)
The notion of performance seems crucial to many fields of science and engineering. Some scientists are concerned with the performance of algorithms or computing hardware; others evaluate the performance of communication channels, of materials, or of chemicals; psychologists investigate the performance of experimental participants; and so on. Yet, rather surprisingly, the notion of performance is generally used without any explicit definition. I propose this: a performance is a quantitative behavioral measure that an agent deliberately tries to either minimize or maximize. The scope of the performance concept seems impressively large: the agent whose behavioral performance is being scored may be a human, a coalition of humans (e.g., an enterprise, an academic institution), or even a human product (e.g., a chemical, an algorithm, a market share).
Performance measures are random variables of a very special sort: their distributions are strongly skewed as a direct consequence of the extremization (minimization or maximization) pressure that constitutes their defining characteristic. Mainstream statistics takes it for granted that any distribution needs to be summarized by means of some representative central-tendency indicator (e.g., an arithmetic mean, a median), and so the asymmetry of performance distributions has traditionally been considered an unfortunate complication. In fact, I will argue that when it comes to statistics of performance, averages become essentially irrelevant. The point is easy to make with the example of spirometry testing: practitioners of spirometry never compute an average; they retain the best measure of respiratory performance (i.e., the sample max) and flatly discard all other measures. And they are quite right to do so, as a simple model will help explain. The important general lesson to be learned from spirometry is that the better a sample measure of performance, the more valid it is as an estimate of the capacity of performance, that is, the theoretical upper limit whose estimation is in fact the goal of most performance testing in experimental science. One likely reason why experimenters in many fields have recourse to the measurement of performance is that non-extremized behavior tends to be random, whereas performance capacities can abide by quantitative laws. This is easy to illustrate with empirical data from human experimental psychology (Hick's law, George Miller's magic number, Fitts' law). An experimenter myself, I can only conclude with a question to statisticians: don’t we need a brand new sort of statistics to fully acknowledge and accommodate the rather special nature of all these performance measures that these days we encounter everywhere, not just in virtually every sector of society but also in many fields of scientific research?
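The spirometry argument can be illustrated with a small simulation. The abstract does not specify the "simple model" it mentions, so the sketch below rests on an assumption of its own: each measurement equals a fixed capacity minus a non-negative, exponentially distributed shortfall. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_trials(capacity, n_trials, mean_shortfall, rng):
    """Return n_trials measurements, each equal to the capacity minus a
    non-negative random shortfall (assumed exponential, for illustration)."""
    return [capacity - rng.expovariate(1.0 / mean_shortfall)
            for _ in range(n_trials)]


def compare_estimators(capacity=5.0, n_trials=8, mean_shortfall=0.5,
                       n_sessions=2000, seed=0):
    """Average, over many simulated sessions, how far the sample max and
    the sample mean fall below the true capacity."""
    rng = random.Random(seed)
    max_errors = []
    mean_errors = []
    for _ in range(n_sessions):
        sample = simulate_trials(capacity, n_trials, mean_shortfall, rng)
        max_errors.append(capacity - max(sample))
        mean_errors.append(capacity - statistics.fmean(sample))
    return statistics.fmean(max_errors), statistics.fmean(mean_errors)


max_err, mean_err = compare_estimators()
print(f"average shortfall of sample max:  {max_err:.4f}")
print(f"average shortfall of sample mean: {mean_err:.4f}")
```

Under this assumed model, the sample mean underestimates the capacity by about the mean shortfall regardless of how many trials are run, whereas the expected shortfall of the sample max is the mean shortfall divided by the number of trials, so it shrinks as trials are added. A different shortfall distribution would change the numbers but not the ordering, since no measurement can exceed the capacity.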