At the end of the March 23 blog entry, I promised that there was a better way of measuring batting performances than the traditional batting average, and a better alternative than Malcolm Knox’s. A month of forming a close relationship with Excel has proven that knowing something to be so, and demonstrating it, don’t live in the same suburb. The resultant suggestion is no more complicated than, say, three-dimensional calculus, and it’s fair to say that it’s by no means perfect or uncontroversial… perhaps all it achieves is to show how complex it is to design meaningful metrics. But I did bring pictures.

The origin of the proposal is to mimic golf’s concept of the adjusted differential. This metric takes the input of a golfer’s adjusted score (performance) and the golf course’s slope rating (how hard the course ordinarily is). A golfer who has a pile of adjusted differentials can calculate their golfing handicap (a measure of how badly they suck at golf; New Statsman’s handicap = 16.5, somewhat sucky).

For added nuance, if a large number of handicapped golfers are playing the same course on one day, even the slope rating can be adjusted to reflect the prevailing conditions. The grass, hills and trees on a course might not change much, but strong wind or rain, or perhaps tees pushed back to lengthen holes, or artfully placed pins can radically alter a course’s difficulty. If you were leafing through a sufficiently large pile of scorecards and rarely finding anyone playing close to their abilities, you’d think something was going on – common sense says that not everyone can be having a bad day at once, right?

I’ll answer my own question: yes, that’s right – most of the time. Data like the above tend to form a normal distribution (perhaps even… sigh… conforming to “the law of averages”), containing some people who played well, some whose day is best forgotten, but most whose play was… ok. At the extremes, there will also be some outliers who did extremely well – or badly – by their own standards.

When conditions are difficult (or easy), the expectations for adequate performance need to be adjusted. But how?

If all of this sounds nothing like cricket, please go and sit in the corner. Because cricket batsmen, too, face a range of conditions, from easy to difficult – and they even have the equivalent of a golfer’s handicap: their average (you might have thought that I have a deep hatred for batting averages… but I’m a tolerant person – an average isn’t perfect, but it’s a decent proxy for batting skill). The question is, how do you quantify that variability? Let’s start with the simplest example of all – the Steady XI’s recent match at their home ground in Averagetown. In their painfully consistent fashion, the team compiled a score equal to the sum of their averages. Here’s how their scores look when plotted, with averages on the horizontal x-axis and their scores for this match on the y-axis:

Through this data runs a *regression line*. This fiendishly clever line finds the path through the plotted points that minimises the aggregate squared vertical distance between it and the points. (That sounds very complicated, and – if you have to work it out from scratch – I can guarantee you that it’s incredibly tedious. A full explanation is here, if you can stand it.)
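For the terminally curious, the line has a neat closed form, and you don’t need Excel to find it. A minimal Python sketch (the sample averages and scores are invented):

```python
def fit_line(xs, ys):
    """Least-squares regression line through (x, y) points.

    Returns (gradient, intercept) for the line y = gradient * x + intercept.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Gradient: covariance of x and y divided by the variance of x.
    gradient = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    # The line always passes through the point of means.
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

averages = [10, 20, 30, 40, 50]   # career averages (x-axis)
scores = [14, 22, 35, 41, 58]     # today's scores (y-axis)
gradient, intercept = fit_line(averages, scores)
```

If every batsman scored exactly his average, the fit would come back as gradient 1, intercept 0 – which is exactly the “neutral conditions” benchmark used below.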

But when the Steady XI travelled to play against a team with a carpet pitch and less-talented bowlers, all of a sudden the bounce and speed were truer – and the team score swelled to 30% above their combined average (here shown improbably clustered on the regression line):

Not as much fun for the Steady XI the following week, when the rain closed in on their uncovered home pitch, and the total was 25% below their combined average:

See the difference in the regression lines? The steeper the line, the better the batting conditions.

What about if the Steady XI found conditions conducive to their players scoring an average of 10 runs above their average, or 10 runs below?

If the line heads above *the origin* (where the axes meet), batting conditions are good; below, not so good.

So far, all of this sounds like great fun for maths nerds and not much fun for everyone else. However, at last we’ve reached the point of the exercise – the slope and location of the regression line can serve as a quantification of how easy (or difficult) it was to bat that day. To bring the scores into line with “neutral” conditions, the line just needs to slope upwards at 45 degrees and pass through the origin.

All that’s needed is some high school algebra: if this brings back some bad memories, maybe just accept that it’s easy, and skip down a paragraph or so.

For the rest of us: if the equation of the regression line is, for example, y = 2x + 4 (“today’s scores = average scores, multiplied by 2, plus 4”), adjust the scores by subtracting 4 and dividing by 2; a fresh plot will have a regression of y = x, or “today’s scores = average scores”.
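The same algebra as a toy Python sketch, using the example line y = 2x + 4:

```python
def adjust(score, gradient=2, intercept=4):
    """Undo the day's regression line: subtract the intercept,
    then divide by the gradient, so a replot gives y = x."""
    return (score - intercept) / gradient

# A batsman who averages 30 and made 64 on this flat track
# comes back to exactly his average:
adjust(64)
```

In other words, 64 on this day was just an average day at the office.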

The Steady XI is a convenient group to demonstrate all this. They become a less convenient group if they have a visiting player, like, say, Bradman. Whilst the rest of the team quietly compiles a set of comfortably average scores, he has to mess it all up by making 375. In the world of adjusted scores, he’s just ruined the perception of everyone’s performance – now the regression line is ridiculously steep, which will drag down everyone’s adjusted scores.

In this example (as in real life), Bradman’s score is an outlier – a piece of data that’s not terribly useful for broad comparisons. Any player’s innings can be no lower than zero, but in theory there is no upper limit, making “average” an occasionally misleading observation. In the same innings, the team average exceeds 62, which gives very little indication of the batting talents on display, just as the regression line is no great indicator of the difficulty of the batting conditions.

So, let’s borrow from the scoring systems for diving, and exclude some data. In the examples that follow, I’ve chosen to exclude the 2 highest and 2 lowest differentials – ie, the two best and two worst performances of the day. (You could just as validly pick only the single best and worst, or just the best, or the two best, etc – I suspect lots of experimentation would be needed to uncover what works.) Effectively, what the approach is saying is that if Bradman scores 375, it doesn’t make your 50 (against your average of 40) any worse. But if 3, or 4, or more players are scoring way more runs than your average, maybe you should have gotten just a few more.
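A sketch of that trimming step in Python – the eleven (average, score) pairs are invented, with a Bradman-sized 375 thrown in:

```python
def trim_outliers(pairs, n_trim=2):
    """Drop the n_trim best and n_trim worst performances, ranked
    by differential (today's score minus career average)."""
    ranked = sorted(pairs, key=lambda p: p[1] - p[0])
    return ranked[n_trim:len(ranked) - n_trim]

# (average, score) for eleven batsmen; one visiting run-machine.
team = [(40, 375), (35, 36), (30, 31), (45, 44), (20, 19), (15, 16),
        (10, 9), (50, 52), (25, 24), (12, 0), (18, 20)]
kept = trim_outliers(team)   # 7 of the 11 innings survive
```

Whatever trim you settle on, the surviving scores are the ones fed into the regression fit, so the 375 no longer drags the line skyward.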

Now, brighter readers may have worked out that the Steady XI doesn’t really exist. So how does this work with a real cricket team, like Australia? (Insert your own joke here… but don’t make it too topical. Readers of this in a few years’ time – or maybe just after the Ashes – may have forgotten all about these recent struggles)

Here’s an example from their recent match against India in Deccan. The full scorecard is here, but we’re going to focus just on the Australians’ batting.

The chart below shows the batsmen’s first innings scores (excluding the two best differentials – Clarke and Wade, and the two worst – Cowan and Warner). Even Siddle’s duck (just over 14 runs below average) was better than the two openers, whose single figure scores were more than 30 runs below par.

*Player.* Leave the room, please, if this column baffles you.

*Average.* The plain old traditional test cricket average or, if the player has been dismissed fewer than 20 times in test cricket (an inadequate sample), their first-class average.

*Adjusted average.* First-class averages are revised down 20%… test cricket is harder. (But I don’t know if it’s 20% harder, I just chose 20%. A topic for a whole other blog post.)

*Raw Innings 1 Adjusted.* What to do with players who are not out? Make an estimate of what they’re likely to score. The number of runs they usually score before going out is – helpfully – also called their “average”. So, just add the average to the not out score.

*Adjusted score A.* The player’s score is adjusted to create a y = x regression line. In this instance, that means adding 4.85 and dividing by 0.749.

*Adjusted scores B and C.* Alternatives to Adjusted score A – see discussion below the second innings chart and table.
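To tie those columns together, here’s a toy sketch (not the actual spreadsheet) of the first-innings calculation. The fitted line quoted above is y = 0.749x − 4.85, hence “adding 4.85 and dividing by 0.749”:

```python
def raw_adjusted(score, average, not_out=False):
    # Not-out batsmen are credited with the runs they would,
    # on average, have gone on to score.
    return score + average if not_out else score

def adjusted_score_a(raw, gradient=0.749, intercept=-4.85):
    # Undo the day's regression line so a replot gives y = x.
    return (raw - intercept) / gradient

# A duck on this day becomes 4.85 / 0.749, about 6.48
# "neutral-conditions" runs:
round(adjusted_score_a(0), 2)
```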

Keep in mind that the adjusted score isn’t a way of saying “see, in the first innings Clarke actually made a century”. An adjusted score isn’t any more a cricket score than the A+ you got for grade 1 fingerpainting was an actual drawing; it’s just an invented metric to allow comparisons between innings and players – and it certainly has its limitations.

The second innings in particular features some heavily adjusted scores – maybe unpalatably so. David Warner’s second innings, for example, goes from a raw score of 44 to an adjusted 88.27 (although maybe this is the point – he made 44 when making runs was clearly challenging); meanwhile the ducks of Hughes, Henriques and Pattinson have become 20.15. Hmm.

Some alternatives are calculated as Adjusted scores B and C. In these, additional weight is placed on the raw score, and less on accepting that performance is affected by circumstances. There are infinite options for this weighting; these are just a couple.

(B adjusts the raw score by adding 1/3 of the y-intercept value and dividing by the average of the gradient and 1; C adjusts the raw score by adding 1/3 of the y-intercept value and dividing by a number 2/3 of the way from the true gradient to 1.)
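For the record, here’s my reading of the three variants as Python. The sign convention follows Adjusted score A, where an intercept of −4.85 meant “adding 4.85”:

```python
def adjusted_a(raw, gradient, intercept):
    # Full correction: undo the day's regression line entirely.
    return (raw - intercept) / gradient

def adjusted_b(raw, gradient, intercept):
    # Divisor: the average of the gradient and 1.
    return (raw - intercept / 3) / ((gradient + 1) / 2)

def adjusted_c(raw, gradient, intercept):
    # Divisor: a number 2/3 of the way from the gradient to 1.
    return (raw - intercept / 3) / (gradient + 2 * (1 - gradient) / 3)
```

In neutral conditions (gradient 1, intercept 0) all three hand back the raw score unchanged; in non-neutral conditions, B and C stay closer to the raw score than A does.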

There are many ways to calculate a new metric, and it’s worth noting that, whilst they would each produce differing average or median adjusted scores for a player’s career, ultimately these figures would only be for comparison with other players’ adjusted scores calculated in identical fashion – apples with apples – just as traditional batting averages are only useful for comparison with each other. So, there would be nothing wrong with producing 3 or 4 versions of this, along with the traditional average (which is, of course, the stepping off point for all of these calculations).

In any case, none of this has been tested with anything like enough data to see if it throws up any anomalies; for example, are eleven scores really enough to make a determination on the trickiness or otherwise of the batting conditions? Is it valid to base this system on linear regression, or would a (tiresomely more complex) logarithmic or exponential regression be a sounder basis? Should a 20- or 50-innings moving average be used in place of the traditional average? How should the Indians’ innings – where few batsmen actually *had* an innings – be handled? Is it a problem that, in a generally strong innings, a low raw score would probably produce a negative adjusted score?

Also, more fundamentally: run-scoring differs between eras, and this metric adjusts scores with respect to their difficulty compared to other innings played *in the same era*. So a typical low-scoring innings in the nineteenth century, compared to a whole bunch of other low scores, will result in low averages and barely-adjusted, low adjusted scores – none of which makes the players worse batsmen, just batsmen in a low-scoring era. So, at least one additional change would be needed to adjust the metric for era – perhaps calculating a global adjusted-score metric using a moving average or moving median. (And – maybe a topic for another blog – but it wouldn’t hurt to import the idea of an official scorer to remove the impact of simple missed catches, misfields, overthrows and run-outs, all of which affect batting scores, none of which are indicators of batting skill.)

Regardless, I hope I’m not alone in thinking that, one future summer, we’ll be able to flick on the TV, watch two Australian batsmen striding to the middle of the MCG on a beautiful sunny day, and hear someone (not Richie Benaud, I fear) say, “Smith and Jones, what an opening pair they’ve become. Smith has the highest median A-type adjusted score of any opener since Mark Taylor, Jones’ C-adjusted average is just tremendous, he just isn’t fazed by any conditions… ”

Follow @newstatsman