
What would soccer look like if we saw it less like video and more like data?
The other night a friend DMed a group of soccer writers about ranking historical players against current ones in a systematic way, which sounds fun if you’ve got hundreds of hours to sit around scrutinizing old film that looks like somebody spun painted grains of rice in a zoetrope. Even then, you’d need to know what to watch for. What does value—not just visible talent in the sense of WELCOME TO REAL MADRID ● MAGIC SKILLS ● RICE DEFINITION 1960 but actually doing stuff that makes your team better off in some objective way—look like on the pitch?
If my buddy’s ranking project were strictly modern, he could do the modern thing and start with data. Expected possession value models try to answer questions about what really matters by measuring players’ actions in terms of how much they’re likely to change the scoreline. (If you need more background on what a possession value model does, here’s an introduction to American Soccer Analysis’s goals added, aka g+.) Some of these models are publicly available; all you need is coding skill and data. But let’s be real, most of us don’t have that, and even if we do it doesn’t help when we’re actually watching soccer unless we’ve got some kind of augmented reality tech decorating the field with decimals, Iron Man-style. What we really need, especially if we’re watching jittery old black-and-white film, is a way to just, like, eyeball it.
Ryan O’Hanlon started down this road in a recent edition of his excellent No Grass in the Clouds newsletter:
[I]f you wanted to create your own your-brain-based EPV model, you could do a lot worse than just keeping a running tally of how often a team moves the ball into the penalty area as you watch a given match. After creating a chance or getting in position to take the chance, the most valuable thing an individual player can do, on aggregate, is to move the ball into the penalty area.
I love that. Ever since I helped introduce g+ last spring (and by "helped" I mean wrote some articles and annoyed resident genius Matthias Kullowatz with questions in Slack while he did all the work) I’ve been trying to train my own brain-based EPV model. That means learning to see the game not just the way I’m used to, as intricate off-ball geometries and close-ups of Pep Guardiola drinking water, but also the way it looks in the event data, as a ball-sized point bouncing around a 100x100 grid, streaming a comet tail of probabilities behind it. Here, watch a whole game real quick:
Just for fun: Here's our Goals Added (g+) metric playing out over an entire match between the Seattle Sounders and the New England Revolution. The match ended 3-3. pic.twitter.com/uLMR6UvaI4
— American Soccer Analysis (@AnalysisEvolved) May 12, 2020
So okay, sure, let’s give it a whirl. First we’ll hop over to Footballia to grab a clip of Alfredo Di Stéfano scoring a goal for Real Madrid to tie Eintracht Frankfurt at 1-1 in the 1960 European Cup Final (Madrid would go on to win 7-3—a Zidane-Simeone cage match this was not). Then we’ll try to figure out how a normal soccer-watching brain and one equipped with brain-based EPV might see the same sequence differently.
Two-Possession Horizon
A good PV model looks at both sides of the ball. It doesn’t just try to estimate a team’s chances of scoring, the way xG does, but also subtracts the chance of conceding over the next however many seconds or, in the g+ framework that I prefer, on the next possession. Coaches and players think this way all the time: it’s why there aren’t five runners in the box when Madrid sends in its first cross, because some attackers are hanging back to win the ball in transition and try again.
For most of us, though, it’s natural to watch soccer one possession at a time. Sequences that start from the keeper are boring boring boring hmm okay interesting exciting EXCITING, then there’s a turnover and the game has to start all over trying to win your attention in the other direction. Picture this as a single value curve that gets very steep all of a sudden at the attacking end. A two-possession framework overlays a second, similar curve in the opposite direction: getting the ball away from your own box is also valuable because it lowers the risk of coughing up a goal off a turnover. Now our mental model looks less like a ski slope and more like a skate park halfpipe.
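If it helps to see the ski slope and the halfpipe as numbers, here is a minimal sketch. Everything in it is an assumption made up for illustration: the exponential curve shapes, the `PEAK` and `STEEPNESS` constants, and the function names come from me, not from g+ or any fitted model. It compares what a 20-unit move up the field is worth under a one-possession view and a two-possession view.

```python
import math

# Hypothetical toy curves, not fitted to any data. x runs from 0 (our own
# goal line) to 100 (the opponent's goal line).
PEAK = 0.10       # assumed chance of a goal right at the goalmouth
STEEPNESS = 0.08  # assumed rate at which that chance decays with distance


def score_chance(x):
    """One-possession view: chance we score, rising steeply near their goal."""
    return PEAK * math.exp(-STEEPNESS * (100 - x))


def concede_chance(x):
    """Mirror image: chance they score next if we lose the ball around x."""
    return PEAK * math.exp(-STEEPNESS * x)


def net_value(x):
    """Two-possession view: our scoring chance minus our conceding risk."""
    return score_chance(x) - concede_chance(x)


def gain(value_fn, x_from, x_to):
    """Value added by moving the ball from x_from to x_to under value_fn."""
    return value_fn(x_to) - value_fn(x_from)


moves = [("own end", 5, 25), ("midfield", 40, 60), ("their end", 75, 95)]
for label, start, end in moves:
    one = gain(score_chance, start, end)
    two = gain(net_value, start, end)
    print(f"{label:>9}: one-possession {one:.4f}, two-possession {two:.4f}")
```

With these made-up curves, the move out of the own end is close to worthless on the one-possession curve but worth as much as the same move at the attacking end once the conceding risk is subtracted, and the midfield move is small either way. That is the halfpipe: big stakes at both ends, not much in the middle.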

Madrid’s first two passes after the keeper tips away the corner kick are worthless if we’re thinking along a one-possession curve. Moving the ball from beside your box to the edge of your defensive third barely budges your team’s scoring prospects. But our two-possession brain-EPV should light up when we watch the defenders maneuver Madrid out of a tight spot near their own corner flag and mentally reward them for it.
Location
So far we’ve been talking about possession value in terms of location, because soccer’s more of a field position game than we sometimes acknowledge. Simply moving the ball from one end of the field to the other changes who’s likely to score next regardless of which team has the ball. You can do a decent job approximating possession value using nothing but location, and a lot of the so-called “PV” vizzes you’ll see on Twitter rely on an average value grid built by the analyst Laurie Shaw to do just that.
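A location-only approach boils down to a lookup table: find the grid cell the ball starts in, find the cell it ends in, and subtract. The sketch below assumes a tiny 10x5 grid filled with invented numbers (`toy_cell_value` is a stand-in I made up, not Laurie Shaw's grid or anyone else's), so only the mechanics should be taken seriously.

```python
N_COLS, N_ROWS = 10, 5  # hypothetical resolution; real grids are finer


def toy_cell_value(col, row):
    """Made-up value: grows toward the opponent's goal and toward the middle."""
    central = 1 - 0.15 * abs(row - N_ROWS // 2)
    return 0.002 * (col + 1) ** 2 * central


# GRID[row][col], with col 0 at our own goal line.
GRID = [[toy_cell_value(c, r) for c in range(N_COLS)] for r in range(N_ROWS)]


def location_value(x, y):
    """Look up the grid cell containing (x, y) on a 100x100 pitch."""
    col = min(int(x * N_COLS / 100), N_COLS - 1)
    row = min(int(y * N_ROWS / 100), N_ROWS - 1)
    return GRID[row][col]


def move_value(start, end):
    """Location-only value of moving the ball from start to end."""
    return location_value(*end) - location_value(*start)


short_carry_at_halfway = move_value((50, 50), (55, 50))  # stays in one cell
pass_toward_goal = move_value((80, 50), (92, 50))        # crosses into the last column
print(short_carry_at_halfway, pass_toward_goal)
```

Note what this kind of model cannot see: two moves between the same pair of cells always get the same value, whatever the speed of the play or the state of the defense.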
As far as location goes, the key thing for our brain-EPV to remember is that image of a halfpipe: stuff at either end of the pitch is valuable, but midfield is pretty low stakes because no matter what happens there the ball’s still several low-probability actions away from either goal. This U-shaped value curve led StatsBomb’s Thom Lawrence to dub midfield “the Valley of Meh.”

All those cute little stepovers as Di Stéfano carries the ball through the midfield? Our YouTube lobe loves them but the EPV cortex knows they’re empty calories. Even if the defender had committed and Di Stéfano had dribbled past him, it wouldn’t be worth that much in the center circle. Possession value’s like real estate: the three most important variables are location, location, location.
Velocity
Yet only considering location is insufficient. I know this because the team of very smart researchers who created the PV model known as VAEP recently published a paper with a section subheaded, in bold, “Only considering location is insufficient.” A good possession value model should also include information about, uh, the possession.
What kind of context matters? The VAEP researchers’ paper set out to answer that question. Both VAEP and g+ are built using the XGBoost machine learning algorithm, called a “black box” model because it’s hard to know what’s really going on inside its calculations. But the researchers found they could get results almost as good as VAEP’s by replacing XGBoost with a simpler, easier to interpret model that only uses VAEP’s 10 most important features instead of the original model’s full set of 151 variables. Which, you know, understanding what your model is doing is good. The guy who first emailed me this paper is a PhD who thinks soccer analysis will move toward more interpretable PV models in the next couple years, and it’s easy to see why you might want to give up marginal gains in accuracy to produce something as easy to grasp as this figure.

Of VAEP’s top 10 features, the three most important all have to do with distance and angle to goal: basically, the closer the ball is to a good shooting position, the more valuable the possession. But three others all have to do with the possession’s velocity: how fast is the ball moving toward goal? Our brain-EPV instinctively gets why this matters. Fast breaks disorganize opponents and open holes in the defense; holding up the play allows the defense to reconsolidate and close them.

Just think about how much more likely Real Madrid feels to score at the instant Di Stéfano charges halfway up the field and lays it off than at the instant a few seconds later when the winger, unable to find room for a cross, turns around and recycles possession. As soon as the play slows its vertical velocity, the defense regains its shape and some of the danger drains out of the play.
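For the curious, a velocity feature of this kind can be computed from nothing more than two consecutive events. This is a rough sketch under my own assumptions: the `(seconds, x, y)` tuple layout and the timestamps and coordinates below are invented for illustration, not taken from the match or from how VAEP or g+ define their features.

```python
def velocity(prev_event, curr_event):
    """Return (vx, vy) in grid units per second between two (t, x, y) events.

    vx is speed toward the opponent's goal, vy is speed across the pitch.
    """
    t0, x0, y0 = prev_event
    t1, x1, y1 = curr_event
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("events must be in increasing time order")
    return (x1 - x0) / dt, (y1 - y0) / dt


# Hypothetical (seconds, x, y) events on a 100x100 grid.
carry_starts = (0.0, 35.0, 50.0)
layoff = (6.0, 65.0, 50.0)
winger_receives = (8.0, 75.0, 85.0)
winger_recycles = (12.0, 70.0, 80.0)

vx_break, _ = velocity(carry_starts, layoff)
vx_recycle, _ = velocity(winger_receives, winger_recycles)
print(vx_break, vx_recycle)
```

A positive `vx` means the possession is heading for goal; once the winger turns back, the sign flips, which is the number-shaped version of the danger draining out of the play.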
I’m not sure whether the 151-feature director’s cut of VAEP has something similar, but along with x velocity g+ also looks at y velocity to measure how fast a possession is moving across the field. I like this feature because it helps the model to capture the value of switches like the one that finds a Madrid player in space on the right side of the box just before the assist to Di Stéfano. But Matthias, the guy who actually built the model, told me he’s less sure about y velocity because its relationship to PV is “bow shaped (parabolic).” In other words, g+ thinks fast circulation or no circulation is good, but slow circulation makes a possession less valuable. I don’t know about you but my brain-EPV finds that totally plausible. If Madrid had swung the ball around gradually instead of hitting the halfspace-to-halfspace switch, they never would have found room to get to the touchline and lay in the goalmouth cross.
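To picture what "bow shaped" might mean, here is a toy parabola. To be loud about it: this is not the curve g+ learned. The function `lateral_adjustment`, its `slow_speed` and `scale` parameters, and every number in it are hypothetical, chosen only so that no circulation is neutral, slow circulation is penalized, and fast circulation is rewarded.

```python
def lateral_adjustment(vy, slow_speed=2.0, scale=0.001):
    """Toy parabolic bump in possession value as a function of sideways speed.

    Zero when the ball is not moving sideways, lowest when it moves at
    slow_speed, and positive once it moves faster than twice slow_speed.
    All constants are invented for illustration.
    """
    speed = abs(vy)  # left-to-right and right-to-left count the same
    return scale * ((speed - slow_speed) ** 2 - slow_speed ** 2)


for label, vy in [("no circulation", 0.0), ("slow swing", 2.0), ("fast switch", 8.0)]:
    print(f"{label:>14}: {lateral_adjustment(vy):+.4f}")
```

The only point is the shape: the dip sits in the middle of the speed range, not at zero.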
Cool, So What Did We Learn?
As complicated as possession models can be, the question they’re trying to answer is as basic as it gets: How much did this play disorganize the opposition while moving the possession away from our goal and towards theirs?
When you think about it that way, the mystery around our brain-based EPV melts away. Moving the ball from your half to the opponent’s is good. Entering the box is even better. Doing those things at speed—both up the field and across it—is best of all. Roger that.
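The running tally O'Hanlon suggests is about as simple as a model gets, and it fits in a few lines. This sketch makes several assumptions: the penalty-area bounds are my approximation for a 100x100 grid in which each team attacks toward x = 100, the dictionary layout is invented, and the four actions are typed in by hand, not real match data.

```python
from collections import Counter

# Assumed penalty-area bounds on a 100x100 grid where each team attacks
# toward x = 100. Adjust these to whatever coordinate system your data uses.
BOX_X_MIN = 83.0
BOX_Y_MIN, BOX_Y_MAX = 21.1, 78.9


def in_box(x, y):
    """True if (x, y) lies inside the attacking penalty area."""
    return x >= BOX_X_MIN and BOX_Y_MIN <= y <= BOX_Y_MAX


def tally_box_entries(actions):
    """Count completed actions per team that start outside the box and end inside it."""
    tally = Counter()
    for action in actions:
        entered = not in_box(*action["start"]) and in_box(*action["end"])
        if action["completed"] and entered:
            tally[action["team"]] += 1
    return tally


# Hypothetical, hand-typed actions.
actions = [
    {"team": "Madrid", "start": (70, 90), "end": (90, 60), "completed": True},      # cross into the box
    {"team": "Madrid", "start": (88, 60), "end": (95, 50), "completed": True},      # already inside
    {"team": "Frankfurt", "start": (60, 40), "end": (85, 45), "completed": False},  # pass cut out
    {"team": "Madrid", "start": (78, 30), "end": (86, 40), "completed": True},      # carry into the box
]
print(tally_box_entries(actions))
```

Only moves that cross the line count, so a pass between two players already in the box adds nothing to the tally.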
If thinking like a PV model has taught me anything, it’s that I already knew everything I needed to know to evaluate soccer plays better. The hard part is looking past all the little aesthetic details that make the game fun to watch and focusing on outcomes. Because I guarantee you if I ran this play through the goals added algorithm, the model would tell me Di Stéfano’s most valuable contribution in this sequence isn’t the heroic, feinting run up the middle, bullying the terrified defense back each time he swings his body side to side behind the ball like he’s pointing a loaded gun. That’s our YouTube brain shouting at us. No, the most valuable thing he does—the thing with the highest likelihood of improving the scoreline—is the short little back post run to receive a goalmouth cross for a tap in. Sometimes it’s hard to wrap our brain-EPVs around just how easy this game looks when it’s done right. ❧
Further reading:
- Tom Decroos and Jesse Davis, Interpretable Prediction of Goals in Soccer (Association for the Advancement of Artificial Intelligence 2020)
- Matthias Kullowatz, Goals Added: Deep Dive Methodology (American Soccer Analysis)
- Pieter Robberechts, Exploring How VAEP Values Actions (KU Leuven)
- Ryan O’Hanlon, What the Heck is Mikel Arteta Talking About? (No Grass in the Clouds)
Image: Juan Gris, Damier et cartes à jouer