
How far can public stats get you toward a good scouting list?
You’re reading Space Recruits, a special series on recruiting made possible by space space space’s paid members.
- Part 1: The State of Analytics
- Part 2: Data Scouting on a Budget (you are here)
- More coming soon!
The problem with soccer data is that it’s expensive. Not expensive compared to dropping millions on an agent recommendation who turns out to be a bust. Probably not expensive compared to running an old-fashioned scouting network. But in plain old dollars or euros or Brazilian reais, buying data subscriptions and hiring people with the expertise to turn them into recruiting models isn’t cheap. Clubs — especially smaller clubs — want cheap.
But what if they didn’t have to buy raw data to do data scouting? As Jan Van Haaren put it in Part 1:
“Often in recruitment, the metrics are only used to find players. You want players who performed well in a certain area to pop up at the top of your list, and the actual number doesn’t matter that much. You obviously want your metrics to be as accurate, reliable, and robust as possible, but it really depends on the task at hand how reliable and robust they need to be.”
If the goal of your analytics operation is to produce a list of names for your scouts to go watch, it’s the names that matter, not how you got there.
Estimating On-Ball Value
Last fall, two American Soccer Analysis contributors, Mike Imburgio and Sam Goldberg, decided to see if they could get there in a way that would help clubs on a budget. After months of trial and error and hundreds of emails back and forth, they introduced a framework called DAVIES (one of those reverse-engineered sports acronyms where even the people who came up with it don’t really care what it stands for). It’s a model of a model. The idea is to approximate goals added (g+), American Soccer Analysis’s action value model that assigns a goal value to thousands of individual events each game. But unlike g+, which needs event-level data, DAVIES is built on aggregate season stats from FBref’s StatsBomb data. That means it’s totally free not just for MLS but also for Europe’s top five leagues.
“It's a player evaluation metric that accounts for a player's age and their style of play,” Goldberg told Ryan O’Hanlon’s No Grass in the Clouds newsletter. “So it predicts a metric called goals added, which is an overall value of how many goals a player adds to their team over the course of a season. And then adjusts it based on similar players by their playstyle and their age.”
The metric is named for Alphonso Davies, who put up eye-watering g+ as a 17-year-old in MLS before Bayern Munich signed him. And yeah, DAVIES would have had him at the top of a scouting list too. “I didn’t think we were going to get as close as we did to a metric that I think is as good as goals added,” Imburgio told me. “Goals added can see things we can’t see with DAVIES, but the potential to do something like that without event-level data, which I think can be pretty expensive, makes it so much more applicable.”
Now, at this point in the letter I could just pull up some guys who score high in DAVIES and let you decide for yourself if it’s finding prospects worth watching. We’ll get to that in a second. If you’re in a hurry, the data is all freely available in an online app that you can sort and filter and even download. But the fun part, in my admittedly weird opinion, is understanding where the numbers come from.
Adjusting for Playstyle
The first step in turning a raw goals added estimate into a DAVIES value is comparing players in similar roles. Before you can tell if a player is good at his job, you need to know what his job is. First, all outfield players are sorted into a few main position groups according to generic indicators like touches by third. Then they’re clustered again using more specific stats to split each position group into playstyles. Instead of lumping all attackers together, DAVIES calls some “Dribblers,” others “Playmakers,” and a third group “Finishers” depending on what they do with the ball. (There are nine playstyles in all.)
One challenge in building the model was doing playstyle clustering in a way that didn’t confuse style with quality. “You could very easily do a clustering that just gives you all the best players. For the purpose of DAVIES, that’s terrible,” Imburgio said. “We don’t want to just compare the best players to the best players. So I started normalizing by touch.”
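To see why normalizing by touch matters, here’s a minimal sketch in Python. It isn’t the DAVIES code: the stat names, the numbers, and the helpers `per_touch_profile` and `style_distance` are all invented for illustration, and this letter doesn’t spell out the real model’s features or clustering method. The point is only that dividing by touches lets a high-volume star and a low-volume squad player with the same mix of actions land in the same place.

```python
from math import dist


def per_touch_profile(totals, touches):
    """Turn season totals into per-touch rates so volume doesn't drive style."""
    if touches <= 0:
        raise ValueError("touches must be positive")
    return {stat: count / touches for stat, count in totals.items()}


def style_distance(a, b):
    """Euclidean distance between two per-touch profiles with the same stats."""
    keys = sorted(a)
    return dist([a[k] for k in keys], [b[k] for k in keys])


# Hypothetical season totals, made up for this example.
star = per_touch_profile(
    {"dribbles": 120, "key_passes": 60, "shots": 90}, touches=3000
)
squad_player = per_touch_profile(
    {"dribbles": 40, "key_passes": 20, "shots": 30}, touches=1000
)
passer = per_touch_profile(
    {"dribbles": 10, "key_passes": 50, "shots": 10}, touches=1000
)

# Same mix of actions per touch, very different volume: distance is ~0.
same_style = style_distance(star, squad_player)
# Same volume as the squad player, different mix: distance is clearly larger.
different_style = style_distance(squad_player, passer)
```

On raw totals the star would look nothing like the squad player. On per-touch rates they’re the same kind of player, which is what you want from a clustering step that’s supposed to describe style rather than quality.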

Playstyles are DAVIES’s way of comparing apples to apples when calculating player values, but they also double as a useful filter for scouting purposes. Say you’re Ed Woodward and you’re getting a little nervous about Paul Pogba running out his contract. You could start the search for an understudy by looking for players at the same position, but positions are a mess. FBref’s midfield-slash-forward heap includes not only Pogba but also guys like Marco Reus (a Dribbler, according to DAVIES) and Neymar (a Playmaker). If you start with the Attacking Central Progressor style instead, you see names like Frenkie de Jong and Sergej Milinković-Savić. Goldberg and Imburgio describe this group as “players who play box-to-box, often carry the ball forward, play progressive passes and sometimes shoot or play balls into the box themselves.” Now we’re at least in the right ballpark.

Adjusting for Age
The second comparison baked into a DAVIES value is by age group, which goes back to how the project got its start. “It was originally not meant to be a player value model. We set out to try to build a player forecasting model, to predict future goals added,” Imburgio said. “We found that we were pretty good at getting player value-type numbers that made sense, and we were really bad at forecasting.” Instead of trying to predict the future, Goldberg and Imburgio settled for adjusting the present by comparing players’ contribution to guys in a similar role in one of five age bands: youth, rising to prime, prime, falling from prime, or veteran.
If we narrow the Pogba search by age and DAVIES value, we’ll see numbers that have already been adjusted. So while Pogba generates more expected goals added according to the base model than Bordeaux’s Yacine Adli or Barcelona’s Pedri, they come out roughly even in DAVIES against their respective age groups. It’s not fancy as far as age curve modeling goes, but nothing about DAVIES is meant to be precise — it only has to work. One of the first checks Goldberg and Imburgio ran when they were building the model was to go back to the first season in their data and see how the top prospects’ careers had developed since then. “When we looked at young players from a few years ago and it ended up being a very good list,” Imburgio said, “that’s when I started believing these values were useful.”
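Here’s a rough sketch of what comparing a player against his own age group could look like. Everything specific in it is an assumption: the age cutoffs in `AGE_BANDS` are made up (the letter only names the five bands), the players and values are invented, and subtracting the peer-group mean is a stand-in for whatever adjustment DAVIES actually applies.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cutoffs: the oldest age (inclusive) that falls in each band.
# Anyone older than the last cutoff is treated as a veteran.
AGE_BANDS = [
    (21, "youth"),
    (24, "rising to prime"),
    (28, "prime"),
    (31, "falling from prime"),
]


def age_band(age):
    """Map an age to one of the five bands named in the text."""
    for upper, name in AGE_BANDS:
        if age <= upper:
            return name
    return "veteran"


def adjust_for_peers(players):
    """Return each player's raw value minus the mean of his playstyle/age-band peers."""
    groups = defaultdict(list)
    for p in players:
        groups[(p["playstyle"], age_band(p["age"]))].append(p["raw_value"])
    return {
        p["name"]: p["raw_value"] - mean(groups[(p["playstyle"], age_band(p["age"]))])
        for p in players
    }


# Invented players and values, for illustration only.
style = "Attacking Central Progressor"
players = [
    {"name": "Prime A", "age": 27, "playstyle": style, "raw_value": 3.0},
    {"name": "Prime B", "age": 28, "playstyle": style, "raw_value": 2.0},
    {"name": "Youth A", "age": 18, "playstyle": style, "raw_value": 1.5},
    {"name": "Youth B", "age": 20, "playstyle": style, "raw_value": 0.5},
]

adjusted = adjust_for_peers(players)
```

In this toy data Prime A has twice the raw value of Youth A, but each sits the same distance above his own peer group, so they come out even after adjustment. That’s the same shape as the Pogba and Pedri comparison above.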

The list of Attacking Central Progressors age 24 and under with a comparable DAVIES to Pogba looks promising. The highest-scoring players, Frenkie de Jong, Lucas Paquetá, and Nicolò Barella, had high-profile summers with three of the world’s best national teams. The youngest, Pedri and Eduardo Camavinga, are two of the game’s most coveted prospects. These guys don’t all have exactly the same profile, but that’s not necessarily a bad thing. DAVIES isn’t trying to do player similarity rankings. Its broad playstyles make sense given the blurriness of season-level stats for players like Pogba, who plays multiple positions, and the uncertainty of recruiting. You want a list of guys who might fit; the rest is up to judgment.
But good luck buying de Jong or Barella right now, let alone convincing them to be anyone’s understudy. The reason I picked a hypo about scouting a backup is that it’s more fun to filter the list one more time to guys with a Transfermarkt value under $25 million and look for players who aren’t famous yet but might be one day. “The gold standard for a scouting model is to be able to tell the future,” Imburgio told me. “Everybody wants a diamond in the rough.”
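If you download the data and want to run the same kind of search yourself, the filtering is the easy part. This is a sketch under assumptions: the field names (`playstyle`, `age`, `davies`, `market_value`) are hypothetical and may not match the app’s export, the players and numbers are invented, and “comparable DAVIES” is reduced to a simple minimum value.

```python
def shortlist(players, playstyle, max_age, min_value, max_fee):
    """Filter by playstyle, age, DAVIES value, and market value, best first."""
    picks = [
        p
        for p in players
        if p["playstyle"] == playstyle
        and p["age"] <= max_age
        and p["davies"] >= min_value
        and p["market_value"] < max_fee
    ]
    return sorted(picks, key=lambda p: p["davies"], reverse=True)


# Invented records, for illustration only.
acp = "Attacking Central Progressor"
players = [
    {"name": "Player A", "playstyle": acp, "age": 23, "davies": 4.1, "market_value": 90_000_000},
    {"name": "Player B", "playstyle": acp, "age": 20, "davies": 3.2, "market_value": 18_000_000},
    {"name": "Player C", "playstyle": "Playmaker", "age": 22, "davies": 3.9, "market_value": 12_000_000},
    {"name": "Player D", "playstyle": acp, "age": 28, "davies": 3.5, "market_value": 20_000_000},
    {"name": "Player E", "playstyle": acp, "age": 21, "davies": 3.6, "market_value": 9_000_000},
]

targets = shortlist(players, acp, max_age=24, min_value=3.0, max_fee=25_000_000)
target_names = [p["name"] for p in targets]
```

Player A is too expensive, Player C plays a different style, and Player D is too old, which leaves Players E and B, in that order. The output is just a list of names to go watch.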

Does any of the seven players left on the list have the potential to step in for Pogba? Maybe not, in this price band, but that’s for scouts to figure out. We’ll get to some video work later on in Space Recruits. What I can tell you is that, while I generally don’t recommend watching Bordeaux games, Adli usually makes it worthwhile.
Speaking of diamonds in the rough, DAVIES helped get its creators noticed. Goldberg got hired as a data scientist for the New York Red Bulls. Imburgio built a new model similar to DAVIES that covers more leagues, and clubs in Austria and Germany have been trying it out. “The ratings have gotten pretty good feedback from scouts out there,” he said. “That helped validate for me that doing this is useful for clubs that don’t have event-level data for the leagues they want to scout.” ❧
Further reading:
- Mike Imburgio and Sam Goldberg, Introducing DAVIES: A Framework for Identifying Talent Across the Globe (American Soccer Analysis)
- Ryan O’Hanlon, How a Former Minor League Baseball Player and a Neuroscience Student Are Redefining What Happens on the Soccer Field (No Grass in the Clouds)
- Mike Imburgio, Defining Roles: How Every Player Contributes to Goals (American Soccer Analysis)
Image: Georges Méliès, A Trip to the Moon