On Sunday, BuzzFeed's Heidi Blake and John Templon published a joint investigation with the BBC into alleged match-fixing in professional tennis. The report is really two things rolled into one: a discussion of leaked documents and a statistical analysis of betting data. The leaked documents show that 16 players repeatedly suspected of fixing matches were not properly investigated or sanctioned by tennis authorities. The statistical analysis, conducted by Templon, found unusual movements in the betting odds on matches involving 15 players, including a former Grand Slam winner. The authors did not identify the 15 players, but they did release their data and methodology.
All week, players at the Australian Open have been asked questions about the report, and about match-fixing in general. Their answers have contained more than a hint of exasperation. "I answered that question the other day," Roger Federer told reporters Wednesday. "Just check the transcript."
While Federer was growing annoyed in Melbourne, a Dutch programmer and sports gambling enthusiast named Chris Bol, who spent 15 years working for the sports information agency Infostrada Sports, and his twin brother, Adriaan, downloaded BuzzFeed's data. They realized quickly that the information—which included the season in which the matches took place—could lead them to the identities of the 15 players.
"I don't think there was a reason for him to do it, but it was in there," Chris told me by phone. "Maybe he wanted people to find out [who the players were]?"
With this information, the Bol brothers were able to establish win-loss records for all the coded players in the database. They then matched the win-loss records to the publicly available data on wins and losses among professional players and published the results on their blog, Show Legend. (The brothers weren't the only people to reverse engineer the BuzzFeed data. At least two other people around the world succeeded at roughly the same time. They all got the same results.)
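For readers curious how this kind of reverse engineering works, here is a minimal Python sketch of the matching step. It assumes a hypothetical layout in which each match is a (season, winner code, loser code) row, and that a public per-season win-loss table exists for named players. Neither the layout nor the function names come from BuzzFeed's files or the Bols' blog, and real data would need some tolerance for matches that appear in one source but not the other.

```python
from collections import defaultdict


def tally_records(matches):
    """Build a per-season (wins, losses) record for each coded player.

    `matches` is an iterable of (season, winner_code, loser_code) tuples,
    a hypothetical stand-in for the columns in the real dataset."""
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for season, winner, loser in matches:
        tallies[winner][season][0] += 1  # one more win for the winner
        tallies[loser][season][1] += 1   # one more loss for the loser
    return {
        code: {season: tuple(wl) for season, wl in seasons.items()}
        for code, seasons in tallies.items()
    }


def match_players(coded, public):
    """Map each coded player to every named player whose per-season
    record is identical (an exact match, for simplicity)."""
    return {
        code: [name for name, record in public.items() if record == coded_record]
        for code, coded_record in coded.items()
    }
```

When a coded player's record matches exactly one named player, that player is the likely identity; several matches would mean more seasons or more data are needed to tell them apart.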
The most prominent name among the 15 was Lleyton Hewitt, a former Wimbledon and U.S. Open champion, which jibed with BuzzFeed's claim that its analysis included a former Grand Slam winner.
With this new information, the Bols and other at-home analysts looked more closely at the data and found some issues. They concluded that a number of the matches flagged by BuzzFeed's algorithm might not have been fixed at all.
One of the other people who reverse engineered the BuzzFeed data was the blogger Ian Dorward, who has been tracking match-fixing in tennis for a while. On his blog, he occasionally highlights deeply suspicious matches and breaks down how the betting data suggests a possible fix. These analyses are fascinating reads. They're not proof of match-fixing, but they're a window into how match-fixers might manipulate live betting odds to maximize profit.
BuzzFeed's analysis did something a little different. "I looked at the odds that seven major sports books initially offered for each match," Templon wrote in a subsequent post about his data analysis, "and then compared them with the final odds, to see how far they had shifted."
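Here is a rough Python sketch of the kind of comparison Templon describes. It assumes decimal odds, converts each bookmaker's opening and closing price into an implied probability (1 divided by the odds, ignoring the bookmaker's margin), and averages the change across books. The 10-percentage-point threshold and all function names are illustrative assumptions, not a reproduction of BuzzFeed's published code.

```python
from statistics import mean

# Hypothetical cutoff: flag a match if the implied probability moved 10 points or more.
FLAG_THRESHOLD = 0.10


def implied_probability(decimal_odds: float) -> float:
    """Convert decimal odds to an implied win probability (ignores the bookmaker's margin)."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def odds_shift(opening: list[float], closing: list[float]) -> float:
    """Average change in a player's implied win probability across bookmakers.

    opening[i] and closing[i] are the same bookmaker's first and last prices.
    A negative result means the market moved against the player."""
    if not opening or len(opening) != len(closing):
        raise ValueError("need one closing price for every opening price")
    return mean(
        implied_probability(close) - implied_probability(open_)
        for open_, close in zip(opening, closing)
    )


def is_flagged(opening: list[float], closing: list[float],
               threshold: float = FLAG_THRESHOLD) -> bool:
    """True if the average shift, in either direction, meets the threshold."""
    return abs(odds_shift(opening, closing)) >= threshold
```

A player priced at 1.25 by every book at the open (an 80 percent favorite) and 2.00 at the close (a coin flip) would show a shift of about -0.30 and be flagged under this sketch.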
Odds can shift in the pre-match market, which is what Templon examined, for plenty of legitimate reasons that have nothing to do with match-fixing: player injury, player sickness, or personal issues in a player's life. Even rumors about any of these will do the trick. Maybe the player was seen at a bar late the night before. Maybe her dad just died. Bookmakers use all this information, which is not always publicly available, to set and adjust their odds, and the gambling public makes wagers based on the same information, whether real or rumor. Templon is aware of this. The implication of his analysis is that these kinds of odds movements happened so frequently with these 15 players that fixing is a reasonable explanation.
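The underlying logic, that one odds swing proves little but a long run of them might, can be illustrated with a simple binomial calculation. This is not Templon's method. It is a stand-in that assumes each match is flagged independently at some hypothetical base rate, and asks how likely it is for a player to rack up at least a given number of flagged matches by chance alone.

```python
from math import comb


def chance_of_at_least(n_matches: int, n_flagged: int, base_rate: float) -> float:
    """Binomial upper tail: P(at least n_flagged of n_matches are flagged)
    when each match is flagged independently with probability base_rate.

    base_rate is a hypothetical input, not a figure from BuzzFeed's analysis."""
    if not 0 <= n_flagged <= n_matches:
        raise ValueError("n_flagged must be between 0 and n_matches")
    return sum(
        comb(n_matches, k) * base_rate**k * (1 - base_rate) ** (n_matches - k)
        for k in range(n_flagged, n_matches + 1)
    )
```

The weak point is the independence assumption: injuries, illness, and rumors tend to cluster around the same player, which would make a long run of odds swings less surprising than this formula suggests.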
The other problem with Templon's approach has to do with how match-fixing works today. In short, the pre-match betting market isn't where most match-fixers operate; they prefer the live betting market. Because so many variables could account for shifting odds before a match (see above), bookmakers tend to accept smaller pre-match bets than they do during live betting, when there are fewer variables.
Not only is the live betting market more lucrative for gamblers (and fixers), but it also makes it easy to fix a tennis match in a way that still lets the favorite win. For example, a top player facing a lowly opponent in the first round of a tournament can drop a set and still win the match. This would produce suspicious odds movements during live betting, such as odds suddenly shifting toward the stronger player losing, say, the second set, but it would not show up in an analysis of pre-match betting odds.
"In a way, this inquiry by BuzzFeed, maybe it was interesting eight years ago," Chris Bol said, "but if you want to do a good analysis now, you definitely have to analyze the live data."
Ten years ago, the pre-match betting market was all we had. Today, most of the betting—and the most lucrative betting—occurs during a match.
None of that live betting data was available to Templon. Like a good scientist, he admits this is a drawback to his study. "The analysis was undertaken with only the betting information that is publicly available," he wrote. "Tennis authorities and betting houses have access to much finer-grained data, such as the accounts placing bets, as well as forensic evidence such as phone data and bank records. Without access to such information, it is impossible to know with a sufficient degree of certainty whether these suspicious patterns are indeed the result of match fixing. For this reason, BuzzFeed News has decided not to name the players."
In other words, there is a lot of context missing from Templon's data analysis—which, again, is fine so long as you're open about it, which he was.
There are organizations that regularly use sports betting data—including live betting data—to look for fixes, like Sportradar (which I profiled recently), ESSA, Sport Integrity Monitor, and even the Tennis Integrity Unit discussed in BuzzFeed's story. Sportradar doesn't focus on tennis, but it employs entire teams of people tasked with trying to find legitimate reasons odds might move before flagging a game as possibly fixed.
With the players now named, along with the data from matches BuzzFeed flagged as being suspicious, we have the opportunity to do what these bet-monitoring organizations do: look at the context of each match. The blogosphere has started doing this, and already there is reason to believe a number of the matches flagged by BuzzFeed were not fixed.
In a blog post published yesterday, Ian Dorward examined eight of Lleyton Hewitt's supposedly suspicious matches and cast doubt on whether any of them were fixed. Hewitt's handlers didn't respond to questions about the match-fixing allegations before his second-round match on Thursday in Melbourne. He then took the court and lost to eighth-seeded David Ferrer. Prior to the tournament, the 34-year-old Hewitt had said that the Australian Open would be his final professional tournament. So his career ended with a loss amid allegations of match-fixing.
"I think it's a joke to deal with it," Hewitt said about the allegations following his defeat. "You know, obviously, yeah, there's no possible way. I know my name's now been thrown into it. I don't think anyone here would think that I've done anything corruption or match-fixing. It's just absurd.
"For anyone that tries to go any further with it, then good luck. Take me on with it. Yeah, it's disappointing. I think throwing my name out there with it makes the whole thing an absolute farce."
What a way to go out.
The issue here isn't that BuzzFeed is wrong, or that its statistical analysis was bad. Rather, any good statistical analysis needs a lot of context.
On Thursday, the Bol brothers compiled a list of all the matches BuzzFeed's algorithm flagged as possible fixes, which they shared with VICE Sports and which we have uploaded to a Google spreadsheet here.
Anybody with information regarding these matches should get in touch with us. You can reach me at Brian.Blickenstaff@Vice.com and the Bol brothers on Twitter at @show_legend, or tweet using the hashtag #Buzzfeed15.
The suspected players and number of flagged matches are as follows:
Additional reporting from Australia by Danielle Elliot
Update 1:20 pm: We reached out to John Templon and asked if BuzzFeed looked at its results case by case to see if there were legitimate explanations for any of the odds movements. Mark Schoofs, BuzzFeed's Investigations & Projects Editor, responded:
"As our stories state, the analysis of betting data does not prove match-fixing. That's why we did not name the players and why our investigation went much wider than the algorithm and was based on a cache of leaked documents, interviews across three continents, and much more."