The ‘Giant Data Ingestion Machine’ That Can Help Cities Meet Climate Goals

Cities around the world have set ambitious targets to reduce emissions. But the first step for most of them is to figure out how much they’re emitting.

More than 1,000 cities around the world have pledged to cut emissions in half by 2030 and achieve carbon neutrality by 2050. For these cities and the people who run them, it is extremely important to know not only where fossil fuels are burned but also how much is coming from each source. And they have to know it in increasingly fine-grained detail in order to answer basic questions such as: How would anyone actually know if cities cut their emissions by 50 percent by 2030?


About 15 years ago, Kevin Gurney, a professor at Northern Arizona University, realized this would be a problem. Specifically, experts had a good idea of how much carbon dioxide was being emitted globally based on atmospheric readings, but had difficulty tracing those emissions back to individual regions or sources. In other words, experts could estimate how much a continent was emitting, and how much a specific factory was emitting based on ground-level observations, but everything in between was little more than a guess.

“It was clear that this was going to be important for decision-makers, having more accuracy, more granularity, more functional information about emissions,” Gurney told Motherboard recently over Zoom. The problem, as he saw it, is that cities do everything through the planning process using systems like GIS, or geographic information systems mapping. Without emissions data that planners can plug into GIS, they would struggle to account for emissions when they make decisions.

“Granularity gives you the type of information you're going to need to do practical things, not pledges, but the actual work of reducing emissions,” Gurney said.

When Gurney was looking into this in the mid-2000s, the best data available were rough estimates based on population totals. But population is a poor proxy for emissions. A few of the many reasons why: Power plants and large factories that pollute the most are often located away from population centers. Sprawling cities with lots of single-family homes and a reliance on private cars can have much higher emissions than cities with larger but denser populations.


With grant money initially from NASA, Gurney got to work on the Vulcan Project, an ambitious effort to aggregate dozens of existing data sources from federal databases on virtually everything having to do with emissions into a single repository that then calculates where the emissions are coming from. Those calculations are then tested against atmospheric and Environmental Protection Agency data for accuracy. Gurney likens Vulcan to a “giant data ingestion machine” with data on power plants, traffic statistics, local air pollution, tax assessments, vehicle registrations, and so on. The end result, Gurney says, is more than 11 terabytes of data and an “almost perfect” estimation of where emissions are coming from on a local level.

“Vulcan is a complement to the atmosphere [data],” Gurney said. “Atmosphere tends to be really accurate, but doesn’t know a lot about: Is it a car? Is it a factory? Is it a building? Vulcan knows lots about: It's a factory, it's a car, it's a building. But bottom up is not as accurate. And so I put those two together and get the best of both worlds.”

As Gurney fine-tuned Vulcan, cities started setting climate goals only to find they were on their own for actually measuring emissions. Their estimates have gotten better since, but for most cities they are still little more than rough figures based on several protocols set by non-governmental organizations like ICLEI - Local Governments for Sustainability and the World Resources Institute. These protocols are essentially, and in some cases literally, empty spreadsheets that cities fill out themselves to arrive at an estimate. The protocols supply no data of their own.


Gathering that data takes time, energy, and resources. Ben Furnas, the former New York City director of the Office of Climate and Sustainability, told Motherboard that some aspects of emissions calculations were relatively easy. For example, there are only two utilities that pipe natural gas into the city, so calling up National Grid and Con Edison is fairly straightforward. But for fuel oil and petroleum, Furnas’s office had to literally call every distributor and oil delivery company in the tri-state area and ask them how much they distributed in a given year. Furnas said even a city with resources like New York is doing a lot of emissions inventorying that’s “quite rudimentary.”

That showed when Gurney compared Vulcan’s results with the emissions data self-reported by 48 U.S. cities. The comparison, which was published in Nature Communications last year, found cities were way off when compared to Vulcan’s results. On average they underestimated emissions by about 18 percent, but individual cities ranged from underestimating by 145 percent to overestimating by 64 percent. As Gurney and co-authors noted in the paper, the discrepancy between Vulcan’s results and what those cities self-report is about 25 percent greater than all of California’s emissions. Another way of putting this is that cities, despite their best efforts, really have no idea how much greenhouse gas they emit.


“These challenges are particularly important when placed in the context of the reduction targets,” the paper warned. “For example, the city of Indianapolis has indicated that they aim to make a 20 percent reduction in building GHG emissions by 2025 relative to 2016 values. However, with the 26.9 percent underestimate found here, it will be difficult to know when and if this target is truly achieved or track progress towards it.”
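
The arithmetic behind that warning can be sketched in a few lines of Python. This is an illustration only, not the paper's method. It assumes the 26.9 percent figure means the self-reported total is 26.9 percent lower than actual emissions, that the target and the underestimate apply to the same inventory, and that the actual baseline is a made-up 100 units. The function name and parameters are hypothetical.

```python
# Illustrative arithmetic only; the baseline of 100 units is hypothetical.
def inventory_gap(actual_baseline, underestimate_pct, target_cut_pct):
    """Compare the unreported share of emissions with a targeted cut.

    Assumes "underestimate" means the self-reported total is
    underestimate_pct percent lower than the actual total.
    """
    reported = actual_baseline * (1 - underestimate_pct / 100)
    unreported = actual_baseline - reported
    # The city measures its target against the number it reported.
    target_cut = reported * target_cut_pct / 100
    return reported, unreported, target_cut


reported, unreported, target_cut = inventory_gap(100.0, 26.9, 20.0)
print(f"reported baseline: {reported:.2f}")
print(f"missing from inventory: {unreported:.2f}")
print(f"targeted cut: {target_cut:.2f}")
```

Under these assumptions, the emissions missing from the inventory (about 26.9 units) are nearly twice the size of the cut the city is trying to measure (about 14.6 units), which is why progress toward the target is hard to verify.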

Gurney said he’s received nothing but positive feedback from cities after calling out their estimates, and he doesn’t blame cities for the predicament they’re in. The cities with flawed estimates and Gurney share a common goal. They all want one central system for greenhouse gas inventorying so each city doesn’t have to build its own.

“I would argue this is absolutely the wrong way to do this problem,” Gurney said, referring to the every-city-for-themselves approach. “Asking every city to redundantly build an inventory is costly, takes up their staff time, they can't do a good job at it, because they just cannot find all the data.” By contrast, Gurney said, Vulcan cost a few million dollars to build over the years and can now be used by any U.S. city.

Gurney would love to give Vulcan to every city for free, he says, but the problem now is cleaning up the masses of data so any city department can use it. Right now, all he can do is send them 11 terabytes of data and tell them to have fun. He’s hoping the federal government will come through, either by providing a grant to create a user-friendly interface with real-time data or by building one itself. If not, Gurney told Motherboard he’s prepared to start a company that will do the same, providing cities with a basic inventory for free but charging for more advanced datasets.

Gurney likened it to making every city responsible for its own weather forecasting. “We would never expect every city in the United States to collect weather data, run a weather model and come up with their own forecast. It would just be an absurd thing to do. Well, that's what we're asking them to do on greenhouse gases.”