Scientists are Renaming Dozens of Human Genes so Microsoft Excel Doesn't Get Confused

In the past, loads of clinical data has been corrupted by one of Excel's most basic functions.
Gavin Butler
Melbourne, AU
Collage image via Pxfuel / Wikimedia

Scientists have renamed some 27 human genes over the past year to stop Microsoft Excel misinterpreting their alphanumeric codes as dates.

These alphanumeric codes, otherwise known as “symbols”, are used as a shorthand method for researchers to identify the tens of thousands of genes in the human genome. Sometimes, though, they end up reading as something else. 

The Membrane Associated Ring-CH-Type Finger 1 gene, for example, is codified as MARCH1; the Septin-1 gene is codified as SEPT1. A harmless coincidence, as far as the naked human eye is concerned. But plug that into an algorithm-minded program like Microsoft Excel—commonly used by scientists to track their work and conduct clinical trials—and problems start to occur.

Scientists were frequently frustrated that Excel, in its default settings, automatically converts certain gene symbols such as MARCH1 into dates, displaying them as “1-Mar” and corrupting reams of important clinical data, The Verge reports.
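To make the failure mode concrete, here is a minimal Python sketch that flags gene symbols a date-guessing spreadsheet might convert. It only approximates the behaviour described above with a simple pattern (a month name or abbreviation followed by a day number); it is not Excel's actual parsing logic, and the names `MONTH_PREFIXES`, `looks_like_date`, and `flag_risky_symbols` are hypothetical helpers invented for this illustration.

```python
import re

# Month names and abbreviations a date-guessing parser might accept.
# Assumption: this list is illustrative, not Excel's real rule set.
MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY", "JUN", "JUNE",
    "JUL", "JULY", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# Longest prefixes first so "MARCH" is tried before "MAR".
_PREFIX_PATTERN = "|".join(sorted(MONTH_PREFIXES, key=len, reverse=True))
DATE_LIKE = re.compile(r"^(?:%s)-?(\d{1,2})$" % _PREFIX_PATTERN, re.IGNORECASE)


def looks_like_date(symbol: str) -> bool:
    """Return True if the symbol reads as month + day (1 to 31)."""
    match = DATE_LIKE.match(symbol.strip())
    if not match:
        return False
    return 1 <= int(match.group(1)) <= 31


def flag_risky_symbols(symbols):
    """Return the symbols, in input order, that look like dates."""
    return [s for s in symbols if looks_like_date(s)]


if __name__ == "__main__":
    # Symbols taken from the article: two old names, two new names, one unrelated.
    print(flag_risky_symbols(["MARCH1", "SEPT1", "MARCHF1", "SEPTIN1", "CARS1"]))
```

Under this heuristic the old symbols MARCH1 and SEPT1 are flagged, while the replacement symbols MARCHF1 and SEPTIN1 pass, because the extra letters break the month-plus-number pattern.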

Apart from being incredibly annoying, the impacts of this attempted autocorrect are significant. In a 2016 study examining genetic data that was shared alongside 3,597 published papers, about one-fifth were found to have been affected by Excel errors.

This week, the HUGO Gene Nomenclature Committee (HGNC), the scientific body responsible for standardising the names of genes, addressed the issue by publishing new guidelines for determining gene symbols. Going forward, genes like MARCH1 and SEPT1 will be given symbols that don’t trigger Excel’s automatic date conversion: MARCHF1 and SEPTIN1, respectively.

Over the past 12 months, the names of about 27 genes have been modified for this reason, although the guidelines weren’t formally announced until this week.

“We consulted the respective research communities to discuss the proposed updates, and we also notified researchers who had published on these genes specifically when the changes were being put into effect,” Elspeth Bruford, the coordinator of HGNC, told The Verge.

Bruford went on to explain that it’s not uncommon for scientists to rename genes in order to avoid confusion. Symbols like CARS1, WARS1, and MARS1, for example, have all had the numerals added for that reason. The new guidelines, she said, are little more than a tightening of measures to make it easier for researchers to do their work.

Or, in some other cases, to avoid insulting a patient.

“We always have to imagine a clinician having to explain to a parent that their child has a mutation in a particular gene,” says Bruford. “For example, HECA used to have the gene name ‘headcase homolog’ … named after the equivalent gene in fruit fly, but we changed it to ‘hdc homolog, cell cycle regulator’ to avoid potential offense.”
