If you’ve ever tried to decipher your grandmother’s handwritten recipe or a forgotten note from your favorite childhood teacher, then you’ve experienced the agony of attempting to read between ink stains, smudged letters, and discoloration. For historians studying ancient documents, these headaches are magnified tenfold—especially because the authors of these documents are long since gone. But that could be about to change.
While digital methods have been applied to these documents in the past to parse their hidden meanings, these techniques can remove other important artifacts, including color, from the manuscripts. Now, an international team of researchers has devised a new technique that can tackle a variety of manuscript damage at once, using spectral and color data to separate the degradation from the document itself and remove it. The findings were published Wednesday in the journal PLOS ONE.
Muhammad Hanif, first author on the paper and an assistant professor at the GIK Institute in Pakistan, told Motherboard in an email that this method offers researchers a chance to restore ancient knowledge without ever physically touching the documents. It can also help revive documents in living color.
“Unfortunately, these documents have suffered damage over time due to various factors such as natural aging, environmental conditions, or improper handling, which impinge on their readability and information contents,” Hanif said.
“Most of the document restoration methods in literature focused on the text extraction only, resulting in a binary (black and white) image… we extend the concept of digital document restoration to the removal of degradation patterns of any type, such as spots, bleed-through, non-uniform illuminations, etc, while leaving unchanged the other informative elements.”
In addition to better matching the original look of the documents, Hanif said that this approach can also help restore non-text-based elements, such as annotations and stamps.
To restore these documents to their former glory, the approach starts by examining the color of the document pixel by pixel to identify spectral differences (how the different colors respond to light) between layers of the document, such as the ink, stamps, and the paper itself. Pixels with similar colors are then grouped together in order to separate the different layers. Specific pixel groups, such as those corresponding to a stain, can then be targeted for extraction.
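For readers who want a feel for the grouping step, here is a minimal sketch in Python. It is not the researchers' method: it uses plain k-means clustering on RGB values as a stand-in, and the function name `cluster_pixels`, the sample pixel values, and the starting "seed" colors are all made up for illustration.

```python
from math import dist


def cluster_pixels(pixels, centers, iterations=10):
    """Group RGB pixels around the nearest center color (plain k-means).

    Illustrative stand-in only; the paper's actual separation model
    is not reproduced here.
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assign each pixel to the closest center color.
        labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in pixels
        ]
        # Move each center to the mean color of the pixels assigned to it.
        for k in range(len(centers)):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centers


# Hypothetical pixels: two of paper, two of ink, two of a brownish stain.
pixels = [
    (230, 220, 200), (228, 218, 198),
    (30, 30, 40), (35, 28, 42),
    (150, 100, 60), (155, 105, 58),
]
# Hypothetical starting guesses for the paper, ink, and stain layers.
seeds = [(240, 230, 210), (0, 0, 0), (160, 110, 70)]

labels, centers = cluster_pixels(pixels, seeds)
stain_pixels = [p for p, lab in zip(pixels, labels) if lab == 2]
```

In this toy case the pixels labeled `2` form the stain group, which is the group a restoration tool would then target for removal. A real manuscript scan would involve millions of pixels and spectral bands beyond red, green, and blue.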
Much like a tool you might use to remove a photobomber from your vacation shots, this technique can also fill in the areas left behind after extraction, swapping in pixels that match the background surrounding the defect for a more cohesive look, Hanif said. In addition to restoring the look of the document, this can also help character recognition tools better parse the document for easier text analysis.
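The fill-in idea can be sketched in a few lines of Python as well. This is a simplified illustration rather than the team's algorithm: it assumes the defect has already been marked in a mask, and it replaces each marked pixel with the average color of its unmarked neighbors. The name `fill_defects` and the tiny 3-by-3 image are hypothetical.

```python
def fill_defects(image, mask):
    """Replace masked pixels with the average color of unmasked neighbors.

    `image` is a grid (list of rows) of RGB tuples; `mask` is a grid of
    booleans where True marks a defect pixel. Simplified illustration only.
    """
    height, width = len(image), len(image[0])
    restored = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            # Collect the clean pixels in the surrounding 3x3 window.
            neighbors = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if not mask[ny][nx]
            ]
            # Defect pixels with no clean neighbors are left unchanged.
            if neighbors:
                restored[y][x] = tuple(
                    sum(ch) // len(neighbors) for ch in zip(*neighbors)
                )
    return restored


PAPER = (230, 220, 200)   # hypothetical paper color
STAIN = (150, 100, 60)    # hypothetical stain color

image = [
    [PAPER, PAPER, PAPER],
    [PAPER, STAIN, PAPER],
    [PAPER, PAPER, PAPER],
]
mask = [
    [False, False, False],
    [False, True, False],
    [False, False, False],
]

restored = fill_defects(image, mask)
```

Here the single stained pixel in the middle takes on the paper color of its eight neighbors. Larger defects would need a more careful approach, since pixels deep inside a big stain have no clean neighbors to borrow from.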
So far, the team has demonstrated this approach on documents damaged by ink bleed-through and spots, but it hopes to extend the method to other types of damage in the future, including damage from insects, fire, or water. The researchers are also looking for ways to incorporate artificial intelligence into their approach to help retrieve context-based information from damaged documents.
“Ancient manuscripts offer a unique and irreplaceable glimpse into the history of civilizations, cultures, and religions,” Hanif said. “We hope this type of restored document will open avenues for historians to explore history in more detail and will help to share the findings with a wider audience, making it more accessible to researchers and the general public.”