This AI Bot Translates Dead Ancient Languages into English

At any number of sites around the world that are home to ancient artifacts and the remains of once-thriving societies, there are clay tablets in the ground with messages that have been waiting to be read for more than 5,000 years. These messages, written by pressing a reed stylus into the clay to form the wedge-shaped marks of a script known as cuneiform, are typically thought of as the earliest examples of written language in human history.

About 600,000 of these cryptic tablets have been unearthed over the past few centuries, and they’ve been steadily piling up in university and museum collections all over the world. From these findings, researchers known as Assyriologists have been able to decode and translate extinct languages such as Akkadian, once spoken by people living in the world’s first civilizations in Mesopotamia, and breathe new life into them.

However, there are only so many Assyriologists capable of translating these ancient texts. As new clay tablets are discovered each year, the task of rendering these ancient messages in modern languages becomes more and more daunting.

“There are very few people who can really read it well,” Shai Gordin, an Assyriologist at Ariel University, told The Daily Beast. “It’s also 3,000 years of written history, right? So if someone can read one period of cuneiform really well, that doesn’t necessarily mean they can read the other periods really well.”

So, to help translate the growing number of ancient texts into modern languages, Gordin and his colleagues turned to an emerging technology: artificial intelligence. More specifically, the team developed a neural network that is capable of translating Akkadian and Sumerian cuneiform into English. The researchers published their findings on May 2 in the journal PNAS Nexus.

Not only can a model like this greatly speed up the process of translation for the researchers, it can also give historians and Assyriologists an opportunity to gain new and more in-depth insights into these ancient civilizations. The authors also see potential in democratizing Assyriology by giving laypeople access to such tools.

“We really wanted to have [Assyriology] be more integrated with technology,” Gordin said. “On the one hand, it makes the process more seamless and standardized. On the other, it’s actually helping us ask new and exciting questions because it allows us to see patterns we haven’t seen before.”

For better or worse, AI can be seen as the next evolution of humanity’s relationship with language. Not only can AI power tools like Google Translate to help people communicate and understand one another no matter what corner of the world they were born in, but large language models (LLMs) like OpenAI’s ChatGPT and Google’s Bard are beginning to fundamentally change the way some people write and engage with the content they see online.

It’s no real surprise then that AI would eventually come for some of the oldest languages known to humanity. Willis Monroe, an ancient Near East historian at the University of British Columbia who wasn’t involved in the study, believes that it’s fitting in an almost poetic sense that Gordin and his colleagues developed an AI model to engage with one of the earliest written languages in human history.

“Cuneiform is very, very old,” Monroe told The Daily Beast. “It’s one of the earliest if not the earliest writing scripts in the world. So something that’s so fun about this is the combination of modern, digital approaches like neural networks and machine learning with the first writing ever. It’s kind of bookending human written history—so it’s really exciting work.”

The model itself is an extension of the Babylonian Engine, a platform for digital Assyriology that seeks to merge emerging tech with the study of these early written languages. “Its goal is to integrate artificial intelligence and machine learning models into the actual work that we do as historians and scholars of the past,” Gordin said.

This latest study is a big step towards that goal—albeit a formidable one considering the limitations of cuneiform. For one, the script is akin to “three-dimensional handwriting,” Monroe said. “It’s like trying to have a computer automatically generate a translation of someone writing in French on a beach with a stick. It’s very complex.”

He added that while there are teams working on models that can directly translate cuneiform, it’s still a long way off as the script is meant to be read in “shifting light.” That’s an inherent limitation for even the most powerful computer.

So to get around this, the team trained two versions of the neural network. The first translates Akkadian that has been transliterated into Latin script into English (T2E), while the other translates Unicode representations of cuneiform signs into English (C2E). Unicode is an international encoding standard that assigns a unique number to every character in the scripts it covers. While neither version reads the physical cuneiform directly, this method allowed the researchers to create models that effectively translated the Akkadian.
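To make the Unicode idea concrete, here is a minimal Python sketch that looks up the number and official name Unicode assigns to two cuneiform signs. The two signs are picked purely for illustration and the helper name `describe` is hypothetical; nothing here is taken from the team's code or data.

```python
import unicodedata

# Two characters from the Unicode Cuneiform block (U+12000 to U+123FF).
# The choice of signs is illustrative only and is not drawn from the study.
signs = ["\U00012000", "\U0001202D"]


def describe(sign: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(sign):04X} {unicodedata.name(sign)}"


for sign in signs:
    print(describe(sign))
```

A model working from Unicode input sees a sequence of such code points rather than a photograph or 3D scan of the clay, which is why someone still has to identify the signs on the tablet first.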

To measure the success of the translations, the team used the Bilingual Evaluation Understudy (BLEU), a metric that scores a machine translation by how closely it matches reference human translations. T2E produced the more accurate translations of the two models, achieving an average score of 37.47, while C2E achieved an average score of 36.52. Both are relatively high BLEU scores and indicate that the models were capable of creating understandable translations.
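For readers curious how a BLEU-style score is put together, the sketch below is a simplified, single-sentence version in Python. It assumes plain whitespace tokenization, one reference translation, and no smoothing, so it is not the researchers' evaluation setup and its numbers should not be compared with the averages they report. The function names `ngrams` and `bleu` are hypothetical, and the sentence pair is the moon-omen example Gordin describes elsewhere in this article.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU on a 0-100 scale (assumption: no smoothing)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by how often it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one empty n-gram level zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are scaled down.
    penalty = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * penalty * math.exp(sum(log_precisions) / max_n)


human = ("If the day of disappearance of the moon reaches its normal length: "
         "the days of the ruler will be long.")
model = "If the day reaches its normal length: a reign of long days."
print(round(bleu(model, human), 2))
```

The score rewards word sequences shared with the human translation and penalizes output that is too short, which is why a translation that drops a phrase loses points even when the rest reads well.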

While the neural network shows a lot of promise, there were some caveats. For one, the model is prone to hallucination, a perennial problem in LLMs and other generative AI where the system makes up inaccurate or entirely false answers. Gordin noted that this would often happen when the model attempted to translate text longer than 118 characters.

For example, one line the team put through the neural network should have read, “If the day of disappearance of the moon reaches its normal length: the days of the ruler will be long.” However, the model translated it as “If the day reaches its normal length: a reign of long days.”

“So it's very close, but not as accurate as a human translator,” Gordin said.

Gordin said this example underscores the importance of always keeping a human in the loop when it comes to this tool—and other AIs like it. With the proliferation of LLMs, these systems need to be viewed as tools to assist actual flesh-and-blood people in their work rather than replace them entirely. As useful as the Akkadian neural network is when it comes to the work that Gordin and his fellow Assyriologists are doing, it still can’t outright replace a human scholar’s intuition and oversight.

“There’s still a human element,” Monroe said. “In terms of translating the world of Akkadian, this is very effective. It removes one or two steps from the process—but there's still humans that have to actually hold the clay tablets and study them.”

You might not realize it, but you owe a lot to the Mesopotamians. Everything from our numerical system, to our knowledge of astronomy, to our system of the rule of law can be traced back to this very ancient civilization and their writing system. That’s why Assyriologists like Gordin and Monroe work to translate these tablets—so we might be able to learn more about the origins that led to our current state of the world.

But there’s still a lot to be done even with an AI model to help lighten the load. As we’ve seen in just the past few months since the release of ChatGPT, though, technology can move at eye-watering speeds, and that’s especially the case with artificial intelligence.

According to Gordin, the model can be upgraded and trained on different periods of cuneiform. This will allow it to translate an even greater corpus of Akkadian and give researchers even more insight into these ancient cultures. Moreover, the team plans on putting the model on the Babylonian Engine so more people can access it.

“We want to make it more accessible to people who are not necessarily historians or Assyriologists,” Gordin said. “We want to make it easier for them to pick up a text and get a translation through this model.”

This democratization of the research could help unearth even more insights into these ancient civilizations and how their people lived. Meanwhile, training neural networks on these languages helps refine them in ways that translating “living” languages wouldn’t necessarily allow. In that way, a very, very old and extinct language is still capable of teaching us new things about life today.

“There’s amazing things that we find in cuneiform,” Monroe said. “There's all these things that really resonate with us now, and show the commonality that we have with ancient people that are so distant, not only language, but also time.”

Time travelers might want to keep this one handy.

Tony Ho Tran