European Day of Languages is an annual celebration of the diverse range of languages spoken across the continent. But, as Dr Fintan Mallory from our Department of Philosophy explains, we shouldn't expect AI technology to save less widely spoken languages.
I do interdisciplinary work at the intersection of linguistics, machine learning and the philosophy of mind and language.
One question I work on is, what is it to know a language?
Some people think that Large Language Models (LLMs) like GPT-4o can genuinely understand language. I'm sceptical of this, but I have argued that these models don't need to fully know a language for us to use them.
If you book cinema tickets over the phone, you don't have to believe that the automated ticketing system is talking to you in order to use it. You can simply engage in a game of make-believe to get what you want.
A second question I work on is, how do neural networks represent things about the world? The question of how one thing represents another is a very old philosophical topic and philosophers have made quite a bit of progress with it (at least, we’ve got very good at detecting dead-ends).
There are around 7,000 languages on the planet. It’s likely that by the end of this century about half of those languages will have been killed. This killing is a part of the ongoing process of colonialism that has sought to extinguish cultures and peoples in their uniqueness. Linguicide, the killing of a language, plays a major role in the attacks on indigenous communities by political powers seeking to render those communities economically useful to them.
It is possible to think of a language as a database of information, a hoard of facts about grammar and word meanings that can be extracted and put into storage. This is sometimes a good way to view a language when you're doing linguistics, because it removes the people from the equation. It's what LLMs do when they train on 'linguistic data' with no regard for the human beings whose lives are expressed in that data.
But if a language were just a body of information, we could, in theory, save languages simply by storing them in LLMs. An alternative view of language puts human beings at the centre and views a language as something more like the soul of a community. You can't store this in a machine. You can't solve a human problem like linguicide with a view of language that removes the human component.
Rather than approaching language preservation as a technical problem, I think we need to politically empower indigenous communities, whether through government funding or legal protections for the use of their languages.