SA’s indigenous languages must all be fully developed and featured in digital resources – a UP professor says
South Africa has 11 official languages, but we may be moving into a world of monolingualism, especially when it comes to finding information online, says Vukosi Marivate. As a result, it is “becoming more and more likely that it will be easier for the languages to disappear,” he said.
Marivate, who is the University of Pretoria’s Absa Chair of Data Science and an Associate Professor in the Department of Computer Science in the Faculty of Engineering, Built Environment, and Information Technology, was discussing African languages in the context of the forthcoming Vice-Chancellors’ Language Colloquium, to be hosted in a hybrid format at UP’s Senate Hall from 1 to 2 December.
Facilitated by Universities South Africa’s Community of Practice for the Teaching and Learning of African Languages (CoPAL), and themed Moving the Conversation Forward, the colloquium will focus on the implementation of the Language Policy Framework for Public Higher Education Institutions, which came into effect at the beginning of this year.
“Once you start using any other language in South Africa outside of English and Afrikaans, everything falls apart,” said Professor Marivate. He was not referring to the millions who speak African languages but to the digital interfaces available in those languages. Even where some translation options exist, they vanish once interaction between people and machines such as computers moves beyond an introductory level to a conversation. “Most chatbots are in English, although there might be some in Afrikaans,” he said.
And the attitude of corporations about this deficit? The perception, he said, is: “Is it really worth our profit motive to build digital resources, or is it easier for the person to switch to interacting in English?”
The state, too, keeps deferring the development of African languages. “They keep on kicking that can down the road because of other priorities. And it just becomes easier to say, ‘OK, let’s cancel everything else and just have English because that seems the least offensive way to keep on going,’ ” he said.
This only heightens the importance of the upcoming colloquium and its focus on implementing a policy that aims to uplift all indigenous languages to an equal footing within universities.
What is a language?
Developing African languages matters, said Professor Marivate, not just because people learn better in their mother tongue, but because of the information that language carries. As Marivate keeps stressing: “Languages are not just symbols”.
He said people think a language is just a series of symbols strung together to carry some meaning. But it goes much further. “Language carries culture. It carries knowledge. And there’s a saying in a lot of African languages that some things cannot be translated. People who speak that language know that if you put certain symbols together in a specific way, it’s not about reading them raw; they evoke a space and concepts that might not really translate well,” he said.
He likened the cultural associations of languages to memes circulated through social media. Richard Dawkins, the former Charles Simonyi Professor of the Public Understanding of Science at Oxford University, who coined the term “meme” in 1976 in his book The Selfish Gene, said: “Memes (discrete units of knowledge, gossip, jokes and so on) are to culture what genes are to life”.
Marivate said Dawkins “talks about memes as these cultural things that we all accept. It’s not just the word, it’s not just the picture, but it’s a cultural experience”.
What most people in Pretoria speak
A pertinent example of how language carries cultural messages is the dialect S’Pitori. Although the dominant language in Pretoria is Setswana, Marivate said more people in the area speak S’Pitori, a dialect that mixes Sepedi, isiZulu, English, Afrikaans and words from other languages.
In last year’s municipal elections, the DA and ActionSA put up billboards in Pretoria written in S’Pitori. There is also a dedicated Facebook page, Pitori Proverb, which features sayings and satire in S’Pitori and has gained 92 000 followers in just over two years.
One example of a S’Pitori expression is “Dilo di nametse RunX”, which means “things are going well”. It combines the Setswana word “nametse”, which means “climb on”, with “RunX”, a discontinued Toyota model that went very fast.
“S’Pitori is also geographically different across the Tshwane municipalities. I can hear when a person is from Mamelodi or Atteridgeville or Hammanskraal,” said Marivate, adding that an academic in UP’s African languages department is doing his PhD on the varied use of Setswana on advertising billboards across Tshwane.
Let’s translate academic abstracts into African languages
Marivate said universities can play a role in developing African languages, a point he made in July when he presented UP’s 30th Expert Lecture, titled Riendzo ri lehile: Tackling Natural Language Processing for African languages to make better sense of our world.
He said the university could make a valuable contribution, and expand the Setswana lexicon, if it required that the abstract of every dissertation and paper published by a student or staff member be translated into Setswana, with the university funding the translations.
Some scientific terms might not yet exist in Setswana, so the process would require asking a body such as the government-funded Pan South African Language Board (PanSALB) to create suitable ones.
“You might find some universities are still just translating abstracts to Afrikaans,” said Marivate.
If every university adopted this intervention, using the languages of its own region, it would simplify a much-needed process. “It’s easy to have frameworks and policies but it is another thing to actually push for things to happen,” he said. “Does UKZN make sure that all its abstracts are translated into Zulu? Does UCT make sure that all its abstracts are translated into isiXhosa? The colloquium should look at options such as these.”
With a bank of scientific abstracts in, for example, English and isiZulu, it would be possible to translate previously published scientific papers, including those from beyond South Africa. An isiZulu-speaking person who found it hard to engage with science in English could then ask a smart machine to translate something scientific. “With this scientific lexicography and the AI (artificial intelligence) tools we could enable, it would be just a click on a website that says: ‘Hey, do you want to read this in isiZulu now?’ And you can do that.”
This is what Marivate is working towards through his research into developing machine learning or AI methods that extract insights from data. “Just imagine,” he said.
“Just imagine the amount of access to scientific texts that Google Translate would probably do very badly now. Remember, though, Google Translate is a beta product (pre-release of software being tested under real conditions) and they say don’t use it in place of a human translator.
“Just imagine being in a situation where you’re talking to a refugee and you tell them ‘please type in your language’, and it will translate it for us in English and then we’ll type in a response,” he said, referring to a 2019 ProPublica report that US immigration officials had used Google Translate to vet refugees (https://www.propublica.org/article/google-says-google-translate-cant-replace-human-translators-immigration-officials-have-used-it-to-vet-refugees).
How Xitsonga made it onto the internet this year
Without data, however, African languages are excluded from the internet and none of these tools can be built. Researchers need digital resources to train machines to learn patterns from data before they can create tools that translate text, convert speech to text and convert text to speech.
“My father’s language, Xitsonga, only got added to Google Translate this year, and only after work done with Masakhane, creating more data, more digital resources,” he said. It took time, but it shows that the persistence of Marivate and various teams has paid off.
In 2019, Professor Marivate co-founded Masakhane, a grassroots organisation whose mission is to strengthen and spur natural language processing research in African languages, for Africans, by Africans. It is a volunteer research organisation with more than 1 000 members across the African continent and beyond, including local university students, developers and people from Google and Meta.
“Google could partly do this because there are all these researchers around the world who’ve been looking at these low resource languages, trying to create digital resources. Then other teams can see the great learning, and attempt to add it to their tool as well,” he said.
The value of the colloquium
Marivate said CoPAL should be applauded for getting vice-chancellors to share their experiences of what they see working and what is not.
“But when they go back to the universities, how much of their institutional community understands policy and what they’re going to be implementing? I’m pretty sure if I ask any of my colleagues they probably don’t know this even exists. So there’s a bit of a vacuum, a gulf between university management and staff.
“I think they should spread the message further. And it should be something that’s not seen as ‘nice to have’. It’s core to us understanding each other. The gears that move South Africa into better understanding itself and its past is to develop language.”
He said he is aware of the conversations among university leaders, some of whom joke about “why is my language not the one that’s being used” or “we’ve removed Afrikaans”.
“Yet for a lot of our African languages it’s a travesty,” he said. “Every time we ask about digital resources for isiNdebele, the people who work with digitising languages just look at you and they’re like, ‘yeah, that’s a big problem’.”
As of November 2022, he said, Setswana had 945 articles on Wikipedia, English more than six million and Afrikaans more than 100 000, yet, surprisingly, isiZulu had only just over 10 000, and isiNdebele had none (https://meta.wikimedia.org/wiki/List_of_Wikipedias). “These are just heuristics. I’m not saying that’s how we should measure. It is about assuming that everything must just assimilate to this one view because it’s the most efficient and cheapest way.
“It’s like saying that other people don’t matter. And their history doesn’t matter. And their learned and lived experience doesn’t matter either. In other words, this move towards multilingualism at universities is about far more than just the exterior use of language,” Professor Marivate concluded.
Gillian Anstey is a contract writer for Universities South Africa