African researchers are stepping up efforts to ensure the continent’s languages are not left behind in the artificial intelligence revolution.
The Africa Next Voices project has released a massive open-access dataset containing 9,000 hours of recorded conversations in 18 African languages spoken in Kenya, Nigeria, and South Africa. This collection is designed to be “AI-ready,” providing developers and researchers with the raw material needed to build tools that better understand African speech.
The project is led by Professor Vukosi Marivate of the University of Pretoria, who stressed the cultural importance of this work. “We think in our own languages, dream in them, and interpret the world through them,” he said. “If technology doesn’t reflect that, whole communities risk being excluded.”
The gap is already visible. A recent study published in Nature found that ChatGPT correctly processes only 10–20% of text written in Hausa, one of the most widely spoken languages in West Africa, with more than 90 million speakers. The finding underlines the risk of digital inequality as AI becomes central to everyday life.
By making its dataset openly available, Africa Next Voices hopes to accelerate innovation while ensuring that Africa’s languages and cultures are properly represented in the next generation of AI systems.