Welsh speakers are set to benefit from a groundbreaking artificial intelligence model designed to understand and use the language across key sectors including healthcare, education and law.
The pioneering project is being led by University College London in partnership with Bangor University and global technology company NVIDIA. The Welsh-language AI has been trained using the UK’s most powerful supercomputer, Isambard-AI, based in Bristol.
This development forms part of the UK-LLM initiative, which is focused on building “sovereign AI” models for UK languages. Significantly, the Welsh model is the first to demonstrate strong reasoning abilities in the language.
The primary aim is to make public services more accessible while supporting the Welsh Government’s Cymraeg 2050 strategy, which targets one million Welsh speakers by 2050. Researchers believe the model could help organisations in Wales – from hospitals and schools to shops and broadcasters – provide bilingual services more efficiently. It could also prove valuable for learners, widening access to Welsh resources.
Gruffudd Prys, from Bangor University’s Welsh language technology centre, brings nearly two decades of expertise to the project. His team is ensuring the model’s accuracy by reviewing machine-translated training data, validating hand-translated evaluation data, and testing how well the system handles linguistic features that AI typically struggles with, such as consonant mutations in Welsh grammar.
Prys said: “AI shows enormous potential to help with second-language acquisition of Welsh as well as for enabling native speakers to improve their language skills. The aim is to ensure that Welsh remains a living, breathing language that continues to develop with the times.”
The initiative builds on earlier models for UK languages, with plans to expand into Irish, Scottish Gaelic, Cornish and Scots. International collaborations are also being considered, in which similar methods would be applied to under-represented languages in Africa and Asia.
Importantly, the Welsh model and its training data will be made publicly available so that developers, businesses and public services can adapt the technology to their needs.
Pontus Stenetorp, professor of natural language processing and deputy director of the Centre for Artificial Intelligence at University College London, highlighted the speed of progress: “This collaboration with NVIDIA and Bangor University enabled us to create new training data and train a new model in record time, accelerating our goal to build the best-ever language model for Welsh. Our aim is to take the insights gained from the Welsh model and apply them to other minority languages, in the UK and across the globe.”
