Quebec’s National Library Launches AI Cultural Databank to Better Represent French and Indigenous Languages

BAnQ moves into experimental phase of ambitious AI training initiative

Bibliothèque et Archives nationales du Québec (BAnQ) has launched the experimental phase of a proposed cultural and government databank designed to improve how artificial intelligence systems understand Quebec society, French-language culture, and Indigenous languages. The project has an estimated five-year budget of nearly $10.5 million.

The initiative, for which a feasibility study was completed earlier this year, aims to address a well-documented gap: major generative AI systems frequently struggle to provide reliable information about Quebec because of the limited amount of Quebec-related data in their training sets.

Why the project exists

A 2024 report by Quebec’s innovation council identified the problem directly, attributing it to the “very small quantity of data on Quebec” available in AI training datasets. The BAnQ project stems from a recommendation in that report.

Destiny Tchéhouali, co-holder of a Quebec-based research chair focused on French-language AI and digital technologies, said Quebec culture remains “underrepresented in the corpora currently circulating in the AI world.”

“We run the risk of reproducing linguistic biases and cultural biases. And when we also talk about Indigenous peoples, we run an even greater risk of all these biases,” said Tchéhouali, a professor in the communications department at Université du Québec à Montréal.

He described the proposed database as “strategic infrastructure” that could help establish guidelines for how local content is identified, catalogued, and tracked within AI systems.

How the platform would work

BAnQ president and CEO Marie Grégoire said the goal is to ensure AI systems better reflect Quebec society and culture, “whether in small models or large models, whether they come from research or from the business community.”

The institution says the platform would not function as a public distribution channel for creative works, and that access to data would be tightly controlled. BAnQ plans to begin with its own collections before expanding to data from other sources.

Valérie D’Amour, who led the feasibility study, said the project remains in an exploratory stage. “All scenarios are a little bit on the table right now,” she said. “We have a lot of ideas and we want to validate the possibilities with cultural stakeholders, as well as with data owners and providers.”

Similar initiatives have emerged elsewhere. In Sweden, large collections of Nordic-language texts have been assembled to help develop generative AI models for Scandinavian languages.

Copyright and compensation

Copyright has emerged as a central tension as BAnQ develops the project. Grégoire argued the platform could offer creators more protection than the current unregulated environment.

“Right now, it’s a bit like the Wild West,” she said. “Data is being harvested for free, and that should not be the case.”

Grégoire said a centralized databank could act as a gateway that makes it easier to compensate creators whose works are used in AI training, and that collective action would better position cultural organizations to keep the sector financially sustainable.

But not everyone is convinced. Some artists worry that participating in AI training systems could ultimately erode their own livelihoods.

“The main criticism we hear in the field is that, even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI,” said Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research and a member of the same research chair as Tchéhouali.

Timeline and funding

The feasibility study envisions the platform becoming operational by 2029, though D’Amour said the timeline will be reassessed after the current 12-month experimentation phase concludes.

The five-year budget is estimated at nearly $10.5 million through 2030, covering both operating and capital costs. The Quebec government has already provided $340,000 for the feasibility study and a further $750,000 to support the experimentation phase.
