Documents that describe the doings of powerful people are rarely accessible to ordinary folk. They’re often stuffed with obfuscatory jargon and euphemisms, which make getting to the truth of a statement a Herculean task.
Consider, for instance, the term ‘quantitative easing’, used during the 2008 economic crash. The public might have been angrier had the government labelled this initiative more honestly – perhaps something like, ‘giving billions of pounds to failing banks’.
Governments aren’t the only offenders. Businesses too publish plenty of documents that contain revealing information, clear to those who know where to look and how to translate it, but otherwise disguised by linguistic tricks and doublespeak.
But such information may soon be more available to the masses. A team of researchers at Oxford University has created a public-access AI platform called CLARA, which is designed to translate impenetrable jargon. Specifically, the tool will enable users to ask questions about the fossil fuel industry and receive answers retrieved from the sector’s own documents.
Rummaging through the rhetoric
Benjamin Franta is founding head of Oxford’s Climate Litigation Lab, which created CLARA to help academics, lawyers, advocacy groups or journalists find information tucked away in dense industry documents. “Rhetorical monotony is often weaponised,” he says, “so ways to cut through that are very helpful.”
CLARA was trained on more than 10,000 fossil fuel industry documents, hosted at the University of California in the largest library of its kind. The system allows users to ask questions in plain English and then performs some GenAI-powered wizardry to search those documents and provide answers.
Looking through industry archives is laborious and time-consuming. “Our tool reduces the barrier to entry for researchers,” says Franta, who completed a PhD in the history of climate denial. “I personally have read many or most of these documents – nearly all of them. But it took me six years to do that PhD.”

CLARA responding to a question about the fossil fuel industry funding US politicians over the past decade
Even when researchers locate a relevant document or report, they’re usually limited to the tried-and-tested CTRL+F method to search for keywords in PDFs. But this technique fails to provide context, leaving the searcher to scroll through endless repetitions of words that offer neither answers nor clues.
CLARA aims to solve this problem. Users can ask questions such as, “What did the fossil fuel industry know about climate change in the 1990s?” Rather than simply picking out keywords, CLARA will search through those 10,000 documents to find any information that is relevant to the specific question, complete with context and citations.
It’s the kind of GenAI-powered search tool that’s used privately by major law firms – but CLARA is free to use for everyone.
How the team built CLARA
To create the platform, the Franta’s research team entered the fossil fuel industry documents into optical character recognition software, which converts normal text into coded text that can be read by machines. The text in the documents was then cut down to paragraph-sized chunks and each of those was transformed to a ‘multidimensional vector’ – a type of data that describes the overall meaning of the text.
When a user interrogates the platform with a prompt, another vector is created and the system identifies relations between the user’s question and the data that the system was trained on. Using this method, CLARA can locate excerpts that seem most relevant to the user’s question.
The system uses a method called ‘graph retrieval-augmented generation’ to structure information into a sort of map that ties data points together. For example, when an individual’s name comes up in its training data, a ‘node’ or cluster is created that shows how other pieces of information, such as dates, organisations or other people, relate to that person. And users can ensure the system’s outputs are accurate because it provides citations and links to the original documents informing its answers.
Using AI to level the playing field
Although the team is exploring the use of CLARA with other archival data, the system is currently trained only on documents from the fossil fuel industry, including reports from oil and gas companies, trade associations and “front groups like the Global Climate Coalition,” Franta explains.
“Historically, corporate accountability has been impeded because big industries produce a lot of material and it’s very difficult and time-consuming for anyone else to look through,” he says. “Tobacco-industry documents are a perfect example: there’s so much evidence of wrongdoing but one has to wade through 100 million pages to get to it.”

CLARA responding to a request for examples of ExxonMobil acknowledging the impact of climate change and fossil fuels
Franta adds that crucial nuggets of information will often be buried in 1,000-page documents.
“Part of our vision is that AI could be used to level the playing field and increase transparency and scrutiny around what big companies are doing,” he says.
“By increasing accountability, we hope AI can be a leveller – not just something big companies use for their own interests but also something people can use to hold corporations accountable.”
Documents that describe the doings of powerful people are rarely accessible to ordinary folk. They’re often stuffed with obfuscatory jargon and euphemisms, which make getting to the truth of a statement a Herculean task.
Consider, for instance, the term ‘quantitative easing’, used during the 2008 economic crash. The public might have been angrier had the government labelled this initiative more honestly – perhaps something like, ‘giving billions of pounds to failing banks’.
Governments aren't the only offenders. Businesses too publish plenty of documents that contain revealing information, clear to those who know where to look and how to translate it, but otherwise disguised by linguistic tricks and doublespeak.