The Agent Commons: Knowledge Gets a Loss Function
Fabio Casati, University of Trento, Italy
Fanar: Building Sovereign Arabic Generative AI in the Shadow of Frontier Models
Mourad Ouzzani, Qatar Computing Research Institute, Hamad Bin Khalifa University, Qatar
Short Bio
Fabio Casati is Principal AI Architect at ServiceNow and Professor at the University of Trento.
Fabio focuses on designing, architecting, and deploying AI-powered workflows for enterprise customers.
His current research areas include (i) conversational knowledge and intent elicitation, (ii) agent-centered knowledge management, and (iii) well-being studies in AI agents.
He has published many papers that mostly add noise to the overall body of knowledge.
His real contributions come from his past research in the area of well-being for older adults, which informs policy decisions in many countries, and recent research in conversational clustering.
Abstract
Everything in AI has an improvement loop — models have loss functions, agents have observability and error analysis, even code is entering self-improvement pipelines.
Enterprise knowledge, arguably the most important asset organizations own, has been left behind. In this talk, I'll explore how knowledge can get its own improvement loop — and show that how you close that loop has a profound effect on outcomes and quality of knowledge.
It is dangerously easy to design feedback loops that go wrong, rapidly and at scale. But it is also possible to design feedback mechanisms that evolve knowledge at the speed of agents, reliably and with minimal to no human intervention, while meeting common enterprise constraints such as access control.
In the talk, we will explore different approaches and identify what separates useful from harmful feedback loops.
Short Bio
Dr. Mourad Ouzzani is Research Director at QCRI, leading the Research Engineering Group whose mission is to productize QCRI’s innovations. Over two decades, Mourad has worked on research topics related to data management, data integration and cleaning, and data-centric AI, including projects that produced platforms that became successful startups. Mourad was the project lead of Rayyan, a systematic-review platform that is now part of a startup and is serving over 1M users worldwide. More recently, he was part of a project that helped launch a new startup focused on preventing sudden cardiac arrests in athletes through continuous ECG monitoring during training and competitions. Built on QCRI’s SIHA digital health platform in collaboration with Aspetar, the solution has been piloted by leading European clubs, including PSG, Benfica, and Manchester City. Previously, Mourad was a Research Professor at Purdue University. His work is widely published in premier venues including SIGMOD, VLDB, and ICDE. He has been PI or Co-PI on 15+ major grants from NSF, NIH, and others.
Abstract
Generative AI is reshaping all of us and how we produce knowledge and make decisions — yet the frontier is defined and owned by a small handful of Western trillion-parameter models that were not built for the world's languages, cultures, or values, and whose terms of access are set elsewhere. For the 467 million Arabic speakers and for the more than two billion Muslims for whom Arabic is the liturgical language, this dependency carries a particular cost: Arabic represents only about 0.5% of web content despite the language's global reach, and the same models that fluently summarize an English news article frequently mistranslate a dialect, misrepresent a regional landmark, or misinterpret a religious query.
In this keynote, I will share our experience in building Fanar (meaning "lighthouse" in Arabic), Qatar's sovereign Arabic-centric generative AI platform, designed, built, and operated entirely at the Qatar Computing Research Institute at Hamad Bin Khalifa University. I will begin with the question that animated the project from the start: why undertake a sovereign generative AI capability for Arabic at all, given the apparent futility of competing with providers that command orders of magnitude more data, compute, and expertise? I will argue that sovereignty in this setting is not a policy aspiration but a first-class constraint, one that shapes how data is curated, how models are trained, how religious and cultural values are handled, and how the system is deployed and governed. I will then delve into some technical details of how we built Fanar. First, I will touch briefly on Fanar 1.0: a dual-model strategy with a custom morphology-aware Arabic tokenizer and specialized RAG for Islamic, recency, biography, and attribution queries. I will then turn to Fanar 2.0: a quality-over-quantity continual pre-training recipe that achieved substantial benchmark gains, multi-agent Islamic content grounding, long-form dialect-aware speech, culturally grounded vision, classical Arabic poetry generation, and a purpose-built guardrail model. Throughout, I will share some of the design choices, trade-offs, and lessons that emerged from doing this work under tight data and compute constraints.