Sybilla: A Prototype to Detect Wikipedia Content Gaps from Reader Demand Signals

Session type: Lightning talk Showcase

Track: Artificial intelligence

Speakers

Ilario Valdelli

Ilario Valdelli is Innovation Programme Lead at Wikimedia CH (Switzerland) and has 18 years of experience in Wikimedia projects. A Wikipedian since 2005, he co-founded Wikimedia CH and Wikimedia Italy and has contributed to the Swiss chapter’s early development and long-term growth. He has worked as Programme Manager at Wikimedia CH since 2014 and has led the organisation’s innovation work since 2020.

His work focuses on innovation at the intersection of open knowledge, digital transformation, and AI, with an emphasis on strategy, partnerships, and community-informed experimentation. He supports collaborations with universities, public institutions, and civic tech networks in Switzerland and across Europe, translating research and emerging technologies into practical prototypes, programme design, and governance discussions for the Wikimedia ecosystem.

Recently, he produced the report from the Collective Intelligence vs Artificial Intelligence roundtable and co-organised WikiCite 2025, contributing to movement-wide discussions on knowledge infrastructure, responsible innovation, and the evolving relationship between Wikimedia and AI.

Abstract

Sybilla is an in-development tool to detect and prioritize Wikipedia content gaps by comparing reader demand signals with Wikipedia/Wikidata coverage. The goal is not automated article writing, but evidence-informed prioritization for editors, translators and program teams: deciding what to improve first, across topics and languages.

Sybilla’s pipeline combines: (1) demand aggregation from transparent proxies (e.g., recurring questions, topic requests, curated input lists, and partner/community datasets where appropriate), (2) mapping to Wikidata/Wikipedia entities, (3) explainable scoring for gap types (missing pages, thin coverage, language fragmentation, and update/maintenance proxies), and (4) actionable outputs such as ranked gap lists and gap maps.
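
To make step (3) concrete, the following is a minimal sketch of what explainable gap scoring could look like. It is not Sybilla's implementation: the class names (`Coverage`, `Topic`), the thresholds, the weights, and the rule of multiplying demand by a capped gap value are all assumptions chosen for illustration. The point it shows is that each score is returned together with the list of reasons that produced it, so a ranked list stays auditable.

```python
from dataclasses import dataclass, field

# Illustrative assumptions only: none of these names, thresholds, or weights
# come from Sybilla itself.
THIN_ARTICLE_BYTES = 5_000   # assumed cut-off for "thin coverage"
STALE_AFTER_DAYS = 730       # assumed cut-off for the maintenance proxy
WEIGHTS = {
    "missing_page": 1.0,
    "thin_coverage": 0.5,
    "maintenance": 0.25,
    "language_fragmentation": 0.25,
}


@dataclass
class Coverage:
    """Coverage of one entity in one language edition."""
    article_bytes: int = 0           # 0 means no article exists
    days_since_last_edit: int = 0


@dataclass
class Topic:
    qid: str                         # Wikidata item ID
    demand: float                    # normalised demand signal in [0, 1]
    coverage: dict = field(default_factory=dict)  # language code -> Coverage


def score_gap(topic, lang, target_langs):
    """Return (score, reasons) for one topic in one language edition."""
    cov = topic.coverage.get(lang, Coverage())
    reasons = []
    if cov.article_bytes == 0:
        reasons.append(("missing_page", WEIGHTS["missing_page"]))
    else:
        if cov.article_bytes < THIN_ARTICLE_BYTES:
            reasons.append(("thin_coverage", WEIGHTS["thin_coverage"]))
        if cov.days_since_last_edit > STALE_AFTER_DAYS:
            reasons.append(("maintenance", WEIGHTS["maintenance"]))
    # Language fragmentation: share of target languages with no article.
    uncovered = sum(
        1 for code in target_langs
        if topic.coverage.get(code, Coverage()).article_bytes == 0
    )
    if target_langs and uncovered:
        share = uncovered / len(target_langs)
        reasons.append(
            ("language_fragmentation", WEIGHTS["language_fragmentation"] * share)
        )
    # Cap the combined gap at 1.0, then weight it by reader demand.
    gap = min(1.0, sum(weight for _, weight in reasons))
    return topic.demand * gap, reasons


def rank_gaps(topics, lang, target_langs):
    """Return (qid, score, reasons) rows with a non-zero score, highest first."""
    rows = []
    for topic in topics:
        score, reasons = score_gap(topic, lang, target_langs)
        if score > 0:
            rows.append((topic.qid, score, reasons))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Example with made-up topics: rank gaps for the Italian edition.
sample = [
    Topic("Q1", demand=0.75, coverage={"en": Coverage(20_000, 10)}),
    Topic("Q2", demand=1.0, coverage={"en": Coverage(20_000, 10),
                                      "it": Coverage(1_000, 10)}),
]
for qid, score, reasons in rank_gaps(sample, "it", ["en", "it"]):
    print(qid, score, reasons)
```

Keeping the reasons next to each score is a deliberate choice in this sketch: an editor or campaign organiser can see why a topic ranks where it does and can contest a weight or threshold without reverse-engineering the number.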

This session will present the system design and the current implementation status and, if one is available by July, show an early prototype with sample outputs and lessons learned.

Particular attention is given to governance: bias and representativeness of demand signals, transparency of scoring, and community control over inputs and usage.

Audience takeaway: a practical, auditable method to move from anecdotal “missing knowledge” to measurable, explainable gap detection that can support campaigns, partnerships, and strategic planning.

Additional information

How does your session relate to the event theme: Liberté, Équité, Fiabilité (Freedom, Equity, Reliability)?
Sybilla relates directly to the theme by strengthening Freedom, Equity, and Reliability in how Wikimedia identifies and addresses knowledge gaps.
Freedom: By turning “what information people need” into transparent, community-governed signals, Sybilla helps communities prioritize improvements without relying on opaque, proprietary platforms. The approach supports open, auditable workflows aligned with Wikimedia values rather than external ranking incentives.
Equity: Content gaps are unevenly distributed across languages, regions, and topics. Sybilla is designed to surface under-served areas by comparing demand and coverage across language editions and by making bias and representativeness explicit. This enables communities to focus efforts where the impact on access to knowledge is greatest, especially for smaller languages and less-visible topics.
Reliability: Sybilla does not automate article writing. Instead, it provides explainable gap indicators and links back to sources and Wikimedia structures (articles, Wikidata entities, maintenance signals). The goal is to support reliable editorial decision-making: prioritizing missing, thin, or outdated coverage while keeping human verification and community governance at the center.

Which Wikimedia audiences will find this content the most useful?
Editors and WikiProjects looking for evidence-based ways to prioritize what to create or improve next. Campaign and program organizers (affiliates, thematic campaigns, GLAM/education partners) who need measurable ways to choose focus topics and evaluate impact. Researchers and product/tech contributors exploring responsible uses of AI in the Wikimedia ecosystem, especially around transparency, bias, and governance.

What is the experience level needed for the audience for your session?
Some experience will be needed.