Labs

CLEF 2026 hosts a total of 16 labs.

BioASQ

Large-scale biomedical semantic indexing and question answering

BioASQ organizes a series of challenges (shared tasks) on biomedical information access and machine learning in two complementary research directions: (a) the automated indexing of large volumes of unlabelled data, such as scientific articles, with biomedical concepts, and (b) the processing of biomedical questions and the generation of comprehensible answers. In the first direction, this year BioASQ introduces (i) the new Task BioNNE-R on Nested Relation Extraction in Russian and English, (ii) a new edition of Task ELCardioCC on Clinical Coding of Greek Cardiology Discharge Letters, focusing on document-level annotation, and (iii) a new edition of Task GutBrainIE on Gut-Brain Interplay Information Extraction, incorporating more diverse and relevant biomedical literature. In the biomedical Question Answering (QA) direction, a complete infrastructure has been developed to support the established QA task (Task B) as well as the innovative Task Synergy on QA for developing problems. In addition, a new edition of Task MultiClinSum on multilingual summarization of clinical case reports (Task MultiClinSum-2) is introduced this year, extended to additional languages, namely German, Dutch, Catalan, Swedish, Norwegian, and Italian.

CheckThat!

Developing technologies for identifying and verifying claims

The 9th edition of the CheckThat! lab at CLEF targets three tasks: (i) scientific web discourse, (ii) generating full fact-checking articles, and (iii) fact-checking numerical and temporal claims. These tasks pose challenging classification and retrieval problems, including in multilingual settings.

ELOQUENT

New evaluation methods for generative language models

The ELOQUENT evaluation lab experiments with new evaluation methods for generative language models, addressing some of the challenges on the path from laboratory to application. The organisers include commercially active AI developers as well as research groups. The lab explores the following important characteristics of generative language model quality: (1) trustworthiness, a many-faceted notion involving topical relevance and truthfulness, discourse competence, reasoning in language, controllability, and robustness across varied input, which is at the forefront of current development projects for generative language models; (2) multilinguality and cultural fit, i.e., the suitability of a language model for a given cultural and linguistic area, a question of prime importance, not least for the European arena; (3) self-assessment, i.e., how reliably a language model can assess the quality of itself or another language model with as little human effort as possible; and (4) the limits of language models, i.e., delimiting their world knowledge and generative capacity.

eRisk

Early risk prediction on the internet

We propose eRisk 2026, the next edition of CLEF's lab series on early risk prediction in online data, building on nine previous editions (2017–2025) that explored important tasks such as depression, anorexia, self-harm, pathological gambling, and eating disorders. This edition introduces three main challenges. The first task involves interacting with conversational agents that have been instructed to simulate different user behaviours and conditions: participants must interact with the LLMs and predict depression severity and the main symptoms present, if any. The second task is the second edition of the Contextualised Early-Depression Detection task, which leverages full Reddit conversation threads, providing richer conversational and contextual scenarios for emitting timely risk predictions. The third task, symptom sentence ranking for Attention-Deficit Hyperactivity Disorder (ADHD), extends our ranking framework to a previously unexplored condition, with ADHD symptoms defined according to the ASRS-v1.1 clinical questionnaire. The lab continues the established three-year task cycle, offers baselines and high-quality datasets, and advances conversational and symptom-level analysis as key elements of mental health solutions.
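To make the early-detection setting of the second task concrete: a system reads a user's posts in chronological order and may flag risk at any point, trading accuracy against delay. Below is a minimal sketch of that decision loop, with a hypothetical keyword-based score_post standing in for a real classifier; names and thresholds are illustrative, not the lab's API.

```python
from typing import Iterable

def score_post(text: str) -> float:
    """Hypothetical per-post risk scorer in [0, 1]; a stand-in for a real model."""
    cues = ("hopeless", "worthless", "can't sleep")
    return min(1.0, 0.4 * sum(cue in text.lower() for cue in cues))

def early_decision(posts: Iterable[str], threshold: float = 0.7) -> tuple[str, int]:
    """Read posts chronologically and decide as soon as evidence suffices.

    Returns the label and the number of posts seen before deciding; early-risk
    metrics such as ERDE penalize correct decisions that come late.
    """
    evidence, seen = 0.0, 0
    for post in posts:
        seen += 1
        # Keep the strongest signal observed so far; a real system would
        # aggregate a calibrated model's outputs instead.
        evidence = max(evidence, score_post(post))
        if evidence >= threshold:
            return "at-risk", seen
    return "not-at-risk", seen

print(early_decision(["slept fine", "feeling hopeless and worthless lately"]))
```

The loop returns the decision point alongside the label because early-risk evaluation rewards correct decisions made after fewer posts.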

EXIST

Sexism identification in social networks

This lab focuses on the detection of sexist messages in social networks. The inequality and discrimination against women that remain embedded in society are increasingly being replicated online; the Internet perpetuates and even naturalizes gender differences and sexist attitudes. The EXIST 2026 lab will continue to focus on the detection of sexism in social networks, while introducing a novel paradigm that integrates human-centered signals into the AI development pipeline. In this edition, we extend the "learning with disagreement" framework by incorporating sensor-based data from people exposed to potentially sexist content, including measurements such as skin conductance, heart rate variability, and other signals that reflect unconscious responses to sexism. Given the nature of these multimodal signals, we will concentrate on analyzing memes and short videos, formats that combine visual and textual cues and are especially suited to capturing the emotional and cognitive impact of online content. This human-in-the-loop approach not only acknowledges the diversity of subjective reactions to sexism, but also opens new avenues for building more robust, equitable, and interpretable systems. By integrating both conscious feedback and unconscious reactions from annotators, EXIST 2026 aims to foster a more nuanced and ethically grounded understanding of sexism across platforms and formats.

FinMMEval

Multilingual and multimodal evaluation of financial AI systems

This is the inaugural edition of FinMMEval, a CLEF 2026 workshop dedicated to the multilingual and multimodal evaluation of financial AI systems. Real-world financial decision-making relies on diverse data sources, including textual reports, visual documents, and time-series signals, spread across languages and modalities. To support robust, interpretable, and auditable AI in this domain, FinMMEval proposes three complementary pilot tasks: (1) Document Parsing and Structured Extraction from scanned financial filings, (2) Financial Exam Question Answering, and (3) Financial Decision Making based on market context, combining historical prices, news, and portfolio status. The lab includes datasets in six languages, namely English, Spanish, Arabic, Greek, Bulgarian, and Hindi, and covers the spectrum from low-level perception to high-level decision-making. We aim to foster cross-disciplinary collaboration across NLP, CV, time-series modeling, and the financial industry, and to establish FinMMEval as a benchmark for the next generation of financial AI systems.

HIPE

Evaluating accurate and efficient person–place relation extraction from multilingual historical texts

Based on a pilot study that confirmed the feasibility of the task, HIPE-2026 targets a single but fundamental relation type (person–isAt–place) and additionally requires participants to (1) determine the temporal scope of this relation and (2) assess the textual evidence that supports it. Working with challenging materials, i.e., OCR-noisy, multilingual, and domain-diverse newspaper articles, participants will contribute to the development of approaches that are key to constructing historical knowledge graphs, reconstructing biographies, enabling spatial analysis, and advancing text understanding of historical material. Given the energy costs of frontier models and the need to process large-scale cultural heritage collections, we identify efficiency as a critical challenge. HIPE-2026 will therefore offer two sub-tracks: one targeting maximum accuracy, the other prioritizing a trade-off between accuracy and computational efficiency. A surprise dataset will be included to evaluate generalization across domains. All datasets will be released to support transparency, reuse, and further research.
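The target output is essentially a qualified triple: a person, a place, a temporal scope, and the supporting evidence. Purely as an illustration (the field names and values below are ours, not the official HIPE-2026 submission schema), one relation instance could be represented like this:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PersonIsAtPlace:
    """Illustrative container for one person-isAt-place relation instance.

    Field names are hypothetical, not the official HIPE-2026 schema.
    """
    person: str                # person mention in the article
    place: str                 # place mention linked to the person
    time_start: Optional[str]  # temporal scope of the relation, if recoverable
    time_end: Optional[str]
    evidence: str              # text span supporting the relation
    confidence: float          # system's assessment of that evidence

relation = PersonIsAtPlace(
    person="Jean Dupont",      # placeholder name
    place="Genève",
    time_start="1859-06",
    time_end=None,
    evidence="M. Dupont, de retour à Genève en juin 1859, ...",
    confidence=0.82,
)
```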

ImageCLEF

Multimodal challenge in CLEF

Building on the success of previous editions from 2003 to 2025, ImageCLEF 2026 continues as part of the CLEF initiative's multimodal challenge. The campaign focuses on advancing and benchmarking technologies for multimodal data analysis, including annotation, classification, indexing, and retrieval. The primary goal of ImageCLEF 2026 is to facilitate research through access to large-scale, diverse multimodal datasets tailored to a wide range of practical domains and scenarios. Continuing the momentum of recent successful editions, this year's challenge will again embrace interdisciplinary problem solving across key areas such as medical imaging, knowledge representation, and data generation, encouraging innovative, real-world solutions. ImageCLEF 2026 will retain the tasks from the 2025 edition, with some improvements, while adding a new task to expand the campaign further. Participants will be invited to tackle new and evolving tasks that reflect emerging needs and technologies, fostering collaboration across disciplines.

JOKER

Automatic humour analysis

Humour poses a unique challenge for artificial intelligence, as it often relies on non-literal language, cultural references, and linguistic creativity. The JOKER lab, now in its fourth year, aims to advance computational humour research through shared tasks on curated, multilingual datasets, with applications in education, computer-mediated communication and translation, and conversational AI. The 2026 edition of the JOKER lab will continue the three main tasks from last year: (1) humour-aware information retrieval, which involves searching a document collection for humorous texts relevant to user queries in either English or Portuguese; (2) pun translation, focused on humour-preserving translation of paronomastic jokes from English into French; and (3) onomastic wordplay translation, addressing the translation of name-based wordplay from English into French. We observed significant changes in the participants' approaches in 2025.

LifeCLEF

Biodiversity monitoring using AI-powered tools

Biodiversity monitoring using AI-powered tools has become vital for tracking species distributions and assessing ecosystem health on a large scale. Automated image- and sound-based species recognition, in particular, continues to accelerate conservation efforts by enabling rapid, low-cost surveys of vulnerable populations. However, the ever-growing variety of algorithms and data sources underscores the need for standardized benchmarks to assess real-world performance. Since 2011, the LifeCLEF lab has filled this role by organizing annual evaluations that promote collaboration among AI experts, citizen scientists, and ecologists. The 2026 edition comprises five challenges: (i) AnimalCLEF, discovery of individual animals; (ii) BirdCLEF+, multi-taxonomic species identification in soundscape recordings; (iii) MarineCLEF, location-aware classification of marine species in underwater imagery; (iv) PestCLEF, information extraction on plant pests from news articles; and (v) PlantCLEF, multi-species plant identification in quadrat images.

LongEval

Longitudinal evaluation of model performance

Most Information Retrieval (IR) benchmarks evaluate systems at a single point in time, even though data and user behavior change over time. Research shows that IR and text classification systems lose effectiveness as data patterns evolve, especially when test data is temporally distant from training data. This lab encourages the development of models that maintain performance over time by providing training and testing data from different periods. The fourth LongEval lab will further focus on evaluating IR systems' ability to generalize across time, using datasets split at various temporal distances to assess how well systems handle evolving documents and queries. For 2026 we plan four tasks, widening the scope of long-term IR beyond evolving documents, topics, and qrels toward evolving user behavior, including user simulation tasks.
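The underlying protocol is simple: fit or tune a system on one time slice, then measure effectiveness on slices at increasing temporal distance and watch for decay. A minimal sketch, assuming placeholder train and evaluate functions and invented slice labels rather than the official lab API:

```python
# Minimal sketch of LongEval-style temporal evaluation: train on one time
# slice, test on slices at increasing temporal distance. The train/evaluate
# functions and slice labels are placeholders, not the official lab API.

def train(train_slice: str):
    """Fit or tune an IR system on data from the training period."""
    return {"trained_on": train_slice}  # stand-in for a real system

def evaluate(system, test_slice: str) -> float:
    """Return an effectiveness score (e.g. nDCG) on one test slice."""
    return 0.0  # stand-in for running the system against that slice's qrels

slices = ["2022-06", "2022-07", "2022-09", "2023-01"]  # hypothetical periods
system = train(slices[0])
for test_slice in slices[1:]:
    # A temporally robust system yields a flat curve here; a brittle one
    # decays as the test slice moves further from the training period.
    print(test_slice, evaluate(system, test_slice))
```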

PAN

Stylometry and digital text forensics

The PAN evaluation lab on digital text forensics includes three returning and two new tasks, all of which tackle recent and relevant challenges from the field of text forensics, with a special focus on the detection and analysis of text produced by generative AI. The tasks are: (1) Voight-Kampff Generative AI Detection, (2) Text Watermarking, (3) Multi-author Writing Style Analysis, (4) Generative Plagiarism Detection, and (5) Reasoning Trajectory Detection.

QuantumCLEF

Quantum computing at CLEF

The goal of QuantumCLEF is to establish an evaluation infrastructure for quantum computing (QC) algorithms with a focus on applications in the Information Access domain. This initiative aims to: (1) explore novel problem formulations that enable the efficient and effective use of QC techniques; (2) assess the performance of QC methods in comparison to conventional, non-quantum approaches executed on classical hardware; (3) foster interdisciplinary collaboration among researchers from areas such as Information Retrieval, Recommender Systems, and Operations Research, to facilitate knowledge exchange and engagement with QC technologies.
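One concrete example of such a problem reformulation, used in earlier QuantumCLEF editions, is recasting feature selection as a QUBO (quadratic unconstrained binary optimization) instance that a quantum annealer can minimize. The toy sketch below brute-forces a 3-variable QUBO classically; the Q matrix values are illustrative only:

```python
import itertools
import numpy as np

# Toy QUBO for feature selection: diagonal entries reward individually
# useful features, off-diagonal entries penalize redundant pairs.
# Minimizing x^T Q x over binary x selects a small, non-redundant subset.
Q = np.array([
    [-3.0,  2.0,  0.0],
    [ 0.0, -2.0,  2.5],
    [ 0.0,  0.0, -1.0],
])

def qubo_energy(x: np.ndarray) -> float:
    return float(x @ Q @ x)

# Exhaustive search stands in for the quantum annealer on this tiny instance.
best = min((np.array(bits) for bits in itertools.product([0, 1], repeat=3)),
           key=qubo_energy)
print("selected features:", best, "energy:", qubo_energy(best))
```

On real hardware the same Q matrix is handed to an annealer instead of being enumerated, which is where the efficiency comparison against classical baselines becomes interesting.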

SimpleText

Simplify scientific text (and nothing more)

Over the last few years, the SimpleText Track has created an active community of researchers in NLP and IR working together to improve access to scientific text. Its benchmarks on scientific passage retrieval, scientific terminology detection and explanation, and scientific text simplification have become standard references. After using a similar track setup in 2021–2024, we significantly changed the track's setup and tasks in 2025, and we will continue this successful setup in 2026. Hence, the CLEF 2026 SimpleText track will contain the following 2+1 tasks: Task 1 (Text Simplification), simplify scientific text; Task 2 (Controlled Creativity), identify and avoid hallucination; and Task 3 (SimpleText Revisited), selected tasks from earlier editions, returning by popular request.

TalentCLEF

Skill and job title intelligence for human capital management

The second edition of the Skills and Job Title Intelligence for Human Capital Management (TalentCLEF) workshop aims to foster the development and thorough assessment of NLP-based decision support systems in the field of human resources, and to offer a meeting place for professionals and researchers interested in the application of these technologies. The workshop will be conducted as an evaluation lab featuring two shared tasks that aim to advance fair talent matching in Human Capital Management: (1) Task A, Contextualized Job-Person Matching, focused on finding the right candidates for specific job positions, and (2) Task B, Job-Skill Matching with Skill Type Classification, focused on identifying the skills relevant to a given job position.

Touché

Argumentation systems

Decision-making and opinion-forming are everyday tasks that involve weighing pro and con arguments for or against different options. With ubiquitous access to all kinds of information on the web, everybody has the chance to acquire knowledge for these tasks on almost any topic. However, current information systems are primarily optimized for returning relevant results and do not support deeper analyses of arguments or multimodality. To close this gap, the Touché lab series, running since 2020, offers several tasks that advance both argumentation systems and their evaluation.