LongEval

Most Information Retrieval (IR) benchmarks evaluate systems at a single point in time, even though data and user behavior change over time. Research shows that IR and text classification systems lose effectiveness as data patterns evolve, especially when the test data is temporally distant from the training data. This lab encourages the development of models that maintain performance over time by providing training and test data from different periods. We propose the fourth LongEval Lab to further focus on evaluating IR systems’ ability to generalize across time, using datasets split at various temporal distances to assess how well systems handle evolving documents and queries. For 2026, we plan four tasks that widen the scope of long-term IR to new dynamics beyond documents, topics, and qrels, moving closer to evolving user behavior through user simulation tasks.
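To make the temporal-split protocol concrete, below is a minimal sketch in plain Python of how test queries might be bucketed by their temporal distance from the training period. The query records, dates, and the lag_in_months helper are illustrative assumptions, not part of the LongEval distribution, which ships prepared splits.

    from collections import defaultdict
    from datetime import date

    # Hypothetical query records of the form (query_id, issue_date). LongEval
    # distributes ready-made temporal splits; this sketch only illustrates the
    # idea of bucketing test queries by their distance from the training period.
    queries = [
        ("q1", date(2022, 6, 15)),
        ("q2", date(2022, 9, 3)),
        ("q3", date(2023, 1, 20)),
    ]

    TRAIN_END = date(2022, 6, 30)  # assumed end of the training period

    def lag_in_months(issued: date, train_end: date = TRAIN_END) -> int:
        """Whole months between the training cut-off and a test query's date."""
        return max(0, (issued.year - train_end.year) * 12
                      + issued.month - train_end.month)

    # Group test queries into buckets by temporal distance (lag).
    buckets: defaultdict[int, list[str]] = defaultdict(list)
    for qid, issued in queries:
        buckets[lag_in_months(issued)].append(qid)

    for lag, qids in sorted(buckets.items()):
        # A real evaluation would score each bucket with an IR measure
        # (e.g. nDCG) and compare it with the lag-0 bucket to quantify
        # how effectiveness degrades as temporal distance grows.
        print(f"lag {lag} months: {qids}")

Reporting one score per lag bucket, rather than a single aggregate, is what lets a system's temporal generalization be measured directly.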

Organizers

  • Matteo Cancellieri (The Open University)
  • Alaa El-Ebshihy (TU Wien, Research Studios Austria)
  • Maik Fröbe (Friedrich-Schiller-Universität Jena)
  • Petra Galuščáková (University of Stavanger)
  • Gabriela González Sáez (Université Grenoble Alpes)
  • Lorraine Goeuriot (Université Grenoble Alpes)
  • Gabriel Iturra-Bocaz (University of Stavanger)
  • Jüri Keller (TH Köln - University of Applied Sciences)
  • Petr Knoth (The Open University)
  • Philippe Mulhem (LIG-CNRS)
  • Florina Piroi (TU Wien, Institute of Information Systems Engineering, ZFDM)
  • David Pride (KMi)
  • Philipp Schaer (TH Köln - University of Applied Sciences)
  • Didier Schwab (Université Grenoble Alpes)