ELOQUENT

The ELOQUENT evaluation lab experiments with new evaluation methods for generative language models, to meet some of the challenges on the path from laboratory to application. The organisers include commercially active AI developers as well as research groups. The lab explores the following characteristics of generative language model quality: (1) Trustworthiness: a many-faceted notion that involves topical relevance and truthfulness, discourse competence, reasoning in language, controllability, and robustness across varied input, and that is at the forefront of current development projects for generative language models; (2) Multi-linguality and cultural fit: the suitability of a language model for a given cultural and linguistic area, a question at the top of the agenda, not least in the European arena; (3) Self-assessment: how reliably a language model can assess its own quality or that of another language model, using as little human effort as possible; (4) Limits of language models: the delimitation of their world knowledge and generative capacity.

Organisers

  • Jussi Karlgren (AMD Silo AI)
  • Ondřej Bojar (Charles University in Prague, ÚFAL)
  • Marie Engels (Fraunhofer IAIS)
  • Pavel Šindelář (Charles University)
  • Mario Piacentini (OECD)
  • Luis Francisco Vargas Madriz (OECD)
  • Katherina Thomas (OECD)