Evaluation benchmarks for Spanish sentence representations
This talk introduces newly developed evaluation benchmarks for Spanish sentence representations, addressing the lack of systematic resources for comparing language models in Spanish across various NLP tasks. Dr. Vladimir Araujo presents benchmarks adapted from established English resources (SentEval and DiscoEval) to the Spanish language, enabling fairer and more comprehensive model assessment on tasks such as sentiment analysis, semantic similarity, entailment, and discourse-level relations. The presentation details benchmark construction, included datasets, and evaluation protocols, and reports comparative results across state-of-the-art Spanish and multilingual models, highlighting the strengths and weaknesses of different architectures and layers in capturing sentence and discourse features. The talk also introduces lightweight Spanish language models, discusses their efficiency and performance, and considers open questions in benchmark design, especially for more complex, pragmatic, or implicit linguistic phenomena.
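For readers unfamiliar with the SentEval/DiscoEval methodology the talk builds on, the snippet below is a minimal sketch of a typical probing-style evaluation: sentences are encoded with a frozen pretrained Spanish encoder, representations from a chosen layer are mean-pooled, and a lightweight linear classifier is trained on top. The model name (BETO), the layer index, and the toy sentiment-style examples are illustrative assumptions, not the datasets, models, or exact protocol presented in the talk.

```python
# Sketch of a SentEval-style probing evaluation for Spanish sentence
# representations: frozen encoder + linear probe on one hidden layer.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

MODEL_NAME = "dccuchile/bert-base-spanish-wwm-cased"  # illustrative choice (BETO)
LAYER = 8  # which hidden layer to probe; layer choice affects results

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def encode(sentences, layer=LAYER, batch_size=16):
    """Mean-pool token embeddings of a given hidden layer (mask-aware)."""
    vecs = []
    for i in range(0, len(sentences), batch_size):
        batch = tokenizer(sentences[i:i + batch_size], padding=True,
                          truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**batch).hidden_states[layer]      # (B, T, H)
        mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
        pooled = (hidden * mask).sum(1) / mask.sum(1)         # mean over real tokens
        vecs.append(pooled)
    return torch.cat(vecs).numpy()

# Toy sentiment-style probing task (placeholder data; a real benchmark
# would load train/test splits from the released evaluation datasets).
train_x = ["La película fue excelente.", "El servicio fue pésimo.",
           "Me encantó el libro.", "No recomiendo este producto."]
train_y = [1, 0, 1, 0]
test_x  = ["Una experiencia maravillosa.", "Fue una pérdida de tiempo."]
test_y  = [1, 0]

probe = LogisticRegression(max_iter=1000).fit(encode(train_x), train_y)
print("probe accuracy:", accuracy_score(test_y, probe.predict(encode(test_x))))
```

Because the encoder stays frozen and only a linear probe is trained, results can be compared per layer and per model, which is how the talk's layer-wise comparisons of sentence and discourse features are typically obtained in this style of benchmark.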