Sequence-to-Sequence Spanish Pre-trained Language Models
This talk showcases the development and evaluation of Spanish sequence-to-sequence pre-trained language models, addressing the lack of encoder-decoder architectures designed specifically for Spanish NLP tasks. The presenters introduce Spanish adaptations of BART, T5, and BERT2BERT, describing their pre-training on large Spanish corpora and their application to a range of conditional generation tasks, including summarization, question answering, dialogue, split-and-rephrase, and translation, all of which are central to effective Spanish-language technology. Experimental results show that the BART and T5 variants outperform competing models on most tasks, setting a new reference point for Spanish sequence-to-sequence modeling, and all models are publicly released to support research in this underrepresented language space. The workshop further emphasizes the importance of diversity in computational linguistics, advocating for greater visibility, collaboration, and resource creation among LatinX researchers to advance NLP for Latin American languages.
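As a brief illustration of how released encoder-decoder checkpoints like these could be used, the sketch below loads a Spanish seq2seq model through the Hugging Face transformers API and generates a summary. The checkpoint identifier "spanish-bart-base" is a placeholder assumption, not necessarily the name under which the authors published their models, and the generation settings are illustrative rather than the configuration used in their experiments.

```python
# Minimal sketch: loading a Spanish seq2seq checkpoint for summarization.
# The model identifier below is a placeholder (assumption); substitute the
# actual checkpoint name released by the authors.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "spanish-bart-base"  # placeholder, not a confirmed model name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = (
    "El modelo fue preentrenado con un gran corpus en español y luego "
    "ajustado para tareas de generación condicional como el resumen."
)

# Tokenize the input and generate a short summary with beam search.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(**inputs, num_beams=4, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

The same pattern applies to the other conditional generation tasks mentioned in the talk (question answering, dialogue, split-and-rephrase, translation) after task-specific fine-tuning; only the input formatting and generation parameters would change.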