Sequence-to-Sequence Spanish Pre-trained Language Models
This talk showcases the development and evaluation of Spanish sequence-to-sequence pre-trained language models, addressing the lack of encoder-decoder architectures designed specifically for Spanish NLP tasks. The presenters introduce Spanish adaptations of BART, T5, and BERT2BERT, describing their pre-training on large Spanish corpora and their application to a range of conditional generation tasks, including summarization, question answering, dialogue, split-and-rephrase, and translation, all of which are central to effective Spanish-language technology. Experimental results show that the BART and T5 variants outperform competing models on most tasks, setting a new reference point for Spanish sequence-to-sequence modeling, and all models are publicly released to support research in this underrepresented language space. The workshop further emphasizes the importance of diversity in computational linguistics, advocating for greater visibility, collaboration, and resource creation among LatinX researchers to advance NLP for Latin American languages.
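As a brief illustration of how released encoder-decoder checkpoints like these could be used, the sketch below loads a Spanish seq2seq model through the Hugging Face transformers API and generates a summary. The checkpoint identifier "spanish-bart-base" is a placeholder assumption, not necessarily the name under which the authors published their models, and the generation settings are illustrative rather than the configuration used in their experiments.

```python
# Minimal sketch: loading a Spanish seq2seq checkpoint for summarization.
# The model identifier below is a placeholder (assumption); substitute the
# actual checkpoint name released by the authors.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "spanish-bart-base"  # placeholder, not a confirmed model name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = (
    "El modelo fue preentrenado con un gran corpus en español y luego "
    "ajustado para tareas de generación condicional como el resumen."
)

# Tokenize the input and generate a short summary with beam search.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(**inputs, num_beams=4, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

The same pattern applies to the other conditional generation tasks mentioned in the talk (question answering, dialogue, split-and-rephrase, translation) after task-specific fine-tuning; only the input formatting and generation parameters would change.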