AMALIA is a fully open Large Language Model for European Portuguese.
This repository serves as a central entry point for resources related to the paper:
“AMALIA: A Fully Open Large Language Model for European Portuguese”
Accepted at PROPOR 2026
📄 https://aclanthology.org/2026.propor-1.38/
Despite recent advances in open Large Language Models (LLMs), European Portuguese (pt-PT) remains underrepresented in both training data and evaluation benchmarks. Existing evaluations often rely on machine-translated datasets, which fail to capture important linguistic and cultural nuances of the language.
AMALIA addresses this gap by:
- Prioritizing high-quality pt-PT data during all training stages
- Providing a fully open LLM tailored specifically for European Portuguese
- Introducing new evaluation benchmarks for pt-PT
Experimental results show that AMALIA remains competitive with strong baselines while achieving substantial improvements on pt-PT-specific evaluations. This highlights the importance of targeted training and native benchmarking for underrepresented language variants.
For implementation details, refer to the official organization repositories:
🔗 https://github.com/orgs/AMALIA-LLM/repositories
If you use AMALIA in your work, please cite the PROPOR 2026 paper and the technical report:
@inproceedings{simplicio-etal-2026-amalia,
title = "{AMALIA}: A Fully Open Large Language Model for {E}uropean {P}ortuguese",
author = "Simpl{\'i}cio, Afonso and Vinagre, Gon{\c{c}}alo and Ramos, Miguel Moura and Tavares, Diogo and Ferreira, Rafael and Attanasio, Giuseppe and Alves, Duarte M. and Calvo, In{\^e}s and Vieira, In{\^e}s and Guerra, Rui and Furtado, James and Canaverde, Beatriz and Paulo, Iago and Ramos, Vasco and Gl{\'o}ria-Silva, Diogo and Faria, Miguel and Treviso, Marcos and Gomes, Daniel and Gomes, Pedro and Semedo, David and Martins, Andr{\'e} and Magalh{\~a}es, Jo{\~a}o",
booktitle = "Proceedings of the 17th International Conference on Computational Processing of {P}ortuguese ({PROPOR} 2026) - Vol. 1",
month = apr,
year = "2026",
address = "Salvador, Brazil",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2026.propor-1.38/",
pages = "380--391",
isbn = "979-8-89176-387-6"
}
@misc{simplicio2026amaliatechnicalreportfully,
title = {AMALIA Technical Report: A Fully Open Source Large Language Model for European Portuguese},
author = {Afonso Simplício and Gonçalo Vinagre and Miguel Moura Ramos and Diogo Tavares and Rafael Ferreira and Giuseppe Attanasio and Duarte M. Alves and Inês Calvo and Inês Vieira and Rui Guerra and James Furtado and Beatriz Canaverde and Iago Paulo and Vasco Ramos and Diogo Glória-Silva and Miguel Faria and Marcos Treviso and Daniel Gomes and Pedro Gomes and David Semedo and André Martins and João Magalhães},
year = {2026},
eprint = {2603.26511},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2603.26511}
}