Abstract:
Transformers have recently gained significant attention in machine learning due
to their self-attention mechanisms, which allow models to dynamically assess
the importance of different input elements. Although originally designed for
Natural Language Processing (NLP), the application of transformers in computer
vision tasks, such as image classification, has been gaining traction. This work
explores the use of Vision Transformers (ViT) in the context of face age
regression, focusing on three well-known datasets: MORPH II, AFAD, and CACD.
By applying ViT in a regression setting, we aim to predict the age of individuals
from facial images.