Implementation of a Cortico-Thalamic Transformer for Understanding Language Processing
| Author | Affiliation | Country |
|---|---|---|
| Granier, Arno | University of Bern | CH |
| Babu Kumar, Vasanth | University of Bern | CH |
| Balvočius, Titas | Vytauto Didžiojo universitetas | LT |
| Proix, Timothée | University of Geneva | CH |
| Woodman, Marmaduke | Aix-Marseille Université | FR |
| | Vytauto Didžiojo universitetas | LT |
| Senn, Walter | University of Bern | CH |
| Date | Start Page | End Page |
|---|---|---|
| 2025-11-28 | 68 | 68 |
Transformer networks (Vaswani et al., 2017) have revolutionized artificial intelligence, yet it remains unknown whether they process information in a way similar to the human brain. Recent theoretical work suggests that the transformer’s attention mechanism may have a counterpart in the brain’s cortico-thalamic circuits (Granier & Senn, 2025). Testing this idea against real-world brain data, however, requires a working computational model. This study aimed to build and train a cortical transformer: a transformer based on the proposed cortico-thalamic architecture, which maps the mathematical components of self-attention (queries, keys, and values) directly onto the biological dynamics of specific cortical layers and thalamic loops (Granier & Senn, 2025). The goal was to create a tool that generates internal activity patterns for direct comparison with human brain recordings during a language task. The cortical transformer was implemented in PyTorch with 4 layers and 6 attention heads. The model, with 11.71 million parameters, was trained for 34 epochs on the Multi30K dataset (Elliott et al., 2016) to perform English-to-German translation. A data pipeline was developed to extract the model’s internal activations (query, key, and value vectors) in response to linguistic input. The trained cortical transformer achieved a BLEU score of 25.29. Although not state-of-the-art, this result is a proof of concept: it demonstrates that the cortico-thalamic architecture is computationally viable and can learn a complex language task. The system is now ready for future neuroscientific studies that will correlate its activations with human brain recordings, providing a framework for investigating the biological basis of attention and cognition.
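For reference, the queries, keys, and values that the cortical transformer maps onto cortico-thalamic dynamics are those of standard multi-head self-attention (Vaswani et al., 2017), written below for an input matrix $X$ whose rows are token representations. This is the generic formulation only; which cortical layer or thalamic loop corresponds to each term is specified in Granier & Senn (2025) and is not reproduced here. With the 6 heads described above, $h = 6$; the usual choice $d_k = d_{\text{model}}/h$ is an assumption for this model.

$$
\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i,
\qquad Q_i = X W_i^{Q},\quad K_i = X W_i^{K},\quad V_i = X W_i^{V}
$$

$$
\mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}
$$

Each row of the softmax output sums to one, so every head's output for a token is a weighted average of that head's value vectors.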
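The abstract does not describe how the activation-extraction pipeline is built. As a minimal sketch, assuming a standard multi-head self-attention layer with random, untrained weights and a toy width `d_model = 12` (only the 6 heads come from the abstract), the code below shows what such a pipeline records: the per-head query, key, and value vectors for every input token. `RecordingSelfAttention` and its `records` attribute are hypothetical names, and none of the cortico-thalamic specifics of the authors' model are represented. It uses only the Python standard library so it runs anywhere; in PyTorch, a comparable capture is often done by attaching `register_forward_hook` callbacks to the query, key, and value projection modules.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix; both are lists of rows."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


class RecordingSelfAttention:
    """Hypothetical multi-head self-attention layer that records per-head Q, K, V."""

    def __init__(self, d_model, n_heads, seed=0):
        if d_model % n_heads:
            raise ValueError("d_model must be divisible by n_heads")
        rng = random.Random(seed)
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Random, untrained projections W^Q, W^K, W^V (d_model x d_model).
        self.w_q = random_matrix(d_model, d_model, rng)
        self.w_k = random_matrix(d_model, d_model, rng)
        self.w_v = random_matrix(d_model, d_model, rng)
        self.records = []  # one {"query", "key", "value"} dict per forward pass

    def _split_heads(self, m):
        # Columns h*d_head .. (h+1)*d_head - 1 belong to head h -> [head][token][dim].
        d = self.d_head
        return [[row[h * d:(h + 1) * d] for row in m] for h in range(self.n_heads)]

    def forward(self, x):
        """x: list of token vectors (n_tokens x d_model). Returns concatenated head outputs."""
        qs = self._split_heads(matmul(x, self.w_q))
        ks = self._split_heads(matmul(x, self.w_k))
        vs = self._split_heads(matmul(x, self.w_v))
        self.records.append({"query": qs, "key": ks, "value": vs})
        scale = math.sqrt(self.d_head)
        head_outputs = []
        for qh, kh, vh in zip(qs, ks, vs):
            scores = matmul(qh, transpose(kh))  # n_tokens x n_tokens
            weights = [softmax([s / scale for s in row]) for row in scores]
            head_outputs.append(matmul(weights, vh))
        # Concatenate heads per token; the output projection W^O is omitted here.
        return [[val for out in head_outputs for val in out[t]] for t in range(len(x))]


if __name__ == "__main__":
    layer = RecordingSelfAttention(d_model=12, n_heads=6)  # toy width; 6 heads as in the abstract
    rng = random.Random(1)
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(12)] for _ in range(3)]  # 3 stand-in embeddings
    layer.forward(tokens)
    q = layer.records[0]["query"]
    print(len(q), len(q[0]), len(q[0][0]))  # heads, tokens, dims per head: 6 3 2
```

Storing one record per forward pass, indexed as `[head][token][dimension]`, keeps each token's vectors addressable for later comparison with recordings; how that comparison will be done is not specified in the abstract.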
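The abstract reports a BLEU score of 25.29 without naming the scoring tool. To make the metric concrete, here is a minimal sketch of standard corpus-level BLEU, assuming one reference per sentence, pre-tokenized input, uniform weights over 1- to 4-grams, and no smoothing. The function names are illustrative. Established tools such as sacreBLEU apply their own tokenization and settings, so this sketch is not expected to reproduce the reported value.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU on a 0-100 scale, one reference per hypothesis."""
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            ref_counts = ngram_counts(ref, n)
            # Clip: a hypothesis n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in ngram_counts(hyp, n).items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # the geometric mean is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_precision)


if __name__ == "__main__":
    ref = "ein mann fährt fahrrad auf der straße".split()
    print(corpus_bleu([ref], [ref]))  # identical output: 100.0
    print(round(corpus_bleu([ref[:-1]], [ref]), 2))  # one token short, penalty exp(-1/6): 84.65
```

Two properties are worth noting: without smoothing, a test set with no matching 4-gram scores 0, and corpus-level BLEU pools n-gram counts across all sentences rather than averaging per-sentence scores.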