Following the launch of the “Cognitive Agent” project, the development team is currently focused on implementing a multimodal intelligent agent designed for real-time processing and support of academic sessions.

The main objective of the current module is to resolve deictic expressions in the teaching environment. These spatial or visual references (such as “this graph” or “here”) are common in teachers’ discourse and often create a critical information barrier for students with visual impairments, because their meaning depends on seeing what the teacher is pointing at.

To address this challenge, the system integrates an agent that monitors classroom audio to detect visual references. The agent resolves each reference synchronously by combining semantic routing with visual descriptions that computer vision models have precomputed for each slide, and it automatically produces a structured description. This description is made available to the student in real time through speech synthesis or adapted formats, so the student can decide when to receive it according to their needs at each moment of the class.
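
The flow described above (detect a deictic reference, route it to a precomputed slide description, emit a structured description, and hold it until the student asks) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and not the project's actual implementation: the `DEICTIC_PATTERN` cue list, the `SLIDE_DESCRIPTIONS` store and its sample texts, and the functions `detect_deictic`, `resolve`, `on_utterance`, and `student_requests`. In particular, semantic routing is replaced by a simple match between the noun in the reference and the element type, and the speech recognition and computer vision stages are assumed to have already run.

```python
import re
from collections import deque
from dataclasses import dataclass

# Assumed cue list: demonstrative + noun ("this graph") or a bare adverb ("here").
DEICTIC_PATTERN = re.compile(
    r"\b(this|that|these|those)\s+(\w+)|\b(here|there)\b", re.IGNORECASE
)

@dataclass
class VisualElement:
    kind: str         # element type, e.g. "graph", "table", "image"
    description: str  # text assumed to be precomputed offline by a vision model

@dataclass
class StructuredDescription:
    slide: int
    trigger: str      # the deictic phrase heard in the audio transcript
    kind: str
    text: str

# Hypothetical precomputed store: slide number -> visual elements on that slide.
SLIDE_DESCRIPTIONS = {
    3: [
        VisualElement("graph", "Line graph: sales rise steadily from January to June."),
        VisualElement("table", "Table: monthly sales figures by region."),
    ],
    4: [VisualElement("image", "Photo: a laboratory bench with a microscope.")],
}

def detect_deictic(utterance):
    """Return (phrase, noun) pairs; noun is None for bare 'here' / 'there'."""
    hits = []
    for m in DEICTIC_PATTERN.finditer(utterance):
        noun = m.group(2).lower() if m.group(2) else None
        hits.append((m.group(0), noun))
    return hits

def resolve(utterance, slide):
    """Route each deictic phrase to an element of the current slide, if any."""
    elements = SLIDE_DESCRIPTIONS.get(slide, [])
    results = []
    for phrase, noun in detect_deictic(utterance):
        if noun is None:
            # Bare "here": only unambiguous when the slide has a single element.
            matches = elements if len(elements) == 1 else []
        else:
            # Stand-in for semantic routing: match the noun to the element type.
            matches = [e for e in elements if noun in (e.kind, e.kind + "s")]
        for e in matches[:1]:
            results.append(StructuredDescription(slide, phrase, e.kind, e.description))
    return results

# Descriptions wait here until the student asks for them.
pending = deque()

def on_utterance(utterance, slide):
    """Called for each transcribed fragment of classroom audio."""
    pending.extend(resolve(utterance, slide))

def student_requests():
    """Return the oldest pending description, or None if nothing is waiting."""
    return pending.popleft() if pending else None
```

In this sketch, calling `on_utterance("As you can see in this graph, sales rise", 3)` queues the graph description for slide 3, and `student_requests()` hands it over only when the student asks. Holding descriptions in a queue instead of speaking them immediately is one simple way to leave the timing to the student; a real system would then pass the returned text to a speech synthesizer or another adapted output format.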