Slide presentations are widely used in the classroom because they are concise and visual. However, they are one of the main barriers to following lessons for blind or low-vision students, particularly because teachers frequently use deictic expressions while explaining them (such as “this,” “here,” or “in this part”). Adapting slides through automatic descriptions, and detecting and clarifying such expressions, is therefore essential.
The project technicians, in collaboration with Andrés M. Alonso Fernández, professor of Statistics and Econometrics at UC3M, have carried out a first evaluation of the quality of the automatic slide descriptions generated by the intelligent system they have developed, and of how useful those descriptions are for subsequently clarifying the deictic expressions the instructor uses during the presentation. The evaluation assessed how well the model adapted slides when asked to do so through different instructions, or prompts.
The developers observed that a simple and direct prompt, which asked the model to act as an Engineering professor describing a Statistics slide to a visually impaired student, was less effective than expected: the model focused on how to explain the slide to the student rather than on the description itself. By contrast, more complex prompts that specified the objectives of the description, rules, formats, and the important information to include proved much more suitable.
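To make the contrast concrete, the following is a minimal sketch of how a structured prompt of this kind could be assembled. It is not the project's actual prompt: the class name `DescriptionPrompt`, its fields, the section labels, and all of the example wording are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptionPrompt:
    """Hypothetical container for the parts of a structured prompt."""

    objective: str
    rules: list[str] = field(default_factory=list)
    output_format: str = ""
    must_include: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Join the non-empty parts into one labelled prompt string."""
        sections = [f"OBJECTIVE:\n{self.objective}"]
        if self.rules:
            sections.append("RULES:\n" + "\n".join(f"- {r}" for r in self.rules))
        if self.output_format:
            sections.append(f"FORMAT:\n{self.output_format}")
        if self.must_include:
            sections.append(
                "MUST INCLUDE:\n" + "\n".join(f"- {m}" for m in self.must_include)
            )
        return "\n\n".join(sections)


# Illustrative values only; the real prompts are not given in the article.
prompt = DescriptionPrompt(
    objective="Describe the content of this Statistics slide for a blind student.",
    rules=[
        "Describe what the slide shows; do not explain how to teach it.",
        "Do not anticipate concepts that the slide does not mention.",
    ],
    output_format="Plain sentences, one per visual element.",
    must_include=["Values shown in tables", "Axis labels of graphs"],
)
prompt_text = prompt.render()
print(prompt_text)
```

Keeping the objective, rules, format, and required information as separate fields makes it easy to change one part of the prompt and compare the resulting descriptions.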
Overall, the results show that the system is making positive progress in describing visual elements, especially regarding simple tables and graphs. However, the generated descriptions still show some limitations in terms of fidelity, pedagogical appropriateness, and coherence with the explanation sequence. In the coming months, the project’s researchers and technicians will continue to work in this area, implementing improvements such as reducing irrelevant information, incorporating full topic context, refining prompts to avoid anticipating concepts, improving the description of complex figures, and exploring cross-verification with two models.

Example of a description generated by the model for a slide used by a Statistics professor in class. The slide is shown at the lower left of the image, from the observer’s point of view. On the right is the sentence spoken by the professor, “Here you can see which is the most frequent number of cylinders,” containing the deictic expression “here,” which the model must identify in real time during class in order to provide the description to blind students. The upper part of the image shows the description of the slide that the model generated for the blind student with the prompt used: “The most frequent number of cylinders is 4, with an absolute frequency of 104 and a relative frequency of 0.6710.”