Research vision 2024
April 2024
My research vision in 2024 remains grounded in three connected topics: Multimedia Systems, Immersive Media, and Computer Vision. I see them not as isolated areas, but as complementary ways of understanding how rich digital content is produced, delivered, experienced, and analysed. Across these topics, my interests are increasingly shaped by AI, particularly where it connects representation learning, multimodal understanding, and real-world applications.
Multimedia Systems ¶
Multimedia Systems remains a core foundation of how I think about research. My interest here is not only in media delivery itself, but in the broader challenge of building systems that make multimedia content accessible, interactive, adaptive, and useful in practice.
This perspective comes from my earlier work on TV middleware, standards, and interactive media, and it still shapes how I approach current problems. Even when I work on AI-driven tasks, I care about the system context in which those methods operate: how data is represented, how users interact with it, and how methods can support real usage scenarios rather than isolated benchmarks.
Some of my contributions in this field include:
- 📄 Advances in Ginga middleware for TV 2.5 (pt-br)
- 📄 Interactive 360-degree Videos in Ginga-NCL Using Head-Mounted Displays as Second Screen Devices
- 📄 Extending multimedia languages to support multimodal user interactions
Immersive Media ¶
Immersive Media is important to my research because it pushes multimedia systems toward more natural, embodied, and context-aware experiences. I am interested in how immersive environments can better support interaction, perception, and quality of experience, especially when users are no longer passive viewers.
My work in this area has focused on 360-degree video and multisensory experiences, but I also see immersive media as a useful testbed for AI methods: it raises questions about personalisation, adaptation, attention, interaction, and meaning in ways that are highly relevant to current multimodal AI research.
Some of my contributions in this field include:
- 📄 Subjective Evaluation of 360-degree Sensory Experiences
- 📄 An Authoring Model for Interactive 360 Videos
Computer Vision ¶
Computer Vision is where much of my current methodological interest is concentrated. I am particularly interested in video understanding, multimodal learning, and AI methods that can extract useful structure from complex visual data. For me, the goal is not only to improve recognition performance, but to develop methods that help people explore, interpret, and act on rich media data.
This is also the area where I see the strongest connection to Generative AI and graph-based methods. I am interested in combining visual understanding with knowledge representation and domain context, especially in multidisciplinary settings such as Digital Humanities, climate-related applications, and biodiversity research.
Some of my contributions in this field include: