MultiMedia Modeling

31st International Conference on Multimedia Modeling, MMM 2025, Nara, Japan, January 8-10, 2025, Proceedings, Part V

Kompatsiaris, Ioannis; Yamasaki, Toshihiko; Yanai, Keiji; Xu, Changsheng; Chu, Wei-Ta; Riegler, Michael; Ide, Ichiro; Nitta, Naoko

Springer Nature Switzerland AG

01/2025

387

Softcover

9789819620739

15 to 20 days

Description not available.
Special Session on Multimedia Research in Robotics.- Multimodal Engagement Prediction in Human-Robot Interaction using Transformer Neural Networks.- What Should Autonomous Robots Verbalize and What Should They Not?.- Special Session: SpIMA: Special Session on Spatial Intelligence in Multimedia Analytics.- Counting Unique Objects in Geo-Tagged Street Images: A Case Study Of Homeless Encampments in Los Angeles.- Special Session on Simulating Edge Computing and Multimodal AI: A Benchmark for Real-World Applications.- Correlation-Based Weighted Federated Learning with Multimodal Sensing and Knowledge Distillation: An Application on a Real-World Benchmark Dataset.- Leveraging Pruning, Quantization and Multi-Objective Optimization for an Efficient Deployment of Multi-modal Models.- Demo Papers.- A User Identification and Reading Style Detection System Based on Eye Movement Patterns During Reading.- AMDA: Advancing Multimedia Data Annotation for Human-centric Situations.- An Implementation of Networked JamSketch.- Badminton Footwork Practice via an Immersive Virtual Reality System.- Better Image Segmentation with Classification: Guiding Zero-Shot Models Using Class Activation Maps.- CleverFox: Integrating Visual Mnemonics with AI for Enhanced Language Learning.- Enhancing User Control in AI-Based Video Summarization for Social Media.- FencBuddy: Action-aware Depth Perception Training for Fencing Attacks.- Fingering Prediction for Classical Guitar: Dataset Creation and Model Development.- KuzushijiFontDiff: Diffusion Model for Japanese Kuzushiji Font Generation.- Leveraging Latent Diffusion in 3D Gaussian Splatting for Novel View Synthesis.- Movie Retrieval Systems Using Genre-guided Multimodal Learning Techniques.- Multi-Dimensional Exploration of Media Collection Metadata.- Multimodal Interoperability with the CLAMS Platform.- Real-time Visualizer for Turntablist Performance.- RoboDJ: Live Commentary Robots System Driven by Physical- and Cyber-world Observations.- SceneTextStyler: Editing Text with Style Transformation.- SelectSum: Topic-Based Selective Summarization of Speech-Based Videos.- Smart Driving Assistance with Real-time Risk Assessment and Personalized Driving Coaching to Enhance Road Safety.- System Demo of Modeling Smart University Campus Virtual Environments.- Training a Segmentation-based Visual Anonymization Service for Street Scenes.- Transformer-Based Audio Generation Conditioned by 2D Latent Maps: A Demonstration.- Using Language Models to Generate and Forget the Narrative Memories of an Assistive Robot.- WaveFontStyler: Font Style Transfer Based on Sound.- Video Browser Showdown.- diveXplore at the Video Browser Showdown 2025.- Exquisitor at the Video Browser Showdown 2025: Unifying Conversational Search and User Relevance Feedback.- Feature-driven Video Segmentation and Advanced Querying with vitrivr-engine.- FUSIONISTA: Fusion of 3-D Information of Video in Retrieval System.- HORUS: Multimodal Large Language Models Framework for Video Retrieval at VBS 2025.- IMSearch 2.0: Toward User-centric and Efficient Interactive Multimedia Retrieval System.- Interactive Video Search with Multi-modal LLM Video Captioning.- MediaMix: Multimedia Retrieval in Mixed Reality.- NII-UIT at VBS2025: Multimodal Video Retrieval with LLM Integration and Dynamic Temporal Search.- PraK Tool V3: Enhancing Video Item Search Using Localized Text and Texture Queries.- Simplified Video Retrieval in Virtual Reality with vitrivr-VR.- SnapSeek 2.0 at Video Browser Showdown 2025.- VEAGLE: Eye Gaze-Assisted Guidance for Video Browser Showdown.- VERGE in VBS 2025.- VideoEase at VBS2025: An Interactive Video Retrieval System.- ViewsInsight2.0: Enhancing Video Retrieval for VBS 2025 with an Automatic Query Generator Powered by Large Language Models.- ViFi: A Video Finding System at Video Browser Showdown 2025.
machine learning;image analysis;semantic information;computer programming;multimedia content analysis;multimedia mining;signal processing and communications;multimedia abstraction and summarization;security and content protection;multimedia applications;media content browsing and retrieval tools;multi-camera and multi-view;multimedia databases, content delivery and transport;audio, image, video processing, coding and compression;multimodal analysis for retrieval applications;multimedia fusion methods;semantic analysis of multimedia and contextual data;media representation and algorithms;multimedia content generation;multimedia analytics applications