MultiMedia Modeling

31st International Conference on Multimedia Modeling, MMM 2025, Nara, Japan, January 8-10, 2025, Proceedings, Part IV

Yanai, Keiji; Chu, Wei-Ta; Kompatsiaris, Ioannis; Yamasaki, Toshihiko; Xu, Changsheng; Riegler, Michael; Ide, Ichiro; Nitta, Naoko

Springer Nature Switzerland AG

02/2025

470 pages

Paperback

9789819620708

Pre-release: ships 15 to 20 days after publication

Description not available.
Regular Papers
SES-Net: Multi-dimensional Spot-Edge-Surface Network for Nuclei Segmentation
Skin-Adapter: Fine-Grained Skin-Color Preservation for Text-to-Image Generation
Small Tunes Transformer: Exploring Macro & Micro-Level Hierarchies for Skeleton-Conditioned Melody Generation
SMG-Diff: Adversarial Attack Method Based on Semantic Mask-Guided Diffusion
SPLGAN-TTS: Learning Semantic and Prosody to Enhance the Text-to-Speech Quality of Lightweight GAN Models
SSCDUF: Spatial-Spectral Correlation Transformer Based on Deep Unfolding Framework for Hyperspectral Image Reconstruction
SSDL: Sensor-to-Skeleton Diffusion Model with Lipschitz Regularization for Human Activity Recognition
Structural Information-guided Fine-grained Texture Image Inpainting
Style Separation and Content Recovery for Generalizable Sketch Re-identification and A New Benchmark
Synchronization and Calibration of Video Sequences acquired using Multiple Plenoptic 2.0 Cameras
Target-Oriented Dynamic Denoising Curriculum Learning for Multimodal Stance Detection
TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration
Temporal Closeness for Enhanced Cross-Modal Retrieval of Sensor and Image Data
The Right to an Explanation under the GDPR and the AI Act
Toward Appearance-based, Autonomous Landing Site Identification for Multirotor Drones in Unstructured Environments
Towards Inclusive Education: Multimodal Classification of Textbook Images for Accessibility
Towards Visual Storytelling by Understanding Narrative Context through Scene-Graphs
TPS-YOLO: The Efficient Tiny Person Detection Network Based on Improved YOLOv8 and Model Pruning
Uncertainty-guided Joint Semi-supervised Segmentation and Registration of Cardiac Images
Understanding the Roles of Visual Modality in Multimodal Dialogue: An Empirical Study
Vision-Language Pretraining for Variable-shot Image Classification
Visual Anomaly Detection on Topological Connectivity under Improved YOLOv8
Wavelet Integrated Convolutional Neural Network for ECG Signal Denoising
WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition
Zero-shot Sketch-based Image Retrieval with Hybrid Information Fusion and Sample Relationship Modeling

Special Session: ExpertSUM: Special Session on Expert-Level Text Summarization from Fine-Grained Multimedia Analytics
CalorieVoL: Integrating Volumetric Context into Multimodal Large Language Models for Image-based Calorie Estimation
Can Masking Background and Object Reduce Static Bias for Zero-shot Action Recognition?

Special Session: MLLMA: Special Session on Multimodal Large Language Models and Applications
Enhanced Anomaly Detection in 3D Motion through Language-Inspired Occlusion-Aware Modeling
Evaluating VQA Models' Consistency in the Scientific Domain
Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models
Quantifying Image-Adjective Associations by Leveraging Large-Scale Pretrained Models
TACST: Time-Aware Transformer for Robust Speech Emotion Recognition
TS-MEFM: A New Multimodal Speech Emotion Recognition Network Based on Speech and Text Fusion

machine learning; image analysis; semantic information; computer programming; multimedia content analysis; multimedia mining; signal processing and communications; multimedia abstraction and summarization; security and content protection; multimedia applications; media content browsing and retrieval tools; multi-camera and multi-view; multimedia databases, content delivery and transport; audio, image, video processing, coding and compression; multimodal analysis for retrieval applications; multimedia fusion methods; semantic analysis of multimedia and contextual data; media representation and algorithms; multimedia content generation; multimedia analytics applications