MultiMedia Modeling

MultiMedia Modeling

28th International Conference, MMM 2022, Phu Quoc, Vietnam, June 6-10, 2022, Proceedings, Part I

Huynh Thi Thanh, Binh; Huet, Benoit; Gurrin, Cathal; Tran, Minh-Triet; ?or Jonsson, Bjoern; Hu, Anita Min-Chun; Dang-Nguyen, Duc-Tien

Springer Nature Switzerland AG

03/2022

641

Mole

Inglês

9783030983574

15 a 20 dias

1009

Descrição não disponível.
BEST PAPER SESSION.- Real-time detection of tiny objects based on a weighted bi-directional FPN.- Multi-Modal Fusion Network for Rumor Detection with Texts and Images.- PF-VTON: Toward High-Quality Parser-Free Virtual Try-On Network.- MF-GAN: Multi-conditional fusion Generative Adversarial Network for Text-to-Image Synthesis.- APPLICATIONS 1.- Learning to classify weather conditions from single images without labels.- Learning Image Representation via Attribute-aware Attention Networks for Fashion Classification.- Toward Detail-Oriented Image-Based Virtual Try-On with Arbitrary Poses.- Parallel DBSCAN-Martingale estimation of the number of concepts for automatic satellite image clustering.- MULTIMEDIA APPLICATIONS - PERSPECTIVES, TOOLS & APPLICATIONS (Special Session) & BRAVE NEW IDEAS.- AI for the Media Industry: Application Potential and Automation Level.- Color the Word: Leveraging Web Images for Machine Translation of Untranslatable Words.- ACTIVITIES & EVENTS.- MGMP: Multimodal Graph Message Propagation Network for Event Detection.- Pose-Enhanced Relation Feature for Action Recognition in Still Images.-Prostate Segmentation of Ultrasound Images based on Interpretable-guided Mathematical Model.- Spatiotemporal Perturbation Based Dynamic Consistency for Semi-Supervised Temporal Action Detection.- MULTIMEDIA DATASETS FOR REPEATABLE EXPERIMENTATION (Special Session).- A Task Category Space for User-Centric Comparative Multimedia Search Evaluations.- GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval.- LLQA - Lifelog Question Answering Dataset.- LEARNING.- Category-sensitive Incremental Learning For Image-based 3D Shape Reconstruction.- AdaConfigure: Reinforcement Learning-based Adaptive Configuration for Video Analytics Services.- Mining Minority-class Examples With Uncertainty Estimates.- Conditional Context-aware Feature Alignment for Domain Adaptive Detection Transformer.- MULTIMEDIA for MEDICAL APPLICATIONS (Special Session).- Human activity recognition with IMU and vital signs feature fusion.- On Identifying Pareidolia Phenomenon by Emulating Patient Behavior.- Using Explainable AI to Identify Differences between Clinical and Experimental Pain Detection Models Based on Facial Expressions.- APPLICATIONS 2.- Double Granularity Relation Network with Self-Criticism for Occluded Person Re-Identification.- A Complementary Fusion Strategy for RGB-D Face Recognition.- Multi-scale Cross-modal Transformer Network for RGB-D Object Detection.- Joint Re-Detection and Re-Identification for Multi-Object Tracking.- MULTIMEDIA ANALYTICS for CONTEXTUAL HUMAN UNDERSTANDING (Special Session).- An Investigation into Keystroke Dynamics and Heart Rate Variability as Indicators of Stress.- Fall detection using multimodal data.- Prediction of Blood Glucose using Contextual LifeLog Data.- Multimodal Embedding for Lifelog Retrieval.- APPLICATIONS 3.- A Multiple Positives Enhanced NCE Loss for Image-Text Retrieval.- SAM: Self Attention Mechanism for Scene Text Recognition based on Swin Transformer.- JVCSR: Video Compressive Sensing Reconstruction with Joint In-loop Reference Enhancement and Out-loop Super-resolution.- Point Cloud Upsampling via a Coarse-to-fine Network.- IMAGE ANALYTICS.- Arbitrary Style Transfer With Adaptive Channel Network.- Fast Single Image Dehazing Using Morphological Reconstruction and Saturation Compensation.- One-Stage Image Inpainting with Hybrid Attention.- Real-time FPGA Design for OMP Targeting 8K Image Reconstruction.- SPEECH & MUSIC.- Time-Frequency Attention For Speech Emotion Recognition With Squeeze-and-Excitation Blocks.- SPEECH INTELLIGIBILITY ENHANCEMENT BY NON-PARALLEL SPEECH STYLE CONVERSION USING CWT AND iMetricGAN BASED CycleGAN.- A-Muze-Net: Music Generation by Composing the Harmony based on the Generated Melody.- MULTIMODAL ANALYTICS.- Bi-attention modal separation network for multimodal video fusion.- Combining Knowledge and Multi-modal Fusion for Meme Classification.- Non-Uniform Attention Network for Multi-modal Sentiment Analysis.- Multimodal Unsupervised Image-to-Image Translation Without Independent Style Encoder.
artificial intelligence;computer vision;emerging trends;image analysis;image processing;image quality;image segmentation;interactive multimedia and interfaces;machine learning;media content browsing and retrieval tools;multimedia annotation, tagging and recommendation;multimedia authoring and personalisation;multimedia content analysis;multimedia databases, content delivery and transport;multimedia indexing;multimedia mining;multimedia security and content protection;pattern recognition;signal processing