Computer Vision - ECCV 2024

18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part V

Ricci, Elisa; Roth, Stefan; Sattler, Torsten; Varol, Guel; Leonardis, Ales; Russakovsky, Olga

Springer International Publishing AG

10/2024

478

Paperback

9783031726514

15 to 20 days

Description not available.
SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark
AttnZero: Efficient Attention Discovery for Vision Transformers
Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search
Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation
TimeCraft: Navigate Weakly-Supervised Temporal Grounded Video Question Answering via Bi-directional Reasoning
Spectral Subsurface Scattering for Material Classification
nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding
Dynamic Neural Radiance Field From Defocused Monocular Video
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
CarFormer: Self-Driving with Learned Object-Centric Representations
FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models
Plain-Det: A Plain Multi-Dataset Object Detector
Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation
Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation
Synchronous Diffusion for Unsupervised Smooth Non-Rigid 3D Shape Matching
Text-Guided Video Masked Autoencoder
Diffusion Models for Open-Vocabulary Segmentation
Textual-Visual Logic Challenge: Understanding and Reasoning in Text-to-Image Generation
EvSign: Sign Language Recognition and Translation with Streaming Events
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
Zero-shot Object Counting with Good Exemplars
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds
PartSTAD: 2D-to-3D Part Segmentation Task Adaptation
FutureDepth: Learning to Predict the Future Improves Video Depth Estimation
LLM as Copilot for Coarse-grained Vision-and-Language Navigation
artificial intelligence; computer networks; computer systems; computer vision; education; Human-Computer Interaction (HCI); image analysis; image coding; image processing; image reconstruction; image segmentation; learning; machine learning; object recognition; pattern recognition; reconstruction; signal processing; software engineering