
huangyangyi / iccv-2023-papers


This project forked from dmitryryumin/iccv-2023-papers


ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support visual intelligence development!

License: MIT License


ICCV-2023-Papers




ICCV 2023 Papers: Explore a comprehensive collection of cutting-edge research papers presented at ICCV 2023, the premier computer vision conference. Keep up to date with the latest advances in computer vision and deep learning. Code implementations are included. ⭐ the repository to support the development of visual intelligence!

ICCV 2023


The online version of the ICCV 2023 Conference Programme comprises a list of all accepted full papers, their presentation order, and their designated presentation times.


Other collections of the best AI conferences

❗ The conference table is kept up to date.

Conference Year
Computer Vision (CV)
CVPR 2023
Speech (SP)
ICASSP 2023
INTERSPEECH 2023

Contributors



Contributions to improve the completeness of this list are greatly appreciated. If you come across any overlooked papers, please feel free to create pull requests, open issues or contact me via email. Your participation is crucial to making this repository even better.


❗ Final paper links will be added post-conference.

List of sections

3D from Multi-View and Sensors

Title Repo Paper Video
Multi-Modal Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor arXiv
ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes GitHub Page arXiv YouTube
Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach
Doppelgangers: Learning to Disambiguate Images of Similar Structures GitHub Page GitHub
EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries GitHub arXiv
ClothPose: A Real-world Benchmark for Visual Analysis of Garment Pose via an Indirect Recording Solution
EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity
ENVIDR: Implicit Differentiable Renderer with Neural Environment Lighting GitHub Page GitHub arXiv Google Drive
Learning a more Continuous Zero Level Set in Unsigned Distance Fields through Level Set Projection GitHub arXiv
GNT-MOVE: Generalizable NeRF Transformer with Mixture-of-View-Experts GitHub arXiv
MatrixCity: A Large-Scale City Dataset for City-Scale Neural Rendering and Beyond GitHub Page Pdf
R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras WEB Page GitHub arXiv YouTube
ClimateNeRF: Extreme Weather Synthesis in Neural Radiance Field GitHub Page arXiv
Rendering Humans from Object-Occluded Monocular Videos WEB Page GitHub arXiv YouTube
AssetField: Assets Mining and Reconfiguration in Ground Feature Plane Representation GitHub Page arXiv
PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images GitHub arXiv
MIMO-NeRF: Fast Neural Rendering with Multi-Input Multi-Output Neural Radiance Fields
Adaptive Positional Encoding for Bundle-Adjusting Neural Radiance Fields
NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-View Reconstruction WEB Page GitHub arXiv
Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition GitHub arXiv
Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching arXiv
Compatibility of Fundamental Matrices for Complete Viewing Graphs arXiv
ProtoTransfer: Cross-Modal Prototype Transfer for Point Cloud Segmentation
SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-View 3D Object Detection GitHub arXiv
GraphAlign: Enhancing Accurate Feature Alignment by Graph matching for Multi-Modal 3D Object Detection
Tangent Sampson Error: Fast Approximate Two-View Reprojection Error for Central Camera Models
Using a Waffle Iron for Automotive Point Cloud Semantic Segmentation GitHub arXiv
Fast Globally Optimal Surface Normal Estimation from an Affine Correspondence
HeadsUp: A Data-Driven Volumetric Prior for Few-Shot Synthesis of Ultra High-Resolution Human Heads
TILTED: Robust Neural Fields via Latent Registration
Center-based Decoupled Point-Cloud Registration for 6D Object Pose Estimation GitHub
Deep Geometry-Aware Camera Self-Calibration from Video
V-FUSE: Volumetric Depth Map Fusion with Long-Range Constraints GitHub arXiv
Consistent Depth Prediction for Transparent Object Reconstruction from RGB-D Camera
FaceCLIPNeRF: Text-Driven 3D Face Manipulation using Deformable Neural Radiance Fields GitHub Page arXiv
HollowNeRF: Pruning Hashgrid-based NeRFs with Trainable Collision Mitigation arXiv
ICE-NeRF: Interactive Color Editing of NeRFs via Decomposition-Aware Weight Optimization
FULLER: Unified Multi-Modality Multi-Task 3D Perception via Multi-Level Gradient Calibration arXiv
Neural Fields for Structured Lighting
CO-Net: Learning Multiple Point Cloud Tasks at Once with a Cohesive Network
Pose-Free Neural Radiance Fields via Implicit Pose Regularization arXiv
TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering GitHub Page GitHub arXiv
S-VolSDF: Sparse Multi-View Stereo Regularization of Neural Implicit Surfaces GitHub Page arXiv YouTube
DPS-Net: Deep Polarimetric Stereo Depth Estimation GitHub
3DPPE: 3D Point Positional Encoding for Transformer-based Multi-Camera 3D Object Detection GitHub arXiv
Deformable Neural Radiance Fields using RGB and Event Cameras
Inter-Reflectable Light Fields for Geometry and Material Estimation GitHub Page GitHub arXiv
Hierarchical Prior Mining for Non-Local Multi-View Stereo GitHub arXiv
Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection GitHub arXiv
Re-ReND: Real-Time Rendering of NeRFs Across Devices GitHub arXiv
Learning Shape Primitives via Implicit Convexity Regularization GitHub
Geometry-Guided Feature Learning and Fusion for Indoor Scene Reconstruction
LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment GitHub arXiv
PivotNet: End-to-End Learning for Vectorized HD Map Construction arXiv
Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs GitHub Page GitHub arXiv YouTube
Mask-Attention-Free Transformer for 3D Instance Segmentation GitHub arXiv
Scene-Aware Feature Matching arXiv
Revisiting Domain-Adaptive 3D Object Detection by Reliable, Diverse and Class-Balanced Pseudo-Labeling GitHub arXiv
GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction GitHub Page GitHub arXiv YouTube
BANSAC: A Dynamic BAyesian Network for SAmple Consensus GitHub Page
Theoretical and Numerical Analysis of 3D Reconstruction using Point and Line Incidences arXiv
RealGraph: A Multiview Dataset for 4D Real-World Context Graph Generation GitHub Pdf
CL-MVSNet: Unsupervised Multi-View Stereo with Dual-Level Contrastive Learning GitHub Pdf
Temporal Enhanced Training of Multi-View 3D Object Detector via Historical Object Prediction GitHub arXiv
Object as Query: Lifting any 2D Object Detector to 3D Detection GitHub arXiv
PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection arXiv
Not Every Side is Equal: Localization Uncertainty Estimation for Semi-Supervised 3D Object Detection

Adversarial Attack and Defense

Title Repo Paper Video
Robust Mixture-of-Expert Training for Convolutional Neural Networks GitHub arXiv
Set-Level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-Training Models GitHub arXiv
CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning GitHub arXiv
CGBA: Curvature-Aware Geometric Black-Box Attack GitHub arXiv
Robust Evaluation of Diffusion-based Adversarial Purification arXiv
Advancing Example Exploitation can Alleviate Critical Challenges in Adversarial Training
The Victim and the Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data
TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models GitHub arXiv
SAGA: Spectral Adversarial Geometric Attack on 3D Meshes GitHub Page GitHub arXiv
Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples GitHub arXiv
ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion GitHub Page arXiv YouTube
Frequency-Aware GAN for Adversarial Manipulation Generation
Breaking Temporal Consistency: Generating Video Universal Adversarial Perturbations using Image Models
Tracing the Origin of Adversarial Attack for Forensic Investigation and Deterrence arXiv
Downstream-Agnostic Adversarial Examples GitHub arXiv
Hiding Visual Information via Obfuscating Adversarial Perturbations GitHub arXiv
An Embarrassingly Simple Self-Supervised Trojan Attack
Efficient Decision-based Black-Box Patch Attacks on Video Recognition arXiv
Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff arXiv
Towards Building more Robust Models with Frequency Bias GitHub arXiv
Does Physical Adversarial Example Really Matter to Autonomous Driving? Towards System-Level Effect of Adversarial Object Evasion Attack WEB Page arXiv
Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning GitHub arXiv
Enhancing Generalization of Universal Adversarial Perturbation through Gradient Aggregation GitHub arXiv
Unified Adversarial Patch for Cross-Modal Attacks in the Physical World arXiv
RFLA: A Stealthy Reflected Light Adversarial Attack in the Physical World GitHub arXiv
Enhancing Fine-Tuning based Backdoor Defense with Sharpness-Aware Minimization arXiv
Conditional 360-Degree Image Synthesis for Immersive Indoor Scene Decoration GitHub arXiv
An Adaptive Model Ensemble Adversarial Attack for Boosting Adversarial Transferability GitHub arXiv
Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial Double Machine Learning GitHub arXiv
LEA2: A Lightweight Ensemble Adversarial Attack via Non-Overlapping Vulnerable Frequency Regions
Explaining Adversarial Robustness of Neural Networks from Clustering Effect Perspective
VertexSerum: Poisoning Graph Neural Networks for Link Inference arXiv
How to Choose Your Best Allies for a Transferable Attack? GitHub arXiv
Enhancing Adversarial Robustness in Low-Label Regime via Adaptively Weighted Regularization and Knowledge Distillation GitHub arXiv
AdvDiffuser: Natural Adversarial Example Synthesis with Diffusion Models
FnF Attack Adversarial Attack against Multiple Object Trackers by Inducing False Negatives and False Positives
Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis GitHub arXiv
Hard No-Box Adversarial Attack on Skeleton-based Human Action Recognition with Skeleton-Motion-Informed Gradient GitHub arXiv YouTube
Structure Invariant Transformation for Better Adversarial Transferability GitHub
Beating Backdoor Attack at its Own Game GitHub arXiv
Transferable Adversarial Attack for Both Vision Transformers and Convolutional Networks via Momentum Integrated Gradients
REAP: A Large-Scale Realistic Adversarial Patch Benchmark GitHub arXiv
Multi-Metrics Adaptively Identifies Backdoors in Federated Learning GitHub arXiv
Backpropagation Path Search on Adversarial Transferability arXiv
Fast Adaptation of Neural Networks using Test-Time Feedback
One-Bit Flip is All You Need: When Bit-Flip Attack Meets Model Training GitHub arXiv
PolicyCleanse: Backdoor Detection and Mitigation for Competitive Reinforcement Learning arXiv
Towards Viewpoint-Invariant Visual Recognition via Adversarial Training arXiv
Fast Adversarial Training with Smooth Convergence GitHub arXiv
The Perils of Learning from Unlabeled Data: Backdoor Attacks on Semi-Supervised Learning arXiv
Boosting Adversarial Transferability via Gradient Relevance Attack
Towards Robust Model Watermark via Reducing Parametric Vulnerability GitHub arXiv
TRM-UAP: Enhancing the Transferability of Data-Free Universal Adversarial Perturbation via Truncated Ratio Maximization

Vision and Robotics

Title Repo Paper Video
Simoun: Synergizing Interactive Motion-Appearance Understanding for Vision-based Reinforcement Learning
Among Us: Adversarially Robust Collaborative Perception by Consensus GitHub arXiv
Walking Your LiDOG: A Journey Through Multiple Domains for LiDAR Semantic Segmentation GitHub Page GitHub arXiv
Stabilizing Visual Reinforcement Learning via Asymmetric Interactive Cooperation
MAAL: Multimodality-Aware Autoencoder-based Affordance Learning for 3D Articulated Objects
Rethinking Range View Representation for LiDAR Segmentation arXiv
PourIt!: Weakly-Supervised Liquid Perception from a Single Image for Visual Closed-Loop Robotic Pouring GitHub Page GitHub arXiv YouTube
CROSSFIRE: Camera Relocalization On Self-Supervised Features from an Implicit Representation arXiv
Environment Agnostic Representation for Visual Reinforcement Learning
Test-Time Personalizable Forecasting of 3D Human Poses
HM-ViT: Hetero-Modal Vehicle-to-Vehicle Cooperative Perception with Vision Transformer arXiv

Vision and Graphics

Title Repo Paper Video
Efficient Neural Supersampling on a Novel Gaming Dataset arXiv
Locally Stylized Neural Radiance Fields
NEMTO: Neural Environment Matting for Novel View and Relighting Synthesis of Transparent Objects arXiv
DDColor: Towards Photo-Realistic and Semantic-Aware Image Colorization via Dual Decoders GitHub ModelScope arXiv
IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis GitHub Page GitHub arXiv
PARIS: Part-Level Reconstruction and Motion Analysis for Articulated Objects GitHub Page GitHub arXiv YouTube
ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model GitHub Page GitHub arXiv YouTube
DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion GitHub Page GitHub Hugging Face arXiv
Dynamic Mesh-Aware Radiance Fields GitHub Page GitHub Pdf
Neural Reconstruction of Relightable Human Model from Monocular Video
Neural Microfacet Fields for Inverse Rendering GitHub Page GitHub arXiv
A Theory of Topological Derivatives for Inverse Rendering of Geometry GitHub Page arXiv
Vox-E: Text-Guided Voxel Editing of 3D Objects GitHub Page GitHub arXiv
StegaNeRF: Embedding Invisible Information within Neural Radiance Fields GitHub Page GitHub arXiv
GlobalMapper: Arbitrary-Shaped Urban Layout Generation arXiv
Urban Radiance Field Representation with Deformable Neural Mesh Primitives GitHub Page GitHub arXiv YouTube
End2End Multi-View Feature Matching with Differentiable Pose Optimization GitHub Page arXiv YouTube
Tree-Structured Shading Decomposition GitHub Page GitHub Pdf YouTube
Lens Parameter Estimation for Realistic Depth of Field Synthesis GitHub Page
AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism
Cross-Modal Latent Space Alignment for Image to Avatar Translation
Computationally Efficient Neural Image Compression with Shallow Decoders GitHub arXiv

Segmentation, Grouping and Shape Analysis

Title Repo Paper Video
Enhancing Spatial and Semantic Supervision for Hybrid-based 3D Instance Segmentation
Learning Neural Eigenfunctions for Unsupervised Semantic Segmentation GitHub arXiv
Divide and Conquer: 3D Point Cloud Instance Segmentation with Point-Wise Binarization GitHub arXiv
Point2Mask: Point-Supervised Panoptic Segmentation via Optimal Transport GitHub arXiv
Handwritten and Printed Text Segmentation: A Signature Case Study SignaTR6K arXiv
Semantic-Aware Template Learning via Part Deformation Consistency arXiv
LeaF: Learning Frames for 4D Point Cloud Sequence Understanding
MARS: Model-Agnostic Biased Object Removal without Additional Supervision for Weakly-Supervised Semantic Segmentation GitHub arXiv
USAGE: A Unified Seed Area Generation Paradigm for Weakly Supervised Semantic Segmentation arXiv
Production-Level Video Segmentation from Few Annotated Frames GitHub Page GitHub arXiv YouTube
ΣIGMA: Scale-Invariant Global Sparse Shape Matching arXiv
Self-Calibrated Cross Attention Network for Few-Shot Segmentation GitHub arXiv
Multi-Granularity Interaction Simulation for Unsupervised Interactive Segmentation arXiv
Texture Learning Domain Randomization for Domain Generalized Segmentation GitHub arXiv
Unsupervised Video Object Segmentation with Online Adversarial Self-Tuning
Exploring Open-Vocabulary Semantic Segmentation without Human Labels arXiv
RbA: Segmenting Unknown Regions Rejected by All GitHub Page GitHub arXiv
SEMPART: Self-Supervised Multi-Resolution Partitioning of Image Semantics
Multi-Object Discovery by Low-Dimensional Object Motion GitHub Page GitHub arXiv
MemorySeg: Online LiDAR Semantic Segmentation with a Latent Memory
Treating Pseudo-Labels Generation as Image Matting for Weakly Supervised Semantic Segmentation
BoxSnake: Polygonal Instance Segmentation with Box Supervision GitHub arXiv
Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation arXiv
Instance Neural Radiance Field GitHub arXiv YouTube
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation arXiv
Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation arXiv
Boosting Semantic Segmentation from an Explicit Class Embedding's Perspective gitee arXiv
The Making and Breaking of Camouflage
CoinSeg: Contrast Inter- and Intra- Class Representations for Incremental Segmentation
Few-Shot Physically-Aware Articulated Mesh Generation via Hierarchical Deformation GitHub Page GitHub arXiv YouTube
HAL3D: Hierarchical Active Learning for Fine-Grained 3D Part Labeling arXiv
FreeCOS: Self-Supervised Learning from Fractals and Unlabeled Images for Curvilinear Object Segmentation GitHub arXiv
MasQCLIP for Open-Vocabulary Universal Image Segmentation
CTVIS: Consistent Training for Online Video Instance Segmentation GitHub arXiv
A Simple Framework for Panoptic Segmentation
Spectrum-Guided Multi-Granularity Referring Video Object Segmentation GitHub arXiv
Space Engage: Collaborative Space Supervision for Contrastive-based Semi-Supervised Semantic Segmentation GitHub arXiv
Adaptive Superpixel for Active Learning in Semantic Segmentation arXiv
Multimodal Variational Auto-Encoder based Audio-Visual Segmentation
Isomer: Isomerous Transformer for Zero-Shot Video Object Segmentation GitHub arXiv
2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision GitHub Page Pdf
Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models GitHub arXiv
SegPrompt: Boosting Open-World Segmentation via Category-Level Prompt Learning GitHub arXiv
Monte Carlo Linear Clustering with Single-Point Supervision is Enough for Infrared Small Target Detection GitHub Page GitHub arXiv
A Simple Framework for Open-Vocabulary Segmentation and Detection GitHub arXiv YouTube
Source-Free Depth for Object Pop-Out GitHub arXiv
DynaMITe: Dynamic Query Bootstrapping for Multi-Object Interactive Segmentation Transformer GitHub Page GitHub arXiv
Atmospheric Transmission and Thermal Inertia Induced Blind Road Segmentation with a Large-Scale Dataset TBRSD GitHub
Informative Data Mining for One-Shot Cross-Domain Semantic Segmentation
Homography Guided Temporal Fusion for Road Line and Marking Segmentation GitHub
Zero-Shot Semantic Segmentation with Decoupled One-Shot Network
TCOVIS: Temporally Consistent Online Video Instance Segmentation GitHub
FPR: False Positive Rectification for Weakly Supervised Semantic Segmentation GitHub Pdf
Stochastic Segmentation with Conditional Categorical Diffusion Models GitHub arXiv
SegGPT: Segmenting Everything in Context GitHub Page GitHub Hugging Face arXiv YouTube
Open-Vocabulary Panoptic Segmentation with Embedding Modulation arXiv
Residual Pattern Learning for Pixel-Wise Out-of-Distribution Detection in Semantic Segmentation GitHub arXiv
Zero-Guidance Segmentation using Zero Segment Labels GitHub Page arXiv
Model Calibration in Dense Classification with Adaptive Label Perturbation GitHub arXiv
Enhanced Soft Label for Semi-Supervised Semantic Segmentation
MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation arXiv
DiffuMask: Synthesizing Images with Pixel-Level Annotations for Semantic Segmentation using Diffusion Models GitHub Page GitHub arXiv
Alignment Before Aggregation: Trajectory Memory Retrieval Network for Video Object Segmentation
Semi-Supervised Semantic Segmentation under Label Noise via Diverse Learning Groups
SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets GitHub arXiv YouTube
Class-Incremental Continual Learning for Instance Segmentation with Image-Level Weak Supervision
Coarse-to-Fine Amodal Segmentation with Shape Prior GitHub Page GitHub arXiv
Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-Centric Representation GitHub
DVIS: Decoupled Video Instance Segmentation Framework GitHub arXiv
3D Segmentation of Humans in Point Clouds with Synthetic Data GitHub Page arXiv
WaterMask: Instance Segmentation for Underwater Imagery
Decoupled or End-to-End Trained Video Segmentation if Target Data is Scarce?

Recognition: Categorization

Title Repo Paper Video
Cross Contrasting Feature Perturbation for Domain Generalization GitHub arXiv
Flexible Visual Recognition by Evidential Modeling of Confusion and Ignorance arXiv
CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification arXiv
RankMixup: Ranking-based Mixup Training for Network Calibration WEB Page arXiv
Label-Noise Learning with Intrinsically Long-Tailed Data GitHub arXiv
Parallel Attention Interaction Network for Few-Shot Skeleton-based Action Recognition GitHub
Rethinking Mobile Block for Efficient Attention-based Models GitHub arXiv
Read-Only Prompt Optimization for Vision-Language Few-Shot Learning GitHub arXiv
Understanding Self-Attention Mechanism via Dynamical System Perspective arXiv
Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels arXiv
What do Neural Networks Learn in Image Classification? A Frequency Shortcut Perspective GitHub arXiv
Inducing Neural Collapse to a Fixed Hierarchy-Aware Frame for Reducing Mistake Severity GitHub arXiv
Unified Out-of-Distribution Detection: A Model-Specific Perspective arXiv
A Unified Framework for Robustness on Diverse Sampling Errors
Scene-Aware Label Graph Learning for Multi-Label Image Classification
Holistic Label Correction for Noisy Multi-Label Classification
Strip-MLP: Efficient Token Interaction for Vision MLP GitHub arXiv
EQ-Net: Elastic Quantization Neural Networks GitHub arXiv
Data-Free Knowledge Distillation for Fine-Grained Vision Categorization
Shift from Texture-Bias to Shape-Bias: Edge Deformation-based Augmentation for Robust Object Recognition GitHub
Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition GitHub arXiv
DR-Tune: Improving Fine-Tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration GitHub arXiv
Understanding the Feature Norm for Out-of-Distribution Detection
Multi-View Active Fine-Grained Visual Recognition GitHub arXiv
DiffGuard: Semantic Mismatch-Guided Out-of-Distribution Detection using Pre-Trained Diffusion Models GitHub arXiv
Task-Aware Adaptive Learning for Cross-Domain Few-Shot Learning
Improving Adversarial Robustness of Masked Autoencoders via Test-Time Frequency-Domain Prompting GitHub arXiv
Saliency Regularization for Self-Training with Partial Annotations
Learning Gabor Texture Features for Fine-Grained Recognition arXiv
UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding GitHub arXiv
RankMatch: Fostering Confidence and Consistency in Learning with Noisy Labels
MetaGCD: Learning to Continually Learn in Generalized Category Discovery arXiv
FerKD: Surgical Label Adaptation for Efficient Distillation
Point-Query Quadtree for Crowd Counting, Localization, and more GitHub arXiv
Nearest Neighbor Guidance for Out-of-Distribution Detection
Bayesian Optimization Meets Self-Distillation GitHub arXiv
When Prompt-based Incremental Learning does not Meet Strong Pretraining GitHub arXiv
When to Learn what: Model-Adaptive Data Augmentation Curriculum arXiv
Parametric Information Maximization for Generalized Category Discovery GitHub arXiv
Boosting Few-Shot Action Recognition with Graph-Guided Hybrid Matching GitHub arXiv
Domain Generalization via Rationale Invariance GitHub arXiv
Masked Spiking Transformer GitHub arXiv
Prototype Reminiscence and Augmented Asymmetric Knowledge Aggregation for Non-Exemplar Class-Incremental Learning
Distilled Reverse Attention Network for Open-World Compositional Zero-Shot Learning arXiv
Candidate-Aware Selective Disambiguation based on Normalized Entropy for Instance-Dependent Partial-Label Learning
CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No GitHub arXiv
Self-Similarity Driven Scale-Invariant Learning for Weakly Supervised Person Search arXiv
Sample-Wise Label Confidence Incorporation for Learning with Noisy Labels
Combating Noisy Labels with Sample Selection by Mining High-Discrepancy Examples
Spatial-Aware Token for Weakly Supervised Object Localization GitHub arXiv

Explainable AI for CV

Title Repo Paper Video
Towards Improved Input Masking for Convolutional Neural Networks GitHub arXiv
PDiscoNet: Semantically Consistent Part Discovery for Fine-Grained Recognition GitHub HAL Science
Corrupting Neuron Explanations of Deep Visual Features
ICICLE: Interpretable Class Incremental Continual Learning GitHub arXiv
ProbVLM: Probabilistic Adapter for Frozen Vison-Language Models WEB Page GitHub arXiv
Out-of-Distribution Detection for Monocular Depth Estimation GitHub arXiv
Using Explanations to Guide Models arXiv
Rosetta Neurons: Mining the Common Units in a Model Zoo GitHub Page GitHub arXiv
Prototype-based Dataset Comparison GitHub Page GitHub arXiv
Learning to Identify Critical States for Reinforcement Learning from Videos GitHub arXiv
Leaping Into Memories: Space-Time Deep Feature Synthesis GitHub Page GitHub arXiv
MAGI: Multi-Annotated Explanation-Guided Learning
SAFARI: Versatile and Efficient Evaluations for Robustness of Interpretability GitHub arXiv
Do BLIP and Stable Diffusion Understand Each Other? GitHub Page arXiv
Evaluation and Improvement of Interpretability for Self-Explainable Part-Prototype Networks GitHub arXiv
MoreauGrad: Sparse and Robust Interpretation of Neural Networks via Moreau Envelope GitHub arXiv
Towards Understanding the Generalization of Deepfake Detectors from a Game-Theoretical View
Counterfactual-based Saliency Map: Towards Visual Contrastive Explanations for Neural Networks
Beyond Single Path Integrated Gradients for Reliable Input Attribution via Randomized Path Sampling
Learning Support and Trivial Prototypes for Interpretable Image Classification arXiv
Visual Explanations via Iterated Integrated Gradients

Neural Generative Models

Title Repo Paper Video
Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models GitHub Page GitHub arXiv
Better Aligning Text-to-Image Models with Human Preference GitHub Page GitHub arXiv
DLT: Conditioned Layout Generation with Joint Discrete-Continuous Diffusion Layout Transformer GitHub Page GitHub arXiv
Anti-DreamBooth: Protecting users from Personalized Text-to-Image Synthesis GitHub Page GitHub arXiv
GECCO: Geometrically-Conditioned Point Diffusion Models GitHub Page arXiv
DiffDreamer: Towards Consistent Unsupervised Single-View Scene Extrapolation with Conditional Diffusion Models GitHub Page GitHub arXiv YouTube
Controllable Human Motion Synthesis via Guided Diffusion Models GitHub Page GitHub arXiv YouTube
COOP: Decoupling and Coupling of Whole-Body Grasping Pose Generation
Zero-Shot Spatial Layout Conditioning for Text-to-Image Diffusion Models arXiv
StyleDomain: Efficient and Lightweight Parameterizations of StyleGAN for One-Shot and Few-Shot Domain Adaptation arXiv
GRAM-HD: 3D-Consistent Image Generation at High Resolution with Generative Radiance Manifolds GitHub Page arXiv YouTube
Your Diffusion Model is Secretly a Zero-Shot Classifier GitHub Page GitHub arXiv
Learning Hierarchical Features with Joint Latent Space Energy-based Prior
ActFormer: A GAN-based Transformer towards General Action-Conditioned 3D Human Motion Generation arXiv
Landscape Learning for Neural Network Inversion arXiv
Diffusion in Style
Diffusion-SDF: Conditional Generative Modeling of Signed Distance Functions WEB Page GitHub arXiv
GETAvatar: Generative Textured Meshes for Animatable Human Avatars
A-STAR: Test-Time Attention Segregation and Retention for Text-to-Image Synthesis arXiv
TF-ICON: Diffusion-based Training-Free Cross-Domain Image Composition GitHub Page GitHub arXiv
Breaking The Limits of Text-Conditioned 3D Motion Synthesis with Elaborative Descriptions
BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction GitHub Page GitHub arXiv
Delta Denoising Score GitHub Page arXiv
Mimic3D: Thriving 3D-Aware GANs via 3D-to-2D Imitation GitHub Page GitHub arXiv
DreamBooth3D: Subject-Driven Text-to-3D Generation GitHub Page arXiv YouTube
Feature Proliferation the Cancer in StyleGAN and its Treatments
Unsupervised Facial Performance Editing via Vector-Quantized StyleGAN Representations
3D-Aware Image Generation using 2D Diffusion Models GitHub Page GitHub arXiv
Neural Collage Transfer: Artistic Reconstruction via Material Manipulation
Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption GitHub arXiv
Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction GitHub Page GitHub arXiv
Erasing Concepts from Diffusion Models WEB Page GitHub arXiv
Make Encoder Great Again in 3D GAN Inversion through Geometry and Occlusion-Aware Encoding GitHub Page GitHub arXiv YouTube
HairNeRF: Geometry-Aware Hair Swapped Image Synthesis

Vision and Language

Title Repo Paper Video
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-Training arXiv
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model GitHub arXiv
Explore and Tell: Embodied Visual Captioning in 3D Environments GitHub Page GitHub arXiv
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability GitHub arXiv
Learning Trajectory-Word Alignments for Video-Language Tasks arXiv
Variational Causal Inference Network for Explanatory Visual Question Answering
TextManiA: Enriching Visual Feature by Text-Driven Manifold Augmentation GitHub Page GitHub arXiv
UniRef: A Unified Model for Reference-based Object Segmentation Tasks
Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models arXiv
Misalign, Contrast then Distill: Rethinking Misalignments in Language-Image Pre-Training
Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge
VL-Match: Enhancing Vision-Language Pretraining with Token-Level and Instance-Level Matching
Moment Detection in Long Tutorial Videos GitHub
Not All Features Matter: Enhancing Few-Shot CLIP with Adaptive Prior Refinement GitHub arXiv
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images GitHub Page arXiv
Advancing Referring Expression Segmentation Beyond Single Image GitHub arXiv
CLIPoint: Adapting CLIP for Powerful 3D Open-World Learning
Unsupervised Prompt Tuning for Text-Driven Object Detection
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding arXiv
I can't Believe there's no Images! Learning Visual Tasks using Only Language Data WEB Page
GitHub
arXiv
Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples GitHub arXiv
MeViS: A Large-Scale Benchmark for Video Segmentation with Motion Expressions GitHub Page
GitHub
arXiv
Diverse Data Augmentation with Diffusions for Effective Test-Time Prompt Tuning GitHub arXiv
ShapeScaffolder: Structure-Aware 3D Shape Generation from Text Pdf
SuS-X: Training-Free Name-Only Transfer of Vision-Language Models GitHub Page
GitHub
arXiv
BEVBert: Multimodal Map Pre-Training for Language-Guided Navigation GitHub arXiv
X-Mesh: Towards Fast and Accurate Text-Driven 3D Stylization via Dynamic Textual Guidance GitHub Page
GitHub
arXiv
OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation GitHub arXiv
Attentive Mask CLIP arXiv
Knowledge Proxy Intervention for Deconfounded Video Question Answering
UniVTG: Towards Unified Video-Language Temporal Grounding GitHub arXiv
Self-Supervised Cross-View Representation Reconstruction for Change Captioning GitHub
Unified Coarse-to-Fine Alignment for Video-Text Retrieval GitHub arXiv
Confidence-Aware Pseudo-Label Learning for Weakly Supervised Visual Grounding
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge GitHub Page
GitHub
arXiv
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation arXiv
Transferring Visual Knowledge with Pre-Trained Models for Multimodal Machine Translation GitHub Page
GitHub
arXiv
Learning Human-Human Interactions in Images from Weak Textual Supervision GitHub Page
GitHub
arXiv
BUS: Efficient and Effective Vision-Language Pretraining with Bottom-Up Patch Summarization arXiv
3D-VisTA: Pre-Trained Transformer for 3D Vision and Text Alignment GitHub Page
GitHub
arXiv YouTube
ALIP: Adaptive Language-Image Pre-Training with Synthetic Caption GitHub arXiv
LoGoPrompt: Synthetic Text Images can be Good Visual Prompts for Vision-Language Models GitHub Page arXiv
Noise-Aware Learning from Web-Crawled Image-Text Data for Image Captioning GitHub arXiv
Decouple Before Interact: Multi-Modal Prompt Learning for Continual Visual Question Answering
Prompt-Guided Image Captioning for VQA with GPT-3 GitHub Page
GitHub
arXiv
Grounded Image Text Matching with Mismatched Relation Reasoning arXiv
GePSAn: Generative Procedure Step Anticipation in Cooking Videos
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models GitHub Page
GitHub
arXiv
VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control GitHub arXiv
With a Little Help from Your own Past: Prototypical Memory Networks for Image Captioning GitHub arXiv
Improving Zero-Shot Generalization for CLIP with Synthesized Prompts GitHub arXiv
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models GitHub arXiv
Learning Navigational Visual Representations with Semantic Map Supervision GitHub arXiv
CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection GitHub Page arXiv
Open Set Video HOI Detection from Action-Centric Chain-of-Look Prompting
Learning Concise and Descriptive Attributes for Visual Recognition arXiv
Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models GitHub arXiv
Encyclopedic VQA: Visual Questions About Detailed Properties of Fine-Grained Categories GitHub Page arXiv
Story Visualization by Online Text Augmentation with Context Memory GitHub Page
GitHub
arXiv
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning GitHub arXiv
Too Large; Data Reduction for Vision-Language Pre-Training GitHub arXiv
ViLTA: Enhancing Vision-Language Pre-Training through Textual Augmentation arXiv
Zero-Shot Composed Image Retrieval with Textual Inversion WEB Page
GitHub
arXiv

Vision, Graphics, and Robotics

Title Repo Paper Video
Adding Conditional Control to Text-to-Image Diffusion Models GitHub arXiv
Factorized Inverse Path Tracing for Efficient and Accurate Material-Lighting Estimation GitHub Page
GitHub
arXiv
Manipulate by Seeing: Creating Manipulation Controllers from Pre-Trained Representations GitHub Page
GitHub
arXiv
3D Implicit Transporter for Temporally Consistent Keypoint Discovery GitHub ResearchGate
Chordal Averaging on Flag Manifolds and its Applications GitHub arXiv
UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-Aware Curriculum and Iterative Generalist-Specialist Learning arXiv
GameFormer: Game-Theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving GitHub Page
GitHub
arXiv
PPR: Physically Plausible Reconstruction from Monocular Videos GitHub Page
GitHub
Pdf

Privacy, Security, Fairness, and Explainability

Title Repo Paper Video
Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction GitHub Page
GitHub
arXiv
ACLS: Adaptive and Conditional Label Smoothing for Network Calibration GitHub Page arXiv
PGFed: Personalize Each Client's Global Objective for Federated Learning GitHub arXiv
Overwriting Pretrained Bias with Finetuning Data GitHub arXiv
ResearchGate
ITI-GEN: Inclusive Text-to-Image Generation GitHub Page
GitHub
arXiv
FunnyBirds: A Synthetic Vision Dataset for a Part-based Analysis of Explainable AI Methods GitHub arXiv
X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events GitHub arXiv
Adaptive Testing of Computer Vision Models arXiv

Fairness, Privacy, Ethics, Social-good, Transparency, Accountability in Vision

Title Repo Paper Video
Enhancing Privacy Preservation in Federated Learning via Learning Rate Perturbation
TARGET: Federated Class-Continual Learning via Exemplar-Free Distillation GitHub arXiv
FACTS: First Amplify Correlations and then Slice to Discover Bias
Computation and Data Efficient Backdoor Attacks
Global Balanced Experts for Federated Long-Tailed Learning
Source-Free Domain Adaptive Human Pose Estimation GitHub arXiv
Gender Artifacts in Visual Datasets GitHub Page
GitHub
arXiv
FRAug: Tackling Federated Learning with Non-IID Features via Representation Augmentation arXiv
zPROBE: Zero Peek Robustness Checks for Federated Learning arXiv
Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study
FedPD: Federated Open Set Recognition with Parameter Disentanglement
MUter: Machine Unlearning for Adversarial Training Models
Beyond Skin Tone: A Multidimensional Measure of Apparent Skin Color arXiv
A Multidimensional Analysis of Social Biases in Vision Transformers GitHub arXiv
Partition-and-Debias: Agnostic Biases Mitigation via a Mixture of Biases-Specific Experts GitHub arXiv
Rethinking Data Distillation: Do not Overlook Calibration arXiv
Mining Bias-Target Alignment from Voronoi Cells GitHub arXiv
Better May not be Fairer: A Study on Subgroup Discrepancy in Image Classification GitHub
GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization GitHub arXiv
Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach using Synthetic Faces and Human Evaluation arXiv
FedPerfix: Towards Partial Model Personalization of Vision Transformers in Federated Learning GitHub arXiv
Towards Attack-Tolerant Federated Learning via Critical Parameter Analysis GitHub arXiv
What can Discriminator do? Towards Box-Free Ownership Verification of Generative Adversarial Networks GitHub arXiv
Robust Heterogeneous Federated Learning under Data Corruption GitHub
Communication-Efficient Federated Learning with Single-Step Synthetic Features Compressor for Faster Convergence GitHub arXiv
GPFL: Simultaneously Learning Global and Personalized Feature Information for Personalized Federated Learning GitHub arXiv
MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention arXiv
Identification of Systematic Errors of Image Classifiers on Rare Subgroups arXiv
Adaptive Image Anonymization in the Context of Image Classification with Neural Networks
When do Curricula Work in Federated Learning? arXiv
Domain Specified Optimization for Deployment Authorization
STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition arXiv
SAL-ViT: Towards Latency Efficient Private Inference on ViT using Selective Attention Search with a Learnable Softmax Approximation
Generative Gradient Inversion without Prior
Inspecting the Geographical Representativeness of Images from Text-to-Image Models arXiv
Divide and Conquer: A Two-Step Method for High Quality Face De-Identification with Model Explainability
Exploring the Benefits of Visual Prompting in Differential Privacy GitHub arXiv
Towards Fairness-Aware Adversarial Network Pruning
AutoReP: Automatic ReLU Replacement for Fast Private Network Inference GitHub arXiv
Flatness-Aware Minimization for Domain Generalization arXiv
Communication-Efficient Vertical Federated Learning with Limited Overlapping Samples GitHub Page arXiv

First Person (Egocentric) Vision

Title Repo Paper Video
Multimodal Distillation for Egocentric Action Recognition GitHub arXiv
Self-Supervised Object Detection from Egocentric Videos
Multi-Label Affordance Mapping from Egocentric Vision arXiv
Ego-Only: Egocentric Action Detection without Exocentric Transferring arXiv
COPILOT: Human-Environment Collision Prediction and Localization from Egocentric Videos WEB Page arXiv YouTube
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding WEB Page arXiv
EgoVLPv2: Egocentric Video-Language Pre-Training with Fusion in the Backbone GitHub Page arXiv

Representation Learning

Title Repo Paper Video
WDiscOOD: Out-of-Distribution Detection via Whitened Linear Discriminant Analysis GitHub arXiv
Pairwise Similarity Learning is SimPLE
No Fear of Classifier Biases: Neural Collapse Inspired Federated Learning with Synthetic and Fixed Classifier GitHub arXiv
Generalizable Neural Fields as Partially Observed Neural Processes arXiv
M2T: Masking Transformers Twice for Faster Decoding arXiv
Keep it SimPool: Who Said Supervised Transformers Suffer from Attention Deficit? GitHub arXiv
Improving Pixel-based MIM by Reducing Wasted Modeling Capability GitHub arXiv
Learning Image-Adaptive Codebooks for Class-Agnostic Image Restoration arXiv
Quality Diversity for Visual Pre-Training GitHub
Subclass-Balancing Contrastive Learning for Long-Tailed Recognition arXiv
Mastering Spatial Graph Prediction of Road Networks arXiv
Poincaré ResNet GitHub arXiv
Exploring Model Transferability through the Lens of Potential Energy GitHub arXiv
Improving CLIP Fine-Tuning Performance
Unsupervised Manifold Linearizing and Clustering arXiv
Generalized Sum Pooling for Metric Learning GitHub arXiv
Partition Speeds Up Learning Implicit Neural Representations based on Exponential-Increase Hypothesis
The Effectiveness of MAE Pre-Pretraining for Billion-Scale Pretraining arXiv
Token-Label Alignment for Vision Transformers GitHub arXiv
Efficiently Robustify Pre-Trained Models arXiv
OFVL-MS: Once for Visual Localization Across Multiple Indoor Scenes GitHub Page
GitHub
arXiv
Feature Prediction Diffusion Model for Video Anomaly Detection
Joint Implicit Neural Representation for High-Fidelity and Compact Vector Fonts
How Far Pre-Trained Models are from Neural Collapse on the Target Dataset Informs their Transferability
OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions GitHub arXiv
Perceptual Grouping in Contrastive Vision-Language Models GitHub arXiv
Fully Attentional Networks with Self-Emerging Token Labeling
Instance and Category Supervision are Alternate Learners for Continual Learning
SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-Training GitHub arXiv
Motion-Guided Masking for Spatiotemporal Representation Learning arXiv
Amazon Science
Data Augmented Flatness-Aware Gradient Projection for Continual Learning
Take-a-Photo: 3D-to-2D Generative Pre-Training of Point Cloud Models WEB Page
GitHub
arXiv
BiViT: Extremely Compressed Binary Vision Transformers GitHub arXiv
Spatio-Temporal Crop Aggregation for Video Representation Learning GitHub arXiv
Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning GitHub arXiv
Semantic Information in Contrastive Learning
Cross-Domain Product Representation Learning for Rich-Content E-Commerce GitHub arXiv
Contrastive Continuity on Augmentation Stability Rehearsal for Continual Self-Supervised Learning
HybridAugment++: Unified Frequency Spectra Perturbations for Model Robustness GitHub arXiv
Unleashing Text-to-Image Diffusion Models for Visual Perception WEB Page
GitHub
arXiv

Deep Learning Architectures

Title Repo Paper Video
Efficient Controllable Multi-Task Architectures arXiv
ParCNetV2: Oversized Kernel with Enhanced Attention GitHub arXiv
Unleashing the Power of Gradient Signal-to-Noise Ratio for Zero-Shot NAS
MMST-ViT: Climate Change-Aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer GitHub Pdf
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization GitHub arXiv
IIEU: Rethinking Neural Feature Activation from Decision-Making
Scratching Visual Transformer's Back with Uniform Attention arXiv
SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference arXiv
ElasticViT: Conflict-Aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices arXiv
Gramian Attention Heads are Strong yet Efficient Vision Learners
EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones GitHub arXiv
Ord2Seq: Regarding Ordinal Regression as Label Sequence Prediction GitHub arXiv
Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning arXiv
LaPE: Layer-Adaptive Position Embedding for Vision Transformers with Independent Layer Normalization GitHub arXiv
Exemplar-Free Continual Transformer with Convolutions GitHub Page
GitHub
arXiv
Building Vision Transformers with Hierarchy Aware Feature Aggregation
ShiftNAS: Improving One-Shot NAS via Probability Shift GitHub arXiv
DarSwin: Distortion Aware Radial Swin Transformer GitHub Page arXiv
ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and Gradient Accumulation arXiv
FDViT: Improve the Hierarchical Architecture of Vision Transformer
FLatten Transformer: Vision Transformer using Focused Linear Attention GitHub arXiv
MixPath: A Unified Approach for One-Shot Neural Architecture Search arXiv
SSF: Accelerating Training of Spiking Neural Networks with Stabilized Spiking Flow
Dynamic Perceiver for Efficient Visual Recognition GitHub arXiv
SG-Former: Self-Guided Transformer with Evolving Token Reallocation GitHub arXiv
Scale-Aware Modulation Meet Transformer GitHub arXiv
Learning to Upsample by Learning to Sample GitHub arXiv
GET: Group Event Transformer for Event-based Vision
Adaptive Frequency Filters as Efficient Global Token Mixers GitHub Page arXiv
Fcaformer: Forward Cross Attention in Hybrid Vision Transformer GitHub arXiv
Dynamic Snake Convolution based on Topological Geometric Constraints for Tubular Structure Segmentation GitHub Page
GitHub
arXiv
Sentence Attention Blocks for Answer Grounding
MST-Compression: Compressing and Accelerating Binary Neural Networks with Minimum Spanning Tree arXiv
EGformer: Equirectangular Geometry-biased Transformer for 360 Depth Estimation arXiv
SPANet: Frequency-Balancing Token Mixer using Spectral Pooling Aggregation Modulation GitHub Page arXiv YouTube
ModelGiF: Gradient Fields for Model Functional Distance GitHub
ClusT3: Information Invariant Test-Time Training
Cumulative Spatial Knowledge Distillation for Vision Transformers arXiv
Luminance-Aware Color Transform for Multiple Exposure Correction
Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks GitHub arXiv
Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters
DOT: A Distillation-Oriented Trainer arXiv
Extensible and Efficient Proxy for Neural Architecture Search
Learning to Transform for Generalizable Instance-Wise Invariance GitHub
Convolutional Networks with Oriented 1D Kernels

Recognition: Detection

Title Repo Paper Video
Random Boxes are Open-World Object Detectors GitHub arXiv
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection GitHub arXiv
CoIn: Contrastive Instance Feature Mining for Outdoor 3D Object Detection with Very Limited Annotations GitHub
A Dynamic Dual-Processing Object Detection Framework Inspired by the Brain's Recognition Mechanism
Anchor-Intermediate Detector: Decoupling and Coupling Bounding Boxes for Accurate Object Detection
Inter-Realization Channels: Unsupervised Anomaly Detection Beyond One-Class Classification
Deep Equilibrium Object Detection GitHub arXiv
RecursiveDet: End-to-End Region-based Recursive Object Detection GitHub arXiv
Small Object Detection via Coarse-to-Fine Proposal Generation and Imitation Learning GitHub arXiv
ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation GitHub arXiv
COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts GitHub Page
GitHub
arXiv
Generative Prompt Model for Weakly Supervised Object Localization GitHub arXiv
UniKD: Universal Knowledge Distillation for Mimicking Homogeneous or Heterogeneous Object Detectors
PNI: Industrial Anomaly Detection using Position and Neighborhood Information GitHub arXiv
Masked Autoencoders are Stronger Knowledge Distillers
GPA-3D: Geometry-Aware Prototype Alignment for Unsupervised Domain Adaptive 3D Object Detection from Point Clouds GitHub arXiv
ADNet: Lane Shape Prediction via Anchor Decomposition GitHub arXiv
Periodically Exchange Teacher-Student for Source-Free Object Detection
Towards Fair and Comprehensive Comparisons for Image-based 3D Object Detection
Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver arXiv
Template-Guided Hierarchical Feature Restoration for Anomaly Detection
ALWOD: Active Learning for Weakly-Supervised Object Detection arXiv
ProtoFL: Unsupervised Federated Learning via Prototypical Distillation arXiv
Efficient Adaptive Human-Object Interaction Detection with Concept-Guided Memory GitHub Page
GitHub
arXiv
Detection Transformer with Stable Matching GitHub arXiv
Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection
Anomaly Detection Under Distribution Shift GitHub arXiv
Detecting Objects with Context-Likelihood Graphs and Graph Refinement arXiv
Unsupervised Object Localization with Representer Point Selection GitHub arXiv
DETR does not Need Multi-Scale or Locality Design GitHub arXiv
Deep Directly-Trained Spiking Neural Networks for Object Detection GitHub arXiv
GACE: Geometry Aware Confidence Enhancement for Black-Box 3D Object Detectors on LiDAR-Data
StageInteractor: Query-based Object Detector with Cross-Stage Interaction arXiv
Adaptive Rotated Convolution for Rotated Object Detection arXiv
Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection
Exploring Transformers for Open-World Instance Segmentation arXiv
DDG-Net: Discriminability-Driven Graph Network for Weakly-Supervised Temporal Action Localization GitHub arXiv
Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment GitHub arXiv
Category-Aware Allocation Transformer for Weakly Supervised Object Localization GitHub
The Devil is in the Crack Orientation: A New Perspective for Crack Detection
Clusterformer: Cluster-based Transformer for 3D Object Detection in Point Clouds
Less is More: Focus Attention for Efficient DETR GitHub Page
Gitee Page
arXiv
DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting GitHub arXiv
Multi-Label Self-Supervised Learning with Scene Images arXiv
Cascade-DETR: Delving into High-Quality Universal Object Detection GitHub arXiv
Representation Disparity-Aware Distillation for 3D Object Detection arXiv
FeatEnHancer: Enhancing Hierarchical Features for Object Detection and Beyond Under Low-Light Vision arXiv
DetZero: Rethinking Offboard 3D Object Detection with Long-Term Sequential Point Clouds GitHub arXiv
DETRs with Collaborative Hybrid Assignments Training GitHub arXiv
Open-Vocabulary Object Detection with an Open Corpus
SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-Positive Mining WEB Page
GitHub
arXiv
Unsupervised Surface Anomaly Detection with Diffusion Probabilistic Model
UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation GitHub arXiv
Focus the Discrepancy: Intra- and Inter-Correlation Learning for Image Anomaly Detection GitHub arXiv
MonoNeRD: NeRF-Like Representations for Monocular 3D Object Detection GitHub arXiv
Integrally Migrating Pre-Trained Transformer Encoder-Decoders for Visual Object Detection GitHub arXiv
Generating Dynamic Kernels via Transformers for Lane Detection
Meta-ZSDETR: Zero-Shot DETR with Meta-Learning arXiv
Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes GitHub Page
GitHub
arXiv
AlignDet: Aligning Pre-Training and Fine-Tuning in Object Detection GitHub Page
GitHub
arXiv
MULLER: Multilayer Laplacian Resizer for Vision arXiv
Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation for Anomaly Detection arXiv
DETRDistill: A Universal Knowledge Distillation Framework for DETR-Families arXiv
Delving into Motion-Aware Matching for Monocular 3D Object Tracking GitHub arXiv
FB-BEV: BEV Representation from Forward-Backward View Transformations GitHub arXiv
Learning from Noisy Data for Semi-Supervised 3D Object Detection
Boosting Long-Tailed Object Detection via Step-Wise Learning on Smooth-Tail Data arXiv
Objects do not Disappear: Video Object Detection by Single-Frame Object Location Anticipation GitHub arXiv
Unified Visual Relationship Detection with Vision and Language Models GitHub Page arXiv
Universal Domain Adaptation via Compressive Attention Matching arXiv
Unsupervised Domain Adaptive Detection with Network Stability Analysis GitHub arXiv
ImGeoNet: Image-Induced Geometry-Aware Voxel Representation for Multi-View 3D Object Detection GitHub Page arXiv
Cyclic-Bootstrap Labeling for Weakly Supervised Object Detection GitHub arXiv

Image and Video Synthesis

Title Repo Paper Video
Text-Driven Generative Domain Adaptation with Spectral Consistency Regularization
MosaiQ: Quantum Generative Adversarial Networks for Image Generation on NISQ Computers arXiv
Controllable Visual-Tactile Synthesis GitHub Page
GitHub
arXiv YouTube
Editing Implicit Assumptions in Text-to-Image Diffusion Models GitHub Page
GitHub
arXiv
DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars GitHub arXiv
Smoothness Similarity Regularization for Few-Shot GAN Adaptation arXiv
HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models arXiv
Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models GitHub Page arXiv
AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration arXiv
GaFET: Learning Geometry-Aware Facial Expression Translation from in-the-Wild Images arXiv
Collecting the Puzzle Pieces: Disentangled Self-Driven Human Pose Transfer by Permuting Textures GitHub arXiv
Multi-Directional Subspace Editing in Style-Space GitHub Page
GitHub
arXiv
HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces GitHub Page
GitHub
arXiv
Generating Realistic Images from in-the-Wild Sounds arXiv
CC3D: Layout-Conditioned Generation of Compositional 3D Scenes GitHub Page
GitHub
arXiv
UMFuse: Unified Multi View Fusion for Human Editing Applications arXiv
Evaluating Data Attribution for Text-to-Image Models GitHub Page
GitHub
arXiv YouTube
Neural Characteristic Function Learning for Conditional Image Generation
WaveIPT: Joint Attention and Flow Alignment in the Wavelet Domain for Pose Transfer
LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models GitHub Page
GitHub
arXiv
Human-Inspired Facial Sketch Synthesis with Dynamic Adaptation GitHub arXiv
Conceptual and Hierarchical Latent Space Decomposition for Face Editing
Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations arXiv
BallGAN: 3D-Aware Image Synthesis with a Spherical Background GitHub Page
GitHub
arXiv YouTube
End-to-End Diffusion Latent Optimization Improves Classifier Guidance GitHub arXiv
Deep Geometrized Cartoon Line Inbetweening GitHub
UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation GitHub Page YouTube
Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond arXiv
SVDiff: Compact Parameter Space for Diffusion Fine-Tuning GitHub arXiv
MI-GAN: A Simple Baseline for Image Inpainting on Mobile Devices
Structure and Content-Guided Video Synthesis with Diffusion Models WEB Page arXiv YouTube
Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation GitHub Page
GitHub
Hugging Face
arXiv
Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers
A Latent Space of Stochastic Diffusion Models for Zero-Shot Image Editing and Guidance
Generative Multiplane Neural Radiance for 3D-Aware Image Generation GitHub arXiv
Parallax-Tolerant Unsupervised Deep Image Stitching GitHub arXiv
GAIT: Generating Aesthetic Indoor Tours with Deep Reinforcement Learning GitHub
EverLight: Indoor-Outdoor Editable HDR Lighting Estimation GitHub Page arXiv
Prompt Tuning Inversion for Text-Driven Image Editing using Diffusion Models arXiv
Efficient Diffusion Training via Min-SNR Weighting Strategy GitHub arXiv
BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion GitHub arXiv
Improving Sample Quality of Diffusion Models using Self-Attention Guidance GitHub Page
GitHub
arXiv
Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation GitHub arXiv
Deep Image Harmonization with Learnable Augmentation GitHub arXiv
Out-of-Domain GAN Inversion via Invertibility Decomposition for Photo-Realistic Human Face Manipulation GitHub arXiv
Bidirectionally Deformable Motion Modulation for Video-based Human Pose Transfer GitHub arXiv
Size does Matter: Size-Aware Virtual Try-On via Clothing-Oriented Transformation Try-On Network
VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs GitHub Page
GitHub
arXiv YouTube
Learning Global-Aware Kernel for Image Harmonization arXiv
Expressive Text-to-Image Generation with Rich Text GitHub Page
GitHub
Hugging Face
arXiv YouTube
A Large-Scale Outdoor Multi-Modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction WEB Page
GitHub
arXiv Loom
Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis GitHub Page
GitHub
arXiv YouTube
Perceptual Artifacts Localization for Image Synthesis Tasks GitHub Page
GitHub
Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis GitHub Page
GitHub
arXiv
StylerDALLE: Language-Guided Style Transfer using a Vector-Quantized Tokenizer of a Large-Scale Generative Model GitHub arXiv
Shortcut-V2V: Compression Framework for Video-to-Video Translation based on Temporal Redundancy Reduction GitHub Page arXiv
Tune-a-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation GitHub Page
GitHub
Hugging Face
arXiv
BlendFace: Re-Designing Identity Encoders for Face-Swapping GitHub Page
GitHub
arXiv
Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors GitHub Page arXiv YouTube
LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis GitHub Page
GitHub
arXiv
Open-Vocabulary Object Segmentation with Diffusion Models GitHub Page
GitHub
arXiv
StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models arXiv
ToonTalker: Cross-Domain Face Reenactment GitHub Page
GitHub
arXiv
Dense Text-to-Image Generation with Attention Modulation GitHub arXiv
Householder Projector for Unsupervised Latent Semantics Discovery GitHub arXiv
Deep Image Harmonization with Globally Guided Feature Transformation and Relation Distillation GitHub arXiv
One-Shot Generative Domain Adaptation GitHub Page
GitHub
arXiv
Hashing Neural Video Decomposition with Multiplicative Residuals in Space-Time GitHub Page
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model GitHub
Hugging Face
arXiv
Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis GitHub arXiv
FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model GitHub arXiv

Vision and Audio

Title Repo Paper Video
Sound Source Localization is All About Cross-Modal Alignment arXiv
Class-Incremental Grouping Network for Continual Audio-Visual Learning GitHub arXiv
Audio-Visual Class-Incremental Learning GitHub arXiv
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-Guided Speaker Embedding arXiv
The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion GitHub Page arXiv
SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning Amazon Science
On the Audio-Visual Synchronization for Lip-to-Speech Synthesis arXiv
Be Everywhere - Hear Everything (BEE): Audio Scene Reconstruction by Sparse Audio-Visual Samples
Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation arXiv
Hyperbolic Audio-Visual Zero-Shot Learning arXiv
AdVerb: Visually Guided Audio Dereverberation arXiv
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation GitHub Page
GitHub
arXiv

Recognition, Segmentation, and Shape Analysis

Title Repo Paper Video
Segment Anything WEB Page
GitHub
arXiv
Shape Analysis of Euclidean Curves under Frenet-Serret Framework
Unmasking Anomalies in Road-Scene Segmentation GitHub
Open In Colab
arXiv
High Quality Entity Segmentation GitHub Page
GitHub
arXiv
Towards Open-Vocabulary Video Instance Segmentation GitHub arXiv
Beyond One-to-One: Rethinking the Referring Image Segmentation GitHub arXiv
Multiple Instance Learning Framework with Masked Hard Instance Mining for Whole Slide Image Classification GitHub arXiv
Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning GitHub arXiv
Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval GitHub
Towards Deeply Unified Depth-Aware Panoptic Segmentation with Bi-Directional Guidance Learning GitHub arXiv
LogicSeg: Parsing Visual Semantics with Neural Logic Learning and Reasoning
ASIC: Aligning Sparse in-the-Wild Image Collections GitHub Page arXiv YouTube

Generative AI

Title Repo Paper Video
CLIPascene: Scene Sketching with Different Types and Levels of Abstraction GitHub Page
GitHub
arXiv
LD-ZNet: A Latent Diffusion Approach for Text-based Image Segmentation GitHub Page arXiv
TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models
NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions GitHub Page Pdf
Scalable Diffusion Models with Transformers WEB Page
GitHub
arXiv
Texture Generation on 3D Meshes with Point-UV Diffusion GitHub Page
GitHub
arXiv
Generative Novel View Synthesis with 3D-Aware Diffusion Models GitHub Page
GitHub
arXiv
DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning GitHub arXiv
VQ3D: Learning a 3D-Aware Generative Model on ImageNet GitHub Page arXiv
Ref-NeuS: Ambiguity-Reduced Neural Implicit Surface Learning for Multi-View Reconstruction with Reflection GitHub Page
GitHub
arXiv
A Complete Recipe for Diffusion Generative Models GitHub arXiv
MMVP: Motion-Matrix-based Video Prediction GitHub arXiv
Simulating Fluids in Real-World Still Images GitHub arXiv
FateZero: Fusing Attentions for Zero-Shot Text-based Video Editing GitHub arXiv

Humans, 3D Modeling, and Driving

Title Repo Paper Video
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models GitHub Page GitHub arXiv YouTube
LivePose: Online 3D Reconstruction from Monocular Video with Dynamic Camera Poses GitHub arXiv
NDDepth: Normal-Distance Assisted Monocular Depth Estimation arXiv
LATR: 3D Lane Detection from Monocular Images with Transformer GitHub arXiv
DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving GitHub arXiv
Dynamic Point Fields GitHub Page arXiv YouTube
Generalizing Neural Human Fitting to Unseen Poses with Articulated SE(3) Equivariance WEB Page arXiv
Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views GitHub Page GitHub arXiv YouTube
DECO: Dense Estimation of 3D Human-Scene Contact in the Wild WEB Page
Decoupled Iterative Refinement Framework for Interacting Hands Reconstruction from a Single RGB Image GitHub arXiv
Chasing Clouds: Differentiable Volumetric Rasterisation of Point Clouds as a Highly Efficient and Accurate Loss for Large-Scale Deformable 3D Registration
Rehearsal-Free Domain Continual Face Anti-Spoofing: Generalize More and Forget Less arXiv

Low-Level Vision and Theory

Title Repo Paper Video
A 5-Point Minimal Solver for Event Camera Relative Motion Estimation
General Planar Motion from a Pair of 3D Correspondences
Beyond the Pixel: A Photometrically Calibrated HDR Dataset for Luminance and Color Prediction GitHub Page GitHub arXiv
DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion GitHub arXiv
Iterative Prompt Learning for Unsupervised Backlit Image Enhancement GitHub Page GitHub arXiv YouTube
Similarity Min-Max: Zero-Shot Day-Night Domain Adaptation GitHub Page GitHub arXiv YouTube
Multi-Interactive Feature Learning and a Full-Time Multi-Modality Benchmark for Image Fusion and Segmentation GitHub arXiv
Computational 3D Imaging with Position Sensors YouTube
Passive Ultra-Wideband Single-Photon Imaging
Viewing Graph Solvability in Practice GitHub
Minimal Solutions to Generalized Three-View Relative Pose Problem
SoDaCam: Software-Defined Cameras via Single-Photon Imaging WEB Page arXiv

Navigation and Autonomous Driving

Title Repo Paper Video
Robust Monocular Depth Estimation under Challenging Conditions GitHub Page GitHub arXiv
UMC: A Unified Bandwidth-Efficient and Multi-Resolution based Collaborative Perception Framework GitHub Page GitHub arXiv
View Consistent Purification for Accurate Cross-View Localization GitHub Page GitHub arXiv
Semi-Supervised Semantics-Guided Adversarial Training for Robust Trajectory Prediction GitHub arXiv
NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping GitHub arXiv
MapPrior: Bird's-Eye View Map Layout Estimation with Generative Models GitHub Page GitHub arXiv
Hidden Biases of End-to-End Driving Models GitHub arXiv YouTube
Search for or Navigate to? Dual Adaptive Thinking for Object Navigation arXiv
BiFF: Bi-Level Future Fusion with Polyline-based Coordinate for Interactive Trajectory Prediction arXiv
Towards Zero Domain Gap: A Comprehensive Study of Realistic LiDAR Simulation for Autonomy Testing
Clustering based Point Cloud Representation Learning for 3D Analysis GitHub arXiv
ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation GitHub Page GitHub arXiv
MV-DeepSDF: Implicit Modeling with Multi-Sweep Point Clouds for 3D Vehicle Reconstruction in Autonomous Driving
Learning Vision-and-Language Navigation from YouTube Videos GitHub arXiv
TrajPAC: Towards Robustness Verification of Pedestrian Trajectory Prediction Models arXiv
VAD: Vectorized Scene Representation for Efficient Autonomous Driving GitHub arXiv
Traj-MAE: Masked Autoencoders for Trajectory Prediction arXiv
Sparse Point Guided 3D Lane Detection GitHub
A Simple Vision Transformer for Weakly Semi-Supervised 3D Object Detection
Learn TAROT with MENTOR: A Meta-Learned Self-Supervised Approach for Trajectory Prediction
FocalFormer3D: Focusing on Hard Instance for 3D Object Detection GitHub arXiv
Scene as Occupancy GitHub arXiv
Real-Time Neural Rasterization for Large Scenes
A Game of Bundle Adjustment - Learning Efficient Convergence arXiv
Efficient Transformer-based 3D Object Detection with Dynamic Token Halting arXiv
RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration GitHub arXiv
CASSPR: Cross Attention Single Scan Place Recognition arXiv
Recursive Video Lane Detection GitHub arXiv YouTube
Parametric Depth based Feature Representation Learning for Object Detection and Segmentation in Bird's-Eye View arXiv
SHIFT3D: Synthesizing Hard Inputs for Tricking 3D Detectors arXiv
Bootstrap Motion Forecasting With Self-Consistent Constraints arXiv
Towards Viewpoint Robustness in Bird's Eye View Segmentation GitHub Page arXiv
R-Pred: Two-Stage Motion Prediction via Tube-Query Attention-based Trajectory Refinement arXiv
INT2: Interactive Trajectory Prediction at Intersections WEB Page GitHub YouTube
MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception GitHub arXiv
Unsupervised Self-Driving Attention Prediction via Uncertainty Mining and Knowledge Embedding GitHub arXiv
SVQNet: Sparse Voxel-Adjacent Query Network for 4D Spatio-Temporal LiDAR Semantic Segmentation arXiv
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Improving Online Lane Graph Extraction by Object-Lane Clustering arXiv
Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving
Self-Supervised Monocular Depth Estimation by Direction-Aware Cumulative Convolution Network GitHub arXiv
Ordered Atomic Activity for Fine-Grained Interactive Traffic Scenario Understanding Pdf
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation
Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving WEB Page arXiv
MV-Map: Offboard HD-Map Generation with Multi-View Consistency GitHub arXiv
Towards Universal LiDAR-based 3D Object Detection by Multi-Domain Knowledge Transfer
Forecast-MAE: Self-Supervised Pre-Training for Motion Forecasting with Masked Autoencoders GitHub arXiv
UniFusion: Unified Multi-View Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View GitHub arXiv
BEVPlace: Learning LiDAR-based Place Recognition using Bird's Eye View Images GitHub arXiv
CORE: Cooperative Reconstruction for Multi-Agent Perception GitHub arXiv
MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation GitHub Page GitHub arXiv YouTube

3D from a Single Image and Shape-from-X

Title Repo Paper Video
Aggregating Feature Point Cloud for Depth Completion
Coordinate Transformer: Achieving Single-Stage Multi-Person Mesh Recovery from Videos GitHub arXiv
MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation arXiv
SlaBins: Fisheye Depth Estimation using Slanted Bins on Road Environments
Creative Birds: Self-Supervised Single-View 3D Style Transfer GitHub arXiv
Dynamic PlenOctree for Adaptive Sampling Refinement in Explicit NeRF GitHub Page GitHub arXiv YouTube
CORE: Co-Planarity Regularized Monocular Geometry Estimation with Weak Supervision
Relightify: Relightable 3D Faces from a Single Image via Diffusion Models GitHub Page arXiv YouTube
GLA-GCN: Global-Local Adaptive Graph Convolutional Network for 3D Human Pose Estimation from Monocular Video GitHub arXiv
Calibrating Panoramic Depth Estimation for Practical Localization and Mapping arXiv YouTube
SimNP: Learning Self-Similarity Priors between Neural Points arXiv
AGG-Net: Attention Guided Gated-Convolutional Network for Depth Image Completion GitHub arXiv
Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data GitHub Page GitHub arXiv
CVSformer: Cross-View Synthesis Transformer for Semantic Scene Completion arXiv
U-RED: Unsupervised 3D Shape Retrieval and Deformation for Partial Point Clouds GitHub arXiv
Single Depth-Image 3D Reflection Symmetry and Shape Prediction
Self-Supervised Monocular Depth Estimation: Let's Talk About the Weather GitHub Page GitHub arXiv YouTube
Mesh2Tex: Generating Mesh Textures from Image Queries GitHub Page arXiv YouTube
Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation arXiv
Learning a Room with the Occ-SDF Hybrid: Signed Distance Function Mingled with Occupancy Aids Scene Representation GitHub Page GitHub arXiv
Robust Geometry-Preserving Depth Estimation using Differentiable Rendering arXiv
FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models WEB Page arXiv
One-Shot Implicit Animatable Avatars with Model-based Priors GitHub Page GitHub arXiv
VeRi3D: Generative Vertex-based Radiance Fields for 3D Controllable Human Image Synthesis GitHub Page GitHub arXiv
Diffuse3D: Wide-Angle 3D Photography via Bilateral Diffusion GitHub Pdf YouTube
AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration arXiv

Motion Estimation, Matching and Tracking

Will soon be added

Action and Event Understanding

Will soon be added

Computational Imaging

Will soon be added

Embodied Vision: Active Agents; Simulation

Will soon be added

Recognition: Retrieval

Will soon be added

Transfer, Low-Shot, Continual, Long-Tail Learning

Will soon be added

Low-Level and Physics-based Vision

Title Repo Paper Video
High-Resolution Document Shadow Removal via a Large-Scale Real-World Dataset and a Frequency-Aware Shadow Erasing Net GitHub Page GitHub arXiv
Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution GitHub arXiv

Computer Vision Theory

Title Repo Paper Video
FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs GitHub arXiv

Video Analysis and Understanding

Will soon be added

Object Pose Estimation and Tracking

Will soon be added

3D Shape Modeling and Processing

Will soon be added

Human Pose/Shape Estimation

Will soon be added

Transfer, Low-Shot, and Continual Learning

Will soon be added

Self-, Semi-, and Unsupervised Learning

Will soon be added

Self-, Semi-, Meta-, Unsupervised Learning

Will soon be added

Photogrammetry and Remote Sensing

Will soon be added

Efficient and Scalable Vision

Title Repo Paper Video
AdaNIC: Towards Practical Neural Image Compression via Dynamic Transform Routing
Rethinking Vision Transformers for MobileNet Size and Speed GitHub arXiv
DELFlow: Dense Efficient Learning of Scene Flow for Large-Scale Point Clouds GitHub arXiv
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers arXiv
Inherent Redundancy in Spiking Neural Networks arXiv
Achievement-based Training Progress Balancing for Multi-Task Learning
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation arXiv
Differentiable Transportation Pruning arXiv
XiNet: Efficient Neural Networks for tinyML
Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers GitHub arXiv
A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance arXiv
Workie-Talkie: Accelerating Federated Learning by Overlapping Computing and Communications via Contrastive Regularization
DenseShift: Towards Accurate and Transferable Low-Bit Shift Network arXiv
PRANC: Pseudo RAndom Networks for Compacting deep models GitHub arXiv
Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement GitHub arXiv
A Fast Unified System for 3D Object Detection and Tracking
Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training arXiv
I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference GitHub arXiv
EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization arXiv
Local or Global: Selective Knowledge Assimilation for Federated Learning with Limited Labels arXiv
DataDAM: Efficient Dataset Distillation with Attention Matching
SAFE: Machine Unlearning With Shard Graphs arXiv
ResQ: Residual Quantization for Video Perception arXiv
Efficient Computation Sharing for Multi-Task Visual Scene Understanding GitHub arXiv
Essential Matrix Estimation using Convex Relaxations in Orthogonal Space
TripLe: Revisiting Pretrained Model Reuse and Progressive Learning for Efficient Vision Transformer Scaling and Searching
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers GitHub arXiv
Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection GitHub arXiv
From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels GitHub arXiv
Efficient 3D Semantic Segmentation with Superpoint Transformer GitHub arXiv
Dataset Quantization arXiv
Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy GitHub arXiv
RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers GitHub arXiv
Semantically Structured Image Compression via Irregular Group-Based Decoupling GitHub arXiv
SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage GitHub arXiv
SMMix: Self-Motivated Image Mixing for Vision Transformers GitHub arXiv
Multi-Label Knowledge Distillation GitHub arXiv
UGC: Unified GAN Compression for Efficient Image-to-Image Translation
MotionDeltaCNN: Sparse CNN Inference of Frame Differences in Moving Camera Videos with Spherical Buffers and Padded Convolutions GitHub arXiv
EfficientViT: Lightweight Multi-Scale Attention for High-Resolution Dense Prediction GitHub arXiv
DREAM: Efficient Dataset Distillation by Representative Matching GitHub arXiv
INSTA-BNN: Binary Neural Network with INSTAnce-aware Threshold arXiv
Deep Incubation: Training Large Models by Divide-and-Conquering GitHub arXiv
AdaMV-MoE: Adaptive Multi-Task Vision Mixture-of-Experts
Overcoming Forgetting Catastrophe in Quantization-Aware Training
Window-Based Early-Exit Cascades for Uncertainty Estimation: When Deep Ensembles are More Efficient than Single Models GitHub arXiv
ORC: Network Group-based Knowledge Distillation using Online Role Change GitHub arXiv
RMP-Loss: Regularizing Membrane Potential Distribution for Spiking Neural Networks arXiv
Structural Alignment for Network Pruning through Partial Regularization
Automated Knowledge Distillation via Monte Carlo Tree Search
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications GitHub arXiv
Causal-DFQ: Causality Guided Data-Free Network Quantization
Efficient Joint Optimization of Layer-Adaptive Weight Pruning in Deep Neural Networks
Automatic Network Pruning via Hilbert-Schmidt Independence Criterion Lasso under Information Bottleneck Principle
Distribution Shift Matters for Knowledge Distillation with Webly Collected Images arXiv
FastRecon: Few-shot Industrial Anomaly Detection via Fast Feature Reconstruction
E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning GitHub arXiv
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation GitHub arXiv
SHACIRA: Scalable HAsh-grid Compression for Implicit Neural Representations
Efficient Deep Space Filling Curve
Q-Diffusion: Quantizing Diffusion Models GitHub arXiv
Lossy and Lossless (L2) Post-training Model Size Compression GitHub arXiv
Robustifying Token Attention for Vision Transformers GitHub arXiv

Machine Learning (other than Deep Learning)

Will soon be added

Document Analysis and Understanding

Will soon be added

Biometrics

Will soon be added

Datasets and Evaluation

Will soon be added

Faces and Gestures

Title Repo Paper Video
DeePoint: Visual Pointing Recognition and Direction Estimation GitHub arXiv
Contactless Pulse Estimation Leveraging Pseudo Labels and Self-Supervision
Most Important Person-Guided Dual-Branch Cross-Patch Attention for Group Affect Recognition
ContactGen: Generative Contact Modeling for Grasp Generation Pdf
Imitator: Personalized Speech-Driven 3D Facial Animation GitHub Page arXiv YouTube
DVGaze: Dual-View Gaze Estimation GitHub arXiv
TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective GitHub arXiv
Towards Unsupervised Domain Generalization for Face Anti-Spoofing
Reinforced Disentanglement for Face Swapping without Skip Connection GitHub arXiv
CoSign: Exploring Co-Occurrence Signals in Skeleton-based Continuous Sign Language Recognition
EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation GitHub Page GitHub arXiv YouTube
LA-Net: Landmark-Aware Learning for Reliable Facial Expression Recognition under Label Noise arXiv
ASM: Adaptive Skinning Model for High-Quality 3D Face Modeling GitHub arXiv
Troubleshooting Ethnic Quality Bias with Curriculum Domain Adaptation for Face Image Quality Assessment
UniFace: Unified Cross-Entropy Loss for Deep Face Recognition
Human Part-Wise 3D Motion Context Learning for Sign Language Recognition arXiv
Weakly-Supervised Text-Driven Contrastive Learning for Facial Behavior Understanding arXiv
HaMuCo: Hand Pose Estimation via Multiview Collaborative Self-Supervised Learning GitHub Page GitHub arXiv
ReactioNet: Learning High-Order Facial Behavior from Universal Stimulus-Reaction by Dyadic Relation Reasoning
CLIP-Cluster: CLIP-Guided Attribute Hallucination for Face Clustering
Learning Human Dynamics in Autonomous Driving Scenarios
LivelySpeaker: Towards Semantic-Aware Co-Speech Gesture Generation GitHub arXiv
Controllable Guide-Space for Generalizable Face Forgery Detection arXiv
Unpaired Multi-Domain Attribute Translation of 3D Facial Shapes with a Square and Symmetric Geometric Map GitHub arXiv
Emotional Listener Portrait: Neural Listener Head Generation with Emotion
Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis
Invariant Feature Regularization for Fair Face Recognition
Gloss-Free Sign Language Translation: Improving from Visual-Language Pretraining GitHub arXiv
Contrastive Pseudo Learning for Open-World DeepFake Attribution GitHub arXiv
Continual Learning for Personalized Co-Speech Gesture Generation
HandR2N2: Iterative 3D Hand Pose Estimation using a Residual Recurrent Neural Network GitHub
SPACE: Speech-Driven Portrait Animation with Controllable Expression arXiv
How to Boost Face Recognition with StyleGAN? GitHub Page GitHub arXiv YouTube
ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour WEB Page Zenodo arXiv
Robust One-Shot Face Video Re-Enactment using Hybrid Latent Spaces of StyleGAN2 GitHub Page arXiv
Data-Free Class-Incremental Hand Gesture Recognition Pdf GitHub
Learning Robust Representations with Information Bottleneck and Memory Network for RGB-D-based Gesture Recognition GitHub
Knowledge-Spreader: Learning Semi-Supervised Facial Action Dynamics by Consistifying Knowledge Granularity
Face Clustering via Graph Convolutional Networks with Confidence Edges
StyleGANEX: StyleGAN-based Manipulation Beyond Cropped Aligned Faces WEB Page GitHub arXiv YouTube
SeeABLE: Soft Discrepancies and Bounded Contrastive Learning for Exposing Deepfakes GitHub arXiv
Adaptive Nonlinear Latent Transformation for Conditional Face Editing GitHub arXiv
Semi-Supervised Speech-Driven 3D Facial Animation via Cross-Modal Encoding
ICD-Face: Intra-Class Compactness Distillation for Face Recognition
C2ST: Cross-Modal Contextualized Sequence Transduction for Continuous Sign Language Recognition

Medical and Biological Vision; Cell Microscopy

Title Repo Paper Video
BoMD: Bag of Multi-Label Local Descriptors for Noisy Chest X-Ray Classification GitHub arXiv
CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection GitHub arXiv

Scene Analysis and Understanding

Will soon be added

Multimodal Learning

Will soon be added

Human-in-the-Loop Computer Vision

Will soon be added

Image and Video Forensics

Will soon be added

Geometric Deep Learning

Will soon be added

Vision Applications and Systems

Title Repo Paper Video
Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing GitHub arXiv

Machine Learning and Dataset

Title Repo Paper Video
Unmasked Teacher: Towards Training-Efficient Video Foundation Models GitHub arXiv
