In-Depth Readings of Computer Vision Papers -- From Classics to the Frontier -- Chinese-English Side by Side + Multi-Dimensional Analysis

| Paper | Area |
| --- | --- |
| Deep Residual Learning for Image Recognition (ResNet) | Image classification (CVPR 2016 Best Paper) |
| You Only Look Once (YOLO) | Object detection (CVPR 2016) |
| Attention Is All You Need (Transformer) | Sequence modeling (NeurIPS 2017) |
| An Image is Worth 16x16 Words (ViT) | Vision Transformer (ICLR 2021) |
| Denoising Diffusion Probabilistic Models (DDPM) | Diffusion models (NeurIPS 2020) |

Includes 5 classic papers + 156 CVPR papers + 84 ICCV papers + 72 ECCV papers = 317 papers in total

CVPR (2013-2025, 13 editions, 156 papers)

CVPR 2013

| Paper | Area |
| --- | --- |
| Fast, Accurate Detection of 100,000 Object Classes on a Single Machine | Object detection (Best Paper) |
| Discriminative Non-blind Deblurring | Image restoration (Best Student Paper) |
| Lost! Leveraging the Crowd for Probabilistic Visual Self-Localization | Visual localization (Honorable Mention) |
| Online Object Tracking: A Benchmark (OTB) | Object tracking |
| SLAM++: Simultaneous Localisation and Mapping at the Level of Objects | SLAM |
| Pedestrian Detection with Unsupervised Multi-Stage Feature Learning | Pedestrian detection |
| Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images | RGB-D scene understanding |
| HON4D: Histogram of Oriented 4D Normals for Activity Recognition | Action recognition |
| Joint 3D Scene Reconstruction and Class Segmentation | 3D reconstruction |
| Hierarchical Saliency Detection | Saliency detection |
| Detecting and Aligning Faces by Image Retrieval | Face detection |
| Multi-target Tracking by Lagrangian Relaxation to Min-cost Network Flow | Multi-object tracking |


CVPR 2014

| Paper | Area |
| --- | --- |
| Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation (R-CNN) | Object detection |
| DeepFace: Closing the Gap to Human-Level Performance in Face Verification | Face recognition |
| What Camera Motion Reveals About Shape with Unknown BRDF | 3D shape (Best Paper) |
| Learning and Transferring Mid-Level Image Representations using CNNs | Transfer learning |
| Multiscale Combinatorial Grouping (MCG) | Segmentation |
| Partial Optimality by Pruning for MAP-inference | Optimization (Best Student Paper) |
| 3D Shape and Indirect Appearance by Structured Light Transport | 3D light transport (Honorable Mention) |
| Deep Learning Face Representation from Predicting 10,000 Classes (DeepID) | Face recognition |
| Caffe: Convolutional Architecture for Fast Feature Embedding | Framework |
| Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group | Action recognition |
| Fast and Accurate Image Matching with Cascade Hashing for 3D Reconstruction | 3D reconstruction |
| Multi-Object Tracking via Constrained Sequential Labeling | Multi-object tracking |


CVPR 2015

| Paper | Area |
| --- | --- |
| DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-Time | 3D reconstruction (Best Paper) |
| Fully Convolutional Networks for Semantic Segmentation (FCN) | Semantic segmentation (Honorable Mention) |
| Going Deeper with Convolutions (GoogLeNet/Inception) | Network architecture |
| FaceNet: A Unified Embedding for Face Recognition and Clustering | Face recognition |
| Show and Tell: A Neural Image Caption Generator | Image captioning |
| Long-Term Recurrent Convolutional Networks (LRCN) | Video understanding |
| Hypercolumns for Object Segmentation and Fine-grained Localization | Segmentation |
| 3D ShapeNets: A Deep Representation for Volumetric Shapes | 3D shape |
| Category-Specific Object Reconstruction from a Single Image | 3D reconstruction (Best Student Paper) |
| Picture: A Probabilistic Programming Language for Scene Perception | Scene understanding (Honorable Mention) |
| Deformable Part Models are Convolutional Neural Networks | Object detection |
| Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition | Action recognition |


CVPR 2016

| Paper | Area |
| --- | --- |
| Structural-RNN: Deep Learning on Spatio-Temporal Graphs | Spatio-temporal graphs (Best Student Paper) |
| Sublabel-Accurate Relaxation of Nonconvex Energies | Optimization (Honorable Mention) |
| Rethinking the Inception Architecture for Computer Vision (Inception v3) | Network architecture |
| Learning Deep Features for Discriminative Localization (CAM) | Interpretability |
| Context Encoders: Feature Learning by Inpainting | Image inpainting |
| Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images | 3D detection |
| Stacked Attention Networks for Image Question Answering | VQA |
| NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis | Action recognition |
| Training Region-based Object Detectors with Online Hard Example Mining (OHEM) | Object detection |
| Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs | Video understanding |
| Learning Dense Correspondence via 3D-Guided Cycle Consistency | 3D correspondence |
| Single Image 3D Interpreter Network | 3D understanding |


CVPR 2017

| Paper | Area |
| --- | --- |
| Densely Connected Convolutional Networks (DenseNet) | Network architecture (Best Paper) |
| Learning from Simulated and Unsupervised Images through Adversarial Training (SimGAN) | GAN (Best Paper) |
| PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation | 3D point clouds |
| Feature Pyramid Networks for Object Detection (FPN) | Object detection |
| Image-to-Image Translation with Conditional Adversarial Networks (pix2pix) | GAN |
| Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields (OpenPose) | Pose estimation |
| Pyramid Scene Parsing Network (PSPNet) | Semantic segmentation |
| Aggregated Residual Transformations for Deep Neural Networks (ResNeXt) | Network architecture |
| Photo-Realistic Single Image Super-Resolution Using a GAN (SRGAN) | Super-resolution |
| Deformable Convolutional Networks | Object detection |
| Annotating Object Instances with a Polygon-RNN | Instance segmentation (Honorable Mention) |
| Computational Imaging on the Electric Grid | Computational photography (Best Student Paper) |


CVPR 2018

| Paper | Area |
| --- | --- |
| Taskonomy: Disentangling Task Transfer Learning | Transfer learning (Best Paper) |
| Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies | 3D human body (Best Student Paper) |
| Non-local Neural Networks | Video understanding |
| Squeeze-and-Excitation Networks (SENet) | Network architecture |
| StarGAN: Unified GANs for Multi-Domain Image-to-Image Translation | GAN |
| MobileNetV2: Inverted Residuals and Linear Bottlenecks | Efficient architecture |
| High-Resolution Image Synthesis with Conditional GANs (pix2pixHD) | Image synthesis |
| DensePose: Dense Human Pose Estimation In The Wild | Pose estimation |
| Progressive Growing of GANs (ProGAN) | GAN |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition (R(2+1)D) | Video understanding |
| Deep Learning of Graph Matching | Graph matching (Honorable Mention) |
| SPLATNet: Sparse Lattice Networks for Point Cloud Processing | Point clouds (Honorable Mention) |


CVPR 2019

| Paper | Area |
| --- | --- |
| A Theory of Fermat Paths for Non-Line-of-Sight Shape Reconstruction | Non-line-of-sight reconstruction (Best Paper) |
| ArcFace: Additive Angular Margin Loss for Deep Face Recognition | Face recognition |
| A Style-Based Generator Architecture for GANs (StyleGAN) | GAN / image generation |
| Semantic Image Synthesis with Spatially-Adaptive Normalization (SPADE) | Semantic image synthesis |
| DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation | 3D shape representation |
| PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud | 3D object detection |
| Fast Online Object Tracking and Segmentation (SiamMask) | Object tracking / segmentation |
| Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation | Neural architecture search |
| Libra R-CNN: Towards Balanced Learning for Object Detection | Object detection |
| Bag of Tricks for Image Classification with CNNs | Image classification |
| Generalized Intersection over Union (GIoU) | Object detection |
| GANFIT: GAN Fitting for High Fidelity 3D Face Reconstruction | 3D face reconstruction |


CVPR 2020

| Paper | Area |
| --- | --- |
| Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild | 3D reconstruction (Best Paper) |
| BSP-Net: Generating Compact Meshes via Binary Space Partitioning | 3D shape generation (Best Student Paper) |
| Momentum Contrast for Unsupervised Visual Representation Learning (MoCo) | Self-supervised learning |
| EfficientDet: Scalable and Efficient Object Detection | Object detection |
| PointRend: Image Segmentation as Rendering | Image segmentation |
| 3D Photography using Context-aware Layered Depth Inpainting | 3D photography |
| HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation | Human pose estimation |
| PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization | 3D human reconstruction |
| Circle Loss: A Unified Perspective of Pair Similarity Optimization | Metric learning |
| X3D: Expanding Architectures for Efficient Video Recognition | Video recognition |
| DeepCap: Monocular Human Performance Capture Using Weak Supervision | Human performance capture |
| Adversarial Latent Autoencoders (ALAE) | Generative models |


CVPR 2021

| Paper | Area |
| --- | --- |
| GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields | Scene generation (Best Paper) |
| Exploring Simple Siamese Representation Learning (SimSiam) | Self-supervised learning (Honorable Mention) |
| NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections | Novel view synthesis |
| RepVGG: Making VGG-Style ConvNets Great Again | Network architecture |
| Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective (SETR) | Semantic segmentation |
| D-NeRF: Neural Radiance Fields for Dynamic Scenes | Dynamic scenes |
| Real-Time High-Resolution Background Matting | Image matting (Honorable Mention) |
| End-to-End Video Instance Segmentation with Transformers (VisTR) | Video instance segmentation |
| RAFT-3D: Scene Flow Using Rigid-Motion Embeddings | Scene flow |
| Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | Vision Transformer |
| UP-DETR: Unsupervised Pre-training for Object Detection with Transformers | Object detection |
| Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos | Human depth estimation (Honorable Mention) |


CVPR 2022

| Paper | Area |
| --- | --- |
| Learning to Solve Hard Minimal Problems | Geometric vision (Best Paper) |
| Dual-Shutter Optical Vibration Sensing | Computational photography (Best Paper Honorable Mention) |
| EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation | Pose estimation (Best Student Paper) |
| Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields | Neural radiance fields (Best Student Paper Honorable Mention) |
| Masked Autoencoders Are Scalable Vision Learners (MAE) | Self-supervised learning |
| High-Resolution Image Synthesis with Latent Diffusion Models (LDM) | Diffusion models |
| A ConvNet for the 2020s (ConvNeXt) | Network architecture |
| Swin Transformer V2: Scaling Up Capacity and Resolution | Vision Transformer |
| Masked-attention Mask Transformer for Universal Image Segmentation (Mask2Former) | Unified segmentation |
| DN-DETR: Accelerate DETR Training by Introducing Query DeNoising | Object detection |
| Restormer: Efficient Transformer for High-Resolution Image Restoration | Image restoration |
| Point-NeRF: Point-based Neural Radiance Fields | 3D reconstruction |


CVPR 2023

| Paper | Area |
| --- | --- |
| Visual Programming: Compositional Visual Reasoning Without Training | Visual reasoning (Best Paper) |
| Planning-oriented Autonomous Driving (UniAD) | Autonomous driving (Best Paper) |
| DynIBaR: Neural Dynamic Image-Based Rendering | Novel view synthesis (Best Paper Honorable Mention) |
| 3D Registration with Maximal Cliques | Point cloud registration (Best Student Paper) |
| DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation | Diffusion models (Best Student Paper Honorable Mention) |
| InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | Vision foundation models |
| ImageBind: One Embedding Space To Bind Them All | Multimodal learning |
| EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | Self-supervised learning |
| ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | Open-vocabulary segmentation |
| Scaling Up GANs for Text-to-Image Synthesis (GigaGAN) | GAN / image generation |
| Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | Detection and segmentation |
| Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation | 3D generation |


CVPR 2024

| Paper | Area |
| --- | --- |
| Generative Image Dynamics | Image dynamics generation (Best Paper) |
| Rich Human Feedback for Text-to-Image Generation | Text-to-image (Best Paper) |
| Mip-Splatting: Alias-free 3D Gaussian Splatting | 3D Gaussian splatting (Best Student Paper) |
| BioCLIP: A Vision Foundation Model for the Tree of Life | Biological vision foundation model (Best Student Paper) |
| pixelSplat: 3D Gaussian Splats from Image Pairs | 3D reconstruction (Honorable Mention) |
| Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | Depth estimation |
| YOLO-World: Real-Time Open-Vocabulary Object Detection | Open-vocabulary detection |
| Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks | Multimodal foundation models |
| InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | Vision-language models |
| 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering | Dynamic 3D rendering |
| EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything | Efficient segmentation |
| Objects as Volumes: A Stochastic Geometry View of Opaque Solids | Rendering theory (Honorable Mention) |


CVPR 2025

| Paper | Area |
| --- | --- |
| VGGT: Visual Geometry Grounded Transformer | 3D multi-view geometry (Best Paper) |
| Neural Inverse Rendering from Propagating Light | Inverse rendering (Best Student Paper) |
| MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos | Structure and motion (Honorable Mention) |
| Navigation World Models | World models (Honorable Mention) |
| Molmo and PixMo: Open Weights and Open Data for State-of-the-Art VLMs | Vision-language models (Honorable Mention) |
| 3D Student Splatting and Scooping | Neural rendering (Honorable Mention) |
| Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens (DDT-LLaMA) | Multimodal pretraining (Best Student Paper Honorable Mention) |
| MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors | Real-time SLAM |
| Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation | Unified multimodal |
| TRELLIS: Structured 3D Latents for Scalable and Versatile 3D Generation | 3D generation |
| DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | Video depth estimation |
| OmniGen: Unified Image Generation | Unified image generation |

ICCV (2013-2025, 7 editions, 84 papers)

ICCV 2013

| Paper | Area |
| --- | --- |
| From Large Scale Image Categorization to Entry-Level Categories | Image classification (Marr Prize) |
| Hierarchical Data-driven Descent for Efficient Optimal Deformation Estimation | Deformation estimation (Honorable Mention) |
| Piecewise Rigid Scene Flow | 3D scene flow (Honorable Mention) |
| Action Recognition with Improved Trajectories | Action recognition |
| Structured Forests for Fast Edge Detection | Edge detection |
| OverFeat: Integrated Recognition, Localization and Detection | Object detection |
| Saliency Detection: A Boolean Map Approach | Saliency detection |
| Holistic Scene Understanding for 3D Object Detection with RGBD | 3D detection |
| Saliency Detection via Absorbing Markov Chain | Saliency detection |
| Segmentation Driven Object Detection with Fisher Vectors | Object detection |
| PhotoOCR: Reading Text in Uncontrolled Conditions | Text recognition |
| Robust Object Tracking with Online Multi-lifespan Dictionary Learning | Visual tracking |


ICCV 2015

| Paper | Area |
| --- | --- |
| Deep Neural Decision Forests | Classification (Marr Prize) |
| Holistically-Nested Edge Detection (HED) | Edge detection (Honorable Mention) |
| Fast R-CNN | Object detection |
| Delving Deep into Rectifiers (PReLU) | Image classification |
| FlowNet: Learning Optical Flow with Convolutional Networks | Optical flow estimation |
| Learning Spatiotemporal Features with 3D Convolutional Networks (C3D) | Video understanding |
| Conditional Random Fields as Recurrent Neural Networks (CRF-RNN) | Semantic segmentation |
| Learning Deconvolution Network for Semantic Segmentation | Semantic segmentation |
| Unsupervised Visual Representation Learning by Context Prediction | Self-supervised learning |
| Ask Your Neurons: A Neural-Based Approach to Answering Questions | Visual question answering |
| Unsupervised Learning of Visual Representations using Videos | Self-supervised learning |
| Dense Optical Flow Prediction From a Static Image | Motion prediction |


ICCV 2017

| Paper | Area |
| --- | --- |
| Mask R-CNN | Instance segmentation (Marr Prize) |
| Focal Loss for Dense Object Detection (RetinaNet) | Object detection (Best Student Paper) |
| First-Person Activity Forecasting | Activity forecasting (Honorable Mention) |
| CycleGAN: Unpaired Image-to-Image Translation | Image generation |
| Grad-CAM: Visual Explanations from Deep Networks | Interpretability |
| Channel Pruning for Accelerating Very Deep Neural Networks | Model compression |
| Open Set Domain Adaptation | Domain adaptation |
| Globally-Optimal Inlier Set Maximisation | 3D vision |
| Structured Attentions for Visual Question Answering | VQA |
| Generative Image Inpainting with Contextual Attention | Image inpainting |
| Globally and Locally Consistent Image Completion | Image completion |
| Learning to Segment Every Thing | Instance segmentation |


ICCV 2019

| Paper | Area |
| --- | --- |
| SinGAN: Learning a Generative Model from a Single Natural Image | Image generation (Marr Prize) |
| PLMP: Point-Line Minimal Problems | Multi-view geometry (Best Student Paper) |
| CutMix: Regularization Strategy to Train Strong Classifiers | Data augmentation |
| FCOS: Fully Convolutional One-Stage Object Detection | Object detection |
| Deep Hough Voting for 3D Object Detection (VoteNet) | 3D detection |
| SlowFast Networks for Video Recognition | Video understanding |
| Mesh R-CNN | 3D reconstruction |
| Larger Norm More Transferable | Domain adaptation |
| Exploring Randomly Wired Neural Networks | Architecture search |
| Cascade R-CNN | Object detection |
| Asynchronous Single-Photon 3D Imaging | Computational photography (Honorable Mention) |
| Specifying Object Attributes and Relations in Interactive Scene Generation | Scene generation (Honorable Mention) |


ICCV 2021

| Paper | Area |
| --- | --- |
| Swin Transformer | Vision Transformer (Marr Prize) |
| Pixel-Perfect Structure-from-Motion | SfM (Best Student Paper) |
| Mip-NeRF: Anti-Aliasing Neural Radiance Fields | Neural radiance fields (Honorable Mention) |
| OpenGAN: Open-Set Recognition via Open Data Generation | Open-set recognition (Honorable Mention) |
| Common Objects in 3D (CO3D) | 3D dataset (Honorable Mention) |
| Vision Transformers for Dense Prediction (DPT) | Dense prediction |
| Pyramid Vision Transformer (PVT) | Vision Transformer |
| Focal Transformer | Vision Transformer |
| NeRF in the Dark | Neural radiance fields |
| MaskFormer: Per-Pixel Classification is Not All You Need | Semantic segmentation |
| Viewing Graph Solvability via Cycle Consistency | Multi-view geometry (Honorable Mention) |
| Multiscale Vision Transformers (MViT) | Vision Transformer |


ICCV 2023

| Paper | Area |
| --- | --- |
| Passive Ultra-Wideband Single-Photon Imaging | Computational photography (Marr Prize) |
| ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models | Controllable generation (Marr Prize) |
| Segment Anything (SAM) | Segmentation foundation model (Honorable Mention) |
| Tracking Everything Everywhere All at Once | Motion tracking (Best Student Paper) |
| 3D Gaussian Splatting for Real-Time Radiance Field Rendering | 3D rendering |
| DINOv2: Learning Robust Visual Features without Supervision | Self-supervised learning |
| AnyDoor: Zero-shot Object-level Image Customization | Image editing |
| LAVIE: High-Quality Video Generation | Video generation |
| Nerfstudio: A Modular Framework for Neural Radiance Fields | Neural radiance fields |
| LERF: Language Embedded Radiance Fields | 3D language |
| IP-Adapter: Text Compatible Image Prompt Adapter | Controllable generation |
| Tracking Anything in High Quality | Tracking |


ICCV 2025

| Paper | Area |
| --- | --- |
| BrickGPT: Generating Physically Stable Brick Structures from Text | 3D generation (Marr Prize) |
| Spatially-Varying Autofocus | Computational photography (Honorable Mention) |
| RayZer: A Self-supervised Large View Synthesis Model | Novel view synthesis (Best Student Paper Honorable Mention) |
| FlowEdit: Inversion-Free Text-Based Editing | Image editing |
| LLaVA-CoT: Let Vision Language Models Reason Step-by-Step | Vision-language models |
| Sa2VA: Marrying SAM2 with LLaVA | Multimodal grounding |
| Dynamic-DINO: MoE for Real-time Open-Vocabulary Detection | Open-vocabulary detection |
| MaskControl: Spatio-Temporal Control for Motion Synthesis | Motion generation |
| HERMES: A Unified Self-Driving World Model | Autonomous driving |
| LongSplat: Robust Unposed 3D Gaussian Splatting | 3D reconstruction |
| SceneSplat: Gaussian Splatting-based Scene Understanding | 3D scene understanding |
| From Image to Video: An Empirical Study of Diffusion Representations | Representation learning |

ECCV (2014-2024, 6 editions, 72 papers)

ECCV 2014

| Paper | Area |
| --- | --- |
| Large-Scale Object Classification using Label Relation Graphs | Object classification (Best Paper) |
| Scene Chronology | Scene chronology (Best Paper) |
| Microsoft COCO: Common Objects in Context | Dataset / object detection |
| Spatial Pyramid Pooling in Deep Convolutional Networks (SPPNet) | Image classification / detection |
| Visualizing and Understanding Convolutional Networks (ZFNet) | Network visualization |
| Image Super-Resolution Using Deep Convolutional Networks (SRCNN) | Super-resolution |
| LSD-SLAM: Large-Scale Direct Monocular SLAM | SLAM |
| CNN Features Off-the-Shelf: An Astounding Baseline | Transfer learning |
| Edge Boxes: Locating Object Proposals from Edges | Object proposals |
| Simultaneous Detection and Segmentation | Detection and segmentation |
| Part-Based R-CNNs for Fine-Grained Category Detection | Fine-grained detection |
| Action Recognition with Stacked Fisher Vectors | Action recognition |


ECCV 2016

| Paper | Area |
| --- | --- |
| Real-Time 3D Reconstruction and 6-DoF Tracking with an Event Camera | Event cameras (Best Paper) |
| The Fast Bilateral Solver | Image processing (Honorable Mention) |
| SSD: Single Shot MultiBox Detector | Object detection |
| Identity Mappings in Deep Residual Networks | Network architecture |
| Perceptual Losses for Real-Time Style Transfer and Super-Resolution | Style transfer / super-resolution |
| Colorful Image Colorization | Image colorization |
| Fully-Convolutional Siamese Networks for Object Tracking (SiamFC) | Visual tracking |
| Stacked Hourglass Networks for Human Pose Estimation | Pose estimation |
| Learning to Track at 100 FPS with Deep Regression Networks (GOTURN) | Visual tracking |
| Wide Residual Networks (WRN) | Network architecture |
| ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation | Semantic segmentation |
| Temporal Segment Networks (TSN) | Video action recognition |


ECCV 2018

| Paper | Area |
| --- | --- |
| Implicit 3D Orientation Learning for 6D Object Detection | 6D detection (Best Paper) |
| Group Normalization | Normalization (Honorable Mention) |
| GANimation: Anatomically-aware Facial Animation from a Single Image | Facial animation (Honorable Mention) |
| CBAM: Convolutional Block Attention Module | Attention mechanisms |
| CornerNet: Detecting Objects as Paired Keypoints | Object detection |
| Encoder-Decoder with Atrous Separable Convolution (DeepLabv3+) | Semantic segmentation |
| Simple Baselines for Human Pose Estimation and Tracking | Pose estimation |
| ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design | Efficient networks |
| PersonLab: Person Pose Estimation and Instance Segmentation | Human analysis |
| Exploring the Limits of Weakly Supervised Pretraining | Weakly supervised pretraining |
| ESPNet: Efficient Spatial Pyramid of Dilated Convolutions | Real-time semantic segmentation |
| Rethinking ImageNet Pre-training | Pretraining strategy |


ECCV 2020

| Paper | Area |
| --- | --- |
| RAFT: Recurrent All-Pairs Field Transforms for Optical Flow | Optical flow estimation (Best Paper) |
| NeRF: Representing Scenes as Neural Radiance Fields | Neural rendering (Honorable Mention) |
| Towards Streaming Perception | Streaming perception (Honorable Mention) |
| DETR: End-to-End Object Detection with Transformers | Object detection |
| SOLO: Segmenting Objects by Locations | Instance segmentation |
| Object-Contextual Representations for Semantic Segmentation (OCRNet) | Semantic segmentation |
| Rewriting a Deep Generative Model | Generative model editing |
| PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding | 3D point clouds |
| Rethinking Bottleneck Structure for Efficient Mobile Network Design | Mobile networks |
| AR-Net: Adaptive Frame Resolution for Efficient Action Recognition | Action recognition |
| Knowledge Distillation Meets Self-Supervision | Knowledge distillation |
| Unpaired Learning of Deep Image Denoising | Image denoising |


ECCV 2022

| Paper | Area |
| --- | --- |
| On the Versatile Uses of Partial Distance Correlation in Deep Learning | Deep learning theory (Best Paper) |
| Level Set Theory for Neural Implicit Evolution | Neural implicit surfaces (Honorable Mention) |
| Pose-NDF: Modelling Human Pose Manifolds with Neural Distance Fields | Human pose (Honorable Mention) |
| BEVFormer: Learning Bird's-Eye-View Representation | Autonomous driving |
| TensoRF: Tensorial Radiance Fields | Neural rendering |
| MaxViT: Multi-Axis Vision Transformer | Vision Transformer |
| DaViT: Dual Attention Vision Transformers | Vision Transformer |
| Panoptic Scene Graph Generation | Scene graph generation |
| PETR: Position Embedding Transformation for Multi-View 3D Detection | 3D detection |
| 2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR | Point cloud segmentation |
| AnyDoor: Zero-shot Object-level Image Customization | Image editing |
| MaskGIT: Masked Generative Image Transformer | Image generation |


ECCV 2024

| Paper | Area |
| --- | --- |
| Minimalist Vision with Freeform Pixels | Computational photography (Best Paper) |
| Rasterized Edge Gradients: Handling Discontinuities Differentiably | Differentiable rendering (Honorable Mention) |
| Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models | Diffusion model safety (Honorable Mention) |
| Adversarial Diffusion Distillation | Diffusion model distillation |
| Concept Sliders: LoRA Adaptors for Precise Control | Controllable generation |
| Grounding DINO: Marrying DINO with Grounded Pre-Training | Open-set detection |
| LGM: Large Multi-View Gaussian Model for 3D Content Creation | 3D generation |
| Sapiens: Foundation for Human Vision Models | Human-centric foundation model |
| SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow | Optical flow estimation |
| SiT: Exploring Flow and Diffusion-based Generative Models | Generative models |
| VideoMamba: State Space Model for Efficient Video Understanding | Video understanding |
| DUSt3R: Geometric 3D Vision Made Easy | 3D reconstruction |