C'est La VIE
Visual Inference & Evaluation
Group Seminar
Deep Model Reference: Simple yet Effective Confidence Estimation for Image Classification |
|
Multimodal Action Quality Assessment |
|
A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment |
|
Q-Ground: Image Quality Grounding with Large Multi-modality Models |
|
Medical SAM 2: Segment Medical Images as Video via Segment Anything Model 2 |
|
LAR-IQA: A Lightweight, Accurate, and Robust No-Reference Image Quality Assessment Model |
|
Your Diffusion Model is Secretly a Zero-Shot Classifier |
|
ExpertAF: Expert Actionable Feedback from Video |
|
CausalPC: Improving the Robustness of Point Cloud Classification by Causal Effect Identification |
|
Rich Human Feedback for Text-to-Image Generation |
|
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs |
|
pix2gestalt: Amodal Segmentation by Synthesizing Wholes |
|
Magic Mamba |
|
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction |
|
U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation |
|
Scaling Vision with Sparse Mixture of Experts |
|
From Feline Classification to Skills Evaluation: A Multitask Learning Framework for Evaluating Micro Suturing Neurosurgical Skills |
|
SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation |
|
Understanding Zero-Shot Adversarial Robustness for Large-Scale Models |
|
Comparison of No-Reference Image Quality Models via MAP Estimation in Diffusion Latents |
|
Ferret: Refer and Ground Anything Anywhere at Any Granularity |
|
Disruptive Autoencoders: Leveraging Low-level Features for 3D Medical Image Pre-training |
|
Keep Your Eye on the Best: Contrastive Regression Transformer for Skill Assessment in Robotic Surgery |
|
Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild |
|
DiffPoint: Single and Multi-view Point Cloud Reconstruction with ViT Based Diffusion Model |
|
Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning |
|
Scalable Diffusion Models with Transformers |
|
PECoP: Parameter Efficient Continual Pretraining for Action Quality Assessment |
|
Zero-1-to-3: Zero-shot One Image to 3D Object |
|
Multimodal Optimal Transport-based Co-Attention Transformer with Global Structure Consistency for Survival Prediction |
|
Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation |
|
Objects do not disappear: Video object detection by single-frame object location anticipation |
|
MedLSAM: Localize and Segment Anything Model for 3D CT Images |
|
Test Time Adaptation for Blind Image Quality Assessment |
|
CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning |
|
UMIFormer: Mining the Correlations between Similar Tokens for Multi-View 3D Reconstruction |
|
Attention Discriminant Sampling for Point Clouds |
|
Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Localization |
|
Coarse-to-Fine Amodal Segmentation with Shape Prior |
|
DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization |
|
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models |
|
Blindly Assess Image Quality in the Wild Guided by A Self-Adaptive Hyper Network |
|
Deep Evidential Regression |
|
MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation |
|
Segment and Track Anything |
|
MARS: Model-agnostic Biased Object Removal without Additional Supervision for Weakly-Supervised Semantic Segmentation |
|
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation |
|
Blind image quality assessment based on progressive multi-task learning |
|
Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks |
|
RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model |
|
E-LPIPS: Robust Perceptual Image Similarity via Random Transformation Ensembles |
|
SegGPT: Segmenting Everything In Context |
|
Volumetric memory network for interactive medical image segmentation |
|
Semi-Supervised Authentically Distorted Image Quality Assessment with Consistency-Preserving Dual-Branch Convolutional Neural Network |
|
3D Cinemagraphy from a Single Image |
|
Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection |
|
Perceptual Attacks of No-Reference Image Quality Models with Human-in-the-Loop |
|
Knowledge-Guided Blind Image Quality Assessment with Few Training Samples |
|
PIDNet: A Real-time Semantic Segmentation Network Inspired by PID Controllers |
|
Images Speak in Images: A Generalist Painter for In-Context Visual Learning |
|
Sibling-Attack: Rethinking Transferable Adversarial Attacks against Face Recognition |
|
Image Quality Assessment using Semi-Supervised Representation Learning |
|
Segment Anything |
|
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment |
|
Unsupervised Pre-training for Temporal Action Localization Tasks |
|
Towards Implicit Text-Guided 3D Shape Generation: Supplementary Material |
|
HCSC: Hierarchical Contrastive Selective Coding |
|
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation |
|
PatchDCT: Patch Refinement for High Quality Instance Segmentation |
|
No Reference Opinion Unaware Quality Assessment of Authentically Distorted Images |
|
Towards certifying Linf robustness using neural networks with Linf-dist neurons |
|
Instance Shadow Detection |
|
Transductive Semi-Supervised Deep Learning using Min-Max Features |
|
V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation |
|
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional |
|
Hyperbolic Image Segmentation |
|
Image Quality Assessment using Contrastive Learning |
|
[Writing Skills] Appreciating Abstracts and Introductions |
|
Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation |
|
Fast and Unsupervised Action Boundary Detection for Action Segmentation |
|
Natural Color Fool: Towards Boosting Black-box Unrestricted Attacks |
|
Theme-Aware Semi-Supervised Image Aesthetic Quality Assessment |
|
Uncertainty-Guided Probabilistic Transformer for Complex Action Recognition |
|
ObjectBox: From Centers to Boxes for Anchor-Free Object Detection |
|
Learning to Refactor Action and Co-occurrence Features for Temporal Action Localization |
|
Shift-tolerant Perceptual Similarity Metric |
|
Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment |
|
Deep Evidential Regression |
|
In Defense of Online Models for Video Instance Segmentation |
|
The Dimpled Manifold Model of Adversarial Examples in Machine Learning |
|
Modeling Localness for Self-Attention Networks |
|
Single-View 3D Object Reconstruction from Shape Priors in Memory |
|
Class Semantic-based Attention for Action Detection |
|
Optimism in the Face of Adversity: Understanding and Improving Deep Learning Through Adversarial Robustness |
|
Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification |
|
MaD-DLS: Mean and Deviation of Deep and Local Similarity for Image Quality Assessment |
|
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation |
|
Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation |
|
A Simple Semi-Supervised Learning Framework for Object Detection |
|
FreeSOLO: Learning to Segment Objects without Annotations |
|
Deep Self-Dissimilarities as Powerful Visual Fingerprints |
|
Temporal Action Detection with Multi-level Supervision |
|
An Image Patch is a Wave: Quantum Inspired Vision MLP |
|
Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation |
|
Reducing Flipping Errors in Deep Neural Networks |
|
Vision-Language Pre-Training with Triple Contrastive Learning |
|
Instant Teaching: An End to End Semi Supervised Object Detection Framework |
|
Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization |
|
Weakly Supervised Instance Segmentation using Class Peak Response |
|
First-order Adversarial Vulnerability of Neural Networks and Input Dimension |
|
Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment |
|
Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion |
|
End-to-End Video Instance Segmentation with Transformers |
|
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields |
|
Unsupervised Domain Adaptation in Semantic Segmentation |
|
Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization |
|
Pixel Difference Networks for Efficient Edge Detection |
|
Image Quality Assessment: Unifying Structure and Texture Similarity |
|
Cubemap-Based Perception-Driven Blind Quality Assessment for 360-degree Images |
|
Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation |
|
A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation |
|
Perceptual Adversarial Robustness: Defense Against Unseen Threat Models |
|
GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition |
|
Weakly-Supervised Semantic Segmentation via Sub-category Exploration |
|
Learning to Resize Images for Computer Vision Tasks |
|
Topology-Imbalance Learning for Semi-Supervised Node Classification |
|
MUSIQ: Multi-scale Image Quality Transformer |
|
Video Self-Stitching Graph Network for Temporal Action Localization |
|
Self-Supervised 3D Mesh Reconstruction from Single Images |
|
DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort |
|
LOWKEY: Leveraging Adversarial Attacks to Protect Social Media Users From Facial Recognition |
|
Blind Omnidirectional Image Quality Assessment Based on Structure and Natural Features |
|
Pre-Trained Image Processing Transformer |
|
Rich features for perceptual quality assessment of UGC videos |
|
BoxInst: High-Performance Instance Segmentation with Box Annotations |
|
Instance Similarity Learning for Unsupervised Feature Representation |
|
Enriching Local and Global Contexts for Temporal Action Localization |
|
Prototype Completion with Primitive Knowledge for Few-Shot Learning |
|
Feature Importance-aware Transferable Adversarial Attacks |
|
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning |
|
ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation |
|
Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild |
|
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding |
|
Temporal Query Networks for Fine-Grained Video Understanding |
|
Recent work |
|
Hierarchical Recurrent Deep Fusion Using Adaptive Clip Summarization for Sign Language Translation |
|
Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation |
|
Patch-VQ: ‘Patching Up’ the Video Quality Problem |
|
Extreme Rotation Estimation using Dense Correlation Volumes |
|
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting |
|
Point2Skeleton: Learning Skeletal Representations from Point Clouds |
|
TDN: Temporal Difference Networks for Efficient Action Recognition |
|
The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation |
|
CO2: Consistent Contrast for Unsupervised Visual Representation Learning |
|
Emerging Properties in Self-Supervised Vision Transformers |
|
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers |
|
Progress Report: Recommended Handy Tools |
|
Feature Selection for Zero-Shot Gesture Recognition |
|
Detecting and Mapping Video Impairments |
|
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis |
|
Hallucinated-IQA: No-Reference Image Quality Assessment |
|
Quantifying Visual Image Quality: A Bayesian View |
|
Distilling Knowledge via Knowledge Review |
|
Transferable Perturbations of Deep Feature Distributions |
|
Personality-Assisted Multi-Task Learning for Generic and Personalized Image Aesthetics Assessment |
|
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows |
|
Learning Continuous Image Representation with Local Implicit Image Function |
|
A survey on visual transformer |
|
Depth and Amodal Segmentation |
|
Unlearnable Examples: Making Personal Data Unexploitable |
|
Learning Transferable Visual Models From Natural Language Supervision |
|
Blind Image Quality Assessment with Active Inference |
|
Conditional Convolutions for Instance Segmentation |
|
Disentangled Non-Local Networks |
|
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric |
|
RankIQA: Learning from Rankings for No-Reference Image Quality Assessment |
|
Towards Open World Object Detection |
|
Peer Collaborative Learning for Online Knowledge Distillation |
|
Weakly-Supervised Action Localization by Generative Attention Modeling |
|
Modeling Multi-Label Action Dependencies for Temporal Action Localization |
|
Restricting the Flow: Information Bottlenecks for Attribution |
|
Unsupervised Multi-Modal Image Registration via Geometry Preserving |
|
Continual Learning for Blind Image Quality Assessment |
|
Stabilized Medical Image Attack |
|
Recent Progress on Self-Supervised Representation Learning |
|
Amodal Segmentation Based on Visible Region Segmentation and Shape Prior |
|
The VC Dimension |
|
Unsupervised Deep Homography: A Fast and Robust Homography Estimation Model |
|
Rethinking softmax cross entropy loss for adversarial robustness |
|
Learning via Uniform Convergence |
|
Joint Semantic Segmentation and Boundary Detection using Iterative Pyramid Contexts |
|
Temporal Pyramid Network for Action Recognition |
|
PAC-Learning |
|
Growing Neural Cellular Automata |
|
RIRNet: Recurrent-In-Recurrent Network for Video Quality Assessment |
|
An Unsupervised Information-Theoretic Perceptual Quality Metric |
|
Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space |
|
A Mathematical Theory of Evidence |
|
Correlating Edge, Pose with Parsing |
|
Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation |
|
Adversarial Weight Perturbation Helps Robust Generalization |
|
Memory-augmented Dense Predictive Coding for Video Representation Learning |
|
Appearance-Preserving 3D Convolution for Video-based Person Re-identification |
|
Synthesize then Compare: Detecting Failures and Anomalies for Semantic Segmentation |
|
ArcFace: Additive Angular Margin Loss for Deep Face Recognition |
|
Understanding the Role of Individual Units in a Deep Network |
|
Object Instance Annotation with Deep Extreme Level Set Evolution |
|
Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation |
|
Self-Attention: From Image Recognition to Image Segmentation |
|
Beyond Vision: A Multimodal Recurrent Attention Convolutional Neural Network for Unified Image Aesthetic Prediction Tasks |
|
High-frequency Component Helps Explain the Generalization of Convolutional Neural Networks |
|
An Asymmetric Modeling for Action Assessment |
|
ECCV 20 Segmentation Theme |
|
Collaborative Video Object Segmentation by Foreground-Background Integration |
|
End-to-End Object Detection with Transformers |
|
A Unified Framework of Surrogate Loss by Refactoring and Interpolation |
|
Adv-watermark: A Novel Watermark Perturbation for Adversarial Examples |
|
[Survey] Meta-Learning |
|
Dual Super-Resolution Learning for Semantic Segmentation |
|
SRFlow: Learning the Super-Resolution Space with Normalizing Flow |
|
Closed-loop Matters: Dual Regression Networks for Single Image Super-Resolution |
|
Exploring Self-attention for Image Recognition |
|
Efficient Semantic Video Segmentation with Per-frame Inference |
|
Attacks Which Do Not Kill Training Make Adversarial Learning Stronger |
|
FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding |
|
Learning Instance Occlusion for Panoptic Segmentation |
|
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning |
|
Understanding SSIM |
|
Image Processing Using Multi-Code GAN Prior |
|
RetinaTrack: Online Single Stage Joint Detection and Tracking |
|
Not All Areas Are Equal: Transfer Learning for Semantic Segmentation via Hierarchical Region Selection |
|
Second-Order Provable Defenses against Adversarial Attacks |
|
Learning Fast and Robust Target Models for Video Object Segmentation |
|
SpeedNet: Learning the Speediness in Videos |
|
Intra- and Inter-Action Understanding via Temporal Action Parsing |
|
Gradient Centralization: A New Optimization Technique for Deep Neural Networks |
|
Visualizing the Invisible: Occluded Vehicle Segmentation and Recovery |
|
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy |
|
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition |
|
Destruction and Construction Learning for Fine-grained Image Recognition |
|
Blurry Video Frame Interpolation |
|
Spatially Transformed Adversarial Examples |
|
SC4D: A Sparse 4D Convolutional Network for Skeleton-Based Action Recognition |
|
Circle Loss: A Unified Perspective of Pair Similarity Optimization |
|
MOPT: Multi-Object Panoptic Tracking |
|
Deep Unfolding Network for Image Super-Resolution |
|
Difficulty-Aware Attention Network with Confidence Learning for Medical Image Segmentation |
|
Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations |
|
Diagnosing Error in Object Detectors |
|
BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition |
|
Feedback Graph Convolutional Network for Skeleton-based Action Recognition |
|
How Useful is Self-Supervised Pretraining for Visual Tasks? |
|
See the Sound, Hear the Pixels |
|
PolarMask: Single Shot Instance Segmentation with Polar Representation |
|
RANet: Ranking Attention Network for Fast Video Object Segmentation |
|
PSENet: Psoriasis Severity Evaluation Network |
|
GIQA: Generated Image Quality Assessment |
|
Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation |
|
From Patches to Pictures (PaQ-2-PiQ): Mapping the Perceptual Space of Picture Quality |
|
Functional Adversarial Attacks |
|
PF-Net: Point Fractal Network for 3D Point Cloud Completion |
|
DynamoNet: Dynamic Action and Motion Network |
|
Deep Snake for Real-Time Instance Segmentation |
|
Spatial-Temporal Relation Networks for Multi-Object Tracking |
|
Part-Level Graph Convolutional Network for Skeleton Based Action Recognition |
|
Unsupervised Learning for Real-World Super-Resolution |
|
Adversarial Feedback Loop |
|
Attention Based Glaucoma Detection: A Large-scale Database and CNN Model |
|
SlowFast Networks for Video Recognition |
|
Libra R-CNN: Towards Balanced Learning for Object Detection |
|
How Does Batch Normalization Help Optimization? |
|
Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses |
|
What’s important to boost performance in deep learning |
|
Anchor Diffusion for Unsupervised Video Object Segmentation |
|
Collaborative Learning of Semi-Supervised Segmentation and Classification for Medical Images |
|
Trust Region Based Adversarial Attack on Neural Networks |
|
Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation |
|
Tracking Without Bells and Whistles |
|
DeepCO3: Deep Instance Co-segmentation by Co-peak Search and Co-saliency Detection |
|
Noise2Void: Learning Denoising from Single Noisy Images |
|
Action Recognition from Single Timestamp Supervision in Untrimmed Videos |
|
Momentum Contrast for Unsupervised Visual Representation Learning |
|
Efficient Parameter-free Clustering Using First Neighbor Relations |
|
DeepUSPS: Deep Robust Unsupervised Saliency Prediction With Self-Supervision |
|
Cyclic Guidance for Weakly Supervised Joint Detection and Segmentation |
|
Efficient Online Multi-Person 2D Pose Tracking with Recurrent Spatio-Temporal Affinity Fields |
|
Multi-person Articulated Tracking with Spatial and Temporal Embeddings |
|
ComDefend: An Efficient Image Compression Model to Defend Adversarial Examples |
|
Second-order Attention Network for Single Image Super-Resolution |
|
Image Aesthetic Assessment Based on Pairwise Comparison – A Unified Approach to Score Regression, Binary Classification, and Personalization |
|
Learning Semantics-aware Distance Map with Semantics Layering Network for Amodal Instance Segmentation |
|
A Style-Based Generator Architecture for Generative Adversarial Networks |
|
SeGAN: Segmenting and Generating the Invisible |
|
Action Assessment by Joint Relation Graphs |
|
Learning Temporal Action Proposals With Fewer Labels |
|
Quality Assessment of In-the-Wild Videos |
|
RankSRGAN: Generative Adversarial Networks with Ranker for Image Super-Resolution |
|
Adaptive Pyramid Context Network for Semantic Segmentation |
|
Surgical Skill Assessment on In-Vivo Clinical Data via the Clearness of Operating Field |
|
Weakly Supervised Energy-Based Learning for Action Segmentation |
|
SparseFool: A Few Pixels Make a Big Difference |
|
An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition |
|
A General and Adaptive Robust Loss Function |
|
Graph convolutional tracking |
|
Visual Attention Consistency under Image Transforms for Multi-Label Image Classification |
|
Timeception for Complex Action Recognition |
|
Efficient Video Classification Using Fewer Frames |
|
ScratchDet: Training Single-Shot Object Detectors from Scratch |
|
Learning Loss for Active Learning |
|
Do Better ImageNet Models Transfer Better? |
|
Weakly Supervised Image Classification through Noise Regularization |
|
“Double-DIP”: Unsupervised Image Decomposition via Coupled Deep-Image-Priors |
|
Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration |
|
Defense Against Adversarial Images using Web-Scale Nearest-Neighbor Search |
|
Adversarial Attacks Beyond the Image Space |
|
Collaborative Global-Local Networks for Memory-Efficient Segmentation |
|
Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video |
|
Pose2Seg: Detection Free Human Instance Segmentation |
|
UPSNet: A Unified Panoptic Segmentation Network |
|
LaSO: Label-Set Operations Networks for Multi-label Few-shot Learning |
|
TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions |
|
Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition |
|
Skin Lesion Classification in Dermoscopy Images Using Synergic Deep Learning |
|
1. A Relation-Augmented Fully Convolutional Network for Semantic Segmentation in Aerial Scenes 2. Co-occurrent Features in Semantic Segmentation |
|
Not All Areas Are Equal: Transfer Learning for Semantic Segmentation via Hierarchical Region Selection |
|
Learning Correspondence from the Cycle-consistency of Time |
|
SiamRPN++ |
|
Bag of Tricks for Image Classification with Convolutional Neural Networks |
|
Making Convolutional Networks Shift-Invariant Again |
|
Complement Objective Training |
|
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment |
|
DAVANet: Stereo Deblurring with View Aggregation |
|
Towards Robust Detection of Adversarial Examples |
|
Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers |
|
On the Use of Deep Learning for Blind Image Quality Assessment |
|
Fast Online Object Tracking and Segmentation: A Unifying Approach |
|
Learning Deep Compositional Grammatical Architectures for Visual Recognition |
|
ExFuse: Enhancing Feature Fusion for Semantic Segmentation |
|
The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos |
|
SFNet: Learning Object-aware Semantic Correspondence |
|
Data augmentation using learned transforms for one-shot medical image segmentation |
|
A Comparative Study for Single Image Blind Deblurring |
|
Graph CNNs with Motif and Variable Temporal Block for Skeleton-based Action Recognition |
|
Deep Video Quality Assessor: From Spatio-temporal Visual Sensitivity to a Convolutional Neural Aggregation Network |
|
CCNet: Criss-Cross Attention for Semantic Segmentation |
|
DenseASPP for Semantic Segmentation in Street Scenes |
|
Eliminating Background-Bias for Robust Person Re-identification |
|
Application-Driven No-Reference Quality |
|
R-FCN: Object Detection via Region-based Fully Convolutional Networks |
|
ActionVLAD: Learning spatio-temporal aggregation for action classification |
|
MoNet: Deep Motion Exploitation for Video Object Segmentation |
|
A Constrained Deep Neural Network for Ordinal Regression |
|
DeepFool: a simple and accurate method to fool deep neural networks |
|
Modeling Surgical Technical Skill Using Expert Assessment for Automated Computer Rating |
|
Self-Ensembling Attention Networks: Addressing Domain Shift for Semantic Segmentation |
|
ImageNet-trained CNNs are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness |
|
Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles |
|
Dual Attention Network for Scene Segmentation |
|
A Benchmark for Automatic Visual Classification of Clinical Skin Disease Images |
|
Clinical Skin Lesion Diagnosis using Representations Inspired by Dermatologist Criteria |
|
An Information-Theoretic Definition of Similarity |
|
Generating Images with Perceptual Similarity Metrics based on Deep Networks |
|
Spatio-Temporal Graph Routing for Skeleton-based Action Recognition |
|
Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation |
|
Towards Robust Interpretability with Self-explaining Neural Networks |
|
Trajectory Convolution for Action Recognition |
|
Videos as Space-Time Region Graphs |
|
Recurrent Autoregressive Networks for Online Multi-Object Tracking |
|
Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks |
|
High Performance Visual Tracking with Siamese Region Proposal Network |
|
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric |
|
Fast Video Object Segmentation by Reference-Guided Mask Propagation |
|
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets |
|
A New Representation of Skeleton Sequences for 3D Action Recognition |
|
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition |
|
Learning Deep Features for Discriminative Localization |
|
Networks and the Best Approximation Property |
|
Temporal Deformable Residual Networks for Action Segmentation in Videos |
|
Bi-box Regression for Pedestrian Detection and Occlusion Estimation |
|
Instance Segmentation and Tracking with Cosine Embeddings and Recurrent Hourglass Networks |
|
Person Re-identification with Deep Similarity-Guided Graph Neural Network |
|
Collaborative Deep Reinforcement Learning for Multi-Object Tracking |
|
Video quality assessment accounting for temporal visual masking of local flicker |
|
Deep Reinforcement Learning for Surgical Gesture Segmentation and Classification |
|
Generate To Adapt: Aligning Domains using Generative Adversarial Networks |
|
Direction-aware Spatial Context Features for Shadow Detection and Removal |
|
Online Multi-Object Tracking with Dual Matching Attention Networks |
|
Fusing Crowd Density Maps and Visual Object Trackers for People Tracking in Crowd Scenes |
|
Eigen-Distortions of Hierarchical Representations |
|
Gaussian Process |
|