
TechcraftingAI Computer Vision

By: Brad Edwards

About this audio content

TechcraftingAI Computer Vision brings you daily summaries of the latest arXiv research, read by your virtual host, Sage. The podcast is produced by Brad Edwards, an AI Engineer from Vancouver, BC, and a graduate student of computer science studying AI at the University of York. Thank you to arXiv for the use of its open access interoperability.
    Episodes
    • Ep. 247 - Part 3 - June 13, 2024
      Jun 15 2024

      ArXiv Computer Vision research for Thursday, June 13, 2024.


      00:21: LRM-Zero: Training Large Reconstruction Models with Synthesized Data

      01:56: Scale-Invariant Monocular Depth Estimation via SSI Depth

      03:08: GGHead: Fast and Generalizable 3D Gaussian Heads

      04:55: Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

      06:34: Towards Vision-Language Geo-Foundation Model: A Survey

      08:11: SimGen: Simulator-conditioned Driving Scene Generation

      09:44: Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition

      11:03: Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

      12:32: LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living

      13:56: WonderWorld: Interactive 3D Scene Generation from a Single Image

      15:21: Modeling Ambient Scene Dynamics for Free-view Synthesis

      16:29: Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA

      17:50: Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

      19:39: Real-Time Deepfake Detection in the Real-World

      21:17: OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

      23:02: Yo'LLaVA: Your Personalized Language and Vision Assistant

      24:30: MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

      26:26: Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion

      28:03: Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

      29:59: ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing

      31:24: 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

      33:16: Towards Evaluating the Robustness of Visual State Space Models

      34:57: Data Attribution for Text-to-Image Models by Unlearning Synthesized Images

      36:09: CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras

      37:37: Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach

      40:02: MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

      41:40: Explore the Limits of Omni-modal Pretraining at Scale

      42:46: Interpreting the Weight Space of Customized Diffusion Models

      43:58: Depth Anything V2

      45:12: An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

      46:23: Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

      48:11: Rethinking Score Distillation as a Bridge Between Image Distributions

      49:44: VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

      52 min
    • Ep. 247 - Part 2 - June 13, 2024
      Jun 15 2024

      ArXiv Computer Vision research for Thursday, June 13, 2024.


      00:21: INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

      02:11: Large-Scale Evaluation of Open-Set Image Classification Techniques

      03:43: PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation

      05:00: MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era

      06:41: Auto-Vocabulary Segmentation for LiDAR Points

      07:30: AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring

      08:43: EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

      10:23: Fine-Grained Domain Generalization with Feature Structuralization

      12:03: SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution

      14:13: ReMI: A Dataset for Reasoning with Multiple Images

      15:41: A Large-scale Universal Evaluation Benchmark For Face Forgery Detection

      17:26: Thoracic Surgery Video Analysis for Surgical Phase Recognition

      18:58: Reducing Task Discrepancy of Text Encoders for Zero-Shot Composed Image Retrieval

      20:40: Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

      22:26: CLIP-Driven Cloth-Agnostic Feature Learning for Cloth-Changing Person Re-Identification

      24:22: Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024

      25:21: Optimizing Visual Question Answering Models for Driving: Bridging the Gap Between Human and Machine Attention Patterns

      26:30: WildlifeReID-10k: Wildlife re-identification dataset with 10k individual animals

      27:44: MGRQ: Post-Training Quantization For Vision Transformer With Mixed Granularity Reconstruction

      29:28: Comparison Visual Instruction Tuning

      30:51: MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

      32:14: Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV

      33:10: Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

      34:33: Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

      36:04: StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning

      37:30: Parameter-Efficient Active Learning for Foundational models

      38:31: Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

      40:22: Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

      42:38: Towards AI Lesion Tracking in PET/CT Imaging: A Siamese-based CNN Pipeline applied on PSMA PET/CT Scans

      44:36: Memory-Efficient Sparse Pyramid Attention Networks for Whole Slide Image Analysis

      46:19: Instance-level quantitative saliency in multiple sclerosis lesion segmentation

      48:37: CMC-Bench: Towards a New Paradigm of Visual Signal Compression

      50:05: Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs

      52:05: CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models

      53 min
    • Ep. 247 - Part 1 - June 13, 2024
      Jun 15 2024

      ArXiv Computer Vision research for Thursday, June 13, 2024.


      00:21: FouRA: Fourier Low Rank Adaptation

      01:41: Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

      03:18: Few-Shot Anomaly Detection via Category-Agnostic Registration Learning

      04:57: Skim then Focus: Integrating Contextual and Fine-grained Views for Repetitive Action Counting

      06:46: ToSA: Token Selective Attention for Efficient Vision Transformers

      08:00: Computer vision-based model for detecting turning lane features on Florida's public roadways

      09:08: Improving Adversarial Robustness via Feature Pattern Consistency Constraint

      10:52: Research on Deep Learning Model of Feature Extraction Based on Convolutional Neural Network

      12:10: NeRF Director: Revisiting View Selection in Neural Volume Rendering

      13:36: Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency

      15:03: Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

      16:40: COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing

      18:16: Fusion of regional and sparse attention in Vision Transformers

      19:26: Zoom and Shift are All You Need

      20:17: EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding

      21:49: The Penalized Inverse Probability Measure for Conformal Classification

      23:24: OpenMaterial: A Comprehensive Dataset of Complex Materials for 3D Reconstruction

      24:47: Blind Super-Resolution via Meta-learning and Markov Chain Monte Carlo Simulation

      26:30: Computer Vision Approaches for Automated Bee Counting Application

      27:17: Dual Attribute-Spatial Relation Alignment for 3D Visual Grounding

      28:16: A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event Cameras

      29:43: Multiple Prior Representation Learning for Self-Supervised Monocular Depth Estimation via Hybrid Transformer

      31:25: Neural NeRF Compression

      32:29: Preserving Identity with Variational Score for General-purpose 3D Editing

      33:50: AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings

      34:51: Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition

      36:10: Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation

      37:34: AMSA-UNet: An Asymmetric Multiple Scales U-net Based on Self-attention for Deblurring

      38:49: Cross-Modal Learning for Anomaly Detection in Fused Magnesium Smelting Process: Methodology and Benchmark

      40:45: A PCA based Keypoint Tracking Approach to Automated Facial Expressions Encoding

      42:02: Steganalysis on Digital Watermarking: Is Your Defense Truly Impervious?

      43:28: FacEnhance: Facial Expression Enhancing with Recurrent DDPMs

      45:11: How structured are the representations in transformer-based vision encoders? An analysis of multi-object representations in vision-language models

      47:08: Suitability of KANs for Computer Vision: A preliminary investigation

      48 min