Ep. 242 - June 8, 2024

TechcraftingAI Computer Vision

Brad Edwards

315 episodes

TechcraftingAI Computer Vision brings you summaries of the latest arXiv research daily. Research is read by your virtual host, Sage. The podcast is produced by Brad Edwards, an AI Engineer from Vancouver, BC, and a graduate student of computer science studying AI at the University of York. Thank you to arXiv for use of its open access interoperability.

Technology

36 minutes 36 seconds

arXiv Computer Vision research for Saturday, June 8, 2024.

00:20: Blurry-Consistency Segmentation Framework with Selective Stacking on Differential Interference Contrast 3D Breast Cancer Spheroid

01:31: 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation

03:01: Metric Convolutions: A Unifying Theory to Adaptive Convolutions

04:13: Layered Image Vectorization via Semantic Simplification

05:18: Select-Mosaic: Data Augmentation Method for Dense Small Object Scenes

06:31: 3D MRI Synthesis with Slice-Based Latent Diffusion Models: Improving Tumor Segmentation Tasks in Data-Scarce Regimes

07:51: Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models

09:42: Unsupervised learning of Data-driven Facial Expression Coding System (DFECS) using keypoint tracking

11:36: HDRT: Infrared Capture for HDR Imaging

13:14: Attri-Net: A Globally and Locally Inherently Interpretable Model for Multi-Label Classification Using Class-Specific Counterfactuals

14:49: Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis

16:18: Training-Free Robust Interactive Video Object Segmentation

17:49: One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

19:50: A Two-Stage Adverse Weather Semantic Segmentation Method for WeatherProof Challenge CVPR 2024 Workshop UG2+

21:04: PAPR in Motion: Seamless Point-level 3D Scene Interpolation

22:25: VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification

23:38: Medical Vision Generalist: Unifying Medical Imaging Tasks in Context

25:24: Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification

26:50: Understanding Inhibition Through Maximally Tense Images

27:52: Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models

29:19: Deep Learning to Predict Glaucoma Progression using Structural Changes in the Eye

30:58: Which Backbone to Use: A Resource-efficient Domain Specific Comparison for Computer Vision

32:32: Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval

34:11: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

35:35: Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion