Ep. 246 - Part 1 - June 12, 2024

TechcraftingAI Computer Vision

Brad Edwards

315 episodes

TechcraftingAI Computer Vision brings you summaries of the latest arXiv research daily. Research is read by your virtual host, Sage. The podcast is produced by Brad Edwards, an AI Engineer from Vancouver, BC, and a graduate student of computer science studying AI at the University of York. Thank you to arXiv for use of its open access interoperability.

Technology

45 minutes 45 seconds

ArXiv Computer Vision research for Wednesday, June 12, 2024.

00:20: FaithFill: Faithful Inpainting for Object Completion Using a Single Reference Image

01:21: Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

02:49: Unveiling the Power of Wavelets: A Wavelet-based Kolmogorov-Arnold Network for Hyperspectral Image Classification

04:26: Flexible Music-Conditioned Dance Generation with Style Description Prompts

05:52: Robust 3D Face Alignment with Multi-Path Neural Architecture Search

07:00: Small Scale Data-Free Knowledge Distillation

08:48: KernelWarehouse: Rethinking the Design of Dynamic Convolution

10:31: A Comprehensive Survey on Machine Learning Driven Material Defect Detection: Challenges, Solutions, and Future Prospects

12:34: Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation

14:02: IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes

14:54: Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection

16:30: DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera

18:10: Spatial-Frequency Dual Progressive Attention Network For Medical Image Segmentation

20:07: Accurate Explanation Model for Image Classifiers using Class Association Embedding

21:55: Real-world Image Dehazing with Coherence-based Label Generator and Cooperative Unfolding Network

23:11: SimSAM: Simple Siamese Representations Based Semantic Affinity Matrix for Unsupervised Image Segmentation

24:06: Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization

25:34: OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding

26:58: Generalizable Disaster Damage Assessment via Change Detection with Vision Foundation Model

28:26: Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models

29:52: Deep Learning for Slum Mapping in Remote Sensing Images: A Meta-analysis and Review

31:49: LVBench: An Extreme Long Video Understanding Benchmark

33:14: Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking

34:48: A Robust Pipeline for Classification and Detection of Bleeding Frames in Wireless Capsule Endoscopy using Swin Transformer and RT-DETR

36:23: 3D CBCT Challenge 2024: Improved Cone Beam CT Reconstruction using SwinIR-Based Sinogram and Image Enhancement

37:29: MWIRSTD: A MWIR Small Target Detection Dataset

38:34: CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

40:27: A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder

42:35: Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

44:26: Identification of Conversation Partners from Egocentric Video