Ep. 247 - Part 2 - June 13, 2024

https://is1-ssl.mzstatic.com/image/thumb/Podcasts126/v4/4a/9c/ef/4a9ceff8-5c1a-e15c-62d9-6360c52cd38a/mza_2283181023971434852.jpg/600x600bb.jpg

TechcraftingAI Computer Vision

Brad Edwards

315 episodes

12 hours ago

TechcraftingAI Computer Vision brings you summaries of the latest arXiv research daily. Research is read by your virtual host, Sage. The podcast is produced by Brad Edwards, an AI Engineer from Vancouver, BC, and a graduate student of computer science studying AI at the University of York. Thank you to arXiv for use of its open access interoperability.

Technology

RSS

All content for TechcraftingAI Computer Vision is the property of Brad Edwards and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.

Technology

https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_nologo/39305030/39305030-1703089970889-aab16cf4a6955.jpg

Ep. 247 - Part 2 - June 13, 2024

TechcraftingAI Computer Vision

53 minutes 26 seconds

1 year ago

Ep. 247 - Part 2 - June 13, 2024

ArXiv Computer Vision research for Thursday, June 13, 2024.

00:21: INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

02:11: Large-Scale Evaluation of Open-Set Image Classification Techniques

03:43: PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation

05:00: MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era

06:41: Auto-Vocabulary Segmentation for LiDAR Points

07:30: AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring

08:43: EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

10:23: Fine-Grained Domain Generalization with Feature Structuralization

12:03: SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution

14:13: ReMI: A Dataset for Reasoning with Multiple Images

15:41: A Large-scale Universal Evaluation Benchmark For Face Forgery Detection

17:26: Thoracic Surgery Video Analysis for Surgical Phase Recognition

18:58: Reducing Task Discrepancy of Text Encoders for Zero-Shot Composed Image Retrieval

20:40: Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

22:26: CLIP-Driven Cloth-Agnostic Feature Learning for Cloth-Changing Person Re-Identification

24:22: Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024

25:21: Optimizing Visual Question Answering Models for Driving: Bridging the Gap Between Human and Machine Attention Patterns

26:30: WildlifeReID-10k: Wildlife re-identification dataset with 10k individual animals

27:44: MGRQ: Post-Training Quantization For Vision Transformer With Mixed Granularity Reconstruction

29:28: Comparison Visual Instruction Tuning

30:51: MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

32:14: Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV

33:10: Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

34:33: Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

36:04: StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning

37:30: Parameter-Efficient Active Learning for Foundational models

38:31: Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

40:22: Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

42:38: Towards AI Lesion Tracking in PET/CT Imaging: A Siamese-based CNN Pipeline applied on PSMA PET/CT Scans

44:36: Memory-Efficient Sparse Pyramid Attention Networks for Whole Slide Image Analysis

46:19: Instance-level quantitative saliency in multiple sclerosis lesion segmentation

48:37: CMC-Bench: Towards a New Paradigm of Visual Signal Compression

50:05: Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs

52:05: CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models