Ep. 246 - Part 3 - June 12, 2024

TechcraftingAI Computer Vision

Brad Edwards

TechcraftingAI Computer Vision brings you summaries of the latest arXiv research daily. Research is read by your virtual host, Sage. The podcast is produced by Brad Edwards, an AI Engineer from Vancouver, BC, and a graduate student of computer science studying AI at the University of York. Thank you to arXiv for use of its open access interoperability.

Technology

43 minutes 50 seconds

ArXiv Computer Vision research for Wednesday, June 12, 2024.

00:20: From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition

02:09: APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

03:57: 2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction

05:47: DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor

06:58: Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze

08:02: LaneCPP: Continuous 3D Lane Detection using Physical Priors

09:23: FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

11:10: VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

12:46: MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

14:39: OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

16:49: AWGUNET: Attention-Aided Wavelet Guided U-Net for Nuclei Segmentation in Histopathology Images

18:15: Diffusion Soup: Model Merging for Text-to-Image Diffusion Models

19:58: Coherent Optical Modems for Full-Wavefield Lidar

21:32: Transformation-Dependent Adversarial Attacks

22:45: PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement

24:10: GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

25:57: ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery

27:26: Self-supervised Learning of Neural Implicit Feature Fields for Camera Pose Refinement

28:51: Real2Code: Reconstruct Articulated Objects via Code Generation

30:02: Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models

31:42: RMem: Restricted Memory Banks Improve Video Object Segmentation

33:12: What If We Recaption Billions of Web Images with LLaMA-3?

34:42: Real3D: Scaling Up Large Reconstruction Models with Real-World Images

36:07: Enhancing End-to-End Autonomous Driving with Latent World Model

37:12: Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation

38:43: On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models

40:16: Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

42:15: ICE-G: Image Conditional Editing of 3D Gaussian Splats