TechcraftingAI NLP

Brad Edwards

271 episodes

6 days ago

TechcraftingAI NLP brings you daily summaries of the latest arXiv Computation and Language research.

Technology

All content for TechcraftingAI NLP is the property of Brad Edwards and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.

Episodes (20/271)

Ep. 263 - Part 2 - June 13, 2024

ArXiv NLP research for Thursday, June 13, 2024.

00:20: Chain-of-Thought (CoT) prompting strategies for medical error detection and correction

01:31: CoastTerm: a Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature

02:52: RH-SQL: Refined Schema and Hardness Prompt for Text-to-SQL

04:01: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

05:24: Leveraging Explicit Reasoning for Inference Integration in Commonsense-Augmented Dialogue Models

06:38: Investigating the translation capabilities of Large Language Models trained on parallel data only

07:56: LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks

09:09: DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation

11:20: Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

12:46: Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations

13:53: Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn't

14:47: ReadCtrl: Personalizing text generation with readability-controlled instruction learning

16:32: Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models

17:49: Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs

19:18: End-to-end Streaming model for Low-Latency Speech Anonymization

20:22: Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

22:25: On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models

23:33: Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models

24:35: Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech

25:47: AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

27:15: Transformers meet Neural Algorithmic Reasoners

28:32: REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space

30:02: Learning from Natural Language Explanations for Generalizable Entity Matching

31:14: ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models

32:29: DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding

33:43: Improving Autoregressive Training with Dynamic Oracles

1 year ago

34 minutes 59 seconds

Ep. 263 - Part 1 - June 13, 2024

ArXiv NLP research for Thursday, June 13, 2024.

00:20: Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning

01:53: Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models

03:26: Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory

04:33: Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination

06:05: DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage

07:26: Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning

08:41: ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions

10:07: An Approach to Build Zero-Shot Slot-Filling System for Industry-Grade Conversational Assistants

11:42: Plan, Generate and Complicate: Improving Low-resource Dialogue State Tracking via Easy-to-Difficult Zero-shot Data Augmentation

12:42: No perspective, no perception!! Perspective-aware Healthcare Answer Summarization

14:28: Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models

16:02: An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios

17:21: Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors

18:48: Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning

19:52: Word Order in English-Japanese Simultaneous Interpretation: Analyses and Evaluation using Chunk-wise Monotonic Translation

21:12: Multi-Agent Software Development through Cross-Team Collaboration

22:55: LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models

24:14: Bayesian Statistical Modeling with Predictors from LLMs

25:39: ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models

27:28: Language Models are Crossword Solvers

28:32: MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning

29:51: CUDRT: Benchmarking the Detection of Human vs. Large Language Models Generated Texts

31:29: Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?

32:59: 3M: Multi-modal Multi-task Multi-teacher Learning for Game Event Detection

34:08: Modeling Comparative Logical Relation with Contrastive Learning for Text Generation

35:42: SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models

1 year ago

37 minutes 40 seconds

Ep. 262 - June 12, 2024

ArXiv NLP research for Wednesday, June 12, 2024.

00:19: VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

02:05: BookSQL: A Large Scale Text-to-SQL Dataset for Accounting Domain

03:15: Designing a Dashboard for Transparency and Control of Conversational AI

04:46: Label-aware Hard Negative Sampling Strategies with Momentum Contrastive Learning for Implicit Hate Speech Detection

05:51: Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions

06:53: Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations

07:52: Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

08:55: DeTriever: Decoder-representation-based Retriever for Improving NL2SQL In-Context Learning

10:20: Automated Information Extraction from Thyroid Operation Narrative: A Comparative Study of GPT-4 and Fine-tuned KoELECTRA

11:35: Large Language Model Unlearning via Embedding-Corrupted Prompts

13:17: Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation

14:46: Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling

16:02: LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

17:18: Guiding In-Context Learning of LLMs through Quality Estimation for Machine Translation

18:37: It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF

20:02: Adversarial Evasion Attack Efficiency against Large Language Models

21:06: Learning Job Title Representation from Job Description Aggregation Network

21:59: Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey

23:35: AustroTox: A Dataset for Target-Based Austrian German Offensive Language Detection

24:38: Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation

25:56: Multimodal Table Understanding

27:20: CoXQL: A Dataset for Parsing Explanation Requests in Conversational XAI Systems

28:51: Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling

30:36: Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets

31:57: Semi-Supervised Spoken Language Glossification

33:16: Underneath the Numbers: Quantitative and Qualitative Gender Fairness in LLMs for Depression Prediction

34:37: A Dialogue Game for Eliciting Balanced Collaboration

35:23: Transformer-based Model for ASR N-Best Rescoring and Rewriting

36:16: SumHiS: Extractive Summarization Exploiting Hidden Structure

36:53: Figuratively Speaking: Authorship Attribution via Multi-Task Figurative Language Modeling

38:08: Leveraging Large Language Models for Web Scraping

39:51: M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation

41:15: Is Programming by Example solved by LLMs?

42:29: Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques

43:42: Towards Unsupervised Speech Recognition Without Pronunciation Models

44:50: cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers

45:57: Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models

47:02: Tailoring Generative AI Chatbots for Multiethnic Communities in Disaster Preparedness Communication: Extending the CASA Paradigm

48:12: Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL

49:56: TasTe: Teaching Large Language Models to Translate through Self-Reflection

51:28: OLMES: A Standard for Language Model Evaluations

52:47: Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

1 year ago

54 minutes 55 seconds

Ep. 261 - Part 2 - June 11, 2024

ArXiv NLP research for Tuesday, June 11, 2024.

00:20: Scientific Computing with Large Language Models

01:08: Speaking Your Language: Spatial Relationships in Interpretable Emergent Communication

02:19: Bilingual Sexism Classification: Fine-Tuned XLM-RoBERTa and GPT-3.5 Few-Shot Learning

03:51: Fine-tuning with HED-IT: The impact of human post-editing for dialogical language models

05:26: Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?

07:03: Joint Learning of Context and Feedback Embeddings in Spoken Dialogue

07:57: BertaQA: How Much Do Language Models Know About Local Culture?

09:17: MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting

10:20: CTC-based Non-autoregressive Textless Speech-to-Speech Translation

11:21: Toxic Memes: A Survey of Computational Perspectives on the Detection and Explanation of Meme Toxicities

13:27: GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews

14:40: BvSP: Broad-view Soft Prompting for Few-Shot Aspect Sentiment Quad Prediction

16:32: When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

18:01: Limited Out-of-Context Knowledge Reasoning in Large Language Models

19:36: MINERS: Multilingual Language Models as Semantic Retrievers

20:42: Learning Domain-Invariant Features for Out-of-Context News Detection

22:03: Textual Similarity as a Key Metric in Machine Translation Quality Estimation

23:02: On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations

24:31: Multimodal Belief Prediction

25:29: Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing

26:56: Paraphrasing in Affirmative Terms Improves Negation Understanding

27:37: CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization

29:38: TextGrad: Automatic "Differentiation" via Text

31:35: Just Because We Camp, Doesn't Mean We Should: The Ethics of Modelling Queer Voices

32:35: THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report

33:51: Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

35:22: Simple and Effective Masked Diffusion Language Models

36:35: Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena

1 year ago

38 minutes 36 seconds

Ep. 261 - Part 1 - June 11, 2024

ArXiv NLP research for Tuesday, June 11, 2024.

00:20: A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation

01:41: Post-Hoc Answer Attribution for Grounded and Trustworthy Long Document Comprehension: Task, Insights, and Challenges

02:32: A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation

04:08: Evolving Subnetwork Training for Large Language Models

05:31: Missingness-resilient Video-enhanced Multimodal Disfluency Detection

06:37: Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models

08:14: Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference

09:33: Delving into ChatGPT usage in academic writing through excess vocabulary

10:53: Paying More Attention to Source Context: Mitigating Unfaithful Translations from Large Language Model

12:12: CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation

13:26: Effectively Compress KV Heads for LLM

15:00: Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study

16:54: Reading Miscue Detection in Primary School through Automatic Speech Recognition

18:09: HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation

20:01: DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs

21:15: Efficiently Exploring Large Language Models for Document-Level Machine Translation with In-context Learning

22:35: Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees

24:42: Translating speech with just images

25:35: Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement

26:51: Teaching Language Models to Self-Improve by Learning from Language Feedback

28:25: Merging Improves Self-Critique Against Jailbreak Attacks

29:18: Towards Human-AI Collaboration in Healthcare: Guided Deferral Systems with Large Language Models

30:11: Improving Autoformalization using Type Checking

31:37: Improving Commonsense Bias Classification by Mitigating the Influence of Demographic Terms

33:19: Decipherment-Aware Multilingual Learning in Jointly Trained Language Models

34:20: DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms

35:20: On the Hallucination in Simultaneous Machine Translation

36:07: MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs

37:42: Scholarly Question Answering using Large Language Models in the NFDI4DataScience Gateway

1 year ago

38 minutes 47 seconds

Ep. 260 - June 10, 2024

ArXiv NLP research for Monday, June 10, 2024.

00:19: Shoulders of Giants: A Look at the Degree and Utility of Openness in NLP Research

00:59: HOLMES: Hyper-Relational Knowledge Graphs for Multi-hop Question Answering using LLMs

02:29: The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models

03:24: MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models

04:51: A Multidimensional Framework for Evaluating Lexical Semantic Change with Social Science Applications

05:49: Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text

07:10: Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval

09:08: Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

10:35: Enhancing Long-Term Memory using Hierarchical Aggregate Tree for Retrieval Augmented Generation

11:26: Verifiable Generation with Subsentence-Level Fine-Grained Citations

12:36: Comparing Data Augmentation Methods for End-to-End Task-Oriented Dialog Systems

13:55: Building Bridges: A Dataset for Evaluating Gender-Fair Machine Translation into German

15:28: Can I understand what I create? Self-Knowledge Evaluation of Large Language Models

16:28: Language Models Resist Alignment

17:58: LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages

19:27: Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning

20:27: Combining Embeddings and Domain Knowledge for Job Posting Duplicate Detection

21:37: MaskLID: Code-Switching Language Identification through Iterative Masking

22:49: Multi-Prompting Decoder Helps Better Language Understanding

24:22: Tx-LLM: A Large Language Model for Therapeutics

26:21: Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching

27:43: A Parameter-efficient Language Extension Framework for Multilingual ASR

29:06: MedExQA: Medical Question Answering Benchmark with Multiple Explanations

30:36: Sustained Vowels for Pre- vs Post-Treatment COPD Classification

31:49: MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows

33:40: Symmetric Dot-Product Attention for Efficient Training of BERT Language Models

35:00: Annotation alignment: Comparing LLM and human annotations of conversational safety

36:07: mHuBERT-147: A Compact Multilingual HuBERT Model

37:27: Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue

39:00: INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition

40:06: Meta Learning Text-to-Speech Synthesis in over 7000 Languages

40:59: Controlling Emotion in Text-to-Speech with Natural Language Prompts

41:55: Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain

43:29: Multimodal Contextualized Semantic Parsing from Speech

44:25: Interpretability of Language Models via Task Spaces

45:45: Evaluating the Retrieval Component in LLM-Based Question Answering Systems

46:52: Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies

48:08: Can Language Models Serve as Text-Based World Simulators?

1 year ago

49 minutes 27 seconds

Ep. 259 - June 9, 2024

ArXiv NLP research for Sunday, June 09, 2024.

00:19: How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States

01:40: DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation

03:25: Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses

05:08: MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations

06:17: SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models

08:11: Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions

09:54: MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation

11:20: QGEval: A Benchmark for Question Generation Evaluation

12:44: MrRank: Improving Question Answering Retrieval System through Multi-Result Ranking Model

13:43: Arabic Diacritics in the Wild: Exploiting Opportunities for Improved Diacritization

14:46: The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

16:30: RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation

18:14: Hidden Holes: topological aspects of language models

19:46: Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper

20:40: Seventeenth-Century Spanish American Notary Records for Fine-Tuning Spanish Large Language Models

22:02: MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering

23:12: II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

25:17: Zero-Shot End-To-End Spoken Question Answering In Medical Domain

26:27: Are Large Language Models Actually Good at Text Style Transfer?

27:32: Feriji: A French-Zarma Parallel Corpus, Glossary & Translator

28:56: TTM-RE: Memory-Augmented Document-Level Relation Extraction

30:12: Why Don't Prompt-Based Fairness Metrics Correlate?

31:27: Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

33:12: Semisupervised Neural Proto-Language Reconstruction

34:12: Prompting Large Language Models with Audio for General-Purpose Speech Summarization

35:14: A Dual-View Approach to Classifying Radiology Reports by Co-Training

36:07: ThaiCoref: Thai Coreference Resolution Dataset

1 year ago

37 minutes 33 seconds

Ep. 258 - June 8, 2024

ArXiv NLP research for Saturday, June 08, 2024.

00:19: MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention

01:44: Toward Reliable Ad-hoc Scientific Information Extraction: A Case Study on Two Materials Datasets

02:30: Flexible and Adaptable Summarization via Expertise Separation

04:18: Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization

06:07: CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation

07:23: Venn Diagram Prompting: Accelerating Comprehension with Scaffolding Effect

08:45: VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

10:19: Planning Like Human: A Dual-process Framework for Dialogue Planning

11:48: Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas

12:57: Recent advancements in computational morphology: A comprehensive survey

14:01: MaTableGPT: GPT-based Table Data Extractor from Materials Science Literature

15:41: Design of reliable technology valuation model with calibrated machine learning of patent indicators

17:08: Fighting Against the Repetitive Training and Sample Dependency Problem in Few-shot Named Entity Recognition

18:59: Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation

20:25: Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities

21:47: ThatiAR: Subjectivity Detection in Arabic News Sentences

23:07: Do LLMs Recognize me, When I is not me: Assessment of LLMs Understanding of Turkish Indexical Pronouns in Indexical Shift Contexts

24:49: Creativity Has Left the Chat: The Price of Debiasing Language Models

25:57: CERET: Cost-Effective Extrinsic Refinement for Text Generation

27:05: GrowOVER: How Can LLMs Adapt to Growing Real-World Knowledge?

28:07: Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

29:03: ATLAS: Improving Lay Summarisation with Attribute-based Control

1 year ago

30 minutes 15 seconds

Ep. 257 - June 7, 2024

ArXiv NLP research for Friday, June 07, 2024.

00:19: Key-Element-Informed sLLM Tuning for Document Summarization

01:22: Low-Resource Cross-Lingual Summarization through Few-Shot Learning with Large Language Models

02:42: Large Language Model-guided Document Selection

04:13: More Victories, Less Cooperation: Assessing Cicero's Diplomacy Play

05:24: DiNeR: a Large Realistic Dataset for Evaluating Compositional Generalization

06:43: MATTER: Memory-Augmented Transformer Using Heterogeneous Knowledge Sources

08:01: Mixture-of-Agents Enhances Large Language Model Capabilities

09:09: AICoderEval: Improving AI Domain Code Generation of Large Language Models

11:00: CRAG -- Comprehensive RAG Benchmark

13:04: CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models

14:52: Think out Loud: Emotion Deducing Explanation in Dialogues

16:43: WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

18:46: SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals

19:58: BERTs are Generative In-Context Learners

20:43: Annotating FrameNet via Structure-Conditioned Language Generation

21:49: Revisiting Catastrophic Forgetting in Large Language Model Tuning

22:43: FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models

24:33: Do Language Models Exhibit Human-like Structural Priming Effects?

25:27: Uncertainty Aware Learning for Language Model Alignment

26:50: The Russian Legislative Corpus

27:24: ComplexTempQA: A Large-Scale Dataset for Complex Temporal Question Answering

28:53: HateDebias: On the Diversity and Variability of Hate Speech Debiasing

30:29: A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques

32:00: Sexism Detection on a Data Diet

33:18: XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model

34:21: Through the Thicket: A Study of Number-Oriented LLMs derived from Random Forest Models

35:32: LLM-based speaker diarization correction: A generalizable approach

36:52: TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models

38:10: BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense

39:10: Quantifying Geospatial in the Common Crawl Corpus

40:14: MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter

41:47: Language models emulate certain cognitive profiles: An investigation of how predictability measures interact with individual differences

43:19: Compositional Generalization with Grounded Language Models

44:26: Scenarios and Approaches for Situated Natural Language Explanations

46:04: Are Large Language Models More Empathetic than Humans?

47:38: SUMIE: A Synthetic Benchmark for Incremental Entity Summarization

48:52: Multi-Head RAG: Solving Multi-Aspect Problems with LLMs

50:33: An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models

1 year ago

52 minutes 15 seconds

Ep. 256 - Part 2 - June 6, 2024

ArXiv NLP research for Thursday, June 06, 2024.

00:20: The syntax-semantics interface in a child's path: A study of 3- to 11-year-olds' elicited production of Mandarin recursive relative clauses

02:17: Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models

03:39: Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster

04:36: Intention and Face in Dialog

05:48: Uncovering Limitations of Large Language Models in Information Seeking from Tables

07:15: Are We Done with MMLU?

08:41: Legal Judgment Reimagined: PredEx and the Rise of Intelligent AI Interpretation in Indian Courts

09:53: Do Language Models Understand Morality? Towards a Robust Detection of Moral Content

11:47: Every Answer Matters: Evaluating Commonsense with Probabilistic Measures

12:49: Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness

14:26: Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness

15:35: Confabulation: The Surprising Value of Large Language Model Hallucinations

16:42: DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning

18:25: Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model

19:32: ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models

20:50: mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans

22:21: What Do Language Models Learn in Context? The Structured Task Hypothesis

23:38: Rethinking LLM and Linguistic Steganalysis: An Efficient Detection of Strongly Concealed Stego

24:58: BEADs: Bias Evaluation Across Domains

26:41: FairytaleQA Translated: Enabling Educational Question and Answer Generation in Less-Resourced Languages

28:03: Benchmark Data Contamination of Large Language Models: A Survey

29:02: Transformers need glasses! Information over-squashing in language tasks

30:26: Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

31:58: Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People

33:44: ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions

35:19: What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages

36:41: PaCE: Parsimonious Concept Engineering for Large Language Models

1 year ago

38 minutes 43 seconds

Ep. 256 - Part 1 - June 6, 2024

ArXiv NLP research for Thursday, June 06, 2024.

00:20: Efficient Knowledge Infusion via KG-LLM Alignment

01:25: NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human

02:34: Character-Level Chinese Dependency Parsing via Modeling Latent Intra-Word Structure

03:30: XL-HeadTags: Leveraging Multimodal Retrieval Augmentation for the Multilingual Generation of News Headlines and Tags

04:59: End-to-End Trainable Soft Retriever for Low-resource Relation Extraction

06:07: Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning

07:37: Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores

08:52: ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search

10:29: Chaos with Keywords: Exposing Large Language Models Sycophancy to Misleading Keywords and Evaluating Defense Strategies

11:39: Lean Workbook: A large-scale Lean problem set formalized from natural language math problems

12:56: Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism

14:18: Performance of large language models in numerical vs. semantic medical knowledge: Benchmarking on evidence-based Q&As

16:24: Recovering document annotations for sentence-level bitext

17:40: BLSP-Emo: Towards Empathetic Large Speech-Language Models

19:01: Decoder-only Streaming Transformer for Simultaneous Translation

20:28: Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation

21:53: Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models

23:06: How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?

24:13: HeSum: a Novel Dataset for Abstractive Text Summarization in Hebrew

25:19: ArMeme: Propagandistic Content in Arabic Memes

26:26: Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art

27:11: UltraMedical: Building Specialized Generalists in Biomedicine

28:43: Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech

30:02: A + B: A General Generator-Reader Framework for Optimizing LLMs to Unleash Synergy Potential

31:29: On The Persona-based Summarization of Domain-Specific Documents

33:14: Assessing LLMs for Zero-shot Abstractive Summarization Through the Lens of Relevance Paraphrasing

34:28: American Sign Language Handshapes Reflect Pressures for Communicative Efficiency

1 year ago

35 minutes 43 seconds

Ep. 255 - June 5, 2024

ArXiv NLP research for Wednesday, June 05, 2024.

00:19: Improving In-Context Learning with Prediction Feedback for Sentiment Analysis

01:24: MultifacetEval: Multifaceted Evaluation to Probe LLMs in Mastering Medical Knowledge

03:01: Text Injection for Neural Contextual Biasing

04:16: 4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders

06:03: Adversarial Moment-Matching Distillation of Large Language Models

07:05: Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models

08:48: Readability-guided Idiom-aware Sentence Simplification (RISS) for Chinese

09:56: Evaluation of data inconsistency for multi-modal sentiment analysis

10:55: BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents

12:11: Unveiling Selection Biases: Exploring Order and Token Sensitivity in Large Language Models

13:16: From Tarzan to Tolkien: Controlling the Language Proficiency Level of LLMs for Content Generation

14:20: StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

15:42: RadBARTsum: Domain Specific Adaption of Denoising Sequence-to-Sequence Models for Abstractive Radiology Report Summarization

17:00: Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

18:14: Cryptocurrency Frauds for Dummies: How ChatGPT introduces us to fraud?

19:48: FragRel: Exploiting Fragment-level Relations in the External Memory of Large Language Models

20:59: Space Decomposition for Sentence Embedding

22:00: Towards Real-world Scenario: Imbalanced New Intent Discovery

23:40: Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation

25:20: CSS: Contrastive Semantic Similarity for Uncertainty Quantification of LLMs

27:03: StatBot.Swiss: Bilingual Open Data Exploration in Natural Language

28:10: Missci: Reconstructing Fallacies in Misrepresented Science

29:43: ChatLang-8: An LLM-Based Synthetic Data Generation Framework for Grammatical Error Correction

30:47: Linking Named Entities in Diderot's Encyclopédie to Wikidata

32:06: Error-preserving Automatic Speech Recognition of Young English Learners' Language

33:37: Document-level Claim Extraction and Decontextualisation for Fact-Checking

34:45: The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches

36:09: LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback

37:39: IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models

39:46: Automating Turkish Educational Quiz Generation Using Large Language Models

41:34: Cycles of Thought: Measuring LLM Confidence through Stable Explanations

42:57: Are language models rational? The case of coherence norms and belief revision

43:58: What is the Best Way for ChatGPT to Translate Poetry?

45:20: Using Synchronic Definitions and Semantic Relations to Classify Semantic Change Types

46:14: MODABS: Multi-Objective Learning for Dynamic Aspect-Based Summarization

47:09: BIPED: Pedagogically Informed Tutoring System for ESL Education

48:24: Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends

50:00: Wings: Learning Multimodal LLMs without Text-only Forgetting

1 year ago

51 minutes 51 seconds

Ep. 254 - Part 2 - June 4, 2024

ArXiv NLP research for Tuesday, June 04, 2024.

00:20: Description Boosting for Zero-Shot Entity and Relation Classification

01:44: Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning

03:09: Enhancing Retrieval-Augmented LMs with a Two-stage Consistency Learning Compressor

04:30: Prompting Large Language Models with Human Error Markings for Self-Correcting Machine Translation

05:41: mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models

06:53: Technical Language Processing for Telecommunications Specifications

08:09: On Affine Homotopy between Language Encoders

09:25: Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering

10:32: Probing the Category of Verbal Aspect in Transformer Language Models

11:58: Linguistic Fingerprint in Transformer Models: How Language Variation Influences Parameter Selection in Irony Detection

13:03: LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing

14:33: Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs

15:51: On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept

17:30: Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data

19:08: The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding

20:07: Representations as Language: An Information-Theoretic Framework for Interpretability

21:32: Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding

22:46: Hiding Text in Large Language Models: Introducing Unconditional Token Forcing Confusion

24:21: Language-Universal Speech Attributes Modeling for Zero-Shot Multilingual Spoken Keyword Recognition

25:37: Deterministic Reversible Data Augmentation for Neural Machine Translation

26:39: CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks

28:14: Scalable MatMul-free Language Modeling

30:03: SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices

31:37: Mitigate Position Bias in Large Language Models via Scaling a Single Dimension

33:10: TopViewRS: Vision-Language Models as Top-View Spatial Reasoners

1 year ago

35 minutes

Ep. 254 - Part 1 - June 4, 2024

ArXiv NLP research for Tuesday, June 04, 2024.

00:20: Conditional Language Learning with Context

01:13: Zyda: A 1.3T Dataset for Open Language Modeling

02:32: RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models

03:50: Personalized Topic Selection Model for Topic-Grounded Dialogue

05:20: Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue

06:58: Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis

08:03: Why Would You Suggest That? Human Trust in Language Model Responses

09:10: Multimodal Reasoning with Multimodal Knowledge Graph

10:30: QROA: A Black-Box Query-Response Optimization Attack on LLMs

11:55: Analyzing Social Biases in Japanese Large Language Models

12:52: I've got the "Answer"! Interpretation of LLMs Hidden States in Question Answering

13:47: PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

15:16: Assessing the Performance of Chinese Open Source Large Language Models in Information Extraction Tasks

16:38: LongSSM: On the Length Extension of State-space Models in Language Modelling

17:30: Exploring Mathematical Extrapolation of Large Language Models with Synthetic Data

18:40: MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset

20:19: UniOQA: A Unified Framework for Knowledge Graph Question Answering with Large Language Models

22:03: Diver: Large Language Model Decoding with Span-Level Mutual Information Verification

23:12: SimulTron: On-Device Simultaneous Speech to Speech Translation

24:28: The current status of large language models in summarizing radiology report impressions

26:10: Reinforcement Tuning for Detecting Stances and Debunking Rumors Jointly with Large Language Models

27:17: Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Language Models

28:46: A multilingual dataset for offensive language and hate speech detection for Hausa, Yoruba and Igbo languages

29:40: FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models

31:17: Self-Modifying State Modeling for Simultaneous Machine Translation

1 year ago

33 minutes 7 seconds

Ep. 253 - June 3, 2024

ArXiv NLP research for Monday, June 03, 2024.

00:19: Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost

01:38: Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer

03:06: Selectively Answering Visual Questions

04:11: Take its Essence, Discard its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect

05:36: Predicting Drug-Gene Relations via Analogy Tasks with Word Embeddings

06:51: SemCoder: Training Code Language Models with Comprehensive Semantics

08:39: Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

10:26: Combining Qualitative and Computational Approaches for Literary Analysis of Finnish Novels

11:45: Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors

13:26: Decompose, Enrich, and Extract! Schema-aware Event Extraction using LLMs

14:34: MACT: Model-Agnostic Cross-Lingual Training for Discourse Representation Structure Parsing

15:48: Guiding ChatGPT to Generate Salient Domain Summaries

17:51: Synergizing Unsupervised and Supervised Learning: A Hybrid Approach for Accurate Natural Language Task Modeling

19:30: TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine

21:38: Explore then Determine: A GNN-LLM Synergy Framework for Reasoning over Knowledge Graph

22:51: Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization

24:08: Are AI-Generated Text Detectors Robust to Adversarial Perturbations?

25:42: Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression

26:35: Improving Pseudo Labels with Global-Local Denoising Framework for Cross-lingual Named Entity Recognition

28:01: Demonstration Augmentation for Zero-shot In-context Learning

29:31: EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs

31:05: Towards Scalable Automated Alignment of LLMs: A Survey

32:19: EduNLP: Towards a Unified and Modularized Library for Educational Resources

33:44: Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification

35:07: Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses

36:36: When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs

37:58: CodeR: Issue Resolving with Multi-Agent and Task Graphs

38:54: Unsupervised Distractor Generation via Large Language Model Distilling and Counterfactual Contrastive Decoding

40:10: FactGenius: Combining Zero-Shot Prompting and Fuzzy Relation Mining to Improve Fact Verification with Knowledge Graphs

41:27: Probing Language Models for Pre-training Data Detection

42:45: R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

44:32: Privacy in LLM-based Recommendation: Recent Advances and Future Directions

45:23: Linguistic Analysis, Description, and Typological Exploration with Categorial Grammar (TheBench Guide)

46:52: D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

48:52: Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function

50:07: Sparsity-Accelerated Training for Large Language Models

51:36: Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study

53:34: Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of Knowledge Editing in Large Language Models

54:42: LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation

55:55: Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach

57:10: Understanding Token Probability Encoding in Output Embeddings

1 year ago

1 hour 10 minutes 37 seconds

Ep. 252 - June 2, 2024

ArXiv NLP research for Sunday, June 02, 2024.

00:19: Prompt Framework for Role-playing: Generation and Evaluation

01:05: Transforming Computer Security and Public Trust Through the Exploration of Fine-Tuning Large Language Models

02:18: Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback

03:54: Presence or Absence: Are Unknown Word Usages in Dictionaries?

05:09: Topic Modeling for Short Texts with Large Language Models

06:09: How well do distributed representations convey contextual lexical semantics: a Thesis Proposal

07:05: Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction

08:27: Automatic Instruction Evolving for Large Language Models

09:25: Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation

10:26: Developing an efficient corpus using Ensemble Data cleaning approach

11:51: BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling

13:15: FOCUS: Forging Originality through Contrastive Use in Self-Plagiarism for Language Models

14:51: The Power of Summary-Source Alignments

16:11: Formality Style Transfer in Persian

17:39: Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

19:08: YODAS: Youtube-Oriented Dataset for Audio and Speech

20:13: MEDIQ: Question-Asking LLMs for Adaptive and Reliable Medical Reasoning

22:15: A Survey of Useful LLM Evaluation

23:31: Unveil the Duality of Retrieval-Augmented Generation: Theoretical Analysis and Practical Solution

25:07: Annotation Guidelines-Based Knowledge Augmentation: Towards Enhancing Large Language Models for Educational Text Classification

27:18: Using RL to Identify Divisive Perspectives Improves LLMs Abilities to Identify Communities on Social Media

1 year ago

28 minutes 30 seconds

Ep. 251 - June 1, 2024

ArXiv NLP research for Saturday, June 01, 2024.

00:19: Multi-Dimensional Optimization for Text Summarization via Reinforcement Learning

01:41: CASE: Curricular Data Pre-training for Building Generative and Discriminative Assistive Psychology Expert Models

03:25: Beyond Metrics: Evaluating LLMs' Effectiveness in Culturally Nuanced, Low-Resource Real-World Scenarios

05:03: RoBERTa-BiLSTM: A Context-Aware Hybrid Model for Sentiment Analysis

07:09: The Best of Both Worlds: Toward an Honest and Helpful Large Language Model

09:02: Gender Bias Detection in Court Decisions: A Brazilian Case Study

10:41: Prompt Chaining or Stepwise Prompt? Refinement in Text Summarization

11:54: A Survey on Large Language Models for Code Generation

13:43: Guiding and Diversifying LLM-Based Story Generation via Answer Set Programming

14:46: SPAGHETTI: Open-Domain Question Answering from Heterogeneous Data Sources with Retrieval and Semantic Parsing

15:43: LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models

17:24: LLMs Could Autonomously Learn Without External Supervision

1 year ago

18 minutes 51 seconds

Ep. 250 - May 31, 2024

ArXiv NLP research summaries for May 31, 2024.

00:20: FineRadScore: A Radiology Report Line-by-Line Evaluation Technique Generating Corrections with Severity Scores

01:37: Leveraging Large Language Models for Entity Matching

02:27: Reward-based Input Construction for Cross-document Relation Extraction

03:40: Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models

05:04: DORY: Deliberative Prompt Recovery for LLM

06:18: Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement

07:35: It is Simple Sometimes: A Study On Improving Aspect-Based Sentiment Analysis Performance

08:59: FinGen: A Dataset for Argument Generation in Finance

09:42: Improving code-mixed hate detection by native sample mixing: A case study for Hindi-English code-mixed scenario

11:26: Multilingual Text Style Transfer: Datasets & Models for Indian Languages

13:01: An iterated learning model of language change that mixes supervised and unsupervised learning

14:01: Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment

15:29: That's Optional: A Contemporary Exploration of "that" Omission in English Subordinate Clauses

16:18: Don't Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models

17:20: Improving Reward Models with Synthetic Critiques

18:29: Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning

19:49: clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents

21:05: A comparison of correspondence analysis with PMI-based word embedding methods

22:05: Large Language Models: A New Approach for Privacy Policy Analysis at Scale

23:36: Preemptive Answer "Attacks" on Chain-of-Thought Reasoning

24:22: Learning to Estimate System Specifications in Linear Temporal Logic using Transformers and Mamba

25:48: OR-Bench: An Over-Refusal Benchmark for Large Language Models

27:20: Superlatives in Context: Explicit and Implicit Domain Restrictions for Superlative Frames

28:41: SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales

30:33: Towards a Fluid computer

31:33: You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet

33:01: LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models

35:02: Direct Alignment of Language Models via Quality-Aware Self-Refinement

36:19: Code Pretraining Improves Entity Tracking Abilities of Language Models

1 year ago

37 minutes 23 seconds

Ep. 249 - May 30, 2024

ArXiv NLP research summaries for May 30, 2024.

1 year ago

1 hour 2 minutes 33 seconds

Ep. 248 - May 29, 2024

ArXiv NLP research summaries for May 29, 2024.

1 year ago

43 minutes