Epikurious
Alejandro Santamaria Arza
15 episodes
6 days ago
Cravings of knowledge around tech, AI and the mind
Tech News
News
From Bias to Balance: Navigating LLM Evaluations
Epikurious
17 minutes 29 seconds
6 months ago

This research paper explores the challenges of evaluating Large Language Model (LLM) outputs and introduces EvalGen, a new interface designed to improve the alignment between LLM-generated evaluations and human preferences. EvalGen uses a mixed-initiative approach, combining automated LLM assistance with human feedback to generate and refine evaluation criteria and assertions. The study highlights a phenomenon called "criteria drift," where the process of grading outputs helps users define and refine their evaluation criteria. A qualitative user study demonstrates overall support for EvalGen, but also reveals complexities in aligning automated evaluations with human judgment, particularly regarding the subjective nature of evaluation and the iterative process of alignment. The authors conclude by discussing implications for future LLM evaluation assistants.
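A rough illustration of the assertion-alignment idea summarized above, sketched in Python: candidate assertions (simple hypothetical checks, not anything from the paper) are kept only when their pass/fail verdicts agree often enough with the grades a human assigned to a small sample of outputs. Function and variable names here are illustrative assumptions, not EvalGen's actual implementation.

# Minimal sketch (assumed names, not EvalGen's real code): keep candidate
# assertions whose verdicts agree with human grades on a labelled sample.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Assertion:
    criterion: str                 # natural-language criterion the assertion operationalizes
    check: Callable[[str], bool]   # candidate pass/fail check over one LLM output


def alignment(a: Assertion, outputs: List[str], grades: Dict[int, bool]) -> float:
    """Fraction of human-graded outputs on which the assertion agrees with the grader."""
    agree = sum(a.check(outputs[i]) == g for i, g in grades.items())
    return agree / max(len(grades), 1)


def select_assertions(candidates: List[Assertion], outputs: List[str],
                      grades: Dict[int, bool], threshold: float = 0.8) -> List[Assertion]:
    """Keep only assertions that agree with the human grades often enough."""
    return [a for a in candidates if alignment(a, outputs, grades) >= threshold]


# Toy usage: the user hand-grades three outputs against a "concise" criterion,
# and the poorly aligned keyword-based assertion is filtered out.
outputs = ["Short answer.", "A very long rambling answer " * 20, "Medium length reply here."]
grades = {0: True, 1: False, 2: True}          # output index -> human pass/fail
candidates = [
    Assertion("concise", lambda o: len(o) < 100),
    Assertion("concise", lambda o: "answer" in o.lower()),  # misaligned on purpose
]
for a in select_assertions(candidates, outputs, grades):
    print(a.criterion, alignment(a, outputs, grades))

In the paper's mixed-initiative setup, the criteria and candidate checks are themselves proposed with LLM assistance and revised as users grade more outputs (the "criteria drift" noted in the study); the threshold-based selection above only sketches the alignment step.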

