yay more mandatory training on AI... /s

Drikanis

aaaaaaaaaaaaaa the expert one is over 7 hours

Drikanis

"ai is the new electricity"
f right off

Drikanis

apparently the metrics used to evaluate llm-based systems don't come from anything grounded in reality. they just pass the prompt and response pairs to an llm and ask it to evaluate them. usually the llm doing the evaluation is the same one being evaluated.

so much of this feels entirely unscientific. engineers are treating llms as these magical infallible black boxes without understanding their specific strengths and limitations. it's the ultimate hammer and now literally every problem looks like a nail.

it's also very comical that the examples they are using in the lab produce incredibly generic and horribly useless responses but the llm is scoring them very high in all metrics.
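
for anyone who hasn't seen this kind of eval, the loop described above is roughly the following. rough sketch only: `JUDGE_TEMPLATE`, `judge_pairs`, `fake_llm`, the metric names, and the 1-5 scale are all made up for illustration, not what the training lab actually uses, and `fake_llm` is just a stand-in for a real model call.

```python
from typing import Callable

# hypothetical rubric prompt (assumption); the lab's real wording is unknown
JUDGE_TEMPLATE = (
    "Rate the response to the prompt on a 1-5 scale for {metric}.\n"
    "Prompt: {prompt}\nResponse: {response}\nScore:"
)


def judge_pairs(pairs, metrics, llm: Callable[[str], str]):
    """Score each (prompt, response) pair by asking an llm for a number."""
    scores = []
    for prompt, response in pairs:
        row = {}
        for metric in metrics:
            reply = llm(
                JUDGE_TEMPLATE.format(
                    metric=metric, prompt=prompt, response=response
                )
            )
            # the "metric" is whatever number the judge model says
            row[metric] = int(reply.strip())
        scores.append(row)
    return scores


def fake_llm(text: str) -> str:
    # stand-in for a real model call (assumption); always answers "5"
    return "5"


pairs = [
    ("how do i reduce build times?",
     "There are many ways to improve performance."),
]
# in the setup described above, the judge is the same model that wrote the response
result = judge_pairs(pairs, ["relevance", "helpfulness"], fake_llm)
print(result)
```

nothing in that loop ever compares a score against a human label or a known-correct answer, so a generic response can come back with top marks on every metric.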