LLMs have no model of correctness, only typicality.

Uncategorized · 15 Posts · 11 Posters
• Paul Cantrell

    LLMs have no model of correctness, only typicality. So:

    “How much does it matter if it’s wrong?”

    It’s astonishing how frequently both providers and users of LLM-based services fail to ask this basic question — which I think has a fairly obvious answer in this case, one that the research bears out.

    (Repliers, NB: Research that confirms the seemingly obvious is useful and important, and “I already knew that” is not information that anyone is interested in except you.)

    1/ https://www.404media.co/chatbots-health-medical-advice-study/

Paul Cantrell
#2

    Despite the obviousness of the larger conclusion (“LLMs don’t give accurate medical advice”), this passage is…if not surprising, exactly, at least really really interesting.

    2/

• Paul Cantrell

      Despite the obviousness of the larger conclusion (“LLMs don’t give accurate medical advice”), this passage is…if not surprising, exactly, at least really really interesting.

      2/

Paul Cantrell
#3

      There’s a lesson here, perhaps, about the tangled relationship between what is •typical• and what is •correct•, and what it is that LLMs actually do:

      When medical professionals ask medical questions in technical medical language, the answers they get are typically correct.

When non-professionals ask medical questions in a perhaps medically ill-formed vernacular mode, the answers they get are typically wrong.

      The LLM readily models both of these things. Despite having no notion of correctness in either case, correctness is more statistically typical in one than the other.

      3/
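A minimal toy sketch of that point (Python, with made-up probabilities for two hypothetical prompt registers; nothing here comes from the study): sampling only what is typical produces different accuracy depending on which register the prompt resembles.

```python
import random

# Hypothetical numbers, purely illustrative (not taken from the study):
# the fraction of training-like text in each prompt "register" whose
# answer to a given medical question happens to be correct.
P_CORRECT_GIVEN_REGISTER = {
    "clinical":   0.90,  # question phrased in technical medical language
    "vernacular": 0.40,  # question phrased in everyday, ill-formed terms
}

def typical_answer_is_correct(register: str, rng: random.Random) -> bool:
    """Draw one 'typical' answer for this register and report whether it is correct.

    The toy model has no notion of correctness; it only reproduces whatever
    is statistically typical of text resembling the prompt.
    """
    return rng.random() < P_CORRECT_GIVEN_REGISTER[register]

def accuracy(register: str, n: int = 10_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(typical_answer_is_correct(register, rng) for _ in range(n)) / n

for register in P_CORRECT_GIVEN_REGISTER:
    print(f"{register:>10}: ~{accuracy(register):.0%} of sampled answers are correct")
```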

• Paul Cantrell

        LLMs have no model of correctness, only typicality. So:

        “How much does it matter if it’s wrong?”

        It’s astonishing how frequently both providers and users of LLM-based services fail to ask this basic question — which I think has a fairly obvious answer in this case, one that the research bears out.

        (Repliers, NB: Research that confirms the seemingly obvious is useful and important, and “I already knew that” is not information that anyone is interested in except you.)

        1/ https://www.404media.co/chatbots-health-medical-advice-study/

George B
#4

        @inthehands

It has been so hard to explain that to family members who ask about LLMs. "But it's right most of the time" is one of the most common responses when I talk about how there is no internal sense of reality or truth, so they need to check every output to be sure.

• Paul Cantrell

          Despite the obviousness of the larger conclusion (“LLMs don’t give accurate medical advice”), this passage is…if not surprising, exactly, at least really really interesting.

          2/

Dave bauer
#5

          @inthehands Obvious to me. Having the same family doctor who knows you all for 20 years really is important and an immense privilege.

• Paul Cantrell

            Despite the obviousness of the larger conclusion (“LLMs don’t give accurate medical advice”), this passage is…if not surprising, exactly, at least really really interesting.

            2/

Troed Sångberg
#6

            @inthehands This is why experienced developers can make use of LLMs, and why LLMs won't replace them.

• Paul Cantrell

              There’s a lesson here, perhaps, about the tangled relationship between what is •typical• and what is •correct•, and what it is that LLMs actually do:

              When medical professionals ask medical questions in technical medical language, the answers they get are typically correct.

When non-professionals ask medical questions in a perhaps medically ill-formed vernacular mode, the answers they get are typically wrong.

              The LLM readily models both of these things. Despite having no notion of correctness in either case, correctness is more statistically typical in one than the other.

              3/

V
#7

              @inthehands This result makes sense - they generate *statistically likely* text based on a prompt, and the stolen words of basically the entire internet and several libraries worth of books.
              If the prompt is such that the text it generates is statistically-likely to be correct - the language used closely aligns with a medical textbook, diagnostic manual, etc. - it's more likely to generate text based on sources like that.
              If it sounds like a tweet, you're more likely to get a shitpost.

• V

                @inthehands This result makes sense - they generate *statistically likely* text based on a prompt, and the stolen words of basically the entire internet and several libraries worth of books.
                If the prompt is such that the text it generates is statistically-likely to be correct - the language used closely aligns with a medical textbook, diagnostic manual, etc. - it's more likely to generate text based on sources like that.
                If it sounds like a tweet, you're more likely to get a shitpost.

V
#8

                @inthehands It has no concept of what is correct, real, valuable, or meaningful - only what is statistically likely given a particular prompt.
                Which is a problem - because if you ask it a question, you need to know the correct answer, or have the means to verify it.
                Because it has no idea what the correct answer is.
                If you don't know enough to be able to verify the result, then you can't trust it.
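A minimal sketch of that constraint (Python; ask_llm is a hypothetical placeholder for whatever chatbot is in use): an answer is only usable when an independent check exists, and for most real questions there isn't one.

```python
from typing import Callable, Optional

def ask_llm(question: str) -> str:
    """Hypothetical stand-in for a chatbot call; returns whatever text the model emits."""
    return "42"  # placeholder response for the sketch

def answer_if_verifiable(question: str, check: Callable[[str], bool]) -> Optional[str]:
    """Return the model's answer only if an independent check accepts it.

    The check must not rely on the model itself; otherwise nothing has been verified.
    """
    answer = ask_llm(question)
    return answer if check(answer) else None

# Arithmetic is easy to verify without trusting the model...
print(answer_if_verifiable("What is 6 * 7?", lambda a: a.strip() == str(6 * 7)))
# ...but for questions like "is this symptom serious?" there is no such
# independent check, which is exactly the problem described above.
```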

• Paul Cantrell

                  There’s a lesson here, perhaps, about the tangled relationship between what is •typical• and what is •correct•, and what it is that LLMs actually do:

                  When medical professionals ask medical questions in technical medical language, the answers they get are typically correct.

When non-professionals ask medical questions in a perhaps medically ill-formed vernacular mode, the answers they get are typically wrong.

                  The LLM readily models both of these things. Despite having no notion of correctness in either case, correctness is more statistically typical in one than the other.

                  3/

Greg Whitehead
#9

@inthehands I continue to be well-served by treating LLMs as fancy autocomplete and not anthropomorphizing them. I feel like the chat interface is where things went sideways, making it too easy to believe that they "think".

• Paul Cantrell

                    Despite the obviousness of the larger conclusion (“LLMs don’t give accurate medical advice”), this passage is…if not surprising, exactly, at least really really interesting.

                    2/

Brian Marick
#10

                    @inthehands An aside. When people used to ask Dawn wasn’t it hard to treat animals because “they can’t tell you what’s wrong,” she’d answer that they also can’t lie about it. She thought the latter probably outweighed the former.

• Troed Sångberg

                      @inthehands This is why experienced developers can make use of LLMs, and why LLMs won't replace them.

Greg Lloyd
#11

                      @troed @inthehands

                      I see the high end #LLM experience like riding a good horse — exceptionally skilled in horsey things, moving fast, etc — an augmentation tool that’s exceptionally easy to use to augment your own abilities, not an #AI.

                      Ref 🧵https://federate.social/@Roundtrip/115549029949917075

• Paul Cantrell

                        LLMs have no model of correctness, only typicality. So:

                        “How much does it matter if it’s wrong?”

                        It’s astonishing how frequently both providers and users of LLM-based services fail to ask this basic question — which I think has a fairly obvious answer in this case, one that the research bears out.

                        (Repliers, NB: Research that confirms the seemingly obvious is useful and important, and “I already knew that” is not information that anyone is interested in except you.)

                        1/ https://www.404media.co/chatbots-health-medical-advice-study/

Tropical Chaos
#12

                        @inthehands chatbots are terrible, period.

• Paul Cantrell

                          There’s a lesson here, perhaps, about the tangled relationship between what is •typical• and what is •correct•, and what it is that LLMs actually do:

                          When medical professionals ask medical questions in technical medical language, the answers they get are typically correct.

When non-professionals ask medical questions in a perhaps medically ill-formed vernacular mode, the answers they get are typically wrong.

                          The LLM readily models both of these things. Despite having no notion of correctness in either case, correctness is more statistically typical in one than the other.

                          3/

Garrett Wollman
#13

                          @inthehands Worth noting, however, that when the training set captures a lot of outdated or irrelevant information, because the field has advanced rapidly since the model was trained, "typical" can start to diverge again. This can be mitigated if the practitioner knows to consult the latest information (either by reading it or by feeding it to the model as a part of the query) but of course they have to be aware of that. This is I suppose no worse than relying on the practitioner's knowledge.
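A minimal sketch of that mitigation (Python; build_prompt and the guideline excerpts are hypothetical, not any particular product's API): the current material is pasted into the query so the typical continuation is conditioned on it rather than on stale training text.

```python
def build_prompt(question: str, latest_excerpts: list[str]) -> str:
    """Assemble a query that carries the up-to-date material along with the question."""
    context = "\n\n".join(latest_excerpts)
    return (
        "Answer using only the guideline excerpts below. "
        "If they do not address the question, say so.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "What is the current first-line treatment for condition X?",
    [
        "<excerpt from this year's guideline>",
        "<excerpt from a recent review article>",
    ],
)
# The prompt would then go to whatever model is in use. The practitioner still
# has to know that the pasted material is itself current and relevant, which is
# the caveat raised above.
print(prompt)
```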

• Garrett Wollman

                            @inthehands Worth noting, however, that when the training set captures a lot of outdated or irrelevant information, because the field has advanced rapidly since the model was trained, "typical" can start to diverge again. This can be mitigated if the practitioner knows to consult the latest information (either by reading it or by feeding it to the model as a part of the query) but of course they have to be aware of that. This is I suppose no worse than relying on the practitioner's knowledge.

Garrett Wollman
#14

                            @inthehands OTOH, as practitioners come to rely on stochastic information retrieval for more and more diagnoses, as it confirms what they already know, it may cause them to assign more weight to the information in the model than is justified, overruling their own second thoughts. ("Computer says...")

• Paul Cantrell

                              There’s a lesson here, perhaps, about the tangled relationship between what is •typical• and what is •correct•, and what it is that LLMs actually do:

                              When medical professionals ask medical questions in technical medical language, the answers they get are typically correct.

When non-professionals ask medical questions in a perhaps medically ill-formed vernacular mode, the answers they get are typically wrong.

                              The LLM readily models both of these things. Despite having no notion of correctness in either case, correctness is more statistically typical in one than the other.

                              3/

mirth@mastodon.sdf.org
#15

@inthehands One of the factors in this mess is the heavily boosted notion that LLMs contain facts or knowledge. They do, coincidentally, sort of, but not really. A safer mental model is to think of them as a fuzzy virtual machine of sorts, not unlike a vibe-y JVM but programmed in something dressed as plain language. Garbage in, garbage out. Often anything in, garbage out.
