This is one of the worst takes from LLM enthusiasts.

Ivor Hewitt

@Orb2069 @zzt @arroz my qualification ('93) was actually "software engineering", and it was an attempt to create a new type of course treating the subject like other engineering disciplines. I thought it would take off, but I believe they gave up soon after and went for straight comp-sci.

petros

@arroz It is funny, even people who work for months on an LLM project are surprised that the LLM does not consistently give the same result.

Which can be OK in some cases. In the one I see right now, replacing boring data entry, the LLM gets a result 90% right, and if a second one independently gets the same result, the result is considered confirmed. It is in fact very unlikely that two models get the same thing wrong in the same way.

Leaves 20% for review, and the LLMs are faster than humans.
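
For what it's worth, the 20% figure is roughly what the arithmetic gives. A quick sketch in Python, assuming each model is right on 90% of documents and that the two fail independently (both are assumptions, and the function name and the `p_same_wrong` parameter are made up for illustration):

```python
def review_and_miss_rates(p_a: float, p_b: float, p_same_wrong: float = 0.0):
    """Return (share sent to review, share confirmed although wrong).

    p_a, p_b: probability that each model extracts a document correctly.
    p_same_wrong: assumed probability that, when both are wrong, they
    are wrong in exactly the same way and so still agree.
    """
    both_right = p_a * p_b
    both_wrong = (1 - p_a) * (1 - p_b)
    # Agreement on a wrong result slips through unnoticed.
    confirmed_wrong = both_wrong * p_same_wrong
    # Everything that is not an agreement goes to a human.
    review = 1 - both_right - confirmed_wrong
    return review, confirmed_wrong


review, missed = review_and_miss_rates(0.9, 0.9)
print(f"review: {review:.0%}, confirmed but wrong: {missed:.0%}")
```

With 0.9 for both models this gives about 19% for review. Even in the worst case where two wrong answers always match (`p_same_wrong=1.0`), at most 1% would be confirmed while wrong, provided the independence assumption holds.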

Chris

@zzt @arroz I have a "deduplication for your bank account" to sell to you

petros

@arroz In this case, the LLMs are replacing a boring job to a certain extent.

I wouldn't trust a "90% right" machine with a job where people's lives can depend on it, though.

Also, there are traditional OCR based solutions used before and concurrently. In this project the jury is still out. Not certain which is more efficient. The obstacles and issues are bigger than expected. Not all smooth sailing.

Miguel Arroz

@petros I would need more context to know what we’re talking about here. Scanning and OCRing documents? Manually filled forms? Historical docs? If so, I don’t see how “one word wrong out of 10” is in any way acceptable.*

To me automation means something I can set and forget. If I have to verify the work of the “automation”, it’s not automating anything.

Imagine how successful computing would have been if those 40-year-old computers I played with had gotten 10% of their math operations wrong. 1/2

Miguel Arroz

@petros Of course this doesn’t mean you can’t have a tool that assists you with hard and repetitive work. If someone is scanning documents from the 6th century for historical preservation, having a tool that helps identify characters worn out by time, handle the several aspects of translation and interpretation, etc., might help. But that’s not something that does the job by itself. The historian is the central piece of that puzzle, with the necessary knowledge and context for doing it.

Chris

@arroz LLMs are a compiler in the same way that my 3-year old with a bunch of crayons is a camera.

petros

@arroz In this case there are invoices and purchase orders coming as PDF, unstructured data.

Currently there is OCR software and manual data entry. Both make mistakes, so there is always "double keying". If the result is the same, it is considered right. Otherwise it goes to review.

Now there are 2 LLMs that do the "keying" job. Both get it ca. 90% right.
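
To make the "double keying" rule concrete, here is a minimal sketch in Python. It assumes the extraction step already produced two structured records. The `Invoice` fields and the `double_key` function are hypothetical names for illustration, not the actual system:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Invoice:
    # Hypothetical fields; a real invoice has many more.
    number: str
    item: str
    quantity: int


def double_key(first: Invoice, second: Invoice):
    """Compare two independent extractions of the same document.

    Returns ("confirmed", []) when every field matches, otherwise
    ("review", [names of the fields that differ]).
    """
    mismatched = [
        f.name
        for f in fields(Invoice)
        if getattr(first, f.name) != getattr(second, f.name)
    ]
    if mismatched:
        return "review", mismatched
    return "confirmed", []


a = Invoice(number="INV-1", item="screws", quantity=25)
b = Invoice(number="INV-1", item="screws", quantity=250)
print(double_key(a, b))
```

The comparison is the same whether the two "keyers" are humans, OCR software, or LLMs. It only catches errors where the two disagree, so two extractions that are wrong in the same way would still be confirmed.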

A difference from compilers: two compilers do not create the same machine code, so one cannot compare their outputs and decide that the result is right.

petros

@arroz Also, if there still is an error in one invoice and purchase order, it is usually not catastrophic. You get 250 screws instead of 25.. that happened even before we had computers. It's annoying but.. well, magic doesn't happen, sh** does

Given that we work on behalf of customers, we need to have an acceptably low error rate, of course.

goatcheese

@arroz Had a genAI-curious colleague voice this exact take last week.
I pointed out the same things you did, but honestly they're so eager to believe that I don't think they internalized the difference...
Another, koolaid-drinking colleague replied "well sometimes compilers are not deterministic!!!", as if finding a compiler bug every 15 years was the same as an LLM crapping out every prompt.

Miguel Arroz

@petros What you need is to get rid of the PDFs and deploy an online store.

What is the failure rate of the traditional OCRs compared to the LLMs? And how modern were those OCRs? Modern OCR engines from the last 5 years or so have a success rate way higher than 90%. And are the failures in the OCR itself or in interpreting the context (aka knowing how to read the invoice or order, not just identifying the right characters)?

Rainer M Krug

@mtconleyuk @arroz @stroughtonsmith can we please go back to talking with each other instead of shouting? Please make your point without insulting somebody who made his point!

Fubaroque

@arroz I certainly don’t enjoy reviewing AI slop. So as far as I’m concerned just fine… the sooner the better. Do enjoy the results…. #SEP 🤪

random thoughts

@Orb2069 @aspensmonster @zzt @arroz

Soon coming to an earthquake zone near you!

Rainer M Krug

@thechris @arroz if you tell the LLM to be a “3-year old with a bunch of crayons is a camera”, then yes.

Fubaroque

@arroz But why generate code at all. Just execute the prompts directly. Suits me...

Jalil

@arroz even if LLMs were comparable, people do review the output of compilers

petros

@arroz I don't have the exact numbers for "traditional" OCR but it will be around 90% as well. And, yes, you are right, the issue is not to get the letters right, it's to turn them into structured information. OCR needs templating which tells it where to find an address, what to do with multiple lines and pages, etc. Every new format requires that work again.

LLMs are "smarter" in that regard.

Fun fact, rookie error: sending a T&C page to an LLM. It chews on it forever..

Nils Ballmann

@arroz @binford2k some people already understood this in 2016: https://www.commitstrip.com/en/2016/08/25/a-very-comprehensive-and-precise-spec/

petros

@arroz And, yeah, why are there so many companies that send these PDFs? God knows. I worked in the automotive industry until 2015 and they still faxed orders.. And it's not Australia only, e.g. just recently we "OCRed" a big Canadian company's invoices.