This is spot on from Gary Marcus about the fundamental idiocy of trusting LLMs (barring some radical shift in the underlying technology) to act as quasi-autonomous agents:
On the competence side, pretty much everything critical of LLMs that I have written in this Substack over the last couple of years becomes relevant, from the unreliability to the linguistic fails that I sometimes call discomprehensions to the hallucinations (anyone remember my alleged pet chicken Henrietta?), and so forth. Do you really want a system that can’t be trusted to draw a room without elephants to automate your finances? (“Each week, transfer anything over my credit card balance plus $2000 to my savings account, and don’t send any payments to my perpetually late contractor until I give the go-ahead.” “OK, I understand. I have sent your contractor $2000”.)
In a system that can write emails, make appointments, sign checks, etc, the consequences of unreliability, discomprehension, and hallucination all escalate. And, to the extent that agents would act directly, humans (who currently often save LLMs from themselves) get left out of the loop. Giving the keys to the castle to an LLM, at our current level of technological readiness, seems to me to be batshit insane.
Oh, and did I mention that the kind of automatic code that agents would presumably write may be buggy and unreliable and perhaps challenging to debug? Or that a recent study argued that the quality of code has gone down in the LLM era?
https://garymarcus.substack.com/p/what-could-possibly-go-wrong-with
Surely the same point applies to automation more generally? If you’re trusting a system to act in lieu of a human being, then the ontological unreliability of LLMs becomes potentially catastrophic.
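To make the "humans get left out of the loop" point concrete, here is a minimal sketch (mine, not Marcus's; the ProposedAction and run_with_human_gate names are hypothetical) of the kind of approval gate that a directly-acting agent dispenses with: every irreversible action the model proposes has to be confirmed by a person before it runs.

```python
# Sketch of a human-in-the-loop guard around an agent's proposed actions.
# Irreversible actions (payments, sent emails, signed checks) require an
# explicit human "yes" before they execute; reversible ones pass through.
from dataclasses import dataclass


@dataclass
class ProposedAction:
    description: str    # e.g. "Send $2000 to contractor"
    irreversible: bool  # payments, emails, signed checks, etc.


def execute(action: ProposedAction) -> None:
    # Placeholder for whatever the agent would actually do.
    print(f"EXECUTED: {action.description}")


def run_with_human_gate(actions: list[ProposedAction]) -> None:
    for action in actions:
        if action.irreversible:
            answer = input(f"Agent wants to: {action.description!r}. Approve? [y/N] ")
            if answer.strip().lower() != "y":
                print(f"Blocked: {action.description}")
                continue
        execute(action)


if __name__ == "__main__":
    run_with_human_gate([
        ProposedAction("Transfer surplus to savings", irreversible=True),
        ProposedAction("Draft (but do not send) reply email", irreversible=False),
        ProposedAction("Send $2000 to perpetually late contractor", irreversible=True),
    ])
```

The point of the sketch is simply that the confirmation step is where humans currently "save LLMs from themselves"; a fully autonomous agent is one in which that `input()` call has been deleted.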
