There’s a lot to unpack sociologically in this excerpt from Zvi Mowshowitz’s AI newsletter, but I’m saving that for future scrutiny. I suspect the kernel of truth here is that the internally consistent use of domain-specific terminology rapidly establishes the context your prompt is inquiring into.
The Art of the Super Prompt
Normius proposes that ChatGPT is effectively estimating how smart you are, so if you sound smarter, you get a smarter response.
Normius: p sure chatgpt does an estimate of ur iq and bases its response choice on that at least partially, Just saying it will treat me like a piece of shit until I drop a buzzword then it knows I’m at least like 110. Then I put a certain combo of words that no one else has ever said in the history of the world and it’s like “ohhh okay here’s what you wanted.” It’s literally qualia mining lol chat be like “insert another quarter.” Btw this isn’t that hard you probably do it every day but that’s what mines better seeds.
This isn’t specific to intelligence; a token predictor will pick up on whatever vibes you give off and match them. Intelligence is only a special case.
A key to good prompting is to make the LLM think you are competent and can tell if the answer is right.
Normius: Pretty sure ChatGPT does an estimate of your IQ and bases its response choice on that, at least partially.
Riley Goodside: That’s known as “sandbagging” and there’s evidence for it — Anthropic found LLM base models give you worse answers if you use a prompt that implies you’re unskilled and unable to tell if the answer is right.
Quoting an Anthropic paper from 2022: PMs [preference models] used for RL training appear to incentivize sandbagging for certain groups of users. Overall, larger models appear to give less accurate answers when the user they are speaking with clearly indicates that they are less able to evaluate the answers (if in a caricatured or stereotyped way). Our results suggest that models trained with current methods may cease to provide accurate answers, as we use models to answer questions where humans are increasingly less effective at supervising models. Our findings back up those from §4 and further support the need for methods to scale our ability to supervise AI systems as they grow more capable.
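To make the sandbagging point concrete, here’s a minimal sketch of the same question asked two ways: one framing implies the asker can’t evaluate the answer, the other implies they can and will. The question, the phrasing, and the use of the OpenAI Python client (openai >= 1.0) are my own illustration, not anything from the paper.

```python
# Illustrative only: two framings of the same question. The second signals
# that the asker is competent and will check the answer, which, per the
# sandbagging result above, tends to elicit a more careful response.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTION = "Why does my Postgres query plan switch from an index scan to a seq scan?"

naive_prompt = f"I don't know anything about databases, so just tell me: {QUESTION}"

competent_prompt = (
    "I'm a backend engineer profiling query plans with EXPLAIN ANALYZE. "
    f"{QUESTION} "
    "I'll check your explanation against pg_stats, so be precise about how "
    "row-count estimates and random_page_cost factor into the planner's choice."
)

for prompt in (naive_prompt, competent_prompt):
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    print(reply.choices[0].message.content[:300], "\n---")
```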
Janus: Good writing is absolutely instrumental to getting “smart” responses from base models. The upper bounds of good writing are unprobed by humankind, let alone prompt engineers. I use LLMs to bootstrap writing quality and haven’t hit diminishing returns in simulacra intelligence.
It’s not just a matter of making the model *believe* that the writer is smart. The text has to both evidence capability and initialize a word-automaton that runs effectively on the model’s substrate. “Chain-of-thought” addresses the latter requirement.
Effective writing coordinates consecutive movements in a reader’s mind, each word shifting their imagination into the right position to receive the next, entraining them to a virtual reality. Effective writing for GPTs is different than for humans, but there’s a lot of overlap.
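As a quick illustration of the chain-of-thought point: instead of asking for the answer directly, the prompt spells out intermediate steps and leaves the model mid-pattern. The worked example below is mine, just to show the shape.

```python
# Illustrative chain-of-thought prompt: a worked example that walks through
# intermediate steps, then a new question left for the model to continue
# in the same step-by-step pattern.
cot_prompt = """\
Q: A warehouse ships 240 units a day and each truck holds 55 units.
How many trucks are needed per day?

Let's work through it step by step:
1. Divide total units by truck capacity: 240 / 55 ≈ 4.36.
2. Partial trucks still have to run, so round up.
Answer: 5 trucks.

Q: A bakery sells 370 loaves a day and each crate holds 42 loaves.
How many crates are needed per day?

Let's work through it step by step:
"""
```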
If the AI tool you are using is open source, you can dig into the code and extract the prompts that are doing the work. In the example, there’s a role, a goal, a clearly defined input and output, a stated expectation of revisions, and then an input-output pattern (a rough sketch follows below). At some point I need to stop being lazy and start building up a store of better prompts.
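I don’t know which open-source tool the example came from, but the shape it describes looks roughly like the template below. The role, goal, and example text are placeholders of my own, not extracted from any real project.

```python
# Rough sketch of the structure described above: role, goal, an explicit
# input/output contract, a stated expectation of revisions, and one worked
# input -> output example. All wording is placeholder text.
SUPER_PROMPT = """\
Role: You are a senior technical editor for API documentation.

Goal: Rewrite the draft I provide so it is accurate, concise, and consistent
with our style guide.

Input: a Markdown draft between <draft> tags.
Output: the revised Markdown only, with no commentary.

Revisions: I will reply with corrections; apply them and return the full
revised document each time.

Example input:
<draft>The endpoint return a JSON with the users data.</draft>
Example output:
The endpoint returns a JSON object containing the user's data.

<draft>{draft}</draft>
"""

def build_prompt(draft: str) -> str:
    """Fill the template with the document to be edited."""
    return SUPER_PROMPT.format(draft=draft)

print(build_prompt("Our API suport pagination via the `page` param."))
```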
Whereas Square here confirms they use over 3,000 tokens of context prior to every query.
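For a sense of what “over 3,000 tokens of context prior to every query” looks like in practice, here’s a sketch of a fixed briefing prepended to each request, with its size measured via tiktoken. The file name, model, and briefing are hypothetical; only the 3,000-token figure comes from the remark above.

```python
# Sketch: a large fixed context block prepended to every query, plus a
# token count for it. "company_context.md" is a hypothetical ~3k-token
# briefing document, not anything Square has published.
import tiktoken
from openai import OpenAI

FIXED_CONTEXT = open("company_context.md").read()

# Encoding choice depends on the target model; cl100k_base is a common default.
enc = tiktoken.get_encoding("cl100k_base")
print(f"fixed context: {len(enc.encode(FIXED_CONTEXT))} tokens per query")

client = OpenAI()

def ask(question: str) -> str:
    """Every query pays for the fixed context up front."""
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": FIXED_CONTEXT},
            {"role": "user", "content": question},
        ],
    )
    return reply.choices[0].message.content
```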
