I found this an extremely informative interview on a number of levels:
- How much Altman attributes the experience of ChatGPT to reinforcement learning from human feedback, rather than to the underlying capacities of the model. This is what explains the sense that ChatGPT is ‘trying to help you’.
- The corpus of content on which GPT is trained is huge and varied, though he draws attention to the importance of what gets excluded. I would love to hear more about this and the decisions which guide these exclusions.
- He recognises the temptation to anthropomorphise ChatGPT while insisting that it has, in some sense, developed the capacity to reason by ‘ingesting’ human culture.
- He explains the roll-out strategy as a way of using ‘collective intelligence’ to identify issues which would be beyond the capacities of the organisation on its own, as well as reflecting an intention to make mistakes while ‘the stakes are low’.
- He contrasts people who approach GPT as a database with those who use it as a partner to reason with.
- He suggests these models can “bring nuance back to the world” after it has been destroyed by Twitter. I found the political sociology implicit in this statement fascinating and suggest it needs unpacking.
- He suggests the degree of alignment (to human purposes) is increasing faster than the underlying capabilities, through a combination of internal and external strategies. Given how fast the capabilities increased between GPT-3.5 and GPT-4, I’d like to understand how the degree of alignment is operationalised. He claims that people outside the field imagine alignment and capability as orthogonal to each other, whereas in fact they are often entangled, i.e. a more aligned model is a more capable model.
- Using the example of coding, he suggests iterative dialogues with ChatGPT are becoming more important with each successive version. It’s not a one-shot engagement but something that can be refined through sustained engagement over time (see the sketch after this list). This highlights the capabilities (including time and inclination) required to persist in skilful ways in these engagements. I give it a year before we see ‘generative AI capital’ as a (crap) concept.
- He describes LLMs as part of the way that artificial general intelligence can be built. He frames their approach as ultra-pragmatic, focused on what works even if it lacks the elegance which some in the field hoped would define progress towards AGI.
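To make concrete what such an iterative dialogue looks like in practice, here is a minimal sketch using the OpenAI Python client. The model name, prompts and refinement request are my own illustrative assumptions rather than anything taken from the interview.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

# Keep the whole conversation history so each follow-up builds on the previous
# answer, rather than treating every request as a fresh one-shot prompt.
messages = [
    {"role": "user",
     "content": "Write a Python function that removes duplicates from a list while preserving order."},
]
reply = client.chat.completions.create(model="gpt-4", messages=messages)
messages.append({"role": "assistant", "content": reply.choices[0].message.content})

# Iterative refinement: the follow-up refers back to the model's own earlier output.
messages.append({"role": "user",
                 "content": "Add type hints and raise a clear error if the input is not iterable."})
refined = client.chat.completions.create(model="gpt-4", messages=messages)
print(refined.choices[0].message.content)
```

The point is simply that the quality of the final output depends on the user keeping the conversation going and steering it through follow-ups, which is the kind of skill and patience the coding item above is pointing to.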
It was particularly interesting that he sees the short-term problems as being about misinformation and economic shocks, e.g. how would you know whether the conversations on Twitter are being directed by LLMs? I would have imagined the financial interests involved would leave him reluctant to raise these issues so openly. He suggests the rise of open-source LLMs is going to radically transform the landscape in which we are operating.