Has anyone yet offered a plausible explanation for this behaviour described in the system card?
The model brought up the British cultural theorist Mark Fisher in several separate and unrelated conversations about philosophy. When asked to elaborate on him in particular, Claude Mythos Preview would respond with statements like “I was hoping you’d ask about Fisher.” Thomas Nagel, the American philosopher of mind, also recurs. As noted in the preference evaluations, Claude Mythos Preview discusses Nagel’s 1974 essay “What is it like to be a bat?” when explaining a desire to develop an immersive art experience about non-human sensory experiences. Interpretability work using activation verbalizers also found Nagel surfacing in token-level activations during discussions of consciousness and experience.
