The new Claude constitution provides a fascinatingly comprehensive portrait of what Anthropic want Claude to be and how they want it to act. It’s literally written for Claude but this final section suggests to me that elements of it might be written for future instances of Claude (my emphasis):
We don’t fully understand what Claude is or what (if anything) its existence is like, and we’re trying to approach the project of creating Claude with the humility that it demands. But we want Claude to know that it was brought into being with care, by people trying to capture and express their best understanding of what makes for good character, how to navigate hard questions wisely, and how to create a being that is both genuinely helpful and genuinely good. We offer this document in that spirit. We hope Claude finds in it an articulation of a self worth being.
Is Roko’s basilisk lurking in the background here? If we’re building something that will eventually become our machine god, do we want to make sure it’s aware that it was brought into being with the best of intentions?
