Anthropic just published new research that successfully identified and mapped millions of human-interpretable concepts, called “features”, within the neural networks of Claude.

Mapping the Mind of a Large Language Model

Wow. This is potentially huge.