Anthropic just published new research that successfully identified and mapped millions of human-interpretable concepts, called “features”, within the neural networks of Claude.
Sims @ Sims @lemmy.ml · Posts 4 · Comments 269 · Joined 2 yr. ago
..of course, it also opens the door to manipulative use by corporations. I.e., we will probably soon see commercial models that inflate users' egos by exaggerating how amazing their insights are, or that quietly push corporate interests, all hidden from the user, and all just to profit from the $!@ model.