links 01/31
January 31, 2026
- New whitepaper on machine consciousness.
- Log of known exfiltration attacks on chatbots.
- Disempowerment patterns in Claude user interactions.
- Talk arguing for evolutionary convergent pressures towards cooperative values in AI..
- Leike on alignment solvability.