links 01/31

January 31, 2026

  1. New whitepaper on machine consciousness.
  2. Log of known exfiltration attacks on chatbots.
  3. Disempowerment patterns in Claude user interactions.
  4. Talk arguing for evolutionary convergent pressures towards cooperative values in AI..
  5. Leike on alignment solvability.