Perfect AI alignment with human values and interests is mathematically impossible, according to a study, but behavioral diversity among AI agents offers the promise of some control. Published in PNAS Nexus, Hector Zenil and colleagues used Gödel’s incompleteness theorem and Turing’s undecidability result for the Halting Problem to show that any LLM complex enough to exhibit general intelligence or superintelligence will also be computationally irreducible and produce unpredictable behavior, making forced alignment impossible.
This post was originally published on this website.
