why Occam?
January 23, 2026
[1] "Ideal Bayesian reasoners" rely on the simplicity prior. This is false as stated: Solomonoff induction convergence is not dependent on the exact choice of the $2^{-|K|}$ simplicity prior; convergence over any computable distribution holds if the prior is any universal semimeasure.1
[2] The generalization bias described in the Bayesian free-energy functional is a bias towards low-description-length programs. I don't know enough about this to give a full, concise, precise accounting of the argument, but I buy it? See my earlier post on MDL and SLT. (Hopefully, work on the inductive bias of SGD will shed light on a similar result for the training of neural networks.)
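A crude sketch of the description-length intuition, using a BIC-style two-part code rather than the actual free-energy functional (the data-generating process and the penalty below are made up for illustration): the total code length is the cost of the parameters plus the cost of the residuals, and the short-description model wins even though higher-degree polynomials fit tighter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data: a quadratic plus Gaussian noise.
n = 200
x = np.linspace(-1, 1, n)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.1, n)

def description_length(degree):
    """Two-part code length (in nats) for a degree-d polynomial fit:
    a BIC-style parameter cost plus a residual cost."""
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
    k = degree + 1  # free parameters
    return 0.5 * k * np.log(n) + 0.5 * n * np.log(rss / n)

best = min(range(1, 9), key=description_length)
print(best)  # a low-degree model wins despite higher degrees fitting tighter
```

Raw residual error always decreases with degree, but each extra coefficient costs $\log(n)/2$ nats, so the minimum lands near the generating degree.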
[3] "Simple hypotheses" are adaptive because world phenomena are naturally generated by "simple" processes. Empirically, phenomena have parsimonious explanations. We do not live in the most simple world,2 but physical theories are decomposable.
[4] "Simple hypotheses" are adaptive because learning systems learn simple explanations more effectively. Singular learning theory predicts this for Bayesian reasoners; deep learning generalizes because the parameter function map is biased towards simple functions; cf. inductive bias considerations; "special snowflake" hypotheses where certain kinds of learning are only possible in environments with nice properties, one of which is likely simplicity.
[5] Simple explanations are more memetically fit. Directionally correct. Minimizing the free parameters in your model means there is less information you need to communicate. But the pressures shaping acceptance of one theory over another do not rely on simplicity as a primary proxy, and in any case accuracy should be prioritized.
[6] Simplicity is elegant. Deutsch argues for objectivity in aesthetics, such that "aesthetic truths are linked to factual ones by explanations." Schmidhuber defines beauty through simplicity. Surely there's a convergence here; however, attributing causality requires care.
Clearly an Occam-like hypothesis is adaptive. I find empirical justifications for simplicity biases the most compelling, yet their compelling formulations elude me. Excited about fleshing out a correct [1] and a precise [2]; [3] is a philosophical goldmine; [4] is (to me) obviously correct; [5] requires specification; [6] deserves a steelman.
1. There are subtleties with regard to convergence in distribution versus "pointwise" convergence, and their appropriate characterizations in the Solomonoff induction setting.
2. We also probably do not live in the simplest world conditional on our existence, but the arguments here are more nuanced.