Perfect SMLD is TC
October 16, 2025
This paper, Perfect diffusion is TC^
(I claim no novelty in either exposition or content for what is below)
what is SMLD?
Given a distribution
- Take some noise schedule
such that (this is to ensure the distribution gets fully noised). - Evolve my data sample
according to which is just Langevin dynamics (and is the Brownian motion contributor). This has a corresponding evolution over the distributions given by (straightforward from Fokker-Planck). - Fix a time
The reverse time-evolution process can be exactly characterized by and is exactly what we're trying to approximate with - So, if I sample
(pure noise!), by solving the reversed SDE above and substituting for the score function, I can approximate my underlying data distribution.
TC circuit families are (essentially) neural networks
Circuit complexity classes describe "families of boolean circuits." A "family of boolean circuits" is just the set of all possible boolean circuits satisfying some properties; a boolean circuit is (abstractly) some DAG of "gates" (boolean functions) that takes a series of Boolean inputs to a Boolean output. TC
- polynomial width: the number of gates at each depth, is upper-bounded by a polynomial in the input-size
- constant depth: the maximum number of gates from input to output is upper-bounded by some constant,
- unbounded fan-in: each gate can receive inputs from arbitrarily many other gates,
- each gate is either AND< OR, NOT, or MAJ (for "majority", returns True when half or more arguments are True and False otherwise).
You can use MAJ gates to simulate threshold gates.. As such, this last bullet point is equivalent to:
- each gate is a threshold gate:
where are real numbers and returns if the value is greater than and otherwise.
This is a feed-forward neural network, where each gate is an activation function acting on Boolean inputs. TC
proof of the result
Theorem. Take a TC
What is a "prefix language modeling problem"? Next-token prediction: given
The proof relies on a result from the literature given in O(d/T) Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions which states that there exists some universal constant
One may wonder where we require "perfect" score matching! Well, the