Perfect SMLD is TC^0

October 16, 2025

This paper, "Perfect diffusion is TC^0 -- Bad diffusion is Turing-complete", has been stuck in my head. Here's an exposition of half of the result.

(I claim no novelty in either exposition or content for what is below)

what is SMLD?

Given a distribution $\rho_{\text{data}}$ over $\mathbb{R}^d$, SMLD (score matching with Langevin dynamics) aims to learn a score function $f_\theta$ that lets us sample from $\rho_{\text{data}}$. In particular, defining $\rho_0 = \rho_{\text{data}}$ and letting $\rho_t$ be the law of the noising process started at $x \sim \rho_0$, we aim for $f_\theta(x, t)$ to approximate $\nabla_x \log \rho_t(x)$. Why introduce time as a parameter? The point of diffusion models is to learn a function that can denoise a gradually-noised process and thus sample the underlying distribution well. SMLD does this by training a model to approximate the score function of the reverse Langevin process for a given noise schedule. In more detail: noise the data at increasing scales $\sigma_1 < \dots < \sigma_L$, learn the score $\nabla_x \log \rho_\sigma(x)$ at each scale, and then run Langevin dynamics backwards through the schedule to carry samples from pure noise to the data distribution.
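Here's a minimal sketch of the sampling side, assuming we already have a `score_fn` oracle for $\nabla_x \log \rho_\sigma$ (the function names and hyperparameters are illustrative, not from the paper):

```python
import numpy as np

def annealed_langevin_sample(score_fn, d, sigmas, steps_per_level=100, eps=2e-5, rng=None):
    """Annealed Langevin dynamics (the SMLD sampler, a la Song & Ermon).

    score_fn(x, sigma) should approximate grad_x log rho_sigma(x), where
    rho_sigma is the data distribution convolved with N(0, sigma^2 I).
    sigmas should be decreasing: sigma_1 > ... > sigma_L.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(scale=sigmas[0], size=d)        # initialize at the noisiest level
    for sigma in sigmas:
        alpha = eps * (sigma / sigmas[-1]) ** 2    # step size shrinks with the noise level
        for _ in range(steps_per_level):
            z = rng.normal(size=d)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x

# Toy check: for rho_data = N(0, I), rho_sigma = N(0, (1 + sigma^2) I),
# so the exact score is -x / (1 + sigma^2).
gaussian_score = lambda x, sigma: -x / (1.0 + sigma ** 2)
sample = annealed_langevin_sample(gaussian_score, d=2, sigmas=np.geomspace(10.0, 0.01, 10))
```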

TC^0 circuit families are (essentially) neural networks

Circuit complexity classes describe "families of boolean circuits." A "family of boolean circuits" is just the set of all possible boolean circuits satisfying some properties; a boolean circuit is (abstractly) some DAG of "gates" (boolean functions) that takes a series of Boolean inputs to a Boolean output. A $TC^0$ family in particular satisfies:

- polynomial size: the circuit handling inputs of length $n$ has at most $p(n)$ gates, for some fixed polynomial $p$;
- constant depth: the longest input-to-output path has length at most some fixed $D$, independent of $n$;
- the gates are NOT gates and unbounded fan-in AND, OR, and MAJ (majority) gates.

You can use MAJ gates to simulate threshold gates (gates computing $\mathbf{1}[\sum_i w_i x_i \ge \theta]$), and vice versa. As such, this last bullet point is equivalent to:

- the gates are unbounded fan-in threshold gates.

This is a feed-forward neural network, where each gate is a step-function activation acting on Boolean inputs: a threshold gate $\mathbf{1}[\sum_i w_i x_i \ge \theta]$ is exactly a neuron with a Heaviside activation. So $TC^0$ is the class of (families of) neural networks with width less than $p(n)$ and constant depth $D$. Circuit complexity classes in general are somewhat pathological, and can solve sets of problems that one might not expect to be easily grouped together, but $TC^0$ turns out to be a surprisingly natural class for ML purposes.
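To make the correspondence concrete, here's a toy sketch (my own illustration) of threshold gates as step-activation neurons, with MAJ, AND, and OR as special cases:

```python
import numpy as np

def threshold_gate(x, w, theta):
    # A threshold gate fires iff the weighted sum of its Boolean inputs
    # reaches theta -- exactly a neuron with a Heaviside step activation.
    return int(np.dot(w, x) >= theta)

def MAJ(x):
    # MAJ is the threshold gate with unit weights and threshold ceil(n/2).
    return threshold_gate(x, np.ones(len(x)), (len(x) + 1) // 2)

# AND and OR are threshold gates too, so a constant-depth circuit of such
# gates is literally a constant-depth feed-forward net with step activations.
AND = lambda x: threshold_gate(x, np.ones(len(x)), len(x))
OR = lambda x: threshold_gate(x, np.ones(len(x)), 1)

x = [1, 0, 1, 1, 0]
print(MAJ(x), AND(x), OR(x))  # 1 0 1
```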

proof of the result

Theorem. Take a $TC^0$ family of score networks $f_{\theta,0}, f_{\theta,1}, \dots$ such that for each $n$ and each $x_1, \dots, x_n$, the function $f_{\theta,n}(x, t \mid x_1, \dots, x_n)$ exactly computes the score function of some initial distribution $\rho_{0,n}$ with bounded first moment. If this family solves a prefix language modeling problem in the SMLD infinite-time limit with a constant probability bound, then the problem is in $TC^0$.

What is a "prefix language modeling problem"? Next-token prediction: given $n$ previous tokens in an alphabet, predict token $n+1$. A problem is solved by a circuit complexity class if there is a family of circuits $C_1, C_2, \dots$ in the class, one for each input size, that solves it.
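As a toy instance (my own example, not one from the paper): take the problem where the correct next token is the majority bit of the prefix. The circuit family solving it is a single MAJ gate per input length, so it sits in $TC^0$ at depth 1:

```python
def next_token(prefix: list[int]) -> int:
    # The circuit C_n for length-n prefixes is a single MAJ gate,
    # so this prefix language modeling problem is in TC^0 (depth 1).
    return int(sum(prefix) >= (len(prefix) + 1) // 2)

print(next_token([1, 0, 1]))  # 1: majority of the prefix is 1
print(next_token([0, 0, 1]))  # 0: majority of the prefix is 0
```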

The proof relies on a result from the literature, given in "$O(d/T)$ Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions", which states that there exists some universal constant $c > 0$ such that

$$\mathrm{TV}(p_X, p_Y) \;\le\; c\,\frac{d\log^3 T}{T} + c\,\epsilon_{\mathrm{score}}\sqrt{\log T}.$$

Here $\mathrm{TV}$ is the total variation distance, $X$ is the data distribution, $Y$ is a distribution generated by $T$ steps of DDPM denoising (essentially just a discretization of SMLD), and $\epsilon_{\mathrm{score}}$ is the score-estimation error. One can use this to upper-bound the total variation distance between the DDPM sampler and our idealized denoising process by some constant $\epsilon'$; setting our constant probability bound of finding a solution to $\epsilon$, we find that $\epsilon' = \epsilon/2$ is enough for the denoising process to solve the problem. The resulting randomized circuit is then derandomized by a construction given in "Threshold circuits of bounded depth", which recovers a deterministic $TC^0$ family.
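To sketch the arithmetic in the perfect case, where $\epsilon_{\mathrm{score}} = 0$ (the specific choice of $T$ below is my own illustration, not the paper's):

$$\mathrm{TV}(p_X, p_Y) \;\le\; c\,\frac{d\log^3 T}{T} + c \cdot 0 \cdot \sqrt{\log T} \;=\; c\,\frac{d\log^3 T}{T} \;\le\; \frac{\epsilon}{2} \quad\text{whenever}\quad \frac{T}{\log^3 T} \;\ge\; \frac{2cd}{\epsilon},$$

and such a $T$ exists that is polynomial in $d$ and $1/\epsilon$, so polynomially many denoising steps suffice.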

One may wonder where we require "perfect" score matching! Well, the $c\,\epsilon_{\mathrm{score}}\sqrt{\log T}$ term increases with $T$, so for this proof to work completely cleanly one requires $\epsilon_{\mathrm{score}}$ to be set to $0$. Practical diffusion networks are not like this: there will always be some score error.