Notes

Rome

A post-apocalyptic fever dream. The oldest civilized metropolis. Where sons are pathetic in the eyes of their father, and both are pathetic in the eyes of their grandfathers—all while wearing blackened sunglasses and leather jackets. Grown, not made.

Rome is, perhaps, the first place I recognized as solely for visiting, never living. Unlike Tokyo, one feels this immediately. Japan’s undesirability stems primarily from its inordinate, sprawling bureaucracy that is, for the most part, hidden from the typical visitor. Rome’s undesirability is apparent for all to see—it’s loud, stifling, unmaintained, and requires arduous traversals.

Population c. 100 C.E.: 1 million people.
Population c. 1000 C.E.: 35,000.
Population c. 2024 C.E.: 2.8 million people.

Rebound? Not so fast—the global population in the year 100 was just 200 million.

And this is obvious. The city center is still dominated by the Colosseum, the Imperial fora, and Trajan’s Market. Only the Vittoriano holds a candle to their extant glory. Yet the hordes of tourists still walk down the Via dei Fori Imperiali and congregate in stupidly long lines at the ticket booth to see ruins!

I walked across the city from east to west, passing by a secondary school, flea market, and various patisseries (is that the correct wording?). The pastries were incredible. The flea market reminded me of Mexico, interestingly enough. Felt very Catholic.

(All the buses and trams run late in Rome. This, too, is very Catholic, as Orwell picked up on during his time in Catalonia and as anyone visiting a Mexican house would know. Plausibly also Irish?)

Rome’s ivies pervade its structures. Villas, monuments, churches (all 900 of them), and fountains all fall victim to these creepers. It gives the perception of a ruined city, that Roman glory has come and gone—and when one is aware of Italian history, it is very, very hard to perceive Rome as anything other than an overgrown, still-surviving bastion against the continuing spirit of the Vandals.

Roman pines, too, are fungiform. Respighi’s tone poem doesn’t do justice to them. Perhaps this is just a Mediterranean vibe? But amongst the monumental Classical, Romanesque, and Neoclassical structures of the Piazza Venezia, these pines are punctual. Don’t really know how else to convey it.

It is difficult to comprehend how the animalistic, gladiatorial Roman society became the seat of the Catholic Church. This city is clearly Gaian, and clearly ruled by the very same gods its Pantheon pays homage to. The Christian God is not of nature, it is apart from nature. It is not Dionysian, it is not even Apollonian (because it cannot recognize or respect the Dionysian, and as such cannot exist in context of it, only apart from it). And yet, the Pope persists.

I have not seen the Vatican yet. I want to. I will be back.


Self-Referential Probabilistic Logic Admits the Payor's Lemma

In summary: A probabilistic version of the Payor's Lemma holds under the logic proposed in the Definability of Truth in Probabilistic Logic. This gives us modal fixed-point-esque group cooperation even under probabilistic guarantees.

Background

Payor's Lemma: If ⊢ □(□x → x) → x, then ⊢ x.

We assume two rules of inference:

  • Necessitation: ⊢ x ⟹ ⊢ □x,
  • Distributivity: ⊢ □(x → y) → (□x → □y).

Proof:

  1. ⊢ x → (□x → x), by tautology;
  2. ⊢ □x → □(□x → x), by 1 via necessitation and distributivity;
  3. ⊢ □(□x → x) → x, by assumption;
  4. ⊢ □x → x, from 2 and 3 by modus ponens;
  5. ⊢ □(□x → x), from 4 by necessitation;
  6. ⊢ x, from 5 and 3 by modus ponens.

The Payor's Lemma is provable in all normal modal logics (as it can be proved in K, the weakest, because it only uses necessitation and distributivity). Its proof sidesteps the assertion of an arbitrary modal fixed point, does not require internal necessitation (□x → □□x), and provides the groundwork for Löbian handshake-based cooperation without Löb's theorem.
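Since K is sound with respect to Kripke semantics, the lemma has a model-level shadow that can be brute-force checked: in any finite Kripke model where □(□x → x) → x holds at every world, x holds at every world. A minimal sketch of that check (my own illustration, not from the original post):

```python
from itertools import product

def payor_holds(n_worlds):
    """Check: in every Kripke model on n_worlds worlds, if
    box(box x -> x) -> x is true at all worlds, then so is x."""
    worlds = range(n_worlds)
    # Enumerate every accessibility relation and every valuation of x.
    for rel in product([False, True], repeat=n_worlds * n_worlds):
        for val in product([False, True], repeat=n_worlds):
            x = lambda w: val[w]
            # box(phi) at w: phi holds at every world accessible from w
            box = lambda phi: (lambda w: all(phi(v) for v in worlds
                                             if rel[w * n_worlds + v]))
            inner = lambda w: (not box(x)(w)) or x(w)    # box x -> x
            hyp = lambda w: (not box(inner)(w)) or x(w)  # box(box x -> x) -> x
            if all(hyp(w) for w in worlds) and not all(x(w) for w in worlds):
                return False  # counterexample model found
    return True

print(payor_holds(3))  # True: no counterexample among 3-world models
```

This checks validity-in-a-model rather than derivability, but every rule used in the proof (tautologies, modus ponens, necessitation, K-distribution) preserves truth-at-all-worlds, so the semantic check mirrors the syntactic argument.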

It is known that Löb's theorem fails to hold in reflective theories of logical uncertainty. However, a proof of a probabilistic Payor's lemma has been proposed, which modifies the necessary rules of inference to:

  • Necessitation: ⊢ x ⟹ ⊢ □_p x,
  • Weak Distributivity: ⊢ x → y ⟹ ⊢ □_p x → □_p y,

where here we take □_p to be an operator which returns True if the internal credence of x is greater than p and False if not. (Formalisms incoming.)

The question is then: does there exist a consistent formalism under which these rules of inference hold? The answer is yes, and it is provided by Christiano 2012.

Setup

(Regurgitation and rewording of the relevant parts of the Definability of Truth)

Let L be some language and T be a theory over that language. Assume that L is powerful enough to admit a Gödel encoding and that it contains terms which correspond to the rational numbers ℚ. Let ϕ₁, ϕ₂, … be some fixed enumeration of all sentences in L. Let ⌜ϕ⌝ represent the Gödel encoding of ϕ.

We are interested in the existence and behavior of a function ℙ : L → [0, 1], which assigns a real-valued probability in [0, 1] to each well-formed sentence of L. We are guaranteed the coherency of ℙ with the following assumptions:

  1. For all ϕ, ψ ∈ L we have that ℙ(ϕ) = ℙ(ϕ ∧ ψ) + ℙ(ϕ ∧ ¬ψ).
  2. For each tautology ϕ, we have ℙ(ϕ) = 1.
  3. For each contradiction ϕ, we have ℙ(ϕ) = 0.

Note: I think that 2 & 3 are redundant (as says John Baez), and that these axioms do not necessarily constrain ℙ to [0, 1] in and of themselves (hence the extra restriction). However, neither concern is relevant to the result.

A coherent ℙ corresponds to a distribution μ over models of L. A coherent ℙ which gives probability 1 to T corresponds to a distribution μ over models of T. We denote a function which generates a distribution over models of a given theory T as ℙ_T.

Syntactic-Probabilistic Correspondence: Observe that ℙ_T(ϕ) = 1 ⟺ T ⊢ ϕ. This allows us to interchange the notions of syntactic consequence and probabilistic certainty.

Now, we want ℙ to give sane probabilities to sentences which talk about the probability ℙ gives them. This means that we need some way of giving L the ability to talk about itself.

Consider the formula Bel. Bel takes as input the Gödel encodings of sentences. Bel(⌜ϕ⌝) contains arbitrarily precise information about ℙ(ϕ). In other words, if ℙ(ϕ) = p, then the statement Bel(⌜ϕ⌝) > a is True for all a < p, and the statement Bel(⌜ϕ⌝) > b is False for all b > p. Bel is fundamentally a part of the system, as opposed to being some metalanguage concept.

(These are identical properties to those represented in Christiano 2012 by ℙ(⌜ϕ⌝). I simply choose to represent ℙ(⌜ϕ⌝) with Bel(⌜ϕ⌝) as it (1) reduces notational uncertainty and (2) seems to be more in the spirit of Gödel's Bew for provability logic.)

Let L′ denote the language created by affixing Bel to L. Then, there exists a coherent ℙ_T for a given consistent theory T over L′ such that the following reflection principle is satisfied:

$$ \forall \phi \in L' \; \forall a,b \in \mathbb{Q} : (a < \mathbb{P}_{T}(\phi) < b) \implies \mathbb{P}_{T}(a < Bel(\ulcorner \phi \urcorner) < b) = 1. $$

In other words, a < ℙ_T(ϕ) < b implies T ⊢ (a < Bel(⌜ϕ⌝) < b).

Proof

(From now on, for simplicity, we use ℙ to refer to ℙ_T and ⊢ to refer to T ⊢. You can think of this as fixing some theory T and operating within it.)

Let □_p(ϕ) represent the sentence Bel(⌜ϕ⌝) > p, for some p ∈ ℚ. We abbreviate □_p(ϕ) as □_p ϕ. Then, we have the following:

Probabilistic Payor's Lemma: If ⊢ □_p(□_p x → x) → x, then ⊢ x.

Proof as per Demski:

  1. ⊢ x → (□_p x → x), by tautology;
  2. ⊢ □_p x → □_p(□_p x → x), by 1 via weak distributivity;
  3. ⊢ □_p(□_p x → x) → x, by assumption;
  4. ⊢ □_p x → x, from 2 and 3 by modus ponens;
  5. ⊢ □_p(□_p x → x), from 4 by necessitation;
  6. ⊢ x, from 5 and 3 by modus ponens.

Rules of Inference:

Necessitation: ⊢ x ⟹ ⊢ □_p x. If ⊢ x, then ℙ(x) = 1 by syntactic-probabilistic correspondence, so by the reflection principle we have ℙ(□_p x) = 1, and as such ⊢ □_p x by syntactic-probabilistic correspondence.

Weak Distributivity: ⊢ x → y ⟹ ⊢ □_p x → □_p y. The proof of this is slightly more involved.

From ⊢ x → y we have (via correspondence) that ℙ(x → y) = 1, so ℙ(¬x ∨ y) = 1. We want to prove that ℙ(□_p x → □_p y) = 1 from this, or ℙ((Bel(⌜x⌝) ≤ p) ∨ (Bel(⌜y⌝) > p)) = 1. We can do casework on x. If ℙ(x) ≤ p, then weak distributivity follows from vacuousness. If ℙ(x) > p, then as

ℙ(¬x ∨ y) = ℙ(x ∧ (¬x ∨ y)) + ℙ(¬x ∧ (¬x ∨ y)),
1 = ℙ(x ∧ y) + ℙ(¬x ∧ (¬x ∨ y)),
1 = ℙ(x ∧ y) + ℙ(¬x),

we have ℙ(¬x) < 1 − p, so ℙ(x ∧ y) > p, and therefore ℙ(y) > p. Then, Bel(⌜y⌝) > p is True by reflection, so by correspondence it follows that ⊢ □_p x → □_p y.
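The casework above boils down to the fact that any coherent distribution over models of a theory containing x → y must satisfy ℙ(y) ≥ ℙ(x). A toy numeric check of that fact (my own illustration, assuming a single propositional implication):

```python
import random

# Truth assignments (x, y) satisfying the theory {x -> y};
# (True, False) is excluded because it falsifies the implication.
models = [(True, True), (False, True), (False, False)]

random.seed(0)
for _ in range(10_000):
    w = [random.random() for _ in models]
    probs = [wi / sum(w) for wi in w]  # a random coherent distribution
    p_x = sum(pr for (mx, _), pr in zip(models, probs) if mx)
    p_y = sum(pr for (_, my), pr in zip(models, probs) if my)
    # Every model satisfying x also satisfies y, so P(y) >= P(x);
    # hence P(x) > p implies P(y) > p for any threshold p.
    assert p_y >= p_x
print("P(y) >= P(x) on all sampled coherent distributions")
```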

(I'm pretty sure this modal logic, following necessitation and weak distributivity, is not normal (it's weaker than K). This may have some implications? But in the 'agent' context I don't think that restricting ourselves to normal modal logics makes sense.)

Bots

Consider agents A, B, C which return True to signify cooperation in a multi-agent Prisoner's Dilemma and False to signify defection. (Similar setup to Critch's.) Each agent has 'beliefs' ℙ_A, ℙ_B, ℙ_C : L → [0, 1] representing their credences over all formal statements in their respective languages (we are assuming they share the same language: this is unnecessary).

Each agent has the ability to reason about their own 'beliefs' about the world arbitrarily precisely, and this allows them full knowledge of their utility function (if they are VNM agents, and up to the complexity of the world-states they can internally represent). Then, these agents can be modeled with Christiano's probabilistic logic! And I would argue it is natural to do so (you could easily imagine an agent having access to its own beliefs with arbitrary precision by, say, repeatedly querying its own preferences).

Then, if A,B,C each behave in the following manner:

  • ⊢ □_a(□_e E → E) → A,
  • ⊢ □_b(□_e E → E) → B,
  • ⊢ □_c(□_e E → E) → C,

where E = A ∧ B ∧ C and e = max(a, b, c), they will cooperate by the probabilistic Payor's lemma.

Proof:

  1. ⊢ □_a(□_e E → E) ∧ □_b(□_e E → E) ∧ □_c(□_e E → E) → A ∧ B ∧ C, via conjunction;
  2. ⊢ □_e(□_e E → E) → E, as if the e-threshold is satisfied all others are as well;
  3. ⊢ E, by probabilistic Payor.

This can be extended to arbitrarily many agents. Moreover, the valuable insight here is that cooperation is achieved when the evidence that the group cooperates exceeds each and every member's individual threshold for cooperation. A formalism of the intuitive strategy 'I will only cooperate if there are no defectors' (or perhaps 'we will only cooperate if there are no defectors').
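The 'max threshold' step is doing the real work: any credence clearing e = max(a, b, c) clears every agent's individual bar. A two-line illustration with hypothetical thresholds (my own toy model, not the formal proof):

```python
a, b, c = 0.6, 0.7, 0.9   # hypothetical cooperation thresholds for A, B, C
e = max(a, b, c)

# Any shared credence in "the group cooperates" that clears the max
# threshold e automatically clears each individual threshold:
for credence in [0.91, 0.95, 0.99]:
    assert credence > e
    assert all(credence > t for t in (a, b, c))
print("clearing e clears every individual threshold")
```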

It is important to note that any ℙ is going to be uncomputable. However, I think modeling agents as having arbitrary access to their beliefs is in line with existing 'ideal' models (think VNM -- I suspect that this formalism closely maps to VNM agents that have access to arbitrary information about their utility function, at least in the form of preferences), and these agents play well with modal fixed-point cooperation.

Acknowledgements

This work was done while I was a 2023 Summer Research Fellow at the Center on Long-Term Risk. Many thanks to Abram Demski, my mentor who got me started on this project, as well as Sam Eisenstat for some helpful conversations. CLR was a great place to work! Would highly recommend if you're interested in s-risk reduction.

Crossposted to the AI Alignment Forum.


Hyperreals In A Nutshell

Epistemic status: Vaguely confused and probably lacking a sufficient technical background to get all the terms right. Is very cool though, so I figured I'd write this.

And what are these Fluxions? The Velocities of evanescent Increments? And what are these same evanescent Increments? They are neither finite Quantities nor Quantities infinitely small, nor yet nothing. May we not call them the ghosts of departed quantities?

George Berkeley, The Analyst

When calculus was invented, it didn't make sense. Newton and Leibniz played fast and loose with mathematical rigor to develop methods that arrived at the correct answers, but no one knew why. It took another century and a half for Cauchy and Weierstrass to develop analysis, and in the meantime people like Berkeley refused to accept the methods utilizing these "ghosts of departed quantities."

Cauchy's and Weierstrass's solution to the crisis of calculus was to define infinitesimals in terms of limits. In other words, to not describe the behavior of functions directly acting on infinitesimals, but rather to frame the entire endeavour as studying the behaviors of certain operations in the limit, in that weird superposition of being arbitrarily close to something yet not it.

(And here I realize that math is better shown, not told)

The limit of a function f(x) at x = a is L if for any ϵ > 0 there exists some δ > 0 such that if

0 < |x − a| < δ,

then

|f(x) − L| < ϵ.

Essentially, the limit exists if there's some value δ that forces f(x) to be within ϵ of L if x is within δ of a. Note that this has to hold true for all ϵ, and you choose ϵ first!

From this we get the well-known definition of the derivative:

f′(x) = lim_{h→0} [f(x + h) − f(x)] / h,

and you can define the integral similarly.
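The limit definition can be probed numerically by shrinking h and watching the difference quotient settle. A quick sketch (my own example, using f(x) = x²):

```python
def diff_quotient(f, x, h):
    """The quantity whose limit as h -> 0 defines f'(x)."""
    return (f(x + h) - f(x)) / h

f = lambda t: t ** 2  # f'(x) = 2x, so the quotient should approach 6 at x = 3
for h in [1e-1, 1e-3, 1e-6]:
    print(h, diff_quotient(f, 3.0, h))
```

Each line printed is closer to 6 than the last, which is exactly the ϵ–δ game: for any tolerance ϵ you name, a small enough h keeps the quotient within ϵ of 6.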

The limit solved calculus's rigor problem. From the limit the entire field of analysis was invented and placed on solid ground, and this foundation has stood to this day.

Yet, it seems like we lose something important when we replace the idea of the "infinitesimally small" with the "arbitrarily close to." Could we actually make numbers that were infinitely small?

The Sequence Construction

Imagine some mathematical object that had all the relevant properties of the real numbers (addition and multiplication are associative and commutative, it is closed under them, etc.) but had infinitely small and infinitely large numbers. What does this object look like?

We can take the set of all infinite sequences of real numbers ℝ^ℕ as a starting point. A typical element a ∈ ℝ^ℕ would be

a = (a₀, a₁, a₂, …)

where a₀, a₁, a₂, … is some infinite sequence of real numbers.

We can define addition and multiplication element-wise as:

a + b = (a₀ + b₀, a₁ + b₁, a₂ + b₂, …),

a · b = (a₀b₀, a₁b₁, a₂b₂, …).
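The element-wise operations can be sketched directly, representing a sequence lazily by its index function (my own illustration):

```python
class Seq:
    """An element of R^N, represented by a function n -> a_n."""
    def __init__(self, fn):
        self.fn = fn
    def __add__(self, other):
        # element-wise addition: (a + b)_n = a_n + b_n
        return Seq(lambda n: self.fn(n) + other.fn(n))
    def __mul__(self, other):
        # element-wise multiplication: (a * b)_n = a_n * b_n
        return Seq(lambda n: self.fn(n) * other.fn(n))
    def prefix(self, k):
        """First k terms, for inspection."""
        return [self.fn(n) for n in range(k)]

a = Seq(lambda n: n)      # (0, 1, 2, 3, ...)
b = Seq(lambda n: 2.0)    # the constant sequence (2, 2, 2, ...)
print((a + b).prefix(4))  # [2.0, 3.0, 4.0, 5.0]
print((a * b).prefix(4))  # [0.0, 2.0, 4.0, 6.0]
```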

You can verify that this is a commutative ring, which means that these operations behave nicely. Yet, being a commutative ring is not the same thing as being an ordered field, which is what we eventually want if our desired object is to have the same properties as the reals.

To get from ℝ^ℕ to a field structure, we have to modify it to accommodate well-defined division. The typical way of doing this is looking at how to introduce the zero product property: i.e., ensuring that for a, b ∈ ℝ^ℕ, if a · b = 0 then at least one of a, b is 0.

If we let 0 be the sequence of all zeros (0, 0, 0, …) in ℝ^ℕ, then it is clear that we can have two non-zero elements multiply to get zero. If we have

a = (a, 0, 0, 0, …)

and

b = (0, b, b, b, …)

for some nonzero reals a and b, then neither of these is the zero element, yet their product is zero.

How do we fix this? Equivalence classes!

Our problem is that there are too many distinct "zero-like" things in the ring of real-numbered sequences. Intuitively, we should expect the sequence (0, 1, 0, 0, …) to be basically zero, and we want to find a good condensation of ℝ^ℕ that allows for this.

In other words, how do we make all the sequences with "almost all" their elements as zero to be equal to zero?

Almost All Agreement ft. Ultrafilters

Taken from "five ways to say "Almost Always" and actually mean it":

A filter F on an arbitrary set I is a collection of subsets of I that is closed under set intersections and supersets. (Note that this means that the smallest filter on I is {I} itself.)

An ultrafilter is a filter which, for every A ⊆ I, contains either A or its complement. A principal ultrafilter contains a finite set.

A nonprincipal ultrafilter does not.

This turns out to be an incredibly powerful mathematical tool, and can be used to generalize the concept of "almost all" to esoteric mathematical objects that might not have well-defined or intuitive properties.

Let's say we define some nonprincipal ultrafilter U on the natural numbers. This will contain all cofinite sets, and will exclude all finite sets. Now, let's take two sequences a, b ∈ ℝ^ℕ, and define their agreement set I to be the set of indices on which a, b are identical (have the same real number in the same position).

Observe that I is a set of natural numbers. If I ∈ U, then I cannot be finite, and it seems pretty obvious that almost all the elements in a, b are the same (they only disagree at a finite number of places, after all). Conversely, if I ∉ U, this implies that ℕ∖I ∈ U, which means that a, b disagree at almost all positions, so they probably shouldn't be equal.

Voila! We have a suitable definition of "almost all agreement": the agreement set I is a member of some fixed nonprincipal ultrafilter U.

Let *ℝ be the quotient set of ℝ^ℕ under this equivalence relation (essentially, the set of all distinct equivalence classes of ℝ^ℕ). Does this satisfy the zero product property?

(Notation note: we will let (a) denote the constant infinite sequence of the real number a, and [a] the equivalence class of the sequence (a) in *ℝ.)

Yes, This Behaves Like The Real Numbers

Let a, b ∈ ℝ^ℕ such that a · b = (0). Breaking this down element-wise: for each n ∈ ℕ, either aₙ or bₙ must be zero. As one of the ultrafilter axioms is that it must contain a set or its complement, either the index set of the zero elements in a or the index set of the zero elements in b will be in any nonprincipal ultrafilter on ℕ. Therefore, either a or b is equivalent to (0) in *ℝ, so *ℝ satisfies the zero product property.

Therefore, division is well defined on *ℝ! Now all we need is an ordering, and luckily almost-all agreement saves the day again. We can say for a, b ∈ *ℝ that a > b if almost all elements in a are greater than the elements in b at the same positions (using the same ultrafilter equivalence).

So, *ℝ is an ordered field!

Infinitesimals and Infinitely Large Numbers

We have the following hyperreal:

ϵ = (1, 1/2, 1/3, …, 1/n, …).

Recall that we embed the real numbers into the hyperreals by assigning every real number a to the equivalence class [a]. Now observe that ϵ is positive yet smaller than every positive real number embedded this way.

Pick some arbitrary positive real number a. There exists p ∈ ℕ such that 1/p < a. There are infinitely many fractions of the form 1/n with n a natural number greater than p, so ϵ is smaller than (a) at almost all positions, and hence smaller than a.
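The ultrafilter itself is non-constructive, but the "almost all positions" claim is easy to see on finite truncations: for any positive real a, only finitely many entries of ϵ fail to be below a. A quick check (my own illustration):

```python
eps = lambda n: 1 / (n + 1)   # entries of the sequence (1, 1/2, 1/3, ...)

def exceptions(a, N=10_000):
    """Indices n < N where eps fails to be below the constant sequence (a)."""
    return [n for n in range(N) if eps(n) >= a]

# For a = 0.01 the exceptions are exactly n = 0..99: a finite set,
# so eps < (0.01) at almost all positions (every cofinite set is in U).
print(len(exceptions(0.01)))  # 100
```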

This is an infinitesimal! This is a rigorously defined, coherent infinitesimal number smaller than all positive real numbers! In a number system which shares all of the important properties of the real numbers! (Except the Archimedean one, as we will shortly see, but that doesn't really matter.)

Consider the following:

Ω = (1, 2, 3, …).

By a similar argument this is larger than all possible real numbers. I encourage you to try to prove this for yourself!

(The Archimedean principle guarantees that if you have any two positive real numbers, you can multiply the smaller by some natural number so that it becomes greater than the other. This is not true in the hyperreals. Why? (Hint: Ω breaks this if you consider a real number.))

How does this tie into calculus, exactly?

Well, we have a coherent way of defining infinitesimals!

The short answer is that we can define the standard part operator st(x), which maps any finite hyperreal to its closest real counterpart. Then, the definition of a derivative becomes

f′(x) = st( ( *f(x + Δx) − *f(x) ) / Δx )

where Δx is some nonzero infinitesimal, and *f is the natural extension of f to the hyperreals. More on this in a future blog post!
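Computationally, one can mimic st(·) by evaluating the difference quotient at a late entry of an infinitesimal's representative sequence; this is of course just the classical limit in disguise (the hyperreal construction is what makes it rigorous), but it shows the shape of the definition. My own sketch:

```python
def hyper_derivative(f, x, n=10**6):
    """Difference quotient at the n-th entry of the representative
    sequence (1, 1/2, 1/3, ...) of an infinitesimal dx; taking the
    'standard part' corresponds to letting n grow without bound."""
    dx = 1 / n
    return (f(x + dx) - f(x)) / dx

f = lambda t: t ** 3                        # f'(x) = 3x^2
print(round(hyper_derivative(f, 2.0), 3))   # 12.0
```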

It also turns out the hyperreals have a bunch of really cool applications in fields far removed from analysis. Check out my expository paper on the intersection of nonstandard analysis and Ramsey theory for an example!

Yet, the biggest effect I think this will have is pedagogical. I've always found the definition of a limit kind of unintuitive, and it was specifically invented to add post hoc coherence to calculus after it had been invented and used widely. I suspect that formulating calculus via infinitesimals in introductory calculus classes would go a long way toward making it more intuitive.


Review | Invisible China

Pretty good. Made me think, and Rozelle's unique perspective (economist who's spent 30+ years on the ground) adds an authentic flavor to the book.

He claims that the existential threat to China's potential future growth and success is a lack of human capital: specifically, that systemic factors hinder rural Chinese children's ability to participate effectively in the service-based, advanced economy of the 21st century, and that fixing this issue is imperative so that China does not fall into the middle income trap.

In a 2004 article in Foreign Affairs, political scientist Geoffrey Garrett shared a startling observation. He looked at the history of recent economic development and noticed that while rich countries were continuing to do well and many poor countries were achieving strong growth rates, the countries in the middle of the global income spectrum were growing more slowly and less successfully than anyone else.

In a report that made ripples throughout the development world, economists at the World Bank demonstrated the full extent of the problem. The report showed that out of 101 countries that were middle income in 1960, only thirteen had made it to high-income status by 2008. The rest remained stuck or even ended the fifty years poorer than before.

The middle income trap is the stagnation developing countries face when their average citizen reaches a "middle income." An easy way for developing countries to rapidly grow their economies is to accept industrialization and foreign direct investment, shifting to a manufacturing economy reminiscent of Western nations in the 1890s. However, Rozelle makes the argument that to then shift from a middle income to high income country requires an educated populace, and the middle-income countries which have failed to continue their economic growth did not appropriately invest in the education of their citizenry.

But with a bit of historical data, another trend emerges that is far more relevant to the question at hand: the very small number of countries that have made it out of middle-income status in recent decades—the graduates—all have very high levels of high school education. Even more surprising, they’ve had those high levels for decades. In particular, back when they were still middle income, the graduates (places like South Korea, Taiwan, and Ireland) all had high school attainment rates comparable to those of the rich countries.

China is not appropriately investing in education to avoid the middle income trap. While it may have the necessary innovation, all the Baidus, Tencents, and Bytedances in the world wouldn't fix the systemic issue: nearly 70% of China's population lacks a high school education and cannot participate in the high-skilled, high-paid industries that characterize high-income countries.

Rural Chinese mainly work low-skilled manufacturing jobs, or jobs which don't require a high school education. This was fine, until rising wages in mainland China started pushing manufacturing jobs overseas, leaving hundreds of thousands, if not millions, of workers unemployed. Throughout the book, Rozelle includes telling anecdotes of workers who lost their jobs and couldn't get new ones because they didn't have the required skillset.

...We fell into conversation with one man in particular. Mr. Wang was about thirty-five and had recently been laid off from his job of eighteen years as an electrician.

As we watched, Mr. Wang gamely sidled up to the first booth, which represented a bank. The man running the booth, a human resources staff member, shook his hand and asked him to read a short page of text and comment on what he understood. Mr. Wang stared hard at the piece of paper, trying to force the characters into an order he could understand. He knew all the words and could read them aloud, but many of the terms were over his head. He kept getting confused by the logic of the sentences. For several minutes he stared at it, trying to make sense of what he was seeing. At first the man running the booth was patient and seemed sympathetic, but gradually the bank representative lost interest, turned back to his desk, and started looking around for workers who might be a better bet. After a few more minutes, Mr. Wang lowered his head with a sigh: “I’m sorry, I just can’t make sense of it.”

As it turns out, simply increasing access and/or mandating a high school education wouldn't be enough. The rural/urban divide in China runs deep. The opportunity & outcome differential comes not only from unequal access to education and other outgrowths of the hukou policy, but also from deep-seated systemic differences. 'Vocational' schooling is a joke:

Tao’s middle school teacher suggested he check out the new vocational high school that had recently opened across town... When he arrived on campus a week later, he found himself in an unfamiliar world. Whereas his middle school had been orderly and regimented, here chaos reigned. The older students were tough-looking, with tight jeans, black pleather jackets, and spiked hair. As he walked to class on his first day, there were no adults in sight. He passed groups of kids hanging out in the courtyard, smoking cigarettes and laughing.

Children who grow up in rural China suffer from three "invisible epidemics", as Rozelle calls them: anemia, uncorrected myopia, and parasitic intestinal worms. Iron-deficiency anemia has been linked to dramatic decreases in IQ, myopia (for obvious reasons) impedes learning ability, and worms almost literally sap the life force from their hosts. Luckily, cheap and safe interventions can fix these issues. Iron supplements, subsidized glasses, and cheap, common medicine could easily bring the health of a Chinese kid in the country to within striking distance of one in Beijing.

Perhaps a harder problem to fix is that of crucial stimulation of children during their formative developmental years. Rural parenting strategies are unequipped to give a child sufficient intellectual stimulation.

In rural China, babies are systematically missing out on the mental stimulation they need. When we asked rural families if they ever talked to their babies, we were met with blank looks or bemused smiles. “Why would I talk to my baby?” one young mother responded, giggling into her hand. “She can’t talk back!”

Yet this leads to delayed infant development, and the permanent stunting of a child's ability relative to their peers. Rural Chinese babies score horribly on the Bayley test, while urban Chinese children score higher than average. This seems to be much of the issue.

For the sake of China and the world, Rozelle hopes that China will be able to fix its human capital crisis. He is somewhat bullish: the Chinese government is beginning to recognize that this is an issue, and some sane policies have been implemented. But he wants more to be done. Maybe China can pull off an economic miracle again, but this seems to be a major roadblock.


Review | Radical Markets

Radical Markets? More like Activist Markets

The Atlas Fellowship partners with Impact Books to give fellows access to a curated collection of books worth reading. One of these books is Radical Markets. The first two chapters are worth reading, the rest are not.

Posner & Weyl make five proposals to "uproot capitalism and democracy for a just society":

  • partial common ownership of land, mediated by auctions,
  • quadratic voting,
  • individual visa sponsorship,
  • demonopolizing capital, but only 'within industries',
  • treating data as labor.

Each has its own chapter, and each chapter is meant to stand alone as a defense of its policy. Yet the latter three fall flat.

What Radical Markets does well: a coherent exposition of Georgism, 'radical' applications of auctions to the uninitiated, and the first decent defense of quadratic voting from first principles I've read. What it falls victim to: left-right political reductionism, one-size-fits-all solutions to coordination problems, and hefty claims partnered with weak evidence.

Property as Monopoly

The first chapter is the best chapter. It popularizes the Common Ownership Self-assessed Tax (COST) (aka the Harberger Tax) as a plausible mechanism by which an auction-based property market could be implemented. Coupled with concrete policy proposals and effect analyses, it is a self-contained introduction to modern-day Georgism and disrupting the current 'property monopoly' status quo.

What is a COST? In Harberger's own words:

If taxes are to be levied...on...the value of ... properties ... it is important that assessment procedures be adopted which estimate the true economic value ... The economist's answer... is simple and essentially fool-proof: allow each...owner.. to declare the value of his own property, make the declared values ... public, and require that an owner sell his property to any bidder... willing to pay... the declared value. This system is simple, self-enforcing, allows no scope for corruption, has negligible cost of administration, and creates incentives, in addition to those already present in the market, for each property to be put to that use in which it has the highest economic productivity.

- Arnold Harberger, Chile 1962

Essentially, tax properties based on their value as assessed by the property owner. To ensure that the self-assessed values are accurate representations of the owners' valuation, make all of the valuations public and mandate that a property be sold if someone outbids the declared value.

While Harberger designed his scheme as a way to raise government revenue, it offers an inspired solution to the monopoly problem we highlighted above.

Setting the tax rate requires trading off between allocative and investment efficiency w.r.t. the properties being taxed. If the COST is set at the turnover rate (the annual probability that the asset changes hands), then on the margin the property owner's gains from price increases are exactly offset by tax increases -- incentivizing accurate valuations. This maximizes allocative efficiency.

However, such a high tax rate disincentivizes investment. Consider the case of a property owner who by investing $20,000 in his home can raise its value from $40,000 to $70,000. If this home has a turnover rate of 50%, the $10,000 profit combined with a $15,000 tax increase is not a good financial decision.

Disincentivizing property investment is bad. Private property rights are meant to avert the tragedy of the commons and encourage investment. Any alternative will have to do the same. As such, the socially optimal COST rate is less than the turnover rate.
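The arithmetic behind that renovation example, sketched with the chapter's figures ($20,000 invested, value rising from $40,000 to $70,000, COST at a 50% turnover rate):

```python
# Figures from the worked example: a $20,000 renovation raises a home's
# self-assessed value from $40,000 to $70,000, with the COST set at a
# 50% turnover rate.
investment   = 20_000
value_before = 40_000
value_after  = 70_000
cost_rate    = 0.50

gain         = value_after - value_before
profit       = gain - investment            # 10,000
tax_increase = cost_rate * gain             # 15,000: the tax eats the profit
print(profit, tax_increase)
```

The tax increase exceeds the profit, so the renovation is net-negative for the owner, which is exactly the investment disincentive the passage describes.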

The socially optimal property COST is non-zero though!

One might assume that the loss in allocative efficiency would offset the gain in investment efficiency. However -- and this is a key point -- the opposite happens. When the tax is reduced incrementally to improve investment efficiency, the loss in allocative efficiency is less than the gain in investment efficiency... In fact, it can be shown that the size of the social loss from monopoly power grows quadratically to the extent of this power. Thus, reducing the markup by a third eliminates close to 5/9...of the allocative harm from private ownership.
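The 5/9 figure in the passage follows directly from the quadratic-loss claim: cutting the markup to two-thirds of its size leaves (2/3)² = 4/9 of the loss. A one-liner to confirm:

```python
# Deadweight loss grows as the square of the markup (up to a constant),
# so reducing the markup by a third removes 1 - (2/3)**2 = 5/9 of the loss.
loss = lambda markup: markup ** 2
removed = 1 - loss(2 / 3) / loss(1.0)
print(removed)  # 0.555... (= 5/9)
```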

One can conceptualize higher COSTs as greater public ownership of private property, as they transfer use value to the public and increase the number of possible owners of the property. In some sense, higher COSTs mean a freer market -- more participation, more competition.

How could COSTs be implemented? Countless details would have to be worked out, most in practice, but the salient issues seem to be setting the correct tax rates and developing the necessary infrastructure for frictionless transactions. While the latter is mostly a technological and system design issue (an engineering problem! just an engineering problem), the former is an ideological one. What is the optimal tradeoff?

For typical assets, we estimate that turnover once every fourteen years is reasonable and thus (combined with other factors below) a 7% tax annually is a good target.

Posner & Weyl probably get their 7% from the median length of American homeownership being 13.2 years as of 2022. Note that this is essentially setting the COST at the turnover rate, maximizing allocative efficiency at the cost of investment incentivization. They then claim that:

At the tax rate we advocate, asset prices would fall by between a third and two-thirds from their current level. In popular and congested areas like San Francisco and Boston, where very modest houses sell for $600,000 or more, their price could fall to as low as $200,000.

Where are these numbers coming from? It seems plausible that prices would fall this much: a 7% COST over a 14-year holding period would cost the owner of a $600,000 home $588,000 in tax, so owners would be incentivized to self-assess lower prices to attract buyers faster, leading to shorter holding periods and higher after-tax profits overall. Unclear what the underlying model is, though.
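For what it's worth, the $588,000 figure appears to be just the annual tax accumulated over one full holding period (my reconstruction, not a model from the book):

```python
cost_rate = 0.07       # Posner & Weyl's proposed annual COST rate
home_value = 600_000   # the San Francisco / Boston example
holding_years = 14     # one turnover every fourteen years

# Total tax paid if the owner holds (and self-assesses) at this value throughout:
total_tax = cost_rate * home_value * holding_years
print(total_tax)  # ~$588,000 -- nearly the full price of the home
```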

An important point that they make in footnote 47 of Chapter 1:

To make our account vivid we discuss some examples of personal possessions of individuals, like homes and cars, but the reader should keep in mind that most assets are owned by businesses and thus much of the participation and benefits from a COST would be through business assets.

The authors propose starting with implementing COSTs for publicly held assets to enter the private market. For instance, spectrum licenses:

...redesigning spectrum licenses to include a COST-based license fee would solve [the current misallocation of American spectrum] and could be implemented in a variety of ways consistent with existing FCC rules. This approach, which they call "depreciating licenses," would address many recent complaints about license design for the newly available 3.5 GHz bands of the spectrum; their small geographic scope and short durations under current plans were intended to maximize flexibility but may undermine investment incentives.

COSTs on Internet domains, grazing rights, and natural resource leases, to name a few, also make similar amounts of sense. But a much broader implementation would reap much greater rewards:

As we noted above, the economy underperforms by as much as 25% annually because of the misallocation of resources to low productivity firms. A fully implemented COST could increase social wealth by trillions of dollars [emphasis mine] every year...

At the rate of roughly 7% annually that we imagine being near-optimal, a COST would raise roughly 20% of national income. About half of that money would suffice to eliminate all existing taxes on capital, corporations, property, and inheritance...and to wipe out the budget deficit and significantly reduce debt, further stimulating investment.

Where are these numbers coming from?? I don't doubt that the gains would be large, or that this would be a more efficient taxation regime, but where is the substantiation?? The models are neither in the text of the book nor in the footnotes, and I don't think it should be on the reader to fact check these claims?

Regardless of this chapter's issues with providing evidence commensurate to its claims, the underlying proposal of a COST is innovative and, according to John Halstead, perhaps the most serious intellectual challenge to the idea of private property in history. Worth the read.

Radical Democracy

In two words: quadratic voting.

In this chapter, we will show that these two elements -- the capacity to save up voting power, and the square root function -- would be a much-needed cure to the pathologies of the traditional voting systems used in democracies.

Why quadratic voting? Why not give people votes proportional to the cube root of their credits, or the logarithm of their credits? What property does a quadratic have that no other function does?

Its derivative is linear. No other function makes the marginal cost of casting an extra vote proportional to the number of votes already cast. We make decisions on the margin -- if one voter cares three times as much as another about an issue, they will be willing to pay three times as much for an extra vote, and as such will buy three times as many votes as the other. No other function leads to the same outcome.

QV achieves a perfect balance between the free-rider and the tyranny of the majority problems. If the cost of voting increased more steeply, say, as the fourth power of votes cast, those with strong preferences would vote too little and we would revert to a partial tyranny of the majority. If the cost of voting increased more slowly, those with intense preferences would have too much say, as a partial free-rider problem would prevail.
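The mechanics in miniature (a sketch; the quadratic cost rule and voice credits are from the chapter, the specific valuations are my own):

```python
# Quadratic voting: casting v votes costs v**2 credits,
# so the marginal cost of the v-th vote is v**2 - (v-1)**2 = 2v - 1 (linear in v).
def cost(votes):
    return votes ** 2

def marginal_cost(votes):
    return cost(votes) - cost(votes - 1)

# A rational voter keeps buying votes while the marginal vote is worth its price:
def votes_bought(per_vote_valuation):
    v = 0
    while marginal_cost(v + 1) <= per_vote_valuation:
        v += 1
    return v

# A voter who cares three times as much buys three times as many votes:
print(votes_bought(6), votes_bought(18))  # 3 9
```

With a cubic or logarithmic cost rule, the stopping point would no longer scale linearly with how much the voter cares, which is exactly the property the chapter is after.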

Radical Markets was my first introduction to this explanation, and for this I am grateful. I think this chapter is worth reading just for this, but your mileage may vary.

QV has seen limited adoption in practice. A version runs in Akasha (an Ethereum-based blockchain application), where money is the currency with which voters buy their votes. Paying for votes has obvious issues in today's world -- perhaps if the world were much more egalitarian, this would be a better system than one person, one vote.

There is good reason to believe that QV mitigates polarization, since it lets individuals vote more on the issues they care more about. Implementation runs into a host of engineering challenges (how to make the UI easy to use and not require excessive amounts of thinking), but I am optimistic that they are solvable.

I doubt this will gain much traction for public elections at the state and national level, but it could be useful and innovative for local and private governance. Notably, the Colorado House Democratic caucus used it in 2019 to decide on their legislative priorities, and Taiwan has flirted with QV in its digital democracy.

Uniting the World's Workers

Subtitled: REBALANCING THE INTERNATIONAL ORDER TOWARD LABOR

The basic argument: economic freedom is necessary for economic growth. Gains from trade have been the primary engine by which this occurs, but since intranational inequality is less severe than international inequality, gains from migration should be considered instead. Expanding existing legal migration channels is insufficient and possibly detrimental to natives. Therefore, countries should auction visas and let individuals sponsor migrants. They dub this the Visas between Individuals Program, or VIP.

Bryan Caplan thinks this is his open borders proposal in disguise. I think it's where the book starts going off the deep end.

I agree with the claim that low-skilled migrants are likely a net drain on the state. As migrants are more likely to work informally and typically send large proportions of their income out of the country, they pay less in taxes and consume less than natives. Naturally, this breeds resentment (amongst a multitude of cultural reasons besides), typically in rural, low-income, and more conservative regions.

How is the VIP going to fix this issue, exactly?? Posner and Weyl think that the money generated by the visa auction system will be enough:

Suppose that OECD countries accepted enough migration to increase their populations by a third. Suppose too that migrants on average bid $6,000 per year for a visa. This sum seems plausible given that Mexican migrants to the United States gain more than $11,000 annually under the current highly inefficient system. Average GDP per capita in OECD countries is $35,000, so this proposal would boost the national income of a typical OECD country by almost 6%, comparable to their growth in real income per person in the last five years.
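The quoted "almost 6%" appears to come from this arithmetic (my reconstruction; the rough $76,000 US GDP per capita figure is mine, not the book's):

```python
bid = 6_000            # assumed average annual visa bid
migrant_ratio = 1 / 3  # population increase from migration

# Visa revenue per native resident, as a share of income:
oecd_gdp_pc = 35_000
boost_oecd = bid * migrant_ratio / oecd_gdp_pc
print(f"{boost_oecd:.1%}")  # ~5.7%, i.e. "almost 6%"

# The same arithmetic for the US specifically (rough GDP per capita, my figure):
us_gdp_pc = 76_000
boost_us = bid * migrant_ratio / us_gdp_pc
print(f"{boost_us:.1%}")    # ~2.6%: richer countries gain less in relative terms
```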

This is highly optimistic, and assumes that both the demand and the quota for visas amount to a third of the population of a given OECD country. For America, that's roughly 110 million migrants willing to pay $6,000 per year for a visa. Even then, this won't give you 6% income growth, because GDP per capita in America is so much higher than the OECD average.

That's nearly the entire population of Mexico! The likelihood of most migrants having $6,000 on hand is almost nil! And this money goes to the government, not to private individuals directly -- many individuals will not make the connection between increased funding for public services and the massive, massive numbers of migrants this proposal would apparently bring.

Even if migrants arrived in the numbers the authors propose, it would utterly, completely, irreversibly change the cultural landscape of the developed world. We are considering migration in numbers such that for every three native Americans there would be a migrant -- statistically low-skilled and with poor English. All the typical conservative arguments against migration would be multiplied ten-fold, and they would have a point.

But individuals can sponsor visas! More than that, they can sponsor migrants and take a cut of their earnings such that the migrant earns less than minimum wage! And this isn't wage slavery, because the migrant would be making more than he would have in his home country, so it all makes sense!

The risk of exploitation is minimal because foreign workers are protected by the same health, safety, labor, and employment laws that Americans benefit from, and foreign workers can return to their home country if the employer mistreats them.

Who enforces these health and safety laws? Americans. What is the economic gain from moving from, say, Haiti to the United States? About thirty-fold? What is the likelihood that the system would be abused, à la the Jim Crow South? And even then, what is the likelihood that the migrant would voluntarily return to their home country??

Many people may object to this system. Perhaps to some readers it is uncomfortably similar to indentured servitude, even though migrants would be free to leave at any time. Or perhaps it just seems exploitative.

YES.

Oh, but don't worry about any of this, because 100 million people will want to sponsor visas.

Imagine then, that 100 million people sponsor migrant workers. Currently, there are about 45 million foreign-born people in the United States. Of those, about 13 million are legal noncitizens, and 11 million are illegal aliens. If our program replaced existing migrant worker visas [which it wouldn't], the number of migrant workers would increase dramatically, from 24 million to 100 million, but not in a way that would disrupt society and overwhelm public services. [sure] It would leave the ratio of foreign-born to natives in the United States below the numbers in even the most restrictive GCC countries.

To be clear, 22.7% of Americans would be foreign born -- a little less than one in every four American residents would be migrants. Yes, this is a potentially sustainable foreign population, but all at once? And the GCC countries being used as a reference class are the members of the Gulf Cooperation Council: the UAE, Qatar, Kuwait, Bahrain, Oman, and Saudi Arabia. Countries positively praised for their labor rights record.

A saner reference class would be Australia and New Zealand, which have about one foreigner for every two natives (and which, admittedly, the authors mention as examples). Where did those foreigners come from? According to Wikipedia:

After World War II Australia launched a massive immigration program, believing that having narrowly avoided a Japanese invasion, Australia must "populate or perish". Hundreds of thousands of displaced Europeans migrated to Australia and over 1,000,000 British subjects immigrated under the Assisted Passage Migration Scheme... The scheme initially targeted citizens of all Commonwealth countries; after the war it gradually extended to other countries such as the Netherlands and Italy.

In other words, Australia heavily encouraged white immigration (see the White Australia policy) until the early 1970s, at which point it was comfortable accepting large numbers of migrants. Skilled migrants, I might add. There are only about 1.5 million temporary migrants in Australia (compared to about 7.5 million total foreign-born Australian residents), which is an imperfect yet decent upper bound on the number of low-skilled migrants in Australia.

There is no developed country which has managed to have a large foreign-born low-skilled migrant population enter over a short period of time and provide them the same standard of living or the same rights as natives. The GCC has similar proportions of low-skilled workers and an atrocious human rights record; Australia and New Zealand manage to treat migrants decently but with a predominantly high-skilled population.

These are arguments against any open-borders policy that advocates a large influx of low-skilled immigrants over a short period of time. But the VIP is doubly stupid because it places the liability for migrant well-being on low-skilled natives, the very people least well equipped to handle it.

Anthony is a native Ohioan, Bishal is a hypothetical migrant under the VIP:

In our case, Anthony will be required to obtain basic health insurance for Bishal before he arrives (though this would come out of Bishal's earnings). If Bishal is unable to find work, Anthony must support him for as long as he remains in the country...If Bishal commits a crime, he will be deported after serving his sentence; Anthony will be required to pay a fine. If Bishal disappears, Anthony will also be fined. We do not think that the fine needs to be large, but it should sting.

100 million Americans are supposed to sign up for this??

If you want a defense of open borders & greatly expanded immigration, you can read Caplan's comic book. Not this.

Dismembering the Octopus

This chapter advocates for letting BlackRock invest in Uber but not Lyft, Coca-Cola but not Pepsi, and McDonald's but not KFC. Anti-trust for institutional capital. A good idea in theory, but I have doubts about the practical implementation.

Who are the institutional investors, anyway? They include companies that manage mutual funds and index funds, asset managers, and other firms that buy and hold equities on behalf of their customers. The largest names are those we mentioned above: Vanguard, BlackRock, State Street, and Fidelity.

26% of the public American stock market is held by institutional investors. As institutional investors are the only single entities with enough capital to own significant portions of centibillion-dollar companies, they have an outsized influence on industry policy. And because they own significant portions of companies that are nominally supposed to be competing in the same industries, it is in their interest to stifle competition within those industries, creating pseudo-cartels that harm consumers.

As an example, look at banking. BlackRock, Vanguard, State Street, and Fidelity are among the top five largest shareholders of JP Morgan Chase, Bank of America, Citigroup, Wells Fargo, U.S. Bank, and PNC Bank. If the same entities own the companies that are nominally supposed to be competing, the incentives to compete are diminished and monopolistic tendencies begin to crop up. Think of Rockefeller after Standard Oil was broken up -- he still owned the subsidiary companies in each state, and this allowed him to multiply his wealth far beyond what would have been possible had Standard Oil stayed whole.

Monopolies are bad for consumers! This is what anti-trust law was made to combat! We already bring anti-trust litigation against mergers that would increase market concentration; why not do the same for capital investments?

A simple but Radical reform can prevent this dystopia: ban institutional investors from diversifying their holdings within industries while allowing them to diversify across industries. BlackRock would own as much as it wants of (say) United Airlines, but it would own no stake in Delta, Southwest, and the others. It would also own as much as it wants of Pepsi, but not Coca-Cola and Dr. Pepper. And it would own as much as it wants of JP Morgan, but none of Citigroup and the other banks...

Our approach can be stated as a simple rule:

No investor holding shares of more than a single effective firm in an oligopoly and participating in corporate governance may own more than 1% of the market.

I agree with this policy, in principle. However, it seems difficult to disambiguate industries cleanly. In today's world, where a relatively small number of massive corporations control disproportionate amounts of the economy, it seems at least somewhat viable. But what of venture capital firms investing in multiple competing startups? Would they be required to divest from one of them if both become unicorns? How big would an industry have to be for the FTC to care about regulating it?

The authors raise the issue of the lack of usage of anti-trust in local markets. For instance:

...sociologist Matthew Desmond suggests that landlords in poor neighborhoods often buy up enough housing to have substantial power to drive up rents by holding units vacant and artificially depressing supply. Yet as far as we know, no antitrust case has ever been brought against such local but potentially devastating attempts at monopolization.

Yes, property is easy to point to as an instance of monopolistic behavior ruining lives. I am wary, however, of giving regulators even more power than they currently have. Property market regulations seem to be, on the whole, good and necessary (or at least, good regulation has the potential to have a large positive impact). But letting regulators control the behaviors of small businesses seems bad? Perhaps giving business owners more leeway to bring civil suits against others engaging in anti-competitive practices is the way to go.

Finally, the authors address the digital economy:

Antitrust authorities, who are accustomed to worrying about competition within existing, well-defined, and easily measurable markets, have allowed most mergers between dominant tech firms and younger potential disrupters to proceed. Google was allowed to buy mapping startup Waze and artificial intelligence powerhouse DeepMind; Facebook to buy Instagram and WhatsApp; and Microsoft to buy Skype and Linkedin.

Their solution:

To prevent this dampening of innovation and competition, antitrust authorities must learn to think more like entrepreneurs and venture capitalists, seeing possibilities beyond existing market structures to the potential markets and technologies of the future, even if these are highly uncertain.

Much easier said than done. The day a government regulating body has the same market foresight as the entrepreneurs and investors in the market, I will eat my hat.

Meh. Not as groundbreaking as the first two proposals, and not as patently idiotic as the third. It is surprising that they advocate for a drastic expansion in market regulation in a book nominally focused on relying on free market forces to bring about good, but oh well.

(this is where the Activist Markets label came from)

Data as Labor

Pay people for their data. Never mind that the marginal benefit of your data to any given firm is negligible. Because machine learning systems need large amounts of data and have spiky loss curves, diminishing marginal returns for data doesn't count here!

(I am not joking, this is the argument Posner & Weyl make)

However, if later, harder problems are more valuable than earlier, easier ones, then data's marginal value may increase as more data become available. A classic example of this is speech recognition. Early ML systems for speech recognition achieved gains in accuracy more quickly than did later systems. However, a speech recognition system with all but very high accuracy is mostly useless, as it takes so much time for the user to correct the errors it makes. This means that the last few percentage points of accuracy may make a bigger difference for the value of the system than the first 90% does. The marginal value grows to the extent that it allows this last gap to be filled.

Maybe the concept of scale, scale, scale wasn't as prominent in their time as it is nowadays? Two issues: model capabilities grow logarithmically with the amount of data they are trained on, and oftentimes getting from 99% to 99.9% accuracy requires as much data as getting from 0% to 99%. The idea that the marginal datapoint has anything but negligible value is absurd.
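A toy power-law scaling curve (illustrative constants of my choosing, not from the book or any particular scaling-laws paper) makes the point concrete: each further reduction in loss multiplies, rather than adds to, the data requirement, so the marginal datapoint is worth less and less.

```python
# Toy power-law scaling: loss(d) = (D0 / d) ** alpha, with made-up constants.
D0, alpha = 1.0, 0.5

def loss(d):
    return (D0 / d) ** alpha

# Data needed to hit a target loss grows explosively as the target shrinks:
def data_needed(target_loss):
    return D0 / target_loss ** (1 / alpha)

print(data_needed(0.10))  # ~100 units of data for 0.10 loss
print(data_needed(0.01))  # ~10,000 units for 0.01 loss: 100x the data for one step
```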

Paying people for data that's used to finetune models, sure. Paying people to RLHF models, sure. Paying people for data that's used in massive pretraining datasets -- in principle, this should probably happen, but it will come nowhere near providing individuals with an income equivalent to the job that has just been automated.

To make a ballpark estimate of what gains we might expect, we suppose that over the next twenty years, AI that would (absent our proposal) not pay data providers comes to represent 10% of the economy. We further assume that the true share of labor if paid in this area of the economy is two-thirds, as in the rest of the economy; and that paying labor fairly expands the output of this sector by 30%, as seems quite reasonable given productivity gains accompanying fairer labor practices in the early twentieth century. Then our proposal would increase the size of the economy by 3% and transfer about 9% of the economy from the owners of capital to those of labor.
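Their arithmetic here is at least internally consistent (my reconstruction of the quoted figures):

```python
ai_share = 0.10     # AI comes to represent 10% of the economy (their assumption)
labor_share = 2 / 3 # labor's "true share" within that sector (their assumption)
output_gain = 0.30  # output expansion from paying labor fairly (their assumption)

growth = ai_share * output_gain                        # 3% larger economy
transfer = labor_share * ai_share * (1 + output_gain)  # ~8.7%, their "about 9%"
print(f"{growth:.1%}, {transfer:.1%}")
```

The internal consistency, of course, says nothing about whether the three assumed inputs are anywhere near right.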

Fundamentally, the authors miss the point that AI will make labor substitutable with non-human entities. The world in five years will likely use datasets generated by AI systems, simply because the Internet is too small for the capabilities of the models we want to build. Paying people for their data is not an alternative to UBI in an increasingly automated economy.

I would critique this chapter more, but it seems like a reasonable proposal to make in the world before ChatGPT and before people realized that scaling laws held to this extent. Still not worth reading in 2023.