
Clearly Explaining: The First Law of Complextropy [1/30]

· 8 min read

TL;DR

The Paradox: We intuitively feel the universe gets more complex (stars, life) before dying out (heat death), yet the Second Law of Thermodynamics says Entropy (disorder) only ever increases.

The Insight: Standard definitions of complexity fail here because they either treat random noise as complex (Kolmogorov) or they ignore time limits (Sophistication).

The Solution: Scott Aaronson proposes Complextropy: defining complexity as the shortest program that can generate a state efficiently (in polynomial time). By acknowledging that computation is not free, we mathematically recover the “bell curve” of history: Simple Start \to Complex Middle \to Simple End.

The Question

Scott Aaronson once attended a conference on time—FQXi’s Setting Time Aright—where physicist Sean Carroll opened with a simple but profound observation.

We know the Second Law of Thermodynamics: Entropy (disorder) increases monotonically with time. The universe starts in a low-entropy state (singular, ordered) and moves towards a high-entropy state (heat death, uniform disorder).

But have you noticed that “complexity”—or “interestingness”—doesn’t follow that same straight line?

Complexity seems to follow a bell curve. It starts low, rises to a peak (where we are now), and eventually falls back to zero.

The question is: Is there a law of physics for this? Can we mathematically define this “Complexity” such that it provably follows this curve?

[Figure: Complexity vs. Entropy graph]


The Coffee Cup Analogy

To visualize this, imagine a cup of coffee. You carefully pour some cream on top but don’t stir it yet.

  1. The Start (Separated): The coffee is black, the cream is white. They are perfectly separated. This is a “low entropy” state because it’s highly ordered. It’s also simple—you can describe it easily: “Cream on top, coffee on bottom.”
  2. The Middle (Mixing): You stir it once. Tendrils of white cream swirl into the black coffee, creating intricate fractals and complex geometric patterns. The entropy has increased, but so has the complexity. To describe this state, you’d need to trace every little swirl.
  3. The End (Mixed): You keep stirring. The liquid becomes a uniform light brown. The entropy is now maximized (it can’t get any more mixed). But the complexity is gone. It’s simple again: “Uniform light brown liquid.”

The Paradox: Standard physics has a great definition for Entropy (which goes up effectively forever), but no standard definition for this “Complexity” (which goes up and then down).
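The entropy half of that story is easy to sanity-check numerically. Below is a toy sketch (my construction, not from the post): the cup is a 1-D row of cells (0 = coffee, 1 = cream), and "entropy" is the average Shannon entropy of the cream fraction inside small windows. Across hand-built snapshots of the three stages it only goes up, even though the middle snapshot is the visually interesting one.

```python
import math

def local_entropy(cells, window=16):
    """Average Shannon entropy (bits) of the cream fraction per window."""
    h_total = 0.0
    n_windows = len(cells) // window
    for w in range(n_windows):
        block = cells[w * window:(w + 1) * window]
        p = sum(block) / window          # cream fraction in this window
        if 0 < p < 1:                    # pure windows contribute 0 bits
            h_total += -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return h_total / n_windows

separated = [0] * 512 + [1] * 512        # stage 1: cream on top, coffee below
swirl = [(i // 24) % 2 for i in range(1024)]   # stage 2: alternating streaks
mixed = [i % 2 for i in range(1024)]     # stage 3: mixed at the "molecular" level

# entropy rises monotonically: separated < swirl < mixed
```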

[Figure: Coffee cup analogy]


Attempt 1: Kolmogorov Complexity

Scott’s first instinct was to look at Kolmogorov Complexity.

The Kolmogorov Complexity of a string x, denoted K(x), is the length of the shortest computer program that outputs x.

This matches our intuition for the first two stages of the coffee cup (Simple \to Complex). But it fails at the end.

The Problem: Randomness \neq Complexity

In the “Mixed” state (uniform brown coffee), the position of every individual molecule is essentially random. In Kolmogorov terms, a truly random string is the most complex thing possible, because its description cannot be meaningfully compressed. This clashes with our intuition that uniform random noise is hardly interesting.

So, K(\text{Coffee}) would behave just like Entropy: Low \to Medium \to Maximal. It misses the bell curve entirely because it counts “random noise” as complexity.
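This failure is easy to see with a real compressor. True K(x) is uncomputable, but a lossless compressor gives a crude upper bound: “decompress this blob” is itself a program that prints x. A minimal sketch using zlib as the stand-in compressor:

```python
import random
import zlib

def k_upper_bound(s: bytes) -> int:
    # length of the compressed blob: a rough upper bound on K(s),
    # since the decompressor plus the blob is a program that outputs s
    return len(zlib.compress(s, 9))

ordered = b"A" * 1000                                   # "print 'A' 1000 times"
rng = random.Random(0)                                  # fixed seed for reproducibility
noise = bytes(rng.randrange(256) for _ in range(1000))  # incompressible junk

# ordered data compresses to a few bytes; random noise barely shrinks at all,
# so this proxy scores noise as maximally "complex"
```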

The Deterministic Trap

Even if we ignore the noise, we run into a physics problem. If we assume the universe is deterministic (acting like a computer program), then the state of the universe at any time t can be described by:

  1. The Initial State (S_0)
  2. The Laws of Physics (P)
  3. The time parameter (t)

The “program” to generate the universe today is simply: Run_Physics(S_0, t).

The length of this program is: K(\text{State}_t) \approx K(S_0) + K(P) + \log(t)

Since S_0 and P are constants, the complexity only grows with \log(t), the number of bits needed to write down the current year. This growth is microscopically slow and fails to account for the explosion of structure (stars, biology) we see today.
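To see just how weak that \log(t) growth is, here is a small sketch. The constants for K(S_0) and K(P) are made-up placeholders; the only term that depends on t is `t.bit_length()`, the number of bits needed to write t down.

```python
K_S0 = 1_000    # assumed placeholder: bits to specify the initial state
K_P = 10_000    # assumed placeholder: bits to specify the laws of physics

def program_length_bits(t: int) -> int:
    # K(State_t) ~ K(S_0) + K(P) + log(t): only the last term grows with time
    return K_S0 + K_P + t.bit_length()

# Running the universe for 10^100 ticks instead of 1 adds only ~332 bits,
# and doubling t adds exactly one bit to the description.
```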


Attempt 2: Sophistication

To fix this, we need to separate Structure (the interesting part) from Randomness (the noise).

This brings us to a concept called Sophistication (or “Logical Depth”). The idea is to model a piece of data x not as a single program, but as a two-part description:

  1. A Set S: The “rule” or “model” that describes the data.
  2. An Index i: The specific identifying number of x within that set.

We define Sophistication, Soph(x), as the description length K(S) of the simplest set S that “explains” x well. Formally:

Soph(x) = \min \{ K(S) \} \quad \text{such that} \quad x \in S \quad \text{and} \quad K(x|S) \ge \log_2(|S|) - c

That inequality is the key. It says: “Identifying x inside S should be as hard as picking a random element from S.”

Deep Dive: The Playlist Analogy for Soph(x)

Let’s break that math down with an example.

  • x = The song “Demons” by Imagine Dragons.
  • S = The “Radioactive” Playlist (containing 1,000 songs).

The Components:

  • K(S): The complexity of the Playlist description (e.g., “All Imagine Dragons songs”). This is the Sophistication.
  • |S|: The size of the playlist (1,000 songs).
  • \log_2(|S|): The bits needed to pick a random song from the list (\approx 10 bits).
  • K(x|S): The bits needed to pick your specific song.

The Inequality (K(x|S) \ge \log_2(|S|) - c): This acts as a “Bulls**t Detector” for your set S.

  • Bad Set (Too Simple): If S is “All Songs in the World”, it’s too big: “Demons” is not a typical member, and can be pinned down with far fewer bits than a random song from that set would need (K(x|S) \ll \log_2(|S|)). The inequality fails.
  • Bad Set (Overfitting): If S is just { “Demons” }, then S itself contains all the information, so K(S) is huge. The inequality holds, but K(S) is nowhere near minimal.
  • Good Set (Sophistication): “Imagine Dragons Songs”. This set is compact (K(S) is small), yet “Demons” is just a typical member of it.

Sophistication looks for that “Good Set”—the sweet spot that captures all the structure (S) and leaves the rest as pure noise (the index of x within S).
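The three candidate sets can be scored in a few lines. All numbers below are invented toy values in bits, not real measurements; the set sizes are powers of two so the logarithms come out exact.

```python
import math

C = 3  # slack constant c in the typicality inequality

# candidate set -> (K(S): bits to describe S, |S|, K(x|S): bits to find x in S)
# all values are invented for illustration
candidates = {
    "all songs in the world": (10, 2**26, 20),   # x is atypical here: 20 < 26 - C
    "imagine dragons songs":  (60, 2**10, 10),   # typical: 10 >= 10 - C
    "the singleton {x}":      (5000, 2**0, 0),   # typical, but K(S) is huge
}

def passes_detector(K_S, size, K_x_given_S):
    # the "Bulls**t Detector": K(x|S) >= log2(|S|) - c,
    # i.e. x must look like a random member of S
    return K_x_given_S >= math.log2(size) - C

valid = {name: K_S for name, (K_S, size, K_x) in candidates.items()
         if passes_detector(K_S, size, K_x)}
sophistication = min(valid.values())   # smallest K(S) among sets that pass
```

With these toy numbers, “all songs in the world” is rejected by the detector, the singleton passes but is wildly uneconomical, and the compact “imagine dragons songs” set wins.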

[Figure: Song playlist analogy]

Why Sophistication Fails

Sophistication is great for static objects, but it fails for our Coffee Cup evolution because it ignores computation time.

If we ask: “What is the set S of all possible coffee cups after 5 minutes of stirring?” That description is actually very short! S = \{ \text{Outcomes of } \text{Run\_Physics}(S_0, t=5\text{ mins}) \}

The program length is tiny: Run_Physics(S_0, 5).

Because the “History” is simple, the “Structure” is technically simple (K(S) \approx 0). A god-like computer with infinite time could see that the complex swirls are just the trivial result of a simple rule. It would say the swirls have Zero Sophistication.


The Solution: Complextropy

This brings us to Scott’s proposal: The First Law of Complextropy.

The missing ingredient was Computational Resources. Things only look “complex” to us because we are not god-like computers. We can’t instantly simulate fluid dynamics in our heads.

So, we replace Kolmogorov Complexity with Resource-Bounded Kolmogorov Complexity.

Complextropy is the length of the shortest computer program that can generate the distribution in a reasonable amount of time (e.g., polynomial time).

Restoring the Curve

Let’s re-examine the Coffee Cup with this “Time Limit” constraint:

  1. Start (Separated):

    • Program: Draw_Line(Cream, Coffee)
    • Time: Fast.
    • Complextropy: LOW.
  2. Middle (The Swirls):

    • To win standard Sophistication, we’d say “Just simulate physics!”.
    • BUT due to the Time Limit, we can’t simulate the physics (too slow).
    • We are forced to “hardcode” the shape of the swirls into the program itself to output it quickly.
    • Program: Draw_Spiral(x,y,...), Draw_Fractal(...)
    • Complextropy: HIGH. (Because the description length must compensate for the lack of compute time).
  3. End (Brown Soup):

    • We can’t simulate the particles (too slow).
    • But we don’t need to! We can just use a statistical shortcut.
    • Program: Return Random_Color(Light_Brown)
    • Time: Fast.
    • Complextropy: LOW.

And there we have it! By simply acknowledging that computation is not free, we mathematically recover the bell curve of complexity.
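The three stages above can be caricatured directly in code. In this sketch (my construction, not Aaronson’s), each stage is a literal Python “program” string, and its character count stands in for the resource-bounded description length: under a time limit, the middle state’s swirl geometry has to be hardcoded, so its program is the longest.

```python
# stage 1: separated -- one short line describes the whole cup
prog_start = "state = [0]*512 + [255]*512  # cream on top, coffee below"

# stage 3: fully mixed -- a statistical shortcut, no geometry needed
prog_end = "import random; state = [random.randint(120, 136) for _ in range(1024)]"

# stage 2: the swirls -- the tendril widths must be spelled out explicitly
prog_mid = """
widths = [5,12,3,9,17,2,8,21,6,11,4,15,7,19,30,2,14,6,22,9,5,18,3,27,8,13,2,16,24,7,11,4]
state, shade = [], 0
for w in widths * 8:              # paint alternating streaks of hardcoded widths
    state += [shade] * w
    shade = 255 - shade
state = state[:1024]
"""

def complextropy_proxy(program: str) -> int:
    return len(program)           # description length in characters

# each "program" really runs and produces a 1024-cell cup
states = {}
for name, prog in [("start", prog_start), ("mid", prog_mid), ("end", prog_end)]:
    ns = {}
    exec(prog, ns)
    states[name] = ns["state"]
```

Measured this way, the start and end programs are short and only the middle one is long: the Low → HIGH → Low shape of the section above.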


Why this matters (The Ilya Connection)

You might wonder why this abstract physics theory is on a reading list associated with Ilya Sutskever.

I don’t know for sure, but here is my best guess:

I think Complextropy is the perfect metaphor for the Bias-Variance Tradeoff in machine learning.

Consider the “coffee cup” of training a neural network:

  1. Low Complextropy (Underfitting): The model is too simple; it can’t capture the true structure of the data, so its predictions are inaccurate. This is high bias.

  2. High Entropy / Low Complextropy (Overfitting): The model is too complex. It memorizes the training data, noise included, and thus performs poorly on unseen data. This is high variance.

  3. Peak Complextropy (Generalization): The model has discovered the underlying structure and patterns complex enough to accurately predict the data. This is the optimal bias-variance tradeoff. This is what we want our models to achieve.

[Figure: Generalization relation]

Just like the universe is most interesting in the middle of its life, a neural network is most useful when it sits right at the peak of the Complextropy curve.


Next Up

AlexNet : ImageNet Classification with Deep Convolutional Neural Networks (2012) - Coming Soon!


References

  1. Scott Aaronson — The First Law of Complexodynamics (2011).
  2. FQXi — Setting Time Aright (FQXi’s 3rd International Conference) (2011).
  3. FQXi — Talks & Extras: Setting Time Aright (includes Sean Carroll “Opening remarks” slides).
  4. Sean Carroll — Time Is Out of Joint (2011).
  5. A. N. Kolmogorov — Three approaches to the quantitative definition of information (1965/1968).
  6. Ming Li & Paul M. B. Vitányi — An Introduction to Kolmogorov Complexity and Its Applications (Springer).
  7. Moshe Koppel — Complexity, Depth, and Sophistication (1987).
  8. Luís Antunes & Lance Fortnow — Sophistication Revisited (2007).
  9. Charles H. Bennett — Logical Depth and Physical Complexity (1988).
  10. Harry Buhrman, Lance Fortnow, Sophie Laplante — Resource-Bounded Kolmogorov Complexity Revisited (SIAM J. Comput., 2002).
  11. Reading list (as circulated online) — Deep learning reading list from Ilya Sutskever (GitHub mirror/compilation) and Ilya Sutskever’s Top 30 Reading List (Aman.ai).
