Donald Hoffman's "Fitness Beats Truth" (FBT) theorem states that natural selection favors perceptual strategies tuned to fitness payoffs over veridical ones. We give an information-theoretic derivation of FBT using the Information Bottleneck method and the Data Processing Inequality (DPI). Analyzing the Markov chain $X \to Y \to A \to F$ (World $\to$ Sensor $\to$ Action $\to$ Fitness), we show that bounded channel capacity forces a trade-off between representing the world and representing fitness. When the objective is to minimize the fitness distortion $D_{fit}$ under a tight capacity constraint $I(X;Y) \le C$, the optimal encoder drives to zero the information that $Y$ carries about any structural feature of $X$ that produces no variation in the fitness landscape $F(X)$. Thus, FBT is not merely a statement of game-theoretic dominance; it follows from the limits of rate-distortion compression in biological networks.
Evolutionary game theory suggests that veridical perception goes extinct when it competes with fitness-tuned perception (Hoffman et al., 2015). We seek an analytic argument grounded in information theory, specifically the Information Bottleneck method (Tishby et al., 1999).
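The perceptual cycle forms a Markov chain $X \to Y \to A \to F$, where $X$ is the world state, $Y$ the sensory representation, $A$ the action, and $F$ the resulting fitness.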
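For this chain, the Data Processing Inequality gives $I(X;F) \le I(X;A) \le I(X;Y)$: no later stage can recover information about the world that the sensor has discarded. For fitness outcomes to track the world state, $I(X;F)$ must be large, which in turn requires a sufficiently large sensory rate $I(X;Y)$.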
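The inequality can be checked numerically on a small discrete chain. The following is a minimal sketch, assuming randomly drawn channels for each stage; the alphabet sizes and the names `sensor`, `policy`, and `payoff` are illustrative choices, not part of the model above.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in bits) of a 2-D joint distribution."""
    joint = np.asarray(joint, dtype=float)
    p_row = joint.sum(axis=1, keepdims=True)
    p_col = joint.sum(axis=0, keepdims=True)
    mask = joint > 0  # skip zero-probability cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (p_row @ p_col)[mask])))

def random_channel(rng, n_in, n_out):
    """Random row-stochastic matrix: row i is p(output | input = i)."""
    return rng.dirichlet(np.ones(n_out), size=n_in)

rng = np.random.default_rng(0)
p_x = np.full(8, 1 / 8)              # uniform prior over 8 world states (assumed)
sensor = random_channel(rng, 8, 4)   # p(y|x)
policy = random_channel(rng, 4, 3)   # p(a|y)
payoff = random_channel(rng, 3, 2)   # p(f|a)

# Joint distributions implied by the Markov chain X -> Y -> A -> F.
joint_xy = p_x[:, None] * sensor
joint_xa = joint_xy @ policy
joint_xf = joint_xa @ payoff

i_xy = mutual_information(joint_xy)
i_xa = mutual_information(joint_xa)
i_xf = mutual_information(joint_xf)
print(f"I(X;Y)={i_xy:.4f}  I(X;A)={i_xa:.4f}  I(X;F)={i_xf:.4f} bits")
```

Each matrix product marginalizes out one intermediate variable, which is valid only because each stage depends on the world solely through the previous stage.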
The organism has a strictly bounded channel capacity $C$, so that $I(X;Y) \le C$. In Lagrangian form, it must find an encoding $p(y|x)$ that minimizes the Information Bottleneck functional:
$$
\mathcal{L} = I(X;Y) - \beta I(Y;F)
$$
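where the Lagrange multiplier $\beta > 0$ controls the trade-off between compression and fitness relevance: a smaller $\beta$ corresponds to a tighter capacity budget and therefore to stronger compression.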
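To illustrate how minimizing this functional treats fitness-irrelevant structure, here is a minimal sketch of the standard iterative Information Bottleneck updates on a toy world. Everything specific is an assumption made for illustration: the world state is a pair $x = (u, v)$ of independent fair bits, the fitness distribution $p(f|x)$ is taken as given and depends on $u$ only, and the values $\beta = 10$ and $|Y| = 2$ are arbitrary.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in bits) of a 2-D joint distribution."""
    p_row = joint.sum(axis=1, keepdims=True)
    p_col = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (p_row @ p_col)[mask])))

def kl_rows(p, q):
    """KL(p_i || q_j) in nats for every pair of rows; assumes all entries > 0."""
    return np.sum(p[:, None, :] * (np.log(p[:, None, :]) - np.log(q[None, :, :])), axis=2)

def information_bottleneck(p_x, p_f_given_x, n_y, beta, n_iter=200, seed=0):
    """Iterate the self-consistent IB equations from a random soft encoder."""
    rng = np.random.default_rng(seed)
    enc = rng.dirichlet(np.ones(n_y), size=len(p_x))         # p(y|x)
    for _ in range(n_iter):
        p_y = np.maximum(p_x @ enc, 1e-300)                  # p(y), floored to avoid log(0)
        p_x_given_y = (enc * p_x[:, None]).T / p_y[:, None]  # Bayes: p(x|y)
        p_f_given_y = p_x_given_y @ p_f_given_x              # p(f|y)
        # p(y|x) is proportional to p(y) * exp(-beta * KL(p(f|x) || p(f|y)))
        logits = np.log(p_y)[None, :] - beta * kl_rows(p_f_given_x, p_f_given_y)
        logits -= logits.max(axis=1, keepdims=True)          # numerical stability
        enc = np.exp(logits)
        enc /= enc.sum(axis=1, keepdims=True)
    return enc

# Toy world (assumed): state index = 2*u + v; fitness depends on u only.
p_x = np.full(4, 0.25)
p_f_given_x = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])

enc = information_bottleneck(p_x, p_f_given_x, n_y=2, beta=10.0)
joint_xy = (p_x[:, None] * enc).reshape(2, 2, 2)   # axes: (u, v, y)
i_uy = mutual_information(joint_xy.sum(axis=1))    # fitness-relevant bit
i_vy = mutual_information(joint_xy.sum(axis=0))    # fitness-irrelevant bit
print(f"I(U;Y)={i_uy:.4f} bits, I(V;Y)={i_vy:.2e} bits")
```

In this toy setting the encoder rows for states that share the same $u$ coincide, so the representation keeps information about $u$ and carries none about $v$. The sketch treats $p(f|x)$ as fixed and does not model the action stage $A$.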
Crucially, the fitness landscape $F(X)$ depends on $X$ only through some of its features: much of the structure of $X$, including its topological detail, can vary without changing $F$. Because the rate $I(X;Y)$ is tightly restricted by metabolic cost, the optimal bottleneck solution $p^*(y|x)$ systematically discards any information about the structure of $X$ that does not contribute to variance in $F$.
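One way to make this step explicit, offered as a sketch rather than a complete proof, uses the self-consistent equation satisfied by stationary points of the Information Bottleneck functional (Tishby et al., 1999). Written in the present notation, with $p(f|x)$ taken as given and $Z(x,\beta)$ a normalizing constant, it reads:

$$
p^*(y|x) = \frac{p^*(y)}{Z(x,\beta)} \exp\left(-\beta \, D_{KL}\left[\, p(f|x) \,\|\, p^*(f|y) \,\right]\right)
$$

The right-hand side depends on $x$ only through $p(f|x)$. Hence, if two world states $x$ and $x'$ induce the same fitness distribution, $p(f|x) = p(f|x')$, then $p^*(y|x) = p^*(y|x')$. As an illustrative assumption, let $X = (U, V)$ with $U$ and $V$ independent and $p(f|u,v) = p(f|u)$. Then $p^*(y|u,v) = p^*(y|u)$, so $p^*(y,v) = \sum_u p(u)\,p(v)\,p^*(y|u) = p(v)\,p^*(y)$ and therefore $I(V;Y) = 0$.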
Therefore, $Y$ does not, in general, resemble $X$; it is a compressed statistic of $X$ that is (approximately) sufficient for $F$.
Fitness beats truth because any veridical encoding of fitness-irrelevant features spends scarce channel capacity $C$ without improving fitness relevance, and is therefore suboptimal under the Information Bottleneck objective.