# Step-by-Step Mathematical Derivation

This document follows the discrete-time version of `paper_v2/the_power_game_v2.tex`. The main model is the multi-period finite-state dynamic game. The one-period, two-period, and three-period cases are used to compare mechanics and build intuition.

Current structure: four propositions and one theorem in the main text.

## Modeling Choice

The main text should use the discrete dynamic game. The reason is not taste: the
homogeneous-Bertrand repeatable payoff is kinked at parity, and finite simultaneous-move
games can have mixed or multiple equilibria. A continuous-time HJB presentation can make
those issues look like differentiability problems, but they are economic selection problems.

Continuous time remains useful as a small-period limit. It delivers the local marginal
formula
\[
x(g)=\frac{[V'(g)]^+}{r}
\]
and gives interpretable comparative statics for smooth approximations. It should not carry
the paper's global equilibrium or threshold claims. The object to compute or estimate is the
finite-state Bellman-Nash system after choosing a state grid, an action grid, and an
equilibrium-selection rule.

## Horizon Comparison

The one-period model is not the final model. It answers one narrow question: does inactive parity survive when a firm can make an arbitrarily small training move?

The two-period model adds the first continuation value. Training changes today's payoff and tomorrow's state.

The three-period model repeats the same recursion and shows how backward induction works mechanically.

The model the paper should use is the finite-state dynamic game, with either a finite or an infinite horizon. It is the smallest version that contains persistent racing, compute scarcity, Markov-perfect equilibrium, and equilibrium-selection issues.

## Two Faces

Household quality is
\[
C(q)=\int e^{-\lambda(q)n}v(dn),\qquad \lambda(q)=e^{-q}.
\]

Step 1. As \(q\to\infty\), \(e^{-q}\to0\), so \(\lambda(q)\to0\).

Step 2. For every fixed task length \(n\),
\[
e^{-\lambda(q)n}\to e^0=1.
\]

Step 3. Because \(\lambda(q)\downarrow0\) as \(q\to\infty\), the integrand increases monotonically to one. By the monotone convergence theorem,
\[
\lim_{q\to\infty}C(q)=\int 1\,v(dn)=\bar C.
\]

Step 4. Differentiate the integrand:
\[
\frac{d}{dq}e^{-\lambda(q)n}
=e^{-\lambda(q)n}\{-n\lambda'(q)\}.
\]
Since \(\lambda'(q)=-e^{-q}\),
\[
\frac{d}{dq}e^{-\lambda(q)n}
=ne^{-q}e^{-\lambda(q)n}.
\]

Step 5. Integrate over \(n\), differentiating under the integral sign (valid when \(\bar N=\int n\,v(dn)<\infty\)):
\[
C'(q)=e^{-q}\int n e^{-\lambda(q)n}v(dn).
\]

Step 6. Since \(e^{-\lambda(q)n}\le1\),
\[
0\le C'(q)\le e^{-q}\int n\,v(dn)=e^{-q}\bar N\to0.
\]
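
Steps 1 to 6 can be checked numerically. The sketch below evaluates \(C(q)\) and \(C'(q)\) for a discrete task-length measure. The measure (`lengths`, `weights`) and all function names are illustrative assumptions, not objects from the paper.

```python
import math

# Hypothetical discrete task-length measure v: mass weights[k] at length lengths[k].
lengths = [1.0, 10.0, 100.0]
weights = [0.5, 0.3, 0.2]

def household_quality(q):
    """C(q) = sum_k v_k * exp(-lambda(q) * n_k), with lambda(q) = exp(-q)."""
    lam = math.exp(-q)
    return sum(w * math.exp(-lam * n) for n, w in zip(lengths, weights))

def household_quality_slope(q):
    """C'(q) = exp(-q) * sum_k v_k * n_k * exp(-lambda(q) * n_k)."""
    lam = math.exp(-q)
    return lam * sum(w * n * math.exp(-lam * n) for n, w in zip(lengths, weights))

C_bar = sum(weights)                                  # total mass, the limit of C(q)
N_bar = sum(w * n for n, w in zip(lengths, weights))  # mean task length

for q in (0.0, 5.0, 10.0, 20.0):
    bound = math.exp(-q) * N_bar                      # upper bound from Step 6
    print(q, household_quality(q), household_quality_slope(q), bound)
```

As \(q\) grows, the printed quality should approach `C_bar` while the slope and its bound shrink toward zero.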

Producer value is
\[
A(q)=\max_{n>0}bn^ae^{-\lambda(q)n}.
\]

Step 1. Take logs of the objective:
\[
\log b+a\log n-\lambda(q)n.
\]

Step 2. Differentiate with respect to \(n\):
\[
\frac{a}{n}-\lambda(q).
\]

Step 3. Set equal to zero:
\[
\frac{a}{n}=\lambda(q)
\quad\Rightarrow\quad
n^*(q)=\frac{a}{\lambda(q)}.
\]

Step 4. Substitute \(\lambda(q)=e^{-q}\):
\[
n^*(q)=ae^q.
\]

Step 5. The second derivative of the log objective is
\[
-\frac{a}{n^2}<0
\]
for \(a>0\), so the log objective is strictly concave and \(n^*(q)\) is the unique maximizer.

Step 6. Substitute \(n^*(q)\):
\[
A(q)=b\left(\frac{a}{\lambda(q)}\right)^a e^{-a}.
\]

Step 7. Substitute \(\lambda(q)=e^{-q}\):
\[
A(q)=b(a/e)^ae^{aq}\equiv Ke^{aq}.
\]

Step 8. For two firms,
\[
\frac{A(q_1)}{A(q_2)}
=\frac{Ke^{aq_1}}{Ke^{aq_2}}
=e^{a(q_1-q_2)}
=e^{ag}.
\]
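
The closed forms in Steps 3 to 8 can be sanity-checked numerically. The sketch below compares the objective at \(n^*(q)\) with \(Ke^{aq}\) and with nearby task lengths. The values of `a` and `b` are hypothetical and chosen only for illustration.

```python
import math

a, b = 0.5, 2.0   # hypothetical parameter values, for illustration only

def producer_objective(n, q):
    """b * n**a * exp(-lambda(q) * n), the objective inside the max."""
    return b * n**a * math.exp(-math.exp(-q) * n)

def optimal_length(q):
    """Closed-form maximizer n*(q) = a * e**q from Steps 3 and 4."""
    return a * math.exp(q)

def producer_value(q):
    """Closed form A(q) = K * e**(a*q), with K = b * (a / e)**a."""
    K = b * (a / math.e) ** a
    return K * math.exp(a * q)

q = 1.5
n_star = optimal_length(q)
# The closed form should match the objective at n*, and nearby n should do no better.
print(producer_objective(n_star, q), producer_value(q))
print(producer_objective(0.9 * n_star, q), producer_objective(1.1 * n_star, q))

# The value ratio across two firms depends only on the gap g = q1 - q2.
q1, q2 = 2.0, 1.2
print(producer_value(q1) / producer_value(q2), math.exp(a * (q1 - q2)))
```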

## One-Period Payoff

The gap update is
\[
g'=(1-\mu)g+x_1-x_2.
\]

Training quantities are
\[
T_1=x_1^2/2,\qquad T_2=x_2^2/2,
\]
so total training is
\[
T=\frac{x_1^2+x_2^2}{2}.
\]

Residual serving compute is
\[
S=\bar Q-T.
\]

Constant household spending gives rental price
\[
R(T)=\frac{\bar W}{S}=\frac{\bar W}{\bar Q-T}.
\]

Laboratory 1's payoff is
\[
u_1(g,x_1,x_2)=\pi(g')-R(T)x_1^2/2.
\]

At parity with both firms inactive,
\[
g=0,\quad x_1=x_2=0,\quad g'=0,\quad T=0.
\]
The payoff normalization gives
\[
\pi(0)=0,
\]
so
\[
u_1(0,0,0)=0.
\]

Now let firm 1 deviate to \(x_1=\varepsilon>0\), while \(x_2=0\). Then
\[
g'=\varepsilon,
\qquad
T=\varepsilon^2/2.
\]
The deviation payoff is
\[
u_1(0,\varepsilon,0)
=\pi(\varepsilon)-R(\varepsilon^2/2)\varepsilon^2/2.
\]

Because \(R\) is differentiable at zero, with \(r_0\equiv R(0)=\bar W/\bar Q\),
\[
R(\varepsilon^2/2)=R(0)+O(\varepsilon^2)=r_0+O(\varepsilon^2).
\]
Thus
\[
R(\varepsilon^2/2)\varepsilon^2/2
=\frac{r_0}{2}\varepsilon^2+O(\varepsilon^4).
\]

For small positive \(g\),
\[
\pi(g)=\pi'_+(0)g+o(g).
\]
Substitute \(g=\varepsilon\):
\[
\pi(\varepsilon)=\pi'_+(0)\varepsilon+o(\varepsilon).
\]

Therefore
\[
u_1(0,\varepsilon,0)-u_1(0,0,0)
=\pi'_+(0)\varepsilon-\frac{r_0}{2}\varepsilon^2+o(\varepsilon).
\]

The right derivative, in the \(\theta=1\) case of the formula derived in the Composition section, is
\[
\pi'_+(0)=\omega a(1-3s/4).
\]
Since \(s\in[0,1]\),
\[
1-3s/4\ge1/4>0.
\]
So \(\pi'_+(0)>0\), and the deviation is profitable for sufficiently small \(\varepsilon\).

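On a finite action grid, the smallest positive action \(\varepsilon\) is fixed and cannot be taken to zero. The deviation argument rules out inaction only when
\[
\pi(\varepsilon)>R(\varepsilon^2/2)\varepsilon^2/2.
\]
If the grid is too coarse for this inequality to hold, inactive parity can survive as a grid artifact.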
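
The grid condition is easy to test for a candidate step size. The sketch below uses the rental price \(R(T)=\bar W/(\bar Q-T)\) from this section. The payoff `pi_linear` and the numbers are hypothetical stand-ins, not the paper's calibration.

```python
def rental_price(T, W_bar, Q_bar):
    """R(T) = W_bar / (Q_bar - T); requires T < Q_bar."""
    if T >= Q_bar:
        raise ValueError("training must leave positive serving compute")
    return W_bar / (Q_bar - T)

def inaction_ruled_out(eps, pi, W_bar, Q_bar):
    """True if moving from (0, 0) at parity to the grid step eps is strictly profitable."""
    T = eps**2 / 2
    cost = rental_price(T, W_bar, Q_bar) * T
    return pi(eps) > cost

def pi_linear(g):
    """Hypothetical payoff: linear above parity with right slope 0.25, zero below."""
    return 0.25 * max(g, 0.0)

W_bar, Q_bar = 1.0, 1.0   # hypothetical spending and compute capacity
for eps in (0.01, 0.1, 0.4, 0.8, 1.2):
    print(eps, inaction_ruled_out(eps, pi_linear, W_bar, Q_bar))
```

Under these assumed numbers the fine steps rule out inaction and the coarse ones do not, which is the grid artifact described above.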

## Two-Period Model

Let \(V_{i,0}(g)=0\).

With one period remaining,
\[
U_{i,1}(g,x_1,x_2)=u_i(g,x_1,x_2)+\beta V_{i,0}(g').
\]
Since \(V_{i,0}=0\),
\[
U_{i,1}(g,x_1,x_2)=u_i(g,x_1,x_2).
\]
Solving the one-period matrix game at each state gives \(V_{i,1}(g)\).

With two periods remaining,
\[
U_{i,2}(g,x_1,x_2)
=u_i(g,x_1,x_2)+\beta V_{i,1}(g').
\]
This is the first genuinely dynamic case. The firm now values training because it affects \(g'\), and \(g'\) affects tomorrow's value \(V_{i,1}(g')\).

## Three-Period Model

With three periods remaining,
\[
U_{i,3}(g,x_1,x_2)
=u_i(g,x_1,x_2)+\beta V_{i,2}(g').
\]
There is no new equation. The three-period case repeats the same logic:

1. solve the one-period game to get \(V_{i,1}\);
2. solve the two-period game to get \(V_{i,2}\);
3. use \(V_{i,2}\) in the three-period continuation game.

This is why three periods are useful pedagogically: they show that multi-period dynamics are just repeated continuation-value substitution.

## Finite-Horizon Recursion

For a horizon \(H\), define terminal values
\[
V_{i,0}(g)=0.
\]
For \(h=1,\ldots,H\), define
\[
U_{i,h}(g,x_1,x_2)
=u_i(g,x_1,x_2)+\beta V_{i,h-1}(g').
\]

At every state \(g\), the action sets are finite, so \(U_{1,h},U_{2,h}\) form a finite normal-form game. By Nash's theorem, a (possibly mixed) Nash equilibrium exists. Selecting one equilibrium gives strategies
\[
\sigma_{1,h}(g),\sigma_{2,h}(g).
\]
The value is
\[
V_{i,h}(g)
=\mathbb E_{\sigma_{1,h}(g),\sigma_{2,h}(g)}
\left[U_{i,h}(g,x_1,x_2)\right].
\]
Applying this recursively gives a finite-horizon Markov-perfect equilibrium.
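
A minimal sketch of this recursion follows, under strong simplifying assumptions. The action grid has two points, so each stage game is 2x2 and can be solved in closed form; a larger grid would need a general bimatrix solver. The payoff `pi`, the grids, the clipping of \(g'\) to the grid, the parameter values, and the selection rule (first pure equilibrium in grid order, otherwise the interior mixed one) are all hypothetical choices made for the sketch.

```python
# All numbers and the payoff function pi below are hypothetical placeholders.
G = [-1, 0, 1]            # state grid for the gap g
X = [0, 1]                # two-point action grid
BETA, MU = 0.9, 0.0
W_BAR, Q_BAR = 2.0, 4.0

def pi(g):
    return max(g, 0.0)    # placeholder flow payoff with pi(0) = 0

def next_state(g, x1, x2):
    """g' = (1 - mu) g + x1 - x2, rounded and clipped to the grid (sketch-only device)."""
    return max(G[0], min(G[-1], round((1 - MU) * g + x1 - x2)))

def stage_payoffs(g, x1, x2):
    T = (x1**2 + x2**2) / 2
    R = W_BAR / (Q_BAR - T)
    gp = next_state(g, x1, x2)
    return pi(gp) - R * x1**2 / 2, pi(-gp) - R * x2**2 / 2, gp

def solve_2x2(U1, U2):
    """Return (p, q): probabilities that firms 1 and 2 play X[0].
    Selection rule: first pure Nash equilibrium in grid order, else the mixed one."""
    for i in (0, 1):
        for j in (0, 1):
            if U1[i][j] >= U1[1 - i][j] and U2[i][j] >= U2[i][1 - j]:
                return float(i == 0), float(j == 0)
    # No pure equilibrium: each firm mixes so that the rival is indifferent.
    q = (U1[1][1] - U1[0][1]) / (U1[0][0] - U1[0][1] - U1[1][0] + U1[1][1])
    p = (U2[1][1] - U2[1][0]) / (U2[0][0] - U2[1][0] - U2[0][1] + U2[1][1])
    return p, q

def backward_induction(H):
    V1 = {g: 0.0 for g in G}   # terminal values V_{i,0} = 0
    V2 = {g: 0.0 for g in G}
    for _ in range(H):
        new1, new2 = {}, {}
        for g in G:
            U1 = [[0.0, 0.0], [0.0, 0.0]]
            U2 = [[0.0, 0.0], [0.0, 0.0]]
            for i, x1 in enumerate(X):
                for j, x2 in enumerate(X):
                    u1, u2, gp = stage_payoffs(g, x1, x2)
                    U1[i][j] = u1 + BETA * V1[gp]   # U_{1,h} = u_1 + beta * V_{1,h-1}(g')
                    U2[i][j] = u2 + BETA * V2[gp]
            p, q = solve_2x2(U1, U2)
            prob = [[p * q, p * (1 - q)], [(1 - p) * q, (1 - p) * (1 - q)]]
            new1[g] = sum(prob[i][j] * U1[i][j] for i in (0, 1) for j in (0, 1))
            new2[g] = sum(prob[i][j] * U2[i][j] for i in (0, 1) for j in (0, 1))
        V1, V2 = new1, new2
    return V1, V2

print(backward_induction(3))
```

Changing the selection rule inside `solve_2x2` changes the values that propagate backward, which is the selection issue raised under Modeling Choice.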

## Infinite-Horizon Finite-State Game

Let \(G\) be a finite state grid and \(X\) a finite action grid. Let
\[
P(g''\mid g,x_1,x_2)
\]
be the transition probability from current state \(g\) to next state \(g''\).

Given values \(V_1,V_2\), define continuation payoff matrices
\[
A_i(g;x_1,x_2)
=u_i(g,x_1,x_2)
+\beta\sum_{g''\in G}P(g''\mid g,x_1,x_2)V_i(g'').
\]

At equilibrium, the mixed action profile at state \(g\) must be a Nash equilibrium of this finite game. Values must satisfy
\[
V_i(g)
=\mathbb E_{\sigma_1(g),\sigma_2(g)}
\left[A_i(g;x_1,x_2)\right].
\]

These are the Bellman-Nash fixed-point conditions. Since the game has finite states, finite actions, bounded payoffs, and \(\beta<1\), Fink's theorem gives a stationary mixed Markov equilibrium.
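
The Nash part of these conditions can be verified state by state for any candidate profile. The sketch below checks that no pure deviation improves on a mixed profile, given the two continuation payoff matrices at one state. The function names and the example matrices are hypothetical; the value condition then amounts to setting \(V_i(g)\) equal to `expected_payoff`.

```python
def expected_payoff(A, sigma1, sigma2):
    """E[A(x1, x2)] under independent mixed actions sigma1 and sigma2."""
    return sum(sigma1[i] * sigma2[j] * A[i][j]
               for i in range(len(sigma1)) for j in range(len(sigma2)))

def is_stage_nash(A1, A2, sigma1, sigma2, tol=1e-9):
    """True if no pure deviation beats the candidate mixed profile by more than tol."""
    v1 = expected_payoff(A1, sigma1, sigma2)
    v2 = expected_payoff(A2, sigma1, sigma2)
    best1 = max(sum(sigma2[j] * A1[i][j] for j in range(len(sigma2)))
                for i in range(len(sigma1)))
    best2 = max(sum(sigma1[i] * A2[i][j] for i in range(len(sigma1)))
                for j in range(len(sigma2)))
    return best1 <= v1 + tol and best2 <= v2 + tol

# Hypothetical continuation payoff matrices at one state (matching-pennies shaped).
A1 = [[1.0, -1.0], [-1.0, 1.0]]
A2 = [[-1.0, 1.0], [1.0, -1.0]]
print(is_stage_nash(A1, A2, [0.5, 0.5], [0.5, 0.5]))   # uniform mixing
print(is_stage_nash(A1, A2, [1.0, 0.0], [1.0, 0.0]))   # a pure profile
```

Checking pure deviations is sufficient because a mixed deviation is a convex combination of pure ones.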

## Compute-Market Discipline

Total training is
\[
T=\frac{x_1^2+x_2^2}{2}.
\]
Residual compute is
\[
S=\bar Q-T.
\]
Rental price is
\[
R(T)=\frac{\bar W}{\bar Q-T}.
\]
Training expenditure is
\[
E_T=R(T)T.
\]
Substitute \(R(T)\):
\[
E_T=\frac{\bar W}{\bar Q-T}T
=\frac{\bar WT}{\bar Q-T}.
\]

If \(T\le\bar Q-\delta\), then
\[
R(T)\le\frac{\bar W}{\delta},
\]
and
\[
E_T\le\frac{\bar W(\bar Q-\delta)}{\delta}.
\]
Therefore unbounded rental rates or training expenditure require
\[
T\uparrow\bar Q.
\]
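
The bounds can be illustrated with a few values of \(T\). The numbers for `W_bar`, `Q_bar`, and `delta` below are hypothetical.

```python
def rental_price(T, W_bar, Q_bar):
    """R(T) = W_bar / (Q_bar - T) on 0 <= T < Q_bar."""
    if not 0 <= T < Q_bar:
        raise ValueError("need 0 <= T < Q_bar")
    return W_bar / (Q_bar - T)

def training_expenditure(T, W_bar, Q_bar):
    """E_T = R(T) * T."""
    return rental_price(T, W_bar, Q_bar) * T

W_bar, Q_bar, delta = 1.0, 10.0, 0.5       # hypothetical values
R_cap = W_bar / delta                      # bound on R(T) when T <= Q_bar - delta
E_cap = W_bar * (Q_bar - delta) / delta    # bound on E_T when T <= Q_bar - delta

for T in (0.0, 5.0, 9.0, 9.5):
    R = rental_price(T, W_bar, Q_bar)
    E = training_expenditure(T, W_bar, Q_bar)
    print(T, R, E, R <= R_cap, E <= E_cap)
```

Both bounds hold with equality at \(T=\bar Q-\delta\) and fail only once \(T\) moves closer to \(\bar Q\).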

## Composition

The logistic function is
\[
L(z)=\frac1{1+e^{-z}}.
\]
Its derivatives at zero are
\[
L(0)=1/2,\quad L'(0)=1/4,\quad L''(0)=0,\quad L'''(0)=-1/8.
\]
So
\[
L(z)=1/2+z/4-z^3/48+O(z^5).
\]
Set \(z=\theta ag\):
\[
L(\theta ag)-1/2
=\frac{\theta a}{4}g-\frac{(\theta a)^3}{48}g^3+O(g^5).
\]
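
The cubic expansion can be checked against the exact logistic function. In the sketch below the error should shrink like \(z^5\), so the last printed column should stay roughly constant. The function names are illustrative.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_cubic(z):
    """Third-order Taylor polynomial of the logistic function around zero."""
    return 0.5 + z / 4 - z**3 / 48

for z in (0.4, 0.2, 0.1):
    err = abs(logistic(z) - logistic_cubic(z))
    print(z, err, err / z**5)
```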

The repeatable Bertrand term is
\[
2\left[\sinh(ag/2)\right]^+.
\]
For \(g\le0\), this equals zero. For \(g>0\),
\[
2\sinh(ag/2)
=2\left(\frac{ag}{2}+\frac{(ag/2)^3}{6}+O(g^5)\right)
=ag+\frac{a^3g^3}{24}+O(g^5).
\]
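Thus, with weight \(s\) on the logistic term, weight \(1-s\) on the Bertrand term, and scale \(\omega\), the one-sided derivatives at parity are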
\[
\pi'_-(0)=\omega\frac{s\theta a}{4},
\]
and
\[
\pi'_+(0)=\omega\left[s\frac{\theta a}{4}+(1-s)a\right]
=\omega a\left(1-s+\frac{s\theta}{4}\right).
\]
The kink size is
\[
\pi'_+(0)-\pi'_-(0)=\omega a(1-s).
\]
For \(s<1\), the payoff is kinked at parity.
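
The one-sided derivatives can be checked by finite differences. The sketch below assumes a payoff of the form implied by the two expansions, \(\pi(g)=\omega\{s[L(\theta ag)-1/2]+(1-s)\,2[\sinh(ag/2)]^+\}\). That functional form and the parameter values are assumptions of the sketch; the paper's own definition of \(\pi\) takes precedence.

```python
import math

omega, s, theta, a = 1.0, 0.4, 2.0, 0.5   # hypothetical parameters

def flow_payoff(g):
    """Assumed form: omega * [s*(L(theta*a*g) - 1/2) + (1-s)*2*max(sinh(a*g/2), 0)]."""
    logistic_part = 1.0 / (1.0 + math.exp(-theta * a * g)) - 0.5
    bertrand_part = 2.0 * max(math.sinh(a * g / 2), 0.0)
    return omega * (s * logistic_part + (1 - s) * bertrand_part)

h = 1e-6
right = (flow_payoff(h) - flow_payoff(0.0)) / h     # one-sided slope from above
left = (flow_payoff(0.0) - flow_payoff(-h)) / h     # one-sided slope from below

print(left, omega * s * theta * a / 4)
print(right, omega * a * (1 - s + s * theta / 4))
print(right - left, omega * a * (1 - s))
```

Each printed pair compares a finite-difference slope with the corresponding closed form.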

## Access Pricing

Producer value is
\[
A(q)=Ke^{aq}.
\]
For leader \(q_L\) and follower \(q_F\),
\[
\frac{A(q_L)}{A(q_F)}
=\frac{Ke^{aq_L}}{Ke^{aq_F}}
=e^{a(q_L-q_F)}
=e^{ag}.
\]

Static indifference requires
\[
\frac{A(q_L)}{P_0}=A(q_F).
\]
Solving,
\[
P_0=\frac{A(q_L)}{A(q_F)}=e^{ag}.
\]
Raw-unit markup is
\[
P_0-1=e^{ag}-1.
\]
Ad-valorem margin is
\[
\frac{P_0-1}{P_0}=1-e^{-ag}.
\]
Gross expenditure \(E\) buys \(E/P_0\) raw units, so static rent is
\[
E-E/P_0=E(1-e^{-ag}).
\]
With leakage cost \(\ell(E)\Delta V(g)\), sale occurs iff
\[
E(1-e^{-ag})\ge \ell(E)\Delta V(g).
\]
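
The pricing objects and the sale condition can be written as small functions. In the sketch below the leakage schedule `leakage`, the value gap `delta_V`, and the numbers are hypothetical placeholders for \(\ell(E)\) and \(\Delta V(g)\).

```python
import math

def access_price(a, g):
    """Static-indifference price P0 = exp(a * g) for a leader ahead by gap g >= 0."""
    return math.exp(a * g)

def static_rent(E, a, g):
    """Rent E - E / P0 = E * (1 - exp(-a * g)) on gross expenditure E."""
    return E * (1.0 - math.exp(-a * g))

def sale_occurs(E, a, g, leakage, delta_V):
    """Sale condition: static rent covers the leakage cost leakage(E) * delta_V(g)."""
    return static_rent(E, a, g) >= leakage(E) * delta_V(g)

def leakage(E):
    return 0.01 * E        # hypothetical leakage schedule

def small_value_gap(g):
    return 20.0            # hypothetical Delta V(g)

def large_value_gap(g):
    return 50.0            # hypothetical Delta V(g)

a, g, E = 0.5, 1.0, 10.0   # hypothetical inputs
P0 = access_price(a, g)
print(P0 - 1.0, (P0 - 1.0) / P0, static_rent(E, a, g))
print(sale_occurs(E, a, g, leakage, small_value_gap))
print(sale_occurs(E, a, g, leakage, large_value_gap))
```

With these assumed numbers the sale goes through when the continuation-value gap is small and is withheld when it is large.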

## Continuous-Time Approximation

Let the period length be \(\Delta\), and set
\[
\beta=e^{-\rho\Delta}=1-\rho\Delta+O(\Delta^2).
\]
Write the scaled gap law as
\[
g'=g+\Delta(x_1-x_2-\mu g).
\]
Let
\[
d=x_1-x_2-\mu g.
\]
Then
\[
g'=g+\Delta d.
\]
Where \(V\) is differentiable, Taylor expand:
\[
V(g+\Delta d)=V(g)+\Delta dV'(g)+O(\Delta^2).
\]

Treating the rental rate as a constant \(r\) and taking the rival's action \(x_2\) as given, firm 1's discrete Bellman equation is
\[
V(g)=\max_{x_1\ge0}
\left\{
\Delta\left[\pi(g)-rx_1^2/2\right]
+\beta V(g+\Delta d)
\right\}.
\]
Substitute the expansions:
\[
V(g)=\max_{x_1\ge0}
\left\{
\Delta\left[\pi(g)-rx_1^2/2\right]
+(1-\rho\Delta)\left[V(g)+\Delta dV'(g)\right]
+O(\Delta^2)
\right\}.
\]
Expand:
\[
V(g)=\max_{x_1\ge0}
\left\{
V(g)+\Delta\left[\pi(g)-rx_1^2/2+dV'(g)-\rho V(g)\right]
+O(\Delta^2)
\right\}.
\]
Subtract \(V(g)\), divide by \(\Delta\), and take \(\Delta\to0\):
\[
0=\max_{x_1\ge0}
\left\{
\pi(g)-rx_1^2/2+(x_1-x_2-\mu g)V'(g)-\rho V(g)
\right\}.
\]
Rearrange:
\[
\rho V(g)=\pi(g)+\max_{x_1\ge0}
\left\{x_1V'(g)-rx_1^2/2\right\}
-x_2V'(g)-\mu gV'(g).
\]
In a symmetric Markov profile, \(x_2=x(-g)\), giving
\[
\rho V(g)=\pi(g)+\max_{x\ge0}
\left\{xV'(g)-rx^2/2\right\}
-x(-g)V'(g)-\mu gV'(g).
\]
The derivative of the maximand is
\[
V'(g)-rx.
\]
If \(V'(g)>0\), the maximizer is
\[
x=V'(g)/r.
\]
If \(V'(g)\le0\), the constraint \(x\ge0\) binds and \(x=0\). Therefore
\[
x=[V'(g)]^+/r.
\]
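
The policy rule can be confirmed by brute force. The sketch below compares \(x=[V'(g)]^+/r\) with a grid search over nonnegative training rates. The value of `r` and the trial slopes are hypothetical.

```python
def training_rate(V_prime, r):
    """x = max(V'(g), 0) / r, the maximizer of x*V' - r*x**2/2 over x >= 0."""
    return max(V_prime, 0.0) / r

def maximand(x, V_prime, r):
    return x * V_prime - r * x**2 / 2

def grid_maximizer(V_prime, r, x_max=5.0, steps=5000):
    """Brute-force maximizer of the maximand on an evenly spaced grid over [0, x_max]."""
    grid = [x_max * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda x: maximand(x, V_prime, r))

r = 2.0   # hypothetical constant rental rate
for V_prime in (-1.0, 0.0, 3.0):
    print(V_prime, training_rate(V_prime, r), grid_maximizer(V_prime, r))
```

For nonpositive slopes both columns should report zero, which is the binding constraint \(x\ge0\).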
