Tadelis Chapter 9 multistage games

Multistage Games
The normal-form game was a model of players choosing their actions simultaneously,
without observing the moves of their opponents. The extensive form
enriches the set of situations we can model to include those in which players learn
something about the choices of other players who moved earlier, in which case a
player can condition his behavior on the moves of other players in the game.
The extensive-form games analyzed so far were characterized by having payoffs
delayed until the game reached a terminal node that was associated with the game’s
end. In reality dynamic play over time may be more complex, and it may not be correctly
modeled by one “grand” game that unfolds over time with payoffs distributed
at the end of the game. For instance, consider two firms that compete in one market,
with the flow of profits that results from their behavior; after some time they choose to
compete in another market, with a new flow of payoffs. Similarly a group of elected
lawmakers may serve on a committee, on which their behavior and votes will affect
the likelihood of their being reelected. Those who are reelected may sit on another
committee together, with different choices and different consequences.
The broad message of these examples is that players can play one game that is
followed by another, or maybe even several other games and receive some payoffs
after each one of the games in this sequence is played. In such cases two natural
questions emerge. First, if the players are rational and forward looking, should they
not view this sequence of games as one grand game? Second, if they do view these
as one grand game, should we expect that their actions in the later stages will depend
on the outcomes of earlier stages? In particular, will the players be destined to play a
sequence of action profiles that are Nash equilibria in each stage-game, or will they be
able to use future games to support behavior in the earlier stages that is not consistent
with Nash equilibrium in those early stages?
This chapter answers these questions. Before doing so, Sections 9.1–9.3 formally
show how such multistage games can be modeled as extensive-form games so that we
can use the tools developed in the previous chapter. It will be possible to answer the
two questions just posed. First, the players should indeed anticipate future games and
use them to create a richer environment for themselves. Second, they will benefit from
using future play to create incentives that constrain their behavior in earlier stages. If
players have the opportunity to condition future behavior on past outcomes, they may
be able to create credible, self-enforcing incentives to follow behavior in earlier-stage
games that would not be possible if there were no continuation games following. This
is the core insight of multistage games: if players can condition future behavior on
past outcomes then this may lead to a richer set of self-enforcing outcomes.
9.1 Preliminaries
We define a multistage game as a finite sequence of normal-form stage-games,
in which each stage-game is an independent, well-defined game of complete but
imperfect information (a simultaneous-move game). These stage-games are played
sequentially by the same players, and the total payoffs from the sequence of games
are evaluated using the sequence of outcomes in the games that are played. We adopt
the convention that each game is played in a distinct period, so that game 1 is played
in period t = 1, game 2 in period t = 2, and so on, up until period t = T, which will
be the last stage in the game. We will also assume that, after each stage is completed,
all the players observe the outcome of that stage, and that this information structure
is common knowledge.
In each stage-game, players have a set of actions from which they can choose, and
the profiles of actions lead to payoffs for that specific game, which is then followed
by another game, and another, until the sequence of games is over. A sequence of
normal-form games of complete information with the same players is not too hard
to conceive. Consider again the examples used earlier in this chapter, such as firms
in common markets or lawmakers interacting on a sequence of committees. To make
things concrete, let’s start with a stylized example, which is a small twist on the
familiar Prisoner’s Dilemma game.
Consider a two-stage game that we call the Prisoner-Revenge Game. Suppose that
in a first period labeled t = 1 two players from different local neighborhoods play the
familiar Prisoner’s Dilemma with pure actions mum and fink (uppercase for player 1
and lowercase for player 2) and with the payoff matrix

             m           f
    M      4, 4       −1, 5
    F      5, −1       1, 1
Now imagine that after this game is over the same two players play the following
Revenge Game in a second period t = 2. Each player can choose whether to join
his local neighborhood gang or remain a “loner.” If the players remain loners then
they go their separate ways, regardless of the outcome of the first game, and obtain
second-period payoffs of 0 each. If both join gangs, then because of the nature of
neighborhood gangs they will fight each other and suffer a loss, causing each to receive
a payoff of −3. Finally if player i joins a gang while player j remains a loner then
the loner suffers dearly since he has no gang to defend him, thus receiving a payoff
of −4, while the new gang member suffers much less and receives a payoff of −1.
The Revenge Game is described by the following matrix:

             l            g
    L      0, 0       −4, −1
    G     −1, −4      −3, −3
From our previous analysis of static games of complete information, we know
how to analyze these two games independently. In the first-stage game, the Prisoner’s
Dilemma, there is a unique Nash equilibrium, (F, f ). In the second-stage game, the
Revenge Game, there are two pure-strategy Nash equilibria, (L, l) and (G, g), and a
mixed-strategy equilibrium in which each player plays loner with probability 1/2
(verify this!). In what follows, we will see that once these two games are played in sequence,
equilibrium behavior can be more interesting.
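These claims are easy to verify by brute force. The following sketch (Python; the payoffs are exactly those described above) enumerates the pure-strategy Nash equilibria of the Revenge Game and checks the indifference condition behind the mixed equilibrium:

```python
# Revenge Game payoffs as (player 1, player 2), taken from the text:
# two loners get 0 each, two gang members get -3 each, and a lone loner
# gets -4 against a gang member's -1.
payoffs = {
    ("L", "l"): (0, 0),
    ("L", "g"): (-4, -1),
    ("G", "l"): (-1, -4),
    ("G", "g"): (-3, -3),
}

def is_pure_nash(a1, a2):
    """Neither player gains by unilaterally switching his action."""
    u1, u2 = payoffs[(a1, a2)]
    return (u1 == max(payoffs[(b, a2)][0] for b in "LG")
            and u2 == max(payoffs[(a1, b)][1] for b in "lg"))

pure_nash = [(a1, a2) for a1 in "LG" for a2 in "lg" if is_pure_nash(a1, a2)]
print(pure_nash)  # [('L', 'l'), ('G', 'g')]

# Mixed equilibrium: if the opponent plays loner with probability p = 1/2,
# player 1 is exactly indifferent between L and G.
p = 0.5
u_L = p * payoffs[("L", "l")][0] + (1 - p) * payoffs[("L", "g")][0]
u_G = p * payoffs[("G", "l")][0] + (1 - p) * payoffs[("G", "g")][0]
print(u_L == u_G)  # True (both equal -2.0)
```

The same brute-force check works for any small two-player stage-game once its payoffs are written as a dictionary.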
9.2 Payoffs
How should we evaluate the total payoffs from a sequence of outcomes in each of the
sequentially played stage-games? We adopt the well-defined notion of present value,
which we used to analyze the single-person decision problems of consumption over
time described in Sections 2.5.2 and 8.3.4.
In particular at any period t we will collapse any sequence of payments from the
games played starting at period t onward into a single value that can be evaluated at
period t . To do this we take the payoffs derived from each of the individual games and
add them up with the added assumption (as described and motivated in Section 2.5.2)
that future payoffs are discounted at a rate 0 ≤ δ ≤ 1. A higher discount factor δ means
that the players are more patient and will care more about future payoffs. An extreme
case of impatience occurs when δ = 0, which means that only the payoffs of the
current stage-game matter.
Consider a multistage game in which there are T stage-games played in each of
the periods 1, 2, . . . , T. Let v_i^t be player i's payoff from the anticipated outcome in
the stage-game played in period t. We denote by v_i the total payoff of player i from
playing the sequence of games in the multistage game and define it as

    v_i = v_i^1 + δv_i^2 + δ^2 v_i^3 + . . . + δ^(T−1) v_i^T,

which is the discounted sum of payoffs that the player expects to get in the sequence of
games. The payoffs one period away are discounted once, those two periods away are
discounted twice (hence the δ^2 term), and so on. For example, if in the Prisoner-Revenge
Game the sequence of play is (F, m) followed by (L, g), then, given a
discount factor δ for each player, the payoffs for players 1 and 2 are v1 = 5 + δ(−4) = 5 − 4δ
and v2 = −1 + δ(−1) = −1 − δ.
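This bookkeeping is easy to script. Below is a small illustrative helper (not from the text) that collapses a sequence of per-period payoffs into a discounted sum and reproduces these numbers for a sample discount factor:

```python
def discounted_sum(stage_payoffs, delta):
    """v = v^1 + delta*v^2 + delta^2*v^3 + ..., periods starting at t = 1."""
    return sum(delta ** t * v for t, v in enumerate(stage_payoffs))

# Play path (F, m) followed by (L, g): per-period payoffs are 5 then -4
# for player 1, and -1 then -1 for player 2, as in the text's example.
delta = 0.9
v1 = discounted_sum([5, -4], delta)   # 5 - 4*delta
v2 = discounted_sum([-1, -1], delta)  # -1 - delta
print(v1, v2)
```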
Using the concept of discounted payoffs we can write down the extensive form of
the multistage Prisoner-Revenge Game to demonstrate all the possible outcomes, and
the total sum of discounted payoffs, as shown in Figure 9.1. Notice that this way of
representing the game does not seem to capture the fact that payoffs are accumulated
as the stages of the game proceed. That said, this is not important from a strategic
point of view because we have incorporated the payoffs into the extensive form of
the multistage game by accurately taking into account the sum of discounted payoffs
that are a consequence of any combination of play in the two stage-games.
9.3 Strategies and Conditional Play
Imagine that the players are playing a multistage game that consists of T stage-games.
If we naively treat each game independently, so that players do not make
any attempt to link their behavior in a given stage-game to the outcomes of previous
stage-games, then strategies would be extremely simple: each player treats each stage-game
independently and commits to some action for each of the stage-games that is
independent of how the games unfold over time.
More sophisticated players may benefit from using strategies that create a strategic
link between the stage-games in ways that are no different than what children do in
preschool: “If you play with the dolls this way now, I will play hide-and-seek with
you later, and if you don’t then I won’t play with you at all.” More generally players
can use strategies of the form “If such-and-such happens in games 1, 2, . . . , t − 1
then I will choose action ai in game t .” In the Prisoner-Revenge Game, for example,
player 1 can use the following strategy: “I will play F in the first game and I will play
L in the second game only if player 2 played m in the first game. If player 2 played
f in the first game then I will play G in the second.”
A convenient way to introduce such strategies is to write down the extensive
form of the complete multistage game, as we did in Figure 9.1, and to focus on what the
players can do at every stage of the game. It is important to determine correctly the
information sets of each player. Because the outcomes of every stage are revealed
before the next stage is played, the number of information sets at any stage t must be
equal to the number of possible outcomes from the previously played stage-games,
1, 2, . . . , t − 1.
For example, in the multistage Prisoner-Revenge Game, the first-stage game has
four outcomes. This implies that in the second-stage game each player should have
four information sets, each associated with one of the four outcomes of the first-stage
game. Note, however, that in each stage-game the players do not know what their
opponent is choosing at that stage, which means that we must have some information
sets with more than one node in them, as demonstrated in Figure 9.1. In the first stage
each player chooses between mum and fink without knowing what his opponent is
choosing at that stage. In the second stage he chooses between loner and gang without
knowing what his opponent is choosing at that stage, but each player knows exactly
how the first-stage game concluded.
More generally, in a multistage game that consists of T stage-games, a pure
strategy of player i will be a list of conditional pure strategies of the following form:

    s_i = (s_i^1, s_i^2(h^1), s_i^3(h^2), . . . , s_i^T(h^(T−1))),

where h^(t−1) is a particular outcome
that occurred up to period t, not including period t, and s_i^t(h^(t−1)) is an action for
player i from the tth stage-game. We can conveniently think of h^(t−1) as a particular
history of events that occurred up to period t, from the set of all possible histories
(or outcomes) H^(t−1). The different histories that occurred before stage-game t are
associated with different information sets in the stage-game of period t, each history
being associated with a distinct sequence of outcomes from the previous stage-games
in periods 1, 2, . . . , t − 1 combined.
In summary the definition of strategies for multistage games uses the extensive
form to represent the game’s structure. Furthermore, as in extensive-form games,
strategies are defined as a complete list of (mixed or pure) actions for each player at each of his information sets. In multistage games, each player’s information set is
associated with the history of play from the previous stage-games.
9.4 Subgame-Perfect Equilibria
Because multistage games are dynamic in nature, and because past play is revealed
over time, we turn to the notion of a subgame-perfect equilibrium as a solution
concept. In particular, rational players should play sequentially rational strategies,
which justifies the use of the subgame-perfect equilibrium concept. The question then
is, how do we find a subgame-perfect equilibrium for such a game? The following
result will offer a natural first step:
Proposition 9.1 Consider a multistage game with T stages, and let σt∗ be a Nash
equilibrium strategy profile for the tth stage-game. There exists a subgame-perfect
equilibrium in the multistage game in which the equilibrium path coincides with the
path generated by (σ1∗, σ2∗, . . . , σT∗).

Consider, for example, the pair of unconditional strategies in the Prisoner-Revenge
Game in which each player plays fink in the first-stage game and then plays loner in the
second-stage game regardless of what was played in the first stage. Now observe
that with this pair of strategies the players are clearly playing a Nash equilibrium in
each of the four subgames in the second stage. This follows because the first-period
payoffs are set in stone and do not depend on the second-period choices, and in each of
these second-period subgames both players playing loner is a Nash equilibrium. Then,
moving back to the subgame that begins with the first stage-game (which is the whole
subgame), because the payoffs in the second period are a constant equal to 0 for each
player, regardless of the play in the first stage-game, it follows that each player has a
dominant strategy, which is to play fink. Thus both playing fink in the first-stage
game is part of the Nash equilibrium in the whole game. It is easy to see that this
argument immediately implies that there is another (pure-strategy) subgame-perfect
equilibrium of the multistage game: each player plays fink in the first stage and gang
in the second stage, regardless of first-stage play.
The proposition should therefore be quite consistent with what we expect. It
basically says that if we look at a sequence of plays that is a Nash equilibrium in
each game independently then players should be content playing these stage-game
strategies in each stage-game. By considering such unconditional strategies, we are
removing any strategic link between stages, and thus each stage can be treated as if
it were an independent game.
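The dominance step in this argument is easy to check numerically. This is a minimal sketch in Python; the Prisoner's Dilemma payoffs used here are an assumption, since only the cell (F, m) = (5, −1) appears explicitly in the text above:

```python
# Assumed Prisoner's Dilemma payoffs as (player 1, player 2); only
# (F, m) = (5, -1) is explicit in the text.
pd = {("M", "m"): (4, 4), ("M", "f"): (-1, 5),
      ("F", "m"): (5, -1), ("F", "f"): (1, 1)}

# With a constant continuation payoff (0 for each player, from (L, l) in
# every second-stage subgame), first-stage incentives reduce to the
# stand-alone PD, in which fink strictly dominates mum.
for a2 in "mf":
    assert pd[("F", a2)][0] > pd[("M", a2)][0]
print("fink strictly dominates mum")
```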
The interesting question that remains to be answered is whether we are ignoring
behavior that can be supported as a subgame-perfect equilibrium if we actually use a
strategic linkage between the games. In other words, if players were to condition their
future play on past play in a sequentially rational manner, would we be able to support
behavior in early stages that is not a Nash equilibrium in the early stage-games?
The answer is yes, and this approach yields some interesting insights, which we
discuss subsequently. The trick for strategically linking the stage-games will be to
condition the behavior in later stage-games on the actions taken in earlier stage-games.
As we will see, this will be possible only when some of the stage-games in
later periods have multiple Nash equilibria. As a first step toward understanding this
requirement it is easy to establish the following proposition:
Proposition 9.2 If σ∗ is a Nash equilibrium of the multistage game consisting of
stage-games G1, G2, . . . , GT then the restriction of σ∗ to the stage-game in period
T must be a Nash equilibrium of that stage-game.
The proposition states that for any finite multistage game of length T, in the last
stage-game GT the players must play a Nash equilibrium of that stage-game. The reasoning
is simple: because there is no future that can depend on the actions taken at stage T,
the only thing that determines what is best for any player at that stage is to play a best
response to what he believes the other players are doing at that stage. This insight
implies another important fact:
Proposition 9.3 If a finite multistage game consists of stage-games that each have
a unique Nash equilibrium then the multistage game has a unique subgame-perfect
equilibrium.
Consider, then, conditional strategies of the following form in the Prisoner-Revenge
Game: each player plays mum in the first stage, and in the second stage plays loner if the
first-stage outcome was (M, m) and plays gang otherwise. A first-stage deviation to fink
yields a short-term gain but triggers the (G, g) punishment equilibrium in the second
stage. Thus if the discount factor is not too small then we can support the behavior of
(M, m) in the first-stage game even though it is not a Nash equilibrium in the stand-alone
Prisoner's Dilemma. The fact that the Revenge Game has two different Nash
equilibria, one of which is significantly better than the other, allows the players to offer
a self-enforcing incentive scheme that supports cooperative behavior in the first-stage
game.
This example is illuminating because it sheds light on two requirements that are
crucial to supporting behavior in the first stage (or early periods in general) that is not
a Nash equilibrium at this first stage. These requirements are as follows:
1. There must be at least two distinct equilibria in the second stage: a “stick” and
a “carrot.”
2. The discount factor has to be large enough for the difference in payoffs
between the “stick” and the “carrot” to have enough impact in the first stage
of the game.
The first requirement is necessary to offer the players the possibility of “reward
and punishment” strategies, which help support first-period play that is not a Nash
equilibrium in the stand-alone first-stage game. To see this, consider the following: If
we are trying to support behavior in the first period that is not a Nash equilibrium in that
period, it must be the case that at least one player would like to deviate in the first stage
(because at least one is not playing a best response). The player would indeed deviate
from the proposed play in the first period if there were no future consequences to such
a deviation. In other words, in the first stage some players would benefit in the short
run from deviating from the proposed path of play, and to keep them on the path of
play we must guarantee that such deviations will be met with credible “punishments.”
For these punishments to be credible, they must take the form of moving from an
equilibrium in which the player gets a high payoff to an equilibrium in which he
gets a low payoff. That is, the players use long-term losses to deter themselves from
pursuing short-term gains.
This is where the second requirement has bite: it must be that these long-term
losses are enough to deter the players from deviating toward their short-term gains.
For this to be the case it must be that the players value the payoffs that they will receive
in the second period, which can only happen if they do not discount the future too
much. Thus the effective punishment from deviating will depend first on the difference
in payoffs of the two equilibria, the “reward” and the “punishment,” and second on the
discount factor. A more figurative way to think about this is that the friendly and gang
continuation equilibria are the “carrot” and the “stick,” respectively. Then a smaller
discount factor makes the carrot smaller and the stick less painful, which in turn makes
deterrence harder.
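The carrot-and-stick arithmetic can be made concrete for the Prisoner-Revenge Game. A minimal sketch, assuming the standard Prisoner's Dilemma numbers (only the cell (F, m) = (5, −1) is explicit in the text above), computes the smallest discount factor at which cooperation is self-enforcing:

```python
# Assumed stage-1 (Prisoner's Dilemma) payoffs for a player:
coop = 4        # payoff from (M, m)
deviate = 5     # best stage-1 deviation: play fink against mum
# Stage-2 continuation payoffs from the text's Revenge Game:
carrot = 0      # the (L, l) "reward" equilibrium
stick = -3      # the (G, g) "punishment" equilibrium

# Cooperation is self-enforcing when the one-period gain from deviating
# is no larger than the discounted loss of the carrot:
#   deviate - coop <= delta * (carrot - stick)
delta_min = (deviate - coop) / (carrot - stick)
print(delta_min)  # cooperation is supported for any delta >= delta_min
```

Under these assumed payoffs the threshold is 1/3: a smaller carrot-stick gap, or a smaller δ, makes deterrence fail.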
The foregoing analysis of the multistage Prisoner-Revenge Game captures most
of what is interesting about multistage games. For this reason it is not too useful to
consider more involved games. The next section offers a result that is quite technical
in nature but that is also very useful in proving which strategies are or are not part of
a subgame-perfect equilibrium. To summarize: in order to check whether a strategy profile
σ = (σ1, . . . , σn) is a subgame-perfect equilibrium in a multistage (or any finite extensive-form)
game, all we need to show is that for every player i, given σ−i, player i does
not have a single information set from which he would want to deviate. In the next
chapter this result will prove to be even more useful.
9.5 The One-Stage Deviation Principle
As an illustration of multistage games, the Prisoner-Revenge Game was not hard to
analyze. In particular, to check that a certain proposed strategy profile was a subgame-perfect
equilibrium, we needed only to check that no player wanted to deviate in the
first stage, because we constructed our candidates for subgame-perfect equilibria to
consist of Nash equilibria in each of the second-stage subgames. It might seem that if
we had, say, a five-stage game it would be more complicated to check that a profile of
strategies constitutes a subgame-perfect equilibrium, and that in particular we would
have to check that the players do not want to deviate at any single stage-game. That is,
maybe they could gain by combining several deviations in separate stages of the game.
This is where the one-stage deviation principle comes in to simplify what seems
like a rather daunting task. Interestingly the principle is based on an idea that was
originally formulated by David Blackwell (1965) in the context of dynamic programming
for a single-player decision problem. The relation to a single-player decision
problem comes from the fact that when we want to check if a player i is playing a
best response to σ−i in every subgame, then all we are really doing is checking to
see whether he is playing an optimal action in each of his information sets, taking
the actions of all other players σ−i as given. Thus once the strategies σ−i of other
players in the extensive form are considered to be fixed, player i solves a standard
dynamic programming problem like the ones discussed in Section 2.4. This in turn implies that we can formulate the one-stage deviation principle within the context of
a single-person decision tree.

Call a strategy σi optimal if it is a best response to σ−i in every subgame, and
one-stage unimprovable if there is no single information set at which player i gains
from deviating once, holding the rest of his strategy fixed. Clearly an optimal strategy
is one-stage unimprovable. It is the converse that is
stated in the following theorem:
Theorem 9.1 A one-stage unimprovable strategy must be optimal.
Though the formal proof is a bit tedious, the intuition is not. If the strategy σi were
not a best response to σ−i (not "optimal" in the sense just described), then given that the
game is finite, there must be a finite number of deviations that improve upon σi. We can
look for the “last” such deviation in the sequence, and if we restrict attention to the
subgame that starts at that information set (or at the closest possible root that precedes
it, if it is not a singleton) then in that subgame player i has a one-stage deviation that
makes him better off.
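For the two-stage Prisoner-Revenge Game the principle is easy to operationalize. The sketch below (Python; the Prisoner's Dilemma payoffs are again an assumption, with only (F, m) = (5, −1) explicit in the text) checks the carrot-stick profile by testing each of player 1's one-stage deviations: one at the root and one at each of the four second-stage information sets:

```python
from itertools import product

# Stage-game payoffs as (player 1, player 2). The Prisoner's Dilemma
# numbers are assumed; the Revenge Game payoffs follow the text exactly.
pd = {("M", "m"): (4, 4), ("M", "f"): (-1, 5),
      ("F", "m"): (5, -1), ("F", "f"): (1, 1)}
revenge = {("L", "l"): (0, 0), ("L", "g"): (-4, -1),
           ("G", "l"): (-1, -4), ("G", "g"): (-3, -3)}
delta = 0.5
histories = list(product("MF", "mf"))  # the four first-stage outcomes

# Carrot-stick profile: both cooperate, then play (L, l) after (M, m)
# and punish with (G, g) after any other first-stage outcome.
rule1 = {h: ("L" if h == ("M", "m") else "G") for h in histories}
rule2 = {h: ("l" if h == ("M", "m") else "g") for h in histories}

def total1(a1):
    """Player 1's discounted payoff from the root if he opens with a1,
    while player 2 follows the profile (opens with m)."""
    h = (a1, "m")
    return pd[h][0] + delta * revenge[(rule1[h], rule2[h])][0]

# One-stage deviation at the root: switching to F must not pay.
root_ok = total1("M") >= total1("F")

# One-stage deviation at each second-stage information set: rule1[h]
# must be a best reply to rule2[h] in the Revenge Game.
stage2_ok = all(
    revenge[(rule1[h], rule2[h])][0]
    == max(revenge[(a, rule2[h])][0] for a in "LG")
    for h in histories
)
print(root_ok and stage2_ok)  # True
```

By the one-stage deviation principle, passing these five single-deviation checks (and the symmetric ones for player 2) is enough; no combination of deviations across stages needs to be examined.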

9.6 Summary
- Multistage games are defined by a series of normal-form stage-games that are
  played in sequence, in which players obtain payoffs after every stage-game
  and future payoffs are discounted.
- Any sequence of stage-game Nash equilibrium play can be supported as a
  subgame-perfect equilibrium in the multistage game, regardless of the discount
  factor.
- Players can use credible threats and promises in later stages to provide
  long-term incentives for short-term actions that may not be self-enforcing in the
  earlier-period stage-games.
- Because future payoffs are discounted, the effectiveness of long-term incentives
  will depend on how patient the players are.
- The set of outcomes that can be supported by a subgame-perfect equilibrium
  will often depend on the discount factor.



