Equilibrium Multiplicity in Dynamic Games: Testing and Estimation

This paper surveys the recent literature on dynamic games estimation when there is a concern of equilibrium multiplicity. We focus on the questions of testing for equilibrium multiplicity and estimation in the presence of multiplicity.


Introduction
During the last twenty years a substantial body of empirical work on dynamic games has emerged. The theoretical underpinnings are given by the pioneering works of Abreu et al. (1990) and Maskin and Tirole (1988), which emphasize that the set of equilibrium outcomes is typically not a singleton. Abreu et al. (1990) propose a method to calculate the set of subgame perfect equilibria, while Maskin and Tirole (1988) illustrate equilibrium multiplicity in the context of a Markov perfect dynamic pricing game. Some empirical works have departed from the classical theory by imposing additional ad hoc simplifying assumptions that require a unique and identical equilibrium to be played in the cross section of markets. This paper focuses on the literature that remains with the classical theory and permits equilibrium multiplicity.
The pioneering paper by Jovanovic (1989) describes the problem that equilibrium multiplicity poses for statistical inference. Multiplicity arises when a complete specification of the environment does not give rise to a unique determination of the endogenous variables. Jovanovic shows that equilibrium selection will place restrictions on observables that complete the model. If an equilibrium selection mechanism is included in the model, then this requires an extra assumption, which may give rise to misspecification; see de Paula (2013). Tamer (2003) treats a model with equilibrium multiplicity as an incomplete model. Incompleteness refers to the property that the model may have multiple outcomes for some (or all) parameter values.
Leaving the equilibrium selection mechanism unspecified, and assuming random sampling, the incomplete model gives rise to a set of inequalities, or bounds, that need to be satisfied by a set of parameter values. The influential works by Tamer (2003) and Ciliberto and Tamer (2009) show that these bounds contain useful information which in general leads to set identification of parameters. There is an emerging body of work that applies these ideas to inference in one-shot static games with equilibrium multiplicity; see de Paula (2013) and Aradillas-López (2020) for excellent surveys.
This paper reviews recent contributions on econometric methods under equilibrium multiplicity in the context of dynamic Markov games. Data generated by dynamic Markov games have the feature that multiple observations over time from the play of one equilibrium are observable. This distinguishing feature gives rise to an advantage over one-shot static games and can facilitate the inference problem posed by multiplicity concerns. In particular, if the time series dimension is large, then estimation can proceed using the time series for one Markov equilibrium; see Jofre-Bonet and Pesendorfer (2003). Our aim in this paper is to consider settings in which the time series dimension is short and the researcher wishes to pool data across markets. Our focus will be on describing recent approaches that develop statistical testing procedures for poolability and parameter inference in the presence of equilibrium multiplicity using a cross section of markets. For a general discussion of the background and applications of dynamic games estimation, we refer to the detailed survey by Aguirregabiria et al. (2021).
The setup for dynamic Markov games is described in Section 2.1. Well-known properties are highlighted in Section 2.2. Traditional estimation approaches, such as the classic minimum distance approach based on the set of Markov perfect equilibrium conditions, are described in Section 2.3. We also review the moment inequality approach by Bajari et al. (2007). Then we discuss two important issues for these traditional approaches: the effects of iterations on the asymptotic properties of different estimation methods, and the accommodation of unobserved market heterogeneity. These traditional estimation methods typically focus on the case where the data used for estimation are generated from a single equilibrium.
In the following sections we discuss two strands of recent advances in the econometrics of dynamic games. First, Section 3 discusses hypothesis testing for homogeneity of the data generating process or equilibrium multiplicity. We describe the asymptotic tests by Otsu et al. (2016) that examine homogeneity of the distribution of state variables, where asymptotics may refer to the number of time periods or the number of markets getting large. We also discuss recent advances in homogeneity or multiplicity testing, such as an approximately exact test by Bugni et al. (2021) and a rank test based on serial correlations due to multiplicity by de Paula and Tang (2021). Second, Section 4 discusses robust inference methods under equilibrium multiplicity. We review conventional approaches for multiplicity robust inference and discuss two recent advances: set inference for partially identified parameters by Otsu et al. (2021), which is based on the duality approach in Schennach (2014), and nonparametric point identification analysis by Luo et al. (2021). Finally, Section 5 discusses some directions for future research.

Setup and traditional methods
This section describes our setup and well known implications, and reviews traditional estimation methods.

Setup
Consider the dynamic Markov game setup in discrete time $t = 1, 2, \ldots$. We describe the setup for one market. The same setup applies for all markets $j = 1, \ldots, M$.

Players.
A typical player is denoted by $i = 1, \ldots, N$. The number of players is fixed and does not change over time. Every period the econometrician observes a profile of states and actions described as follows.
States. Each player is endowed with two state variables $(s_i^t, \varepsilon_i^t)$ with $s_i^t \in S_i$ in finite support and $\varepsilon_i^t \in \mathbb{R}^K$ at each period $t$. The state variable $s_i^t$ is publicly observed by all players. We maintain the assumption that the econometrician observes $s_i^t$ but does not observe $\varepsilon_i^t$. We assume that $\varepsilon_i^t$ is drawn from a continuous cumulative distribution function $F$ with support $\mathbb{R}^K$ and observed privately by player $i$ at the beginning of period $t$. The vector of all players' public state variables is denoted by $s^t = (s_1^t, \ldots, s_N^t) \in S = \times_i S_i$, whose cardinality is $|S|$.
Actions. Each player chooses an action $a_i^t \in A_i = \{0, 1, \ldots, K\}$ in finite support at each period $t$. The decisions are made after the state $(s_i^t, \varepsilon_i^t)$ is observed. The decisions can be made simultaneously or sequentially. The decision may also be taken after an idiosyncratic random utility (or a random profit shock) is observed. We leave the details of the decision process unspecified. Our specification encompasses the random-utility modeling assumptions, and allows for within-period correlation in the random utility component across actions and across players. The vector of joint actions in period $t$ is denoted by $a^t = (a_1^t, \ldots, a_N^t) \in A = \times_i A_i$, whose cardinality is $|A|$. We assume actions are publicly observed by all players and the econometrician. We also use the notation $a = (a_i, a_{-i})$, where $a_{-i}$ denotes the actions of all players other than player $i$. While most of our exposition considers the finite action space $A_i$, we shall sometimes remark on the continuous action settings in which $A_i = \mathbb{R}^K$.
Choice probability matrix. Let $\sigma(a|s) = \Pr\{a^t = a \mid s^t = s\}$ denote the conditional probability that an action profile $a$ is chosen given a state $s$. Throughout the paper, we assume that $\sigma$ is time invariant and conditionally independent of other past actions and states. The matrix of conditional choice probabilities is also denoted by $\sigma$ and has dimension $|S| \times (|A| \cdot |S|)$. It consists of conditional probabilities $\sigma(a|s)$ in row $s$, column $(a, s)$, and zeros in row $s$, column $(a, s')$ with $s' \neq s$. We denote the marginal probability of [...]
State-action transition matrix. Let $g(s'|a, s) = \Pr\{s^{t+1} = s' \mid a^t = a, s^t = s\}$ denote the state-action transition probability that a state $s'$ is reached when the current action profile and state are given by $(a, s)$. We also assume that $g$ is time invariant and conditionally independent of other past actions and states. We use the symbol $G$ to denote the $(|A| \cdot |S|) \times |S|$ dimensional state-action transition matrix in which column $s' \in S$ consists of the vector of probabilities $[g(s'|a, s)]_{a \in A, s \in S}$.
State transition matrix. Under the above assumptions on $\sigma$ and $G$, the state variables $s^t$ follow a (first-order) Markov chain with the (stationary) state transition matrix $P = \sigma G$, whose dimension is $|S| \times |S|$. A typical element $p(s'|s) = \sum_{a \in A} \sigma(a|s) g(s'|a, s)$ of the matrix $P$ equals the probability that state $s'$ is reached when the current state is $s$.
Hereafter we focus on the first-order Markov chain.
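The construction $P = \sigma G$ can be illustrated numerically. The following is a minimal sketch, assuming a toy game with two states and two joint action profiles; the function names, the column ordering of the pair $(a, s)$, and all numbers are illustrative assumptions rather than part of the setup above.

```python
import numpy as np

def choice_matrix(ccp):
    """Embed sigma(a|s), given as an (|S|, |A|) array, into the
    |S| x (|A|*|S|) block matrix described in the text.
    Assumed ordering: the pair (a, s) sits in column a * |S| + s.
    Row s is zero in every column (a, s') with s' != s."""
    n_states, n_actions = ccp.shape
    sigma = np.zeros((n_states, n_actions * n_states))
    for s in range(n_states):
        for a in range(n_actions):
            sigma[s, a * n_states + s] = ccp[s, a]
    return sigma

def state_transition(ccp, g):
    """g[a, s, s_next] = Pr(s_next | a, s). Returns P = sigma @ G."""
    n_actions, n_states, _ = g.shape
    # Stack the rows (a, s) in the same order as the columns of sigma.
    G = g.reshape(n_actions * n_states, n_states)
    return choice_matrix(ccp) @ G

# Hypothetical primitives for illustration only.
ccp = np.array([[0.7, 0.3],
                [0.4, 0.6]])
g = np.array([[[0.9, 0.1], [0.5, 0.5]],   # transitions under action profile 0
              [[0.2, 0.8], [0.1, 0.9]]])  # transitions under action profile 1
P = state_transition(ccp, g)
```

Each row of `P` sums to one, and element `P[s, s_next]` equals $\sum_a \sigma(a|s) g(s'|a, s)$, matching the typical element given above.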
Period payoff. After actions are observed, players collect their period payoffs $\pi_i(a^t, s^t, \varepsilon_i^t) \in \mathbb{R}$, and $\pi = (\pi_1, \ldots, \pi_N)$. We assume that the payoff is additively separable in the $a_i^t$-th component of $\varepsilon_i^t$, i.e., $\pi_i(a^t, s^t, \varepsilon_i^t) = \bar\pi_i(a^t, s^t) + \varepsilon_i^t(a_i^t)$. That is, when player $i$ chooses action $a_i^t = k$, only the $k$-th element of the vector $\varepsilon_i^t$ enters the period payoff. We denote by $\bar\Pi_i$ the $(|A| \cdot |S|) \times 1$ dimensional payoff vector $[\bar\pi_i(a, s)]_{a \in A, s \in S}$ and by $\Pi_i$ the $(|A| \cdot |S|) \times 1$ dimensional vector of expected payoffs $[\bar\pi_i(a, s) + E\varepsilon_i(a_i)]_{a \in A, s \in S}$.
Game payoff. Players discount future payoffs with discount factor $\beta \in (0, 1)$. The expected [...]. We sometimes denote game payoffs using the $|S| \times 1$ dimensional ex ante value function [...]
Markovian strategies. Player $i$'s choice of an action is a function $a_i(s^t, \varepsilon_i^t) \in \{0, 1, \ldots, K\}$ that depends on the publicly observed state vector $s^t$ and the player-specific private information $\varepsilon_i^t$.
Markovian perfect equilibrium (MPE). Player $i$'s Markovian strategy must be optimal given the strategy profile of rivals, and beliefs must be consistent with the strategies.

Well-known properties
We next describe some important well-known properties of our setup. They concern the equilibrium fixed point mapping, the non-uniqueness of equilibria, and the existence of a steady-state distribution.
Equilibrium fixed point mapping. Let [...] denote the continuation value of action $a_i$, where $\sigma_{-i}$ [...] The representation lemma in Aguirregabiria and Mira (2007) together with Proposition 1 and Theorem 1 in Pesendorfer and Schmidt-Dengler (2008) establish the following characterisation and existence results.
Lemma 1. Equation (2) is a necessary and sufficient condition for choice probabilities to be consistent with an MPE. An MPE exists.
In the continuous action setting a set of necessary equilibrium conditions is given by the first order condition $\frac{\partial}{\partial a_i} u_i(a_i, s, p, \ldots)$ [...] and $\sigma_{-i}$ denotes the distribution function of rival players' actions. The first order condition can be written in the form of equation (2) with elements [...]
Limiting steady-state distribution. When the limit exists, let $Q(s', s) = \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} 1\{s^t = s', s^0 = s\}$ denote the long run proportion of time that the Markov chain $P$ spends in state $s'$ when starting at the initial state $s^0 = s$. Suppose the unconditional long run proportion of time [...] for all initial states $s$. Then the $|S|$ dimensional row vector of probabilities $Q = \{Q(s)\}_{s \in S}$ is called the steady-state distribution of the Markov chain. Since the state space is finite, $Q$ describes a multinomial distribution. The properties of Markov chains are well known. We next describe a property that is useful for our purpose. To do so, we introduce the concept of communicating states.
Communicating states. We say that a state $s'$ is reachable from $s$ if there exists an integer $T$ such that the chain $P$, started at $s$, will be at state $s'$ after $T$ periods with positive probability. If $s'$ is reachable from $s$, and $s$ is reachable from $s'$, then the states $s$ and $s'$ are said to communicate. Because the random component $\varepsilon_i^t$ has full support on the real numbers, all actions arise with strictly positive probability for any state $s \in S$. Thus, states will communicate if the state-action transition matrix allows state $s'$ (or $s$) to be reached, in principle, when starting from state $s$ (or $s'$), for any pair of states $s, s' \in S$.
Lemma 2. Suppose all states of the Markov chain $P$ communicate (such a chain is sometimes called ergodic or irreducible). Then the steady-state distribution $Q$ exists and is unique. It satisfies $Q(s) > 0$ for all $s \in S$ and $Q = QP$.
This lemma guarantees existence and uniqueness of the steady-state distribution and states that the long run proportion of time that the Markov chain $P$ spends in state $s$ is strictly positive for any state $s \in S$ and that the equation $Q = QP$ must hold. A proof of the above properties is given in Proposition 1.14 and Corollary 1.17 in Levin et al. (2009).
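The conditions of Lemma 2 are easy to check numerically for a given transition matrix. The sketch below is an illustration under assumptions, not a procedure taken from the cited references: the function names and the example matrices are hypothetical, communication of all states is checked through positivity of $(I + A)^{|S|-1}$ for the adjacency matrix $A$ of $P$, and $Q = QP$ is solved as a linear system together with the normalisation that the elements of $Q$ sum to one.

```python
import numpy as np

def all_states_communicate(P):
    """True if every state is reachable from every other state.
    A finite chain is irreducible iff (I + A)^(n-1) has no zero entry,
    where A is the 0/1 adjacency matrix of positive transitions."""
    n = P.shape[0]
    adjacency = (P > 0).astype(float)
    reach = np.linalg.matrix_power(np.eye(n) + adjacency, n - 1)
    return bool(np.all(reach > 0))

def steady_state(P):
    """Solve Q = Q P subject to sum(Q) = 1 by least squares."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    Q, *_ = np.linalg.lstsq(A, b, rcond=None)
    return Q

# Hypothetical transition matrices for illustration.
P_irreducible = np.array([[0.69, 0.31],
                          [0.26, 0.74]])
P_absorbing = np.array([[1.0, 0.0],    # state 0 is absorbing
                        [0.5, 0.5]])

Q = steady_state(P_irreducible)
```

For `P_irreducible` the check returns `True` and `Q` is strictly positive, in line with Lemma 2. For `P_absorbing` the check returns `False`, so the lemma does not apply.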

Traditional estimation methods
This section reviews the traditional inference approach.
While early work, e.g., Jofre-Bonet and Pesendorfer (2003), envisioned estimation for a single industry with time series data, subsequent methodological contributions extended the ideas to a cross-section of markets with a (possibly) short time horizon. The facilitating assumption invoked on the data generating process (DGP) is that the data consist of an independent and identically distributed (iid) sample generated from a single MPE.
For each market $j$, a sequence of action-state profiles $d_j = (a_j^t, s_j^t)_{t=1,\ldots,T}$ is observed, where $T$ is the number of time periods in the data set. Let $n = M \cdot N \cdot T$ denote the total number of observations.
Assumption (DGP): The observed data $(d_j)_{j=1,\ldots,M}$ consist of an iid sample drawn from a single MPE.
Identification. The model is identified if there exists a unique set of model primitives $(\pi, F, \beta, g)$ that can be inferred from a sufficiently rich data set describing choices and state transitions from a single equilibrium. Rust (1994) and Magnac and Thesmar (2002) show that parametric assumptions on model primitives are required, and Pesendorfer and Schmidt-Dengler (2008) characterize necessary and sufficient conditions for point identification of parameters. Bajari et al. (2007) consider moment inequality models which do not require point identification.
Estimation. Let $\theta = (\theta_\pi, \theta_F, \beta, \theta_g) \in \Theta \subset \mathbb{R}^q$ denote a vector of parameters that parametrizes the model primitives $(\pi, F, \beta, g)$. The estimation methods follow the two-step approach originally developed in Hotz and Miller (1993) for single agent settings. First, consistent estimators $\hat p_0$ of choice probabilities are obtained from the data on actions and states. Second, the equilibrium conditions in (2) are invoked to estimate the parameters $\theta$.
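As an illustration of the first step, a nonparametric estimate of the conditional choice probabilities can be formed from cell frequencies pooled across markets and periods, which is appropriate only under Assumption (DGP). The sketch below assumes that joint action profiles and states have already been coded as integer indices; the function name and the toy data are hypothetical.

```python
import numpy as np

def frequency_ccp(actions, states, n_actions, n_states):
    """Pooled frequency estimator of Pr(a | s).
    actions, states: integer arrays of shape (M, T) holding the index of
    the joint action profile and of the public state in each market-period.
    Returns an (|S|, |A|) array; rows of states never visited are NaN."""
    counts = np.zeros((n_states, n_actions))
    # Accumulate one count per observed (state, action) pair.
    np.add.at(counts, (states.ravel(), actions.ravel()), 1.0)
    visits = counts.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(visits > 0, counts / visits, np.nan)

# Toy data: M = 2 markets observed for T = 3 periods.
actions = np.array([[0, 1, 1],
                    [0, 0, 1]])
states = np.array([[0, 0, 1],
                   [0, 1, 1]])
p_hat = frequency_ccp(actions, states, n_actions=2, n_states=2)
```

With a short panel many cells may be empty or thinly populated, which is one practical reason why parametric first-stage specifications are often used instead.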
The first stage choice probability estimates ideally would involve non-parametric estimation. Frequently parametric assumptions are invoked to facilitate estimation in practice. In the second stage, a commonly used approach for discrete actions is to estimate $\theta$ based on the minimum distance (MD) problem: $\min_\theta \rho_n(\hat p_0, \theta)$, where $\rho_n(\hat p_0, \theta) = [\hat p_0 - \Psi(\hat p_0; \theta)]' W [\hat p_0 - \Psi(\hat p_0; \theta)]$ (3) for a suitable weight matrix $W$.
The one-step pseudo maximum likelihood (PML) estimator, proposed in Aguirregabiria and Mira (2007), is asymptotically equivalent to an MD estimator with the weight matrix $W = \Sigma_{\hat p_0}$, where $\Sigma_{\hat p_0}$ denotes the variance matrix of $\hat p_0$. Pesendorfer and Schmidt-Dengler (2008) propose a general class of MD estimators and show that the efficient weight matrix is [...], where $\nabla_p \Psi$ is the Jacobian matrix of $\Psi$ with respect to $p$. This efficient estimator is asymptotically equivalent to the maximum likelihood estimator.
The GMM estimator proposed in Pakes et al. (2007) is also asymptotically equivalent to a MD estimator.
For continuous actions, Jofre-Bonet and Pesendorfer (2003) use the first order conditions for optimal actions, $\frac{\partial}{\partial a_i} u_i(a_i, s, \hat p_0, \varepsilon_i; \theta) = 0$, to conduct inference. The privately known $\varepsilon_i$ enters the first order condition of a dynamic bidding game in additively separable form, which enables nonparametric identification and estimation of the distribution of $\varepsilon_i$. Srisuma (2013) extends MD problem (3) to the case of continuous actions.
Bajari et al. (2007) relax the requirement of point identification of the parameter vector $\theta$. Instead they consider set identified models and propose a two-step estimator for the identified set of parameters. In particular, based on the characterisation of an MPE $\sigma$, $V_i(s, \sigma_i, \sigma_{-i}; \theta) \geq V_i(s, \sigma_i', \sigma_{-i}; \theta)$ for any alternative $\sigma_i'$, they consider the population criterion function [...], where $H$ is some distribution over a set for $(i, s, \sigma_i')$. Bajari et al. (2007) suggest constructing its empirical counterpart $\rho_n^V(\theta)$ by estimating the value function $V_i(\cdot; \theta)$ for given $\theta$ in the first step and evaluating the integral by simulation. Their set estimator is [...] for some suitably chosen positive sequence $\mu_n$ that decays to zero and may depend on the data (see, e.g., Chernozhukov et al. (2007)). Furthermore, their moment inequality approach allows action spaces to be continuous.
[...] properties of the iterated PML and MD estimators. They show that the above described optimally-weighted one-step MD estimator is asymptotically more efficient than iterated MD and iterated PML estimators. Kasahara and Shimotsu (2012) propose modified algorithms with the aim of alleviating the convergence issues of iterated methods.
As indicated in Pesendorfer and Schmidt-Dengler (2008) and reconfirmed in Bugni and Bunting (2020), the asymptotic theory of the MD estimator of θ becomes analogous to that of the standard GMM theory (e.g., Newey and McFadden, 1994) under standard conditions.
GMM theory directly implies that the optimally weighted one-step MD estimator is asymptotically optimal.There is no gain achievable by iteration.Other estimation methods such as the (iterated) PML are typically inefficient for estimating dynamic games.
A key reason for the lack of efficiency of the PML estimator is that, in contrast to the single agent setup (Aguirregabiria and Mira, 2002), the mapping $\Psi$ for dynamic games typically fails to have a zero Jacobian with respect to $p$ at the fixed point. It is this zero Jacobian property that ensures that additional policy mapping iterations have no first-order effect on the asymptotic distribution. To recover the property, Dearing and Blevins (2021) develop an efficient version of the iterative PML estimator by considering an alternative mapping (say, $\tilde\Psi$) which approximates the full Newton step of the original constrained maximum likelihood problem. They show that the iterative ML estimator using $\tilde\Psi$ is asymptotically efficient and that the first-order asymptotic property of the estimator does not vary across iterations due to the zero Jacobian property of their mapping $\tilde\Psi$.
Unobserved heterogeneity. The benchmark setup discussed above has been extended in various directions. In particular, several papers develop estimation methods to accommodate unobserved market heterogeneity. Aguirregabiria and Mira (2007) extend their PML estimator to allow for discrete and finitely supported unobserved heterogeneity, in which case the pseudo-likelihood function takes a finite mixture form. Arcidiacono and Miller (2011) and Connault (2016) extend the estimation approach to account for unobserved heterogeneity in state variables, that is, an element that is known to players and may evolve over time in a Markovian way, but is not observed by the econometrician. Importantly, the initial unobserved heterogeneity for a specific market is an iid draw from a known probability distribution.

Multiplicity: Testing
A maintained assumption in the conventional estimation approaches is that the data generating process is identical across markets. Researchers assume that action-state profiles are generated from an identical equilibrium in all markets, so that policy functions can be estimated by pooling those data across markets. If this assumption does not hold, then policy functions will be inconsistently estimated and the resulting inference for structural parameters will be erroneous. We next review the literature that explores statistical tests of the validity of this pooling assumption and of the presence of multiple equilibria.
Based on the setup described in the previous section, the data generating process of the profile $(a_j^t, s_j^t)_{t=1,\ldots,T}$ of the $j$-th market is characterized by the $|S| \times (|A| \cdot |S|)$ dimensional conditional choice probability matrix $p_j$, which consists of conditional probabilities $\Pr\{a_j^t = a \mid s_j^t = s\}$ in row $s$, column $(a, s)$, and zeros in row $s$, column $(a, s')$ with $s' \neq s$, and the $(|A| \cdot |S|) \times |S|$ dimensional state-action transition matrix $G_j$ for $\Pr\{s_j^{t+1} = s' \mid a_j^t = a, s_j^t = s\}$. Note that the transition matrix of states is written as $p_j G_j$. If all states of the Markov chain $p_j G_j$ communicate, then by Lemma 2 there exists a unique steady-state distribution $Q_j$, and the identical distribution hypothesis across markets $j = 1, \ldots, M$ may be tested by homogeneity of the steady-state distribution, $H_0^Q: Q_1 = \cdots = Q_M$. The $|S| \times 1$ dimensional vector of relative frequencies $\hat Q_j = [T^{-1} \sum_{t=1}^{T} 1\{s_j^t = s\}]_{s \in S}$ is a nonparametric estimator of the steady-state distribution $Q_j$. Otsu et al. (2016) show that a test statistic for $H_0^Q$ is [...] under $H_0^Q$, where $\bar Q = M^{-1} \sum_{j=1}^{M} \hat Q_j$ and $V^-$ means a generalized inverse of $V$, the asymptotic variance estimator of $T^{1/2}(\hat Q_j - Q_j)$ obtained by using, e.g., the Newey and West (1987) estimator. The term $V$ in (5) may be replaced with the identity matrix, and a researcher can employ a bootstrap critical value. For example, for every bootstrap iteration a set of states can be randomly chosen from the steady-state distribution $\bar Q$ for every market $j$.
The above test requires that the steady-state distributions exist and that the Markov chains are in the steady-state; see Lemma 2. That is, regardless of $H_0^Q$, it is assumed that all states in the chain $p_j G_j$ communicate for each $j$. Otsu et al. (2016) relax this assumption and propose an alternative test based on the conditional state distribution given the initial state. This situation is relevant in new industries in which the steady-state has not been reached yet, or when there are some absorbing states. Using the state transition matrix $pG$, the conditional distribution of $s^t$ given $s^1 = s$ is described by $\iota_s (pG)^t$, where $\iota_s$ takes one at the element corresponding to $s$ and zero otherwise. The null hypothesis of an identical data generating process can be considered in the form of [...] The left hand side is a vector of model-free conditional probabilities, and the right hand side is the model-based prediction for those probabilities. The hypothesis $H_0^s$ is implied by two assumptions: (i) the state variables $(s_j^t)_{t=1,\ldots,T}$ for $j = 1, \ldots, M$ are iid over $j$, which allows us to express the hypothesis $H_0^s$ without using a market index $j$, and (ii) the Markov chain is first-order and time-homogeneous.
Let [...] be the relative frequency estimator for the vector of conditional probabilities $[\Pr\{s^t = s' \mid s^1 = s\}]_{s' \in S}$ for $t = 2, \ldots, T$ for a given initial state $s$.
If the model parametrized by $pG$ is correct, then a test statistic can be defined based on the [...] as $M \to \infty$ with fixed $T$ under $H_0^s$, where $V_s^-$ is a generalized inverse of an estimator of the asymptotic variance of $\sqrt{M} C_s$ under $H_0^s$. The test based on $T_s$ requires $T \geq 3$. As in (5), the term $V_s$ in (7) may be replaced with the identity matrix and a bootstrap critical value can be employed.
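To make the identity-weighted bootstrap variant of the steady-state homogeneity test concrete, the following sketch computes market-level relative frequencies of states, a quadratic distance of these frequencies from their cross-market average, and a bootstrap p-value obtained by drawing states for every market from the pooled frequency vector. The statistic used here, $T \sum_j \|\hat Q_j - \bar Q\|^2$, is an assumed stand-in with identity weighting and is not claimed to be the exact statistic of Otsu et al. (2016); the function names and the toy data are hypothetical, and drawing states independently over time ignores serial dependence within a market.

```python
import numpy as np

def state_frequencies(states, n_states):
    """states: (M, T) integer array. Returns (M, |S|) relative frequencies."""
    M, T = states.shape
    freq = np.zeros((M, n_states))
    market_index = np.repeat(np.arange(M), T)
    np.add.at(freq, (market_index, states.ravel()), 1.0)
    return freq / T

def homogeneity_stat(freq, T):
    """Assumed statistic: T * sum_j ||Q_j_hat - Q_bar||^2 (identity weight)."""
    q_bar = freq.mean(axis=0)
    return T * np.sum((freq - q_bar) ** 2)

def bootstrap_pvalue(states, n_states, n_boot=499, seed=0):
    """Bootstrap p-value; each replication draws T states per market from Q_bar."""
    rng = np.random.default_rng(seed)
    M, T = states.shape
    freq = state_frequencies(states, n_states)
    stat = homogeneity_stat(freq, T)
    q_bar = freq.mean(axis=0)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        draws = rng.choice(n_states, size=(M, T), p=q_bar)
        boot[b] = homogeneity_stat(state_frequencies(draws, n_states), T)
    return stat, (1 + np.sum(boot >= stat)) / (1 + n_boot)

# Toy data with strong heterogeneity: ten markets sit in state 0 throughout,
# ten markets sit in state 1 throughout.
states = np.vstack([np.zeros((10, 10), dtype=int), np.ones((10, 10), dtype=int)])
stat, pval = bootstrap_pvalue(states, n_states=2)
```

In this deliberately extreme example the market-level frequencies are far from the pooled frequencies, so the bootstrap p-value is small and pooling would be rejected.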
In contrast to the asymptotic tests by Otsu et al. (2016), Bugni et al. (2021) propose an approximation to the exact randomization test (see Ch. 15.2 of Lehmann and Romano, 2005) of the homogeneity hypothesis across markets and time periods [...], the sufficient statistic of the data $d = (a_j^t, s_j^t)_{j=1,\ldots,M,\, t=1,\ldots,T}$ following a multinomial distribution is given by [...], and thus in principle we can consider the exact randomization test.
Let $D$ be the support of $d$ and $\Gamma$ be the set of mappings from $D$ to $D$ such that [...] for all $\gamma \in \Gamma$ and all $d \in D$. Then the randomization test with significance level $\alpha$ is given by [...], where $|\Gamma|$ is the number of elements of $\Gamma$ and $\tau(\cdot)$ is any scalar-valued function that defines the test statistic. This test is exactly valid, i.e., under $H_0^{hom}$, the rejection probability satisfies $E[\phi^{rd}(d)] \leq \alpha$ for any finite $N$, $T$, and $M$ (Lehmann and Romano, 2005). However, since $\Gamma$ is usually difficult to enumerate, this test is practically infeasible. To address this practical issue, Bugni et al. (2021) propose an MCMC-based approximation for $\phi^{rd}(d)$. In particular, they construct an iterative algorithm which randomly picks an element $\gamma^{(k)}(d)$ from $\Gamma$, develop the approximate test [...], and show its validity as the number of iterations increases, i.e., $\limsup$ [...] $\leq \alpha$ for any finite $N$, $T$, and $M$. The main idea of their iterative algorithm is to draw the action and state profiles $(a^{(k)}, s^{(k)}) = \gamma^{(k)}(d)$ separately from configurations that yield the same value of the sufficient statistic $U(d)$.
de Paula and Tang (2021) study testable implications of multiple equilibria in static and dynamic discrete games, where players' private signals are correlated but the econometrician can split the sample into clusters within which equilibrium selections under multiplicity are correlated. In the context of dynamic games, correlation among players' private signals naturally arises through time-varying serially correlated unobserved heterogeneity, say $\eta^t$, and the clusters can be constructed by matching players across different periods in the same dynamic game.
To be specific, consider the setup in Section 2.1, where the publicly observable state variables are augmented as $(s^t, \eta^t)$. Although $s^t$ is observable to the econometrician, $\eta^t$ is not.
We partition the set of players into two disjoint subsets, $I_A$ and $I_B$, and then define $a_A^t = \{a_i^t : i \in I_A\}$, $a_B^t = \{a_i^t : i \in I_B\}$, and matrices $P_{AB}$ and $\tilde P_{AB}$ for the joint probability masses of $(a_A^t, a_B^t)$ and $(a_A^{t-1}, a_B^{t+1})$, respectively. Under the assumption that a single equilibrium is played in each market and some rank conditions on the matrices defined by the conditional probability masses $P_\omega\{a_A^t, a_B^t \mid \eta^t = \eta\}$ for each MPE $\omega = 1, \ldots, d_\omega$ conditional on $\eta$, de Paula and Tang (2021) explore mixture representations of $P_{AB}$ and $\tilde P_{AB}$ by unobserved heterogeneity and equilibrium selection, and show that the number of MPE is identified as $d_\omega = \{\mathrm{rank}(P_{AB})\}^2 / \mathrm{rank}(\tilde P_{AB})$.
Therefore, inference on the number of MPE including uniqueness can be conducted by applying the rank test in Kleibergen and Paap (2006).

Multiplicity: Inference
What can we do if the tests described in the last section reject the null of a homogeneous data generating process across markets or of a unique equilibrium? This section reviews some existing approaches and recent advances for inference on structural parameters under equilibrium multiplicity.
First, a researcher may search for a subsample where the null of homogeneity is accepted.
Second, unobserved-heterogeneity robust inference methods, such as Aguirregabiria and Mira (2007), Arcidiacono and Miller (2011), and Connault (2016), can be adopted under some further assumptions to estimate structural parameters in the presence of equilibrium multiplicity. However, the following caveats arise. First, unobserved heterogeneity typically affects choice probabilities and payoffs jointly, while equilibrium multiplicity affects choice probabilities only. Hence, the channel through which unobserved heterogeneity enters has to exclude payoff effects and focus on choice probability effects. Second, methods accounting for unobserved heterogeneity postulate that a unique outcome arises with a given probability. As the pioneering work by Tamer (2003) shows, postulating a particular probability distribution over equilibrium outcomes is an 'ad hoc' assumption which changes the model and can lead to inconsistent estimates if specified incorrectly. It requires knowledge of the 'equilibrium selection rule'. Third, parametric specifications employed for the distribution of unobserved heterogeneity are typically restrictive and may cause inconsistency. For instance, a recent paper that accounts for unobserved heterogeneity in dynamic games is Arcidiacono et al. (2016). It postulates a parametric functional form through which unobserved heterogeneity affects choice probabilities. Yet, if this approach were adopted to account for multiplicity, then this would translate into a particular functional form for how multiplicity may affect choice probabilities. If this functional form is misspecified, then this may lead to inconsistent parameter estimates.
For large $T$, Bajari et al. (2007) propose to estimate choice probabilities for each market separately. To conduct inference that accounts for equilibrium multiplicity, the second step estimator can be based on pooling all equilibrium conditions from all markets. This approach permits consistent inference that accommodates different equilibria in different markets. Yet, when $T$ is small the market-specific choice probabilities can no longer be consistently estimated.

Set inference
An inference method for the identified set of parameters which does not require specification of the equilibrium selection rule is proposed in Otsu et al. (2021).They consider the asymptotic setting where T fixed and the number of markets M diverges.In this subsection we describe this approach.
The econometrician observes action-state profiles d j = (a t j , s t j ) t=1,...,T generated by the equilibrium play of some underlying economic model in market j = 1, . . ., M .The market specific choice probabilities p j are usually not observed in the data, and treated as random variables.It is assumed that (d j , p j ) j=1,...,M is an iid sequence.Then Otsu et al. (2021) where the upper script '(a i , s, i)' means the element corresponding to player i, action a i , and state s.The upper block in (8) consists of the equilibrium restrictions (2) and the lower block contains moment restrictions placed on the choice probabilities by the data.Here s} is the frequency of action-state profile (a i , s), and f (s) = T t=1 1{s t = s} is the frequency of state s.Although E[g(d, p; θ)] = 0 holds, GMM cannot be applied to estimate θ due to the latent variables p.
In this setup, following Schennach (2014), the identified region for the structural parameters θ can be defined as where µ is the probability measure of the observables d, λ is the conditional probability measure of p given d, and Λ is the set of all conditional probability measures supported on the set of choice probabilities.In other words, the identified region Θ 0 is the set of parameters in which some conditional measure λ can rationalize the moment conditions.The expectation  Schennach (2014) shows this selected exponential distribution has the property of being the 'least favorable' distribution of the unobservables.
Note that, while the intermediate problem ( 9) requires optimization over the infinitedimensional set Λ, the equivalent problem (10) entails optimization over only a finite-dimensional Euclidean space for γ.This property allows for a practically feasible characterization of the identified region Θ 0 for the structural parameters.Based on the data (d j ) j=1,...,M , we can replace the population moment in (10) with the sample analogue ḡ(θ, γ) = M −1 M j=1 g(d j , θ, γ) and define a GMM-type criterion as where V (θ, γ) is some estimator of V ar(g(d, θ, γ)).A simple but conservative confidence set suggested by Schennach (2014) is where χ 2 d,α is the (1−α)-th quantile of the χ 2 distribution with degree of freedom d.Schennach (2014) shows its asymptotic validity in the sense that lim n→∞ Pr{θ / ∈ Ĉ} ≤ α for all θ ∈ Θ 0 .
A drawback of the confidence set Ĉ is: when the dimension d of the moment function g (or γ) is high, the critical value tends to be large.Recall that d = m s • N • K + 1.Thus, if the number of states, players, or actions is large, the confidence set may be too large to obtain a meaningful conclusion.Otsu et al. (2021) propose an adapted version of Kleibergen (2005) statistic to the moment function g defined by the entropic latent variable integration.For each θ, let γ(θ) be an estimator of the solution in (11) satisfying certain regularity conditions.Then the test statistic is written as where D(θ) is a d × q matrix with the l-th column ∂ḡ(θ, γ(θ)) and Ĝl (θ, γ) = M −1 M j=1 g(d j , θ, γ)∂g(d j , θ, γ)/∂θ l .In the limit, the distribution of this statistic satisfies K(θ 0 ) d → χ 2 q for each θ 0 ∈ Θ 0 .The confidence set based on this statistic is obtained as where χ 2 q,α is the (1 − α)-th quantile of the χ 2 distribution with degree of freedom q.Note that the critical value χ 2 q,α depends only on the dimension of structural parameters θ and is robust to the dimension of the moment function g.
In practice, the dimensionality of the function $g$ can be high, which may pose computational difficulties or may not be practical for the data at hand. We briefly outline three ways in which the dimensionality is typically reduced in applied work. First, if the data are not sufficiently rich to consider a frequency distribution of the vector $p$ for every realisation of the market-state-action-player set, then the second element in (8) can be replaced with moment conditions of action variables, possibly aggregated across states. Second, to facilitate computation of the $d$-dimensional strategies $p$, the strategy function may be parametrized with a low-dimensional parameter vector $\tau \in Q \subseteq \mathbb{R}^l$, by assuming $p_{(a_i,s,i)} = p(a_i, s; \tau)$. This approach is commonly used for empirical dynamic models and is referred to as a policy function approximation. Equation (2) is then replaced with the corresponding fixed point equation for $\tau$ defined on $Q$. Finally, for computational reasons it can be useful to aggregate the first element in equation (8) across action-state-players into a single equation. The equilibrium condition can be equivalently formulated as the problem of finding zeros of the following one-dimensional equation:
$$\sum_{i \in N} \sum_{s \in S} \sum_{a_i \in A_i} \{p_{(a_i,s,i)} - \Psi_{(a_i,s,i)}(p; \theta)\}^2 = 0.$$
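The aggregated equilibrium condition lends itself to a simple numerical search. As a minimal sketch, assuming a hypothetical scalar best-response map `psi` (a logistic function chosen only because it has three fixed points at $\theta = 6$), the code below minimizes the sum of squared residuals from several starting values and keeps the minimizers at which the sum is numerically zero. The function names and the tolerance values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares


def psi(p, theta):
    """Toy best-response map (hypothetical), applied elementwise to p."""
    return 1.0 / (1.0 + np.exp(-theta * (p - 0.5)))


def residual(p, theta):
    """Stacked residuals p - Psi(p; theta); their squared sum is the aggregated equation."""
    return p - psi(p, theta)


def find_equilibria(theta, starts, tol=1e-10):
    """Minimise the sum of squared residuals from several starts and keep the zeros."""
    found = []
    for p0 in starts:
        res = least_squares(residual, x0=[p0], args=(theta,), bounds=(0.0, 1.0))
        if 2.0 * res.cost < tol:              # res.cost equals 0.5 * sum of squares
            found.append(float(res.x[0]))
    found.sort()
    distinct = []
    for p in found:                           # merge numerically identical solutions
        if not distinct or p - distinct[-1] > 1e-4:
            distinct.append(p)
    return distinct


equilibria = find_equilibria(theta=6.0, starts=np.linspace(0.05, 0.95, 7))
```

In this toy case the search returns three distinct zeros, one at $p = 0.5$ and two placed symmetrically around it, which illustrates why a single run from one starting value is not enough when the fixed point equation has multiple solutions.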
Observe that this estimation approach readily accommodates unobserved payoff elements as nuisance parameters as well. Suppose the period payoff additionally contains an additive payoff shock $v_{ij}$, which is time-invariant and market-specific. The equilibrium equation system (2) becomes $p = \Psi(p, v; \theta)$ and the function $g$ is redefined accordingly. With these modifications in place, the structural parameters can be partially identified in the presence of unobserved heterogeneity as before. This is a notable difference from the approach of Arcidiacono and Miller (2011), which requires the researcher to use a multinomial distribution for unobserved heterogeneity.
Luo et al. (2021) investigate nonparametric identification of the model in Section 2.1 while allowing for both multiple equilibria and unobservable market heterogeneity, which may vary with time. Let $\eta_t$ be a state variable with finite support which is publicly observable to players but unobservable to econometricians, and let $e^*$ be the index for the equilibrium conditional choice probabilities, with finite support (i.e., $\{p : p = \Psi(p \mid \pi, F, \beta, G)\} = \{p^{e^*} : e^* = 1, \dots, |e^*|\}$).

Point identification
Then define the unobserved 'type' variable $\tau_t = \tau(\eta_t, e^*)$ with finite support $\{\tau_1, \dots, \tau_{|\tau|}\}$ for unknown $|\tau|$. In the first stage, Luo et al. (2021) identify the conditional choice probabilities of $a_t \mid s_t, \tau_t$ and the transition probabilities of $s_{t+1} \mid a_t, s_t, \tau_t$ by applying the eigenvalue-eigenvector decomposition technique for identifying mixture models (e.g., Hu and Shum, 2012) based on four periods of data. In the second stage, they employ the conventional identification argument as in Pesendorfer and Schmidt-Dengler (2008) to identify the payoff primitives based on the identified conditional choice and transition probabilities. Furthermore, by comparing the identified payoff primitives at different values of $\tau_t$, one can also distinguish unobservable heterogeneity from multiple equilibria. If the payoff functions are the same at different values of $\tau_t$, these values represent multiple equilibria; if the payoff functions differ, they represent unobserved heterogeneity.
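The comparison of payoff primitives across types amounts to a grouping rule, which the following sketch illustrates. It is a stylized example, not the procedure of Luo et al. (2021): the payoff values are made up, the function name `group_types_by_payoff` is hypothetical, and equality is checked with a fixed numerical tolerance, whereas with estimated payoffs a formal statistical test would be needed.

```python
import numpy as np


def group_types_by_payoff(payoffs, tol=1e-6):
    """Group type indices whose identified payoff vectors coincide up to tol.

    payoffs: (n_types, n_params) array; row k holds the payoff primitives
             identified at type tau_k.
    Returns a list of groups, each a list of type indices.
    """
    groups = []
    for k, row in enumerate(payoffs):
        for grp in groups:
            if np.allclose(row, payoffs[grp[0]], atol=tol, rtol=0.0):
                grp.append(k)          # same payoffs as an existing group
                break
        else:
            groups.append([k])         # new payoff vector: start a new group
    return groups


# Hypothetical identified payoffs for four types.
payoffs = np.array([
    [1.0, -0.5],
    [1.0, -0.5],   # same payoffs as type 0
    [1.4, -0.5],   # different payoffs
    [1.4, -0.5],   # same payoffs as type 2
])
groups = group_types_by_payoff(payoffs)
```

Under the interpretation in the text, types inside one group differ only in the equilibrium being played, while distinct groups correspond to distinct values of the unobserved heterogeneity.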
Let $d_t = (a_t, s_t)$ for $t = 1, 2, 3, 4$ be the observables for four periods. The key step in the identification analysis by Luo et al. (2021) is the first stage, where the conditional probabilities for $(d_3, \tau_3) \mid (d_2, \tau_2)$ and $d_4 \mid (d_3, \tau_3)$ are identified, so that the conditional choice probabilities for $a_3 \mid (s_3, \tau_3)$ and the transition probabilities for $s_4 \mid a_3, s_3, \tau_3$ are implied. First, since $(d_t, \tau_t)$ follows a first-order Markov process, the joint distribution of three periods of observables can be written in a multiplicatively separable way (given $(d_2, \tau_2)$) as $\Pr\{d_3, d_2, d_1\} = \sum_{\tau_2} \Pr\{d_3 \mid d_2, \tau_2\} \Pr\{d_2, \tau_2, d_1\}$, i.e., correlation between $d_3$ and $d_1$ given $d_2$ is associated with unobserved heterogeneity and/or multiple equilibria. Indeed, Luo et al. (2021) show that, under certain rank conditions, the number of types $|\tau|$ is identified as the maximum rank of the probability matrix $\Pr\{d_3 = \cdot, d_2, d_1 = \cdot\}$ over $d_2$. Second, by assuming 'limited feedback' from the current unobserved state variable $\eta_t$ onto the next-period observable state $s_{t+1}$ in the sense that $\Pr\{s_{t+1}, \eta_{t+1} \mid s_t, \eta_t\} = \Pr\{s_{t+1} \mid \eta_{t+1}, s_t\} \Pr\{\eta_{t+1} \mid s_t, \eta_t\}$, the joint distribution of four periods of observables admits the

Conclusion
There are several interesting directions for future research. First, while Maskin and Tirole (2001) conceptualize Markovian strategies as a way to capture the simplest form of behavior that is consistent with rationality, which leads to the notion of payoff-relevant state variables, applied work has taken a richer view of how strategies depend on state variables.
Collard-Wexler (2013) allows strategies to depend on plant size and past plant size for each firm, plus 50 market-level demand levels, which leads to a state space consisting of 1.4 million elements. Second, if the required state space is indeed as large as applied work suggests, then applying machine learning tools to conduct inference on dynamic games is a promising direction. For example, Semenova (2018) extends the two-step set inference approach in Bajari et al. (2007) to settings where the state space (and thus the dimension of $p$) is high-dimensional, by applying the Neyman orthogonalized moment function and cross-fitting to deal with the bias in the first-stage estimation of $p$ using machine learning methods (e.g., Chernozhukov et al., 2018). This methodology may be extended to other estimation strategies and may accommodate unobserved heterogeneity and multiple equilibria.
Third, it is also interesting to bridge the gap to the subgame perfect dynamic games framework. Miller et al. (2021) estimate a novel dynamic price-leadership game, in which one firm proposes super-markups (over Bertrand prices) to rivals in order to maximize its own discounted payoff subject to the participation constraints of rivals, where deviations are punished by reversion to permanent static Bertrand pricing. The game, which can be solved using the techniques of Abreu et al. (1990), is shown to provide a good fit for the observed pricing path. A future research agenda is to build on the recursive methods introduced by Abreu et al. (1990) and to develop an estimable class of subgame perfect equilibria that relaxes the Markovian strategy framework.
The fixed point mapping (2) can have multiple solutions. Pesendorfer and Schmidt-Dengler (2008) and Doraszelski and Satterthwaite (2010) provide examples of dynamic Markovian games with multiple equilibria.
Jofre-Bonet and Pesendorfer (2003) consider a continuous action framework in which choice probabilities are specified by a Weibull distribution function of actions, while Aguirregabiria and Mira (2007) and Pesendorfer and Schmidt-Dengler (2008) consider a discrete action framework using the empirical frequency estimator. Bajari et al. (2007) allow for both continuous and discrete action variables.
Some authors have considered iterating the fixed point equation (2). Unfortunately, this can lead to poor statistical properties due to the multiplicity of solutions of the fixed point equation inherent to games. Consider the iterated (nested) PML estimator proposed in Aguirregabiria and Mira (2007). It entails solving the above MD problem iteratively by replacing $p^0$ in $\Psi$ in the $\ell$-th iteration with $p^{\ell}$ obtained from $p^{\ell} = \Psi(\theta^{\ell}, p^{\ell-1})$. Pesendorfer and Schmidt-Dengler (2010) show analytically, in a dynamic game framework with a unique symmetric equilibrium, that this type of iteration can lead to inconsistent (Lyapunov stable) estimates almost surely. For a fixed number of iterations, Bugni and Bunting (2020) study asymptotic extend two-step games estimators to allow for a continuous state space. Doraszelski and Judd (2012) and Arcidiacono et al. (2016) consider random (Poisson) arrival of decision nodes, which reduces the computational complexity of the ex ante value function calculation. Egesdal et al. (2015) propose a constrained maximum likelihood estimator. Miessi Sanches et al. (2016) propose conditions under which the minimization problem in (3) simplifies to standard OLS/GLS. Abbring and Campbell (2010) study last-in first-out dynamics, which typically imply a unique MPE. Abbring et al. (2018) propose a simple dynamic model of entry/exit which has an essentially unique symmetric MPE.
t j = •, s t j = •} are identical for all $j, t$, by employing a Markov chain Monte Carlo (MCMC) algorithm. Under the setup in Section 2.1 and $H_0^{hom}$, consider the $d = 2 \cdot K \cdot |S| \cdot N$-dimensional moment function for the observable $d$ and the unobservable $p$:

The expectation $E_{\lambda \times \mu}[\cdot]$ is infeasible to compute because the true distribution $\lambda$ of the equilibrium choice probabilities is unknown and the econometrician does not want to specify its distributional form. As described in Section 2, the fixed point problem in (2) can have multiple solutions, and the set inference approach in Otsu et al. (2021) explicitly allows for such a scenario. In particular, based on the duality approach in Schennach (2014), they treat the market-specific choice probabilities $p$ as latent variables, with their conditional probability measure $\lambda$ treated as an infinite-dimensional nuisance parameter, and show by entropic latent variable integration that $\theta \in \Theta_0$ if and only if
$$\inf_{\gamma \in \mathbb{R}^d} |E_{\mu}[g(d, \theta, \gamma)]| = 0, \tag{10}$$
where
$$g(d, \theta, \gamma) = \frac{\int g(d, p; \theta) \exp(\gamma' g(d, p; \theta))\, dF(p)}{\int \exp(\gamma' g(d, p; \theta))\, dF(p)},$$
with a user-specified probability measure $F$. In practice, $F$ may be set as the uniform distribution on the strategy set $Q$. A similar result can be obtained for the case where $p$ is parametrized by $\tau \in Q$ (say, $p(\tau)$), as long as $Q$ is compact and $\Psi(p(\cdot); \theta)$ is continuous at each $\theta \in \Theta$. The dual problem (10) is equivalent to the primal problem (9). The moment condition in the dual involves the function $g(d, \theta, \gamma)$, which is an integral of the original moment function $g(d, p; \theta)$ under an exponential distribution of the unobservables.
Arcidiacono et al. (2016) allow strategies to be a function of the number of stores operated by each firm, population, and five time-invariant unobserved variables equal to discretized values of a standard normal. A rich strategy space potentially allows for a large set of equilibria, which should be balanced against the original aim of Markovian strategies. An open research question concerns the minimal size of the state space required to capture rational behavior consistent with the data.