Identification in simple binary outcome panel data models

Summary: This paper first reviews some of the approaches that have been taken to estimate the common parameters of binary outcome models with fixed effects. We limit attention to situations in which the researcher has access to a data set with a large number of units (individuals or companies, for example) observed over a number of time periods. We then apply some of the existing approaches to study fixed-effects panel data versions of entry games, like the ones studied in Bresnahan and Reiss (1991) and Tamer (2003).


GENERAL SETUP
It is natural to model decisions made by individuals in terms of the information available to them when the choice is made. This motivates a general panel data setup in which the distribution of a dependent variable (or vector) in time period $t$, $y_{it}$, can be modelled as a function of its own past values, $y_i^{t-1} = \{y_{is}\}_{s<t}$, a vector of explanatory variables up to time $t$, $x_i^t = \{x_{is}\}_{s \le t}$, and an unobserved individual-specific characteristic, $\alpha_i$. Consequently,
$$y_{it} \mid x_i^t, y_i^{t-1}, \alpha_i \sim f(\cdot \mid x_i^t, y_i^{t-1}, \alpha_i; \theta), \tag{1.1}$$
where $f$ is the distribution of $y_{it}$ conditional on $x_i^t$, $y_i^{t-1}$, and $\alpha_i$, and $\theta$ is the vector of parameters. Throughout this paper, we treat $\alpha_i$ as a 'fixed effect' in the sense that its distribution is allowed to depend on the explanatory variables in an arbitrary way.
In (1.1) the explanatory variable is allowed to be predetermined, so that future realisations of $x$ may depend on the realisation of $y$ in the current period. This is attractive from an economic point of view when $y$ is the outcome of a choice as indicated above. An individual makes a decision, $y_{it}$, based on her information at that point. The information set contains the covariates that she has observed until now, $x_i^t$, her past choices, $y_i^{t-1}$, and her time-invariant characteristics, $\alpha_i$ (which are unobserved to the econometrician).
While it is possible to allow for predetermined explanatory variables in models where the fixed effect, $\alpha_i$, enters the model for the outcome variable, $y_{it}$, linearly or multiplicatively, we are not aware of any results that allow for this in panel data discrete response models where the distribution of $\alpha_i$ is left unrestricted. We therefore maintain throughout the stronger modelling setup where explanatory variables are strictly exogenous and dependence on the whole sequence of covariates, $x_i^T$, is considered:
$$y_{it} \mid x_i^T, y_i^{t-1}, \alpha_i \sim f(\cdot \mid x_i^T, y_i^{t-1}, \alpha_i; \theta). \tag{1.2}$$
The additional restrictions embedded in (1.2) rule out that individuals choose $x$ in time period $t$ in response to the outcomes of $y$ in periods prior to $t$. As mentioned above, this will sometimes make it unattractive in economic applications. However, (1.2) makes it possible to make probability statements on the whole sequence (over time) of $y_{it}$ conditional on the whole sequence of the explanatory variables.
© The Author(s) 2021. Published by Oxford University Press on behalf of Royal Economic Society. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
It is important to recognise that knowing θ in (1.1) and (1.2) is typically not sufficient for calculating counterfactual distributions or marginal effects. Those will depend on the distribution of α i as well as on θ and they are typically not point identified even if θ is. For a discussion of this, see, for example, Chernozhukov et al. (2013). On the other hand, it seems that point or set identifying and estimating θ is a natural first step if one is interested in bounding, say, average marginal effects.
In Section 2 of this paper, we first review some approaches for estimating univariate binary outcome versions of (1.1). The traditional approach is to find a sufficient statistic for the fixed effects and then proceed by conditional maximum likelihood (conditioning on the sufficient statistic). This approach dates back to Rasch (1960, 1961), and a recent example is Aguirregabiria et al. (2020). When it is not possible to find a sufficient statistic for the fixed effects, it is sometimes possible to construct moment equality conditions which must be satisfied at the true parameter value. See Johnson (2004) and Honoré and Weidner (2020) for an early and a recent example, respectively. Significant progress has also been made by employing moment inequality conditions. See, for example, Manski (1987) and, more recently, Pakes and Porter (2016) or Pakes et al. (2021). Section 3 discusses bivariate binary outcome models. We first describe some recent advances for reduced form models, and we then analyse a simple panel data version of an entry game. Section 4 concludes.

THE INCIDENTAL PARAMETERS PROBLEM
It is well understood that estimating the individual-specific effects, {α i }, along with the common parameter, θ , typically (though not always) leads to inconsistent estimation of θ in a panel where the number of time periods is fixed and to asymptotic bias in 'large' panels, where both the number of time periods and the number of micro-units increase. This is known as the incidental parameters problem (see Neyman and Scott, 1948).
There are many papers that attempt to eliminate the asymptotic bias in 'large' panels. These include Hahn and Newey (2004), Arellano and Bonhomme (2009), Dhaene and Jochmans (2015), and Weidner (2016). These papers consider procedures that are justified asymptotically as the number of time periods grows with the number of individuals. See, for example, Fernández-Val and Weidner (2018) for a review of this literature. A different set of papers tries to construct methods that work when the panel contains observations for a large number of micro-units observed in a few time periods. This is the situation that we consider in this paper. Specifically, in this section, we briefly review three alternative approaches for dealing with individual-specific parameters in standard binary response models that have been explored in the literature: conditional likelihood, construction of moment conditions, and moment inequalities. This list is by no means exhaustive. For example, a number of papers have explored the usefulness of restricting the relationship between individual-specific effects and one of the explanatory variables, including papers like Chen et al. (2016). A different set of papers places restrictions on the distribution of the fixed effects. For example, Bonhomme and Manresa (2015) assume that its marginal distribution is discrete with a finite number of points of support.

Conditional likelihood
The traditional approach for obtaining consistent estimators of the common parameters in a parametric model with incidental parameters is to condition on a set of sufficient statistics for the individual-specific parameters. This was proposed by Rasch (1960, 1961) and studied in detail by Andersen (1970). Suppose that the distribution of $y_i^T$ conditional on $x_i^T$ and the individual-specific effects has been specified as a function of the common parameter, $\theta$. The idea behind conditional likelihood is that if there exists a (possibly vector-valued) function of the data for individual $i$, $S_i$, such that (a) the distribution of $y_i$ conditional on $(S_i, x_i, \alpha_i)$ does not depend on $\alpha_i$ (i.e., $S_i$ is a sufficient statistic for $\alpha_i$), and (b) the distribution of $y_i$ conditional on $(S_i, x_i, \alpha_i)$ depends on $\theta$, then one can estimate $\theta$ by maximum likelihood using the conditional distribution of the data given $(S_i, x_i)$. Andersen (1970) shows that the conditional maximum likelihood estimator is consistent and asymptotically normal under mild regularity conditions.
The main limitation of the conditional likelihood approach is that, in binary response settings, it is typically not possible to find a statistic, S i , with the properties described above. The main exceptions include a number of logit models, some of which are discussed below.

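2.1.1. Simple examples: logit models. Rasch (1960, 1961) considered a static panel data version of the standard logit model,
$$P(y_{it} = 1 \mid x_i^T, y_i^{t-1}, \alpha_i) = \Lambda(x_{it}\beta + \alpha_i), \tag{2.1}$$
where $\Lambda(\cdot)$ is the logistic cumulative distribution function.
In this case, the distribution of $(y_{i1}, \ldots, y_{iT})$ conditional on $(x_{i1}, \ldots, x_{iT})$ and on $S_i = \sum_{t=1}^{T} y_{it}$ does not depend on $\alpha_i$. If $T \ge 2$ it nonetheless does depend on $\beta$:
$$P\Big(y_{i1} = c_{i1}, \ldots, y_{iT} = c_{iT} \,\Big|\, x_i^T, \sum_{t=1}^{T} y_{it} = \sum_{t=1}^{T} c_{it}\Big) = \frac{\exp\big(\sum_{t=1}^{T} c_{it} x_{it}\beta\big)}{\sum_{d \in B_i} \exp\big(\sum_{t=1}^{T} d_t x_{it}\beta\big)},$$
where $c_{it} \in \{0, 1\}$ and $B_i = \{(d_1, \ldots, d_T) : d_t \in \{0, 1\}, \sum_{t=1}^{T} d_t = \sum_{t=1}^{T} c_{it}\}$. As a result, the conditional likelihood can be used to identify and estimate $\beta$ in the static panel data logit model. Unfortunately, this does not generalise to other simple models such as the probit model. Indeed, Chamberlain (2010) showed that in a model of the form $P(y_{it} = 1 \mid x_i^T, y_i^{t-1}, \alpha_i) = F(x_{it}\beta + \alpha_i)$, regular root-n estimation of $\beta$ without additional assumptions is only possible if $F$ is the logistic cumulative distribution function.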
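The cancellation of the fixed effect in the static logit model can be checked numerically. The sketch below assumes a scalar regressor and the static logit probability $\Lambda(x_{it}\beta + \alpha_i)$; the function names and the numerical values of `x`, `y`, and `beta` are hypothetical choices made only for illustration. It compares a brute-force conditional probability, computed for several values of $\alpha_i$, with the closed form in which $\alpha_i$ no longer appears.

```python
import itertools
import math


def logistic(z):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def sequence_prob(y, x, beta, alpha):
    """P(y_1, ..., y_T | x, alpha) when periods are independent logits."""
    p = 1.0
    for y_t, x_t in zip(y, x):
        p_t = logistic(x_t * beta + alpha)
        p *= p_t if y_t == 1 else 1.0 - p_t
    return p


def conditional_prob_bruteforce(y, x, beta, alpha):
    """P(y | sum_t y_t, x, alpha), enumerating all sequences with the same sum."""
    total = sum(
        sequence_prob(d, x, beta, alpha)
        for d in itertools.product([0, 1], repeat=len(y))
        if sum(d) == sum(y)
    )
    return sequence_prob(y, x, beta, alpha) / total


def conditional_prob_closed_form(y, x, beta):
    """Same conditional probability written without alpha."""
    def weight(d):
        return math.exp(sum(d_t * x_t * beta for d_t, x_t in zip(d, x)))

    total = sum(
        weight(d)
        for d in itertools.product([0, 1], repeat=len(y))
        if sum(d) == sum(y)
    )
    return weight(y) / total


# Hypothetical illustration values (not taken from the paper).
x = [0.3, -1.2, 0.8]
y = (1, 0, 0)
beta = 0.7
for alpha in (-2.0, 0.0, 1.5):
    print(alpha,
          conditional_prob_bruteforce(y, x, beta, alpha),
          conditional_prob_closed_form(y, x, beta))
```

The two columns agree for every value of `alpha`, which is the sense in which $\sum_t y_{it}$ is sufficient for $\alpha_i$ while the conditional probability still varies with `beta`.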

The conditional likelihood approach can also be used to estimate some simple panel data autoregressive logit models. See, for example, Chamberlain (1985) and Magnac (2000). Consider the simple model
$$P(y_{it} = 1 \mid y_i^{t-1}, \alpha_i) = \frac{\exp(\alpha_i + \gamma y_{it-1})}{1 + \exp(\alpha_i + \gamma y_{it-1})}, \quad \text{for } t = 2, \ldots, T. \tag{2.2}$$
Since (2.2) models an outcome in terms of its past value, we only insist that it applies starting in the second time period. The first observation, $y_{i1}$, is usually referred to as the initial condition.
Conditional on $S_i = (y_{i1}, \sum_{t=1}^{T} y_{it}, y_{iT})$, the distribution of $(y_{i1}, \ldots, y_{iT})$ does not depend on $\alpha_i$. However, for $T \ge 4$, it does depend on $\gamma$ when $y_{i1} \neq y_{i4}$. The corresponding conditional likelihood can therefore be used to identify and estimate $\gamma$. This approach has been extended to an AR(2) model:
$$P(y_{it} = 1 \mid y_i^{t-1}, \alpha_i, \gamma_{i1}) = \frac{\exp(\alpha_i + \gamma_{i1} y_{it-1} + \gamma_2 y_{it-2})}{1 + \exp(\alpha_i + \gamma_{i1} y_{it-1} + \gamma_2 y_{it-2})}, \quad \text{for } t = 3, \ldots, T. \tag{2.3}$$
In this model there are two fixed effects, $\gamma_{i1}$ and $\alpha_i$, and when the coefficient on $y_{it-2}$ is 0, the model corresponds to a Markov switching model with individual-specific transition probabilities.
Here, the initial conditions are $y_{i1}$ and $y_{i2}$. The sufficient statistic in this case includes $\sum_t y_{it} y_{it-1}$, and the corresponding conditional likelihood can be used to estimate $\gamma_2$.¹ Magnac (2000) showed that this generalises to AR($p$) panel data logit models. In a model like (2.3), with $p$ lags rather than 2, it is possible to find a vector of sufficient statistics, $S_i$, such that the distribution of $(y_{i1}, \ldots, y_{iT})$ conditional on $S_i$ does not depend on $(\alpha_i, \gamma_{i1}, \ldots, \gamma_{ip-1})$, but for $T$ sufficiently large, it does depend on $\gamma_p$. The corresponding conditional likelihood can therefore be used to identify and estimate $\gamma_p$ with no assumptions made on $(\alpha_i, \gamma_{i1}, \ldots, \gamma_{ip-1})$.
While the model in (2.3) illustrates the usefulness of the conditional likelihood approach, it also illustrates its limitation. Suppose that one is willing to assume that $\gamma_{i1}$ is homogeneous (i.e., $\gamma_{i1} = \gamma_1$ for all $i$) so
$$P(y_{it} = 1 \mid y_i^{t-1}, \alpha_i) = \frac{\exp(\alpha_i + \gamma_1 y_{it-1} + \gamma_2 y_{it-2})}{1 + \exp(\alpha_i + \gamma_1 y_{it-1} + \gamma_2 y_{it-2})}, \quad \text{for } t = 3, \ldots, T. \tag{2.4}$$
In this case, the numerical calculations in Honoré and Kyriazidou (2019b) suggest that $(\gamma_1, \gamma_2)$ is identified for $T \ge 5$. Specifically, Honoré and Kyriazidou (2019b) assume values of $\gamma_1$ and $\gamma_2$ and a distribution for $\alpha_i$ conditional on the initial conditions, $y_{i1}$ and $y_{i2}$. This implies a distribution, $P$, for $(y_{i3}, y_{i4}, y_{i5})$ conditional on $y_{i1}$ and $y_{i2}$. For a fine grid of potential values of $\gamma_1$ and $\gamma_2$, they then ask whether one can find a heterogeneity distribution (conditional on the initial conditions) that produces the probabilities, $P$, using the values on the grid. They find numerically that this is only possible when $\gamma_1$ and $\gamma_2$ take the true values. This suggests that $\gamma_1$ and $\gamma_2$ are both point identified. However, they also note that it seems that conditioning on any statistic that eliminates $\alpha_i$ in a conditional likelihood will also eliminate $\gamma_1$. This suggests that even in a simple model like (2.4), where conditioning that eliminates $\alpha_i$ is possible, there is additional information not captured by the conditional likelihood approach. We turn to this in the next subsection.
¹ The conditional likelihood approach can also be used to estimate models where the coefficient $\gamma_2$ differs depending on the value of $y_{it-1}$. This is, for example, relevant if one does not want to tie the parameters that govern the transition out of employment to the parameters that govern the transition out of nonemployment.
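Returning to the AR(1) logit model in (2.2), the conditioning argument can be illustrated with $T = 4$. The sketch below assumes the transition probability $\exp(\alpha_i + \gamma y_{it-1}) / (1 + \exp(\alpha_i + \gamma y_{it-1}))$ and compares the two sequences $(y_{i1}, 1, 0, y_{i4})$ and $(y_{i1}, 0, 1, y_{i4})$, which share the same value of $(y_{i1}, \sum_t y_{it}, y_{i4})$. The function names and parameter values are hypothetical. The distribution of the initial condition cancels because both sequences start from the same $y_{i1}$.

```python
import math


def logistic(z):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def path_prob(y, gamma, alpha):
    """P(y_2, ..., y_T | y_1, alpha) in the AR(1) logit model."""
    p = 1.0
    for t in range(1, len(y)):
        p_t = logistic(alpha + gamma * y[t - 1])
        p *= p_t if y[t] == 1 else 1.0 - p_t
    return p


def prob_10_given_switch(y1, y4, gamma, alpha):
    """P((y_2, y_3) = (1, 0) | y_1, y_2 + y_3 = 1, y_4, alpha)."""
    a = path_prob((y1, 1, 0, y4), gamma, alpha)
    b = path_prob((y1, 0, 1, y4), gamma, alpha)
    return a / (a + b)


gamma = 0.8  # hypothetical value
for y1, y4 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    values = [prob_10_given_switch(y1, y4, gamma, alpha) for alpha in (-1.0, 0.0, 2.0)]
    closed_form = 1.0 / (1.0 + math.exp(gamma * (y4 - y1)))
    print((y1, y4), values, closed_form)
```

In this sketch the conditional probability is the same for every `alpha` and equals $1 / (1 + \exp(\gamma (y_{i4} - y_{i1})))$, so it is informative about $\gamma$ only when the first and last observations differ.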

Moments
The observation that (γ 1 , γ 2 ) appears to be identified in (2.4) is the inspiration for a recent paper by Honoré and Weidner (2020). The approach in that paper is to try to construct moment conditions that depend on (γ 1 , γ 2 ), but do not depend on the individual-specific effects, α i . To do this, Honoré and Weidner (2020) follow the general approach in Bonhomme (2012). Bonhomme (2012) points out that models for discrete data generally cannot be dealt with using his approach. It is therefore 'trial and error' to see whether it can be applied to models like (2.4).
To find a moment condition for $(\gamma_1, \gamma_2)$ in (2.4) with $T = 5$, one needs to find functions, $m$, of the data and the parameters such that
$$E_{(\gamma_1, \gamma_2)}\big[m(y_{i1}, \ldots, y_{i5}, \gamma_1, \gamma_2) \mid y_{i1} = d_1, y_{i2} = d_2, \alpha_i\big] = 0 \tag{2.5}$$
for all values of $\alpha_i$ and, hence, no matter what the true values of $(\gamma_1, \gamma_2)$ are in the data generating process. The subscript $(\gamma_1, \gamma_2)$ on the expectation is a reminder that the expectation is a function of $\gamma_1$ and $\gamma_2$. Since $(y_{i3}, y_{i4}, y_{i5})$ can take eight values, (2.5) can be written as a sum over eight terms,
$$\sum_{(d_3, d_4, d_5) \in \{0,1\}^3} P_{(\gamma_1, \gamma_2)}(y_{i3} = d_3, y_{i4} = d_4, y_{i5} = d_5 \mid y_{i1} = d_1, y_{i2} = d_2, \alpha_i)\, m(d_1, \ldots, d_5, \gamma_1, \gamma_2) = 0. \tag{2.6}$$
Honoré and Weidner (2020) approach this and related problems by first fixing $d_1$, $d_2$, $\gamma_1$, and $\gamma_2$ at particular values and $\alpha_i$ at $q$ values for some $q$. At that point, the probabilities are numbers and the question becomes whether one can solve the $q$ equations for the eight unknowns (the $m$'s) without making them all zero. If this is not possible, then there is no hope of finding an appropriate moment function, $m$.
After experimenting with various values for γ 1 , γ 2 , and the q values of α i , Honoré and Weidner (2020) conclude numerically that for each combination of the initial conditions one can find a nontrivial moment condition. They obtain these analytically by solving (2.6) for a set of specific values of α i and then verifying that the obtained solution satisfies (2.6) generically. The moment conditions are otherwise, where the subscripts on m denote the initial values of y 1 and y 2 .
The functions m (0,0) and m (1,1) are both strictly monotone in γ 1 if (d 3 , d 4 , d 5 ) = (0, 1, 1) or = (1, 0, 0), respectively, and constant otherwise. It is therefore clear that as long as either 4 In other words, if every combination of the initial conditions (y i1 , y i2 ) has positive probability, (γ 1 , γ 2 ) is overidentified in the sense that there are four moment conditions (one corresponding to each of the initial conditions) and two parameters to be estimated. This partly solves the puzzle in Honoré and Kyriazidou (2019b) discussed above.
The strategy of looking for moment conditions developed in Bonhomme (2012) and explained above can be used for a number of other models. Honoré and Weidner (2020) present explicit expressions for such moment functions for AR($p$) (for $p$ = 1, 2, and 3) panel data logit models with strictly exogenous explanatory variables of the type
$$P(y_{it} = 1 \mid y_i^{t-1}, x_i^T, \alpha_i) = \frac{\exp\big(x_{it}\beta + \sum_{j=1}^{p} \gamma_j y_{it-j} + \alpha_i\big)}{1 + \exp\big(x_{it}\beta + \sum_{j=1}^{p} \gamma_j y_{it-j} + \alpha_i\big)}. \tag{2.7}$$
In this case, the moment functions will be functions of $x_i^T$, so the approach will yield conditional moment conditions which can be turned into unconditional moment conditions for the purpose of estimation. While the conditional moment restrictions in Honoré and Weidner (2020) will not always point-identify the common parameters, $\beta$ and the $\gamma_j$'s, the paper presents conditions under which the conditional moments can be turned into a finite number of unconditional moments which do identify the common parameters. Generalised method of moments estimation will deliver a root-n consistent and asymptotically normal estimator in that case.⁵ When $T = 4$ and $p = 1$, the moment functions in Honoré and Weidner (2020) yield moment conditions which are transformations of moment conditions that had previously been discovered by Kitazawa (2013, 2016). To apply the moment conditions, one needs a total of $T \ge 2 + 2p$ periods of observations. Of these, the first $p$ correspond to the initial conditions, and one therefore only needs to observe the explanatory variables in the last $2 + p$ periods. Based on numerical calculations for various combinations of $T$ and $p$, Honoré and Weidner (2020) conjecture that for each of the $2^p$ combinations of the initial conditions, there are $2^{T-p} - (T + 1 - 2p)2^p$ linearly independent conditional moment conditions. For example, for an AR(2) panel data logit model with ten time periods (two of which would provide the initial conditions), there are 228 conditional moment conditions. Intuitively, this implies that the model contains a lot of information about the parameters. However, the large number of moments also implies that one should be careful about blindly applying generalised method of moments estimation.
³ Formally, this assumes that $P(y_{i3} = 0, y_{i4} = 1, y_{i5} = 1 \mid y_{i1} = 1, y_{i2} = 1) > 0$ and/or $P(y_{i3} = 1, y_{i4} = 0, y_{i5} = 0 \mid y_{i1} = 1, y_{i2} = 0) > 0$. This will be true as long as $\alpha_i$ takes finite values with positive probability.
⁴ This follows because $m_{(0,1)}$ and $m_{(1,0)}$ are both monotone in $\gamma_2$.
⁵ Honoré and Kyriazidou (2000) provide conditions under which a conditional likelihood approach can be used to estimate models like (2.7). In order to achieve root-n consistency, their approach requires that there is positive probability that the explanatory variables are the same in two time periods. The sufficient conditions in Honoré and Weidner (2020) are weaker than that.
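The conjectured count of conditional moment conditions is easy to tabulate. The sketch below simply evaluates the expression $2^{T-p} - (T + 1 - 2p)2^p$ stated above; the function name `n_moment_conditions` is a hypothetical label, and the check on $T \ge 2 + 2p$ reflects the minimum number of periods mentioned in the text.

```python
def n_moment_conditions(T, p):
    """Conjectured number of linearly independent conditional moment
    conditions per combination of initial conditions in an AR(p) panel
    logit with T periods (the first p being initial conditions)."""
    if T < 2 + 2 * p:
        raise ValueError("the moment conditions require T >= 2 + 2p")
    return 2 ** (T - p) - (T + 1 - 2 * p) * 2 ** p


# AR(2) with ten periods, as in the example in the text.
print(n_moment_conditions(10, 2))

# The count grows quickly with T for a fixed lag order.
for T in range(6, 11):
    print(T, n_moment_conditions(T, 2))
```

The rapid growth of the count with $T$ is what motivates the caution about using all available moments in estimation.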

Inequalities
As mentioned above, Chamberlain (2010) shows that in a binary response model of the form $P(y_{it} = 1 \mid x_i^T, y_i^{t-1}, \alpha_i) = F(x_{it}\beta + \alpha_i)$, where $F$ is a known cumulative distribution function, regular root-n estimation of $\beta$ is only possible if $F$ is the logistic cumulative distribution function. This suggests that it is also not possible to construct root-n consistent estimators for dynamic models like (2.7) if one deviates from the logit model. Of course, this does not imply that it is not possible to construct useful consistent estimators or informative bounds for the common parameters in non-logit models. In this subsection, we discuss some of the progress that the literature has made in this direction.
Consider the panel data discrete choice model
$$y_{it} = 1\{x_{it}\beta + \alpha_i + \varepsilon_{it} \ge 0\}, \tag{2.8}$$
where, conditional on $(x_{it}, x_{is}, \alpha_i)$, $\varepsilon_{it}$ and $\varepsilon_{is}$ are identically distributed with unknown distribution function $F_{(x_{it}, x_{is}, \alpha_i)}$. This is a strict exogeneity assumption on the explanatory variables and a stationarity assumption on the errors. When $F$ is the logistic distribution, it is the logit model studied by Rasch (1960, 1961) and shown in (2.1). Manski (1987) observed that if $F_{(x_{it}, x_{is}, \alpha_i)}$ has support equal to the real line, then this implies that
$$\mathrm{sgn}\big(P(y_{it} = 1 \mid x_{it}, x_{is}) - P(y_{is} = 1 \mid x_{it}, x_{is})\big) = \mathrm{sgn}\big((x_{it} - x_{is})\beta\big). \tag{2.9}$$
The key property is that the left-hand side does not depend on $\alpha_i$ and can be identified from the data, while the right-hand side is a constraint on $\beta$. Equation (2.9) allowed Manski (1987) to define a conditional maximum score estimator,
$$\hat\beta = \arg\max_b \sum_{i=1}^{n} (y_{it} - y_{is})\, \mathrm{sgn}\big((x_{it} - x_{is}) b\big).$$
With random sampling and assumptions on the support of the explanatory variables, this estimator is consistent, but its rate of convergence is $n^{-1/3}$.⁶ Honoré and Kyriazidou (2000) use Manski's insight to construct an estimator for a version of (2.8) that also has a lagged $y$ as an explanatory variable. They assume that the errors are independent and identically distributed (and not just stationary as in Manski, 1987) and that the researcher has access to a sample with at least four time periods for each individual.⁷ In order to get point identification, Honoré and Kyriazidou (2000) had to make the strong assumption that the vector $x_{i4} - x_{i3}$ has support in a neighbourhood of 0. Other papers have been able to obtain bounds without the assumption that $x_{i4} - x_{i3}$ has support in a neighbourhood of 0.
⁶ Imposing additional smoothness assumptions, Horowitz (1992) shows that one can improve the rate of convergence by defining a smoothed maximum score estimator as $\hat\beta = \arg\max_b \sum_{i=1}^{n} (y_{it} - y_{is})\, H\big((x_{it} - x_{is}) b / h_n\big)$, where $H$ is a cumulative distribution function which plays the same role as a kernel in nonparametric estimation and $h_n$ is a bandwidth.
⁷ The first observation provides the initial condition. The model is not required to hold in this period.
For example, Aristodemou (2021) also considers a version of (2.8) that has a lagged y as well as strictly exogenous regressors as explanatory variables. Consider an individual for whom y it is observed in three time periods. Aristodemou (2021) observes that if the errors in periods two and three are independent of the explanatory variables conditional on the initial y i1 , then For simplicity, suppose that x it is one-dimensional and that β is normalised to 1 without loss of generality. Then each value, w, in the support of x i2 − x i3 , in (2.10) provides a lower bound on F .ε i2 −ε i1 |y i0 (w − γ ) while (2.11) gives an upper bound on F .ε i2 −ε i1 |y i0 (w). This gives a bound on γ .
More recently, Khan et al. (2020) characterise the identified region for (γ, β) under the weaker assumptions that the errors are stationary conditional on the sequence of explanatory variables and on the individual-specific effect. Like Aristodemou (2021), this paper does not maintain the strong assumption on the explanatory variables needed by Honoré and Kyriazidou (2000). Interestingly, Khan et al. (2020) show that it is sometimes possible to point-identify (γ, β) with as few as three time periods (including the initial condition).
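The conditional maximum score idea of Manski (1987) discussed earlier in this subsection can be sketched for two time periods. The code below is an illustration under assumptions that are not taken from the paper: a hypothetical simulated design with two regressors, normal errors that are independent over time (so the model is not logit), a fixed effect correlated with the regressors, and a coarse grid search over the unit circle because $\beta$ is identified only up to scale.

```python
import math
import random


def sign(v):
    """Sign function with sign(0) = 0."""
    return (v > 0) - (v < 0)


def simulate(n, beta, rng):
    """Two-period panel with y_t = 1{x_t'beta + alpha + eps_t >= 0}.
    Returns a list of (x_2 - x_1, y_2 - y_1) pairs."""
    sample = []
    for _ in range(n):
        x = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2)]
        alpha = x[0][0] + x[1][0] + rng.gauss(0, 1)  # fixed effect correlated with x
        y = [int(beta[0] * xt[0] + beta[1] * xt[1] + alpha + rng.gauss(0, 1) >= 0)
             for xt in x]
        dx = (x[1][0] - x[0][0], x[1][1] - x[0][1])
        sample.append((dx, y[1] - y[0]))
    return sample


def score(b, sample):
    """Sample analogue of E[(y_2 - y_1) sgn((x_2 - x_1) b)]."""
    return sum(dy * sign(b[0] * dx[0] + b[1] * dx[1])
               for dx, dy in sample if dy != 0) / len(sample)


def max_score(sample, n_grid=120):
    """Grid search over directions on the unit circle."""
    grid = [(math.cos(2 * math.pi * k / n_grid), math.sin(2 * math.pi * k / n_grid))
            for k in range(n_grid)]
    return max(grid, key=lambda b: score(b, sample))


rng = random.Random(0)
beta_true = (math.cos(math.pi / 4), math.sin(math.pi / 4))  # hypothetical true direction
sample = simulate(2000, beta_true, rng)
b_hat = max_score(sample)
print("true direction:", beta_true)
print("estimated direction:", b_hat)
```

Only observations with $y_{i1} \neq y_{i2}$ contribute to the objective, and the objective is a step function of `b`, which is why a grid search rather than a gradient-based optimiser is used in this sketch.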

BIVARIATE MODELS
As mentioned in Section 1, it is desirable to allow for predetermined-as opposed to strictly exogenous-explanatory variables in economic panel data settings. While doing this is an unsolved problem in general, it is possible to get results like those discussed above for a variety of models where a dependent variable and an explanatory variable are modelled jointly. We first illustrate this in a reduced form setting where two binary variables are modelled jointly. We next turn to a setting where they are the outcome of a simple game.

Reduced form bivariate models
Following Schmidt and Strauss (1975), who propose a cross-sectional bivariate binary response model, Honoré and Kyriazidou (2019a) consider the bivariate panel data model for two outcomes $(y_{1,it}, y_{2,it})$:
$$P(y_{1,it} = 1 \mid y_{2,it}, y_{1,i}^{t-1}, y_{2,i}^{t-1}, x_{1,i}^T, x_{2,i}^T, \alpha_{1,i}, \alpha_{2,i}) = \Lambda(\alpha_{1,i} + x_{1,it}\beta_1 + \rho y_{2,it}),$$
$$P(y_{2,it} = 1 \mid y_{1,it}, y_{1,i}^{t-1}, y_{2,i}^{t-1}, x_{1,i}^T, x_{2,i}^T, \alpha_{1,i}, \alpha_{2,i}) = \Lambda(\alpha_{2,i} + x_{2,it}\beta_2 + \rho y_{1,it}). \tag{3.1}$$
Honoré and Kyriazidou (2019a) show that in this case $\beta_1$, $\beta_2$, and $\rho$ are identified with $T = 2$. Honoré and Kyriazidou (2019a) also consider a vector autoregressive version of the simultaneous logit model in (3.1):
$$P(y_{1,it} = 1 \mid y_{2,it}, y_{1,i}^{t-1}, y_{2,i}^{t-1}, \alpha_{1,i}, \alpha_{2,i}) = \Lambda(\alpha_{1,i} + \gamma_{11} y_{1,it-1} + \gamma_{12} y_{2,it-1} + \rho y_{2,it}),$$
$$P(y_{2,it} = 1 \mid y_{1,it}, y_{1,i}^{t-1}, y_{2,i}^{t-1}, \alpha_{1,i}, \alpha_{2,i}) = \Lambda(\alpha_{2,i} + \gamma_{21} y_{1,it-1} + \gamma_{22} y_{2,it-1} + \rho y_{1,it}). \tag{3.2}$$
When $\rho = 0$ it corresponds to the probabilities in the model proposed by Narendranathan et al. (1985):
$$y_{1,it} = 1\{\alpha_{1,i} + \gamma_{11} y_{1,it-1} + \gamma_{12} y_{2,it-1} + \varepsilon_{1,it} \ge 0\}, \quad y_{2,it} = 1\{\alpha_{2,i} + \gamma_{21} y_{1,it-1} + \gamma_{22} y_{2,it-1} + \varepsilon_{2,it} \ge 0\},$$
where $\varepsilon_{1,it}$ and $\varepsilon_{2,it}$ are logistic random variables that are independent of each other and independent over time. Narendranathan et al. (1985) show that all parameters in this model are identified with a total of $T = 4$ periods. Honoré and Kyriazidou (2019a) generalise this result by showing that $(\gamma_{11}, \gamma_{12}, \gamma_{21}, \gamma_{22})$ is identified in the model given in (3.2) with at least four time periods.⁸ However, the conditioning argument that leads to the identification eliminates the parameter $\rho$ along with the heterogeneity terms $\alpha_{1,i}$ and $\alpha_{2,i}$. On the positive side, this implies that one can allow the parameter $\rho$ in (3.2) to be individual-specific. On the other hand, $\rho$ may be the parameter of interest in many applications, which makes it problematic that the conditioning argument eliminates it along with $\alpha_{1,i}$ and $\alpha_{2,i}$. The calculations in Honoré and Kyriazidou (2019b) suggest that $\rho$ might be identified, despite the fact that it drops out when one pursues a conditional likelihood approach to eliminate $\alpha_{1,i}$ and $\alpha_{2,i}$. It would be interesting to know whether the results in Honoré and Weidner (2020) can be used to derive moment conditions that identify $\rho$ in the same way that one can identify $\gamma_1$ in (2.4).

Panel data games
The model in equations (3.1) and (3.2) is a natural generalisation of the classic linear simultaneous equations model to a logit framework. However, it is not straightforward to give a behavioural interpretation to the model. This is in contrast to single equation logit or probit models, which can be interpreted in terms of threshold-crossing or utility maximisation. We therefore turn to an alternative panel data version of the bivariate binary response model that is more inspired by economics.
⁸ Honoré and Kyriazidou (2019a) also discuss how one can generalise the identification results in Narendranathan et al. (1985) and Honoré and Kyriazidou (2019b) to achieve identification if one also allows for strictly exogenous explanatory variables. The identification argument mimics that in Honoré and Kyriazidou (2000), and using the empirical counterpart for estimation will lead to estimators that converge at a rate slower than the usual $\sqrt{n}$ if the strictly exogenous variables are continuously distributed.
Consider a game with two players $i = 1, 2$, each of whom takes a binary action, $y \in \{0, 1\}$, at instance $t$ according to the best-response function:
$$y_{1t} = 1\{x_{1t}\beta + \alpha_1 - \gamma y_{2t} + \varepsilon_{1t} \ge 0\}, \quad y_{2t} = 1\{x_{2t}\beta + \alpha_2 - \gamma y_{1t} + \varepsilon_{2t} \ge 0\}, \tag{3.3}$$
where $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are error terms. Except for the $\alpha$ terms, this is the canonical model considered in Tamer (2003). If players are firms contemplating their presence in a particular market, it is natural to assume that $\gamma > 0$. One can envision observing their entry decisions across different periods for the same market or over distinct geographic markets. Our aim is to study identification and estimation of $\beta$ and $\gamma$ in panel data versions of this model with the $\alpha$'s being company market-specific effects. The econometric model above can also be seen as a dyadic network formation model defining directed connections between $(i, j)$ pairs of individuals, households, companies, or countries. Here $t$ indexes node pairs and $y_{it}$ indicates whether person $i$ sends a link to person $j$. The individual effect $\alpha_i$ would in turn encode the 'gregariousness' of individual $i$. Charbonneau (2017), for example, considers such a model for directed networks with $\gamma = 0$ and an additional individual effect for the 'target' node $j$, which can be interpreted as this node's 'attractiveness' (see also Graham, 2017). A specification with $\gamma < 0$ would in turn allow for $i$ to have a tendency to reciprocate the link decision of its counterpart $j$ (see de Paula 2020, n5). For expositional ease, we nevertheless assume that $\gamma \ge 0$ for the remainder of this section and refer to players as companies and the game as a market.
It is well understood that the model in (3.3) is incomplete in the sense that there is no unique mapping from (x 1t , x 2t , α 1 , α 2 , ε 1t , ε 2t ) to (y 1t , y 2t ). For example, when γ > 0, certain realisations for (x 1t , x 2t , α 1 , α 2 , ε 1t , ε 2t ) are consistent with (y 1t , y 2t ) = (1, 0) or (y 1t , y 2t ) = (0, 1) as depicted in Figure 1, and this leads to difficulties in the conventional panel data manipulations discussed so far. It is easiest to explain our ideas in a setting where the errors are stationary, independent of (x 1t , x 2t , α 1 , α 2 ), and independent over time, but the derivations below suggest that these assumptions can be relaxed considerably. Clearly, it would be interesting to allow for dynamics (i.e., lagged dependent variables) in the model. To illustrate the main idea, we will nonetheless abstract from dynamics, but combining the insights from the literature discussed previously (as well as the possibility of forward looking behaviour discussed later) would be an important angle on which to expand the ideas delineated below.
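The incompleteness of the entry game can be made concrete by enumerating pure-strategy equilibria. The sketch below assumes best responses of the form $y_i = 1\{v_i - \gamma y_j \ge 0\}$, where $v_i$ stands for $x_{it}\beta + \alpha_i + \varepsilon_{it}$; this sign convention is an assumption chosen to be consistent with $\gamma > 0$ for competing firms. The function name and the numerical values are hypothetical.

```python
import itertools


def pure_nash_equilibria(v1, v2, gamma):
    """List the action pairs (y1, y2) that are mutual best responses.

    v_i is player i's payoff index from entering alone; entering against
    a rival reduces it by gamma, and staying out is normalised to 0.
    """
    equilibria = []
    for y1, y2 in itertools.product([0, 1], repeat=2):
        best_response_1 = int(v1 - gamma * y2 >= 0)
        best_response_2 = int(v2 - gamma * y1 >= 0)
        if (y1, y2) == (best_response_1, best_response_2):
            equilibria.append((y1, y2))
    return equilibria


gamma = 1.0  # hypothetical competition effect
print(pure_nash_equilibria(2.0, 2.0, gamma))    # both enter
print(pure_nash_equilibria(-1.0, -1.0, gamma))  # neither enters
print(pure_nash_equilibria(0.5, 0.5, gamma))    # either firm alone
```

When both indices lie between 0 and `gamma`, each firm is profitable as a monopolist but not as a duopolist, so both $(0, 1)$ and $(1, 0)$ are equilibria and the model does not pin down which one is observed.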

Identification of β
In this subsection, we discuss the potential for identifying the β in (3.3) when the distribution of (ε 1 , ε 2 ) is left unspecified.
Letting $N_t$ be the number of entrants in a market in period $t$, conventional calculations (see de Paula, 2013) deliver:
$$P(N_t = 2 \mid x_{1t}, x_{2t}, \alpha_1, \alpha_2) = F(x_{1t}\beta + \alpha_1 - \gamma,\; x_{2t}\beta + \alpha_2 - \gamma),$$
where $F$ is the cumulative distribution function of $(-\varepsilon_1, -\varepsilon_2)$. Note that the probability above is monotone in $(x_{1t}\beta, x_{2t}\beta)$. Suppose there are two time periods or instances, and that $x$ is market-specific, so $x_{1t} = x_{2t} = x_t$. Then
$$P(N_1 = 2 \mid \{x_s\}_{s=1}^{2}, \alpha_1, \alpha_2) \ge P(N_2 = 2 \mid \{x_s\}_{s=1}^{2}, \alpha_1, \alpha_2)$$
if (and only if) $x_1\beta \ge x_2\beta$. Now condition on the event that $N_t$ equals 2 in exactly one of the two periods. A maximum score argument like that in Manski (1987) (see above) applied to the event $N_1 = 2$ with $x_1 - x_2$ as the explanatory variables can then be used to identify and estimate $\beta$ (up to scale).¹⁰ More specifically, conditional on $\{x_s\}_{s=1}^{2}$, $\alpha_1$, $\alpha_2$ and $1(N_2 = 2) \neq 1(N_1 = 2)$, the variable $1(N_2 = 2) - 1(N_1 = 2)$ is a Bernoulli random variable with the median given by
$$\mathrm{sgn}\big(P(N_2 = 2 \mid \{x_s\}_{s=1}^{2}, \alpha_1, \alpha_2) - P(N_1 = 2 \mid \{x_s\}_{s=1}^{2}, \alpha_1, \alpha_2)\big) = \mathrm{sgn}\big((x_2 - x_1)\beta\big).$$
The last equality follows since $P(N_1 = 2 \mid \{x_s\}_{s=1}^{2}, \alpha_1, \alpha_2) \ge P(N_2 = 2 \mid \{x_s\}_{s=1}^{2}, \alpha_1, \alpha_2)$ if (and only if) $x_1\beta \ge x_2\beta$. Then, under the assumptions delineated in Manski (1987), one obtains that $\beta = \arg\max_b E[\mathrm{sgn}((x_2 - x_1)b)(1(N_2 = 2) - 1(N_1 = 2))]$ as established in that paper and discussed previously. Note also that this will work even if $\gamma$ is market- and/or player-specific. On the other hand, it is crucial for the argument that $\beta$ is the same for the two players. Using a similar argument, we can also recover $\beta$ by conditioning on the event $N_1 = 0$ or $N_2 = 0$, but not both.
¹⁰ Since $F$ is not specified, there is no scope for identifying the scale of $(\beta, \gamma)$.
When the $x$'s are not market-specific, we can use the same argument by conditioning on $x_{21} = x_{22} = x_2$ (i.e., player 2 has the same $x$ in two periods; needless to say, this assumes that $x_{21} - x_{22}$ has support in a neighbourhood around 0). In that case,
$$P(N_1 = 2 \mid \{x_{1s}\}_{s=1}^{2}, x_2, \alpha_1, \alpha_2) \ge P(N_2 = 2 \mid \{x_{1s}\}_{s=1}^{2}, x_2, \alpha_1, \alpha_2)$$
if (and only if) $x_{11}\beta \ge x_{12}\beta$. As a result, we can identify and estimate $\beta$ by conditioning on markets where $N_1 = 2$ or $N_2 = 2$, but not both.

Bounds on γ
Even if the distribution of (ε 1t , ε 2t ) in Tamer (2003) is known, the model's incompleteness does not allow us to represent the probability distribution of (y 1t , y 2t ) conditional on (x 1t , x 2t , α 1 , α 2 ) as a function of (β, γ ) (see Figure 1). However, the model does provide bounds on the probabilities for each outcome as a function of (β, γ ), conditional on (x 1t , x 2t , α 1 , α 2 ), and, analogously, bounds on the probabilities for each outcome as a function of (β, γ ) conditional on (x 1t , x 2t )-see, for example, Tamer (2003). For a given period t we can establish that whereas P (y 1t , y 2t ) = (0, 1)| {x 1s , x 2s } 2 s=1 , α 1 , α 2 ≤ 1 − P (y 1t , y 2t ) = (1, 1)| {x 1s , x 2s } 2 s=1 , α 1 , α 2 −P (y 1t , y 2t ) = (0, 0)| {x 1s , x 2s } 2 s=1 , α 1 , α 2 (3.5) and P (y 1t , y 2t ) = (0, 1)| {x 1s , x 2s } 2 s=1 , α 1 , α 2 ≥ 1 − P (y 1t , y 2t ) = (1, 1)| {x 1s , x 2s } 2 s=1 , α 1 , α 2 −P (y 1t , y 2t ) = (0, 0)| {x 1s , x 2s } 2 s=1 , α 1 , α 2 −P (ε 1 , and similarly for (y 1t , y 2t ) = (1, 0). Since (ε 1t , ε 2t ) is independent across time, one can thus obtain probability bounds on ((y 11 , y 21 ), (y 12 , y 22 )), conditional on ({x 1s , x 2s } 2 s=1 , α 1 , α 2 ), by taking products of the above bounds. Finally, to obtain probability bounds conditional on {x 1s , x 2s } 2 s=1 one can integrate the above equalities and inequalities against the distribution H (α 1 , α 2 | {x 1s , x 2s } 2 s=1 ) for α 1 and α 2 . An identified set for the unknown parameters 11 is then the set of parameters that is consistent with the above bounds (for some admissible distribution H (α 1 , α 2 | {x 1s , x 2s } 2 s=1 )). 
For example, P (y 11 , y 21 ) = (1, 1) and (y 12 , y 22 ) = (1, 1)| {x 1s , x 2s } 2 s=1 (3.7) Since the support of ((y 11 , y 21 ), (y 12 , y 22 )) has sixteen points, there are thus two probability equalities-for the events ((y 11 , y 21 ), (y 12 , y 22 )) = ((1, 1), (1, 1)), and ((y 11 , y 21 ), (y 12 , y 22 )) = ((0, 0), (0, 0))-and twenty-eight inequalities, two for each of the remaining fourteen events (given covariates). To operationalise this, we would need to compute those restrictions across all possible distributions, H , for the individual-specific ('fixed' ) effects. If the data-generating process satisfies the assumptions that one needs to apply maximum score above, then one only needs to bound γ using the inequalities in (3.5) and (3.6). If not, then one could combine the restrictions implied by the inequalities in (3.5) and (3.6) with, for example, (3.4) to obtain bounds for β and γ . The same approach can be used if the γ 's are different for units one and two. The approach for bounding the model parameters above can be combined with parametric assumptions on the distribution H . Since the model is not dynamic, this will not lead to potential internal inconsistencies. Alternatively, one can proceed more nonparametrically. For example, Honoré and Tamer (2006) approximate the distribution of the individual-specific effects by a discrete distribution with many points of support. To determine whether a particular parameter value belongs to the identified set, they use linear programming to check whether there exists a distribution of the individual-specific effects such that the probability distribution calculated from the econometric model matches the probability distribution in the data. This approach seems reasonable when the individual-specific effect is one-dimensional and one does not need to condition on additional covariates. However, when that is not the case, the necessary number of support points is likely to be unreasonably large. 
Theorem 2.1 in Winkler (1988), alternatively, implies that to match m probabilities (adding to 1), there is no loss of generality in considering discrete distributions with m + 1 points of support. This or similar results have been used in statistics and econometrics (see, e.g., Lindsay, 1995; Chernozhukov et al., 2013; d'Haultfoeuille and Rathelot, 2017). Winkler's (1988) result suggests a hybrid algorithm where one searches over the location of the points of support using nonlinear methods and then solves for the implied probabilities using linear programming. The result, that there is no loss of generality in considering a discrete distribution for the unobservable, is similar to a result in Honoré and Lleras-Muney (2006), except that in that instance the structure of the problem determined the location of the support points. Here, searching over those locations will be part of the computational challenge.
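The structure of such a hybrid algorithm can be illustrated with a deliberately small Python sketch. It is restricted to two support points, far fewer than Winkler's result would call for, so that the inner linear programme has a single unknown weight and can be solved exactly by intersecting intervals. The outer search is a plain grid search rather than a nonlinear optimiser, and the parameters $(\beta, \gamma)$ are held fixed inside the hypothetical callable `bounds_at`. With more support points a general linear programming solver would be needed. All names here are assumptions made for illustration.

```python
from itertools import combinations


def weight_interval(observed, b1, b2, tol=1e-9):
    """Weights w on support point 1 (1 - w on point 2) that fit the data.

    b1[s] and b2[s] are (lower, upper) bounds for event s at each support point.
    Returns (w_min, w_max), or None if no weight in [0, 1] is feasible.
    """
    w_min, w_max = 0.0, 1.0
    for s, p in observed.items():
        # sign = +1: mixed lower bound  const + slope*w <= p
        # sign = -1: mixed upper bound  const + slope*w >= p
        for const, slope, sign in (
            (b2[s][0], b1[s][0] - b2[s][0], 1.0),
            (b2[s][1], b1[s][1] - b2[s][1], -1.0),
        ):
            # Both cases are rewritten as a*w <= c, relaxed by tol.
            a, c = sign * slope, sign * (p - const) + tol
            if abs(a) < 1e-15:
                if c < 0:
                    return None
            elif a > 0:
                w_max = min(w_max, c / a)
            else:
                w_min = max(w_min, c / a)
    return (w_min, w_max) if w_min <= w_max else None


def hybrid_check(observed, bounds_at, grid):
    """Outer search over pairs of support locations, inner exact solve for the weight.

    bounds_at(alpha) must return {event: (lower, upper)} at fixed-effect value alpha.
    Returns (alpha_1, alpha_2, weight_interval) for the first feasible pair, else None.
    """
    for a1, a2 in combinations(grid, 2):
        interval = weight_interval(observed, bounds_at(a1), bounds_at(a2))
        if interval is not None:
            return a1, a2, interval
    return None
```

A parameter value would be kept in the (approximate) identified set when `hybrid_check` returns a feasible pair. A return value of `None` is only informative relative to the chosen grid and the two-point restriction.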

Generalisations
The argument above combines the setup in (3.3) with simple static panel data insights. In most economic applications it will be important to also allow for dynamics. One may consider 'nonstructural (myopic) dynamics' as in, for example, Honoré and Kyriazidou (2000) as well as 'structural dynamics' as in Aguirregabiria et al. (2020), where utility-maximising agents realise that their choice today has an effect on their utility tomorrow. In addition, one may explore the econometric consequences of restricting how the equilibrium selection mechanism evolves over time.

CONCLUSIONS
Much of the literature on nonlinear panel data models has been inspired by standard cross-sectional models. Historically, these models have been made dynamic by including lagged dependent variables as explanatory variables. While this is natural in some settings, it is important to recognise that it has implications if one wants to interpret the estimated model in terms of some implicit underlying economic model. For example, if one wants to motivate a logit model with lagged dependent variables in terms of a random utility model in which the utility of an option in one period depends on the choice in the previous period, then one typically implicitly rules out that the agents are forward-looking. In a recent paper, Aguirregabiria et al. (2020) demonstrate that in a particular example it is possible to adapt some of the conditioning arguments for logit models to more natural economic models. Investigating whether this is true for panel data discrete choice models with fixed effects more generally is an interesting topic for future research.