The Cost of Job Loss

This paper identifies an equilibrium theory of wage formation and endogenous quit turnover in a labour market with on-the-job search, where risk averse workers accumulate human capital through learning-by-doing and lose skills while unemployed. Optimal contracting implies the wage paid increases with experience and tenure. Indirect inference using German data determines the deep parameters of the model. The estimated model reproduces the large and persistent fall in wages and earnings following job loss. In addition, a new structural decomposition finds foregone human capital accumulation (while unemployed) is the worker's major cost of job loss.


Introduction
Following Topel (1990) and Ruhm (1991), an important reduced form literature quantifies the surprisingly large and persistent earnings losses that follow layoff: the cost of job loss. For example, the estimates of Jacobson et al. (1993) and Couch and Placzek (2010) suggest that, six years after displacement, worker losses are between 13% and 25% of their pre-displacement earnings. The aim of our paper is to provide a new structural decomposition of such earnings losses. We do so by extending Burdett and Coles (2003), which considers equilibrium wage contracting with on-the-job search, to the case of learning-by-doing while employed and skill loss while unemployed. The equilibrium approach is powerful for two reasons. First, structural estimation of the model yields wage outcomes that are consistent with observed tenure and experience effects. Second, it consistently explains why wages paid are dispersed across firms and why higher wage paying firms raise wages more slowly with tenure (see Abowd et al., 1999, and Bagger et al., 2014), while taking into account endogenous quit turnover, namely that workers typically quit for better paid employment.
Because the equilibrium market structure is consistent with the Jacobson et al. (1993) statistical approach, we use it to decompose the estimated cost of job loss into three constituent parts: [i] job ladder losses, [ii] human capital losses and [iii] employment gap effects [the laid-off worker is more likely to be unemployed in the future with zero earnings]. Using German data and consistent with the large earnings losses described above, the estimated cost of being laid-off is large, being around 8-9% of expected lifetime discounted earnings. Typically unemployment policy focusses on compensating workers for foregone earnings while unemployed. But an important policy insight here is human capital loss, which is mainly due to foregone learning-by-doing.

This paper builds on the Burdett and Mortensen (1998) framework on equilibrium wage formation and labour turnover in frictional labour markets. This literature not only provides a structural interpretation for how wages evolve over individual worker careers, it also explains the surprisingly large variation in wage outcomes across firms and across workers (e.g. Mortensen, 2003). Much of the recent literature adopts the sequential auction approach of Postel-Vinay and Robin (2002), for this yields a highly tractable econometric framework (see Robin [...]). Tenure effects naturally arise in this framework because firms backload wages to reduce worker quit incentives (e.g. Stevens, 2004, Burdett and Coles, 2003, Carrillo-Tudela, 2009). By allowing learning-by-doing while employed, wages exhibit both experience and tenure effects (e.g. Altonji and Shakotko, 1987, Topel, 1991, Dustmann and Meghir, 2005). By allowing that human capital might decay while unemployed (e.g. Pissarides, 1992, Ljungqvist and Sargent, 1998), we develop an equilibrium framework which is ideal for identifying the cost of job loss. Unfortunately, although the decision theory is relatively straightforward, the equilibrium fixed point problem is formidable.
The problem requires identifying a non-degenerate set of equilibrium (recursive, dynamic) wage contracts posted by firms. In addition, there are ex-ante heterogeneous workers who all use optimal job search strategies [which are best responses to the equilibrium set of posted wage contracts], along with an endogenous joint distribution of employment, worker productivities and tenures which is an evolving (infinite dimensional) aggregate state variable. By adapting the notion of timeless equilibrium in Woodford (2003), we not only describe a tractable equilibrium framework but also provide a closed form characterisation of equilibrium. Most importantly the market equilibrium yields an econometric wage structure which is consistent with the statistical frameworks of Abowd et al. (1999) and Jacobson et al. (1993).
A surprising takeaway is the central role played by learning-by-doing in explaining observed wage dynamics. For example Davis and von Wachter (2011) was the first to evaluate the cost of job loss using an equilibrium search model. But because that paper did not allow learning-by-doing, it could not explain the large measured earnings losses that follow layoff. Similarly Hornstein et al. (2011) argue the equilibrium search framework does not seem consistent with the empirical Mm ratio, but that paper also does not consider learning-by-doing in conjunction with optimal contracts. Because unemployed workers wish to purchase "learning-by-doing" as an investment into higher future wages, equilibrium here finds the unemployed will indeed accept low starting wages consistent with the Mm ratio. There is also a very large literature on optimal unemployment insurance, though very little of it considers learning-by-doing. For example recent work in the Shavell and Weiss (1979) unemployment insurance literature proposes an income tax increment when the laid-off worker finds work (e.g. Hopenhayn and Nicolini, 1997, Shimer and Werning, 2008).² This proposal, however, does not take into account that the laid-off worker's main loss is already reduced future wages. An integrated policy analysis is clearly an important direction for future research.
Our approach also identifies an important statistical issue for the Jacobson et al. (1993) framework. Although in this framework it is possible to difference out the worker fixed effect, we show the estimated cost of job loss depends on job turnover parameters. Furthermore the German data used here finds that school leavers with few qualifications face much higher layoff rates over their careers than do workers with higher level qualifications (e.g. Adda et al., 2013, Burdett et al., 2016). Conversely well qualified workers have higher outside job offer rates and so enjoy greater job ladder gains while employed (through promotion and job shopping). Because different types have different costs of being laid-off, it is thus necessary to disaggregate the data when estimating the cost of job loss. Here we find that just disaggregating into 3 different educational attainment groups already yields very good results.
Section 2 describes the model and Sections 3 and 4 characterise and establish the existence of timeless equilibria. Section 5 describes the data and estimates the model using indirect inference. Section 6 then uses the Jacobson et al. (1993) methodology to estimate the cost of job loss both on the data and on the model simulated data. The results are remarkably well aligned and we use the model to provide a structural decomposition of those costs. Appendix A contains the longer proofs. Appendix B provides a full description of the data, simulations and the estimation procedures.

² The underlying idea is the UI system additionally operates a loans program: the worker is given more generous UI while unemployed but only as a loan. The loan is repaid through a tax increment when the worker is re-employed.

The Model
Time is continuous with an infinite horizon. There is a continuum of both firms and workers, each of measure one. All are infinitely lived and discount the future at rate r > 0. Firms are indexed by j ∈ [0, 1] and are equally productive, with a constant returns to scale technology. Workers are ex-ante heterogeneous with general human capital k ∈ (0, ∞). A worker k generates revenue flow Ak > 0 while employed and home production flow bAk while unemployed, where b ∈ [0, 1) implies a gain to trade exists. A > 0 is an aggregate productivity parameter which grows at exogenous rate γ_A ≥ 0.
Learning-by-doing implies a worker's human capital grows at rate ρ ≥ 0 while employed.
While unemployed there is skill loss whereby the worker's human capital falls at rate φ ≥ 0.
Unemployed workers receive job offers at exogenous Poisson rate λ_0 > 0; on-the-job search implies employed workers receive outside offers at rate λ_1 > 0; and job search is random in that any job offer is considered a random draw from the set of all job offers in the market. There is no recall of rejected job offers.
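The human capital dynamics above can be illustrated with a minimal sketch. The function name, the spell encoding and the parameter values are illustrative assumptions, not the paper's code or estimates. It compares an uninterrupted career with one containing a single unemployment spell of the same total length.

```python
import math

def human_capital(k0, spells, rho, phi):
    """Human capital after a sequence of (state, duration) spells.

    state "E": employed, k grows at rate rho (learning-by-doing).
    state "U": unemployed, k decays at rate phi (skill loss).
    """
    k = k0
    for state, duration in spells:
        if state == "E":
            k *= math.exp(rho * duration)
        elif state == "U":
            k *= math.exp(-phi * duration)
        else:
            raise ValueError(f"unknown state: {state!r}")
    return k

# Illustrative values only (hypothetical, not the estimated parameters).
rho, phi = 0.02, 0.05
continuous = human_capital(1.0, [("E", 10.0)], rho, phi)
interrupted = human_capital(1.0, [("E", 4.0), ("U", 1.0), ("E", 5.0)], rho, phi)

# Relative human capital gap left by one unit of time in unemployment.
gap = 1.0 - interrupted / continuous
```

In this sketch the gap equals 1 − e^{−(ρ+φ)}: the worker loses both the skill decay φ and the learning-by-doing ρ that would otherwise have accrued, which is the sense in which foregone accumulation adds to the cost of an unemployment spell.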
So what is a job offer? We generalise Burdett and Coles (2003) by allowing each firm j ∈ [0, 1] to precommit at date zero to a company wage policy which pays wage w = w_{jt}(τ, k, A) to each employee at any future date t ≥ 0 depending on the employee's tenure (or seniority) τ, human capital k and aggregate productivity A at that date. Thus given contact with a potential hire k_0 at date t ≥ 0, the company's wage policy implies a promised sequence of wages w_{jt'}(t' − t, k_0 e^{ρ(t'−t)}, A(t')) at future dates t' > t where, should the worker remain employed at the firm by that date, the worker will have accumulated tenure τ = t' − t and human capital k_0 e^{ρ(t'−t)}, with aggregate productivity A(t'). Should an employee (τ, k, A) at firm j at date t receive a (random) outside job offer from firm j' ∼ U[0, 1], the worker calculates the continuation value of remaining at current firm j on contract w_{jt}(.) with current tenure τ, and compares it to the value of being employed at the outside firm j' on contract w_{j't}(.) but with zero tenure. No recall implies the worker quits if the latter contract yields greater value. Note this contracting approach rules out offer matching (as in, e.g., Postel-Vinay and Robin, 2002). The simplest justification is that outside job offers are not observed by the employer. We further suppose an equal treatment rule: anti-discrimination legislation requires the firm's wage policy to offer the same wage to equally productive workers with the same seniority. Thus should an employee receive a preferred outside offer, the worker is let go and the firm hires replacement employees on the company wage contract.
There are exogenous job destruction shocks which imply employed workers are laid-off into unemployment at rate δ. There are also exogenous "godfather" shocks which occur at rate λ_q. Should a godfather shock occur, the worker quits exogenously to a randomly generated outside offer. Bagger and Lentz (2019) motivate the godfather shock process by assuming laid-off employees must be given notice. Thus λ_q/(δ + λ_q) might be considered the fraction of laid-off employees given notice who obtain an outside job offer before the notice expires.³ Although job destruction shocks imply risk averse workers have a precautionary motive to save, for tractability we simplify by assuming consumption equals earnings at all points in time; i.e. there are no savings. We further assume constant relative risk aversion; i.e. u(w) = w^{1−σ}/(1 − σ) with σ > 0.
Equilibrium requires each company j's wage policy w_{jt}(.) maximises expected discounted profit given the set of contracts posted by all other firms and the optimal quit strategies of workers. Different to Burdett and Coles (2003), the equilibrium set of optimal contracts depends on the joint distribution of employment across firms j, the tenures of employees within each firm j and their corresponding human capital, which evolves endogenously over time. An added difficulty with all dynamic precommitment games is that when firms precommit to their optimal contracts at date zero, those choices depend on the initial aggregate state χ_0 which, in turn, generates complex and uninteresting non-stationary wage dynamics. For tractability we will follow Woodford (2003) and define "timeless" equilibria in which each firm j ∈ [0, 1] precommits to an optimal company wage policy w = w_j(τ, k, A) which does not change with time (though individual wage payments vary over time as an employee accumulates greater tenure and experience and as aggregate productivity grows). The timeless equilibrium essentially describes the stationary [ergodic] growth path of the economy.
Although the framework allows firms to offer general contracts of the form w_j(τ, k, A), the following establishes the existence of a particular class of equilibria: those where contracts w_j(.) = Ak θ_j(τ) are fully optimal, where θ_j(τ) describes the wage rate paid by firm j to an employee with tenure τ. That is not to say other contracting equilibria w_j(.) do not exist. But this class of equilibria is particularly interesting for it yields a structural log-linear wage equation. Specifically there are worker i and firm j fixed effects (worker i's initial human capital, firm j's starting [log] wage rate log θ_j(0)), experience effects (x_it is worker i's total work experience and Z_it is worker i's time spent unemployed), as well as firm specific tenure effects. Furthermore the equilibrium is consistent with the AKM definition of exogenous mobility: that worker i's tenure τ_it and place of employment j = J(i, t) at date t is sufficient information to predict worker (i, t)'s quit rate. The market equilibrium is thus consistent with the AKM approach. Because it is also consistent with the Jacobson et al. (1993), Sianesi (2004) literature, we can then use the structurally estimated model to decompose the cost of job loss into its constituent effects.
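One way to write out the log-linear wage structure described above is sketched below. It is a reconstruction from the contract form w = Ak θ_j(τ) and the human capital dynamics of the model, assuming worker i's human capital is k_{i0} e^{ρ x_it − φ Z_it}; the symbol k_{i0} for initial human capital and the grouping labels are notation introduced here for illustration.

```latex
\begin{align*}
\log w_{it} &= \log A_t + \log k_{it} + \log \theta_{j}(\tau_{it}), \qquad j = J(i,t), \\
\log k_{it} &= \log k_{i0} + \rho\, x_{it} - \phi\, Z_{it}, \\
\log w_{it} &= \underbrace{\log A_t}_{\text{aggregate}}
  + \underbrace{\log k_{i0}}_{\text{worker effect}}
  + \underbrace{\rho\, x_{it} - \phi\, Z_{it}}_{\text{experience}}
  + \underbrace{\log \theta_{j}(0)}_{\text{firm effect}}
  + \underbrace{\log \frac{\theta_{j}(\tau_{it})}{\theta_{j}(0)}}_{\text{tenure effect}}.
\end{align*}
```

Written this way, the worker and firm terms play the role of fixed effects, while the last term is firm specific because the tenure profile θ_j(.) differs across firms.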
As discussed in Section 6, we will find that the distribution of firm starting wages {log θ_j(0)}_{j∈[0,1]} and the firm specific tenure effects determine the temporary wage losses employees face when laid-off into unemployment. The human capital dynamics instead determine the permanent losses.

³ Bagger and Lentz (2019) additionally allow that a worker might receive more than one random outside offer during the notice period. We abstract from this possibility.
Definition of Equilibrium: Equilibrium is a set of (timeless) contracts {θ_j(τ)}_{τ∈[0,∞)} for each firm j ∈ [0, 1] such that: (i) contract w_j(τ, k, A) = Ak θ_j(τ) maximises expected discounted lifetime profit for each firm j ∈ [0, 1] (i.e. no more general contract w_j(τ, k, A) exists which increases firm profit), where (ii) all workers use optimal job search strategies given the market set of posted wage contracts w_j(.) for all j ∈ [0, 1] and (iii) the joint distribution of employment, tenures, wages and human capital is consistent with optimal job search, the set of contracts posted and the ergodic limit of the economy.

Burdett and Mortensen (1998) is an early example of this equilibrium concept. In that equilibrium each firm j posts a [timeless] fixed wage w_j, workers use optimal job search strategies [given the set of posted wages] and the distribution of employment and wages is consistent with the ergodic limit of the economy [i.e. with steady state turnover]. Here instead we allow competition in general [timeless] contracts w_j(.) and extend the definition of equilibrium to allow growth which, though exogenous here, might be endogenous in future applications.⁴

We identify such equilibria using the following approach. The next section considers optimal worker behaviour given all firms post contracts consistent with equilibrium; i.e. each firm j ∈ [0, 1] posts a contract of the form w_j = Ak θ_j(τ). Given the resulting worker turnover, we then identify the equilibrium set of contracts {θ_j(.)} such that there is no deviating general contract w_j(τ, k, A) which is profit increasing.

Worker Optimality
Suppose each firm j ∈ [0, 1] posts a company wage policy of the form w_j(.) = Ak θ_j(τ). In the timeless equilibrium, let V = V(τ, k, A|θ) denote the employment value enjoyed by a worker with tenure τ and human capital k, with aggregate productivity A, on representative wage contract θ = θ_j(·). Let V^U(k, A) denote the value of being unemployed.
Because there is a gain to trade, it is never optimal for a firm to post a contract θ(.) which induces its employees to quit into unemployment. For any such contract θ(.), standard arguments imply V(.) is identified by a Bellman equation. In words, the flow value of being employed on contract θ equals the flow utility of the current wage paid plus the capital gains due to (i) the wage rate paid varying with tenure (picked up by the ∂V/∂τ term), (ii) the worker's productivity increases through learning-by-doing (at rate ρ), (iii) aggregate productivity increases (at rate γ_A), (iv) a layoff shock occurs at rate δ, (v) a randomly drawn outside offer θ_j is received at rate λ_1 and (vi) an exogenous quit occurs at rate λ_q (where the worker quits into unemployment if the offer θ_j has too low value).
Similar arguments identify the value of being unemployed. The restrictions to a CRRA utility function and the definition of equilibrium imply the critical simplifying property: the value functions are separable in productivity Ak, where V(τ, k, A|θ) = (Ak)^{1−σ} U(τ|θ) and V^U(k, A) = (Ak)^{1−σ} U^U, with U(τ|θ) and U^U as defined below. U(.) is central to the analysis for it is the same measure by which all workers value (or rank) any contract θ(.) and so determines equilibrium quit turnover.
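The separability property can be checked term by term. The sketch below is not the paper's derivation; it only verifies, under the assumed guess that both value functions are proportional to (Ak)^{1−σ}, that flow utility and the drift in productivity scale by the same factor.

```latex
\begin{align*}
u(Ak\theta) &= \frac{(Ak\theta)^{1-\sigma}}{1-\sigma}
             = (Ak)^{1-\sigma}\,\frac{\theta^{1-\sigma}}{1-\sigma}, \\
\text{employed:}\quad \frac{d}{dt}(Ak)^{1-\sigma}
            &= (1-\sigma)(\rho+\gamma_A)\,(Ak)^{1-\sigma}, \\
\text{unemployed:}\quad \frac{d}{dt}(Ak)^{1-\sigma}
            &= (1-\sigma)(\gamma_A-\phi)\,(Ak)^{1-\sigma}.
\end{align*}
```

Every term in the worker's problem is then proportional to (Ak)^{1−σ}, so that factor cancels and the comparison of contracts reduces to a comparison of values that do not depend on k or A.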
In what follows we refer to U(τ|θ) as the value of contract θ (at tenure τ) and U^U as the value of unemployment.
Claim 1: Equilibrium and optimal job search for any worker k implies: (a) while unemployed, the worker accepts a contract offer θ_j(.) if and only if its starting value U(0|θ_j) ≥ U^U; (b) while employed with contract value U(τ|θ), the worker accepts an outside job offer θ_j if and only if it offers greater contract value U(0|θ_j) > U(τ|θ). The worker quits into unemployment whenever value U(τ|θ) < U^U.
Claim 1 yields an important corollary. Let G(U) denote the fraction of employed workers who enjoy contract value no greater than U and let [\underline{U}, \overline{U}] denote its support. As Claim 1 implies equilibrium turnover is independent of k, it implies for any given type k that the distribution of contract values across workers of type k is also G(.); i.e. the distribution of contract values across the entire population is independent of k.

Optimal Contracts in a Timeless Equilibrium
Consider any contract w_j(τ, k, A) in a timeless equilibrium. Clearly with no loss of generality any such contract can be rewritten as w_j(.) = Ak θ_j(τ, k, A). Consider now a representative hire, where k_0 denotes the worker's human capital when first hired and A_0 the aggregate productivity level at that date. As k = k_0 e^{ρτ} and A = A_0 e^{γ_A τ} within the job spell, there is no loss in generality by further restricting attention to contracts of the form θ_j = θ_j(τ|k_0, A_0). In other words, any contract w_j(.) is equivalent to a wage rate paid θ_j(τ|k_0, A_0) which varies with tenure, but where firm j potentially discriminates across types (k_0, A_0) when hired.
Consider then any such contract θ(τ) = θ_j(τ|k_0, A_0). If the starting value of this contract U(0|θ) < U^U, the offer is rejected (worker (k_0, A_0) prefers being unemployed) and so this contract makes zero profit. Suppose instead it yields starting value U(0|θ) ≥ U^U. If u denotes the steady state unemployment level then, given a random contact with a worker (k_0, A_0), Bayes rule determines α, the probability that the worker is either unemployed or an exogenous quitter. In either case, U(0|θ) ≥ U^U implies the worker accepts the job offer. Instead with complementary probability 1 − α this worker is employed and Claim 1 implies G(.) describes the distribution of contract values earned by such workers. Hence α + (1 − α)G(U_0) with U_0 = U(0|θ) is the probability this contract offer is accepted.
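The acceptance probability can be sketched numerically. The expression for α below is an assumption: it applies Bayes rule with contacts drawn in proportion to the flow of offers each group receives (λ_0 u from the unemployed, λ_q(1 − u) from exogenous quitters, λ_1(1 − u) from other employed workers), and u = δ/(δ + λ_0) is the steady state unemployment rate stated later in the text. Function names and parameter values are illustrative.

```python
def steady_state_unemployment(delta, lam0):
    """Steady state unemployment u = delta / (delta + lam0)."""
    return delta / (delta + lam0)

def alpha(delta, lam0, lam1, lamq):
    """Probability a contacted worker is unemployed or an exogenous quitter.

    Assumed form: share of offer flow reaching workers who accept any
    offer with starting value at least the value of unemployment.
    """
    u = steady_state_unemployment(delta, lam0)
    always_accept = lam0 * u + lamq * (1.0 - u)
    all_contacts = lam0 * u + (lam1 + lamq) * (1.0 - u)
    return always_accept / all_contacts

def acceptance_probability(G_at_U0, delta, lam0, lam1, lamq):
    """alpha + (1 - alpha) * G(U0), as in the text."""
    a = alpha(delta, lam0, lam1, lamq)
    return a + (1.0 - a) * G_at_U0

# Hypothetical parameter values, chosen only to make the arithmetic easy.
a = alpha(delta=0.05, lam0=0.95, lam1=0.5, lamq=0.1)
p_low = acceptance_probability(0.0, 0.05, 0.95, 0.5, 0.1)
p_high = acceptance_probability(1.0, 0.05, 0.95, 0.5, 0.1)
```

The least generous offer is accepted only by the unemployed and exogenous quitters (probability α), while an offer at the top of the support is accepted by every contacted worker.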
Suppose the worker accepts the job offer and U(τ|θ) is the value of this contract at tenure τ. Because F(.) describes the distribution of starting contract values offered by all other firms, the probability this new hire remains employed by tenure τ, denoted ψ(.), is given by (4).

To determine the set of equilibrium optimal contracts, we first consider that contract which maximises expected discounted profit conditional on hiring a new employee (k_0, A_0), subject to U(0|θ) = U_0. As ψ(.) defined by (4) does not depend on (k_0, A_0) then, given starting value U_0, the optimal profit maximising contract is independent of (k_0, A_0), for the optimisation problem is simply multiplicative in A_0 k_0. Let θ = θ*(τ|U_0) denote this optimal contract, with corresponding [maximised] contract profit.

Suppose now the firm is contacted by a potential employee (k_0, A_0) and the firm offers the above optimal contract θ*(.) with starting value U_0 ≥ U^U. Because α + (1 − α)G(U_0) is the probability this contract offer is accepted, the firm's expected profit by offering U_0, denoted Ω(U_0|A_0, k_0), is this acceptance probability multiplied by the maximised contract profit. The firm thus chooses U_0 to maximise Ω(U_0|A_0, k_0). As the profit maximisation problem is again simply multiplicative in (k_0, A_0) we have established Claim 2.
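A survival probability consistent with the turnover rules described so far can be sketched as follows. This is a reconstruction rather than a quotation of the paper's equation (4): it assumes the match ends through a layoff (rate δ), a godfather shock (rate λ_q) or a preferred outside offer (rate λ_1 times the share of firms offering more than the current contract value), and that F(.) is continuous so ties can be ignored.

```latex
\[
\psi(\tau\,|\,\theta) \;=\;
\exp\!\left(-\int_0^{\tau}
\Big[\delta + \lambda_q + \lambda_1\big(1 - F(U(s\,|\,\theta))\big)\Big]\,ds\right)
\]
```

Under these assumptions ψ depends on the contract only through the path of contract values U(s|θ), which is why it does not vary with (k_0, A_0).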
Claim 2: Equilibrium implies it is always optimal to offer contracts θ*(τ|.) which are independent of (A_0, k_0).
Given there is no value to discriminating contract offers by (k_0, A_0), it is consistent with optimality to only consider equilibria in which each firm j offers the same contract θ_j(.) to all potential hires (k_0, A_0). We only consider such contracts from now on.
Theorem 1 now describes the optimal contract θ*(τ|U_0) for any U_0 ≥ U^U; i.e. it solves the firm's dynamic optimisation problem of maximising contract profit subject to U(0|θ) = U_0, where ψ(.) is given by (4) and U(.) by (2). For ease of exposition we only consider contracts for which the constraint θ(.) ≥ 0 is never binding (we discuss this further below).
Proof: See the Appendix.
The structure of the optimal contract is similar to Burdett and Coles (2003). Differentiating (6) and (7) with respect to τ yields a system of differential equations for {θ, Π, U}, with \dot{U} given by (8).
Equation (9) describes how the wage rate paid changes optimally with tenure, where the corresponding wage path is w(τ) = A_0 k_0 e^{(ρ+γ_A)τ} θ(τ). Equation (9) and some algebra now establish (11). A quit at tenure τ implies the firm loses continuation profit A_0 k_0 e^{(ρ+γ_A)τ} Π(τ). When Π(τ) > 0, (11) implies the wage paid increases within the job spell, where F'(U) is the measure of firms whose outside offer will marginally attract this worker. If F'(U) = 0 then marginally raising the wage paid at tenure τ has no impact on the worker's quit rate and optimal consumption smoothing implies the firm pays a (locally) constant wage. If F'(U) > 0, however, a slightly higher wage results in a slightly lower marginal quit rate and it is optimal for the firm to increase the wage paid with tenure. The scaling term A_0 k_0 e^{(ρ+γ_A)τ} arises as the worker's value of employment at tenure τ is V(τ, .) = [A_0 k_0 e^{(ρ+γ_A)τ}]^{1−σ} U(τ|θ) while the firm's continuation profit is A_0 k_0 e^{(ρ+γ_A)τ} Π(τ). As workers compare contracts by value U(.), however, Theorem 1 describes the choice-relevant objects. Most importantly, conditional on any U_0 ≥ U^U, Theorem 1 describes the optimal contract for all worker types (A_0, k_0), while (6) describes the solution to the differential equation (9) for θ(.).
Because w(τ) = A_0 k_0 e^{(ρ+γ_A)τ} θ(τ), a constant wage paid (perfect consumption smoothing within the job spell) implies θ(τ) declines at rate ρ + γ_A. Thus although an optimal contract implies wages paid always increase within a job spell, it is not the case that tenure effects are necessarily positive. Let (θ_∞, Π_∞, U_∞) denote the stationary point of this dynamical system.

Consider first the optimal contract for the firm offering the least generous contract in the market, i.e. one which yields starting value U_0 = \underline{U}, and suppose \underline{U} < U_∞. As depicted in Figure 1, the wage rate paid θ(.) and contract value U(.) both increase with tenure and U(.) converges to U_∞ from below. Let θ_1(τ) denote the optimal least generous contract in the market, which we refer to as the lower baseline scale. Let U_1(.) denote the corresponding path of contract values.

Consider instead the optimal contract for firms which offer the most generous contract in the market, U_0 = \overline{U}, and suppose \overline{U} > U_∞. Although the wage paid increases within the job spell, θ(.) decreases with tenure. Contract value thus falls with tenure and so U(.) converges to U_∞ from above. Let θ_2(τ) denote the optimal most generous contract in the market, which we refer to as the upper baseline scale, and U_2(.) the corresponding path of contract values.
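The link between wage growth and the tenure profile of the wage rate follows directly from the wage path w(τ) = A_0 k_0 e^{(ρ+γ_A)τ} θ(τ). A short worked step, using only that definition:

```latex
\begin{align*}
\log w(\tau) &= \log(A_0 k_0) + (\rho+\gamma_A)\,\tau + \log\theta(\tau), \\
\frac{\dot{w}(\tau)}{w(\tau)} &= \rho + \gamma_A + \frac{\dot{\theta}(\tau)}{\theta(\tau)}, \\
\dot{w}(\tau) = 0 \;&\Longleftrightarrow\; \frac{\dot{\theta}(\tau)}{\theta(\tau)} = -(\rho+\gamma_A), \\
\dot{w}(\tau) > 0 \;&\Longleftrightarrow\; \frac{\dot{\theta}(\tau)}{\theta(\tau)} > -(\rho+\gamma_A).
\end{align*}
```

So wages can rise within a job spell even while θ(.) falls with tenure, provided θ(.) falls more slowly than rate ρ + γ_A.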
As depicted in Figure 1b, define t_0 as the point on the lower baseline scale where U_1(t_0) = U_0. Optimality of the lower baseline scale yields the critical simplification: the optimal contract θ*(.|U_0) is simply the continuation contract starting at point t_0 on the lower baseline scale; i.e. θ*(τ|U_0) = θ_1(t_0 + τ), where the wage rate paid at tenure τ corresponds to point (t_0 + τ) on the lower baseline scale. Let Π_1(t_0) denote the firm's corresponding contract profit.
Suppose instead U_0 ∈ (U_∞, \overline{U}). This time the optimal contract yielding U_0 is the continuation contract starting at point t_0 on the upper baseline scale, where U_2(t_0) = U_0, and it yields contract profit Π_2(t_0). It is this baseline property of the optimal contract structure which makes tractable the characterisation of equilibrium.

Characterisation and Existence of Equilibrium
Given a starting value U_0 ∈ [\underline{U}, \overline{U}], the previous section has shown that the optimal contract θ*(.|U_0) corresponds to a baseline contract θ_i(t_0 + τ) with i = 1, 2 and a starting point t_0 ≥ 0 where U_i(t_0) = U_0. If accepted by worker (A_0, k_0), this contract then generates profit A_0 k_0 Π_i(t_0).
Each such contract offer then generates an expected profit per worker contact which is simply proportional to k_0 A_0, so equilibrium reduces to solving a constant profit condition. To understand the approach below, note that standard recursive arguments (e.g. Spear and Srivastava, 1987) suppose a contract "promises" continuation value U to an employee and then identify θ = θ(U) as the optimal wage rate paid at that point. The approach here instead identifies the inverse function: let U = U(θ) describe the contract value enjoyed by a worker when the optimal contract pays θ. The baseline property implies U(θ) = U_i(t_0(θ)), where t_0(.) is the inverse function of θ = θ_i(t_0), with i = 1, 2. Let Π(θ) describe the firm's corresponding contract profit. Claim 3 reveals why this alternative approach is so useful.

Proof: See the Appendix.
Claim 3 is a powerful result: it provides the closed form solution for Π(θ). (8) then implies (13), and we are almost done: equilibrium simply reduces to identifying the boundary condition for (13). To do this we transform the analysis from the time domain [how wage rates vary with tenure] to the domain of wage rates paid θ ∈ [\underline{θ}, \overline{θ}].
Let F_θ(θ) denote the distribution of starting wage rates paid by firms. Because (12) and (13) imply U(.) is a strictly increasing function, the definition of F(.) implies F_θ(θ) = F(U(θ)). Let G_θ(θ) denote the distribution of wage rates paid across employed workers, so that G_θ(θ) = G(U(θ)).
In the wage rates domain θ ∈ [\underline{θ}, \overline{θ}], the constant profit condition is now [α + (1 − α)G_θ(θ)]Π(θ) = Ω, which is condition (14). As an optimal contract implies the worker never quits into unemployment and strict positive profit implies all firms offer starting contracts which are preferred to being unemployed [otherwise the firm makes zero profit], steady state unemployment is given by u = δ/(δ + λ_0).
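The steady state unemployment rate follows from a flow balance argument, sketched here using only the turnover rules of the model: employed workers enter unemployment only through layoffs at rate δ (godfather shocks are job-to-job moves and no worker quits into unemployment), while every unemployed worker accepts any offer received at rate λ_0.

```latex
\begin{align*}
\underbrace{\delta\,(1-u)}_{\text{inflow into unemployment}}
  &= \underbrace{\lambda_0\, u}_{\text{outflow from unemployment}} \\
\Longrightarrow\quad u &= \frac{\delta}{\delta+\lambda_0}.
\end{align*}
```

Note that neither λ_1 nor λ_q enters: on-the-job search and exogenous quits reallocate workers across firms without changing the stock of unemployed.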
The first step is to solve for equilibrium Ω. Let \overline{U} = U(\overline{θ}) denote the highest contract value offered by firms. A simple contradiction argument establishes G_θ(\overline{θ}) = 1 and so substituting θ = \overline{θ} in (14) finds Π(\overline{θ}) = Ω. Substituting θ = \overline{θ} and substituting out Π(\overline{θ}) = Ω in (12) now determines the required boundary condition. The next step, Claim 4, shows that the upper baseline scale, though consistent with optimality, does not survive equilibrium. The final step is to tie down equilibrium \underline{θ}. The argument used is the same as that identified in Burdett and Coles (2003). We first consider a candidate \underline{θ} and Theorem 2 below constructs the corresponding equilibrium offer distribution F_θ(.|\underline{θ}) consistent with the equal profit condition.

Theorem 2: Given \underline{θ}, equilibrium implies Π(·), U(·), G_θ(·), F_θ(·) are given by:

Proof: See the Appendix.
The conditions of Theorem 2 depend on \underline{θ}, which is the last equilibrium variable to be determined. Claim 5 identifies the relevant boundary condition.
To describe the equilibrium fixed point problem, we use the following notation. First fix a candidate equilibrium value for \underline{θ} in the relevant range. Inspection establishes that any such candidate value implies strictly positive profit (Ω > 0) and \underline{θ} > 0 (strictly positive wage rates). Given this candidate choice of \underline{θ}, let F_θ(.|\underline{θ}) denote the candidate distribution function F_θ implied by the conditions of Theorem 2. Given the implied distribution of contract offers, the proof of Claim 6 now identifies the implied values of \underline{U} (≡ U(\underline{θ})) and U^U, which we denote \underline{U}(\underline{θ}) and U^U(\underline{θ}), respectively.
Claim 6: Given \underline{θ} and the implied candidate distribution function F_θ, then \underline{U}(\underline{θ}), U^U(\underline{θ}) are given by:

Proof: See the Appendix.
Proof: See the Appendix.
If F_θ is a positive increasing function (i.e. has the properties of a distribution function), then Theorems 2 and 3 fully characterise equilibria. By construction, all optimal contracts which offer starting wage rate θ ∈ [\underline{θ}, \overline{θ}] yield the same expected profit Ω > 0. Consider then any deviating contract w_j(.). Because any such contract w_j(.) is equivalent to a wage rate paid θ_j(τ|k_0, A_0) then, by construction, any such contract θ_j(.) which offers a starting value U_0 ∈ [U(\underline{θ}), U(\overline{θ})] cannot yield greater profit. Further, as U^U = U(\underline{θ}), any contract θ_j which offers value U_0 < U(\underline{θ}) yields zero profit as all workers reject such an offer. Finally any contract θ_j which offers U_0 > U(\overline{θ}) attracts no more workers than the optimal contract which offers U_0 = U(\overline{θ}), while the deviating contract earns strictly less profit per hire. Thus no deviating contract exists which yields greater profit and so Theorems 2 and 3 describe equilibrium.
Theorem 2 describes all equilibrium objects aside from the [lower] baseline scale θ(.). Equations (9), (13) and (40) in the proof of Theorem 2 imply θ(.) is identified by an initial value problem. In Appendix B.1 we describe the algorithm to compute equilibrium. Note that the θ ≥ 0 constraint may bind if σ < 1 and b is sufficiently small. For example, suppose λ_0 = λ_1. Because experience is valuable, a worker will accept a lower starting wage rate \underline{θ} ≤ b and thus θ ≥ 0 binds if b = 0. Whenever this occurs, the baseline scale pays a zero wage rate for tenures below a threshold and a positive (increasing) wage rate thereafter. Because estimation finds σ > 1, however, this constraint never binds in the quantitative analysis.

Quantitative Analysis
We estimate our model using indirect inference (see Gourieroux

Data
Our main source of information is the Sample of Integrated Labour Market Biographies (SIAB), an administrative matched employer-employee dataset developed by the German government for tax purposes. It is a 2% random sample drawn from the Integrated Employment Biographies (IEB), which comprises the universe of individuals who are either (i) in jobs that are subject to social security, (ii) in marginal employment, (iii) in benefit receipt according to the German Social Code, (iv) officially registered as a job-seeker at the German Federal Employment Agency or (v) participating (plan to participate) in active labour market policies programmes.
Individuals are followed from 1975 or from when the worker entered the labour market, whichever is the later. The data provide daily information on employment status and information on the gross daily wage/benefit, education, gender, occupation, age of the individuals and geographical location of the place of work.⁵ Importantly the data also provide the unique identifier of the establishment employing these individuals. This establishment identifier allows us to match the worker information to that of his/her establishment. Using this information we are able to reconstruct individuals' labour market histories as well as identify mass layoffs, which are needed for our estimates of the cost of job loss (see Section 6).
The establishment information is obtained from the Establishment History Panel (BHP), which collects information from all German establishments with at least one employee contributing to social security since 1975. The BHP provides annual information about the number of employees working in the establishment, their 3-digit industry classification, the median gross daily wage of full-time employees and the location of establishment. For convenience we will use the terms establishment and firm interchangeably.
Using these data we restrict attention to all West German male workers with a contributing job who entered the panel between the ages of 18 and 35. This implies that we exclude those workers who are reported as trainees, marginal part-time workers, employees in partial retirement, interns and student trainees, or in other employment status. 6 We divide the workers in our sample into three educational groups. The high education group comprises workers with a university degree, either from a university of applied science (Fachhochschule), technical college (technische Hochschule) or a university. Table 1 presents the size of the data along several dimensions for these three categories.
These data allow us to estimate workers' average unemployment, job and employment durations, their average wage-experience profiles and measures of wage dispersion. 7 The SIAB, however, is not suitable for estimating firm specific effects, as many establishments have only one worker and these effects might not be properly identified. Instead, we obtain estimates of the firm specific wage rate and its correlation with firm specific returns to tenure from Carrillo-Tudela et al. (2019), who use the full IEB data containing information on the universe of German workers and their establishments. 8 For these exercises, we deflate the wage information using the CPI.
5 The gross daily wage in the SIAB is constructed by dividing total gross earnings by the number of days employed in that job. If a worker did not leave the employer during a given year, the average gross daily wage in a job is computed annually. If the worker changed employers during the year, the gross average daily wage is computed for each employer using the time spent with the employer during that year.
6 Note that we do not consider civil servants or the self-employed as they are not covered in our data. We also exclude those individuals in the armed and police forces as well as members of parliament.
An important issue is that the wage data are top-coded, meaning that the data do not report the wage paid should it exceed a certain level. Although only 1.3% of the low educated and 6.0% of the medium educated group are top-coded, 40% of the higher educated group are top-coded. We impute the missing wages using the methodology of Büttner and Rässler (2008) but note that 40% of worker wages in the latter group are subject to imputation error. 9
We use the German Socio-Economic Panel (GSOEP) to complement the information derived from the SIAB and IEB data. The main advantage of the GSOEP for our study is that it provides information about the nature of an employer-to-employer transition. We label a "voluntary" transition as one in which the worker reported "own resignation", "mutual agreement separation", or "leave of absence" when leaving his job to take another job with a different employer. An "involuntary" transition is one in which the worker reported he changed employers due to "company shut down", "dismissal", or "temporary contract expired". In contrast to the SIAB or IEB, the GSOEP data is a household panel survey and hence is much smaller. The GSOEP started in 1984 and is updated on an annual basis. 10 Appendix B.4 provides detailed information on the data construction.
7 We consider a job spell as the time spent with a given employer and an employment spell as the time spent between two consecutive unemployment spells, where an unemployment spell takes into account both registered unemployment and non-participation periods. We follow this strategy as a large proportion of male workers who lost their jobs did not register as unemployed or, if they did register, stopped registering soon afterwards before re-entering employment. Carrillo-Tudela et al. (2018) provides a statistical analysis of this feature during the 1999-2014 period. This implies that to capture the consequences of job loss we need to consider both registered and non-registered unemployment spells. See also Schmieder et al. (2016) for a similar practice. Throughout the analysis we also distinguish between potential and actual labour market experience. Potential experience is defined as the sum of the overall time spent in employment and unemployment, while actual experience is the sum of the overall time spent in employment.
8 We provide an account of their estimation procedure and how we use their estimates in our analysis.
9 See also Card et al. (2013) for alternative imputation procedures. We find that when estimating the Mincer wage equations, described below, wage imputations do not seem to make much of a difference to the average returns to experience and tenure relative to using top-coded wages.
10 Further information about the SIAB and GSOEP data can be found in Antoni et al. (2016) and at http://www.diw.de/en/diw02.c.222857.en/documents.htm, respectively.

Estimation Procedure
We adopt a month as the reference unit of time and set r = 0.005 (an annual discount rate of 6%). We set γ A = 0.0022 to match the estimated slope of a linear trend on output per hour in Germany over the relevant time period. 11 This leaves a vector Λ = {δ, λ 0 , λ 1 , λ q , ρ, φ, σ, b} of 8 parameters that we recover by minimizing the sum of squared differences between a set of simulated moments from the model and their counterparts in the data, using the variance-covariance matrix of the empirical moments as a weighting matrix (see Appendix B for full details).
We target 12 statistics based on the main characteristics of the labour market to which the model is directly related (see Table 2). Equation (11) describes how wages evolve within an employment spell. Wages increase within the job spell because the worker is becoming more productive through learning-by-doing (k = k 0 e ρτ ), because outside wage competition induces the firm to raise wages paid as aggregate productivity increases (A = A 0 e γ A τ ) and because equilibrium tenure effects are strictly positive. In an optimal contract, the magnitude of these contract effects depends on the degree of risk aversion σ.
For example, σ → ∞ implies each firm j optimally commits to a constant wage rate θ j , analogous to the case considered in Burdett and Mortensen (1998), and there are no tenure effects. At the other end of the spectrum, risk neutral workers with σ = 0 instead imply tenure effects can be infinitely large; e.g. step wage contracts as in Stevens (2004) and Carrillo-Tudela (2009). Given the optimal contract trades off consumption smoothing against improved quit incentives, greater risk aversion implies flatter wage profiles and smaller wage tenure effects. The inference process below identifies the parameterisation which best explains the observed experience and tenure effects in the data, noting that firm wages are disperse, that tenure effects are firm specific, quit turnover is endogenous and experience effects arise from two sources: (i) general human capital accumulation through learning-by-doing and (ii) workers' job-to-job transitions. 12
Because the framework is consistent with the AKM approach, Abowd et al. (1999), we use the AKM regression equation as an auxiliary equation to measure the degree of wage dispersion across firms. Specifically, the estimated AKM equation regresses the log wage of worker i at date t on a worker fixed effect α i and two firm fixed effects -firm j's starting wage rate γ 0 j and its firm specific return to tenure γ 1 j -where j = J(i, t) describes worker i's place of employment at date t. X it denotes a vector of covariates composed of a polynomial in potential experience and a time trend, and u it is the wage residual which is assumed white noise. From this regression we take two target moments: (i) the estimated variance of the firm fixed effects γ 0 j and (ii) the correlation of the estimated γ 0 j with the firm specific tenure effects γ 1 j . These targets ensure the estimated model not only generates wage dispersion across firms consistent with the measured AKM variance in firm starting wages, but also firm specific tenure effects which vary systematically across firms. The estimated worker fixed effects, however, do not provide useful targeting information. The model implies the distribution of worker productivities is (asymptotically) log normal and uncorrelated with the firm fixed effects. Although we might fit an underlying log normal human capital distribution to the mean and variance of the AKM estimated worker fixed effects, this yields no further useful information.
11 We estimate the slope of the linear trend through OLS, by regressing the log of yearly output per hour on a linear trend for the period 1991 to 2014. We start in 1991 to avoid the discrete change in the series introduced by the German re-unification. Similar results are obtained when using output per head of household as an alternative measure of labour productivity. Output per hour and output per head of household are directly obtained from the OECD website.
12 Burdett et al. (2016) demonstrates that workers' job shopping behaviour on its own is able to generate a positive and concave wage-experience profile as workers move to better paying jobs over time (see also Burdett, 1978). Further, the same arguments that motivate the literature that tries to estimate unbiased returns to tenure (Altonji and Shakotko, 1987, and Topel, 1991, among others) also imply that the returns to labour market experience could be biased if workers' experience in the labour market is correlated with an unobserved match-
To infer the rate of skill loss while unemployed, φ, we follow Ortego-Marti (2016) and Jarosch (2015), who use an auxiliary equation which regresses a worker's re-employment (log) wage, w r it , on the duration of the last unemployment spell, U dur last it , a worker fixed effect, α i , and year dummies, d t . The estimated coefficient β̂ 0 on unemployment duration thus provides a measure of skill loss φ while unemployed and so is used as a target moment. In this version, however, we instead target the coefficient of variation (on residual wages) which is a more robust statistic. The model is over-identified with 12 targets and 8 parameters. Table 2 shows the fit of the model is very good for each education group. Because the fit is qualitatively identical across these groups, we first focus the discussion on the low educated.
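The minimum-distance criterion described in this section can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names, the toy one-parameter `simulate_moments` and all numbers are hypothetical and are not the model or moments of the paper, and the weighting matrix `W` is simply passed in by the user (a common choice is the inverse of the variance-covariance matrix of the empirical moments; Appendix B gives the details actually used).

```python
def weighted_distance(sim, data, W):
    """Quadratic form (sim - data)' W (sim - data) for moment vectors."""
    d = [s - m for s, m in zip(sim, data)]
    n = len(d)
    return sum(d[i] * W[i][j] * d[j] for i in range(n) for j in range(n))


def simulate_moments(rho):
    """Hypothetical stand-in for the structural model: maps one
    parameter to two simulated moments. NOT the paper's model."""
    return [10.0 * rho, 1.0 + 5.0 * rho]


def grid_search(data, W, grid):
    """Pick the grid point whose simulated moments are closest to the data."""
    return min(grid, key=lambda rho: weighted_distance(simulate_moments(rho), data, W))


# Hypothetical empirical moments and weighting matrix.
data_moments = [0.4, 1.2]
W = [[1.0, 0.0], [0.0, 4.0]]
grid = [i / 1000 for i in range(0, 101)]
rho_hat = grid_search(data_moments, W, grid)
```

With 12 targets and 8 parameters the same criterion is minimised over a vector rather than a scalar, and a numerical optimiser would replace the grid search.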

Model Fit
The estimated turnover parameters {δ, λ 0 , λ 1 , λ q }, described in Table 3  A major success of this exercise is that the model captures the AKM correlation between the firm fixed effect and the firm-specific tenure effect, which not only establishes that high wage firms do indeed raise wages more slowly with tenure, but also captures the magnitude of this effect. The model also reproduces the variance of the firm fixed effect very well. Note that in the model firm heterogeneity arises from differences in the starting values of their offered contracts and not from differences in ex-ante firm-specific productivity. Nevertheless, the estimation shows that this source of heterogeneity is sufficient to capture the variation in firm fixed effects estimated through the AKM approach.
The model generates appropriate linear experience effects, but it does not generate sufficient curvature as measured by the quadratic experience term in the Mincer wage regression. Given the rate of learning-by-doing is assumed constant, the curvature generated by the model reflects the original Burdett (1978) job-ladder insight: that as employed workers accumulate experience they also climb to higher wage points on the job ladder which then causes a positive and decreasing correlation between wages earned and experience (see footnote 12 for further discussion). To mitigate for possibly declining rates of learning-by-doing, we have restricted the data sample to the relatively young (entrants aged between 18 and 35) though also see Section 6.3 for further discussion.
The model reproduces very well the estimated linear and quadratic tenure effects in the Mincer wage regression. It is interesting that the low educated group have high [estimated] marginal tenure effects which are only slightly smaller than the marginal return to experience.
But because the average job spell for these workers is just 20 months, such large marginal tenure effects do not yield large overall tenure effects and so, not surprisingly, we will find that the estimated job ladder effects are small for this group. Nevertheless small overall tenure effects on wages are not evidence that marginal tenure effects are unimportant. Indeed the key feature of the frictional labour market is the existence of a job ladder, which workers climb either through internal promotion [tenure effects] or through on-the-job search.
The model also captures very well the negative relation between unemployment duration and re-employment wages and the coefficient of variation of frictional wage dispersion. Though not fitted to the M m ratio, we find that for low educated workers the model and the data generate  We now broaden the discussion to compare economic outcomes across the education groups. Table 3 describes the estimated parameters for each of these groups. Reflecting that those with low education have much lower average employment spells, it is important for what follows to note the inferred layoff rate δ in the less educated sector is around three times higher than those in the more educated sectors. The inferred job offer arrival rate while unemployed is also low for the less educated, though Table 3 suggests job offer arrival rates while employed are broadly the same.
The structural estimates identify high learning-by-doing rates: ρ is equal to 4.8% per annum for those with low education, 4.0% for those with medium education and 4.9% for those with higher education. These estimated returns are appreciably higher than those suggested by the Mincer wage regressions. This reveals an important source of bias in those Mincer wage regressions: they omit skill loss while unemployed, where estimated human capital loss rates φ are 1.2% per annum for the less educated, 1.4% for those with medium education and 1.8% for those with higher education. Because actual experience is correlated with age, and so positively correlated with time unemployed, omitting skill loss while unemployed biases downwards the Mincer estimates of the return to experience. This bias is clearly highest for the low educated who are particularly liable to long spells of unemployment, and smallest for university graduates.
The estimated relative risk aversion parameters are plausible, though their values are higher than the standard ones used in the macro literature. 14 An important role of σ is to make within-firm wage variation consistent with the data. For example with risk neutral workers, the sequential auctions framework (e.g. Postel-Vinay and Robin, 2002) would imply unemployed workers in the low education group accept negative starting wage rates θ = −1.03. This low starting wage reflects that at rate λ 1 the worker receives an outside offer and subsequently earns θ = 1 (until layoff), and that unemployed workers are willing to "buy" valuable experience. 15 The sequential auction literature reduces such large within-firm wage variation by assuming
14 Gandelman and Hernandez-Murillo (2015) survey some of the literature and find that estimates of relative risk aversion vary widely, ranging from 0.2 to 10 and above.
15 Specifically because unemployed workers obtain no surplus from a job offer and skills decline at rate φ while unemployed.
Conversely the matching offers game with equally productive firms implies an employed worker with an outside offer thereafter enjoys θ = 1 (until layoff). An unemployed worker is hired on initial rate θ0 until an outside offer or a job destruction shock occurs. As no worker surplus implies V 0 (k0, A0) = V U (k0, A0), some algebra then establishes the starting wage rate θ0.
The ratio λ 1 /δ measures the rate at which workers receive outside job offers relative to the rate at which they become unemployed. This is estimated at 1.63 and 1.89 for those workers in the medium and high education groups, but it is just 0.74 for those workers in the low education group. Because the latter are more likely to be laid off into unemployment than receive an outside offer, the efficiency gain to backloading wages is much reduced. At first sight it is then surprising that the Mincer wage regressions, both on the actual and simulated data, suggest that low educated workers enjoy the highest returns to tenure [see Table 2]. But there is a
[i] job ladder losses: the laid-off must seek re-employment at a new firm;
[ii] skill losses: there is foregone human capital accumulation as well as skill loss while unemployed;
[iii] the employment gap effect: it takes time for the laid-off worker to find suitable employment.
The Jacobson et al. (1993) approach selects a given year y and considers those who were displaced into unemployment in that year. Let Ψ it′ denote the variable of interest: either measured earnings or measured log wages of agent i in calendar year t′. Losses due to job separation in year y are estimated using the diff-in-diff specification:
$$\Psi_{it'} = \alpha_i + d_{t'} + X_{it'}\beta + \sum_{t} \varepsilon_t D^{t}_{it'} + u_{it'} \qquad (26)$$
where α i is the worker fixed effect, d t′ are year dummies, X it′ is a cubic in worker i's potential experience (with coefficient vector β) and the D t it′ are a set of dummy variables which take value 1 if worker i was displaced in year y and t′ − y = t, and zero otherwise. The estimated parameters ε t thus describe the displaced worker's average loss of earnings [or log wages depending on case] t years following displacement relative to a control group of those who were not displaced in year y. The error term u it′ is assumed white noise.
To minimise selection effects -that employers might choose which workers to lay off -the standard approach is to focus on mass layoff events. For consistency we also adopt this approach, though robustness checks find that instead considering all separations does not much affect the results. Estimating the ε t for t < 0 provides a simple check for selection effects and trending differences in the Ψ it between those who are laid-off (at future date y > t) and those not laid-off. A crucial property of the theoretical model is that the change in worker (log) wages is independent of the worker fixed effect. This is precisely what is required for identification in the reduced form approach: given two identical workers, one initially employed and the other unemployed, the (percentage) difference in expected future earnings (or wages) is independent of the [unobserved] fixed effect. Our approach, however, also identifies an important caveat.
Although it is possible to difference out the worker (productivity) fixed effect, the estimated cost of job loss still depends on underlying turnover parameters. Because Table 3 shows workers in different education groups face very different job turnover parameters, it is important to disaggregate the data by education choice.
The following first estimates the cost of job loss in terms of lost earnings, and then in reduced log wages. There is, however, one more caveat. For this data set, earnings information on the highly educated group is missing for 40% of the sample due to top-coding. Because the imputation method generates additional measurement error, the estimates of (26) have large standard errors for the high education group. 17 The following focusses on the results for the low and medium education groups and we refer the reader to the Appendix for the full set of results.

Reduced Form Cost of Job Loss I : Earnings losses
Because earnings in some periods are zero, the statistical literature considers (26)
on model-simulated data use the same estimation procedure but instead with simulated data generated by the parameter values described in Table 3.
denoted ε E t , both for the SIAB/BHP data and for the simulated data. Figure 2 shows, for both the model and the data, there are large drops in earnings immediately following layoff: initial expected losses are of the order of 40% for both education groups. This estimate would seem a little on the low side, however, for a newly laid-off worker necessarily suffers a 100% earnings reduction. This statistic reflects the presence of a temporal aggregation bias -that earnings are measured as total earnings over the accounting year. Notice then that the worker laid off on January 1st loses a maximum of 100% earnings over the year, while the worker laid off on December 31st loses none. Hence the average measured earnings loss ε E 0 through layoff is never more than 50%. Indeed an estimate of ε E 0 close to 0.5 for the
17 The sample is also very small: there are only 138 instances in which higher education workers were actually laid-off as part of a mass layoff event.
18 Note that the estimation of δ in Section 5.2 relied on all separations into unemployment, while the reduced form approach relies on mass layoffs which occur at a lower frequency. This generates a potential tension between the way the model is estimated (Section 5.2) and the analysis in this section. In Appendix B.3, however, we show that this tension is small as estimating equation (26)
2019) for a discussion about the validity of using mass layoff as a way to identify a random displacement event due to firm financial distress to minimize worker selection effects when estimating equation (26).
19 More precisely to be considered a mass layoff event in year y, the employer must meet the following criteria: (i) 50 or more employees in y − 2; (ii) employment reduces by 30% to 99% from y − 2 to y; (iii) employment in y − 2 is no more than 130% of employment in y − 3; (iv) employment in y + 1 is less than 90% of employment in y − 2.
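The four criteria in footnote 19 translate directly into a small check. The sketch below is illustrative only: the function name `is_mass_layoff`, the dictionary layout of the employment series and the example numbers are hypothetical, and treating the 30% and 99% bounds as inclusive is an assumption.

```python
def is_mass_layoff(emp, y):
    """Return True if establishment employment `emp` (dict: year -> headcount)
    satisfies criteria (i)-(iv) for a mass layoff event in year y."""
    try:
        e_m3, e_m2, e_0, e_p1 = emp[y - 3], emp[y - 2], emp[y], emp[y + 1]
    except KeyError:
        # Not enough years of data around y to evaluate the criteria.
        return False
    if e_m2 < 50:                        # (i) 50 or more employees in y-2
        return False
    drop = (e_m2 - e_0) / e_m2           # proportional fall from y-2 to y
    return (
        0.30 <= drop <= 0.99             # (ii) employment falls by 30% to 99%
        and e_m2 <= 1.30 * e_m3          # (iii) no rapid build-up before y-2
        and e_p1 < 0.90 * e_m2           # (iv) no quick rebound in y+1
    )


# Hypothetical establishment employment series.
example = {2000: 100, 2001: 100, 2002: 90, 2003: 60, 2004: 55}
print(is_mass_layoff(example, 2003))
```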
20 These tenure restrictions do not play an important role in the results presented below. In particular, restricting to at least 12, 24 or 36 months of tenure in the establishment prior to a mass layoff event leads to similar post displacement patterns for each of the education groups. Further, considering only workers with full-time employment spells or pooling full-time and part-time employment spells also has a small effect in our results. The latter probably occurs because part-time spells represent a very small proportion of all spells for male workers in our sample. Following the literature, here we present the results based on workers with full-time employment spells.
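The temporal aggregation argument in this section (a worker laid off on January 1st loses up to 100% of annual earnings, one laid off on December 31st loses none) can be checked with a short calculation. The sketch assumes, purely for illustration, that layoff dates are uniformly distributed over the year, that daily earnings are constant and that the worker stays unemployed until the end of the accounting year; any re-employment within the year lowers the measured loss, which is why 50% is an upper bound. The function names are hypothetical.

```python
def first_year_loss(layoff_day, days_in_year=365):
    """Share of annual earnings lost when laid off after `layoff_day` days
    of work and not re-employed before the end of the accounting year."""
    return (days_in_year - layoff_day) / days_in_year


def average_first_year_loss(days_in_year=365):
    """Average loss over layoff dates spread evenly across the year."""
    losses = [first_year_loss(d, days_in_year) for d in range(days_in_year + 1)]
    return sum(losses) / len(losses)


print(first_year_loss(0))         # laid off at the start of the year
print(first_year_loss(365))       # laid off at the very end of the year
print(average_first_year_loss())  # average over all layoff dates
```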

Reduced Form Cost of Job Loss II: [log] wage losses
We use the same methodology but now specify (26) in log wages. Table 6 in Appendix B.3 describes the parameter estimates ε w t (reported in percentage terms) and standard errors. Figure  3 graphs these estimates, along with a 95% confidence interval, for the low and medium educated groups.
The results using the actual data exhibit two main features: (i) workers have large and persistent displacement wage losses and (ii) those losses are bounded away from zero as t becomes large. Medium educated workers suffer a larger wage loss immediately following displacement (point estimate ε w 1 = −11.7%) compared to low educated workers ( ε w 1 = −8.1%). Though not quite such a good fit as obtained in Figure 2, the model still captures very well the extent of the wage losses at t = 0 and their persistence following job displacement. Figure 3, however, seems to suggest that log wage losses are overestimated at intermediate t, particularly for the low education group (though estimated standard errors are also large). A plausible explanation is that learning-by-doing rates decline with experience, in which case estimated foregone learning while unemployed is overstated.
Prior to displacement t < 0 and for the actual data, Figure 3 demonstrates a clear positive  pre-displacement trend arises because of the assumed "treated group": they are those laid-off as part of a mass layoff event with at least 3 years tenure. This group is thus ex-ante lucky: they previously enjoyed 3 solid years of learning-by-doing. In contrast some workers in the control group were previously unemployed with skills loss. The difference in average skills accumulation between these two groups (prior to the date of displacement) then generates a positive trend in the pre-displacement estimates. Because the results using model-simulated data are so closely aligned to those obtained on the actual data, we now use the structurally estimated model to decompose the cost of job loss.

The Cost of Job Loss: A Structural Decomposition
Consider two representative workers where at date t = 0, each initially has the same human capital k = 1 but "control" is employed while "treated" is unemployed. We suppose control has wage rate θ C ∼ G θ consistent with the ergodic distribution of wage rates paid.
The model describes a Markov process for how employment, human capital and wage rates subsequently evolve over time. Let p C t and p t denote the workers' respective probabilities of being employed at date t ≥ 0. Conditional on being employed at date t, let θ c t describe the wage rate earned by control and k c t denote control's human capital. Both, of course, are random variables. Conditional on being employed at date t, let θ t and k t describe the earned wage rate and human capital of the laid-off worker.
We first decompose ε w t , the expected log wage gap conditional on being employed. Because the turnover processes are independent, ε w t can be re-expressed as the sum of two terms. We denote the first term ε JL t , which describes the laid-off worker's expected wage loss due to job ladder effects. This term not only reflects that the laid-off worker must re-climb the job ladder, but also that control remains employed at t = 0.
We denote the second term ε SL t , which describes the expected wage loss due to differential skill accumulation rates while employed and unemployed. Hence we have the decomposition ε w t = ε JL t + ε SL t . By providing an analytic solution for ε SL t , Proposition 2 will allow us to decompose estimated losses ε w t into skill loss and job ladder losses. 21
Proposition 2. Equilibrium implies ε SL t = µ c t − µ t .
Proof: See the Appendix.
We can then decompose ε E t , the estimated cost of job loss in earnings, by defining ε U t = ε E t − ε w t , which is the gap between the two different estimates of the cost of job loss. For reasons that will become clear, we refer to ε U t as the employment gap effect [that the laid-off worker is more likely to be unemployed (with zero earnings)]. Hence we obtain the decomposition ε E t = ε U t + ε SL t + ε JL t with:
1. Employment Gap: ε U t as defined above;
2. Skill Loss: ε SL t given by Proposition 2;
3. Job Ladder: ε JL t = ε w t − ε SL t .
To be theory consistent we decompose the estimates ε E t obtained using model-simulated data. Figure 4 graphs the resulting decomposition by year following layoff.
Table 4 below describes the model predicted permanent human capital loss Φ P = lim t→∞ ε SL t . Figure 2 shows Φ P is fully consistent with the permanent earnings losses lim t→∞ ε E t as estimated on the actual data.
Insight 2: the estimated cost of job loss in log wages at t = 0 describes the immediate drop in log wages due to falling off the job ladder. This estimated loss is consistent with Φ T in Table 4 below which describes the model predicted loss.
21 Given the temporal aggregation issue described above, the skill loss estimate described in Figure 4 below is the average value of ε SL t over the accounting year. For example ε SL 0 is not zero; rather it is the average value of ε SL t over t ∈ [0, 1].
Insight 3: the unusual recovery path of estimated earnings losses reflects that the employment gap effect and job ladder losses decay at different rates.
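The accounting behind this three-way decomposition can be sketched as follows. The sketch assumes the employment gap is the difference between the earnings and log wage estimates and that the job ladder term is the log wage loss net of skill loss; the function name and the numbers are hypothetical and are not estimates from the paper.

```python
def decompose(eps_E, eps_w, eps_SL):
    """Split earnings losses eps_E[t] into employment gap, skill loss and
    job ladder components, year by year since layoff."""
    rows = []
    for e, w, s in zip(eps_E, eps_w, eps_SL):
        rows.append({
            "employment_gap": e - w,  # earnings loss beyond the wage loss
            "skill_loss": s,          # taken as given (Proposition 2)
            "job_ladder": w - s,      # wage loss not explained by skill loss
        })
    return rows


# Hypothetical losses (in proportions) for years 0, 1, 2 after layoff.
eps_E = [0.40, 0.20, 0.12]
eps_w = [0.10, 0.11, 0.09]
eps_SL = [0.02, 0.05, 0.06]
for t, row in enumerate(decompose(eps_E, eps_w, eps_SL)):
    print(t, row)
```

By construction the three components sum to the earnings loss in each year, so the decomposition is an accounting identity once ε SL t is pinned down.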

Job Ladder Effects ε JL t
Recall that If at t = 0 the laid-off worker is immediately re-employed, then the wage rate earned θ ∼ F θ .
Hence for t = 0, the model implies an immediate job ladder loss, denoted Φ T in Table 4. For the medium educated workers, the model implies an initial job ladder wage loss of 10% while the low educated group have a 6.2% loss. This differential arises because workers in the low educated group are more likely to be laid off than receive an outside job offer (i.e. their δ > λ 1 ) and so their job ladder effects are relatively small. Instead medium educated workers are more likely to receive outside offers than be laid off (i.e. λ 1 > δ) and so their job ladder effects are larger. Using the actual data, Figure 3 demonstrates that estimated [short run] wage losses ε w t are indeed consistent with the model-implied Φ T ; i.e. early wage losses are consistent with job ladder effects. Now consider the limit as t → ∞. Conditional on being employed, the distribution of the laid off worker's wage rate gradually converges to the ergodic distribution G θ . Hence lim t→∞ ε JL t = 0 as confirmed by Figure 4. This figure also shows the job ladder loss decays more slowly than does the employment gap loss ε U t . We return to this feature below.
Skill Loss ε SL t
Proposition 2 provides the analytic solution for ε SL t . Note that ε SL t = 0 at t = 0 for there is no skill loss should the laid-off worker be immediately rehired (as in the godfather shock). Conversely as t → ∞, Proposition 2 implies ε SL t → Φ P > 0. Hence as time since displacement becomes large, expected skill loss not only depends on foregone skill accumulation rates ρ + φ but also on job turnover rates. Not surprisingly if finding work is fast (λ 0 large), foregone skill accumulation through becoming unemployed is small. But the measured loss also depends on job loss rates, for δ high implies the control worker is also likely to become unemployed in the near future. 23 Using the parameter values in Table 3, Table 4 below describes Φ P [the permanent expected skills loss as t becomes large] for each education group. Despite having quite different turnover processes, the model implies Φ P , the expected [long run] fall in skills following displacement, is 6.5% and 6.9% for the two groups.
A possible concern is that there are decreasing rates of learning-by-doing. But an important insight here is that the skills loss term converges to Φ P very quickly [see Figure 4]. Like the employment gap effect discussed below, Proposition 2 and some algebra imply convergence occurs at rate (λ 0 + δ), which is fast. The estimates in Table 3 imply workers in the low education group have slow re-employment rates λ 0 but high layoff rates δ, which together yield a convergence rate of λ 0 + δ = 0.071 per month [i.e. a half-life of just 10 months]. Surprisingly the medium educated group has almost the same convergence rate λ 0 + δ = 0.069, though for the opposite reason. Although learning-by-doing rates might decline with experience, fast convergence to Φ P implies (a relatively slow) decline in learning-by-doing has little effect on realised skills loss. Hence the model predicted Φ P is remarkably close to the [long-run] estimated earnings losses.
The employment gap effect ε U t decays at rate λ 0 + δ. 24 The intuition is simply that the "treated" worker regains employment at rate λ 0 while "control" becomes unemployed at rate δ and so the probability gap p C t − p t decays at rate λ 0 + δ. And because the above establishes this decay rate is around 7% a month, the employment gap effect declines very quickly. This implies the estimated profile of earnings losses converges quickly to the estimated profile of [log] wage losses.
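The half-life figures quoted above follow from exponential decay at rate λ 0 + δ. The sketch below computes the half-life for the two reported convergence rates and the implied path of the employment probability gap in a two-state (employed/unemployed) continuous-time chain, starting with control employed and treated unemployed. The function names and the split of 0.071 into separate values of λ 0 and δ are hypothetical; only the sums 0.071 and 0.069 come from the text.

```python
import math


def half_life(rate):
    """Time for an exponentially decaying gap to halve: ln(2) / rate."""
    return math.log(2) / rate


def employment_gap(t, lam0, delta, gap0=1.0):
    """Gap p_C(t) - p(t) in a two-state chain with job finding rate lam0
    and layoff rate delta; it decays at rate lam0 + delta."""
    return gap0 * math.exp(-(lam0 + delta) * t)


print(half_life(0.071))  # low education group, months
print(half_life(0.069))  # medium education group, months

# Hypothetical split of the low education rate into lam0 and delta.
print(employment_gap(12, lam0=0.050, delta=0.021))
```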
But why are job ladder losses seemingly more persistent? An important difference is that the control worker has wage rate θ C consistent with the ergodic distribution. Conversely (11) implies that when the laid-off worker is re-employed, wage tenure effects are proportional to the job offer arrival rate λ 1 . Thus via on-the-job search and tenure effects, measured job ladder losses decay at rate λ 1 , and the parameter estimate λ 1 < λ 0 + δ then implies job ladder losses are more persistent than the employment gap effect.
Our final issue is to measure the relative contribution of these three channels to discounted lifetime earnings losses following displacement. This requires taking into account that there is positive earnings growth over time: average total earnings grow at rate $\gamma_A + \gamma_k$, where $\gamma_k = (1 - u)\rho - u\phi$ is the average growth rate of human capital. The model estimated parameters imply annual growth rates $\gamma_k = 2.0\%$ and $2.4\%$ for the low and medium educated groups, respectively.
Taking trend growth into account, we measure the percentage loss in lifetime earnings [LLE] due to layoff as the ratio of discounted earnings losses to discounted lifetime earnings. Here the denominator measures discounted lifetime earnings with $y_0$ describing representative worker earnings at date zero, and using $\varepsilon^E_t$ in the numerator then describes the (proportional) loss in discounted earnings in every period $t \ge 0$. Because earnings losses at year 15 are close to the permanent wage losses $\Phi^P$ described in Table 4, we set $\varepsilon^E_t = \Phi^P$ for $t \ge 16$. Using model-based estimates of $\varepsilon^E_t$, the expected loss in lifetime earnings due to layoff is 9.3% for the low educated and 7.9% for the medium educated. These are large losses.
We decompose this loss LLE into its constituent parts, $LLE = LLE^{SL} + LLE^{JL} + LLE^{U}$, by defining $LLE^k$ for $k = SL, JL, U$ in the same way but with $\varepsilon^k_t$ in place of $\varepsilon^E_t$, using the model estimated $\varepsilon^k_t$ for $t \le 15$ and, for consistency, setting $\varepsilon^{SL}_t = \Phi^P$ and $\varepsilon^{JL}_t = \varepsilon^U_t = 0$ for $t > 15$. Doing this yields the decomposition of lifetime earnings losses by education group reported in Table 5.

Table 5 reveals skill loss is by far the most important effect: it contributes over 70% of those lifetime earnings losses. For the low education category, losses due to job ladder effects are very small and the employment gap effect is correspondingly large. Instead for the medium educated, the job ladder effect is roughly of the same magnitude as the employment gap effect. For these workers falling off the job ladder implies a 0.8% fall in lifetime earnings.
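To make the lifetime-loss measure and its decomposition concrete, the following is a minimal sketch under stated assumptions rather than the paper's exact formula: time is annual and discrete, the discount factor is adjusted for trend growth, the infinite horizon is truncated at `tail_years`, and total losses are the sum of the three channel losses. The function `lifetime_loss`, the rates `r` and `g`, and every loss profile below are hypothetical illustrations, not the paper's estimates.

```python
import math

def lifetime_loss(eps, eps_perm, r, g, horizon=15, tail_years=200):
    """Discounted share of lifetime earnings lost (illustrative discrete-time version).

    eps      : proportional losses for years 0..horizon
    eps_perm : loss applied in every year after `horizon`
    r, g     : annual discount rate and trend earnings growth (requires r > g)
    """
    beta = (1.0 + g) / (1.0 + r)  # growth-adjusted annual discount factor
    num = den = 0.0
    for t in range(tail_years + 1):
        weight = beta ** t
        loss = eps[t] if t <= horizon else eps_perm
        num += weight * loss
        den += weight
    return num / den

# Hypothetical loss profiles for years 0..15 (NOT the paper's estimates)
years = range(16)
phi_p = 0.065
channels = {
    "SL": [phi_p * (1 - math.exp(-0.85 * t)) for t in years],  # skill loss
    "JL": [0.03 * math.exp(-0.30 * t) for t in years],         # job ladder
    "U":  [0.30 * math.exp(-0.85 * t) for t in years],         # employment gap
}
permanent = {"SL": phi_p, "JL": 0.0, "U": 0.0}
r, g = 0.05, 0.02  # hypothetical annual discount and growth rates

parts = {k: lifetime_loss(channels[k], permanent[k], r, g) for k in channels}
eps_total = [sum(channels[k][t] for k in channels) for t in years]
total = lifetime_loss(eps_total, phi_p, r, g)

for k, v in parts.items():
    print(f"LLE_{k} = {v:.4f} ({v / total:.0%} of total)")
print(f"LLE    = {total:.4f}")
```

Because the measure is linear in the loss profile, the three channel terms add up to the total whenever the total profile is the sum of the channel profiles.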

Conclusion
This paper has generalised the equilibrium framework of Burdett and Coles (2003) to the case of learning-by-doing while employed, human capital loss while unemployed and to a "timeless equilibrium" which allows for growth. Structural estimation finds the model explains well not only the variation in firm fixed wage effects and firm wage tenure effects as estimated using the AKM methodology, but also standard Mincer-estimated returns to experience and tenure.
The validation exercise finds that when the data is disaggregated by education type and using the control group advocated in Sianesi (2004), the structurally estimated model replicates very well the estimated cost of job loss. Results find that the estimated cost of job loss is very large.

A strong assumption of the model is no selection effects into layoff. This is not only necessary to make the equilibrium analysis tractable, it is also necessary for the validity of the Jacobson et al. (1993) approach. The standard selection argument is that some workers may be more likely to be laid-off than others due to firm choice. Our approach instead presumes layoff rates are type-specific, that some types of workers are more likely to be laid-off than others. For example a manufacturing firm's workforce might be mainly composed of (low educated) assembly line workers, and shocks to the manufacturing sector then cause such low educated types to be over-represented in any mass layoff event. Rather than a selection issue, this describes a composition problem which can only be resolved through disaggregation of the data. Even though the data here is just disaggregated into 3 educational groups, we find the results are already very good in the sense that the type-specific estimated cost of job loss is closely aligned to the theoretical predictions of the structurally estimated model. Of course future research might consider an even finer disaggregation.
That is not to say selection effects are unimportant. Indeed an interesting research question is, rather than assume a firm implements a random layoff rule, what instead might be an optimal selection rule? An obvious candidate is a last in/first out seniority rule; i.e. the most recently hired worker is the first to be laid-off. Such a rule is not only transparent (which avoids claims of unfair dismissal), it also backloads job security with tenure and so has valuable incentive properties. Recently Pinheiro and Visschers (2015) and Jarosch (2015) find laid-off workers are more likely to be laid-off again once re-employed. That approach explains such outcomes as caused by firm heterogeneity: firms at the bottom of the job ladder offer less secure employment. But it could simply reflect seniority protocols, that new hires are the first in line to be laid-off.
A different selection rule might instead find firms asking older workers, particularly those close to retirement, to take (compensated) voluntary redundancy. Such selection based on life cycle issues, however, cannot be considered in the ageless framework analysed here.
A different way to generate a richer layoff structure is to introduce match heterogeneity of the form $F(\cdot) = Ak\varepsilon$, where $\varepsilon$ is a match specific component which follows an exogenous geometric process. If $\varepsilon = 1$ for all new hires and subsequently grows at a constant rate $\gamma_F$, then $\gamma_F$ would describe the growth rate of firm specific human capital. The only material difference this makes to the analysis is that firm-worker profit $\Pi(\cdot)$ in (11) additionally grows at rate $\gamma_F$ and so wages would then increase more quickly with tenure. However to keep tenure effects consistent with the Mincer wage equations, estimation would then have to increase worker risk aversion $\sigma$ to generate a flatter wage tenure profile. Without additional wage information, distinguishing between tenure effects due to growth in firm specific human capital and due to the backloading of wages is problematic. In this paper we have assumed no firm specific human capital.
If instead $\varepsilon$ is an idiosyncratic match draw and assuming $\varepsilon$ is contractible, (11) would still describe how wages evolve within the match, but a bad match would not only imply low wages today but also low, and even negative, wage growth. Quit turnover would then depend on an employee's [idiosyncratic] wage and promotion prospects. Layoff instead occurs should match value $\varepsilon$ fall below some reservation value $\varepsilon^R$ (where match surplus is zero). Introducing idiosyncratic match draws then generates a rich and complex relationship between layoff rates and worker employment histories.

A Proofs
Proof of Theorem 1: Substituting out $\psi$ in the objective function gives a dynamic optimisation problem subject to starting value $U(0|\theta) = U_0$, where (2) describes how $U(\cdot)$ evolves with tenure. Define the transformed variable $\psi_0$ and note it satisfies the differential equation (28). The dynamic optimisation problem is equivalently rewritten as (29), where $\psi_0$, $U$ are state variables which evolve according to the autonomous, first order differential equations (28) and (2) respectively, with initial values $\psi_0 = 1$, $U = U_0$ at $\tau = 0$. We can solve this dynamic optimisation problem using the Hamiltonian approach. Define the Hamiltonian $H$, where $\xi_{\psi_0}$, $\xi_U$ are the respective costate variables. The Maximum principle yields the necessary conditions for optimality (30)-(32), along with the autonomous differential equations (28), (2) for $\dot\psi_0$ and $\dot U$. As we do not wish to assume $F$ is differentiable, however, we drop condition (32) and instead note that as the objective function in (29) does not depend explicitly on tenure, optimality also implies $H = 0$ (e.g. p298, Leonard and Long, 1992). Now integrating (31) forward yields $\xi_{\psi_0}$, where $B_0$ is the constant of integration and $\Pi(\cdot)$ is the firm's continuation profit as defined in Theorem 1. (30) implies $\xi_U = -\psi_0\theta^{\sigma}$. Substituting out $\xi_U$ and $\xi_{\psi_0}$ in the definition of $H$, the restriction $H = 0$ yields the optimality condition (34). Now the restriction $r + \delta - \rho - \gamma_A > 0$ ensures the exponential term becomes arbitrarily large as $\tau \to \infty$. As $\Pi$ and $U$ must be bounded, (34) then implies $B_0 = 0$. (34) now yields (6) given in the Theorem. Using this to substitute out $\theta^{1-\sigma}/(1-\sigma)$ in (2) then yields (8). This completes the proof of Theorem 1.

Proof of Claim 3: Equation (12) follows by solving the constant profit condition. To do so, note that standard turnover arguments imply $G(U)$ satisfies a flow-balance condition,
where the left hand side describes the flow of workers into employment with wage rate value more than $U$ while the right hand side describes the flow out through job separation. Combining (8) and (10) and rearranging the previous expression yields a relation in which $\Pi = \Pi_i(t_0)$ and $\theta = \theta_i(t_0)$.

While $dF(U) > 0$, we differentiate the constant profit condition and use (35) to substitute out $\Pi'(U)$ and $G'(U)$. Inspection finds the $F$-terms all cancel out. But the constant profit condition also implies an expression for $G$. Using this to substitute out $G$ in (37) and substituting out $u = \delta/(\delta + \lambda_0)$ yields a quadratic equation. As $dF(U) > 0$ implies the firm must make positive profit $\Pi > 0$, the positive root to this quadratic equation yields the result. This completes the proof of Claim 3.
Proof of Theorem 2: To determine the equilibrium distribution of offers $F_\theta$, standard turnover arguments imply $G_\theta$ must satisfy a flow-balance condition, where the left hand side describes the flow of workers into employment with wage rate more than $\theta$ while the right hand side describes the flow out through job separation. Now (9) and (13) yield a solution for $\dot\theta$. Using this solution for $\dot\theta$, and $G_\theta$, $\Pi$ described in the Theorem, a lot of algebra finds the turnover equation for $G$ implies a first order differential equation for $F$ that involves a function $\Psi(\theta)$. Integration now yields the stated solution for $F_\theta$, while using (12) it is easy to show that $\Psi(\theta) > 0$ for all $\theta \in [\underline\theta, \overline\theta]$. This completes the proof of Theorem 2.
Proof of Claim 6: The proof combines an integration by parts with (17) and (6), each evaluated at a bound of the support $[\underline\theta, \overline\theta]$, noting that $U^U = \underline U$ in a timeless equilibrium (Claim 5). Using (41) and (42), together with equation (3), again with $U^U = \underline U$ [Claim 5], to substitute out the worker's value yields the equilibrium condition. As a timeless equilibrium requires $\underline U = U^U$, we obtain the equilibrium condition stated. This completes the proof of Claim 6.
Proof of Proposition 1. Let $t'$ denote the first date when either worker transits employment state; i.e. either the employed worker [control] becomes unemployed, or the laid-off worker finds employment. The [independent] Poisson processes imply this occurs with probability density $(\lambda_0 + \delta)e^{-(\lambda_0 + \delta)t'}$. At this date $t'$, their difference in $\log k$ is $(\rho + \phi)t'$. Because both workers are in the same employment state at date $t'$ then, in the continuation $t > t'$, both workers face the same $d[\log k]$ process and so, in expectation, there is no further change in the difference in their [log] human capitals. Hence the expected difference in $\log k$, at date $t$, is

$$\int_0^t (\rho + \phi)\,t'\,(\lambda_0 + \delta)e^{-(\lambda_0 + \delta)t'}\,dt' + (\rho + \phi)\,t\,e^{-(\lambda_0 + \delta)t},$$

where $(\rho + \phi)t'$ describes the expected difference in $\log k$ at date $t$ for first transitions which occur at date $t' < t$, while the second term describes the difference in $\log k$ if the first transition has not occurred by time $t$. Integration by parts yields

$$\frac{\rho + \phi}{\lambda_0 + \delta}\left[1 - e^{-(\lambda_0 + \delta)t}\right],$$

and taking the limit $t \to \infty$ yields Proposition 1.
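The integration step in the proof of Proposition 1 can be checked numerically. The sketch below draws the date of the first transition from an exponential distribution with rate $\lambda_0 + \delta$, takes the gap in $\log k$ to be $(\rho + \phi)\min(t', t)$ as the proof argues, and compares the simulated mean with the closed form $(\rho + \phi)[1 - e^{-(\lambda_0 + \delta)t}]/(\lambda_0 + \delta)$. The function names and all parameter values are hypothetical; the sketch verifies only the integral, not the argument that the gap is frozen in expectation after the first transition.

```python
import math
import random

def expected_gap_closed_form(t, rho, phi, lam0, delta):
    """Closed form obtained by integrating by parts."""
    a = lam0 + delta
    return (rho + phi) * (1.0 - math.exp(-a * t)) / a

def expected_gap_simulated(t, rho, phi, lam0, delta, n=100_000, seed=0):
    """Monte Carlo mean of (rho + phi) * min(first transition date, t)."""
    rng = random.Random(seed)
    a = lam0 + delta
    total = 0.0
    for _ in range(n):
        first = rng.expovariate(a)  # date of the first transition by either worker
        total += (rho + phi) * min(first, t)
    return total / n

# Hypothetical monthly rates, chosen only for illustration
rho, phi, lam0, delta = 0.002, 0.003, 0.05, 0.021

for t in (6, 24, 120):
    exact = expected_gap_closed_form(t, rho, phi, lam0, delta)
    approx = expected_gap_simulated(t, rho, phi, lam0, delta)
    print(f"t = {t:3d} months: closed form {exact:.5f}, simulated {approx:.5f}")

# Long-run limit as t grows large
limit = (rho + phi) / (lam0 + delta)
```

As $t$ grows, both values approach `limit`, the long-run expected gap.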
and have initial conditions $X^U = X^E = 0$. Standard arguments now apply. Because measure $p^C_t$ of workers are employed, the mean value of $\log k$ across employed workers at date $t$ then yields $\mu^C_t$ described in Proposition 2. The same argument, but with $p^C_t$ replaced by $p_t$, determines $\mu_t$ defined as the average value of $\log k$ across employed workers at date $t$ who were instead unemployed at $t = 0$. This completes the proof of Proposition 2.

B.1 Simulation
To estimate the parameters of the model separately for each education group we use the indirect inference formula (46) for $\hat\Lambda$, where $M^D$ is a 12 × 1 vector of data moments as described in Table 2 (and standard errors), using AKM estimates from the SIAB. This is the best we can do given that we cannot directly access the full BHP data set. Although we acknowledge that this is not ideal given the potential lack of identification of the firm fixed effects using the SIAB, these variances and covariances show that the AKM estimates are nevertheless very precisely estimated.
For a set of parameter values, computing the equilibrium involves picking a $\underline\theta$ satisfying (21), then using Theorem 2 to compute $F_\theta$ over $[\underline\theta, \overline\theta]$ with $\overline\theta$ given by (20). We then compute $\underline U(\underline\theta)$ and $U^U(\underline\theta)$ as defined in Claim 6. The equilibrium value of $\underline\theta$ is then determined by $\underline U(\underline\theta) = U^U(\underline\theta)$. Using the corresponding value of $\underline\theta$ we then solve the differential equation describing the evolution of $\theta$ to obtain the baseline scale.
Given these equilibrium outcomes, we simulate the employment histories of 15,000 workers. We assume that all workers start unemployed and experience different types of shocks during their lifetime depending on the worker's employment status. In particular, every time a worker is unemployed, he receives a job offer at rate $\lambda_0$. We obtain his unemployment duration by drawing a random number $r_1 \in [0, 1]$ and then exploiting the fact that the inter-arrival time between events in a Poisson process follows an exponential distribution with parameter equal to the rate of the process. That is, the duration until the worker receives a job offer is determined by $t_u = -\log(1 - r_1)/\lambda_0$. After deriving $t_u$, we sample a position in the baseline scale from the offer distribution $F$, by choosing a random number between 0 and 1 and interpolating between the sample value of $F$ and the corresponding value of $\theta$.
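The two sampling steps for an unemployed worker can be sketched as follows: an inverse-transform draw of the unemployment duration, and a draw from a tabulated offer distribution by linear interpolation. This is a minimal illustration, not the authors' simulation code; the function names, the three-point grid for $\theta$ and $F$, and the value of $\lambda_0$ are all hypothetical.

```python
import bisect
import math
import random

def draw_unemployment_duration(lam0, rng):
    """Inverse-transform draw: t_u = -log(1 - r1) / lam0 with r1 uniform on [0, 1)."""
    r1 = rng.random()
    return -math.log(1.0 - r1) / lam0

def draw_offer(theta_grid, F_grid, rng):
    """Invert a tabulated offer CDF by linear interpolation between grid points."""
    r = rng.random()
    j = bisect.bisect_left(F_grid, r)  # first grid point with F >= r
    if j == 0:
        return theta_grid[0]
    if j >= len(F_grid):
        return theta_grid[-1]
    f_lo, f_hi = F_grid[j - 1], F_grid[j]
    w = (r - f_lo) / (f_hi - f_lo)
    return theta_grid[j - 1] + w * (theta_grid[j] - theta_grid[j - 1])

# Hypothetical inputs, for illustration only
lam0 = 0.05                    # monthly job offer rate while unemployed
theta_grid = [1.0, 1.5, 2.0]   # points on the baseline scale
F_grid = [0.0, 0.6, 1.0]       # offer CDF evaluated at those points

rng = random.Random(0)
durations = [draw_unemployment_duration(lam0, rng) for _ in range(20_000)]
offers = [draw_offer(theta_grid, F_grid, rng) for _ in range(20_000)]
print(f"mean unemployment duration: {sum(durations) / len(durations):.1f} months")
print(f"share of offers below 1.5:  {sum(o < 1.5 for o in offers) / len(offers):.2f}")
```

With these inputs the mean duration should be close to $1/\lambda_0 = 20$ months and roughly 60% of offers should fall below 1.5, which is a quick way to confirm both draws behave as intended.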
When the worker is employed, he faces three shocks: a "godfather" reallocation shock, a job offer shock and a job layoff shock. All these shocks follow Poisson processes with rates $\lambda_q$, $\lambda_1$ and $\delta$, respectively. What is important here is to track the duration of the job and the employment spells, where the latter is defined as the sum of job spells that start with the worker transiting from unemployment to employment and end with the worker becoming unemployed.
To obtain these durations we need to compute the durations until the worker receives a job offer, $t_j$, receives a displacement shock, $t_u$, or receives a reallocation shock, $t_r$. We do this by drawing three random numbers between 0 and 1 and using the inverse of the corresponding exponential distribution. The job duration until the worker experiences one of these three events is equal to $\min\{t_j, t_u, t_r\}$. If the worker becomes unemployed, $t_u = \min\{t_j, t_u, t_r\}$, we repeat the procedure described above for unemployed workers.

To compute the firm-specific wage rate and its correlation with the firm-specific tenure profile we estimate the AKM equation described in (24) using a 1 to 10 ratio between the number of firms (establishments) and workers. We have done several robustness exercises whereby we increased the number of firms to obtain a 1 to 5 ratio and decreased the number of firms to obtain a 1 to 50 ratio, without observing any meaningful change in our results. We also use the simulated panel to compute the coefficient of variation and the Mm ratio as measures of frictional wage dispersion. The latter two follow the same procedure as we use to compute these moments from the data.
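The competing-shock step for an employed worker, in which three exponential durations are drawn and the smallest determines both the event and the job duration, can be sketched as below. The function name, the event labels and the rates are hypothetical and serve only to illustrate the mechanics.

```python
import math
import random

def next_event_when_employed(lam1, delta, lam_q, rng):
    """Draw t_j, t_u, t_r by inverse transform and return the first event and its date."""
    draws = {
        "job_offer":    -math.log(1.0 - rng.random()) / lam1,   # t_j
        "layoff":       -math.log(1.0 - rng.random()) / delta,  # t_u
        "reallocation": -math.log(1.0 - rng.random()) / lam_q,  # t_r
    }
    event = min(draws, key=draws.get)
    return event, draws[event]

# Hypothetical monthly rates, for illustration only
lam1, delta, lam_q = 0.10, 0.02, 0.03

rng = random.Random(0)
events = [next_event_when_employed(lam1, delta, lam_q, rng) for _ in range(20_000)]
layoff_share = sum(e == "layoff" for e, _ in events) / len(events)
mean_wait = sum(d for _, d in events) / len(events)
print(f"share of spells ending in layoff: {layoff_share:.3f}")
print(f"mean time to first event:         {mean_wait:.2f} months")
```

A standard property of independent Poisson processes gives a check on the output: the layoff share should be close to $\delta/(\lambda_1 + \delta + \lambda_q)$ and the mean waiting time close to $1/(\lambda_1 + \delta + \lambda_q)$.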
After the simulated moments are computed, we evaluate the loss function (46). If the value of the loss function is too high, a new set of parameter values is chosen and the above procedure is repeated iteratively until the value of the loss function is sufficiently close to zero. For our minimisation algorithm we first use simulated annealing to perform a global search and then use a constrained minimisation procedure to perform a local search. Once the parameters that solve (46)

B.2 Data Moments
In this section we describe the procedure we follow to compute the data moments obtained from the auxiliary equations. In particular, the firm-specific wage rate and its correlation with the firm-specific (linear) tenure profile are computed as described in Table 2. As a robustness check we also used the estimates based on the 1993-1999 and 1998-2004 periods on their own, without any meaningful change in the model's parameter estimates.
To compute the average returns to experience and tenure we estimate a standard Mincer wage equation (47) based on the SIAB data, where $X$ is a vector of covariates consisting of a quadratic in actual experience, a quadratic in tenure and year dummies, and $\mu_{it}$ denotes the error term. To compute the correlation between the last completed unemployment duration and re-employment wages, we estimate equation (25) as described in the text using a fixed effect estimator, also using the SIAB data.
To compute the coefficient of variation and the Mm ratio we follow Hornstein et al. (2007) and first estimate the wage equation (47) for each year of the sample period and education group using OLS and the same covariates in $X$. We then eliminate unobserved worker heterogeneity from wages by using the individual residuals $\eta_{it}$ and their individual specific mean $\bar\eta_i$, where the latter captures the wage variation due to fixed unobserved individual factors. Finally, we use the estimated distribution of transformed wages, $w_{it} = \exp(\eta_{it} - \bar\eta_i)$, across individuals and time to calculate the coefficient of variation and Mm ratio for each education group. For each education group, we estimate a set of three Mm ratios using the minimum observed wage and the wage at the first percentile. Given that the ratios using the minimum observed wages are implausibly high, we report in the main text the one based on the first percentile. Once again we use the SIAB data for this purpose.
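The wage-dispersion step can be illustrated with a short sketch: residuals are demeaned within worker, exponentiated, and then summarised by the coefficient of variation and by the ratio of the mean wage to a low-percentile wage. The residuals below are made up, the function names are hypothetical, and the nearest-rank percentile rule is an assumption, since the text does not say how the first percentile is computed.

```python
import math
import statistics

def transformed_wages(residuals_by_worker):
    """w_it = exp(eta_it - mean_i(eta_it)), pooled across workers and periods."""
    wages = []
    for resid in residuals_by_worker.values():
        mean_i = statistics.fmean(resid)
        wages.extend(math.exp(e - mean_i) for e in resid)
    return wages

def coefficient_of_variation(wages):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(wages) / statistics.fmean(wages)

def mm_ratio(wages, pct=1):
    """Mean wage over the pct-th percentile wage (nearest-rank rule, an assumption)."""
    ordered = sorted(wages)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return statistics.fmean(ordered) / ordered[rank - 1]

# Made-up residuals for three workers, for illustration only
residuals = {
    "worker_1": [0.10, 0.25, 0.05],
    "worker_2": [-0.30, -0.10, -0.20],
    "worker_3": [0.00, 0.40, -0.10],
}

w = transformed_wages(residuals)
print(f"coefficient of variation: {coefficient_of_variation(w):.3f}")
print(f"Mm ratio (1st percentile): {mm_ratio(w):.3f}")
```

Demeaning within worker removes any fixed individual component, so a worker whose residuals never change contributes transformed wages equal to one and adds no dispersion.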
Finally, as described in the main text, we also use the SIAB data to compute the average employment, unemployment and job spell durations, but use the GSEOP to compute the ratio of involuntary to voluntary employer-to-employer transitions.

B.3 Empirical Wage/Earnings Losses
Our analysis of wage/earnings losses focuses on a sample of male, West German workers. Following Jacobson et al. (1993), Couch and Placzek (2010) and many others using administrative data, to reduce selection effects in our main analysis we only consider displaced workers who were part of a mass layoff event in year $y$. The identification of a mass layoff follows the criteria of Davis and von Wachter (2011): to qualify as a mass layoff event in year $y$, the employer must meet: (i) 50 or more employees in year $y - 2$; (ii) employment decreases by 30% to 99% from years $y - 2$ to $y$; (iii) employment in year $y - 2$ is no more than 130% of employment in year $y - 3$; (iv) employment in year $y + 1$ is less than 90% of employment in year $y - 2$. Due to data anonymization the number of employees in a given establishment is not exact, but given in 8 bands. We use the smallest value in a given band as a proxy for the number of employees, i.e. 1, 5, 10, 20, 50, 100, 200, or 500.
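The four mass layoff criteria translate directly into a simple check on an establishment's employment series. The sketch below is illustrative only: the function names and the dictionary layout (year mapped to proxied employment) are assumptions, and `band_proxy` simply maps an exact head count to the lower bound of its band as described above.

```python
BAND_LOWER_BOUNDS = (1, 5, 10, 20, 50, 100, 200, 500)

def band_proxy(n_employees):
    """Lower bound of the size band containing n_employees (requires n_employees >= 1)."""
    return max(b for b in BAND_LOWER_BOUNDS if b <= n_employees)

def is_mass_layoff(emp, y):
    """Apply criteria (i)-(iv) to emp, a dict mapping year -> proxied employment."""
    e_m3, e_m2, e_0, e_p1 = emp[y - 3], emp[y - 2], emp[y], emp[y + 1]
    if e_m2 < 50:                      # (i) at least 50 employees in y - 2
        return False
    drop = 1.0 - e_0 / e_m2            # proportional fall from y - 2 to y
    return (
        0.30 <= drop <= 0.99           # (ii) employment falls by 30% to 99%
        and e_m2 <= 1.30 * e_m3        # (iii) no large build-up before y - 2
        and e_p1 < 0.90 * e_m2         # (iv) no quick recovery in y + 1
    )

# Hypothetical establishment: stable at 100, then halves in 2003
employment = {2000: 100, 2001: 100, 2002: 100, 2003: 50, 2004: 50}
print(is_mass_layoff(employment, 2003))
```

For this hypothetical establishment the check returns `True`; an establishment with fewer than 50 employees in year $y - 2$ fails criterion (i) regardless of the size of the drop.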
To further reduce selection effects, in our baseline results we consider workers with at least 3 years of tenure in the establishment prior to the mass layoff event. However, we experimented with other tenure requirements. In particular, we considered at least 12 months or at least 24 months of tenure in the establishment prior to separation, obtaining very similar results across different specifications. We also experimented between using only full-time employment spells or pooling together full-time and part-time employment spells, but this makes little difference to our estimates due to the small proportion of part-time workers in our sample. We also impose that workers should be re-employed within at most 36 months after displacement. The window during which displacements are recorded is 1981-2005, with 3 pre-separation periods and 15 post-separation periods. We find that there are 815 instances of mass layoffs among the low educated group of workers, 2,975 instances among the medium educated group and 163 instances among the higher educated group. As is standard in the literature, we use all workers who did not lose their jobs as our control group.
Following Davis and von Wachter (2011) we estimate equation (26) in the main text separately for different values of $y$, which implies that for each coefficient $\varepsilon^w_t$ in (26) we obtain a distribution of its estimates from which we take the average. Alternatively we estimate equation (26) by pooling all the years. The results obtained using these two methods are very similar. The advantage of the latter is that we can report the standard errors for each coefficient. Table 6 reports these latter results for the pooled sample (all workers) as well as for each of the three individual education groups when using (log) wages. For this exercise, annual wages are constructed by averaging the real daily wages reported during months in which the establishment identifier is present (months of employment) in a given year $y$. Table 7 instead reports the estimated coefficients when estimating (26) on earnings by individual education group. In this case, earnings are created by adding the zero wages during the months in which the establishment identifier was missing (months of unemployment) to the real daily wage when the establishment identifier is present (months of employment). Annual earnings in year $y$ are constructed as the mean of earnings across all months in year $y$, including the zeros. In Tables 6 and 7, * denotes significance at the 10% level, ** at the 5% level and *** at the 1% level.
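The difference between the annual wage and annual earnings measures comes down to how months without an establishment identifier are treated. A minimal sketch, with a hypothetical function name and a made-up year of data in which `None` marks a month with no establishment identifier:

```python
def annual_wage_and_earnings(monthly_daily_wage):
    """Return (annual wage, annual earnings) from a year of monthly real daily wages.

    Entries are the real daily wage in months of employment and None in months
    where the establishment identifier is missing (months of unemployment).
    """
    employed = [w for w in monthly_daily_wage if w is not None]
    # Wage: average over months of employment only
    annual_wage = sum(employed) / len(employed) if employed else None
    # Earnings: average over all months, counting unemployed months as zero
    annual_earnings = sum(employed) / len(monthly_daily_wage)
    return annual_wage, annual_earnings

# Hypothetical worker: employed for nine months at a daily wage of 100
year = [100.0] * 9 + [None] * 3
wage, earnings = annual_wage_and_earnings(year)
print(wage, earnings)
```

In this example the wage measure stays at 100 while the earnings measure falls to 75, which is why earnings losses exceed wage losses in years containing unemployment.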
As discussed in the main text, there is a potential tension between the way we estimate the model, where all employment-to-unemployment (EU) transitions are used to identify $\delta$, and the analysis of the long-term earnings/wage losses of displaced workers, where we use mass layoffs to identify displacement. Here we show that this tension appears to be small, as estimating equation (26) using either mass layoff events or all EU transitions yields similar results. Figure 5 depicts the earnings losses of the low and medium education groups using either all EU transitions or mass layoffs, while Figure 6 depicts the (log) wage loss of these education groups when using either all EU transitions or mass layoffs. These graphs show that losses due to mass layoffs are slightly larger than when considering all EU separations, particularly for wages, but both follow very similar patterns. Jarosch (2015), using the same data (but a different sample), obtains a similar conclusion.

B.4 Data Construction
Since nearly all of our data work arises from the SIAB/BHP, we start by describing the main features in the construction of our SIAB/BHP sample. We merge the individual files with the establishment files on the establishment identifier and year. Then using the adjusted beginning and end of spell dates we compute the months when the spell started and ended. 25 As there can be many spells in the same month, especially when a worker moves between two labour market states, we calculate, for months at the beginning and end of a spell, how many days are part of the spell (on each "side"), assuming each month has 31 days.

If there are two spells with the same individual identifier and monthly date, we assign a "repeated spell" dummy variable to the second and following spells. If the repeated spell is shorter (in days) than the previous one, we assign it a dummy "repeated spell short" that is equal to one. If the repeated spell has a daily wage lower than the previous spell, we assign the dummy "repeated spell low wages" to be equal to one. If the repeated spell starts on the same daily date as the previous spell, and lasts for the same number of months, we assign a "repeated spell same time" dummy and set it equal to one. We then use the duration of spells in months to construct a monthly panel. To determine which observations to keep, first we drop repeated spells which started on the same date, have the same duration in months, and have lower daily wages than the previous spell. If two spells still coexist in a given month, we drop the repeated one with shorter duration.

25 We have to adjust the beginning and end of a spell as in the original data spells are split in such a way that they do not overlap; however, more than one spell in any given period is possible.
We also keep those spells that are liable to social security, which implies that spells of trainees, marginal part-time workers, employees in partial retirement, interns and student trainees, and workers having "other" employment status are dropped. Additionally soldiers, border guards, police, and other related professions as well as members of parliament, ministers and civil servants are also dropped.
To construct the education groups, we create 8 education groups based on combinations of formal schooling and vocational qualifications.

Employment spells are obtained as the number of consecutive months of employment, defined as observations not missing the establishment identifier. In case of more than one spell in a month, the spell with higher daily wages takes precedence, so gaps of up to one month do not break an employment spell. If the whole month is missing, it is considered an unemployment spell. The average employment spell is calculated as an average of the average length of employment spells for each person. Job spells are obtained as the number of consecutive months in which an individual was employed and the establishment identifier remained constant. Otherwise they are constructed in the same way as the employment spells. Unemployment spells are the consecutive months for an individual in which the establishment identifier is missing.

Employer-to-employer transitions are recorded when there is a change in establishment identifier but the worker remains employed. Employment-to-unemployment (unemployment-to-employment) transitions are recorded when the worker moves from employment (unemployment) to unemployment (employment).
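The spell and transition definitions can be illustrated on a single worker's monthly sequence of establishment identifiers. This is a simplified sketch that assumes the monthly panel has already been cleaned to one observation per month; the function names are hypothetical and `None` stands for a missing establishment identifier.

```python
from itertools import groupby

def spells(establishment_by_month):
    """Lengths (in months) of employment, job and unemployment spells."""
    emp_spells, job_spells, unemp_spells = [], [], []
    runs = groupby(establishment_by_month, key=lambda e: e is not None)
    for employed, run in runs:
        run = list(run)
        if employed:
            emp_spells.append(len(run))
            # Within an employment spell, a job spell ends when the identifier changes
            job_spells.extend(len(list(g)) for _, g in groupby(run))
        else:
            unemp_spells.append(len(run))
    return emp_spells, job_spells, unemp_spells

def transitions(establishment_by_month):
    """Sequence of EE, EU and UE transitions between consecutive months."""
    out = []
    for prev, cur in zip(establishment_by_month, establishment_by_month[1:]):
        if prev is not None and cur is None:
            out.append("EU")
        elif prev is None and cur is not None:
            out.append("UE")
        elif prev is not None and cur is not None and prev != cur:
            out.append("EE")
    return out

# Hypothetical worker: two months at A, one at B, two unemployed, one at C
history = ["A", "A", "B", None, None, "C"]
print(spells(history))
print(transitions(history))
```

For this history the employment spells last 3 and 1 months, the job spells 2, 1 and 1 months, and the unemployment spell 2 months, with one EE, one EU and one UE transition.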
In the case of the GSEOP, we construct a sample that is as close as possible to the SIAB sample described above. In these data, employer-to-employer transitions are defined as a transition from full-time employment to full-time employment when the reason for job change reported by the worker is either "job with new employer" or "company taken over". A voluntary employer-to-employer transition is one where the worker gave one of the following reasons for the termination of the previous job: "own resignation", "mutual agreement", or "leave of absence". An involuntary employer-to-employer transition is one where the worker reported "company shut down", "dismissal", or "temporary contract expired".
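The classification of employer-to-employer transitions by reported reason, and the resulting ratio of involuntary to voluntary transitions, can be sketched as a simple lookup. The reason strings are those listed above; the function names, the "unclassified" label for any other reason and the sample list are hypothetical.

```python
VOLUNTARY = {"own resignation", "mutual agreement", "leave of absence"}
INVOLUNTARY = {"company shut down", "dismissal", "temporary contract expired"}

def classify_ee(reason):
    """Label an employer-to-employer transition by the reported termination reason."""
    if reason in VOLUNTARY:
        return "voluntary"
    if reason in INVOLUNTARY:
        return "involuntary"
    return "unclassified"  # hypothetical label for reasons outside both lists

def involuntary_to_voluntary_ratio(reasons):
    """Ratio of involuntary to voluntary transitions; None if there are no voluntary ones."""
    labels = [classify_ee(r) for r in reasons]
    voluntary = labels.count("voluntary")
    if voluntary == 0:
        return None
    return labels.count("involuntary") / voluntary

# Made-up reported reasons, for illustration only
sample = ["own resignation", "dismissal", "mutual agreement",
          "own resignation", "company shut down", "leave of absence"]
print(involuntary_to_voluntary_ratio(sample))
```

In this made-up sample there are two involuntary and four voluntary transitions, giving a ratio of 0.5.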