Optimal Income Tax in an Extensive Labor Supply Life-cycle Model

This article considers a stationary economy populated with overlapping generations that reproduce identically in continuous time. Each dynasty has a productivity and an opportunity cost of going to work that vary with age. Labor supply is extensive. At each date, the typical agent can either work full time or not work at all. The decision to work is based on a comparison between after-tax income and the privately known opportunity cost of work. We assume that the utilitarian government, which aims at redistributing lifetime utility across dynasties, has a single policy instrument, a stationary income tax schedule that is a function of current income. The article develops a method to study the government problem. This technique is applied to derive the properties of the optimal income tax schedule in a number of examples.


Introduction
We study income taxation in a dynamic stationary economy with overlapping generations, made of dynasties with fixed lifetime that reproduce identically. Time is continuous. Labor supply is extensive. At each date, one can either work full time or not work at all. The typical agent is characterized by an invariant instantaneous utility function for consumption and deterministic profiles of productivity, that is, production when working full time, and pecuniary cost of going to work. For simplicity, we assume that these profiles are deterministic. The government aims at redistributing lifetime welfare across dynasties, using the income tax. We put a number of restrictions on the government instrument, based on casual observation of developed economies: we suppose that the tax schedule is invariant over time and that tax depends only on current labor income. It cannot depend on the pecuniary cost of going to work, which is implicitly not verifiable by the tax authority. All non-workers are identically treated (they pay the same amount or receive the absolute value of the transfer if tax is negative). This transfer is independent of the (potential) productivity, supposed not to be verifiable for non-working individuals. We derive the properties of the optimal tax system in this setup.
Its dynamic structure makes the article unlike the bulk of the optimal taxation literature that takes place in a static model; see, for example, the survey of Piketty and Saez (2013). Also, it is different from the intensive labor supply models that follow Mirrlees, where the agents only differ by their productivities along a single dimension of heterogeneity. It shares features of the extensive model, notably the heterogeneity in the cost of going to work, which may generate an optimal subsidy of the labor supply of the low-skilled agents as in Choné and Laroque (2005).
The dynamic setup is similar (overlapping generations, deterministic trajectories, ...) to that used in a number of recent works, such as Rogerson (2011), Shourideh and Troshkin (2012), or Weinzierl (2011). Our contribution with respect to these studies is in our focus on the extensive margin and the treatment of the two-dimensional heterogeneity on productivity and the cost of going to work.
To be able to derive the properties of the optimal tax system, we rule out tax schemes that depend on the history of earnings, an assumption which we feel is justified by casual observation. This makes our analysis very different from that of Brito et al. (1991) and, more recently, the dynamic public finance literature; see, for example, Kocherlakota (2005, 2010). These works are interested in the dynamic revelation of information. We stick to a simpler redistributive problem where the nonlinear tax is constrained to be a function of current income. We also do not allow income tax to depend on age, contrary to Weinzierl (2011). Again our justification is casual observation. A natural way to have transfers conditional on age would be to introduce pensions. This is outside the scope of the current article and should be the subject of future work.
The main contribution of the article is our characterization of the optimal tax scheme in this dynamic extensive labor supply environment. Indeed, the optimal tax can be described as the result of a balance between two forces: a redistributive force, holding labor supply constant, and an efficiency force, which comes from changes in labor supply and production. This generalizes the first-order conditions from Laroque (2011). There is a lot of bunching at the optimum: the tax scheme often is piecewise constant, and there are regions where the marginal tax rate is 100%. This is due to the extensive character of labor supply: what is important is whether to work or not to work, hence the average and not the marginal tax rate. Otherwise the model is too general to lead by itself to definite policy implications. Given the trajectories of the productivity and opportunity cost of work of the dynasties in the economy, and given also the tastes for redistribution of the government, our analysis allows us to compute the optimal tax schedule. Without further restrictions on the parameters of the economy, however, the tax schedule is largely unconstrained. An interesting feature, for policy purposes, is the treatment of low-skilled agents, whose labor supply may be distorted downward as in the traditional Mirrleesian intensive models or distorted upward, justifying the American Earned Income Tax Credit (EITC) and the introduction of negative taxes.
We have investigated in more depth economies with two types of agents. We show that the optimal allocation changes dramatically depending on the source of the heterogeneity. When the agents primarily differ by their opportunity costs of work, the income tax schedule solves a standard equity/redistribution trade-off with each of the two types of agents having their labor supply distorted downward. By contrast, when the agents primarily differ through their productivity, the tax schedule is used to subsidize low-skilled work, and hence generates upward labor supply distortions. The logic at work is reminiscent of Choné and Laroque (2011). Here, however, the mechanism comes from the low-skilled agent being the only one working at low productivities, due to the form of her life cycle trajectory, rather than from the shape of the distribution of the social weights in the population.
The article is organized as follows. Section 2 presents the model. Section 3 deals with optimal taxation, introducing the efficiency and redistributive forces. Finally, Section 4 studies in detail the properties of optimal tax in the case where there are two types of agents in the economy.

Model
We consider an economy in continuous time. All agents have the same life length, normalized to one. The types $i$ of the agents belong to a set $I$. An agent of type $i$ has a lifetime utility function of the form

$$\int_0^1 u_i(c_i(a))\,da,$$

where $u_i$ is an increasing concave function and $c_i(a)$ denotes consumption at age $a$. Agent $i$, if she works at age $a$, produces at most $w_i(a)$ units of a single homogeneous good. Going to work on the market, and therefore producing $w_i(a)$, has an opportunity cost for the agent, for instance because it takes time otherwise devoted to family gardening or to childcare. This cost varies with age along the life of agent $i$. We assume that this dependence with respect to age is deterministic and known. Formally, the opportunity cost of work of agent $i$ is pecuniary, measured in units of good, and represented by the function $d_i(a)$.
The type of an agent is thus characterized by a couple of exogenous, nonnegative functions $(w_i(\cdot), d_i(\cdot))$ defined on $[0, 1]$ and by the instantaneous utility index $u_i(\cdot)$. The pair $(w_i(a), d_i(a))$, as the age $a$ varies, determines a curve in the $(w, d)$ space, which we call a 'trajectory'. We assume that the functions $w_i$, $d_i$, and $u_i$ are differentiable. The economy potentially exhibits a lot of heterogeneity.
At each date $t$, for each $i$ in $I$, the economy contains a continuum of agents of type $i$ of all ages $a$ in $[0, 1]$; over time the older agents die and are replaced by newborns of the same type. Cohort $i$ has size $n_i$, with $\sum_{i \in I} n_i$ normalized to 1, and the economy is stationary. An 'allocation' specifies the nonnegative consumption $c_i(a)$ and the labor supply $\ell_i(a)$ in {0, 1} of all types $i$ along their lives.
Furthermore, we assume that there are perfect financial markets for transferring wealth across time, with a zero interest rate. The agents use these markets to smooth their consumption over time, $c_i(a) = c_i$ independent of age. From now on, we restrict our attention to allocations where consumption is constant and equal to its aggregate value over the lifetime, $c_i$.

Feasibility
An allocation is 'feasible' if and only if total consumption does not exceed total output net of production cost:

$$\sum_{i \in I} n_i c_i \le \sum_{i \in I} n_i \int_0^1 [w_i(a) - d_i(a)]\,\ell_i(a)\,da. \tag{1}$$

An allocation is 'efficient' whenever output net of production costs is maximized, that is, any agent works whenever her opportunity cost of work is lower than or equal to her productivity: $\ell_i(a) = 1$ if $d_i(a) < w_i(a)$ and $\ell_i(a) = 0$ if $d_i(a) > w_i(a)$.
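As a rough numerical illustration of the efficiency criterion, the sketch below computes one agent's lifetime net output under two labor supply rules. The linear profiles `w` and `d`, the age grid, and the helper name `net_output` are assumptions made for illustration only; none of them comes from the article.

```python
import numpy as np

def net_output(w, d, ell, ages):
    """Trapezoid approximation of the integral of (w - d) * ell over age."""
    y = (w - d) * ell
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(ages)))

# Hypothetical profiles on the unit lifetime: productivity falls, cost rises.
ages = np.linspace(0.0, 1.0, 1001)
w = 2.0 - ages
d = 0.5 + 2.0 * ages

# Efficient rule: work exactly when the cost of work is below productivity.
ell_efficient = (d < w).astype(float)
# Comparison rule: work at all ages, including ages where d exceeds w.
ell_always = np.ones_like(ages)

y_efficient = net_output(w, d, ell_efficient, ages)
y_always = net_output(w, d, ell_always, ages)
```

With these assumed profiles, the efficient rule stops work at age 0.5 and yields a net output of about 0.375, against roughly zero when the agent works at all ages.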

Utilitarian optimum (first best)
The utilitarian optimum is the allocation that maximizes $\sum_{i \in I} n_i u_i(c_i)$ subject to the feasibility constraint (1). It is the feasible efficient allocation such that marginal utilities are equal: $u_i'(c_i) = \lambda$ for all $i$ in $I$, where $\lambda$ is the multiplier of constraint (1).

Laissez-faire
The agents maximize their lifetime consumption

$$c_i = \int_0^1 [w_i(a) - d_i(a)]\,\ell_i(a)\,da.$$

They decide to work whenever their productivity is larger than their opportunity cost of work,¹ so the laissez-faire equilibrium is efficient. In general, laissez-faire yields an allocation that differs from the utilitarian optimum.
'In all of the article we suppose that the utilitarian government observes the employment status of the agents and, when they work, their productivity w. It never observes the pecuniary cost d, which is private information.'

Income tax
We study redistributive taxation in a setup where the tax schedule is assumed to be age independent and time invariant. The tax schedule is made of a function R(w), the after-tax income of a worker with before tax wage w, and of a scalar s equal to the subsistence income of the non-workers.

Second-best program
Facing the tax schedule $(R(\cdot), s)$, the consumer chooses her labor supply $\ell(a)$ so as to maximize her lifetime utility, that is, her lifetime consumption

$$\int_0^1 \big\{[R(w_i(a)) - d_i(a)]\,\ell(a) + s\,[1 - \ell(a)]\big\}\,da,$$

where $\ell(a)$ belongs to {0, 1}. Feasibility then can be written in two equivalent ways, either as a balanced government budget:

$$\sum_{i \in I} n_i \int_0^1 \big\{[w_i(a) - R(w_i(a))]\,\ell_i(a) - s\,[1 - \ell_i(a)]\big\}\,da = 0,$$

or as the equality of aggregate production and aggregate consumption:

$$\sum_{i \in I} n_i c_i = \sum_{i \in I} n_i \int_0^1 [w_i(a) - d_i(a)]\,\ell_i(a)\,da.$$

The second-best allocation maximizes the sum of utilities under the above constraints.

1 Suppose that $d_i(a)$ is a disutility cost instead of a pecuniary one, i.e. agent $i$, when working, produces $w_i(a)$ and has instantaneous utility $u(c_i(a)) - d_i(a)$, while she has instantaneous utility $u(c_i(a))$ when not working. Then agent $i$ works at age $a$ under laissez-faire if and only if $u_i'(c_i)\,w_i(a) > d_i(a)$, where $c_i$ is her constant, instantaneous consumption level. Hence, this specification entails an 'income effect' in labor supply: participation decreases with $c_i$. Using Pareto-optimality conditions, it can be checked that laissez-faire is efficient. The pecuniary model adopted in this article avoids these complications.
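The equivalence of the two formulations of feasibility can be checked in one line. The sketch below assumes that agent $i$'s lifetime consumption is her after-tax income net of the pecuniary cost when working, plus the subsistence income when not working; this is our reading of the setup rather than a formula quoted from the article.

```latex
\begin{align*}
\sum_{i \in I} n_i c_i
  &= \sum_{i \in I} n_i \int_0^1 \Big\{ [R(w_i(a)) - d_i(a)]\,\ell_i(a)
     + s\,[1 - \ell_i(a)] \Big\}\,da \\
  &= \sum_{i \in I} n_i \int_0^1 [w_i(a) - d_i(a)]\,\ell_i(a)\,da
   - \underbrace{\sum_{i \in I} n_i \int_0^1 \Big\{ [w_i(a) - R(w_i(a))]\,\ell_i(a)
     - s\,[1 - \ell_i(a)] \Big\}\,da}_{\text{net tax revenue}} .
\end{align*}
```

Under this reading, aggregate consumption equals aggregate net output exactly when net tax revenue is zero, which is the balanced-budget condition.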

Optimal Income Tax
When an agent has productivity $w$ at some date, her financial incentive to work is equal to $R(w) - s$, which is to be compared with the opportunity cost of work $d$. It is useful to represent the financial incentive to work in the same plane as the individual trajectories $(w(a), d(a))$.
Hereafter, the 'incentive schedule' is the curve $(w, R(w) - s)$ as productivity varies. An agent works in regions where her trajectory is located below the incentive schedule, that is, where her opportunity cost of work $d$ is smaller than the financial incentive to work $R(w) - s$. Her work status changes at points where her trajectory crosses the incentive schedule.
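The work rule just described can be sketched numerically. In the snippet below, the linear profiles, the subsistence income `s`, the linear schedule `R`, and the helper names `labor_supply` and `switch_ages` are all hypothetical choices made for illustration; they are not taken from the article.

```python
import numpy as np

def labor_supply(w, d, R, s):
    """Return 1.0 at ages where d(a) < R(w(a)) - s, and 0.0 elsewhere."""
    return (d < R(w) - s).astype(float)

def switch_ages(ages, ell):
    """Grid ages at which the work status changes."""
    change = np.nonzero(np.diff(ell) != 0)[0]
    return ages[change + 1]

ages = np.linspace(0.0, 1.0, 1001)
w = 2.0 - ages   # hypothetical productivity, declining with age
d = 0.2 + ages   # hypothetical cost of work, rising with age
s = 0.3          # hypothetical subsistence income

def R(wage):
    """Hypothetical after-tax income, so that R(w) - s = 0.6 * w."""
    return s + 0.6 * wage

ell = labor_supply(w, d, R, s)
retirement = switch_ages(ages, ell)
```

With these assumed numbers the trajectory crosses the incentive schedule once, near age 0.625, whereas productivity only falls to the cost of work at age 0.9; the agent's labor supply is therefore distorted downward in this illustration.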
Assuming that the agents can choose occupations requiring skills below their own ability, no one would choose an occupation whose required productivity belongs to a decreasing part of the function $R$, preferring to produce less and to earn a higher after-tax income. Formally, we can replace any function $R$ with $\tilde R(w) = \max_{w' \le w} R(w')$. It follows that, without loss of generality, we limit our attention to functions $R$ that are nondecreasing and assume that workers work at full productivity. Lifetime consumption of agent $i$ is therefore given by

$$c_i = s + \int_0^1 [R(w_i(a)) - s - d_i(a)]\,\ell_i(a)\,da.$$

For notational simplicity, we do not mention the policy instruments $R$ and $s$ in the arguments of the labor supply functions $\ell_i$. The Lagrangian of the problem reduces to

$$\mathcal{L} = \sum_{i \in I} n_i \big\{u_i(c_i) + \lambda\,[Y_i(\ell_i) - c_i]\big\},$$

where $\lambda$ is the multiplier of the government budget and $Y_i(\ell_i)$ is agent $i$'s lifetime net output:

$$Y_i(\ell_i) = \int_0^1 [w_i(a) - d_i(a)]\,\ell_i(a)\,da.$$

The problem is to find the tax instruments $(R(\cdot), s)$ which maximize the Lagrangian $\mathcal{L}$ subject to the constraint that $R(\cdot)$ be nondecreasing. An equal translation of $R(\cdot)$ and of the subsistence income $s$, which does not alter labor supply, yields the first-order necessary condition:

$$\sum_{i \in I} n_i\,u_i'(c_i) = \lambda.$$

The Lagrangian depends on the tax schedule through two channels: consumption levels $c_i$ and labor supplies $\ell_i$. Hereafter, we label 'redistribution force' and 'efficiency force' the effect of $R$ through these respective channels. The first force is present at all productivity levels, while the second is active only at points $w$ where an agent is indifferent between working and not working.

Formally, we compute the Fréchet derivatives of the Lagrangian of the government problem, seen as a functional that maps the set of functions $R$ into $\mathbb{R}$. To this aim, we evaluate the Lagrangian at a slightly perturbed function $R + \varepsilon h$, compute the ratio $[\mathcal{L}(R + \varepsilon h) - \mathcal{L}(R)]/\varepsilon$, and let $\varepsilon$ tend to zero. A mathematical derivation of the limit can be found in Appendix A. Here we present a heuristic approach to the differentiation.
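The replacement of an arbitrary schedule by its nondecreasing envelope, the running maximum of $R$ over lower productivities, is easy to sketch on a grid. The wage grid, the schedule values, and the function name `monotone_envelope` below are illustrative assumptions, not objects from the article.

```python
import numpy as np

def monotone_envelope(R_values):
    """Running maximum: entry k is max(R_values[0..k]) on an increasing wage grid."""
    return np.maximum.accumulate(np.asarray(R_values, dtype=float))

wages = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical increasing wage grid
R_raw = np.array([1.0, 1.5, 1.2, 1.4, 2.0])   # hypothetical schedule with a dip
R_tilde = monotone_envelope(R_raw)
```

In this toy case, workers with wages 3 and 4 would rather take the occupation paying 1.5 after tax, so the envelope is flat at 1.5 over the dip.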

The redistribution force
This force comes from the dependence of lifetime consumptions on the after-tax schedule. Suppose we replace the after-tax income $R$ with $R + dR$ on the interval $[w, w + dw]$, with $dw > 0$. This change in after-tax income translates into a change in consumption for the agents who work at productivity levels in $[w, w + dw]$. The change in agent $i$'s lifetime consumption is given by

$$dc_i = dR\,dT_i(w; \ell_i),$$

where $T_i(w; \ell_i)$ denotes the time spent by agent $i$ with worktime profile $\ell_i$ working at a productivity lower than or equal to $w$ and, accordingly, its derivative $dT_i(w; \ell_i)$ represents the time spent by agent $i$ working at a productivity between $w$ and $w + dw$. By construction, $T_i(w; \ell_i)$ is a nondecreasing function of $w$. The limit of $T_i(w; \ell_i)$ as $w$ goes to infinity is the total time agent $i$ works over her life cycle, hereafter denoted $L_i$.
The derivative of $T_i(w; \ell_i)$ with respect to $w$, $dT_i(w; \ell_i)$, is a positive measure which is almost everywhere continuous, possibly having mass points at productivity levels where agent $i$ spends non-infinitesimal periods of time. If we think of agent $i$'s productivity when she works as a random variable, the probability measure $dT_i(w; \ell_i)/L_i$ can be thought of as the distribution of that random variable. Suppose agent $i$'s trajectory crosses the incentive schedule from below at $w_0$, that is, the agent works for $w \le w_0$ and does not work for $w \ge w_0$ along the trajectory. Then $dT_i$ has a downward discontinuity at $w_0$, and $T_i$ has a concave kink at $w_0$. If the trajectory crosses the schedule from above, then the kink of $T_i$ is convex.
By the chain rule, the variation of the Lagrangian coming from the changes in lifetime consumptions is given, per unit of $dR$, by $d\mathcal{L} = d\Phi(w; \ell)$, where $\Phi(w; \ell)$ is the social marginal utility of income (net of the cost of public funds) for workers with productivity below $w$:

$$\Phi(w; \ell) = \sum_{i \in I} n_i\,[u_i'(c_i) - \lambda]\,T_i(w; \ell_i). \tag{7}$$

The term $d\Phi(w; \ell)$ reflects the redistributive force. Redistribution induces the government to raise (lower) after-tax income in regions where $d\Phi(w; \ell) > 0$ ($d\Phi(w; \ell) < 0$). The observation that $\lambda$ is the average of marginal utilities yields the following result.
Lemma 1. The net social marginal utility of income of workers with productivity below $w$, $\Phi(w; \ell)$, has the same sign as the correlation between marginal utilities $u_i'(c_i)$ and working times $T_i(w; \ell_i)$.
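The step behind Lemma 1 can be spelled out as follows. The sketch assumes that the net social marginal utility of income has the form $\Phi(w; \ell) = \sum_i n_i [u_i'(c_i) - \lambda] T_i(w; \ell_i)$, and it uses the normalization $\sum_i n_i = 1$ together with the fact that $\lambda$ is the average of marginal utilities, $\lambda = \sum_i n_i u_i'(c_i)$. The covariance notation $\operatorname{Cov}_n$ (covariance across types with weights $n_i$) is ours.

```latex
\begin{align*}
\Phi(w;\ell)
  &= \sum_{i \in I} n_i\,u_i'(c_i)\,T_i(w;\ell_i)
     - \lambda \sum_{i \in I} n_i\,T_i(w;\ell_i) \\
  &= \sum_{i \in I} n_i\,u_i'(c_i)\,T_i(w;\ell_i)
     - \Big(\sum_{i \in I} n_i\,u_i'(c_i)\Big)
       \Big(\sum_{i \in I} n_i\,T_i(w;\ell_i)\Big) \\
  &= \operatorname{Cov}_n\big(u_i'(c_i),\,T_i(w;\ell_i)\big).
\end{align*}
```

Since a covariance and the corresponding correlation always share the same sign, the statement of the lemma follows under these assumptions.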

Labor supply elasticity
A change in the tax schedule may also affect labor supply. We say that there is 'indifference' at $w$ if there exists an agent $i$, having productivity $w$ at some age $a_i$, $w = w_i(a_i)$, who is indifferent between working and not working at this age, that is, $R(w) - d_i(a_i) = s$. A 'switch point' is an indifference point such that the work status of the indifferent agent changes in a neighborhood of $w$, that is, the trajectory of agent $i$ crosses the incentive schedule at $w$. When the slopes of the tax schedule and of the trajectory are different, $R'(w) \ne d_i'/w_i'$, the quantity

$$\eta_i(w; R) = \frac{1}{|d_i'(a_i) - R'(w)\,w_i'(a_i)|}$$

is positive and finite. Consider a switch point $w$ and replace $R$ with $R + dR$ on the interval $[w, w + dw]$, with $dR = (d_i'/w_i' - R')\,dw$, as shown on Figure 1. (In the represented example, the trajectory is decreasing in the $(w, d)$ space; specifically the agent's productivity and cost of work, respectively, decline and rise with age.) The perturbation changes the status of the agent on the interval from working to non-working. The time spent in the interval is

$$\frac{dw}{|w_i'(a_i)|} = \eta_i(w; R)\,|dR|;$$

hence, $\eta_i$ is the absolute value of the derivative of labor supply with respect to the tax schedule $R$. When $R - s$ increases by 1%, the time agent $i$ spends working at a productivity below $w$ is increased by $\varepsilon_i(w; R)$ percent, where $\varepsilon_i(w; R)$ denotes the elasticity of agent $i$'s labor supply, $T_i(w; \ell_i)$, with respect to financial incentives to work:

$$\varepsilon_i(w; R) = \eta_i(w; R)\,\frac{R(w) - s}{T_i(w; \ell_i)}. \tag{9}$$

The labor supply elasticity depends on both the gradient of the trajectory and the slope of the after-tax schedule at the switch point. In particular, the steeper the tax schedule at $w$, the lower the elasticity, because the agent spends less time in the region affected by the perturbation. The above formula is readily adapted if agent $i$'s trajectory crosses the tax schedule more than once.
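The response of working time to the financial incentive at a switch point, $1/|d_i' - R'\,w_i'|$, can be checked by a finite difference. The linear profiles, the slope 0.6 of the schedule, the uniform shift of the incentive, and the helper names below are hypothetical choices for illustration; the bisection routine is a generic root finder, not a method taken from the article.

```python
def incentive_gap(a, shift):
    """R(w(a)) - s + shift - d(a) for hypothetical linear profiles."""
    w = 2.0 - a                  # productivity, w'(a) = -1
    d = 0.2 + a                  # cost of work, d'(a) = +1
    return 0.6 * w + shift - d   # incentive R(w) - s = 0.6 * w, so R'(w) = 0.6

def switch_age(shift, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection for the age where the gap changes sign (work before, not after)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if incentive_gap(mid, shift) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eps = 1e-4
# The agent works from age 0 until she switches, so working time equals the switch age.
response_numeric = (switch_age(eps) - switch_age(0.0)) / eps
response_formula = 1.0 / abs(1.0 - 0.6 * (-1.0))   # 1 / |d' - R' * w'|
```

Both numbers are close to 0.625 in this example: raising the incentive by a small amount lengthens working life by about 0.625 times that amount.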
Formally, the Fréchet derivative of $T_i(w; \ell_i)$ with respect to the tax schedule $R$ is a positive measure made of mass points at agent $i$'s switch points below $w$, see equation (16), where $S(w)$ is the set of agents who switch at $w$.

The efficiency force
A marginal change of the incentives to work, $d(R - s)$, on a small interval around a switch point $w$ of agent $i$ has only a second-order effect on her permanent income because she is indifferent between working and not working at this point. Such a change, however, affects the net output she produces over her life cycle:

$$dY_i = (w - d)\,\eta_i(w; R)\,d(R - s),$$

where $d = R(w) - s$ is agent $i$'s cost of work at the switch point. At the same time, the change affects the government revenue. For instance, if $dR > 0$ and $w > R - s$, the variation of the tax schedule induces the agent to switch from not working to working on a short period of her life, which raises the government revenue. We define the efficiency force as

$$d\Psi(w; \ell) = \lambda \sum_{i \in I} n_i\,dY_i.$$

We show formally in Appendix A that this force is a discrete measure concentrated on the set of all switch points:

$$\Psi(w; \ell) = \lambda \sum_{r \in S} n_i\,[w_r - R(w_r) + s]\,\eta_i(w_r; R)\,1_{w_r \le w}, \tag{12}$$

where $S$ is the set of all agents' switch points, $w_r$ is the productivity level at $r$, and $\varepsilon(w; R)$ is the total labor supply elasticity given by (10). The previous analysis is summarized in the following proposition.

Proposition 1. The Lagrangian $\mathcal{L}$ is differentiable at any point $(w, R(w) - s)$ where no trajectory is tangent to the incentive schedule. Its derivative can be written as the sum

$$d\mathcal{L} = d\Phi(w; \ell) + d\Psi(w; \ell), \tag{13}$$

where the almost everywhere continuous measure $d\Phi(w; \ell)$ given by (7) and the discrete measure $d\Psi(w; \ell)$ given by (12) represent the redistribution and efficiency forces.
Raising $R$ at an indifference point increases labor supply, which alleviates the government budget constraint if $w > R - s$ and makes it more stringent if $w < R - s$. Hence, income maximizing pushes the government to raise (lower) after-tax income in regions where $w > R - s$ ($w < R - s$). This force translates into mass points in the derivative of the Lagrangian or even into discontinuity points in the Lagrangian function.
On the other hand, the redistributive force, expressed in the term (7), is absolutely continuous (except at productivity levels where some workers spend a finite time): the redistributive effect of an increase in the after-tax income on an interval of productivities is the integral on the interval of the net social marginal utility of income $d\Phi$.

Finite number of types
The above analysis allows us to concentrate attention on a particular class of tax schedules when $I$ is finite: 'When the number of types is finite, the second-best optimum may be implemented with an incentive schedule that is piecewise either constant or coincident with an increasing trajectory.' The proof is in Appendix B. When the tax schedule coincides with an increasing trajectory, the government faces a particularly strong efficiency force.² Otherwise, the monotonicity constraint binds. Putting the signs of $d\Phi(w; \ell)$ and $d\Psi(w; \ell)$ on the diagram of trajectories allows us to qualitatively separate intervals of productivities where the redistribution and efficiency forces tend to push $R$ up from those where these forces are downward.
Since we expect bunching to be the norm, it is worthwhile to spell out the form of first-order conditions under bunching. Consider a bunching interval $[w_0, w_1]$. We can raise or lower $R$ on the whole bunching interval, raise it on right subintervals $[w, w_1]$, and lower it on left subintervals $[w_0, w]$. None of these variations should increase the Lagrangian, which yields the first-order conditions:

$$\int_{[w, w_1]} d\mathcal{L} \le 0$$

for all $w$ in the interval, with equality for $w = w_0$. This implies in particular that $d\mathcal{L}$ is nonnegative at $w_0$ and non-positive at $w_1$.

An Example: Two Types and Decreasing Trajectories
The results of the previous analysis greatly simplify the computation of the optimal tax schedule in economies covered by our assumptions. We can derive stronger analytical results when the environment is simpler. In this last section, we consider economies with two types of agents, a high type H and a low type L, endowed with the same utility function $u$. To adapt arguments based on incentives, we need the trajectories to be single crossing, which pushes us to focus on the second parts of lives, when productivity decreases with age while the opportunity cost of work increases with age. We also need the two types to be unambiguously ranked, the high type being more productive on the market and with a smaller opportunity cost of work than the low type, at all ages: $w_H(a) > w_L(a)$ and $d_H(a) < d_L(a)$ for all $a$ in $[0, 1]$. We also suppose that there is a natural retirement age: the trajectories intersect the 45 degree line.³ This set of assumptions is consistent with many different patterns. If the agents' productivities are very close while their opportunity costs of work are very different, agent L's trajectory is above agent H's in the $(w, d)$ plane; see Figure 3. In the opposite case, agent H's trajectory lies to the right of that of agent L; see Figure 4. The trajectories may very well cross, possibly many times, meaning that the same characteristics (productivity, cost) are reached by the two agents at different ages. Formally, the following properties hold.
Assumption 4.1 (Decreasing trajectories). The two agents have the same utility function $u$. Their productivities, $w_H(a) > w_L(a)$, decrease with age and their pecuniary costs of work, $d_H(a) < d_L(a)$, increase with age. There exist ages $a_L^*$ and $a_H^*$ in (0, 1) such that $w_i(a_i^*) = d_i(a_i^*)$ for $i = H, L$.

The fact that type H dominates pointwise type L implies that its consumption and welfare are at least as large, whatever the tax schedule, $c_H \ge c_L$. Any nondecreasing tax schedule crosses each trajectory only once, respectively, at ages $a_H$ and $a_L$, $a_H \ge a_L$, with associated wages $w_H(a_H)$ and $w_L(a_L)$ and opportunity costs of work $d_H(a_H)$ and $d_L(a_L)$. The wages $w_H(a_H)$ and $w_L(a_L)$ represent the lowest productivities at which the agents work. The following proposition, whose proof is in Appendix C, provides the list of all possible configurations at the second-best optimum. Then we present two examples that illustrate how unobserved heterogeneity affects the labor supply distortions.

Proposition 2. Under Assumption 4.1, the following properties hold:
i. There exists an optimal tax schedule with at most two values;
ii. Agent H has her labor supply distorted downward;
iii. Agent L's labor supply can be distorted in any direction or undistorted;
iv. Agent H retires later and enjoys higher lifetime consumption than agent L: $a_H > a_L$ and $c_H > c_L$.

We now illustrate the impact of the heterogeneity on labor supply distortions, see points (ii) and (iii) of Proposition 2. We use two examples where the agents differ only in one dimension, either productivity or opportunity cost of work. These examples, therefore, are at the limit of what is permitted by Assumption 4.1. Again, we focus on the second part of the agents' lives.

3 A referee has asked us to what extent the results below can be generalized. We certainly can have more than two types. But the monotonicity and ranking of the dynasties are an essential element of the argument.

Example 1 (Same productivities, different opportunity costs of work). In addition to Assumption 4.1, suppose that the agents are equally productive, $w_H(a) = w_L(a)$ for all $a$, while agent H has a lower pecuniary cost of work: $d_H(a) < d_L(a)$ for all $a$. Then at the optimum, both agents have their labor supply distorted downward.
In Example 1, we must be in case (1) of the proof of Proposition 2. Moreover, the configuration with $d_L(a_L) = d_H(a_H)$ is not possible here. Indeed, as the two agents would work exactly the same time at productivities above any threshold $w$, an increase of the tax schedule above $w(a_L) - \varepsilon$ for a small $\varepsilon > 0$ would have no redistributive effect and a positive efficiency effect at $w(a_L)$, a contradiction. The optimal schedule is therefore discontinuous at $w(a_L)$, as shown on the left panel of Figure 5.
Example 2 (Different productivities, same opportunity costs of work). In addition to Assumption 4.1, suppose that the agents have the same pecuniary costs of work, $d_H(a) = d_L(a)$ for all $a$, while agent H is more productive: $w_L(a) < w_H(a)$. Then at the optimum, agent L has her labor supply distorted upward.

In Example 2, we must be in case (2) of the proof of Proposition 2. Moreover, the configuration where the tax schedule is flat is not possible here. Indeed, the equality $d_L(a_L) = d_H(a_H)$ would imply $a_L = a_H$, meaning that the two agents would have the same total working time: a uniform increase of $R - s$ would thus have no redistributive effect and a positive efficiency effect at $w_H(a_H)$, a contradiction. The optimal schedule is therefore discontinuous at $w_L(a_L)$, as shown on the right panel of Figure 5.
In Example 1, the heterogeneity primarily comes from the opportunity cost of work, while in Example 2, it comes from the productivity. In both cases, the government cannot implement the first best in these two-type economies. The direction of the distortions, however, is sensitive to the source of the heterogeneity.
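The role played by a flat schedule in the two examples can be illustrated with a small computation of retirement ages. The linear cost profiles, the flat incentive level, and the helper name `retirement_age` are hypothetical; the sketch only compares retirement ages under a given flat incentive and does not compute an optimal schedule.

```python
def retirement_age(cost, incentive, tol=1e-12):
    """Age at which an increasing cost d(a) reaches a flat incentive R - s (bisection on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cost(mid) < incentive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

incentive = 0.7   # hypothetical flat value of R - s

# In the spirit of Example 1: costs differ, with d_H below d_L at all ages.
a_H_1 = retirement_age(lambda a: 0.2 + a, incentive)
a_L_1 = retirement_age(lambda a: 0.4 + a, incentive)

# In the spirit of Example 2: identical costs, so a flat schedule cannot separate the types.
a_H_2 = retirement_age(lambda a: 0.2 + a, incentive)
a_L_2 = retirement_age(lambda a: 0.2 + a, incentive)
```

Under these assumed profiles, the flat incentive makes H retire later than L when costs differ (ages 0.5 and 0.3), while both types retire at the same age when costs coincide, which is the flat-schedule configuration that Example 2 rules out at the optimum.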

A.2 Labor supply elasticity and efficiency force
Labor supply is changed under the perturbed schedule $R + \varepsilon h$ only if the support of $h$ contains switching points. For ease of exposition, we assume that the support contains only one switching point, which we denote by $w$. We denote by $i$ the switching agent and by $a_i$ the age at which agent $i$ switches at $w$. We have: $w_i(a_i) = w$ and $R(w) - s = d_i(a_i)$. To fix ideas, we suppose that both $d_i'(a_i)$ and $w_i'(a_i)$ are positive and that the slope of the indifferent agent's trajectory is larger than the slope of the schedule: $d_i'(a_i)/w_i'(a_i) > R'(w)$.
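The perturbed schedule $R + \varepsilon h$ crosses agent $i$'s trajectory at points $w'$ such that there exists $a$ with $w' = w_i(a)$ and $I(a, \varepsilon) = 0$, where

$$I(a, \varepsilon) = \varepsilon\,h(w_i(a)) - d_i(a) + R(w_i(a)) - s.$$

As $\partial I/\partial \varepsilon\,(a_i, 0) = h(w)$ and $\partial I/\partial a\,(a_i, 0) = R'(w)\,w_i'(a_i) - d_i'(a_i) \ne 0$, the implicit function theorem implies that replacing $R$ with $R + \varepsilon h$ changes labor supply on the right of $w$ and that the ratio $[T_i(w'; R + \varepsilon h) - T_i(w'; R)]/\varepsilon$ tends to

$$\begin{cases} \dfrac{h(w)}{|d_i'(a_i) - R'(w)\,w_i'(a_i)|} & \text{for } w' > w, \\ 0 & \text{for } w' \le w, \end{cases}$$

as $\varepsilon$ goes to zero. If the slope of the tax schedule is larger than that of the trajectory, $d_i'(a_i)/w_i'(a_i) < R'(w)$, replacing $R$ with $R + \varepsilon h$ changes labor supply on the left of $w$ and the ratio $[T_i(w'; R + \varepsilon h) - T_i(w'; R)]/\varepsilon$ tends to

$$\begin{cases} \dfrac{h(w)}{|d_i'(a_i) - R'(w)\,w_i'(a_i)|} & \text{for } w' \ge w, \\ 0 & \text{for } w' < w, \end{cases}$$

as $\varepsilon$ goes to zero. This yields expression (9) for the elasticity of agent $i$'s labor supply. The Fréchet derivative of $T_i(w; R)$ is thus given by

$$\frac{\partial T_i(w; R)}{\partial (R - s)} = \sum_{r \in S_i(w)} \varepsilon_i(w_r; R)\,\frac{T_i(w_r; R)}{R(w_r) - s}\,\delta_{w_r},$$

where $S_i(w)$ is the set of agent $i$'s switch points $r$ located below $w$, $w_r \le w$ is the agent's productivity at $r$, and $\delta_{w_r}$ denotes the mass point at $w_r$. The Fréchet derivative of the total labor supply $T$ has the same expression as above, replacing $S_i(w)$ with $S(w)$, the set of 'all' agents' switch points located below $w$.

We use the same method to compute the Fréchet derivative of the term $\int_0^1 [w_i(a) - d_i(a)]\,\ell_i(a)\,da$. The only difference with the above analysis is the presence of the multiplicative term $w_i(a) - d_i(a)$, which, at $a = a_i$, is equal to $w - R(w) + s$, given that $w$ is a switch point. This yields (12) and (13).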

Discontinuous Lagrangian
Consider an indifference point $w$ such that the incentive schedule is locally tangent to the indifferent agent's trajectory. (In other words, we have: $r = R'$.) Then the Lagrangian is discontinuous at $w$, as an infinitesimally small increase in $R$ implies a non-infinitesimal change in the Lagrangian. In other words, the efficiency force is particularly strong, creating a discontinuity in the Lagrangian, whose sign is the same as that of $w - R + s$. This is in particular the case where the tax schedule locally coincides with an agent's trajectory.

B. From Increasing to Piecewise Constant Schedules
Lemma B.1. Let $R$ be any nondecreasing tax schedule. Let $\underline{w} < \overline{w}$ be such that none of the agents' trajectories $(w_i(a), d_i(a))$, $a \in [0, 1]$, $i \in I$, intersects the rectangle $[\underline{w}, \overline{w}] \times [R(\underline{w}), R(\overline{w})]$. Assume that the functions $T_i$ have at most finitely many discontinuity points.
Then there exists a nondecreasing tax schedule $\tilde{R}$ such that $\tilde{R}$ is piecewise constant, with finitely many pieces, on $[\underline{w}, \overline{w}]$; $\tilde{R}$ takes its values in $[R(\underline{w}), R(\overline{w})]$ on this interval, and
$$\int_{\underline{w}}^{\overline{w}} \tilde{R}(w)\,dT_i(w, \tilde{R}) = \int_{\underline{w}}^{\overline{w}} R(w)\,dT_i(w, R), \quad i \in I.$$
Proof. By assumption, labor supply is not affected as long as the schedule remains between $R(\underline{w})$ and $R(\overline{w})$. We can therefore drop the second argument in the functions $T_i$, writing $T_i(w)$ rather than $T_i(w, R)$. Let $w_1, \dots, w_N$ be the discontinuity points of the functions $T_i$. Let $w_0 = \underline{w}$ and $w_{N+1} = \overline{w}$. It is sufficient to prove the result on each interval $[w_j, w_{j+1}]$. Integrating by parts yields:
$$\int_{w_j}^{w_{j+1}} R(w)\,dT_i(w) = R(w_{j+1})\,[T_i(w_{j+1}^-) - T_i(w_j^+)] - \int_{w_j}^{w_{j+1}} [T_i(w) - T_i(w_j^+)]\,dR(w).$$
We now apply Lemma A.1 (p. 1260) of Ghosal and Van der Vaart (2001) with the compact set $K = [w_j, w_{j+1}]$, the probability measure $F_0 = dR(w)/[R(w_{j+1}) - R(w_j)]$ and the functions $\Psi_i(w) = T_i(w) - T_i(w_j^+)$, $i = 1, \dots, I$, which are continuous on $K$. The Lemma yields a discrete probability measure $F$ on $K$ with at most $I + 1$ support points such that $\int_K \Psi_i\,dF = \int_K \Psi_i\,dF_0$ for all $i$, and hence
$$\int_{w_j}^{w_{j+1}} R(w)\,dT_i(w) = \int_{w_j}^{w_{j+1}} \tilde{R}(w)\,dT_i(w)$$
with $\tilde{R}(w) = R(w_j) + [R(w_{j+1}) - R(w_j)]\,F(w)$. The schedule $\tilde{R}$ is nondecreasing and piecewise constant, with at most $I + 1$ pieces. It takes its values in $[R(w_j), R(w_{j+1})]$. □

Properties of the optimum when there is a finite number of types
Consider an interval where the schedule is increasing. The schedule can locally coincide with an increasing trajectory, in which case efficiency and redistribution play in opposite directions: the schedule is slightly below the trajectory if $dU > 0$ and $w < R - s$, slightly above if $dU < 0$ and $w > R - s$. For instance, in the former case, lowering (raising) $R$ entails an infinitesimal (a non-infinitesimal) fall in the Lagrangian through the redistribution (efficiency) effect.$^4$ Now consider an interval $[w, w']$ where the schedule is increasing and does not coincide with an increasing trajectory.$^5$ By compactness of $[w, w']$, there exists a finite sequence $w = w_1 < \dots < w_h = w'$ such that no trajectory crosses the rectangles $[w_j, w_{j+1}] \times [R(w_j), R(w_{j+1})]$. On each interval $[w_j, w_{j+1}]$, we apply Lemma B.1 and replace $R$ with a piecewise constant schedule that takes its values in $[R(w_j), R(w_{j+1})]$ and leaves the government revenue and the agents' lifetime consumption and labor supply unchanged. □
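To illustrate the moment-matching step that Lemma B.1 borrows from Ghosal and Van der Vaart (2001), the sketch below treats only the simplest case of a single function (I = 1), where two support points suffice: mass is placed on a minimizer and a maximizer of the function so that its integral is preserved. The names `two_point_measure` and `integrate`, the grid approximation of the initial measure and the test function are hypothetical choices made for this example. The general case with I functions relies on the cited lemma and is not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

Atom = Tuple[float, float]  # (support point, probability mass)


def integrate(psi: Callable[[float], float], atoms: Sequence[Atom]) -> float:
    """Integral of psi against a discrete measure given as (point, mass) pairs."""
    return sum(mass * psi(x) for x, mass in atoms)


def two_point_measure(psi: Callable[[float], float], f0: Sequence[Atom]) -> List[Atom]:
    """Discrete probability with at most two atoms and the same psi-integral as f0.

    Assumes the masses in f0 are nonnegative and sum to one.
    """
    target = integrate(psi, f0)
    x_lo = min((x for x, _ in f0), key=psi)   # minimizer of psi on the support
    x_hi = max((x for x, _ in f0), key=psi)   # maximizer of psi on the support
    lo, hi = psi(x_lo), psi(x_hi)
    if hi == lo:                              # psi is constant: one atom is enough
        return [(x_lo, 1.0)]
    p_hi = (target - lo) / (hi - lo)          # in [0, 1] because lo <= target <= hi
    return [(x_lo, 1.0 - p_hi), (x_hi, p_hi)]


# Hypothetical inputs: equal masses on a grid of K = [0, 1] stand in for F_0.
grid = [k / 1000 for k in range(1001)]
f0 = [(x, 1.0 / len(grid)) for x in grid]
psi = lambda w: w * w

atoms = two_point_measure(psi, f0)
```

The cumulative distribution function of `atoms` is a step function with at most two steps, which is how the piecewise constant schedule of the lemma arises once it is rescaled to the range of the original schedule on the interval.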

C. Proof of Proposition 2
Proof. The inequality $a_H > a_L$ follows from the ordering and the monotonicity of the functions $R(w_i(a)) - s - \delta_i(a)$, $i = H, L$. This inequality, in turn, implies $c_H > c_L$ and $u'(c_H) < u'(c_L)$. Consider any productivity threshold $w$ greater than $\overline{w} = \max(w_H(a_H), w_L(a_L))$. Only the redistribution force is present above $\overline{w}$ and its integral over the set $[w, w_H(0)]$,
$$(u'(c_H) - \lambda)\,(T_H(w, \mu) - T_H(w_H(0), \mu)) + (u'(c_L) - \lambda)\,(T_L(w, \mu) - T_L(w_L(0), \mu)),$$
is strictly negative because $u'(c_H) < u'(c_L)$ and agent H spends strictly more time working in that region than agent L, $w_H^{-1}(w) > w_L^{-1}(w)$. The function $r(w) = \min(0, R(\overline{w}) - R(w))$ is non-increasing, and is an admissible variation of the tax schedule, as $R + \varepsilon r$ is nondecreasing for small values of $\varepsilon$. We must therefore have:
$$\int_{\overline{w}}^{w_H(0)} \big[(u'(c_H) - \lambda)\,(T_H(w, \mu) - T_H(w_H(0), \mu)) + (u'(c_L) - \lambda)\,(T_L(w, \mu) - T_L(w_L(0), \mu))\big]\,r'(w)\,dw \le 0.$$
Since the bracketed term in the above inequality is negative and $r' \le 0$, we get $r' = 0$ and thus $R(w) = R(\overline{w})$. The after-tax schedule, therefore, is flat above $\max(w_H(a_H), w_L(a_L))$.
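The variation used in this step can be checked numerically. The sketch below builds the variation equal to zero up to the threshold and to the (non-positive) gap between the schedule at the threshold and the schedule above it, and verifies on a grid that adding a fraction of it keeps the schedule nondecreasing. The schedule `R`, the threshold `w_bar` and the helper names are hypothetical and serve only as an illustration.

```python
from typing import Callable, List

Schedule = Callable[[float], float]


def variation(R: Schedule, w_bar: float) -> Schedule:
    """r(w) = min(0, R(w_bar) - R(w)): zero up to w_bar, non-positive above."""
    return lambda w: min(0.0, R(w_bar) - R(w))


def perturbed(R: Schedule, w_bar: float, eps: float) -> Schedule:
    """The perturbed schedule R + eps * r."""
    r = variation(R, w_bar)
    return lambda w: R(w) + eps * r(w)


def is_nondecreasing(values: List[float], tol: float = 1e-12) -> bool:
    """True if the sequence never falls by more than tol."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))


# Hypothetical nondecreasing schedule with an upward jump at w = 1.
def R(w: float) -> float:
    return 0.2 + 0.5 * w if w < 1.0 else 0.9 + 0.1 * w


grid = [k / 100 for k in range(301)]   # productivities from 0 to 3
w_bar = 1.5
```

Above the threshold the perturbed schedule is a weighted average of the original schedule and its value at the threshold, which is why monotonicity survives for any weight between 0 and 1; with a weight of 1 the schedule is exactly flat above the threshold.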
4 In other words, the Lagrangian is locally discontinuous in productivity regions where the tax schedule is locally tangent to a trajectory, see Appendix A.2.
5 The first-order conditions imply that the net social marginal utility of income, $dU$, is identically zero on $[w, w']$ and that $R(w) - s = w$ at any switch point in this region.
In case (1), to examine agent L's labor supply, we suppose first that the tax schedule is continuous at $w_L(a_L)$. In this case, the after-tax schedule takes only one value for $w \ge w_H(a_H)$, namely the common value of $\delta_L(a_L) = \delta_H(a_H)$, and agent L's labor supply is distorted downward because $w_L(a_L) - \delta_L(a_L)$ is larger than $w_H(a_H) - \delta_H(a_H) > 0$. We consider now the case where the schedule is discontinuous at $w_L(a_L)$. We know that $R - s$ is flat and equal to $\delta_L(a_L)$ above that point. We can therefore consider a transformation that pushes $R - s$ down from $\delta_L(a_L)$ to $\delta_H(a_H)$ just above $w_L(a_L)$. The redistribution and efficiency effects of the transformation respectively bear on type H and type L. The former is positive and of the sign of $(u'_H - \lambda)(\delta_L - \delta_H)$ since it takes $\delta_L - \delta_H$ from type H while leaving L's lifetime consumption level unaffected. The latter is of the sign of $-(w_L(a_L) - \delta_L)$. For them to sum up to zero, we must have $w_L(a_L) > \delta_L(a_L)$: agent L's labor supply, again, is distorted downward.
In case (2), we deal separately with the situation where the two agents have the same opportunity costs when they stop working, and when that of H is larger than that of L. Suppose first that $\delta_L(a_L) = \delta_H(a_H)$. Then the financial incentive to work $R(w) - s$ is equal to that common opportunity cost for all $w \ge w_L(a_L)$. The efficiency force $w_L(a_L) - \delta_L(a_L)$ cannot be downward at $w_L(a_L)$ as this would violate the first-order condition on the bunching interval starting at $w_L(a_L)$ (in practice, the government would slightly decrease the after-tax income at $w_L(a_L)$), hence $w_H(a_H) > w_L(a_L) \ge \delta_L(a_L) = \delta_H(a_H)$: agent H's labor supply is distorted downward.
Suppose now that $\delta_L(a_L) < \delta_H(a_H)$. Since $u'(c_L) > u'(c_H)$ and only agent L works at productivities lying between $w_L(a_L)$ and $w_H(a_H)$, the redistribution force pushes upward and the financial incentive to work $R - s$ equals $\delta_H(a_H)$ in that interval. The tax schedule, therefore, is discontinuous at $w_L(a_L)$ and equal to $\delta_H(a_H)$ above that point. Consider the perturbation that moves the discontinuity point $w_L(a_L)$ in the tax schedule slightly to the left while maintaining $R - s = \delta_H(a_H)$. This perturbation, which does not affect agent H, increases agent L's labor supply and consumption. Consumption is increased by a first-order quantity because the agent receives positive extra income $\delta_H(a_H) - \delta_L(a_L) > 0$ during a small time interval, hence a positive redistributive effect. The efficiency part of the perturbation is a change in the Lagrangian of the sign of $w_L(a_L) - \delta_L(a_L)$. Expressing that the latter must outweigh the former, the first-order condition on $R$ in the bunching interval yields $w_L(a_L) < \delta_L(a_L)$, an upward distortion in L's labor supply.
Finally, we consider case (3), denoting by $w$ the common value of $w_H(a_H)$ and $w_L(a_L)$. We first show that the tax schedule necessarily intersects the two trajectories at the same point: $\delta_H(a_H)$ and $\delta_L(a_L)$ must be equal. Suppose for instance that $\delta_H(a_H) < \delta_L(a_L)$. A small increase $dR_H$ in after-tax income below $w$ would put agent H to work on a small time interval of length $dT_H = \eta_H\,dR_H$. Similarly, a small decrease $-dR_L$ in after-tax income above $w$ would put agent L out of work on a small time interval of length $dT_L = \eta_L\,dR_L$. These transformations have redistribution effects that are of the second order. Choosing $dR_H$ and $dR_L$ such that $dT_H = dT_L$, we find by (12) that the associated changes in the Lagrangian would be respectively $\lambda(w - \delta_H(a))\,dT$ and $-\lambda(w - \delta_L(a))\,dT$. The sum of these two quantities would be of the sign of $\delta_L(a) - \delta_H(a)$, therefore positive, implying that one of the above changes would increase the Lagrangian through the efficiency force, a contradiction. A similar contradiction is found if $\delta_H(a_H) > \delta_L(a_L)$, hence the announced equality.
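The sign claim in this argument is a one-line computation. In the sketch below we write $\delta_H$ and $\delta_L$ for the two opportunity costs at the stopping ages and $dT$ for the common length of the two time intervals (our shorthand), and we assume that the multiplier $\lambda$ is positive:

```latex
\lambda\,(w - \delta_H)\,dT \;-\; \lambda\,(w - \delta_L)\,dT
  \;=\; \lambda\,(\delta_L - \delta_H)\,dT \;>\; 0
\qquad \text{when } \delta_H < \delta_L,\ \lambda > 0,\ dT > 0 .
```

Since the two terms have a strictly positive sum, they cannot both be non-positive, so at least one of the two perturbations raises the Lagrangian.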
The tax schedule is flat above w. A slight decrease of its constant level has a positive redistribution effect, and must therefore have a negative efficiency effect, implying that both agents have their labor supply distorted downward.
Collecting the results obtained in the three cases, we directly get parts (1), (2), and (3) of the proposition. We have also seen that $a_H > a_L$ and can therefore compute
$$c_H - c_L = \int_0^{a_H} [R(w_H(a)) - s - \delta_H(a)]\,da - \int_0^{a_L} [R(w_L(a)) - s - \delta_L(a)]\,da.$$
In each of the three cases studied above, the tax schedule is flat over $[w_L(a_L), w_H(0)]$, and hence $R(w_L(a)) = R(w_H(a))$ for all $a \le a_L$, which yields (15).
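The last step can be spelled out by splitting the first integral at the age at which agent L stops working and using the equality of after-tax incomes for the two agents at earlier ages, so that the terms in the schedule and in the subsistence income cancel on the common working period. This is a sketch of the algebra only; we assume, without being able to check it here, that the final line is the expression labeled (15) in the main text.

```latex
\begin{aligned}
c_H - c_L
 &= \int_0^{a_L} \big[R(w_H(a)) - R(w_L(a))\big]\,da
  + \int_0^{a_L} \big[\delta_L(a) - \delta_H(a)\big]\,da
  + \int_{a_L}^{a_H} \big[R(w_H(a)) - s - \delta_H(a)\big]\,da \\
 &= \int_0^{a_L} \big[\delta_L(a) - \delta_H(a)\big]\,da
  + \int_{a_L}^{a_H} \big[R(w_H(a)) - s - \delta_H(a)\big]\,da .
\end{aligned}
```

The first term collects the difference in opportunity costs while both agents work, and the second is agent H's net gain from working during the additional period in which agent L has stopped.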