Signalling to Experts


We study competitive equilibria in a signalling economy with heterogeneously informed buyers. In terms of the classic Spence (1973, The Quarterly Journal of Economics, 87, 355–374) model of job market signalling, firms have access to direct but imperfect information about worker types, in addition to observing their education. Firms can be ranked according to the quality of their information, i.e., their expertise. In equilibrium, some high-type workers forgo signalling and are hired by better informed firms, which make positive profits. Workers’ education decisions and firms’ use of their expertise are strategic complements, allowing for multiple equilibria that can be Pareto ranked. We characterize wage dispersion and the extent of signalling as a function of the distribution of expertise among firms. Our model can also be applied to a variety of other signalling problems, including securitization, corporate financial structure, insurance markets, or dividend policy.


INTRODUCTION
We study competitive markets with the following features: sellers are privately informed about their own type; they can take a publicly observable action that is differentially costly for different types; buyers can directly observe imperfect information about sellers' types; and the quality of this information is heterogeneous across buyers. The first two features define a standard signalling environment.¹ Our objective is to move beyond the special case, studied extensively, where buyers are completely uninformed and rely exclusively on the public signal to form beliefs about sellers' types. Instead, we investigate the effect of adding the third and fourth features, buyers' heterogeneous direct information, on equilibrium prices and allocations. Our running example is an extension of the canonical Spence (1973) model of job market signalling: workers have private information about their own productivity; education is purely wasteful but is more costly for less productive workers, so it can be used to signal; and firms have heterogeneous expertise in directly assessing workers' productivities, in addition to verifying their education. For instance, firms have access to such direct information through tests, interviews, referrals or trial periods, and differ in their ability to extract accurate predictions from them. We ask how differences in recruiting expertise across firms affect the equilibrium: what wages do more- versus less-expert firms offer, which workers do they hire, how much profit do they make, what education levels do they require, and what are the implications for social welfare?
1. Throughout, we refer to a signalling rather than a screening problem. Traditionally, which term is used depends on which party proposes contract terms. Since in our setup there are markets for all possible contracts, the distinction vanishes.
The editor in charge of this paper was Veronica Guerrieri.

REVIEW OF ECONOMIC STUDIES
While we present our setup and results in terms of this application, our model is general and can be used to answer these basic questions for many signalling and screening problems. How do investors' abilities to directly assess a company's profitability affect IPO prices, incentives for insiders to retain undiversified shareholdings, and the payment of dividends? What are firms' incentives to engage in costly brand-building or to offer warranties if consumers have heterogeneous ability to find out about product quality directly, e.g., by studying product reviews? How does the use of different risk assessment models across insurance companies affect equilibrium deductibles and premiums? What are the effects of asset managers' heterogeneous pricing techniques on the design and tranching of asset-backed securities?
Returning to labour markets, we focus on the most parsimonious setting with two worker types and consider configurations for firms' direct information that allow us to rank firms by their expertise, i.e., their probability of making mistakes: the "false positives" case where firms may observe good signals from low-productivity workers and the opposite case, with "false negatives." We assume that each firm hires a single worker; such capacity constraints are crucial to rule out trivial solutions where the most-expert firms hire all workers.
Our first task is to define a notion of competitive equilibrium that applies to this environment. We assume that each combination of a wage and an education level defines a separate market. Any worker is allowed to apply for a job in any market (provided he acquires the level of education prescribed by that market) and any firm can recruit in any market. For workers, markets are partially exclusive: naturally, they commit to a single education level but can apply for jobs at many different wages. When hiring, firms need not hire randomly from the pool of applicants: they can reject some applicants and only hire from among those they find acceptable, but only to the extent that their own direct information allows them to tell workers apart. Markets do not necessarily clear: in any given market, workers can apply for jobs and not get them and firms may not find acceptable workers. Equilibrium requires that workers' expectations of their chances of finding work in each market and firms' beliefs about what workers they will encounter in each market be consistent with each other and with firm recruiting and worker education decisions.
As is common in signalling models, the set of equilibria depends on what beliefs agents can entertain regarding markets where in equilibrium there is no trade. A crucial technical contribution of this paper is to construct restrictions on these out-of-equilibrium beliefs that deliver a unique and plausible equilibrium in the familiar uninformed-buyers benchmark, yet still guarantee equilibrium existence and tractability in the general case of heterogeneous expertise. We propose the following conditions. First, for any market where a firm has well-defined beliefs about what acceptable workers it would encounter, these beliefs can only place weight on workers who would find it (weakly) optimal to apply to that market. Second, if a firm does not have well-defined beliefs about acceptable workers it would encounter, we impose that any workers that would be acceptable to the firm must expect that, if they were to apply for a job in that market, they would get one for sure.
For the benchmark where firms have no direct information, our definition ensures that the least-cost separating allocation is the unique equilibrium. Our refinement implies that pooling is inconsistent with equilibrium: at slightly higher education levels than a putative pooling allocation, firms must believe that they will only encounter high-type workers because they are the ones most willing to choose higher education, and therefore firms could profitably deviate.
For the false positives case, the following "partial signalling" pattern emerges. Low worker types get no education and high types get either no education or enough education to fully separate. Firms with sufficiently accurate information recruit zero-education workers at a wage $w_P$ that leaves high-productivity workers indifferent between signalling and not signalling, and make positive profits. These firms face both high- and low-productivity applicants, so they can only profit if they are able to reject a sufficient proportion of low types. Firms with less accurate information recruit either educated workers at a wage equal to the high types' productivity, or zero-education workers at a wage equal to low types' productivity, and make zero profits in either case. Two simple conditions summarize any equilibrium: an indifference condition that requires the marginal firm to make zero profits by hiring zero-education workers at wage $w_P$, and a market-clearing condition requiring high-type workers who forgo education to indeed find jobs at wage $w_P$. This tractable structure allows us, for instance, to study comparative statics. We find that signalling decreases if the cost is higher, if the demand for workers increases, or if firms' expertise improves, intuitive properties that, somewhat unappealingly, cannot be obtained in the standard signalling model with uninformed firms.
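The indifference wage described above can be illustrated with a small numerical sketch. It assumes the linear utility $u = w - c(i)e$ used in the job market application, and it assumes that "enough education to fully separate" is the textbook least-cost separating level at which low types are exactly indifferent to mimicking. The function name and the parameter values are hypothetical and chosen only for illustration.

```python
def least_cost_separating(q_low, q_high, c_low, c_high):
    """Return (e_star, w_p) under the assumed linear utility u = w - c * e.

    e_star: education level at which a low type is indifferent between
            (e = 0, w = q_low) and (e = e_star, w = q_high).
    w_p:    zero-education wage that leaves a high type indifferent
            between signalling (e_star, q_high) and not signalling (0, w_p).
    """
    if not (q_low < q_high and c_low > c_high > 0):
        raise ValueError("need q_low < q_high and c_low > c_high > 0")
    # Low-type incentive constraint binds: q_low = q_high - c_low * e_star.
    e_star = (q_high - q_low) / c_low
    # High-type indifference: w_p = q_high - c_high * e_star.
    w_p = q_high - c_high * e_star
    return e_star, w_p


# Hypothetical parameter values.
e_star, w_p = least_cost_separating(q_low=1.0, q_high=2.0, c_low=2.0, c_high=1.0)
print(e_star, w_p)
```

Under these assumptions $w_P$ lies strictly between the two productivities, which is why zero-education markets at that wage attract low types as well and why only firms that can reject enough of them profit there.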
Our model features strategic complementarities between high-quality workers' signalling decisions and firms' recruiting decisions. If enough high-productivity workers forgo education, the pool of applicants in zero-education markets improves. This induces less-expert firms to recruit zero-education workers, which in turn allows more high-type workers to forgo education. As a result, the model may feature multiple equilibria, each with different proportions of high types choosing to forgo education. The least-cost separating allocation, where all high types get enough education to separate, is always one of these equilibria: if all high types signal, there is no hope of hiring them without requiring the signal, and therefore firms' expertise is useless, an extreme form of coordination failure. More generally, when there are multiple equilibria, they can be Pareto ranked. The signal is a pure deadweight cost, and the equilibrium with less signalling is preferred by everyone.
One feature of the classic signalling and screening model that has been criticized is a discontinuity as the buyers' prior becomes degenerate. The symmetric information case involves no signalling, but in the presence of even a minimal mass of low types, the high types must emit a non-trivial signal to separate. Our model offers a natural way to smooth out this stark property: there always exists an equilibrium that continuously approaches the full information limit, both as the share of low types vanishes and as buyers' direct information becomes perfect. A similar discontinuity arises in the standard signalling model when the signalling costs of the two types converge: whenever the costs differ, there is a discrete amount of signalling, but no signalling when they are equal. We show that our model overcomes this discontinuity as well.
Finally, we characterize equilibrium in the false negatives case, which we show to be essentially unique. Productive workers now make different choices depending on how transparent they are, that is, how many firms are able to identify them as high types. Those most easily identified forgo education and are paid their productivity. Less transparent workers also forgo education but now earn a range of lower wages. They are hired in part by non-selective firms in markets where low types also apply, so wages must be low enough to allow these non-selective firms to break even on whatever pool of applicants they face. The least transparent productive workers instead resort to education in order to separate from low types. Therefore, our model provides a novel theory …
KURLAT & SCHEUER SIGNALLING TO EXPERTS
Daley and Green (2014) also study an environment where the possibility of signalling coexists with direct information ("grades") and find conditions such that the equilibrium features either partial or complete pooling. They assume that grades are equally observable by all firms, so they have no role for expertise on the firm side. Feltovich et al. (2002) also consider an environment with (homogeneous) direct information in addition to signalling, and find that, in a model with three types, the highest types may refrain from signalling to distinguish themselves from the medium types, a behaviour they refer to as "countersignalling." A similar feature emerges in our model in the false negatives case, where some high types separate through signalling while others pool with low types in terms of the signal they emit, relying instead on expert buyers to identify them. Fishman and Parker (2015), Bolton et al. (2016), and Kurlat (2019) study environments where buyers can differ in the quality of their information but where sellers do not have a way to signal. Their focus is on the efficiency of buyers' information acquisition decision.
Board et al. (2017) share our interest in the idea that firms differ in their ability to tell apart high- and low-quality job applicants. In their setup, however, workers do not make any decisions, so whether or not they know their own productivity does not matter. This rules out any way in which workers may signal their private information, or be screened other than through firms' direct assessment of them. Instead, in our model, workers can emit a publicly observable signal, such as education, that can be used to convey information about their productivity. In addition, Board et al. (2017) assume that firms' direct information is independent across firms, whereas we work with a nested information structure where more-expert buyers know strictly more than less-expert ones.
The rest of this article is organized as follows. Section 2 introduces the model and briefly illustrates a number of well-known applications. Section 3 provides our equilibrium definition and Section 4 shows that it gives rise to a unique equilibrium in the standard signalling environment where firms are uninformed. In Section 5, we characterize the set of equilibria with false positives and in Section 6 the case of false negatives. Finally, Section 7 concludes. Various extensions and all proofs are relegated to the Appendix.

THE ECONOMY
Our model is intended to capture a generic signalling setting. For clarity, we present our results in terms of Spence's original job market signalling model. However, the only critical assumptions are perfect competition, heterogeneous information, and the existence of some action (the signal) that is inefficient from a first-best point of view but involves different costs for different sellers. Our results therefore apply to any setting with these features, and we provide some alternative interpretations of the model below.

Job market signalling
There is a unit measure of workers indexed by $i$, uniformly distributed in the interval [0,1]. Each worker is endowed with a single unit of labour. Worker $i$'s productivity is
$$q(i) = \begin{cases} q_L & \text{if } i < \lambda \\ q_H & \text{if } i \ge \lambda \end{cases} \tag{1}$$
with $q_L < q_H$. Workers with i < λ and i ≥ λ are low and high types, respectively. A worker's index $i$ is private information. Workers of the same type but different indices $i$ all have the same productivity; they differ only in terms of how easy it is for firms to identify them, as specified below.

Workers can choose a publicly observable level of education $e$, which has no effect on their productivity. If worker $i$ chooses a level of education $e$ and gets a job at a wage $w$, his utility is
$$u = w - c(i)\,e, \qquad c(i) = \begin{cases} c_L & \text{if } i < \lambda \\ c_H & \text{if } i \ge \lambda. \end{cases}$$
We assume $c_L > c_H$, so low types experience a higher utility cost of obtaining education. Up to here, the model coincides with the Spence (1973) signalling model. Our innovation is to introduce firms' heterogeneous information about the workers they encounter. Formally, there is a continuum of firms of measure greater than one, indexed by θ ∈ [0,1]. The measure of firms over [0,1] is denoted by $F$. When firm θ analyses worker $i$, it observes a direct signal
$$x(i,\theta) = I(i \ge \theta). \tag{2}$$
If θ = λ, this signal allows the firm to perfectly infer the worker's productivity. If θ < λ, the firm makes "false positive" mistakes: it observes positive signals from a subset of the low-type workers. If θ > λ, the firm makes "false negative" mistakes. We assume that firms can be perfectly ranked by their expertise, so one of two cases applies: either $F$ has support in [0,λ] or it has support in [λ,1]. For instance, firms can be interpreted as being "bold" in the first and "cautious" in the second case.² Clearly, (2) is a restrictive model of how well informed firms are: in general, firms could make both kinds of mistakes in arbitrarily correlated ways. This formulation has the advantage of providing a natural measure of a firm's expertise, since the closer θ is to λ, the better the firm is at correctly identifying a worker's productivity. Each firm can hire at most one worker. Equivalently, we could assume that buyers have limited funds (and are unable to borrow) to leverage their expertise, which may be more natural in some of our financial market applications sketched below. Either way, some form of capacity constraint is needed to keep the problem interesting by preventing the best-informed buyers from implementing all trades. If a firm hires worker $i$ at wage $w$, its profits are $q(i) - w$.
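The information structure can be made concrete with a short sketch. It assumes the threshold form of the direct signal, $x(i,\theta) = 1$ if $i \ge \theta$ and 0 otherwise, which is what the description of false positives (θ < λ) and false negatives (θ > λ) implies. The helper names and the numerical values are hypothetical.

```python
def direct_signal(i, theta):
    """Signal observed by firm theta about worker i (assumed threshold form)."""
    return 1 if i >= theta else 0


def classify(i, theta, lam):
    """Label firm theta's assessment of worker i when the type cutoff is lam."""
    is_high = i >= lam
    is_positive = direct_signal(i, theta) == 1
    if is_positive and not is_high:
        return "false positive"
    if is_high and not is_positive:
        return "false negative"
    return "correct"


def mistake_share(theta, lam):
    """Mass of workers misjudged by firm theta when i is uniform on [0, 1]."""
    return abs(theta - lam)


lam = 0.6  # hypothetical share of low types
print(classify(0.5, theta=0.4, lam=lam))  # a "bold" firm with theta < lam
print(classify(0.7, theta=0.8, lam=lam))  # a "cautious" firm with theta > lam
print(classify(0.5, theta=0.6, lam=lam))  # a perfectly informed firm
```

Because worker indices are uniformly distributed, the mass of misclassified workers is the distance between θ and λ, which is the sense in which firms closer to λ are more expert.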
Thus, our key innovation compared to the canonical signalling model is that buyers have access to direct, even though imperfect, information about sellers, rather than relying exclusively on self-selection. Moreover, the quality of this information is heterogeneous.³ For example, some managers have better judgement in assessing the talent of job applicants, as in Board et al. (2017), or recruiters may run tests or interviews (see e.g. Guasch and Weiss, 1980; Lockwood, 1991). Another channel of direct information about workers is through referrals. For example, Beaman and Magruder (2012) and Burks et al. (2015) show empirically that better employees make more and better referrals, and that firms differ in the degree to which their employees can predict the performance of their referrals.

Other interpretations
As is common to signalling models, the crucial feature is that the signal $e$ is costly and satisfies a single-crossing property. For the job market signalling application, single crossing can be verified by letting $u(e,w) = w - c(i)\,e$ and computing the marginal rate of substitution: $-\frac{\partial u(e,w)/\partial e}{\partial u(e,w)/\partial w} = c(i)$, which is higher for low types. There are many other signalling settings that are formally isomorphic to our baseline model. We briefly describe four of them.
2.2.1. Securitization.
Consider first the security design problem of DeMarzo and Duffie (1999). A continuum i ∈ [0,1] of originators each own a pool of assets that generate future cash flow $y$. The distribution of these cash flows is privately known to the originators, and given by $G_L(y)$ if i < λ and $G_H(y)$ if i ≥ λ, where $G_H$ first-order stochastically dominates $G_L$, and they have common support. The originators prefer receiving cash over holding their risky assets, for instance because they have access to other profitable investment opportunities, or because they have superior ability in valuing assets and therefore want to raise cash to fund new asset purchases. Formally, they value future cash flows from their unissued assets at discount factor α < 1. They face a pool of small, heterogeneously informed buyers who do not discount, so the efficient allocation calls for selling all assets. Of course, due to their private information, the originators face a lemons problem when selling their assets. To raise cash, they therefore issue a limited-liability security backed by their assets. DeMarzo and Duffie (1999) show that, under general conditions, it is optimal to sell a high-quality, senior claim to the assets (i.e. debt) and retain the remaining, risky equity tranche as "skin in the game," i.e. a signal of asset quality. Let $Y$ denote the upper bound of $G_k$, $k = L,H$; let $Y - e$ denote the face value of the debt tranche, and $w$ denote its price per unit of face value. Then the originator's payoff is
$$u(e,w) = w\,(Y-e) + \alpha \int \max\{y-(Y-e),\,0\}\,dG_k(y),$$
with $k = L$ if i < λ and $k = H$ if i ≥ λ. The marginal rate of substitution is
$$-\frac{\partial u(e,w)/\partial e}{\partial u(e,w)/\partial w} = \frac{w - \alpha\,\bigl(1 - G_k(Y-e)\bigr)}{Y-e}.$$
By first-order stochastic dominance (FOSD), this is higher for low types and therefore satisfies single crossing. Finally, suppose each buyer demands one unit of face value of the asset-backed security. Then the buyer's payoff is $q_k(e) - w$ just like in our baseline model, where $q_k(e) \equiv \int \min\left\{\frac{y}{Y-e},\,1\right\} dG_k(y)$, because each unit of the security has face value $Y - e$, so buying one unit of face value means buying $1/(Y-e)$ securities.
Our model thus captures the equilibrium in this classic tranching problem with the additional feature that buyers are heterogeneously informed about the quality of the asset-backed security. This may involve differential knowledge of aspects of the underlying asset pool or, more importantly, special expertise in the pricing of these securities (such as proprietary pricing models). For instance, Bernardo and Cornell (1997) provide empirical evidence for significant variation in valuations of mortgage-backed securities (with the winning bid exceeding the median bid by over 17% on average) even though all buyers in their data were sophisticated investors or intermediaries. They conclude that this variability is due to differences in pricing technology (see also Eisfeldt et al., 2019). Mattey and Wallace (2001) document heterogeneity of this variability across different mortgage-backed securities, suggesting that some securities are easier to price than others.

2.2.2. Financial structure of firms.
Our next example is a variant of the corporate finance problem studied by Leland and Pyle (1977). Each entrepreneur $i$ owns a project whose future payoff, privately known, is given by (1). As in the previous example, entrepreneurs are impatient, so their own valuation for their project's return is $\alpha q(i)$, and they wish to sell their project to heterogeneously informed investors. To signal the quality of their project, entrepreneurs can publicly announce that they will retain a fraction $e$ of the equity of their firm. If an entrepreneur sells a fraction $1-e$ of his firm at a price per unit of $w$, then his utility will be $w(1-e) + \alpha q(i)\,e$. The marginal rate of substitution is $-\frac{\partial u(e,w)/\partial e}{\partial u(e,w)/\partial w} = \frac{w - \alpha q(i)}{1-e}$, which, again, is higher for low types. If an investor buys one unit of firm $i$ at a unit price $w$, his profits are $q(i) - w$. Heterogeneous information among investors could be the result of differential experience in this particular industry, differential contacts with company insiders, or differential access to analyst reports, which make some investors better than others at distinguishing good from bad projects.⁴

2.2.3. Insurance.
Our model can also be mapped into the Rothschild and Stiglitz (1976) insurance problem. A continuum i ∈ [0,1] of risk-averse households each have wealth $X$ and will suffer a loss of $d$ with probability $1 - q(i)$, where $q(i)$ is given by (1) and is privately known to the household. They face a pool of small, risk-neutral, heterogeneously informed insurance companies, so the efficient allocation calls for households to be fully insured. Insurance companies offer policies that cover the loss $d$ minus a deductible $e$, in exchange for an up-front premium $(1-w)(d-e)$, so that $1-w$ is the implicit probability of loss that makes the insurance contract actuarially fair. If a household gets contract $(e,w)$, its expected utility is
$$u(e,w) = q(i)\,v\bigl(X - (1-w)(d-e)\bigr) + (1-q(i))\,v\bigl(X - (1-w)(d-e) - e\bigr),$$
where $v(\cdot)$ is the household's von Neumann-Morgenstern utility function. Writing $v'_0$ and $v'_1$ for marginal utility in the no-loss and loss states, the marginal rate of substitution is
$$-\frac{\partial u(e,w)/\partial e}{\partial u(e,w)/\partial w} = \frac{(1-q(i))\,w\,v'_1 - q(i)\,(1-w)\,v'_0}{(d-e)\,\bigl[q(i)\,v'_0 + (1-q(i))\,v'_1\bigr]}.$$
It is straightforward to show that this is decreasing in $q$ and therefore satisfies single crossing. If an insurance company covers one unit of losses from household $i$ at an implicit probability $1-w$, then its profits are $1 - w - (1 - q(i)) = q(i) - w$.⁵ Heterogeneous information among insurance companies could be the result of some of them having larger actuarial databases or more sophisticated predictive models that allow them to tell apart riskier from safer types.
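The single-crossing claim for the insurance application can be checked with a short derivation. The sketch below assumes that the household's expected utility takes the standard two-state form implied by the contract description (premium $(1-w)(d-e)$ paid in both states, deductible $e$ borne in the loss state). The shorthand $W_0$, $W_1$, $a$ and $b$ is introduced here for illustration only.

```latex
% Assumed two-state expected utility, with q = q(i):
%   W_0 = X - (1-w)(d-e)        (wealth if no loss occurs)
%   W_1 = X - (1-w)(d-e) - e    (wealth if the loss occurs)
\begin{align*}
u(e,w) &= q\,v(W_0) + (1-q)\,v(W_1), \qquad a \equiv v'(W_1),\quad b \equiv v'(W_0),\\[4pt]
\frac{\partial u}{\partial e} &= q\,(1-w)\,b - (1-q)\,w\,a, \qquad
\frac{\partial u}{\partial w} = (d-e)\,\bigl[q\,b + (1-q)\,a\bigr],\\[4pt]
\mathrm{MRS}(q) &\equiv -\frac{\partial u/\partial e}{\partial u/\partial w}
 = \frac{(1-q)\,w\,a - q\,(1-w)\,b}{(d-e)\,\bigl[q\,b + (1-q)\,a\bigr]},\\[4pt]
\frac{\partial\,\mathrm{MRS}}{\partial q}
 &= \frac{-\bigl[w\,a + (1-w)\,b\bigr]\bigl[q\,b + (1-q)\,a\bigr]
        - \bigl[(1-q)\,w\,a - q\,(1-w)\,b\bigr](b-a)}
        {(d-e)\,\bigl[q\,b + (1-q)\,a\bigr]^{2}}\\[4pt]
 &= \frac{-\,a\,b}{(d-e)\,\bigl[q\,b + (1-q)\,a\bigr]^{2}} \;<\; 0 .
\end{align*}
```

The numerator collapses to $-ab$ because the $a^2$ and $b^2$ terms cancel and the coefficients on $ab$ sum to $-1$. The sign therefore only requires strictly positive marginal utility and $e < d$, so safer households (higher $q$) require less compensation in $w$ for a higher deductible.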

2.2.4. Dividend policy.
Finally, consider the dividend puzzle, which observes that firms pay dividends even though their tax treatment is less favourable than that of share repurchases. The dividend signalling hypothesis (going back to Bhattacharya, 1979) explains this corporate payout policy by viewing dividends as a costly signal to convey private information about profitability (see e.g. Bernheim and Wantz, 1995, for empirical evidence). Formally, suppose a continuum i ∈ [0,1] of firms will each produce a random, i.i.d. stream of cash flows $\{y_t\}_{t=1}^{\infty}$. The distribution of $y$ is privately known to the incumbent shareholder and given by $G_L(y)$ if i < λ and $G_H(y)$ if i ≥ λ, where $G_H$ first-order stochastically dominates $G_L$. The conditional means are $E_i(y) = r\,q(i)$, where $r$ is the interest rate and $q(i)$ is given by (1). The incumbent shareholder announces a dividend $e$ to be paid at $t = 1$ and then sells all its shares (cum-dividend) to heterogeneously informed outside investors. Dividends are taxed at a rate τ. Furthermore, following Bhattacharya (1979), if the cash flow $y_1$ is less than the announced dividend $e$, the incumbent agrees to provide the firm with a loan to finance the shortfall, at a cost $\beta(e - y_1)$. Letting $w - \tau e$ denote the price paid by investors, the payoff for the incumbent shareholder is $u(e,w) = w - \tau e - \beta \int_0^e (e-y)\,dG_k(y)$, with $k = L$ if i < λ and $k = H$ if i ≥ λ. The marginal rate of substitution is $-\frac{\partial u(e,w)/\partial e}{\partial u(e,w)/\partial w} = \tau + \beta\,G_k(e)$. By FOSD, this is higher for low types and thus satisfies single crossing. An outside investor's profit is given by the net present value of the firm's cash flows $q(i)$ minus the dividend tax $\tau e$ minus the price paid $w - \tau e$, for a total of $q(i) - w$, just like in the benchmark model.
4. Leland and Pyle (1977) model the cost of retention as risk-bearing by a risk-averse entrepreneur, rather than reduced investment by an entrepreneur who can reinvest his proceeds from selling the project at an above-market rate of return $r = 1/\alpha - 1 > 0$, as we do here following DeMarzo and Duffie (1999). Though the interpretation is similar, the mechanics in Leland and Pyle's model are therefore closer to the Rothschild and Stiglitz (1976) insurance application we sketch below.
5. Since each contract covers d −e losses, covering one unit of losses means that the insurance company enters into 1/(d −e) contracts.
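The single-crossing property in the dividend application can be checked numerically. The sketch below assumes, purely for illustration, that $G_k$ is uniform on $[0, b_k]$ with $b_H > b_L$ (so that $G_H$ first-order stochastically dominates $G_L$), and compares a finite-difference marginal rate of substitution with the closed form $\tau + \beta G_k(e)$. All function names and parameter values are hypothetical.

```python
def shortfall_cost(e, upper):
    """Integral of (e - y) dG(y) over [0, e] for G uniform on [0, upper], e <= upper."""
    return e * e / (2.0 * upper)


def payoff(e, w, tau, beta, upper):
    """Incumbent's payoff u(e, w) = w - tau*e - beta * expected shortfall."""
    return w - tau * e - beta * shortfall_cost(e, upper)


def mrs_numeric(e, w, tau, beta, upper, h=1e-6):
    """-(du/de)/(du/dw) computed with central finite differences."""
    du_de = (payoff(e + h, w, tau, beta, upper) - payoff(e - h, w, tau, beta, upper)) / (2 * h)
    du_dw = (payoff(e, w + h, tau, beta, upper) - payoff(e, w - h, tau, beta, upper)) / (2 * h)
    return -du_de / du_dw


def mrs_closed_form(e, tau, beta, upper):
    """tau + beta * G(e) for G uniform on [0, upper]."""
    return tau + beta * min(e / upper, 1.0)


tau, beta, e, w = 0.2, 0.5, 1.0, 3.0   # hypothetical values
b_low, b_high = 2.0, 4.0               # uniform supports; the high type dominates
for label, b in (("low", b_low), ("high", b_high)):
    print(label, mrs_numeric(e, w, tau, beta, b), mrs_closed_form(e, tau, beta, b))
```

With these values the low type's marginal rate of substitution exceeds the high type's, so raising the dividend is relatively more costly for low types, as single crossing requires.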
3. EQUILIBRIUM
We adopt a Walrasian approach similar to the notion of competitive search equilibrium. There are many (non-exclusive) markets open simultaneously, each defined by a required signal $e \in \mathbb{R}_+$ and a wage $w \in [0, q_H]$, and there is no guarantee for either workers or firms of finding a counterparty in a market they visit.

Worker's problem
Worker $i$ first chooses a signal $e$ and then applies for jobs. This aligns well with the natural timing, where education is determined before entering the labour market. Similarly, in the corporate finance applications, it corresponds to situations where the design of the security (the size of the junior tranche), the financial structure of the firm (the retained equity) or the amount of dividends to be paid out are determined first, and then the securities or firm shares are offered, potentially in multiple markets with different unit prices. A worker is allowed to apply to all the markets that require his chosen signal $e$. We assume that, for any given signal $e$, markets at different wages clear sequentially, starting from the highest wage, as in the "buyer's equilibrium" studied by Wilson (1980). Therefore, a worker starts by applying to market $(e, q_H)$ and, as long as he has not been hired, continues to apply to lower-wage markets. Eventually, he gets hired in market $(e,w)$, and does not apply to markets with lower wages. The worker understands that each choice of $e$ is associated with some probability distribution over wage offers, with c.d.f. denoted by $\mu(\cdot\,;e,i)$. The worker's problem is therefore:
$$\max_{e}\; \bar{w}(e,i) - c(i)\,e,$$
where $\bar{w}(e,i) = \int w\,d\mu(w;e,i)$ is the expected wage. We denote the choice of worker $i$ by $e_i$.

Firm's problem
When a firm observes applicants, it may use its information to select which ones to hire, to the extent that it can tell them apart. A feasible hiring rule for firm θ is a function χ : [0,1] → {0,1} that is measurable with respect to its information set, that is, $\chi(i) = \chi(i')$ whenever $x(i,\theta) = x(i',\theta)$. A firm will reject applicants with $\chi(i) = 0$ and hire workers (which we describe as χ-acceptable) from the set $I_\chi = \{i \in [0,1] : \chi(i) = 1\}$. Let $X$ denote the set of possible hiring rules.
A firm must decide what market to hire from and what hiring rule to apply (it is without loss of generality to assume the firm hires only in one market and uses only one rule). To make this decision, the firm needs to form beliefs $G(\cdot\,;e,w,\chi)$ about what workers it will be drawing from should it choose to hire in market $(e,w)$ with hiring rule χ. If the firm thinks it will find χ-acceptable workers in market $(e,w)$, then $G(\cdot\,;e,w,\chi)$ is a probability measure on $I_\chi$; otherwise beliefs integrate to zero: $G(I_\chi;e,w,\chi) = 0$. Let $g$ denote the density or p.m.f. of $G$, which we assume is well defined. Firm θ's problem is:
$$\max_{e,w,\chi} \int \bigl(q(i) - w\bigr)\,dG(i;e,w,\chi) \quad \text{s.t. } \chi \text{ feasible for } \theta.$$
Note that a firm has the choice not to hire workers by simply directing its search to a market/hiring rule where $G(I_\chi;e,w,\chi) = 0$. We denote the choices of firm θ by $(e_\theta, w_\theta, \chi_\theta)$.
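The measurability restriction on hiring rules can be illustrated on a discrete grid of workers. The sketch below assumes the threshold signal $x(i,\theta) = 1$ if $i \ge \theta$, under which a rule can only condition on whether the signal is positive or negative, so there are four feasible rules. It also assumes, for illustration only, that the firm's beliefs are proportional to the mass of acceptable workers in a given pool. The pool, the wage and all names are hypothetical.

```python
def feasible_rules(theta):
    """All rules measurable w.r.t. the assumed signal x(i, theta) = 1[i >= theta].

    Keys are (accept_negative_signal, accept_positive_signal).
    """
    rules = {}
    for accept_neg in (0, 1):
        for accept_pos in (0, 1):
            rules[(accept_neg, accept_pos)] = (
                lambda i, a0=accept_neg, a1=accept_pos: a1 if i >= theta else a0
            )
    return rules


def expected_profit(rule, pool, q, w):
    """Average of q(i) - w over the acceptable part of the pool; 0 if it is empty."""
    accepted = [(i, mass) for i, mass in pool if rule(i) == 1]
    total = sum(mass for _, mass in accepted)
    if total == 0:
        return 0.0  # the firm finds no acceptable worker and does not hire
    return sum(mass * (q(i) - w) for i, mass in accepted) / total


lam = 0.5
q = lambda i: 1.0 if i < lam else 2.0          # q_L = 1, q_H = 2 (hypothetical)
pool = [(0.1, 1.0), (0.3, 1.0), (0.45, 1.0), (0.7, 1.0), (0.9, 1.0)]  # (index, mass)
rules = feasible_rules(theta=0.4)
best = max(rules, key=lambda k: expected_profit(rules[k], pool, q, 1.5))
print(best, expected_profit(rules[best], pool, q, 1.5))
```

In this example the firm with θ = 0.4 does best by accepting only positive signals: it still hires one false positive (the worker at index 0.45), but rejects enough low types to earn a positive expected profit at the wage 1.5.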

Consistency of wage distributions
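We define demand as a measure on the set of wages, signals and hiring rules $[0,q_H] \times \mathbb{R}_+ \times X$. For any set of wages $W_0 \subseteq [0,q_H]$, signals $E_0 \subseteq \mathbb{R}_+$ and hiring rules $X_0 \subseteq X$, demand is the total number of firms who make those choices:
$$D(W_0,E_0,X_0) = \int I\bigl(w_\theta \in W_0,\; e_\theta \in E_0,\; \chi_\theta \in X_0\bigr)\,dF(\theta).$$
We then impose the following consistency condition on firms' hiring and the distribution of wage offers received by workers:
Condition 1 $$\mu(w;e,i)\,I(e_i = e) = \int_{\tilde{w} \le w,\;X} g(i;e,\tilde{w},\chi)\,dD(e,\tilde{w},\chi) \quad \text{for all } e,w,i.$$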
The indicator $I(e_i = e)$ takes the value 1 if worker $i$ chooses signal $e$ and zero otherwise; $\mu(w;e,i)$ is his probability of getting a wage at most $w$. Hence, the left-hand side of Condition 1 is the total number of i-type workers with signal $e$ who will obtain wages at most $w$. Moreover, since beliefs are rational, a firm imposing hiring rule χ in market $(e,w)$ will hire $g(i;e,w,\chi)$ workers of type $i$. Adding these hires across all hiring rules and wages below $w$ using the demand measure results in the right-hand side of Condition 1, which is the total number of i-type workers hired in markets with signal $e$ and wages up to $w$. Condition 1 simplifies when i-type workers choose signal $e$ (so $I(e_i = e) = 1$), and they have strictly positive probability of finding a job at wage $w$, so the c.d.f. µ makes a discrete step of size $d\mu(w;e,i)$. Then, Condition 1 can be written as:
$$d\mu(w;e,i) = \frac{\int_X g(i;e,w,\chi)\,dD(e,w,\chi)}{I(e_i = e)}.$$
This is the standard rationing rule under frictionless matching, by which the probability $d\mu$ for an i-type worker of finding a job at wage $w$ is equal to the ratio of i-type workers demanded by firms in that market over their supply, which is equal to 1.⁶ The more general formulation of Condition 1 also deals with cases where µ may increase continuously over some interval of wages, so the probability of being hired in any single market is zero but there is an associated probability density. Both situations will occur in the equilibria we find below.
Example 1 To illustrate the meaning of equations (3) and (4) … In words, type-A workers get hired for sure at wage $w_H$, type-B workers get hired with probability 0.5 at wage $w_H$ and probability 0.5 at wage $w_L$, and type-C workers get hired for sure at wage $w_L$.
Condition 1 imposes no constraints on µ when $I(e_i = e) = 0$, i.e., no constraints on i-workers' chances of being hired in markets where there are no i-applicants. For these markets, we impose the condition:

Condition 2 µ is weakly decreasing in i
Condition 2 says that higher-i workers expect higher wages in a FOSD sense. This rules out low types being more optimistic than high types about the wages they would obtain for some off-equilibrium signals. This condition can be derived from more primitive assumptions. Suppose workers believe that firms which hire in markets with off-equilibrium signals use optimal hiring rules. These are weakly monotonic: no firm finds it optimal to accept worker $i$ while rejecting worker $i' > i$. If firms draw workers randomly from those that satisfy their hiring rule (as discussed in the next section), applying Condition 1 to off-equilibrium signals implies that higher types will be hired at weakly higher rates than lower types, resulting in FOSD higher wages.
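The wage distributions µ and the monotonicity requirement of Condition 2 can be illustrated with the hiring probabilities quoted in Example 1. The sketch below represents each worker type's distribution as a map from wages to probabilities. The numerical wages ($w_L = 1$, $w_H = 2$) and the ordering of indices C < B < A are assumptions made for illustration only.

```python
W_L, W_H = 1.0, 2.0  # hypothetical wage levels

# Hiring probabilities as described in Example 1.
mu = {
    "A": {W_H: 1.0},
    "B": {W_H: 0.5, W_L: 0.5},
    "C": {W_L: 1.0},
}


def cdf(dist, w):
    """Probability of obtaining a wage of at most w."""
    return sum(p for wage, p in dist.items() if wage <= w)


def expected_wage(dist):
    """Discrete analogue of the integral of w with respect to mu."""
    return sum(wage * p for wage, p in dist.items())


def satisfies_condition_2(dists_by_index, wages):
    """True if the c.d.f. is weakly decreasing in the worker index at every wage.

    dists_by_index must be ordered from the lowest to the highest index i.
    """
    for w in wages:
        values = [cdf(d, w) for d in dists_by_index]
        if any(later > earlier for earlier, later in zip(values, values[1:])):
            return False
    return True


ordered = [mu["C"], mu["B"], mu["A"]]  # assumed ordering of indices: C < B < A
print({t: expected_wage(d) for t, d in mu.items()})
print(satisfies_condition_2(ordered, wages=[W_L, W_H]))
```

A lower c.d.f. at every wage is exactly first-order stochastic dominance, so under the assumed ordering higher-index workers face better wage prospects, as Condition 2 requires.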

Consistency of beliefs
Consider a firm that hires in market $(e,w)$ with hiring rule χ. The pool of workers available for hire in this market includes i-type workers only if they choose education $e$ and have not already been hired at higher wages. Therefore, it includes $I(e_i = e)\,\mu(w;e,i)$ i-type workers. If firms simply chose at random from the χ-acceptable subset of this pool, then Bayes' Rule would imply that rational beliefs should be:
$$g(i;e,w,\chi) = \frac{\chi(i)\,I(e_i = e)\,\mu(w;e,i)}{\int \chi(j)\,I(e_j = e)\,\mu(w;e,j)\,dj}. \tag{5}$$
However, if firms with different hiring rules hire sequentially in the same market, firms that hire earlier skew the pool that later firms face, so rational beliefs depend on the order in which firms hire within a market. Kurlat (2016) assumes that there exist separate markets for each wage combined with each possible way of ordering rules, and firms and workers choose which markets to trade in, making the order endogenous. He shows that under "false positives" information, less selective firms hire first. This implies that no one's sample is skewed by earlier firms, so it is as if all firms were drawing from the entire pool of χ-acceptable applicants, and (5) applies.
… As a result, the 1.5 measure of type-γ firms hire all the remaining workers.
Instead, under "false negatives" information, Kurlat (2016) shows that there may be markets where more selective firms hire first, and (5) does not apply. However, this possibility only arises among firms who only accept high types. Therefore, even if early firms skew the pool, later firms still hire only high types, just as if they had been the first in line. Therefore, the following weaker condition still holds:
Condition 3 For any market $(e,w)$ and hiring rule $\chi$ such that $\int_0^1 \chi(i)\, I(e_i = e)\, \mu(w;e,i)\, di > 0$, with $q_i$ denoting the productivity of worker $i$:
$$\int q_i \, dG(i;e,w,\chi) = \frac{\int_0^1 q_i\, \chi(i)\, I(e_i = e)\, \mu(w;e,i)\, di}{\int_0^1 \chi(i)\, I(e_i = e)\, \mu(w;e,i)\, di}$$
Condition 3 says that beliefs must be such that the average productivity that firms expect to get if they hire in market $(e,w)$ with hiring rule $\chi$ must be the same as if they were drawing from the entire pool. For the false positives case, it holds because it is implied by (5). For the false negatives case, it holds because the only cases where (5) might not hold are when firms only accept high-type workers. Rather than explicitly allowing for endogenous ordering of firms' trades and rederiving these results, we incorporate them directly into our definition of equilibrium by imposing Condition 3.

Example 3 Continuing further on Examples 1 and 2, suppose that workers of type A and B have productivity $q_H$ while worker type C has productivity $q_L$. If less selective firms hire first, as assumed in Example 2, then by Bayes' rule, firms α and β expect average quality $q_H$, while firm γ expects average quality $(2q_L+q_H)/3$, so Condition 3 applies. Suppose now, by contrast, that more selective firms hire first, so firm α gets to pick workers before firm β in market $(e,w_H)$. Since firm α hires half a type-A worker first, this skews firm β's remaining applicant sample away from type A, so firm β's beliefs over types no longer coincide with those implied by equation (5). Nonetheless, since both workers A and B have quality $q_H$, firm β expects average quality $q_H$, so Condition 3 remains satisfied.
Condition 3 only applies to markets where the denominator is positive, i.e., where there are $\chi$-acceptable workers. The key challenge in constructing a tractable equilibrium notion is how to discipline firms' beliefs in markets that are empty of $\chi$-acceptable workers. We propose a refinement which guarantees equilibrium uniqueness in the no-information benchmark, and at the same time preserves equilibrium existence throughout.⁷ For markets in which no $\chi$-acceptable workers apply, there are two possibilities: either the firm nevertheless believes it could find $\chi$-acceptable workers and $G(\cdot;e,w,\chi)$ is a well-defined probability measure, or the firm believes the market is empty and $G(I_\chi;e,w,\chi) = 0$. For the first case, we require that beliefs only place weight on $\chi$-acceptable workers that would in fact be willing to look for a job in market $(e,w)$. In other words, a firm can never expect to find in market $(e,w)$ a worker who could obtain higher utility by choosing a different signal, or who can find a job for sure with the same signal but a higher wage. Formally, we require:

Condition 4 For any $i$ in the support of $G(\cdot;e,w,\chi)$:
1. $\chi(i) = 1$
2. $e$ solves worker $i$'s problem
3. $\mu(w;e,i) > 0$

7. This issue does not arise in Kurlat (2016). He only studies unidimensional contracts, where the price is the only contract dimension. This rules out signalling, and thus there are no markets corresponding to off-equilibrium signals, and no need to specify how beliefs react to these off-equilibrium signals. In his setup, beliefs must satisfy Condition 3, but there is no equivalent to the condition we impose in the following.

REVIEW OF ECONOMIC STUDIES

The alternative is that a firm is certain that it cannot find $\chi$-acceptable workers in market $(e,w)$. We impose that a firm can only reach that conclusion if guaranteeing $\chi$-acceptable workers a job with a wage at least $w$ is not enough to persuade them to choose signal $e$. Formally, we impose:

Condition 5 If $G(I_\chi;e,w,\chi) = 0$, then $\mu(w;e,i) = 0$ for all $i$ such that $\chi(i) = 1$.
Conditions 4 and 5 are closely related to the infinite-tightness condition in Guerrieri et al. (2010) and Guerrieri and Shimer (2014). In their setup, for every market there either is at least one worker type who finds that market optimal, or the market tightness is infinite. In the first case, this allows firms to have well-defined beliefs about which workers they would encounter; in the second, workers would match for sure. Conditions 4 and 5 generalize this idea by imposing it separately for each χ -acceptance group. For each χ , it has to be the case that either some χ -acceptable worker finds visiting this market optimal (in which case this worker can be in the support of well-defined beliefs) or all χ -acceptable workers are guaranteed jobs. Within a given market, which of these possibilities applies can be different across different hiring rules χ .

Equilibrium definition
We summarize the above discussion in the following equilibrium definition:

PURE SIGNALLING
We now characterize equilibrium for the case where $F$ is a point mass at $\theta = 0$ (or equivalently at $\theta = 1$), i.e. when all firms are completely uninformed. This corresponds to the classic signalling environment. For this case, the least-cost separating allocation emerges as the unique equilibrium. In this allocation, low types get no education, high types get just enough education to separate, with:
$$e^* = \frac{q_H - q_L}{c_L}$$
and each type is paid their own productivity, as illustrated in Figure 1.

Figure 1
The least-cost separating allocation.
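
As a rough numerical illustration of the least-cost separating allocation, the sketch below assumes linear signalling costs, so that a type with marginal cost $c$ who earns wage $w$ at education $e$ gets utility $w - c\,e$. Under that reading, $e^*$ is the education level at which a low type is just indifferent between $(0, q_L)$ and $(e^*, q_H)$. The function name and the cost values `c_H = 0.5`, `c_L = 1.0` are hypothetical; `q_H = 1` and `q_L = 0.4` follow the paper's numerical example.

```python
def least_cost_separating(q_H, q_L, c_H, c_L):
    """Return (e_star, u_low, u_high) assuming utility w - c * e."""
    if not (q_H > q_L and c_L > c_H > 0):
        raise ValueError("need q_H > q_L and c_L > c_H > 0")
    # Low types are indifferent between (0, q_L) and (e_star, q_H).
    e_star = (q_H - q_L) / c_L
    u_low = q_L                    # low types: no education, paid q_L
    u_high = q_H - c_H * e_star    # high types: education e_star, paid q_H
    return e_star, u_low, u_high

# Hypothetical cost parameters; productivities as in the paper's example.
q_H, q_L, c_H, c_L = 1.0, 0.4, 0.5, 1.0
e_star, u_low, u_high = least_cost_separating(q_H, q_L, c_H, c_L)
print(e_star, u_low, u_high)
```

Note that `e_star` depends on `c_L` only, which matches the later observation that $e^*$ does not respond to the high types' signalling cost.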

The equilibrium is constructed by setting the distribution $\mu$ as a point mass at the lower envelope of the indifference curves of both types, which makes low types indifferent between any $e \in [0,e^*]$ and high types indifferent between any $e \ge e^*$. Therefore, $e = 0$ for low types and $e = e^*$ for high types is indeed optimal. This is then sustained by firms' belief that in the range $[0,e^*]$ they will only encounter low types above the lower envelope and no one at all below, and similarly for high types above $e^*$. Hence there are no profits in any market, and firms are trivially optimizing. A measure $F(1)-1$ of firms remains inactive, for instance by choosing a market with $e = 0$ and $w < q_L$ (and selection rule $\chi(i) = 1$ for all $i$).
The key step in establishing uniqueness is to rule out pooling, i.e. markets with positive supply of both high and low types. This follows the standard logic based on single crossing. If there were pooling at a level of education $e'$, then high types would require a lower wage than low types to be willing to choose $e = e'+\varepsilon$. Hence firms that consider hiring in a market with $e = e'+\varepsilon$ and a wage that leaves high types indifferent must believe that they will only encounter high types, which for small $\varepsilon$ must be more profitable than hiring at $e'$.
The types of deviations to pooling contracts that may lead to non-existence of a pure-strategy equilibrium in Rothschild and Stiglitz (1976) are not profitable because each firm perceives itself to be small.⁸ A job with $e = 0$ and $w = q_H - c_H e^* + \varepsilon$ is strictly preferred to the equilibrium by all workers, and if a firm were large and could hire the entire population, it could break the equilibrium by offering to hire everyone in this market, which would be profitable for low values of $\lambda$. Here, if a small firm tries to hire in this market, it will not attract any high types, because they know that they will be competing with all the low types for an infinitesimal chance to be hired and will have to settle for $w = q_L$ if they are not. Formally, this is captured by the assumption that beliefs do not depend on whether a firm decides to recruit in a particular market. This is the same logic that leads to existence and uniqueness in Guerrieri et al. (2010).

8. Rosenthal and Weiss (1984) and Dasgupta and Maskin (1986) show that mixed-strategy equilibria do exist.

FALSE POSITIVES
We now consider the case where F has full support on [0,λ] and is continuous. We start by characterizing the equilibrium when workers do not have a way of signalling ("no signalling"). Next, we characterize a class of possible equilibria ("partial signalling") that involve signalling by a fraction of the high types. We then show that any equilibrium must be either the pure signalling equilibrium described in Section 4, a no signalling equilibrium or a partial signalling equilibrium, and find conditions for each of them to arise.

No signalling
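We now characterize the equilibrium for the case where workers are constrained to choose $e = 0$, which is the case studied by Kurlat (2016). Let $\theta^N$ and $w^N$ be defined as the solutions to:
$$\int_{\theta^N}^{\lambda} \frac{1}{1-\theta}\, dF(\theta) = 1 \qquad (11)$$
$$w^N = q_L\,\frac{\lambda-\theta^N}{1-\theta^N} + q_H\,\frac{1-\lambda}{1-\theta^N} \qquad (12)$$
In equilibrium, all high-type workers (and some low-type workers) are hired at wage $w^N$, and the low-type workers who fail to find a job at wage $w^N$ are hired at wage $q_L$.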
To understand the meaning of conditions (11) and (12), observe first that if firm $\theta$ hires at wage $w^N$ and imposes hiring rule $\chi_\theta(i) = I(i \ge \theta)$, it hires randomly from the interval $[\theta,1]$. Therefore it ends up hiring a low type with probability $\frac{\lambda-\theta}{1-\theta}$ and a high type with probability $\frac{1-\lambda}{1-\theta}$. Its expected profits will be:
$$\Pi(\theta) = q_L\,\frac{\lambda-\theta}{1-\theta} + q_H\,\frac{1-\lambda}{1-\theta} - w^N$$
Profits are increasing in $\theta$: firms whose information enables them to screen out a higher proportion of low types will be hiring from a better pool of workers. Only firms that are sufficiently confident in their ability to tell workers apart will be willing to hire in this market; they will make profits if and only if they are above the cutoff $\theta^N$ defined by equation (12), which satisfies $\Pi(\theta^N) = 0$.
For all $1-\lambda$ high-type workers to be hired at wage $w^N$, it must be that there are enough firms in the range $[\theta^N,\lambda]$ to hire all of them. Given that in expectation firm $\theta$ hires $\frac{1-\lambda}{1-\theta}$ high-type workers, this means that $\theta^N$ must satisfy (11).⁹

9. Note that low-type workers do not guarantee themselves a job at wage $w^N$ since some of the firms hiring in this market will reject them; only firms $\theta \in [\theta^N,i]$ hire in the market at wage $w^N$ and accept worker $i$. Therefore, his probability of finding a job at wage $w^N$ is
$$d\mu(w^N;i) = \int_{\theta^N}^{i} \frac{1}{1-\theta}\, dF(\theta) \qquad (13)$$
This probability is increasing in $i$ since higher-$i$ low types mislead more firms into hiring them at wage $w^N$. It is equal to zero for workers $i < \theta^N$ since no firm that would accept them hires at wage $w^N$.
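
To make the two conditions concrete, here is a minimal sketch that solves for the cutoff firm and the wage in closed form. It reads (11) as the market-clearing condition $\int_{\theta^N}^{\lambda} \frac{1}{1-\theta}\,dF(\theta) = 1$ and (12) as the zero-profit condition of the cutoff firm, as described in the text. The uniform firm density `a` on $[0,\lambda]$ and the function names are assumptions of this illustration, not the paper's specification; `q_H = 1`, `q_L = 0.4`, `lam = 0.55` follow the paper's numerical example.

```python
import math

def no_signalling_uniform(a, lam, q_H, q_L):
    """Cutoff firm and wage for an assumed uniform density f(theta) = a on [0, lam]."""
    # Market clearing: a * ln((1 - theta_N) / (1 - lam)) = 1.
    theta_N = 1.0 - (1.0 - lam) * math.exp(1.0 / a)
    if theta_N < 0.0:
        raise ValueError("too few firms to hire all high types at a single wage")
    # Zero profits for the cutoff firm, which draws at random from [theta_N, 1].
    w_N = (q_L * (lam - theta_N) + q_H * (1.0 - lam)) / (1.0 - theta_N)
    return theta_N, w_N

def hiring_probability(i, theta_N, a):
    """Chance that low type i (with i <= lam) is hired at w_N under the uniform density."""
    if i <= theta_N:
        return 0.0  # no firm that accepts this worker hires at w_N
    return a * math.log((1.0 - theta_N) / (1.0 - i))

a, lam, q_H, q_L = 2.0, 0.55, 1.0, 0.4
theta_N, w_N = no_signalling_uniform(a, lam, q_H, q_L)
print(theta_N, w_N, hiring_probability(0.4, theta_N, a))
```

With these inputs the hiring probability rises from zero at `theta_N` to one at `lam`, so the marginal low type is hired as reliably as a high type.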

KURLAT & SCHEUER SIGNALLING TO EXPERTS 17
The following result, proved by Kurlat (2016), establishes that this is the unique equilibrium.
Proposition 2 If workers are constrained to choose $e = 0$, there is a unique equilibrium where:
1. Firms with $\theta \ge \theta^N$ hire at wage $w^N$ and other firms are indifferent between hiring at wage $q_L$ or not hiring.
2. High types are hired at $w^N$ with probability 1.
3. Beliefs follow (5) for all $w \ge q_L$ and are zero for lower wages.
Conditions (11) and (12) are the analogues of conditions (19) and (20) in Kurlat (2016). There are four minor differences. First, Kurlat (2016) assumes that assets are divisible and the law of large numbers applies, so he has exact pro-rata rationing instead of probabilistic rationing. Under risk neutrality, this distinction does not matter. Second, he allows some sellers to have a positive value for retaining the good, while we assume it to be zero so workers sell all their labour endowment in equilibrium. Third, Kurlat (2016) assumes $q_L = 0$, so the one-price equilibrium he finds is equivalent to the two-price equilibrium we have, where some low types trade at price $q_L$. Finally, he models buyers' capacity constraints in terms of dollars rather than in terms of quantities, so the price appears in the market-clearing condition.
Note that as $F$ approaches a point mass at $\theta = 0$, equations (11)-(13) imply that $\theta^N \to 0$, $w^N \to \lambda q_L + (1-\lambda) q_H$ and $d\mu(w^N;i) \to 1$. If firms are uninformed and workers cannot signal, then all workers get hired for sure at a wage equal to the average productivity. This is the pure Akerlof (1970) outcome: all workers have the same reservation wage (zero) so there is no adverse selection at the pooled price.

Partial signalling
In a partial signalling equilibrium, low-type workers choose $e = 0$. They are hired with some probability at wage $w^P$, defined by:
$$w^P = q_H - c_H e^*$$
and otherwise at wage $q_L$. High-type workers choose either $e = 0$ (and are hired for sure at wage $w^P$) or $e = e^*$ (and are hired for sure at wage $q_H$, which gives them the same utility). Let $\pi^P$ be the fraction of high types that choose $e = 0$. If firm $\theta$ hires in market $(0,w^P)$ with hiring rule $\chi_\theta(i) = I[i \ge \theta]$, it will hire a high type with probability $\frac{\pi^P(1-\lambda)}{\lambda-\theta+\pi^P(1-\lambda)}$, so its profits will be
$$\Pi(\theta) = \frac{q_L(\lambda-\theta) + q_H\,\pi^P(1-\lambda)}{\lambda-\theta+\pi^P(1-\lambda)} - w^P$$
This defines a cutoff $\theta^P$ such that firms can make profits in market $(0,w^P)$ if and only if $\theta > \theta^P$:
$$\frac{q_L(\lambda-\theta^P) + q_H\,\pi^P(1-\lambda)}{\lambda-\theta^P+\pi^P(1-\lambda)} = w^P \qquad (15)$$
Firms with $\theta < \theta^P$ are indifferent between hiring in market $(0,q_L)$ (with low-type applicants only), in market $(e^*,q_H)$ (with high-type applicants only), or not hiring at all, since they make zero profits in any case.
The calculations above assume that some high types are indeed willing to apply to market $(0,w^P)$. For this to be true, it must be the case that they are sure they will find a job, since they can always guarantee themselves the same utility by choosing $e = e^*$ and getting a job that pays $w = q_H$. This means that there must be enough firms above $\theta^P$ to hire all $\pi^P(1-\lambda)$ high types who forgo education and apply to market $(0,w^P)$. By the arguments above, each firm $\theta \ge \theta^P$ hires $\frac{\pi^P(1-\lambda)}{\lambda-\theta+\pi^P(1-\lambda)}$ high types in expectation, so market clearing requires:
$$\int_{\theta^P}^{\lambda} \frac{1}{\lambda-\theta+\pi^P(1-\lambda)}\, dF(\theta) = 1 \qquad (16)$$
By the same reasoning as in the no-signalling case, low types are hired in market $(0,w^P)$ with probability $d\mu(w^P;0,i) = \int_{\theta^P}^{i} \frac{1}{\lambda-\theta+\pi^P(1-\lambda)}\, dF(\theta)$. The indifference condition (15) and the market-clearing condition (16) define two relationships between the cutoff firm $\theta^P$ and the fraction of high types $\pi^P$ that forgo signalling. Both of these relationships are downward sloping, as shown in Figure 2.
The indifference condition (15) is downward-sloping because if more high types decide to forgo education, they improve the pool of workers available for hire in market $(0,w^P)$, allowing less-informed firms to earn profits. The same is true for the market-clearing condition (16) because if more high types decide to forgo education, they can only find jobs in market $(0,w^P)$ if additional firms decide to hire there. In other words, there is a complementarity between entry into market $(0,w^P)$ by firms and by high-type workers. The more high types forgo education, the more profitable it is for any given firm to hire in $(0,w^P)$; the more firms hire in $(0,w^P)$, the more high-type workers can refrain from signalling.
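
The two downward-sloping relationships can be traced out numerically. The sketch below is an illustration under stated assumptions, not the paper's procedure: it takes the indifference condition to be the zero-profit condition of the cutoff firm in market $(0,w^P)$, the market-clearing condition to be $\int_{\theta}^{\lambda} \frac{1}{\lambda-t+\pi(1-\lambda)}\,dF(t) = 1$, linear signalling costs so that $w^P = q_H - c_H e^*$, and a user-supplied firm density `f`. All function names, the cost values and the uniform density used in the usage lines are hypothetical.

```python
import math

def integrate(g, lo, hi, n=2000):
    """Midpoint-rule quadrature of g on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

def pi_indifference(theta, lam, q_H, q_L, w_P):
    """Share pi at which firm theta just breaks even in market (0, w_P)."""
    return (w_P - q_L) * (lam - theta) / ((q_H - w_P) * (1.0 - lam))

def capacity(theta, pi, lam, f):
    """Left-hand side of the market-clearing condition; requires theta < lam."""
    return integrate(lambda t: f(t) / (lam - t + pi * (1.0 - lam)), theta, lam)

def pi_market_clearing(theta, lam, f, iters=60):
    """Share pi in [0, 1] that clears the market, or None if none exists."""
    if capacity(theta, 1.0, lam, f) > 1.0 or capacity(theta, 0.0, lam, f) < 1.0:
        return None
    lo, hi = 0.0, 1.0  # capacity is decreasing in pi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if capacity(theta, mid, lam, f) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam, q_H, q_L, c_H, c_L = 0.55, 1.0, 0.4, 0.5, 1.0
w_P = q_H - c_H * (q_H - q_L) / c_L   # high types indifferent to (e_star, q_H)
f = lambda t: 2.0                     # assumed uniform density of firms
for theta in (0.3, 0.4, 0.5):
    print(theta, pi_indifference(theta, lam, q_H, q_L, w_P),
          pi_market_clearing(theta, lam, f))
```

With a uniform density both curves are straight lines through $(\theta,\pi) = (\lambda,0)$, so they generically cross only at the pure signalling point; interior crossings, and hence multiple partial signalling candidates, require a non-uniform density.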
The strategic complementarity implies that there can be multiple intersections of (15) and (16), and possibly multiple partial signalling equilibria. This source of multiplicity is different from the forces that may lead to multiplicity in Akerlof (1970) (where adverse selection depends on the price) or in canonical signalling models (where different off-equilibrium beliefs can be self-sustaining). Indeed, with our refinement on beliefs, the uninformed-firms benchmark has a unique equilibrium (Proposition 1), as does the no-signalling case (Proposition 2). The multiplicity we identify here relies on the presence of both signalling and heterogeneous information among firms.
Note that a no-signalling equilibrium corresponds to a situation where the market-clearing condition is above the indifference condition at $\pi = 1$, as in Figure 2. This means that if $\pi = 1$ (no high types signal) there are more firms willing to hire at $w^P$ than the total mass of workers they would accept. As a result, high-$\theta$ firms "bid up" the wage to $w^N > w^P$, leading some firms to drop out until the number of firms willing to pay this wage equals the number of high-type workers. Moreover, a pure signalling equilibrium is a special case of a partial signalling equilibrium, with $\pi^P = 0$ and $\theta^P = \lambda$. Figure 3 shows which markets are active in each class of equilibrium.

Candidate equilibria
The following result establishes that any equilibrium must belong to one of the three cases described above.
Proposition 3 Any equilibrium is of one of the three following types:
1. Pure signalling. Low types choose $e = 0$ and high types choose $e = e^*$.
2. No signalling. All workers choose $e = 0$. Firms hire in market $(0,w^N)$ if and only if $\theta \ge \theta^N$. $\theta^N$ and $w^N$ satisfy (11) and (12), and $w^N \ge w^P$.
3. Partial signalling. Low types choose $e = 0$; a fraction $\pi^P$ of high types choose $e = 0$ and the rest choose $e = e^*$. Firms hire in market $(0,w^P)$ if and only if $\theta \ge \theta^P$. $\pi^P$ and $\theta^P$ satisfy (15) and (16).
The key to proving Proposition 3 is to establish that high and low types cannot coexist at any level of education other than e = 0, so there is no pooling at positive signalling levels. The logic is similar, though somewhat subtler, to that in the uninformed-firms benchmark.
Suppose that low and high types coexisted in some market $(e,w)$ with $e > 0$, as illustrated in Figure 4. With differentially informed firms, the standard argument that rules this out by considering market $(e',w')$ does not go through. Some firms hiring in market $(e,w)$ may be screening out low types, so low types' expected wage with signal $e$ could be lower than that of high types. Thus, it is possible that both high and low types find $(e',w')$ more attractive than $(e,w)$. Instead, we can rule out pooling in market $(e,w)$ by contradiction, as follows. There are two possibilities: either the highest firm that hires in market $(e,w)$ has $\theta = \lambda$ (i.e. it can tell workers apart perfectly), or the highest firm to hire in this market has $\theta < \lambda$. In the first case, we arrive at a contradiction by considering the beliefs of firm $\theta = \lambda$ about market $(0,w'')$. This firm's beliefs can only include high types so it only cares about the wage it pays. Therefore this firm finds market $(0,w'')$ preferable over market $(e,w)$, leading to a contradiction. If instead the highest firm that hires in market $(e,w)$ has $\theta < \lambda$, then any low-type worker in the range $i \in (\theta,\lambda)$ has the same chance of getting a job in market $(e,w)$ as a high type. If this is so, then firm $\theta$ can apply the standard cream-skimming deviation by hiring in market $(e',w')$. Firm $\theta$ can reject all low types who prefer $(e',w')$ over $(e,w)$, so it can guarantee itself high types by hiring in this market, which contradicts the premise that it hires in market $(e,w)$.

Figure 4
Ruling out pooling at e > 0.

Existence
So far, we have described the possible candidates for equilibrium but we have not proved that any of them is actually an equilibrium. We now show that the candidate equilibria described above may or may not actually be equilibria. We construct a class of possible deviations and derive an easy-to-verify condition to determine whether these deviations are profitable. We then show that checking this condition is sufficient to establish an equilibrium, and prove that at least one equilibrium always exists.
Consider first a candidate partial signalling equilibrium. Define $(e^D_\theta,w^D_\theta)$ as the lowest-wage market where equilibrium requires that the beliefs of firm $\theta \in [\theta^P,\lambda]$ only include high types. A necessary condition for equilibrium is that firm $\theta$ cannot increase its profits by recruiting in market $(e^D_\theta,w^D_\theta)$ instead of market $(0,w^P)$. The location of market $(e^D_\theta,w^D_\theta)$ is illustrated in Figure 5. Worker $i = \theta$ is the lowest-$i$ low-type worker that firm $\theta$ cannot filter out. In equilibrium, this worker obtains expected utility:
$$u(\theta) = d\mu(w^P;0,\theta)\, w^P + \left[1 - d\mu(w^P;0,\theta)\right] q_L$$
by getting a wage of either $w^P$ or $q_L$ with the equilibrium probabilities. For small but positive levels of education $e$, it is consistent with equilibrium for firm $\theta$ to believe that it will only encounter low types in $e$-markets. The reason is that since $u(\theta) < w^P$, worker $i = \theta$ will be willing to choose $e$ for a lower wage than high types would. Hence, one can specify beliefs such that firm $\theta$ does not want to recruit at education level $e$. However, for large $e$ that is no longer the case because education is more costly for low types. The market $(e^D_\theta,w^D_\theta)$ is defined by the intersection of the equilibrium indifference curves of worker $i = \theta$ and high types. At education levels higher than $e^D_\theta$, firm $\theta$ can only believe that it will encounter exclusively high types, because high types would be willing to choose these education levels for a lower wage than worker $i = \theta$.

Figure 5
Beliefs for firm θ
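
The deviation market can be located explicitly once costs are assumed to be linear, so that indifference curves are straight lines with slope $c_L$ for worker $i = \theta$ and slope $c_H$ for high types. The sketch below intersects the two equilibrium indifference curves, $w - c_L e = u(\theta)$ and $w - c_H e = w^P$. The function names, the hiring probability `0.5` and the cost values are hypothetical choices for illustration.

```python
def expected_utility(p_hire, w_P, q_L):
    """Utility of a low type hired at w_P with probability p_hire, else at q_L."""
    return p_hire * w_P + (1.0 - p_hire) * q_L

def deviation_market(u_theta, w_P, c_H, c_L):
    """Intersection of w - c_L*e = u_theta with w - c_H*e = w_P (linear costs assumed)."""
    e_D = (w_P - u_theta) / (c_L - c_H)
    w_D = w_P + c_H * e_D
    return e_D, w_D

q_H, q_L, c_H, c_L = 1.0, 0.4, 0.5, 1.0
e_star = (q_H - q_L) / c_L
w_P = q_H - c_H * e_star

u = expected_utility(0.5, w_P, q_L)          # hypothetical hiring probability
e_D, w_D = deviation_market(u, w_P, c_H, c_L)
print(e_D, w_D, q_H - w_D)                   # last entry: profit per high type hired

# A worker with no chance of being hired at w_P pins the deviation at (e_star, q_H).
e_D0, w_D0 = deviation_market(q_L, w_P, c_H, c_L)
print(e_D0, w_D0)
```

The second call illustrates why a completely uninformed firm cannot profit from this deviation: repelling the lowest-$i$ low types requires $e = e^*$ and a wage of $q_H$, which leaves no margin.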
Hiring in a market like $(e^D_\theta,w^D_\theta)$ is similar to the cream-skimming deviations that are used to break putative pooling equilibria in Rothschild and Stiglitz (1976) and related models, including the uninformed-firms benchmark of Section 4. In candidate equilibria where some high types choose $e = 0$, they end up being hired in market $(0,w^P)$, where they are pooled with low types. Just like in the benchmark, the possible deviation involves peeling off high types by requiring an action that is more costly for low types than for them. However, there are two important differences.
First, unlike in the Rothschild and Stiglitz (1976) model, purely local deviations do not work. A firm cannot cream-skim the high types off a pooling contract by requiring a small amount of signalling. Since low types are hired from the $(0,w^P)$ pool at lower rates than high types, they obtain lower utility. Therefore, they find deviations more attractive than high types do as long as these involve only a small amount of extra signalling. In order to repel the low types, the deviating firm must require a sufficiently large signal.
Second, in order to profit, the deviating firm must use both sources of information in combination: direct assessment and signalling. A completely uninformed firm cannot profitably deviate because in order to repel the lowest-$i$ low types (who cannot get jobs at $(0,w^P)$ at all) it must require $e = e^*$ and therefore pay at least $q_H$ to attract high types, at which point the deviation is no longer profitable. In order to profitably deviate, a firm must possess sufficient expertise to be able to reject the lowest-$i$ low types directly and then rely on the signal to screen out the higher-$i$ low types.
A candidate partial signalling equilibrium can only be an equilibrium if, for every $\theta \in (\theta^P,\lambda)$, the profits firm $\theta$ can obtain in market $(e^D_\theta,w^D_\theta)$ by hiring only high types are weakly lower than those it obtains in market $(0,w^P)$ by hiring a mixture of workers at a lower wage. A similar logic applies to the case of a no-signalling equilibrium. The following result determines when this condition is satisfied in either case and establishes that checking against this possible deviation is a sufficient condition for equilibrium existence.
Proposition 4
1. The pure signalling candidate equilibrium described in Proposition 3 part 1 is always an equilibrium.
2. Suppose $\theta^N$ and $w^N \ge w^P$ satisfy equations (11) and (12) for a no-signalling candidate equilibrium. Then the worker and firm decisions described in Proposition 3 part 2 are part of an equilibrium if and only if condition (18) holds.
3. Suppose $\pi^P$ and $\theta^P$ satisfy equations (15) and (16) for a partial signalling candidate equilibrium. Then the worker and firm decisions described in Proposition 3 part 3 are part of an equilibrium if and only if condition (19) holds.
In sum, the pure signalling equilibrium, which coincides with the no-information benchmark, always exists in our model. The reason is that $\pi^P = 0$, $\theta^P = \lambda$ always satisfies equations (15) and (16) and condition (19) holds for $\theta = \lambda$. Depending on parameters, additional equilibria may exist where firms use their expertise.
It is easy to construct examples where a partial or no-signalling equilibrium does exist. Figure 6 shows an economy with multiple candidate equilibria. For the candidate equilibrium $(\pi^P_1,\theta^P_1)$, condition (19) holds, so it is indeed a partial signalling equilibrium. Instead, for candidate equilibrium $(\pi^P_2,\theta^P_2)$, condition (19) fails for some $\theta > \theta^P_2$, so it is not an equilibrium.¹⁰

Properties of the equilibrium

Equilibrium regions.
Figure 7 illustrates what type of equilibrium arises in different regions of the parameter space. As we change parameters, the possible outcomes of the model span the range from pure signalling, via partial signalling, up to the no-signalling allocations in Kurlat (2016).
Both panels plot the equilibrium regions as a function of a parameter $A$ on the vertical axis that shifts the distribution of firms $F$ towards more expertise.¹¹ We know from Proposition 4 that the pure signalling equilibrium always exists, and it is indeed the only equilibrium for low enough levels of expertise as captured by the parameter $A$. As the distribution of expertise improves (in a FOSD sense), holding the other parameters fixed, first a partial signalling and finally a no-signalling equilibrium emerges in addition. Hence, as firms become better informed, less costly signalling is required. Moreover, we show formally in Appendix A that, in the region with a partial signalling equilibrium, the share of high types $1-\pi^P$ who signal also decreases with a FOSD shift in expertise. Better tools for directly evaluating job applicants, firm shares, asset-backed securities or insurance applicants reduce the need to signal through education, dividends, retained equity tranches, or high deductibles, respectively. In this way, direct information substitutes for traditional signalling.
A FOSD increase in $F$ is isomorphic to an increase in demand where each firm hires several workers instead of just one. This is because making firms more expert is equivalent to letting the more expert firms hire more workers.¹² Our model thus generates the plausible prediction that more high types forgo signalling through education (or that the amount of retained equity falls) in boom times (see Gee (2018) for descriptive evidence of this effect). This intuitive property is absent when buyers are uninformed: in that case, pure signalling is always the only equilibrium independent of demand. It is also absent in the no-signalling equilibrium where higher demand just translates into higher wages.
The left panel shows that increasing the relative cost of signalling $c_H/c_L$ has the same effects as improving expertise on the type of equilibrium we find, holding the other parameters fixed (including $A$).¹³ Moreover, we show in Appendix A that, within the class of partial signalling equilibria, the amount of signalling $1-\pi^P$ decreases with signalling costs. Hence, as signalling gets more expensive, fewer high types signal. Note that the no-information benchmark, somewhat unappealingly, does not have this property: all high types choose $e = e^*$ and $e^*$ does not depend on $c_H$, so high types do not respond to a higher cost of signalling by signalling less. Allowing for heterogeneously informed firms overturns this counterintuitive feature: equilibrium forces do lead workers to respond on the extensive margin.
Finally, in the right panel, we vary the share of low types on the horizontal axis. To do so in a clean way, we reparametrize the model by assuming that the mass of low-type workers is $\hat\lambda$, distributed uniformly in the interval $[0,\lambda]$, with a density $\hat\lambda/\lambda$; correspondingly, the mass of high types is $1-\hat\lambda$, distributed uniformly in the interval $[\lambda,1]$ with a density $\frac{1-\hat\lambda}{1-\lambda}$. Changes in $\hat\lambda$ have the interpretation of changes in the fraction of low types, leaving their relative detectability in the eyes of firms constant. We see that reducing the share of low types this way moves the equilibrium from pure signalling to partial signalling and finally to no signalling.¹⁴ Indeed, we show formally below that, as the share of low types becomes sufficiently small, a no-signalling equilibrium must always emerge.

Continuity in the symmetric information, no-signalling, and no-information limits.
One counterintuitive feature of the uninformed-firms benchmark is that it is discontinuous in the buyers' prior. If all workers have the same productivity, there is no information asymmetry and no signalling in equilibrium. However, as soon as there is even an infinitesimal mass of low types, high types will signal enough to separate. The following result shows that this unappealing property vanishes in our model, as in Daley and Green (2014) where the presence of exogenous information also avoids the discontinuity.
Proposition 5
1. Let $F$ be any continuous measure with full support on $[0,\lambda]$. For low $\hat\lambda$ there is a no-signalling equilibrium with $\lim_{\hat\lambda \to 0} w^N = q_H$.
2. Let $F^*$ be a mass point at $\theta = \lambda$. For any continuous $F$ sufficiently close to $F^*$ (under the total variation distance), there exists a no-signalling equilibrium, and $\lim_{F \to F^*} w^N = q_H$.
One way to approach the symmetric information limit is by taking $\hat\lambda \to 0$, since $\hat\lambda = 0$ implies symmetric information. As $\hat\lambda \to 0$, there is always a no-signalling equilibrium, and $w^N \to q_H$. Hence, this equilibrium smoothly approaches the symmetric information outcome. Pure signalling is also an equilibrium for any positive $\hat\lambda$, so the discontinuity does not go away entirely, but the set of equilibria is lower hemi-continuous in $\hat\lambda$. A second direction to approach the symmetric information limit is making the distribution $F$ approach a mass point at $\theta = \lambda$, since that limit also implies symmetric information. Again, a no-signalling equilibrium always exists sufficiently close to the limit, so the set of equilibria is lower hemi-continuous in this dimension as well.¹⁵
A second form of discontinuity in the uninformed-firms benchmark arises with respect to the cost of signalling. For any $c_H/c_L < 1$, high types will signal enough to fully separate, whereas when $c_H/c_L = 1$ the signal does not allow high types to separate and pooling allocations result. The current model, instead, is lower hemi-continuous as $c_H/c_L \to 1$. In the opposite limit, as signalling becomes cheap, the model reduces to the uninformed-firms benchmark.

12. Mechanically, this is because the left-hand side of condition (16), $\int_{\theta^P}^{\lambda} \frac{1}{\lambda-\theta+\pi^P(1-\lambda)}\, dF(\theta)$, is then scaled by the number of workers each firm hires, so changing that number is equivalent to a change in $f(\theta)$.
13. The example in the graph uses $q_H = 1$, $q_L = 0.4$, $\lambda = 0.55$ in addition to the linear specification of $f(\theta)$ from above.
14. The example uses q H = 1,q L = 0.
Proposition 6
1. For $c_H/c_L$ sufficiently close to 1, there is a no-signalling equilibrium.
2. For $c_H/c_L$ sufficiently close to 0, only the pure signalling equilibrium exists.
Part 1 of Proposition 6 establishes that if signalling is sufficiently expensive, there is an equilibrium with no signalling, where all workers pool at e = 0. If within this limiting case one takes the limit as F becomes degenerate at 0 (meaning firms have no information), then this reduces to the pooling allocation in Akerlof (1970). Conversely, part 2 establishes that if signalling is sufficiently cheap, then the only equilibrium allocation is the benchmark least-cost separating allocation and firms' expertise is not used.

Welfare
The only reason why allocations in the model are not first-best efficient is that signalling is socially wasteful. This does not immediately imply that equilibria with less signalling are Pareto superior: expected wages for different workers are different across equilibria so it is possible that there could be winners and losers from shifting from one equilibrium to another. The following result establishes that partial signalling equilibria can indeed be Pareto ranked against each other (and against the pure signalling equilibrium), but cannot be Pareto ranked against a no-signalling equilibrium if it exists:
Proposition 7
1. Suppose there is a partial signalling equilibrium with $\pi^P_1 > 0$.
(a) It Pareto dominates the pure signalling equilibrium in the same economy.
(b) If there is another partial signalling equilibrium with $\pi^P_1 > \pi^P_2$ in the same economy, the first equilibrium Pareto dominates the second.
2. Suppose there is a no-signalling equilibrium.
(a) It Pareto dominates the pure signalling equilibrium in the same economy.
(b) If there is also a partial signalling equilibrium with $\pi^P > 0$ in the same economy, neither equilibrium Pareto dominates the other.
In comparing partial signalling equilibria, it is straightforward to show that firms are better off in the higher-$\pi^P$ equilibrium, since wages are the same and there is a better pool of workers at $(0,w^P)$. High-type workers are indifferent because their payoff is $w^P$ in both equilibria. The critical step is to show that low-type workers are also better off. They gain from the fact that more firms are hiring in market $(0,w^P)$, which (other things being equal) increases their chances of earning $w^P$, but lose from the fact that there are more high-type workers looking for work at $(0,w^P)$, which lowers their chance of being hired by any given firm. However, using the fact that in both equilibria high types must be hired for sure, it is possible to show that the first effect dominates, so low types also prefer the higher-$\pi^P$ equilibrium.

15. By contrast, in the degenerate case where $F$ has full mass at some $\theta < \lambda$, a no-signalling equilibrium never exists. The right-hand side of (18) is zero at $\theta$ in this case, so there is always a profitable deviation. Intuitively, when all firms are equally well informed, our model collapses to a standard signalling model and only the pure signalling equilibrium exists. Hence, heterogeneity of information is crucial to obtain the continuity results in this section.
A no-signalling equilibrium (if it exists) cannot be Pareto ranked against partial signalling equilibria. Since the wage is higher and the cutoff firm is lower, workers are better off in the no-signalling equilibrium. However, the best firms are worse off, since they have to pay higher wages and their accurate signals mean they benefit little from the improved pool of workers. Intermediate firms with $\theta \in (\theta^N,\theta^P)$ are better off in the no-signalling equilibrium, whereas they make zero profits in the partial signalling equilibrium.
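The ranking logic in Proposition 7 can be illustrated with a minimal sketch of a Pareto comparison across groups. The payoff numbers and group labels below are hypothetical placeholders chosen only to reproduce the qualitative pattern described in the text; they are not derived from the model.

```python
from typing import Mapping


def pareto_dominates(a: Mapping[str, float], b: Mapping[str, float]) -> bool:
    """True if no group is worse off under a than under b and some group is strictly better off."""
    no_one_worse = all(a[g] >= b[g] for g in a)
    someone_better = any(a[g] > b[g] for g in a)
    return no_one_worse and someone_better


# Hypothetical payoffs per group (illustrative only).
pure_signalling = {"high_workers": 1.5, "low_workers": 1.0,
                   "intermediate_firms": 0.0, "best_firms": 0.0}
partial_signalling = {"high_workers": 1.5, "low_workers": 1.1,
                      "intermediate_firms": 0.0, "best_firms": 0.3}
no_signalling = {"high_workers": 1.6, "low_workers": 1.2,
                 "intermediate_firms": 0.1, "best_firms": 0.2}

# Partial signalling dominates pure signalling (part 1(a)).
partial_beats_pure = pareto_dominates(partial_signalling, pure_signalling)
# No-signalling and partial signalling cannot be ranked (part 2(b)):
# workers gain but the best firms lose.
unranked = (not pareto_dominates(no_signalling, partial_signalling)
            and not pareto_dominates(partial_signalling, no_signalling))
```

In this sketch the two equilibria in part 2(b) are unranked purely because one group, the best-informed firms, moves in the opposite direction from everyone else.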
The model also makes it possible to ask, assuming there is a technology for firms to choose θ at some cost, whether they have the right incentives to invest in acquiring expertise, such as improving assessment models for job applicants, risk scoring models in insurance markets or pricing models for stocks and financial derivatives. In Appendix B, following the approach in Kurlat (2019), we show that in general the answer is ambiguous: firms may have incentives to either over-invest or under-invest in expertise. We also provide a simple formula to quantify the ratio of the social and private returns to expertise based on observable properties of the equilibrium.

FALSE NEGATIVES
We now turn to the case with "false negative" mistakes, where F is continuous with support [λ,1]. Higher-i workers are relatively transparent, since most firms can tell (with certainty) that they are high types, while lower-i high types are relatively obscure, since they can only be identified as high types by the smarter, lower-θ firms. For expositional purposes, assume that the density of firms f (θ) is strictly increasing, meaning that there is a higher density of less informed firms. The general case, which requires working with an "ironed" density, is treated in Appendix D.
Unlike the false positives case, firms face a non-trivial decision as to what hiring rule to use. There may be markets where a firm θ observes x(i,θ) = 0 for all the workers that apply (so if it insisted on hiring only workers with a positive signal it would not be able to hire at all) but it knows that in equilibrium some high-type workers with i ∈[λ,θ) do apply, so it may want to hire from the pool of all applicants. We refer to this as non-selective hiring.

Description.
In equilibrium, only the least transparent high-type workers signal. Letting $u_L$ denote the low types' payoff, there is a cutoff $i_S$ such that workers in the interval $i \in [\lambda,i_S]$ signal by choosing:

$$e_S = \frac{q_H-u_L}{c_L}, \tag{20}$$

while everyone else chooses $e = 0$. Signalling markets with $e = e_S$ are straightforward: all the applicants are high types, so less informed firms compete for them and hire them (non-selectively) at a wage $w = q_H$. No-signalling markets, with $e = 0$, are more interesting. Define $i_H$ by:

$$f(i_H) = 1. \tag{21}$$

KURLAT & SCHEUER SIGNALLING TO EXPERTS 27
Since $f(\theta)$ is assumed to be increasing, this means that for all $i > i_H$ there are more firms who can detect high-type workers than there are workers. Hence, firms compete for them and hire them (selectively) at wage $w = q_H$. Conversely, for $i \in (i_S,i_H)$, there are more workers than firms who can identify them as high types. Therefore, some of them have to be hired non-selectively, at wages sufficiently low to attract non-selective firms. At each wage $w \in (q_L,q_H)$ where there is active hiring, two types of hiring take place: some workers are hired non-selectively, and in addition all the highest remaining $i$-types are hired selectively and thus drop out of subsequent, lower-wage markets. Let $w(0,i)$ be the highest wage such that all worker types above $i$ have already been hired. The pool of applicants at $w(0,i)$ consists of all the low types plus high types in the interval $(i_S,i]$ who have not been hired non-selectively at higher wages. As a result, non-selective firms break even at a wage of:$^{16}$

$$w(0,i) = \frac{\lambda q_L + (i-i_S)\,q_H}{\lambda + i - i_S}. \tag{22}$$

Firms with $\theta = i$ hire $f(i)$ workers selectively in this market since it involves the cheapest wage at which they can identify high-type workers. Therefore, it must be that the remaining $1-f(i)$ workers of type $i$ were already hired non-selectively at wages above $w(0,i)$. Since this is true for any $i$, the probability density for any worker of being hired non-selectively in market $(0,w(0,i))$ must be $f'(i)$. Hence, the expected utility obtained by worker $i$ is given by equation (23). This defines a cutoff worker $i^*$ who is indifferent between signalling (which gives a payoff $q_H - c_H e_S$) and not signalling, as stated in equation (24). Market $(0,w(0,i^*))$ is the lowest-wage market at which there is a chance of being hired non-selectively. Low-type workers who have not found a job at or above this wage end up getting hired at $w = q_L$.
Therefore, the expected utility of low types is given by equation (25). Replacing (20), (22), and (25) into (24) and simplifying gives the indifference condition (26) for the marginal worker $i^*$. Equation (26) defines a positive relationship between $i^*$ (the worker who is indifferent between signalling and not signalling) and $i_S$ (the cutoff for actually signalling). In general, $i^*$ and $i_S$ are not equal; there is a range of workers $i \in (i_S,i^*]$ who are indifferent between signalling and not signalling but choose not to. It is straightforward to show that $i^*$ is increasing in $i_S$. Workers who signal do not apply for jobs in $e = 0$ markets. Higher $i_S$ (more signalling) means the pool of applicants for non-selective firms worsens, so in order to maintain zero profits the wage must fall (equation (22)). In turn, this means that the utility of both high- and low-type workers falls (equations (23) and (25)). It falls more for high types because low types are hired with positive probability in market $(0,q_L)$, where the wage is unaffected by higher $i_S$. Hence, other things equal, higher $i_S$ makes signalling more attractive, so the indifferent type $i^*$ rises.

16. Accordingly, the non-selective firms' beliefs are given by $g(i;0,w,\chi) = \frac{I(i<\lambda)+I(i \in (i_S,i_r(w)])}{\lambda+i_r(w)-i_S}$ for $\chi(i) = 1\ \forall i$, where $i_r(w)$ is the inverse of $w(0,i)$.
A fraction $1-f(i^*)$ of workers in the range $i \in (i_S,i^*]$ are hired at wages above $w(0,i^*)$, so the remaining $f(i^*)(i^*-i_S)$ workers must be hired at wage $w(0,i^*)$. For any $i \in (i_S,i^*]$, the measure of firms who are capable of identifying $i$ as being a high type is $F(i)$, so we need $f(i^*)(i-i_S) \le F(i)$.
By monotonicity of $f$, this is implied by the market-clearing condition:

$$f(i^*)(i^*-i_S) = F(i^*). \tag{27}$$

Equation (27) defines another positive relationship between $i_S$ and $i^*$. If more of the obscure workers decide to signal, then the most informed firms will work their way up to hire slightly less obscure workers. Figure 8 summarizes the equilibrium signals, wages and hiring decisions.
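The break-even logic for non-selective firms can be sketched numerically. The function below assumes that the pool at wage $w(0,i)$ is a measure $\lambda$ of low types plus high types with index in $(i_S,i]$ in proportion to their population shares, which is the uniform belief described in footnote 16; the wage formula is inferred from that footnote, and all parameter values are hypothetical.

```python
def break_even_wage(i: float, i_S: float, lam: float, q_L: float, q_H: float) -> float:
    """Average productivity of the applicant pool at wage w(0, i).

    Assumed pool (inferred from footnote 16): a measure `lam` of low types
    plus high types with index in (i_S, i].
    """
    if i < i_S:
        raise ValueError("need i >= i_S")
    high = i - i_S  # measure of high types remaining in the pool
    return (lam * q_L + high * q_H) / (lam + high)


# Hypothetical parameters for illustration.
lam, q_L, q_H = 0.5, 1.0, 2.0

w_base = break_even_wage(0.60, 0.55, lam, q_L, q_H)
# More signalling (higher i_S) removes high types from the pool: the wage falls.
w_more_signalling = break_even_wage(0.60, 0.58, lam, q_L, q_H)
# A more transparent marginal worker (higher i) faces a better pool: the wage rises.
w_more_transparent = break_even_wage(0.70, 0.55, lam, q_L, q_H)
```

Under these assumptions the wage lies strictly between $q_L$ and $q_H$ and moves in the directions described in the text: down with $i_S$ and up with $i$.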

Corner equilibrium.
Equations (26) and (27) hold for an interior equilibrium where some range of workers are indeed hired by non-selective firms. However, it is possible that all workers below $i_H$ prefer to signal rather than being hired at a wage low enough to attract non-selective firms, which would result in a corner equilibrium with $i^* = i_H$. For this corner equilibrium, the market-clearing condition (27) and definition (21) imply $i_S = i_H - F(i_H)$. Also, in this corner equilibrium, there is no non-selective hiring, so $u_L = q_L$ and $e_S = e^*$. This will be an equilibrium if workers just below $i_H$ indeed prefer to signal rather than be hired at the wage at which non-selective firms break even, which, using (6), is equivalent to $\Gamma(i_H,i_H-F(i_H)) < 0$. These results are summarized in a proposition whose proof is in Appendix D, which also describes all firms' decisions. Moreover, we show that the equilibrium behaves continuously in the symmetric information and expensive signalling limits, and we deal with the general case in which the density of firm types $f(\theta)$ is not necessarily monotone.
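As a rough numerical check of the corner case, the sketch below uses a hypothetical linear density $f(\theta) = a(\theta-\lambda)$ on $[\lambda,1]$. It assumes, as inferred from the surrounding text rather than quoted from it, that $i_H$ solves $f(i_H)=1$ and that market clearing takes the form $f(i^*)(i^*-i_S)=F(i^*)$, so that the corner cutoff is $i_S = i_H - F(i_H)$.

```python
def f(theta: float, lam: float, a: float) -> float:
    """Hypothetical increasing density of firm types on [lam, 1]."""
    return a * (theta - lam)


def F(theta: float, lam: float, a: float) -> float:
    """Measure of firms with type below theta under the hypothetical density."""
    return 0.5 * a * (theta - lam) ** 2


def corner_cutoffs(lam: float, a: float) -> tuple[float, float]:
    """Return (i_H, i_S) in the corner equilibrium with i* = i_H."""
    i_H = lam + 1.0 / a          # solves f(i_H) = 1 (assumed form of the definition)
    i_S = i_H - F(i_H, lam, a)   # corner relation stated in the text
    return i_H, i_S


lam, a = 0.5, 10.0
i_H, i_S = corner_cutoffs(lam, a)

# Check that selective hiring is feasible for every worker between the cutoffs:
# f(i*) (i - i_S) <= F(i), with i* = i_H. A small tolerance absorbs rounding
# at i = i_H, where the condition holds with equality.
grid = [i_S + k * (i_H - i_S) / 100 for k in range(101)]
feasible = all(f(i_H, lam, a) * (i - i_S) <= F(i, lam, a) + 1e-12 for i in grid)
```

Because $f$ is increasing, $F$ is convex and lies above its tangent at $i^*$, which is why the feasibility check passes on the whole grid once it holds at $i^*$.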

Properties.
This model generates dispersion in expected wages among workers who are equally productive and educated, depending on how transparent they are. In particular, the expected wages of high types $i \in [i^*,i_H)$, who all select $e = 0$, are increasing in $i$.$^{17}$ Similarly, the model can explain, for instance, different prices for asset-backed securities for which both the structure of tranches and the underlying cash flows are similar, but which differ in how many buyers have access to accurate pricing models to evaluate them. Interestingly, this dispersion is driven by break-even conditions of firms that are not making use of expertise. The structure of equilibrium is similar to the pattern of signalling and "countersignalling" (Feltovich et al., 2002): it is the hard-to-identify high types who must use the costly signal in order to differentiate themselves from low types. By contrast, the most obvious high types can be confident that expert buyers are able to tell them apart, thus eliminating the need for signalling. The setup in Feltovich et al. (2002) features three different levels of worker productivity; in our two-type model, countersignalling instead emerges because high types differ in their transparency. Moreover, our model generates the intuitive prediction that expected wages of those high types who "countersignal" increase in their transparency.

We can also ask how the intensive and extensive margins of signalling, measured by $e_S$ and $i_S-\lambda$ respectively, depend (locally) on parameters around an interior equilibrium. In Appendix A, we show that an increase in the ratio $c_H/c_L$ reduces signalling along both margins. For example, an increase in dividend taxes leads to both a smaller fraction of firms paying dividends and lower dividends per dividend-paying firm. We also show that an increase in demand leads to polarization in signalling: fewer workers choose positive education but those who do choose a higher quantity.

CONCLUSION
We have developed a general theory to analyse competitive equilibria in economies where buyers possess heterogeneous information about sellers and contracts are multidimensional, specifying both a price and a signal. These information and contracting patterns are the feature of many markets, including labour, asset, and insurance markets, as we have illustrated through a series of examples. Our notion of equilibrium implies that an equilibrium always exists, it may not be unique in the false-positives case but is generically unique in the false-negatives case, and it may not be efficient. Moreover, we uncover a tractable structure to characterize it in both cases, based on the intersection of an indifference and a market-clearing condition. This allows us to provide results on comparative statics. Our model predicts intuitive and continuous equilibrium responses to, for instance, changes in the prior, demand, signalling costs or expertise that cannot be generated in the canonical model with uninformed buyers.
We expect that our framework can be extended to study other structures of buyers' direct information, including ones where firms cannot be perfectly ranked by their expertise, such as when both false positive and false negative errors occur. In this case, we conjecture the equilibrium to feature a combination of the two pure cases we have analysed: high types are hired in a similar way as in the false-negatives case, except that those in $[i_S,i_H)$ are partly hired by selective false-positive firms, because those firms have an advantage over non-selective firms by being able to screen out some low types.
Our model may also be a useful starting point to study a number of richer environments. First, a market for information may arise, where better informed firms sell their information to less informed ones (e.g. in the form of analyst reports), instead of just trading on it themselves. To prevent the price of information from dropping to zero, some form of capacity constraints would again be required, which would effectively change the distribution of expertise in our model. Second, many of our applications have a dynamic aspect, where the costly signal involves a delay in trading. Our approach could be used to consider settings where some direct information is revealed to buyers gradually at heterogeneous rates, and one could explore how this affects the timing pattern of trades. These issues are left for future research.

A.1. False positives
We compute how the amount of signalling $1-\pi^P$ in a partial signalling equilibrium depends on parameters. We focus on cases where the locus of the market-clearing condition (16) is steeper than that of the indifference condition (15), which corresponds to a heuristic notion of stability of the equilibrium.
Proposition A.1
1. Signalling decreases with the cost ratio $c_H/c_L$.
2. Signalling decreases with a FOSD increase in the expertise distribution $F$ or an increase in the demand for workers.
3. Signalling does not change with productivities $q_H$ and $q_L$.
The logic of part 1 is as follows. The ratio $c_H/c_L$ governs how much utility high types obtain if they separate by choosing $e^*$. Since $w^P$ is the wage that makes them indifferent, higher $c_H/c_L$ means a lower wage. This attracts lower-$\theta$ firms, so more high types can forgo signalling and still find a job. As for part 2, a FOSD increase in the distribution of $\theta$ means that firms are able to screen out more low types, and therefore hire more high types (and an increase in demand has the same effect). Therefore, more high types are able to forgo education and still find a job. Finally, productivities have no effect on equilibrium signalling. The wage $w^P$ is a weighted average of $q_H$ and $q_L$. Therefore, no matter what these productivities are, the indifferent firm $\theta^P$ will be the one whose pool of acceptable workers includes a proportion of exactly $c_H/c_L$ low types. If, say, the productivity of low types were lower, the wage $w^P$ would adjust exactly so as to leave firm $\theta^P$ indifferent and the fraction of high types who signal unchanged.
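The claim that $w^P$ is a weighted average with weight $c_H/c_L$ on low types can be checked with a short sketch. It assumes $e^* = (q_H-q_L)/c_L$, which follows from the low-type indifference $q_H - c_L e^* = q_L$ used in the proofs in Appendix C, together with $w^P = q_H - c_H e^*$; the numerical values are hypothetical.

```python
def separating_contract(q_L: float, q_H: float, c_L: float, c_H: float) -> tuple[float, float]:
    """Return (e_star, w_P) under the assumed least-cost separating formulas."""
    e_star = (q_H - q_L) / c_L   # low types indifferent: q_H - c_L * e_star = q_L
    w_P = q_H - c_H * e_star     # high types indifferent between e = 0 and e = e_star
    return e_star, w_P


def pool_productivity(q_L: float, q_H: float, share_low: float) -> float:
    """Expected productivity of a pool with a given share of low types."""
    return share_low * q_L + (1.0 - share_low) * q_H


c_L, c_H = 1.0, 0.5
share_low = c_H / c_L  # proportion of low types faced by the indifferent firm

# Two hypothetical economies that differ only in the productivity of low types.
e1, w1 = separating_contract(1.0, 2.0, c_L, c_H)
e2, w2 = separating_contract(0.5, 2.0, c_L, c_H)
```

In both economies the wage equals the productivity of a pool with the same share $c_H/c_L$ of low types, which is the sense in which productivities do not affect the identity of the indifferent firm.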
Proof. Using the reparametrization of the model where each firm demands multiple workers rather than just one, it is straightforward to show that equations (15) and (16) become (A.1) and (A.2). Replacing (6) and (14) into (A.1), the indifference condition reduces to (A.3). Let $\theta_I(\pi^P,p)$ and $\theta_M(\pi^P,p)$ represent the solutions to (A.3) and (A.2), respectively, where $p$ is a parameter. The equilibrium value of $\pi^P$ is given by a solution to the equation $\theta_I(\pi^P,p)-\theta_M(\pi^P,p) = 0$. Using the implicit function theorem, the derivative of $\pi^P$ with respect to parameter $p$ is given by:

$$\frac{d\pi^P}{dp} = -\frac{\partial\theta_I/\partial p-\partial\theta_M/\partial p}{\partial\theta_I/\partial\pi^P-\partial\theta_M/\partial\pi^P}.$$
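The implicit-function-theorem step can be illustrated with a small numerical sketch. The two loci below are hypothetical stand-ins (not the paper's conditions (A.2) and (A.3)), chosen so that the market-clearing locus is steeper than the indifference locus at the equilibrium; the parameter `p` is an arbitrary shifter.

```python
def theta_I(pi: float, p: float) -> float:
    """Hypothetical indifference locus (illustrative functional form)."""
    return 0.3 + 0.2 * pi + 0.1 * p


def theta_M(pi: float, p: float) -> float:
    """Hypothetical market-clearing locus, steeper in pi than theta_I."""
    return 0.1 + 0.3 * pi + 0.6 * pi ** 2 - 0.2 * p


def gap(pi: float, p: float) -> float:
    return theta_I(pi, p) - theta_M(pi, p)


def solve_pi(p: float, lo: float = 0.0, hi: float = 1.0, iters: int = 80) -> float:
    """Bisection; relies on gap > 0 at lo and gap < 0 at hi for the values used here."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid, p) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ift_derivative(pi: float, p: float, h: float = 1e-6) -> float:
    """d pi / d p = -(d gap / d p) / (d gap / d pi), via central differences."""
    d_gap_d_pi = (gap(pi + h, p) - gap(pi - h, p)) / (2 * h)
    d_gap_d_p = (gap(pi, p + h) - gap(pi, p - h)) / (2 * h)
    return -d_gap_d_p / d_gap_d_pi


p0 = 0.0
pi0 = solve_pi(p0)
via_ift = ift_derivative(pi0, p0)
# Cross-check by re-solving the equilibrium at nearby parameter values.
via_resolve = (solve_pi(p0 + 1e-4) - solve_pi(p0 - 1e-4)) / 2e-4
```

The two derivative estimates agree, and the sign is governed by the denominator: when the market-clearing locus is steeper, an outward shift of that locus raises the equilibrium $\pi^P$ in this example.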

A.2. False negatives
We compute how the intensive and extensive margins of signalling depend on parameters around an interior equilibrium.

Proposition A.2
1. Both the intensive and the extensive margins of signalling decrease with the cost ratio $c_H/c_L$.
2. The intensive margin of signalling increases but the extensive margin decreases with the demand for workers.
3. The extensive margin of signalling is invariant with respect to productivities $q_H$ and $q_L$; the intensive margin $e_S$ increases with $q_H-q_L$.
Higher c H /c L makes separation more costly, so fewer high types signal. This improves the pool of workers in no-signalling markets, so non-selective firms pay higher wages. This raises the utility of low types, so less intense signalling is required to separate from them. An increase in demand means that at every level of expertise there are more selective hires, and therefore fewer non-selective hires, so it is harder for low types and obscure high types to get hired non-selectively. This makes low types worse off; therefore a more intense signal is needed to successfully separate, so fewer high types do so. As in the false-positives case, q H and q L drop out of equations (26) and (27), so the extensive margin is unchanged. However, a greater gap between q H and q L makes it more attractive for low types to mimic high types, so separation requires a more intense signal.
Proof. Let $i^*_I(i_S,p)$ and $i^*_M(i_S,p)$ represent the solutions to (26) and (27) respectively, where $p$ is a vector of parameters. The equilibrium value of $i_S$ is given by a solution to the equation $i^*_I(i_S,p)-i^*_M(i_S,p) = 0$. Using the implicit function theorem, the derivatives of $i^*$ and $i_S$ with respect to parameter $p$ follow, as in (A.5). The denominator of (A.5) is negative, and equation (26) implies that $\partial i^*_I/\partial i_S$ is positive. Furthermore, the implicit function theorem expresses the derivatives of $i^*_I$ in terms of those of $\Gamma$, and equation (26) implies that $\partial\Gamma(i^*,i_S;p)/\partial i^*$ is positive.
3. The fact that $i^*$ and $i_S$ do not depend on $q_H$ and $q_L$ follows because neither $q_H$ nor $q_L$ appears in equations (26) and (27). Using (20), (24), and (25), $e_S$ can be written as an expression that is increasing in $q_H-q_L$.

B. EXPERTISE ACQUISITION
Following the approach in Kurlat (2019), we ask whether firms have the correct incentives to acquire expertise. Consider an individual firm $j$ and suppose it could invest in becoming better at screening workers. This will affect its profits and also, by affecting the equilibrium, the economy's total deadweight cost of education. Denote by $\theta_j$ the level of expertise that firm $j$ chooses to acquire. Let $\Pi(\theta_j,F)$ denote the individual firm's profits, where we have made explicit that these depend on the firm's choice $\theta_j$ and the distribution of expertise of all other firms $F$, which this firm takes as given. Furthermore, let $W(\theta_j,F)$ denote the equilibrium total payoffs (ignoring their distribution across workers and firms). $W$ depends on $\theta_j$ because firm $j$'s choice of expertise affects equilibrium allocations. Assume the firm's cost of acquiring its screening technology is $c_j(\theta_j)$, where $c_j(\cdot)$ is increasing and sufficiently convex such that $\Pi(\theta_j,F)-c_j(\theta_j)$ is concave in $\theta_j$. The function $c_j(\cdot)$ can be different for different firms, leading to different equilibrium expertise choices. Taking the equilibrium as given, firm $j$ will invest until the marginal cost of better screening equals the marginal benefit: $c_j'(\theta_j) = \partial\Pi(\theta_j,F)/\partial\theta_j$. A social planner interested in minimizing deadweight costs would instead want the firm to invest up to the point where $c_j'(\theta_j) = \partial W(\theta_j,F)/\partial\theta_j$. Using the model, we can compute the ratio

$$r(\theta_j) \equiv \frac{\partial W(\theta_j,F)/\partial\theta_j}{\partial\Pi(\theta_j,F)/\partial\theta_j}.$$

If $r(\theta_j) > 1$, the marginal social value of better screening is greater than the marginal cost, which would provide a rationale for subsidizing investments in expertise. Conversely, if $r(\theta_j) < 1$, there would be a case for taxing those investments.
The following proposition provides a formula for $r(\theta_j)$ that relates it to equilibrium objects which, in principle, could be measured, and places a lower bound on it. Denote by $\eta$ the elasticity of the share of high types who do not signal (in a partial signalling equilibrium) with respect to an increase in demand.

Proposition B.1
1. The ratio of social to private marginal value of expertise is $r(\theta_j) = \frac{c_H}{c_L}\,\eta$.
2. The elasticity satisfies $\eta \ge 1$, so that $r(\theta_j) \ge c_H/c_L$.
First, Proposition B.1 establishes, perhaps surprisingly, that $r(\theta_j)$ does not depend on $\theta_j$. One might have conjectured that the misalignment of incentives would be different for firms that, e.g. due to different cost functions $c_j(\cdot)$, choose different $\theta$ in equilibrium. Yet, it turns out that, if the market under- or over-provides incentives to improve direct screening, it does so uniformly for all firms. Second, Proposition B.1 shows that $r$ can be written as the product of the signalling cost ratio and the demand elasticity of $\pi^P$. The ratio $c_H/c_L$ enters the formula because, by equation (B.2), it governs the deadweight cost of signalling for a high type that chooses $e^*$.
To understand the role of the elasticity of $\pi^P$ with respect to demand, observe that, again by (B.2), $\partial W(\theta_j,F)/\partial\theta_j$ crucially depends on how the equilibrium $\pi^P$ changes in response to an individual firm's screening technology $\theta_j$. If a firm improves its screening technology, it will reject more low-type applicants and therefore hire more high types, so the market-clearing condition shifts outwards. Recall from Section 5.5 that demand affects the equilibrium through exactly the same channel: by producing an outward shift in the market-clearing condition. Hence, $\eta$ precisely summarizes the effect of a firm's expertise on $\pi^P$. In particular, we show in the proof of Proposition B.1 that the overall effect on $\pi^P$ depends on the size of the shift to the market-clearing condition and on the difference between the slopes of the indifference and market-clearing conditions. For example, when these slopes are very similar, $\pi^P$ will respond strongly to a firm's expertise and $\eta$ will be large.
Overall, the result implies that it is desirable to encourage investments in direct screening if the cost of signalling is relatively similar for high and low types (which makes the deadweight cost of signalling high) and if the signalling decisions of high types are highly sensitive to demand (which would make them highly sensitive to improved screening as well). For example, higher dividend taxes make the signalling costs of different types more similar, thereby making an underinvestment in expertise more likely. Moreover, the cost ratio and the demand elasticity of $\pi^P$ are sufficient to determine the magnitude of $r$. Conditional on these two statistics, knowledge of other parameters, such as the shape of the cost function $c(\cdot)$, is not required. As usual with sufficient statistics, though, $\eta$ is endogenous to the equilibrium.
The second part of Proposition B.1 establishes a lower bound of 1 on the elasticity $\eta$, which in turn implies a lower bound of $c_H/c_L$ on $r$. To understand this, suppose demand increases by a given percentage. If the mix of workers in market $(0,w^P)$ remained constant, each firm in $[\theta^P,\lambda)$ would hire the same percentage more high types, implying an elasticity of 1. However, precisely because $\pi^P$ increases, the mix of workers available in market $(0,w^P)$ improves, so each firm increases its hiring of high types by more than that percentage. Furthermore, higher $\pi^P$ means that marginal firms enter market $(0,w^P)$, further increasing demand. The strength of this last effect depends on the density $f(\theta^P)$ of firms near the cutoff $\theta^P$. Since this density could be very high (to the point where the slopes of the indifference and market-clearing conditions are the same, leading to an unbounded response of $\pi^P$ to demand), there is no upper bound on $r$.
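The sufficient-statistic formula lends itself to a short sketch. The function below simply evaluates $r = (c_H/c_L)\,\eta$ and applies the subsidy/tax reading given in the text; the inputs are hypothetical, and the check $\eta \ge 1$ reflects the lower bound that the text attributes to partial signalling equilibria.

```python
def social_to_private_ratio(cost_ratio: float, eta: float) -> float:
    """r = (c_H / c_L) * eta in a partial signalling equilibrium.

    cost_ratio: c_H / c_L (must be positive).
    eta: elasticity of the non-signalling share of high types with respect to demand.
    """
    if cost_ratio <= 0:
        raise ValueError("cost ratio c_H / c_L must be positive")
    if eta < 1:
        raise ValueError("eta is bounded below by 1 in a partial signalling equilibrium")
    return cost_ratio * eta


def policy(r: float) -> str:
    """Reading of r from the text: subsidize expertise if r > 1, tax it if r < 1."""
    if r > 1:
        return "subsidize"
    if r < 1:
        return "tax"
    return "neutral"


# Hypothetical inputs for illustration.
r_elastic = social_to_private_ratio(0.5, 3.0)   # signalling responds strongly to demand
r_bound = social_to_private_ratio(0.5, 1.0)     # elasticity at its lower bound
```

At the lower bound $\eta = 1$ the ratio equals $c_H/c_L$ itself, so with a cost ratio below one the case for a subsidy rests entirely on the elasticity being large enough.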
The magnitude of r depends on the relative importance of the various externalities from a firm choosing its screening technology. First, in an interior partial signalling equilibrium, improved screening always helps other firms, since it leads more high types to forgo education and improves the mix of workers available at (0,w P ). Second, it is neutral for high-type workers since they get a payoff of w P regardless. Third, the effect on low types with i >θ is also positive. In principle, there are offsetting effects: these workers benefit from having more firms hiring in market (0,w P ) and lose from having more high type workers looking for work in (0,w P ). However, just like when one compares across equilibria, the market-clearing condition implies that the first effect dominates. Lastly, for low types with i <θ the effect is ambiguous, because better screening increases their chances of being rejected. If this last effect is negative and strong enough, the sum of the externalities could be negative, which would lead to r < 1.
If instead of being in a partial signalling equilibrium the economy is at a no-signalling equilibrium, it is immediate that improved screening has no marginal social value, since no worker is signalling. It would still have a positive marginal private value, so r = 0. In this region, better screening by one firm has a negative effect on other firms, since it does not improve the pool of workers in market (0,w N ) but drives up the wage w N . Proof.

C. OMITTED PROOFS
Proof of Proposition 1
1. The proposed $\{e_i,(e_\theta,w_\theta,\chi_\theta),\mu,G\}$ is an equilibrium. Equation (9) implies that low types are indifferent between any $e \in [0,e^*]$ and high types are indifferent between any $e \ge e^*$, so education decisions (7) solve the workers' problem. Equation (10) implies that firms can make zero profits by hiring in market $(0,q_L)$ (where there are only low types) or $(e^*,q_H)$ (where there are only high types), and any other market has either $G(I_\chi;e,w,\chi) = 0$ or results in losses. Therefore (8), which places demand only in markets $(0,q_L)$ and $(e^*,q_H)$ and yields zero profits, is an optimal choice. Furthermore, (8) implies that no firm hires more than one worker. Replacing (8) in (3) gives the demand stated in (C.1). Equations (7), (9), (10), and (C.1) imply that Condition 1 holds. Condition 2 is trivially satisfied because (9) is independent of $i$. Finally, (7) and (9) imply that beliefs (10) satisfy Condition 3 in non-empty markets. Since low types find $e \in [0,e^*]$ optimal and high types find $e \ge e^*$ optimal, (9) implies that beliefs satisfy Condition 4 when they are well defined, and $G(I_\chi;e,w,\chi) = 0$ only at wages where $\mu(w;e,i) = 0$ for all $i$, so Condition 5 is satisfied as well.

2. The above equilibrium is unique.
a. In any equilibrium, each firm makes zero profits. If there was a firm that made strictly negative profits, it could increase profits by setting $\chi(i) = 0$ for all $i$. On the other hand, suppose there is a firm that makes strictly positive profits in some market $(e,w)$. Recall that $F(1) > 1$, so there must exist a strictly positive measure of firms that do not hire. Any such firm could increase its profits by directing its search to market $(e,w)$, so it cannot be optimizing.
b. In any equilibrium, there does not exist a market $(e,w)$ such that $I(e_i = e)\mu(w;e,i) > 0$ both for some $i<\lambda$ and some $i' \ge \lambda$. Otherwise, consider a market $(e',w')$ with $e' = e+\epsilon$ and $w' \in (\tilde w(e,i')+c_H\epsilon,\,q_H)$. Suppose type $i<\lambda$ is in the support of $G(\cdot;e',w',\chi)$. This requires $\tilde w(e',i)-c_L e' \ge \tilde w(e,i)-c_L e$. Rearranging gives $\tilde w(e',i)-\tilde w(e,i) \ge c_L\epsilon$. Since firms cannot discriminate, it follows that $\tilde w(e,i)$ is the same for all $i$. Also, since $\mu(w';e,i)$ must be weakly decreasing in $i$ by Condition 2, $\tilde w(e',i)$ is weakly increasing in $i$. Therefore $\tilde w(e',i') \ge \tilde w(e',i) \ge \tilde w(e,i)+c_L\epsilon = \tilde w(e,i')+c_L\epsilon > \tilde w(e,i')+c_H\epsilon$. This contradicts the premise that type $i'$ finds $e$ optimal. Hence, no $i<\lambda$ can be in the support of $G(\cdot;e',w',\chi)$. If the support of $G(\cdot;e',w',\chi)$ only includes $i \ge \lambda$, then firms could make profits by hiring in market $(e',w')$, which contradicts part (a). Therefore it must be that $G(I_\chi;e',w',\chi) = 0$. This implies that $\mu(w';e',i') = 0$ by Condition 5, which in turn implies $\tilde w(e',i') > \tilde w(e,i')+c_H\epsilon$, which contradicts the premise that type $i'$ finds $e$ optimal.
c. In any equilibrium, all low types obtain a payoff of $q_L$. Suppose that they obtain a payoff $q_L' > q_L$. This implies that they are hired with positive probability in a market with $w > q_L$. By part (b), the supply in this market only includes low types, which implies negative profits for firms, contradicting part (a). Suppose instead that they obtain a payoff $q_L' < q_L$ and consider a market with $e = 0$ and $w \in (q_L',q_L)$. If $G(I_\chi;e,w,\chi) > 0$, then firms can make profits by hiring in this market; otherwise, $\mu(w;e,i) = 0$, which means low-type workers can obtain a payoff $w > q_L'$ by choosing $e = 0$.
d. In any equilibrium, all high types obtain payoff $q_H-c_H e^*$. Suppose first that they obtain a payoff $u_H > q_H-c_H e^*$. If they do so by selecting $e' < e^*$, then this implies $\tilde w(e',i)-c_H e' > q_H-c_H e^*$, which in turn implies $\tilde w(e',i)-c_L e' > q_L$, and since $\tilde w(e',i)$ is the same for all $i$, this implies that low types can obtain a payoff higher than $q_L$, contradicting part (c). If instead $e' \ge e^*$, this implies they are hired with positive probability at a wage $w > q_H$ and hence strictly negative profits for firms, contradicting part (a). Second, suppose they obtain a payoff $u_H < q_H-c_H e^*$. This means that for any $i \ge \lambda$ it must be that $\tilde w(e^*,i) < q_H$, and therefore $\tilde w(e^*,i) < q_H$ for $i<\lambda$ as well. Consider a market with $e = e^*$ and $w \in (u_H+c_H e^*,q_H)$. $G(I_\chi;e^*,w,\chi) > 0$ because otherwise $\mu(w;e^*,i) = 0$ by Condition 5, so high types can obtain a payoff of at least $w-c_H e^* > u_H$ by choosing education $e^*$. But the support of $G(\cdot;e^*,w,\chi)$ cannot include low types because choosing $e^*$ implies a payoff of $\tilde w(e^*,i)-c_L e^* < q_H-c_L e^* = q_L$, contradicting part (c); and the support of $G(\cdot;e^*,w,\chi)$ cannot include only high types because then firms could make profits by hiring in market $(e^*,w)$, contradicting part (a).
e. Step (c) implies that all low types select $e = 0$ and get hired for sure in market $(0,q_L)$. Step (d) implies that all high types must select $e = e^*$ and get hired for sure in market $(e^*,q_H)$. This determines (7) as well as (9) and (10) in these markets. It also requires that there is total demand $\lambda$ in market $(0,q_L)$ and demand $1-\lambda$ in $(e^*,q_H)$, thus (8) must hold. For all other markets, (9) and (10) then follow from Conditions 1 to 5.

Proof of Proposition 3
We first show that, in any equilibrium, all low types choose $e = 0$ and get hired at least at wage $w = q_L$. Some fraction $\pi \in [0,1]$ of the high types choose $e = 0$ and find a job for sure at wage $w = w^P$ (if $\pi<1$) or $w \ge w^P$ (if $\pi = 1$). The rest of the high types choose $e = e^*$ and get hired with certainty at wage $w = q_H$. We prove this claim based on the following sequence of steps:
1. By the same argument as in the proof of Proposition 1, all firms make non-negative profits.
2. Firms' profits must be weakly increasing in $\theta$. To see this, suppose that $\theta' > \theta$ but firm $\theta$ makes strictly higher profits than $\theta'$, and consider the market and hiring rule $(e_\theta,w_\theta,\chi_\theta)$ chosen by firm $\theta$. By hiring in market $(e_\theta,w_\theta)$ and setting $\chi_{\theta'}(i) = I(i \ge \theta')$, firm $\theta'$ could make profits at least as high as firm $\theta$ since it accepts all the high types but rejects more low types than firm $\theta$ possibly can.
3. There exists some $\bar\theta$ such that all firms $\theta \le \bar\theta$ make zero profits, and $F(\bar\theta) > 0$. To see this, recall that at least a measure $F(1)-1 > 0$ of firms do not hire, which implies zero profits. The claim then follows from the monotonicity of profits in $\theta$.
4. In any equilibrium, low types obtain a payoff of at least $q_L$. Suppose that they obtain a payoff $q_L' < q_L$ and consider a market with $e = 0$ and $w \in (q_L',q_L)$. If $G(I_\chi;0,w,\chi) > 0$, then firms $\theta<\bar\theta$ can make profits by hiring in this market; otherwise, $\mu(w;0,i) = 0$ by Condition 1, which means low-type workers can obtain a payoff $w > q_L'$ by choosing $e = 0$.
5. In any equilibrium, high types obtain a payoff of at least $w^P = q_H-c_H e^*$. Suppose they obtain a payoff $u_H < q_H-c_H e^*$. This means that for any $i \ge \lambda$ it must be that $\tilde w(e^*,i) < q_H$, and therefore $\tilde w(e^*,i) < q_H$ for $i<\lambda$ as well. Consider a market with $e = e^*$ and $w \in (u_H+c_H e^*,q_H)$. $G(I_{\chi_\theta};e^*,w,\chi_\theta) > 0$ for all $\theta$ because otherwise $\mu(w;e^*,\lambda) = 0$ by Condition 5, so high types can obtain a payoff of at least $w-c_H e^* > u_H$ by choosing education $e^*$. But the support of $G(\cdot;e^*,w,\chi_\theta)$ cannot include low types for any $\theta$ because choosing $e^*$ implies a payoff of $\tilde w(e^*,i)-c_L e^* < q_H-c_L e^* = q_L$; and the support of $G(\cdot;e^*,w,\chi_\theta)$ cannot include only high types for $\theta<\bar\theta$ because then firms $\theta<\bar\theta$ could make profits by hiring in market $(e^*,w)$.
6. For any $i \ge \lambda$ and any $e$, $\mu(\cdot;e,i)$ has a point mass at a single wage. To see this, consider two wage levels $w' > w$ and suppose that high types are hired with positive probability in both of them if they choose $e$. Let $\theta'$ be the highest-type firm that hires at wage $w'$. Conditions 1 and 3 imply that the expected productivity of workers that firm $\theta'$ will find in markets $(e,w')$ and $(e,w)$ is the same, and therefore, it cannot be optimal for firm $\theta'$ to hire at wage $w'$. Therefore it must be that all high types are hired at the same wage, which implies that $\mu(\cdot;e,i)$ is a step function for every $i$.
7. In any equilibrium, all low types get education $e = 0$. To see this, assume to the contrary that some $i<\lambda$ chooses $e = \tilde e > 0$. By step 4, we have that $\tilde w(\tilde e,i) \ge q_L+c_L\tilde e > q_L$. Together with step 1, this implies that in every market $(\tilde e,w)$ with $w > q_L$ where type $i$ has some chance of being hired, there are also high-type applicants, because otherwise firms would make losses by paying more than $q_L$.
Step (C) implies that there can be only one such market; label it (ẽ,w). Letting u H be the utility obtained by high types in equilibrium, this implies w = u H +c Hẽ . Let θ be the lowest firm type that hires in market (ẽ,w) and π H be the measure of high types that choose e =ẽ. Using the fact that all high types that choose e =ẽ are hired in market (ẽ,w), the probability that type i <λis hired in market (ẽ,w) is bounded above by i θ 1 π H dF(θ ). Since not being hired in market (ẽ,w) implies getting a wage q L , this implies that the payoff from choosing e =ẽ is bounded above by: which is lower than q L for i sufficiently close to θ . Let i be the lowest worker type such that there is a δ 1 > 0 such that all workers i ∈ i,i+δ 1 choose e =ẽ. We know that i >θ.
Assume that some firmθ ∈ θ ,i prefers to hire in some market (e ′′ ,w ′′ ) = (ẽ,w). This implies firms θ ∈ θ ,i also prefer (e ′′ ,w ′′ ) over (ẽ,w), since they hire from the same pool of workers as firmθ in market (ẽ,w) but from a more selected pool in other markets. But then the fact that worker i =θ does not chooseẽ implies that worker i does not want to chooseẽ either, since he obtains the same payoff as worker i =θ upon choosingẽ but weakly higher in every other market. This contradicts the assumption that worker i choosesẽ. Therefore, it must be that all firms in the interval θ ,i hire in market (ẽ,w). Since there are no workers with i < i in market (ẽ,w), then upon hiring in market (ẽ,w), any firm θ ≤ i hires from the entire pool of applicants, without rejecting any. Since this hiring rule is available to all firms, part (C) implies that all θ ≤ i firms must make zero profits by hiring in market (ẽ,w). For this to be true, it must mean that they cannot make profits in any other market, including any markets with e = 0. But any firm with θ>i will be able to reject some workers in the interval i,i+δ , which implies it can make strictly positive profits by hiring in market (ẽ,w). Therefore, all firms in the interval (i,i+δ 1 ] hire in market (ẽ,w). This in turn implies that if worker i is willing to chooseẽ, then worker i+δ 1 strictly prefersẽ, since, compared to worker i, he has a higher chance of being hired in market (ẽ,w) and the same chance of being hired in any other market. By continuity, this implies that there is a number δ 2 >δ 1 such that all workers in i,i+δ 2 choose e =ẽ. Repeating the same reasoning, this implies that there is a strictly increasing sequence {δ n } such that all workers in i,δ n chooseẽ. Therefore all workers i ∈ i,λ choose e =ẽ.
Letθ be the highest firm type that hires in market (ẽ,w).

REVIEW OF ECONOMIC STUDIES
higher profits in market (0,u H +ǫ) than in market (ẽ,w), a contradiction. Instead, if G I χθ ;0,u H +ǫ,χθ = 0, this requires µ(u H +ǫ;0,i) = 0 for all i ≥ λ, which implies that e = 0 is a better choice thanẽ for high types, again a contradiction. b. Assume instead thatθ ≤ λ. This implies that all firms that hire in market (ẽ,w) accept workers i ∈ θ ,λ .
Since high-type workers are hired for sure in market (ẽ,w), this implies that workers i ∈ θ ,λ are hired for sure as well, and therefore obtain utility u H −(c L −c H )ẽ. Now consider a market with e ′ =ẽ+ǫ and w ′ ∈ (w+c H ǫ,w+c L ǫ). Suppose type i ′ ∈ θ ,λ is in the support of G ·;e ′ ,w ′ ,χ θ . This requires: Sincew e ′ ,i must be increasing in i by Condition 2, for any high-type worker i ′′ : which contradicts the premise that i ′′ findsẽ optimal. Hence, no i ′ ∈ θ ,λ can be in the support of G ·;e ′ ,w ′ ,χ θ . If the support of G ·;e ′ ,w ′ ,χ θ only includes i ≥ λ, then for small enough ǫ, firm θ would find it more profitable to hire in market (e ′ ,w ′ ) than in market (ẽ,w). Therefore it must be that G I χ θ ;e ′ ,w ′ ,χ θ = 0. This implies that µ w ′ ;e ′ ,i ′′ = 0, which in turn implies that high types prefer e ′ tõ e, a contradiction.
8. In any equilibrium, the high types select either e = 0 or e = e*. If some types selected e > e*, then by step (C) this would require paying them w > q_H and therefore involve negative profits for firms. On the other hand, suppose some high type i′ sets e′ ∈ (0, e*) and let w be the highest wage such that µ(w; e′, i′) = 0. For any wage at or above w, beliefs can only place weight on high types since, by part (C), no low types choose e′. This implies that if w < q_H, any firm, including those below the zero-profit cutoff, could make profits by hiring in market (e′, w), which contradicts part (C). Therefore we must have w = q_H. Note that this implies that there can only be a single e′ ∈ (0, e*] such that e_i = e′ for some i ≥ λ, since otherwise the high types would only select the lowest such e. Let π be the fraction of high types who choose e = 0 and 1 − π the fraction who choose e = e′. Since they must be indifferent, it follows that high types who choose e = 0 get a wage of w′ = q_H − c_H e′. Since all low types choose e = 0, firms will find it profitable to hire in market (0, w′) only if the workers they accept have expected productivity above w′. This defines the cutoff firm θ′ such that firms with θ < θ′ make zero profits. Furthermore, this implies that all workers with i < θ′ do not get hired in market (0, w′) and therefore obtain a payoff of q_L. Consider the set of firms, contained in [0, θ′], who hire workers in market (e′, q_H). Since all high types who choose e′ get a job at w = q_H, this set has F-measure 1 − π. Suppose worker i′ < θ′ chooses e = e′. His chance of finding a job at wage q_H is determined by the firms in this set that accept him. Since F is continuous, for i′ sufficiently close to θ′, µ(q_H; e′, i′) will be arbitrarily close to 0, and therefore (since e′ < e*), (1 − µ(q_H; e′, i′)) q_H − c_L e′ > q_L. Thus, there is a low type who would prefer e = e′ to e = 0, which contradicts step (C).
To complete the proof, let u H be the equilibrium payoff of high types.
1. If u_H > w^P, then it must be that all high types choose e = 0 and get hired at a wage w = u_H. Firms will find it profitable to hire in this market if
[(1 − λ) q_H + (λ − θ) q_L] / [(1 − λ) + (λ − θ)] > w.
This defines a cutoff θ, so (12) holds. Furthermore, since all high types must be hired at this wage, (11) must hold.
2. If u H = w P , then high types are indifferent between choosing e = 0 and getting hired at wage w P and choosing e = e * and getting hired at a wage q H . Let π be the fraction that choose e = 0. Firms will find it profitable to hire in market 0,w P iff π (1−λ)q H +(λ−θ)q L π (1−λ)+(λ−θ ) > w P This defines the cutoffθ ,so (15) holds. Furthermore, since all high types who choose e = 0 must be hired at w, Letw(e,i) be the wage that would make worker i indifferent between their equilibrium payoff and choosing education e, given by:w where u(i) is given by (17). Suppose firm θ considers hiring in market (e,w). For it to believe that it will find χ θ -acceptable low types, i.e. workers with i ∈[θ,λ), it must be that: w(e,θ) ≤w(e,i) =w(e,i) ≤w(e,λ) ≤w(e,λ). 3) The first inequality follows from the fact that u(i) and thereforew(e,i) is increasing in i. The second step follows from Condition 4: if beliefs place weight on type i, then i must be indifferent between e and his equilibrium choice. The third follows from Condition 2, which implies thatw is monotonic in i. The last inequality follows from the fact that otherwise worker λ could exceed his equilibrium payoff by choosing e. By Condition 4, the only markets where firm θ can place beliefs on χ θ -acceptable low types are those with education levels that worker i = θ is willing to choose for weakly lower wages than high types. Moreover, for firm θ not to have well-defined beliefs about market (e,w) it must be that: w ≤w(e,θ), (C.4) since otherwise Condition 5 requires µ(w;e,θ) = 0, so some χ θ -acceptable worker could exceed his equilibrium payoff by choosing e. Together, conditions (C.3) and (C.4) imply that for any market (e,w) such thatw(e,λ) <w(e,θ) and w >w(e,λ), firm θ's beliefs G(·;e,w,χ θ ) can only place weight on high types. Denote by (e D θ ,w D θ ) the lowest-wage market where firm θ's beliefs are guaranteed to only include high types, which satisfies w D θ =w e D θ ,λ =w e D θ ,θ . 
Using (6), (14), (17), and (C.2) and rearranging, the profits that firm θ can obtain by hiring in market (e D θ ,w D θ ) are By (B.1), profits in market (e D θ ,w D θ ) exceed those that firm θ obtains in equilibrium if condition (19) is violated, which implies it cannot be an equilibrium. 2. Sufficiency of condition (19). We construct the equilibrium objects {e i ,(e θ ,w θ ,χ θ ),µ,G}.
for any other (e,w) (C.8)
and for selection rule χ(i) = 1 ∀i, if e ∈ (0, e*), w ≥ w(e,0); for any other (e,w) (C.9)
We now verify that {e_i, (e_θ, w_θ, χ_θ), µ, G} satisfies all the equilibrium conditions from Definition 1. (C.7) implies that low types i ∈ [0, λ) are indifferent between any e ∈ [0, e^D_i) and high types are indifferent between any e ≥ 0, so the education decisions (C.5) solve the workers' problem. The beliefs (C.8) and (C.9), together with the fact that condition (19) holds, imply that firms θ ≥ θ^P maximize profits by hiring selectively in market (0, w^P). All other firms make zero profits by hiring non-selectively either in market (0, q_L) or (e*, q_H), and any other market has either G(I_{χ_θ}; e, w, χ_θ) = 0 or results in losses. Therefore the demands (C.6) are an optimal choice. Furthermore, replacing (C.6) in (3) implies that demand in market (e,w) for a set of selection rules X_0^{θ′} = {χ(i) = I(i ≥ θ) : θ ∈ [0, θ′]} is:
Together with (C.8) and (C.9), this implies that Condition 1 holds. Condition 2 is satisfied because, by (C.7), µ(·; e, i) is weakly decreasing in i. Finally, (C.5) and (C.7) imply that beliefs (C.8) and (C.9) satisfy Condition 3 in non-empty markets. Since low types i find e ∈ [0, e^D_i) optimal and high types find any e ≥ 0 optimal, beliefs satisfy Condition 4 when they are well defined, and G(I_χ; e, w, χ) = 0 only at wages where µ(w; e, i) = 0 for all i such that χ(i) = 1, so Condition 5 is satisfied as well.
Pure signalling equilibrium. The above analysis applies for the special case with π P = 0.
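As a numerical illustration of the hiring cutoff in market (0, w^P), the sketch below evaluates the expected productivity of the pool that firm θ accepts, [π(1−λ)q_H + (λ−θ)q_L] / [π(1−λ) + (λ−θ)], and solves for the firm type at which it equals the wage. The closed form for the cutoff is a rearrangement of that condition and is not stated in the text; the function names and parameter values are hypothetical, and the code assumes q_L < w < q_H and θ < λ.

```python
def pool_quality(theta, pi, lam, q_h, q_l):
    """Expected productivity of the e = 0 workers that firm theta accepts."""
    high = pi * (1.0 - lam)   # high types who choose e = 0
    low = lam - theta         # low types that firm theta cannot reject
    return (high * q_h + low * q_l) / (high + low)


def hiring_cutoff(pi, lam, q_h, q_l, w):
    """Firm type at which pool_quality equals the wage w (clipped at 0).

    Rearranging pool_quality = w gives
    pi * (1 - lam) * (q_h - w) = (lam - theta) * (w - q_l).
    """
    theta = lam - pi * (1.0 - lam) * (q_h - w) / (w - q_l)
    return max(theta, 0.0)


# Hypothetical parameters: half the workers are low types, w plays the role of w^P.
lam, q_h, q_l, w = 0.5, 2.0, 1.0, 1.5
cutoff = hiring_cutoff(0.5, lam, q_h, q_l, w)
print(cutoff, pool_quality(cutoff, 0.5, lam, q_h, q_l))
```

With π = 0 the cutoff returned is λ itself, which matches the pure signalling case: no firm that still accepts some low types can profitably hire at e = 0.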
No-signalling equilibrium. Necessity and sufficiency of condition (18) are proved by the same steps as for the partial signalling equilibrium. For completeness, we state the equilibrium objects {e i ,(e θ ,w θ ,χ θ ),µ,G}. (e θ ,w θ , if e ∈ 0,e N ,w ≥w(e,0) for any other (e,w) (C. 14) with

Proof of Proposition 5
1. Using the reparametrization of the model in terms of the fraction of low types, equation (12) generalizes to
For a low enough fraction of low types, w^N > w^P, so there is a candidate corner equilibrium. Condition (18) generalizes to
which cannot hold for a sufficiently low fraction of low types, so the candidate equilibrium is indeed an equilibrium. Furthermore, taking the limit in (C.15), we obtain w^N → q_H as the fraction of low types goes to 0. (11) implies that lim_{F→F*} θ^N = λ, which implies, using (12), that lim_{F→F*} w^N = q_H for F sufficiently close to F*, so a candidate equilibrium with the desired properties exists. Furthermore, as θ^N → λ, condition (18) cannot hold, so the candidate equilibrium is indeed an equilibrium.

Proof of Proposition 6
1. Using (14) and (6):
w^P = q_H − c_H e* = q_H − (c_H / c_L)(q_H − q_L),
and therefore lim_{c_H/c_L → 1} w^P = q_L. Using (12) we have w^N > w^P, so there is a candidate corner equilibrium. Furthermore, as c_H/c_L → 1, condition (18) holds, so the candidate equilibrium is indeed an equilibrium.
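The limit used in this proof can be checked numerically. The sketch below relies only on two relations that appear in the proof steps of this appendix, q_H − c_L e* = q_L and w^P = q_H − c_H e*; the function name and the parameter values are hypothetical.

```python
def pure_signalling_payoff(q_h, q_l, c_h, c_l):
    """High-type payoff w^P = q_H - c_H e*, with e* solving q_H - c_L e* = q_L."""
    e_star = (q_h - q_l) / c_l   # education level that just deters low types
    return q_h - c_h * e_star


# As c_H / c_L approaches 1, signalling eats up the whole wage premium
# and w^P falls towards q_L (here q_L = 1).
for ratio in (0.5, 0.9, 0.99, 0.999):
    print(ratio, pure_signalling_payoff(q_h=2.0, q_l=1.0, c_h=ratio, c_l=1.0))
```

Only the ratio c_H/c_L matters for w^P, which is why the proposition is stated in terms of that ratio.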

Proof of Proposition 7
1. It is sufficient to prove claim (b) because claim (a) is a special case with π^P_2 = θ^P_2 = 0. By equation (B.1), firm profits are increasing in π^P, and since w^P is the same across equilibria, firms are better off in the higher-π^P equilibrium. High-type workers obtain a payoff of w^P in both equilibria, so they are indifferent. Using (17), workers with i ≤ θ^P_2 get a payoff of q_L in both equilibria, so they are also indifferent. Workers with i ∈ (θ^P_2, θ^P_1] get q_L in the first equilibrium and more than q_L in the second, so they are better off in the second. For workers with i ∈ (θ^P_1, λ), their payoff is:
where we used (16). This is increasing in π^P, so they are also better off in the second equilibrium.
(a) In the first equilibrium, all firms make zero profits, so they are better off in the second equilibrium. Low-type workers get a payoff of q_L in the first equilibrium, but those with i > θ^N get more in the second equilibrium. High-type workers get a payoff of w^P in the first equilibrium but w^N in the second, so they are also better off.
(b) By equation (B.1), for θ sufficiently close to λ, firm θ's profits approach q_H − w, so w^P < w^N implies they are higher in the first equilibrium. High-type workers get a payoff of w^P in the first equilibrium but w^N in the second, so they are better off in the second.

D. FALSE NEGATIVES
Uniqueness in case f(θ) is strictly increasing
Proposition D.1 If condition (28) holds, the system of equations (26), (27) has no solution. Otherwise, it has a unique solution.
Proof. Solving (27) for i S and replacing in (26), a solution requires: Taking the derivative and rearranging: where the inequality follows because i > i * .Ifi * satisfies (D.1), then: so the function (i * ) is increasing at any i * such that (i * ) = 0. Condition (28) is equivalent to (i H ) < 0. Furthermore, Therefore, if (28) holds, there can be no i * ∈ [λ,i H ] that satisfies (i * ) = 0 because (λ) < 0 and (i H ) < 0 and must be increasing at any solution. Instead, if condition (28) does not hold, (i H ) ≥ 0, so by continuity and using the fact that is increasing at any solution, there is exactly one i * ∈ [λ,i H ] that satisfies (i * ) = 0. 1. Letting the fraction of low types beλ, equations (21), (22) and (26) generalize, respectively, to:

Continuity in the limit
while the market-clearing condition is unchanged. The first statement follows directly from equation (D.2), the second from equation (D.3) and the last from equation (D.7). 2. Since the measure of firms is assumed to be greater than 1, for F sufficiently close to F * , then f (i) > 1 for all i, which implies i H = λ. 3. Equation (26)  By part 1, as the fraction of low types goes to zero, nobody signals and everyone is paid q H . Part 2 says that as firms become fully informed, again nobody signals and all high types are paid q H . Finally, part 3 shows that if signalling is sufficiently expensive, no workers signal in equilibrium. In all cases the equilibrium allocations are continuous in the limit.

General case
For the case where f(i) is not monotone, the argument in Section 6 needs to be modified. Consider two workers, i and i′, with i* < i < i′ < i_H, and assume f(i) > f(i′). The argument above, unmodified, implies that any worker will be able to sell a fraction 1 − f(i′) of his labour to non-selective firms at wages above w(0, i′). This means that only f(i′) of i-workers will be available for hire in market (0, w(0, i)), which is less than the f(i) workers that firms with θ = i want to hire. Realizing this, firms would bid up the wage, displacing non-selective firms. To characterize exactly what will happen, it is useful to define F̂(θ) as the convex hull of F(θ), i.e. the highest convex function on [λ, 1] such that F̂(θ) ≤ F(θ):
F̂(θ) ≡ min_{ω, θ_1, θ_2} {ω F(θ_1) + (1 − ω) F(θ_2)} s.t. ω ∈ [0, 1], θ_1, θ_2 ∈ [λ, 1] and ω θ_1 + (1 − ω) θ_2 = θ.
The corresponding density f̂(θ), which is weakly increasing, is the "ironed" version of the original density f(θ). We now show how the analysis in Section 6 extends to this general case, replacing F with F̂. Let i_H be defined as
i_H ≡ min_{i ∈ [λ, 1]} {i : f̂(i) ≥ 1}.
This generalizes the definition of i_H in (21), allowing both for the possibility of ironing and the case where f(i) > 1 for all i (in which case trivially i_H = λ). Let the reservation wage for type i ∈ [i*, i_H) be given by
Hence, when f̂ is strictly increasing, this coincides with (22), but in a flat region (due to ironing), w(0, i) equals the value for the top of the ironing range. In other words, in intervals [i_0, i_1] where the ironed density f̂ is constant, there will be "bunching": all remaining workers who are not hired non-selectively at higher wages are hired at the same wage w(0, i_1) by firms θ ∈ [i_0, i_1]. Based on the same steps as underlying (26) but using (D.4) instead of (22), we obtain
where i_b(i) = max{i′ : f̂(i′) = f̂(i)}. Let i* and i_S solve
i* = min_{i ∈ [λ, 1]} {i : Γ(i, i_S) ≥ 0} (D.6)
and
F̂(i*) = f̂(i*)(i* − i_S). (D.7)
Equation (D.6) generalizes the indifference condition (26) to account for the fact that, with bunching, the reservation wage function (D.4) and hence Γ(i, i_S) can be discontinuous in i. Note that, by (D.6), whenever i* falls into a bunching region, it corresponds to the lower end of it.
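The ironing construction above can be sketched numerically. The code below is a minimal illustration on a discrete grid, not the authors' procedure: it computes the highest convex function lying below the sampled values of F with a standard lower-hull scan, recovers the ironed density as the slope on each grid cell, and then locates the first grid point where that density reaches 1. All function names and the sample values are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def lower_convex_hull(x, y):
    """Indices of the vertices of the highest convex function below (x, y).

    x must be strictly increasing.
    """
    hull = []
    for k in range(len(x)):
        # Drop the last vertex while it lies on or above the chord to point k.
        while len(hull) >= 2:
            i, j = hull[-2], hull[-1]
            cross = (x[j] - x[i]) * (y[k] - y[i]) - (y[j] - y[i]) * (x[k] - x[i])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(k)
    return hull


def iron(theta, F):
    """Return the convexified F and the ironed density on each grid cell."""
    hull = lower_convex_hull(theta, F)
    F_hat = np.interp(theta, theta[hull], F[hull])
    f_hat = np.diff(F_hat) / np.diff(theta)
    return F_hat, f_hat


def first_point_with_density_at_least_one(theta, f_hat):
    """Grid analogue of i_H: left end of the first cell with ironed density >= 1."""
    idx = np.flatnonzero(f_hat >= 1.0)
    return theta[idx[0]] if idx.size else None


# Hypothetical F with cell densities 2, 0.5, 1.5, 1.5 (not monotone).
theta = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
F = np.array([0.0, 0.5, 0.625, 1.0, 1.375])
F_hat, f_hat = iron(theta, F)
print(F_hat, f_hat, first_point_with_density_at_least_one(theta, f_hat))
```

In this example the first two cells are pooled into a single flat stretch of the ironed density, which is the grid counterpart of a bunching interval [i_0, i_1].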
These definitions allow us to state the following general existence and uniqueness result, of which Proposition 8 in Section 6 is a special case. market (e ′ ,w ′ −ǫ), contradicting step (D). Hence we must have µ ′ = 0. This implies that the equilibrium payoff of the low types is u ′ L = w ′ −c L e ′ and the equilibrium payoff of those high types who choose e ′ is u ′ H = w ′ −c H e ′ . Consider a market (e ′′ ,w ′′ ) such that e ′′ = e ′ +ǫ and w ′′ ∈ (w ′ +c H ǫ,w ′ +c L ǫ). Suppose χ (i) = 1 for all i. Then G(I χ ;e ′′ ,w ′′ ,χ) > 0 since otherwise µ(w ′′ ;e ′′ ,i) = 0 for all i, so all high types who choose e ′ could obtain payoff w ′′ −c H e ′′ > w ′ −c H e ′ = u ′ H , a contradiction. The support of G(·;e ′′ ,w ′′ ,χ) cannot include low types since w ′′ − c L e ′′ < w ′ −c L e ′ = u ′ L . The support of G(·;e ′′ ,w ′′ ,χ) cannot include only high types since then any firm θ>θ could make strictly positive profits in market (e ′′ ,w ′′ ) for ǫ ∈ (0,(q H −w ′ )/c L ). This delivers the final contradiction.
5. Any high type who chooses e > 0 is hired with probability 1 at w = q H . Suppose otherwise, then there exists a market (e,w) with w < q H such that there are high-type applicants. Since there are no low types in market (e,w) by step (D), any firm θ>θ could then make positive profits by hiring non-selectively in market (e,w), contradicting step (D).
6. No firm hires high types selectively at any e > 0. Suppose there was a high type i >λwho is hired in market (e,q H ) with e > 0 by a firm θ<i that sets selection rule χ θ (i) = I(i ≥ θ ). Consider market (0,w ′ ) with w ′ ∈ (q H −c H e,q H ). Then G(I χ θ ;e,q H ,χ θ ) > 0 since otherwise µ(q H ,0,i) = 0 by Condition 5, so type i could obtain a payoff w ′ > q H −c H e by choosing e = 0. Since by construction the support of G(·;e,q H ,χ θ ) only includes high types and since w ′ < q H , firm θ can increase its profits by hiring high types in market (0,w ′ ) rather than (e,q H ), a contradiction. Hence, all high types selecting e > 0 are hired by firms using selection rule χ(i) = 1 for all i.
7. If any high types choose some education e S > 0, it must satisfy q H −c L e S = u L . Suppose first that some high types choose e ∈ (0,e S ). By step (D), they are hired at wage q H and by step (D) they are hired by non-selective firms. However, this implies that the low types, by choosing e, could obtain q H −c L e > u L , a contradiction. Suppose next that some high types choose e > e S . Consider some market (e S ,q H −ǫ) and selection rule χ (i) = 1 for all i, which is feasible for all firms. For sufficiently small ǫ, G(I χ ;e S ,q H −ǫ,χ) > 0 since otherwise µ(q H −ǫ;e S ,i) = 0 and those high types choosing e could do better by choosing education e S . By Condition 4, the support cannot include low types because q H −c L e < u L . Hence, firms θ ≥θ could make strictly positive profits in market (e S ,q H −ǫ), contradicting step (D).

Definew
There exists a cutoff i * such that: for i < i * , high types' utility is u(i) =w S and for i ≥ i * , utility is u(i) ≥w S and e = 0. Steps (D) and (D) imply that high types who choose e > 0 must obtain utility equal tow S . Therefore the only possible way to obtain higher utility is to choose e = 0. The result then follows from step (D).
9. For workers i ≥ i * (who choose e = 0) the minimum wage in their support w(0,i) is weakly increasing in i. This follows from the fact that w(0,i) solves µ(w;0,i) = 0, and µ(w;0,i) is weakly increasing in w and weakly decreasing in i by Condition 2.
10. If some type i ≥ 0 who chooses e = 0 is hired by a selective firm, this can only occur at the minimum wage in worker i's support, w(0, i). To see this, consider a market (0, w) where a high type i ≥ λ is hired by a selective firm θ < i setting χ_θ(i) = I(i ≥ θ), and suppose µ(w; 0, i) > 0. This implies that there are i-type applicants in some market (0, w − ε). As a result, firm θ could increase its profits by shifting demand to market (0, w − ε) using the same selection rule.
11. There does not exist a market (0, w) with w > q_L where all firms hire non-selectively. Suppose there were such a market and let (0, w) be the highest-wage market where all firms hire non-selectively. All firms must make zero profits in (0, w), and µ(w; 0, i) takes a common value for all i. Suppose this common value is strictly positive. Consider a market (0, w − ε). For sufficiently small ε > 0, the pool of applicants is the same in markets (0, w − ε) and (0, w). Then all firms could make positive profits by hiring in market (0, w − ε), contradicting (i). Hence the common value must be 0. This implies that there can only be a single such market (0, w) where all firms hire non-selectively, and that all workers must obtain utility of at least w in equilibrium. Let ī denote the highest i ∈ [λ, 1] that applies to market (0, w). By zero profits and w > q_L, we must have ī > λ. To ensure that no firm wants to hire selectively in market (0, w), all firms θ ≤ ī must at least make profits q_H − w in equilibrium, i.e. they must hire high types in some market (e′, w′) ≠ (0, w) with w′ ≤ w. However, because all workers obtain utility of at least w, there cannot be any supply of workers in market (e′, w′).
12. All types i > i_H, who select e = 0 by step (D), must be hired with probability 1 at w = q_H. They cannot be hired with positive probability above q_H because no firm would hire at such a wage. Suppose some ĩ > i_H is not hired with probability 1 at w = q_H. This implies that all firms θ ∈ (i_H, ĩ] maximize profits by hiring selectively at the lower bound of the support of the wages of worker i = θ, which is below q_H by step (D). The total number of workers these firms would hire is F(ĩ) − F(i_H) ≥ F̂(ĩ) − F̂(i_H) ≥ ĩ − i_H, where F̂ denotes the convex hull of F. The first inequality follows from the fact that F(θ) ≥ F̂(θ) for all θ by construction of F̂, and F(i_H) = F̂(i_H) by definition of i_H. The second follows
Downloaded from https://academic.oup.com/restud/advance-article/doi/10.1093/restud/rdaa068/5931860 by University of Zurich user on 05 January 2021