The Impact of the First Professional Police Forces on Crime

This paper evaluates how the introduction of professional police forces affected crime using two natural experiments in history: the 1829 formation of the London Metropolitan Police (the first police force ever tasked with deterring crime) and the 1839 to 1856 county roll-out of forces in England and Wales. The London Met analysis relies on two complementary data sources. The first, trial data with geocoded crime locations, allows for a difference-in-differences estimation that finds a significant and persistent reduction in robbery but not homicide or burglary. A pre-post analysis of the second source, daily police reports of both cleared and uncleared crime incidents, finds a significant reduction in all violent crimes but offsetting changes in uncleared (decrease) and cleared (increase) property crimes. These (local) reductions in crime are not just due to crime displacement but represent true decreases in overall crime. Difference-in-differences analyses of the county roll-out find that only sufficiently large forces, measured by the population-to-force ratio, significantly reduced crime. The results are robust to controlling for spill-over effects of neighboring forces.


Introduction
An extensive empirical literature testing one of the core components of Becker's (1968) economic model of crime, the role played by the probability of apprehension, has resulted in a general "consensus that increases in police manpower reduce crime" (Chalfin and McCrary, 2017). 1 Most of this existing literature is based on (often temporary) expansions to the size of an existing police force, thereby estimating the marginal effect of an additional officer to an already established force. Yet, much is still unknown about the more fundamental relationship between police and crime. Specifically, how does the effectiveness of police in combatting crime depend on (i) the tasks assigned to an officer, (ii) police force characteristics, including size and/or age, and (iii) police (force) quality? To begin to address these questions, we study the effect on crime of the introduction of professional forces, which profoundly changed the nature of policing by tasking officers, for the first time ever, with deterring crime.
Specifically, our paper uses two natural experiments in history: the formation of the first professional force in the world, the London Metropolitan Police (the 'Met'), in October 1829, and the roll-out of rural county forces in England and Wales over the next 30 years. Cities across the United States and around the world modelled their own police departments on the Met and, most prominently, adopted its innovative emphasis on crime prevention or deterrence. 2 These newly created institutions, which still exist today, are a fundamental component of the contemporary criminal justice system.
In contrast to the existing police-crime literature, we study a large shock to the institution of 'policing': the introduction of a large, professional and institutionalized police force with the explicit aim of crime prevention. Specifically, there were three distinguishing features of these new professional police forces. First, they represented a substantial increase in numbers: in London, 1000 men were hired overnight (an approximately ten-fold increase over the preexisting informal 'police'), and the force expanded to 3000 men soon after. Second, there was a shift from reactive policing focused on catching criminals for financial reward towards prevention and deterrence by slowly walking a small beat and being visibly present (Emsley, 2009). Third, there was a new-found emphasis on police quality. Many features of modern-day policing were thus introduced for the first time during this period. A second contrast with much of the existing literature is that our natural experiments consist of two permanent shocks to policing. 3 Finally, exploiting variation in the officer to population ratio across counties at the time of force formation, we can begin addressing the question of whether the (marginal) effect of police on crime depends on force characteristics.
1 Empirical evidence of the crime reducing effect of police was elusive due to both simultaneity bias (more police are hired in higher crime locations or times) and measurement error in the number of police (Chalfin and McCrary, 2018). Levitt (1997) was amongst the first to try to causally identify a crime-reducing effect of police with natural experiments that locally or temporarily increased police numbers. See Chalfin and McCrary (2017) for a recent review and Cameron (1988) for a review of the early literature that did not account for this simultaneity bias.
2 U.S. police forces were established in New York City (1845), New Orleans and Cincinnati (1852), Boston and Philadelphia (1854), Chicago and Milwaukee (1855) and Baltimore and Newark (1857). See Uchida (2015) and https://www.britannica.com/topic/police/Early-police-in-the-United-States (viewed October 22, 2018).
The historical context of our study allows for contributions to two additional literatures. First, we enhance explanations of 19th century crime trends in England and Wales: crime rose in the first half of the century but declined in the latter half despite the growing population, an 'English miracle' (Taylor, 1998). Did the formation of professional police forces contribute to this pattern? 4 Second, we contribute to the literature studying the extent to which institutions, like mass education and public spending, explain long-run economic growth, the development of human and social capital, and state capacity, i.e. a state's ability to implement its intended policies. 5 While this literature mostly takes a macro-perspective, our paper studies the micro-foundations of these questions for one such institution: police. Given the potentially important role of the level of societal crime for explaining economic growth and development (e.g. Mauro, 1995) and the state's ability to govern, understanding the extent to which crime was affected by this new institution is an important contribution.
Identifying the effect of the new police on crime is not a simple matter. One potential confounder is an increase in reporting of crimes to the police (even if there was no change in criminal behavior). This would only have happened if there was increased societal trust in 'police'. Yet, anecdotal evidence suggests that, at least initially, there were anti-police sentiments: even a magistrate stated that "a strong feeling existed against the new police" upon the Met's formation. 6 We do not rely only on anecdotal evidence, however, and note that concerns regarding changes in reporting are further alleviated by the fact that crimes were reported to magistrates' offices in London, both prior to and after the introduction of the Met, and not directly to the new police. In addition, some of our analysis emphasizes the most serious crimes of robbery, burglary, and murder, which are arguably less subject to concerns about reporting. Another potential confounder is that an increased ability to detect crime could have led to more charges, even if the number of crimes committed did not change; i.e. there could have been an increase in clearance rates. This would have been expected as the new force was substantially larger than the previously existing informal police. Note that such increased detection would also have been expected to reduce crime through incapacitation (over and above deterrence). To disentangle whether the new forces reduced crime (through deterrence and/or incapacitation) from increases in crime reporting and clearance rates (the potential confounders), we rely on two types of crime measures: incidents and charges. 7 Incident-level data are especially important as they allow us to abstract from the potential problem of crime reducing effects being masked by increased clearance rates (one of the confounders), which could potentially happen when studying an administrative measure like charges.
The closest existing literature to our study includes those papers studying police deployment on the streets. 8 A number of studies report a crime reduction following temporal variation in (often non-permanent) police deployment, including post-terrorist attack increases in deployment in London (Draca et al., 2011) and Buenos Aires (Di Tella and Schargrodsky, 2004). 9 However, Blanes i Vidal and Mastrobuoni (2018) do not find significant effects of temporary increases in patrols that are unrelated to terrorist attacks. Negative effects of visible police presence on crime have also been found in studies of private policing using geographic boundaries (MacDonald et al., 2015; Heaton et al., 2016); these studies use spatial variation in force allocation to understand the permanent effect of policing. 10 In contrast, our study estimates the effect of a permanent change in policing, exploiting variation both over time and across space.
The first part of our empirical analysis studies the impact of the 1829 formation of the Met on crime in London. With the exception of the 'City' of London (which still has its own force today), the initial catchment area of the Met was within an approximate 7-mile radius of Charing Cross, London. 11 Because not all of London is initially 'treated' by the formation of the Met, our analysis relies on geocoding historical crime data from two data sources into 'treated' and 'control' regions of London for periods before and after the Met was created.
7 Incidents refer to reported offenses (regardless of whether they are cleared or not), while charges can only be filed when there is a suspect associated with the offense.
8 Studies have considered the extensive margin (temporary) destruction of a police force. As described by Nagin (2013), Andenaes finds a rise in crime rates, especially street crimes like robbery, after German soldiers arrested all members of the Danish police in 1944. Others have studied the effects of police strikes (Pfuhl, 1983) and slowdowns (Cann Chandrasekher, 2016), though the latter differs from the extensive margin.
9 Negative effects of police on crime are also found by Klick and Tabarrok (2005) following increases in Washington DC terrorist alert levels and Weisburd (2017) using variation in officers leaving their beats unattended.
10 To the extent that decreased response times imply an increase in police presence, Blanes i Vidal and Kirchmaier (2018) find a relationship between response time and the likelihood of clearing a crime.
11 The City of London refers to a 1-square mile area (today's central business district) in Central London.
First, we use felony trial data reported in the Proceedings of the Old Bailey (Central Criminal Court of London and County of Middlesex) for the most serious offenses of burglary, manslaughter/murder, and robbery, for which we manually coded the number and type of police witnesses and geocoded offense locations into treated and control regions. Thus, the Old Bailey data allow us to directly observe reform implementation: there is an instant shift in the type of police witnesses ('old' to 'new') that is by far largest in the treated area. Moreover, the data allow for a difference-in-differences model. The second source consists of daily police reports for nine police offices run by the pre-1829 magistrates (and continued until 1839), who were tasked with processing crimes for all of London. These reports include three crime measures: 'informations' and stolen property reports (which one can interpret as uncleared crime incidents) and charges (which are cleared crimes). Though all offenses are included (as opposed to selected felonies at the Old Bailey), they cannot be geocoded, necessitating pre-post designs.
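To make the difference-in-differences logic concrete, the following minimal Python sketch computes the simple 2x2 comparison of means: the pre-to-post change in the treated area minus the pre-to-post change in the control area. It is an illustration only and not the regression specification estimated in the paper; the function name `did_estimate` and all counts are hypothetical and invented for the example.

```python
from statistics import mean


def did_estimate(rows):
    """Return the 2x2 difference-in-differences of group means.

    rows: iterable of (treated, post, outcome) tuples, where treated and
    post are booleans and outcome is a number (e.g. trials per month).
    """
    cells = {}
    for treated, post, outcome in rows:
        cells.setdefault((treated, post), []).append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


# Hypothetical monthly trial counts (NOT data from the paper).
rows = [
    (True, False, 10.0), (True, False, 12.0),   # treated area, before
    (True, True, 6.0), (True, True, 6.0),       # treated area, after
    (False, False, 4.0), (False, False, 4.0),   # control area, before
    (False, True, 3.0), (False, True, 5.0),     # control area, after
]
estimate = did_estimate(rows)
print(estimate)
```

In this made-up example the treated mean falls by 5 while the control mean is unchanged, so the estimate is the full treated-area change. A regression version would add controls and standard errors, which this sketch deliberately omits.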
Both London analyses provide evidence consistent with a crime-reducing effect, especially for violent crimes (including robbery). A significant and persistent reduction in trials is seen for robbery (more than 40%) in the Old Bailey data, but no (consistently) significant effect for homicide and burglary trials. Using the daily reports data for London, we find a significant reduction in violent crime (driven by reductions in both cleared and uncleared crime), whereas there is a significant reduction in uncleared property crime incidents but an increase in cleared property crimes (charges). As the latter effect dominates in the daily reports data (which include a wide range of property offenses), these off-setting channels provide a potential explanation for the lack of a property crime (burglary) effect in the Old Bailey data.
The second part of our analysis studies the impact of the introduction of police forces to the counties of England and Wales; such forces were allowed for in 1839 but did not become mandatory until 1856. The county analysis complements the London analysis in a number of ways but, perhaps first and foremost, it increases the external validity of the results: Are they specific to London in the 1830s? Moreover, as the county forces were all of different sizes (relative to the population) upon creation, we can use the county roll-out of police forces to further our understanding of how the effect of a police force depends on its characteristics. Of the 48 counties in our analysis, 16 created forces in 1840, 23 in 1857, and 9 in the intermediate years. We use a difference-in-differences model to identify the effect of creating professional police forces on crime, overall and for forces of different sizes, where 1,000 people per officer was the nationally recommended (but rarely achieved) guideline. Our main crime measure (the only one available both pre- and post-reform) is the annual number of persons committed to trial by crime type (transcribed from historical Judicial Statistics yearbooks).
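As a rough illustration of how a county-year observation could be coded for such a design, the Python sketch below builds indicators for having a force and for the force being relatively large or small. The cutoff of 1,000 people per officer is used only because it is the national guideline mentioned above; the exact rule that separates 'large' from 'small' forces in the estimation is not given in this section, so the cutoff, the function name `treatment_indicators`, and the example counties are all assumptions of this sketch.

```python
def treatment_indicators(year, formation_year, population, officers, cutoff=1000.0):
    """Return 0/1 treatment indicators for one county-year observation.

    population and officers are measured at force formation; cutoff is the
    assumed maximum number of people per officer for a 'large' force.
    """
    if officers <= 0:
        raise ValueError("officers must be positive")
    has_force = year >= formation_year
    people_per_officer = population / officers
    large = has_force and people_per_officer <= cutoff
    return {
        "force": int(has_force),
        "large_force": int(large),
        "small_force": int(has_force and not large),
    }


# Hypothetical counties (NOT data from the paper).
example_a = treatment_indicators(1845, 1840, 150_000, 160)  # 937.5 people per officer
example_b = treatment_indicators(1845, 1840, 150_000, 60)   # 2,500 people per officer
example_c = treatment_indicators(1845, 1857, 150_000, 160)  # force not yet created
print(example_a, example_b, example_c)
```

Interacting the force indicator with relative size in this way lets the estimated effect differ between forces near the guideline and those well short of it.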
We find no overall effect of creating just any professional police force. But, creating a force that was closer in relative size to the nationally recommended threshold reduced crime overall (19%) and across categories (18% for violent, 14% for property, but no significant effect for other offenses); relatively smaller forces did not have a net crime reducing effect (i.e. observable in administrative data). Event-study specifications show that the crime-reducing effect of large forces is not immediate (delayed by one to two years) and increases over time.
Insignificant leads support the parallel trend assumption and a lack of anticipatory effects.
Finally, we pay careful attention to the potential role of spill-overs and crime displacement in the analyses of both the Met and county forces. Spill-overs may lead to a reduction in crime in neighboring areas, if there are spill-overs either in policing across catchment areas or in deterrence and/or incapacitation effects. In contrast, crime in neighboring areas can increase if criminals simply migrate away from the policed areas to commit crimes elsewhere (crime displacement). The extent to which the latter applies is clearly relevant to identify whether the introduction of police reduced crime overall or just locally. Using the geocoded Old Bailey data, we find little evidence of displacement within London from inside to just outside the Met's catchment area. Similarly, we do not find evidence suggestive of crime displacement from London/Middlesex to neighboring counties after the introduction of the Met.
Moreover, though our main county results are robust to controlling for spill-over effects of neighboring forces, the existence of neighboring forces does indeed have an impact on local county crime: neighbors with relatively large forces decrease local crime while those with relatively small forces increase local property crime.
The remainder of the paper proceeds as follows. Section 2 provides institutional details related to the 1829 creation of the Met and the subsequent roll-out of county police forces.
Sections 3 and 4 present data and analysis for the Met and county roll-out, respectively. Section 5 concludes with a discussion of the external validity of these historical experiments to today.

Institutional Background
This section highlights the institutional context and details of the introduction of the London Metropolitan Police Force and the county force roll-out. Other criminal justice reforms (e.g. abolishing capital punishment) and societal changes (e.g. population growth) that are relevant to the analysis will be discussed in detail throughout the paper.

The Introduction of the London Metropolitan Police in 1829
The idea of policing did, of course, exist prior to the Metropolitan Police Act of 1829. In fact, unpaid and part-time local (night) watchmen date to the Westminster Watch Act of 1735.
London's Bow Street Runners, who were sworn constables of Westminster, date to around 1750 (Emsley, 2009). As there were only eight, they did not have a physical presence and were not meant to deter crime, but rather to locate and arrest serious offenders. Initially, these Runners were similar to the 18th century thief-takers, i.e. men who earned their livings from private and public rewards upon the convictions of 'serious' criminals. By the end of the 1700s, however, the Runners were essentially full-time policemen located at the Bow Street house, which became a centralized collection point of crime incidents for the Runners to follow up on. 12 The Bow Street office was used as a model to establish seven additional Police Offices in  (Davis, 1984). Thus, despite the separation of policing and magistracy responsibilities, these police (or magistracy) offices played an essential role in crime processing. Even post-Met, a known offender would be processed through these offices (which existed until 1839) and recorded in the daily reports from these offices (as reflected in the introductory text of these reports).
Finally, the Metropolitan Police Act (10 Geo.4, c.44) created the London Metropolitan Police (the 'Met') on September 29, 1829. This was the first professional police force in the world. Initially 1,000 men strong, there were more than 3,000 officers by May 1830: in other words, in just six months, there was a more than 20-fold increase in the number of officers in London (from around 150 pre-Met). Panel A of Figure 1 documents the weekly number of hires from September 1829 to March 1830, and Panel B the Met's weekly growth until 1856. It shows that (i) initial hiring happened in two stages, with hiring of recruits for six inner divisions in September 1829 and the 11 outer divisions in February 1830 (see Appendix Table A1) and (ii) the Met grew almost constantly in the next 30 years to about 6,000 men in 1856.
The initial catchment area of the Met was within an approximately 7-mile radius from Charing Cross in Central London and was extended to 15 miles in 1839. 13 Excluded from the initial catchment area, however, were the City of London (which established its own force in 1832, expanded it in 1839, and remains distinct today) and, until 1839, the Thames River Police. 14 Panel A of Figure 2 presents a historical map of the original jurisdiction of the Metropolitan Police.
Panel B shows that the (geocoded) pre-existing police offices were centrally located within the 7-mile radius (and even a smaller 4-mile radius). Appendix Table A1 shows that an equal number of police were hired into each division, regardless of the geographic size. This implies that the Met Police were likely to be more visible on their beats in the smaller inner divisions.
There is thus a potentially more intense treatment in a shorter radius around Charing Cross.
On October 6, 1829, the Metropolitan Police opened its first station, Scotland Yard, on a street called Great Scotland Yard (near 4 Whitehall Place and Charing Cross). This became the home station of the Met, including the two police commissioners. 15 As the force expanded, other buildings in this area were taken over, but eventually the Met started opening police stations throughout London. To the best of our knowledge, few if any stations were opened in the 1830s. The earliest schedule of Met Police stations we have found dates to May 1, 1873, listing each station by police division, including the tenure at the current building (though not all have dates). There are more than 175 stations listed but fewer than 10 with leases dating to the early 1830s (and this includes Bow Street, Marylebone, Marlborough, and Scotland Yard). 16 In addition to the sharp and large increase in the sheer number of 'police', there was a shift in the primary task of an officer to deterrence. To this end, Metropolitan Police officers were assigned to walk a beat (a regular route) at a pace of 2.5 miles per hour; the beat was intentionally small to increase visibility and the new policemen 'were supposed to get to know everyone who lived on these beats'. 17 Increased standards and quality may also have increased the effectiveness of the new police. Documents reporting the reason of removal of officers from the force make clear that 'police quality' was taken seriously. Panel A of Figure 3 shows the weekly number of leavers among those recruited before March 1831. Panels B and C show the weekly number of post-1833 removals by broad reason (resignation, dismissal or death) and dismissals by reason (drunk, neglect or misconduct, criminal behavior or other). These figures demonstrate high turnover especially at the very beginnings of the Met; one can even observe annual firing spikes for being drunk on duty around Christmas.
13 While all descriptions of the formation of the Met describe this 7-mile radius, no explicit distance was written in the original act. Rather, the Act includes a "List of the parishes, townships, precincts, and places constituting 'The Metropolitan Police District'". That list includes 88 parishes or places for which we geocoded the main point of interest (e.g. parish church); 85 lie within 7 miles of Charing Cross and all are within 8 miles. Moreover, 75% of the locations are within 4 miles. Our main analysis uses the 7-mile radius to define all potentially treated areas, but also breaks this up into a treated inner circle and potentially less intensely treated outer circle (i.e. patrols are less visible due to the larger geographic area). We also test the robustness of our results to an 8-mile radius.
14 Before 1832, 'policing' in the City of London was the responsibility of the City's Day Patrol and Night Patrol. By 1803, these patrols were 16 men strong and increased to 49 men by 1815. In April 1832, the City Day Police, incorporating the previous Day Patrol and expanded to 100 men, became fully operational. In November 1838, the City Day Police and the Nightly Watch (which had replaced the Night Patrol) merged into one establishment from which the City of London Police was created in August 1839. This information is based on a leaflet, accessed on the London Metropolitan Archives website on May 17, 2018: https://www.cityoflondon.gov.uk/things-todo/london-metropolitan-archives/visitor-information/Documents/01-family-history-at-lma.pdf
15 http://www.historyhouse.co.uk/articles/scotland_yard.html, accessed April 29, 2019.
16 See MEPO 4/234 from the National Archives.
Who were the men in the new police? When the Met was created, selection criteria were not yet strict and men between the ages of 18 and 35 were eligible to apply. The job was physically demanding and subject to strict discipline as highlighted above, but offered more security than other workplaces and, as such, attracted, among others, previously unemployed workers (see Dell, 2004). The new police were paid a wage comparable to that of an unskilled agricultural worker, in an effort to recruit men who did not resemble gentlemen and who could gain the trust of the everyday man. 18 As work conditions became more attractive over time, recruitment became more restrictive in terms of age and physical requirements. The new police received very little training: the first formal training school was only established in 1907. 19 One question that naturally arises is whether the new officers were just the old 'police' with a new job title. We argue that this is unlikely given that the size of the 'old police' amounts to less than 3% of the size of the Met by May 1830. 20 Moreover, even if some of the old police did become Met officers, they were not doing the same job: they were now patrolling the streets of London to deter crime. This newfound emphasis on deterrence is a fundamental component of the institutional changes being studied here.
Clearly, a relevant question is why the Met was created. Was it a direct response to rising crime? This is indeed possible as the 1829 Metropolitan Police Act itself states: "[…] offences against property have of late increased in and near the metropolis; and the local establishments of nightly watch and nightly police have been found inadequate to the prevention and detection of crime, by reason of the frequent unfitness of the individuals employed, the insufficiency of their number, the limited sphere of their authority, and their want of connection and co-operation with each other […]" But anecdotal evidence also points towards alternative reasons for forming the Met, including a need for a centralized (non-military) body to maintain order, police provision independent of parish wealth, and a desire for order and tidiness. 21 The first of Sir Robert Peel's nine Principles of Law Enforcement highlights these alternative reasons: "The basic mission for which police exist is to prevent crime and disorder as an alternative to the repression of crime and disorder by military force and severity of legal punishment." 22
17 While this was possible in the inner divisions in Central London, beats in the outer divisions were often larger and it is plausible that policemen in these divisions were not able to fulfill these tasks (see Emsley, 2009
20 We support this conclusion by comparing registers of the first 3,000 officers hired by the Met (Source: MEPO 4/31, National Archives London) to those hired into the Bow Street Foot Patrole in the years leading up to 1829 (Source: MEPO 4/508, National Archives London). Only 156 men were hired into Bow Street between 1823 and 1829. Since turnover is high in the early years, this 156 only corresponds to hires and not the existing level of pre-Met 'police'. We can observe (using names) that a number were hired by the Met in the initial hiring wave; 24 of the last 34 Bow Street hires pre-Met subsequently joined the Met, but 9 were already dismissed by May 1830 and a number of others soon after.

The Roll-out of Professional Police Forces Across England and Wales
Professional forces were subsequently introduced in counties and boroughs throughout England and Wales via three acts: the 1835 Municipal Corporations Act, the County Police Act of 1839 (or 1839 Rural Constabulary Act) and the County and Borough Police Act of 1856.
The 1835 Act required the boroughs to appoint both a watch committee and a sufficient number of fit men to act as constables, tasked with preserving the peace and preventing crime.
There was general resistance, such that by 1837 only 93 of 171 boroughs even claimed to have established a force (Hart, 1955). Many admitted to fulfilling 'statutory obligations' by reappointing previous 'police' (rather than selecting new recruits; Hart, 1955). Rather than studying the limited and fuzzy implementation of the 1835 Act, we focus on the rural county forces created by the 1839 and 1856 Acts.
The 1839 Act gave the Quarter Sessions' justices in each county the power to create a police force for all or part of the county if they chose. This act also provided guidance regarding the structure of such a force (Stallion and Wall, 1999), including a pay scale set by the Home Office. 23 Why were the 1835 and 1839 Acts passed? Hart (1955) argues that these acts were not a response to criminals fleeing already treated areas, a conclusion that our analysis of spillovers from London to the neighboring counties supports. Rather, she argues that an increased concern about relying on the military and deficiencies in the implementations of earlier acts motivated the 1839, and ultimately, 1856 Acts (Hart, 1956).
21 See http://www.open.ac.uk/Arts/history-from-police-archives/Met6Kt/MetHistory/mhFormMetPol.html (last accessed on May 17, 2018).
22 The military had a limited pre-Met role, not extending to patrolling the streets in the sense of everyday policing. In particular, they were sent in to dissolve unlawful gatherings or to (violently) suppress riots (Dell, 2004).
23 A county constable should be paid somewhat more than an agricultural worker. See https://www.npcc.police.uk/Publication/History%20of%20Police%20Office%20Pay%20Framework.pdf (last accessed October 22, 2018).
The 1856 Act consisted of four main features. First, at the next General or Quarter Sessions after December 1, 1856, a police force had to be established in every borough or county without an existing one. Second, all forces (new and old) had to be 'efficient', largely defined as being sufficiently large relative to the population size. Third, an Inspectorate of Constabulary was created to annually inspect and certify 'efficiency' for all forces, introducing a large measure of centralization to local policing. Fourth, clothing for constables and 25% of wages would be paid by the Treasury upon certification (Hart, 1956).
In 1856, three inspection districts -Northern, Midlands, and Southern -were formed, each with an assigned inspector. According to Cowley and Todd (2006), the initial (unofficial) inspections in 1857 found many counties with inefficient or even non-existent forces. The inspectors assessed efficiency according to (i) the size of the force, (ii) the ratio of officers to the population, (iii) the quality of supervision, and (iv) the degree of cooperation with neighboring forces. Stipulated by the 1839 Act, one officer per 1,000 people was taken as the norm by the inspectors. Following unofficial advice given during the preliminary inspections in early 1857, only five districts were declared inefficient in the first official inspection, but just one (Rutland) the following year (Cowley and Todd, 2006). 24 The Inspectors rigidly interpreted the requirement of a sufficient ratio of officers per population; even the Home Secretary, Sir Vernon Harcourt, highlighted this in 1883: "…the fanciful cast-iron rule of so many [police]men per 1,000 inhabitants. Nothing can be more ridiculous than to apply the same measure to all places alike regardless of circumstances."

London Data Description
Our London analysis necessitates geocoded historical crime data to identify crimes in the treated and control areas. We use two data sources with respective advantages and limitations.

The first is the Proceedings of the Old Bailey. The Old Bailey is the Central Criminal Court of London and the surrounding county of Middlesex, and responsible for all felony trials. The Proceedings were published after each monthly court session and include the records of more than 200,000 trials from 1700 to 1913; these have since been digitized by The Old Bailey Proceedings Online. Though many variables (e.g. offense type, verdict, and sentence) are tagged and easily identifiable, we also manually extracted and coded the location and date of the crime as well as the characteristics of police witnesses (number, type, and crime scene presence), which we will use to assess the first-stage implementation of the Metropolitan Police Act. 25 Given the time-intensive nature of the transcription and historical geocoding, we focus on the most serious offenses of murder/manslaughter, robbery, and burglary/housebreaking from 1821 to 1837. 26 For these, we can assume that their felony status (and hence representation at the Old Bailey versus a lesser court) did not change. Importantly, the emphasis on the most serious offenses further alleviates concerns regarding changes in crime reporting following the introduction of the Met, i.e. a robbery would always be reported.
24 Rutland remained inefficient until the 1861/62 inspection year.
To geocode the data, a research assistant transcribed the most detailed address available in the Proceedings (e.g. an intersection, parish/district name or street end/mid points) and mapped these locations into modern day London maps to obtain postcodes and geocoordinates. 27 Using the geocoded location and date of offense, we classify offenses as in the treatment and control areas (within/outside a 7-mile radius of Charing Cross and within/outside the City of London, respectively) before and after the Met's introduction. Thus, with the Old Bailey data, we can estimate both simple pre-post and difference-in-differences specifications.
Table A2 provides the number of trials by crime type as well as details regarding police witnesses within and outside a 7-mile radius of Charing Cross and in the City of London for different time windows. One statistic that stands out is the relatively low number of murder/manslaughter trials (just 258 in all areas over 1821-1837). Appendix Figure 1 demonstrates that this low murder rate is not driven by only measuring crimes that go to trial; Old Bailey murder trials track well with an alternative time series of London homicides that is not based on trials. A more likely explanation is the limited ability of coroners at this time to identify potential murders (Emmerichs, 2001).
25 We have previously used the Old Bailey data in projects studying (i) the impact of abolishing the death penalty on jury verdicts, (ii) path dependency in jury decisions, and (iii) the gender gap in jury and judge decisions from 1715 to 1900 (see Bindler and Hjalmarsson, 2017, 2018 and forthcoming).
26 We have also geocoded robbery, burglary, and murder/manslaughter for the longer period of 1820 to 1850, but focus on this shorter window to avoid the 1837 abolition of the death penalty for robbery and burglary. In addition, after the initial data coding, we noted an unusual dip in burglary from the mid-1820s to the mid-1830s. There were hardly any offenses labelled as burglary during this time, while there was a sharp increase in the offenses labelled housebreaking. We therefore geocoded housebreaking offenses for the 1820-1837 window to supplement our analysis. Housebreaking and burglary are treated as one combined offense category for the entire period.
27 The same RA coded all locations, and though they were aware of the general question, they were unaware of the specifics of the research design. There was no opportunity for manipulation in the geocoding. Whenever locations have changed names (e.g. street names), we identify the current address using historical maps (roughly 40% of our regression sample). When the most detailed address is a long street (about 11% of our sample), we geocode the nearest street endpoint as the location (i.e., assign potentially untreated observations to the treatment area). Results are qualitatively robust to excluding either of those 'fuzzy' locations (see Appendix Table A3).
28 Shapefiles for the postcode areas were obtained from Maproom's UK Postcodes Shapefiles.
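As an illustration of the treatment/control assignment described above, the following sketch classifies a geocoded offense by its great-circle distance from Charing Cross. The coordinates used for Charing Cross, the function names, and the in_city_of_london flag (which would have to come from a separate boundary lookup) are assumptions made for this example and are not taken from the paper's replication materials.

```python
import math

# Assumed approximate coordinates for Charing Cross (not from the paper).
CHARING_CROSS = (51.5074, -0.1278)
EARTH_RADIUS_MILES = 3958.8


def miles_from_charing_cross(lat, lon):
    """Great-circle (haversine) distance in miles from Charing Cross."""
    lat0, lon0 = map(math.radians, CHARING_CROSS)
    lat1, lon1 = math.radians(lat), math.radians(lon)
    a = (math.sin((lat1 - lat0) / 2) ** 2
         + math.cos(lat0) * math.cos(lat1) * math.sin((lon1 - lon0) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def classify_area(lat, lon, in_city_of_london):
    """Return 'treatment' or 'control' for one offense location.

    Treatment: within 7 miles of Charing Cross and outside the City.
    Control: outside the 7-mile radius, or inside the City of London.
    in_city_of_london is a hypothetical flag supplied by the caller.
    """
    if in_city_of_london:
        return "control"
    if miles_from_charing_cross(lat, lon) <= 7:
        return "treatment"
    return "control"
```

The pre/post dimension would then follow from comparing the offense date with the date the Met was introduced.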
The limitation of the Old Bailey data only including serious felonies that go to trial is addressed by our second data source -the Report or Account of the Proceedings at the several Police Offices. These are reports by the nine police offices that were run by the pre-1829 magistrates, which continued until 1839. 29 We manually transcribed the data from January to April of 1828 (the year pre-reform), 1830 (the year post-reform) as well as 1831 and 1832.
Unfortunately, these daily police reports did not exist before 1828 and those for the second half of 1828 and 1829 are missing. 30 For each office and day (except Sundays), a detailed description of 'charges', 'informations' and 'property stolen' is reported. We use these data to create two measures of crime incidence: (i) the daily number of 'property stolen' entries and (ii) the daily number of property, violent, and other 'informations'. Most comparable to modern day arrest data, we also code the daily number of charges by crime category (property, violent, other). Our interpretation of these different measures is that daily informations and stolen property reports represent uncleared crime incidents while charges represent cleared crime incidents. To address the possibility that the introduction of police simply shifts uncleared crimes into the cleared category, we also create a measure that aggregates all types of incidents.
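The construction of these measures can be sketched as follows. The record layout and field names are hypothetical, and treating the aggregate as a simple sum of the three counts is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class DailyReport:
    """One office-day of transcribed counts (field names are hypothetical)."""
    informations: int     # 'informations' entries: no suspect in hand
    stolen_property: int  # 'property stolen' entries
    charges: int          # 'charges' entries: a suspect has been charged


def crime_measures(report):
    """Split one office-day into uncleared, cleared, and all incidents."""
    uncleared = report.informations + report.stolen_property
    cleared = report.charges
    return {
        "uncleared": uncleared,
        "cleared": cleared,
        # Assumed aggregate: the plain sum of cleared and uncleared incidents.
        "all_incidents": uncleared + cleared,
    }
```

Because the aggregate counts an incident whether or not it was cleared, a pure shift from uncleared to cleared leaves it unchanged.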
In contrast to the Old Bailey trial data, the daily report data include both felonies and misdemeanors and cleared and uncleared crimes (even those not going to trial). Yet, we cannot geocode the offenses into treated and control areas of London (the magistrates deal with all of London) and are thus restricted to a pre-post design. Given the high frequency of the data, we can, however, estimate the total effect on crime in London (the sum of any crime reduction and potential counteracting displacement to control areas) in a narrow window around the reform.

Evidence of the Introduction of the Metropolitan Police (Old Bailey Data)
We begin by assessing whether there is evidence of the introduction of the Met in the Old Bailey trial reports. Police witnesses were called constables (both before and after the creation of the Met), policeman (a post-Met label), watchman (a pre-Met label) and a handful of other labels that were either predominantly pre- or post-Met. 31 Do we see an increase in the number and/or these different types of police witnesses at trial after the Met was created? An important caveat is that this analysis conditions on crimes brought to trial: We cannot control for the possibility that the new police affect the number of crimes committed or the likelihood of a trial.
29 See Appendix Figure B1 for an example page of data, which are publicly available from the National Archives.
30 The files for the second half of 1828 as well as for 1829 have, according to information on the website of the National Archives, been lost. We therefore coded data from the documents corresponding to the months of January until April for the years 1828 (MEPO 4/12), 1830 (MEPO 4/13), 1831 (MEPO 4/15) and 1832 (MEPO 4/17).
Panel A of Figure 5 plots the annual share of trials with a police witness of any sort for both the treated (i.e. within 7 miles but not the City of London) and potential control area (outside 7 miles or in the City of London). There is no obvious change in the proportion of trials with any police witness around the 1829 introduction of the Met. Panel B presents the share of trials in the treated and control areas with different types of police witnesses. 32 In the treated area, the share of trials with an 'old' labelled witness (watchman or other) drops sharply from about 70% to 20% while the share with a 'new' label (policeman or other) increases from 0% to almost 50%. In the control areas, we also see an increase in the share with 'new' police, but it is much smaller and more gradual than in the treatment area.
To account for the different sample sizes (and precision) in the different areas (see Panel C of Figure 5, which looks at the number rather than share of trials), as well as the potentially different composition of offenses, Table 1 looks at the 'first-stage' more formally by estimating pre-post designs for each potential treatment and control area. We divide the treatment area into two areas (within 4 miles and 4-7 miles from Charing Cross) to allow for a potentially more intense treatment in the inner divisions (i.e. more visible patrol presence as highlighted in Section 2.1). The two control areas include (i) offenses outside the 7-mile area and (ii) the City of London. These are simple regressions of each measure of police presence at crime trial i for offense o in area a at date t on a dummy indicating whether the offense occurred after the introduction of the Met and offense type fixed effects. These results are presented for two windows -1821-1837 and 1828-1832 -which we carry through the Old Bailey analysis. The latter mimics the estimation window of our second data source (daily police reports).
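A minimal sketch of such a pre-post regression for a single area is given below. It uses the within (demeaning) transformation, which for a single regressor is numerically equivalent to OLS with a full set of offense-type dummies. The function name and argument layout are assumptions for illustration, and no standard errors are computed.

```python
from collections import defaultdict


def pre_post_estimate(y, post, offense):
    """Coefficient on the post-Met dummy with offense-type fixed effects.

    y:       outcome per trial (e.g. 1 if a 'new' police witness appears)
    post:    1 if the offense occurred after the Met was introduced, else 0
    offense: offense-type label per trial
    """
    groups = defaultdict(list)
    for i, label in enumerate(offense):
        groups[label].append(i)

    num = 0.0
    den = 0.0
    for idx in groups.values():
        # Demean outcome and regressor within each offense type.
        y_bar = sum(y[i] for i in idx) / len(idx)
        p_bar = sum(post[i] for i in idx) / len(idx)
        for i in idx:
            num += (post[i] - p_bar) * (y[i] - y_bar)
            den += (post[i] - p_bar) ** 2
    if den == 0:
        raise ValueError("no within-offense variation in the post dummy")
    return num / den
```

Running the function separately on the trials of each area would give one pre-post coefficient per area and outcome.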
Consistent with the descriptive figure, Table 1 shows little evidence that the creation of the Met increased the presence of any police at a trial. But, it significantly changed the type of witness: The pre-post specifications show that the likelihood of a trial having a 'new' police witness increased by 57 and 46 percentage points in the 4 and 4-7 miles radius areas, respectively (using the 1821-37 window) while the presence of 'old' police decreased by 49 and 25 percentage points respectively. Thus, the pre-post analysis confirms that there was a treatment, and suggests it may have been stronger in the inner (4-mile) circle. Also consistent with the figure, the pre-post specification for the control area (more than 7-miles) indicates some increase (16 percentage points) in 'new' but no change in 'old' police. In the City of London, potentially treated after April 1832, a shift from old to new police is seen in the larger window (including post-1832) but is much smaller and/or insignificant in the shorter window.
31 Other predominantly pre-Met labels include beadle, conductor, marshalsman, officer, patrol and street keeper. Other predominantly post-Met labels include inspector, sergeant, superintendent, captain and Thames.
32 Type of the police witness refers to any of the first five police witnesses; less than 1% of sample trials have more witnesses. The presence of constables, a label that is not distinctively pre- or post-Met, is excluded from this figure.
Finally, as the Met officers were constantly walking a short beat, it is plausible they were increasingly present at the crime scene itself, either by witnessing the crime or being close enough to be called for assistance, i.e. a shorter response time. This may depend on crime type and be especially relevant for street crimes. Columns (7) and (8) of Table 1 look at this in the pre-post regressions: There is a significant 8 percentage point increase in police presence at a crime scene in the 4-mile radius for the larger sample period (the estimate is similar but less precise for the uncertainty 4-7 mile area, while there is no such effect in the control areas).
These changes are not seen, however, in the short time window.
Though substantially smaller than for the treated area, the above analysis finds some evidence of an increase in new police in the control areas. Why? There are a number of plausible explanations. It could simply be (i) measurement error in our geocoding or (ii) that the term 'police' is increasingly used in the Proceedings by court reporters, regardless of the actual type of police (the same court reporter is responsible for the entire Proceedings, regardless of offense location). Alternatively, (iii) there could be spill-overs of the Metropolitan Police into the control areas. This could occur because the 7-mile radius/City of London is not a perfect boundary and some Met police actually patrol this area or some crimes committed outside the 7-mile radius or in the City led to arrests within the 7-mile radius. If such spill-overs existed and the control group was partially treated, we would under-estimate the treatment effect in a difference-in-differences analysis of the effect of the Met on crime. To assess the plausibility of such spill-overs, Figure 6 presents kernel densities of post-Met trials with and without police at the crime scene (Panel A) and with at least one new or one old type of police witness (Panel B), both by distance from Charing Cross. If there were no spill-overs, then one would expect to see a drop in the density just after the 7-mile threshold. The figures do not suggest a substantial spill-over of Metropolitan policing (the densities are close to zero around and outside the 7-mile mark), but they do reinforce the basic findings from this section. The new police are observed to be present in the treated area, and to a greater extent in the 4-mile radius than the 4 to 7-mile radius. Police are also more likely to be present at the crime scene in the 4-mile radius.

Pre-Post Comparison of Means (Old Bailey Data)
Having established that the creation of the Met affected 'policing' in London, we turn to whether it affected crime. To do so, we have to temporally and geographically aggregate the data. In our baseline, we aggregate at the month by area level: treated (less than 4 miles from Charing Cross), uncertain (4 to 7 miles from Charing Cross) and control areas (more than 7 miles from Charing Cross plus the City of London). Table 2 compares the average number of crimes overall and by crime type (burglary, robbery and homicide) before and after introducing the Met. Panels A and B show means for 1821-1837 and 1828-1832, respectively. 33 In the larger window, there is a significant reduction of 12% (8.24 to 7.29) in the average number of total monthly crimes in the treated area; there are similar reductions (20%) in the shorter window.
This change is driven in both windows by robbery, which decreases by 44% and 53%, respectively. Such a decrease is not seen in the less intensively treated uncertainty area (if anything, there is a significant increase in burglaries in the longer window). No significant changes are observed in the control area or the City of London in the short window, though a significant increase in robbery is seen in the control and an increase (decrease) in burglary (robbery) in the City in the longer window. 34 Appendix Figure A2 takes these differences in means a step further, and plots them separately for each one-mile radius from Charing Cross.
The only evidence of a crime reduction is for robbery within the treated area.

Main Empirical Specification (Old Bailey Data)
To make the case that these post-Met crime reductions in the treated area have a causal interpretation, we turn to the difference-in-differences model in equation (1), which uses the area outside the 7-mile radius and the City of London as the best possible control groups. We again split the potentially treated areas into areas with a certain (within 4-mile radius) and an uncertain treatment intensity (4 to 7-mile radius), consistent with the higher treatment intensity in the inner circle and suggestive evidence from the difference-in-means comparison above.
The outcome variable is the number of trials overall and for offense o in area a during time period t. The baseline analysis aggregates the data at the month (t) and area (a) level, using the four previously defined areas (treatment, uncertain, control and City of London). We later conduct robustness tests to alternative aggregation levels (weeks and circles around Charing Cross). Year, month and area fixed effects are included.
(1)
Intuitively, we estimate the change in crime in the treated areas before and after the introduction of the Met compared to the change in crime in the control areas. Compared to the simple pre-post analyses, this allows us to account for general trends in crime that would have occurred independently of the reform. For this to be the case, the usual parallel trend assumption must hold and we must assume that during the estimation window nothing else changed in the treatment but not in the control group (or vice versa) that could have affected crime rates. We formally test for pre-reform differences between the treatment and control areas when we move from the difference-in-differences to an event-study design and discuss potential confounders. 35
33 Significance levels are based on simple pre-post regressions; the results are robust to including month dummies.
34 Similar results are found when using an 8-mile instead of a 7-mile radius from Charing Cross.
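As a sketch only, a difference-in-differences specification consistent with the verbal description of equation (1) could be written as below. The notation is assumed for this illustration and is not taken from the paper: Y_{oat} is the number of trials for offense o in area a in month t, Treated_a and Uncertain_a indicate the within-4-mile and 4 to 7-mile areas, Post_t indicates months after the introduction of the Met, and \alpha_a, \lambda and \mu are area, year and calendar-month fixed effects.

```latex
\[
Y_{oat} = \beta_1 \,(\mathrm{Treated}_a \times \mathrm{Post}_t)
        + \beta_2 \,(\mathrm{Uncertain}_a \times \mathrm{Post}_t)
        + \alpha_a + \lambda_{\mathrm{year}(t)} + \mu_{\mathrm{month}(t)}
        + \varepsilon_{oat}
\]
```

Under this reading, \beta_1 and \beta_2 measure the post-Met change in trials in the treated and uncertain areas relative to the change in the control areas (outside the 7-mile radius and the City of London).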

Spillovers, Crime Displacement and Other Potential Confounders (Old Bailey Data)
Are there potential confounders? We discuss five potential concerns. One obvious candidate is the City Day Police which became operational in the City of London in April 1832.
It is possible that the City Day Police introduced a similar treatment to the City of London as the Met did to the treatment area. Thus, part of our control group (City of London) was partially treated in 1832 which (if anything) leads to a downwards bias in the estimated treatment effect.
Nonetheless, we show that our results are robust to either re-allocating the City of London to the treatment group after April 1832, the uncertainty group, or dropping it completely.
A second potential confounder is the first cholera epidemic of 1832, to which the deaths of almost 7,000 in London have been attributed. 36 The epidemic could certainly have affected crime through multiple channels: directly through public riots (Tynkynnen, 1995), by affecting police resources (directly through ill/dying officers or indirectly as officers' responsibilities are shifted away from crime prevention), or by impacting the population of criminals (who may be incapacitated by the disease or driven to commit crimes). If the epidemic differentially impacted the population in the treatment and control areas, then this could violate the difference-in-differences assumptions. To explicitly look at the geographical and temporal distribution of cholera in London, we use a new source of data -the Returns to Death of the Metropolitan Police officers from 1829 to 1889, which provides the date of death, police division to which the officer is assigned and often the reason of death. Appendix Figure A3 demonstrates that cholera arrived and peaked in 1832, diminished by 1833, and almost disappeared by 1834. Nineteen officers died from cholera, and all but one of these deaths were in July-September (peaking in August). Despite the equal number of officers across divisions, deaths were not equally distributed, with more deaths in inner London. However, our shorter estimation window mostly avoids this concern; in addition, our results are robust to further dropping all trials after May 1832, i.e. when only looking at the pre-cholera period. Moreover, while the differential exposure of the treated and control areas to the cholera epidemic raises the possibility of a violation of the parallel-trends assumption, empirically we do not find this to be the case: Event-study specifications with leads and lags (presented later in this section) suggest parallel pre-trends in crime and hence support the assumption of no differential trends in crime.
[...] no crime-specific movements determining the year a specific category was reformed.
35 Robust standard errors are used in the baseline. Appendix Table A4 assesses the sensitivity of these standard errors to a wild cluster bootstrap clustering by area (treatment, uncertainty, control, and City): if anything, the findings become more precise.
36 See for instance https://www.choleraandthethames.co.uk/ (accessed April 29, 2019).
A fourth potential confounder is that the period is characterized by dynamic population growth. Appendix Figure A4 shows decadal population estimates and growth for Inner London (excluding the City), Outer London and the City of London. 37 In the first half of the 19th century, Inner London grows at the highest rate (almost 25%) between each census, with Outer London not too far off (though the Inner London population is substantially larger), while the City of London does not grow at all. To the extent that population growth implies more potential criminals and increases in crime, this would bias us against finding a crime-reducing effect in the pre-post analysis. Likewise, the faster growth in the treated areas would bias us against finding a crime-reducing effect of the Met in a difference-in-differences analysis. However, the extent to which such differential population growth is a concern is again mitigated by our emphasis on the short window around the reform and by observing parallel pre-trends in crime.
Finally, to determine whether the introduction of professional police decreased crime overall or just locally (where police were introduced), one needs to understand whether police spill-over or crime displacement effects exist. Our above discussion of witnesses already raised the possibility of the former. Though it would actually attenuate our estimates of a crime reducing effect of police, we concluded that there was little evidence of a substantial police spill-over. With regard to crime displacement, if criminals chose to commit crime in less policed areas than the newly treated Met jurisdiction, this would bias the difference-in-differences estimates in the direction of a crime reducing effect. One could even conclude crime was reduced even if all crimes from the treated area were simply re-located to the control area.
To assess the extent to which displacement is a concern, we return to the pre-post estimations in Table 2. There is no evidence of an increase in crime in control areas in the short window. A spill-over cannot be ruled out in the longer window, however, as the average number of monthly robberies outside the 7-mile radius actually doubles, though it is still an order of magnitude smaller than in the treated area. To take a closer look at potential displacement, Figure 6 plots kernel densities of crime locations (relative to Charing Cross) for the periods before and after the introduction of the Met. If there is displacement, one would expect an increase in the post-reform density just outside the 7-mile radius, where the Met was not introduced. While one can in fact see such a 'blip' for each crime category around this distance, we highlight that (i) it is negligible relative to the amount of crime in the treated area and (ii) similar blips are seen in the pre-Met period, suggesting it is not completely driven by displacement. 38
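The displacement check described above compares pre- and post-reform densities of crime locations by distance from Charing Cross. A minimal sketch of that comparison follows, assuming a Gaussian kernel and a fixed bandwidth of 0.5 miles; neither choice is stated in the text, and the function names are hypothetical.

```python
import math


def gaussian_kde(points, x, bandwidth):
    """Gaussian kernel density estimate of `points`, evaluated at x."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return total / norm


def density_change_at(pre_miles, post_miles, x, bandwidth=0.5):
    """Post-reform minus pre-reform density at distance x (in miles).

    A positive value just outside 7 miles would be the kind of 'blip'
    that displacement predicts; the bandwidth is an assumed value.
    """
    return (gaussian_kde(post_miles, x, bandwidth)
            - gaussian_kde(pre_miles, x, bandwidth))
```

Evaluating density_change_at on a grid of distances, with the lists of pre- and post-reform offense distances as inputs, would trace out the difference between the two curves.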

Main Results and Robustness Tests (Old Bailey Data)
The results from the difference-in-differences estimation are shown in Table 3. Columns (1) to (3) correspond to the baseline specification (with the City of London classified as a control area) for three windows: 1821-1837, 1825-1835, and 1828-1832. Panel A shows the results for total crime, and Panels B to D separately by crime type. Using the largest window, we find that the introduction of the Met leads to highly significant decreases in trials in the treatment relative to the control area for total crime, which is driven by robbery. The baseline effects are sizeable: Relative to the average number of pre-Met crimes in the treatment group, the point estimates (in the larger window) translate into a reduction of 14% in combined crime and 40% in robbery. Though the uncertainty area was at least partially treated, we do not find any effects of the Met on crime there; this could imply that there was no change in crime levels in the uncertainty area (maybe due to a smaller deterrence effect as police were less visible in larger divisions) or that the crime reduction effect was offset by increased apprehensions.
38 In this period, criminals would likely be travelling on foot. Horse drawn stage coaches could be hired, and from 1829, the first 'omnibuses' were introduced in central London (horse-drawn buses), but these alternatives were expensive. See https://www.oldbaileyonline.org/static/Transport.jsp, last accessed June 19, 2018, and Heblich et al. (2018). In this context, the control area with a radius of 7 to 15 miles from Charing Cross is not small.
Focusing on the inner 4-mile radius, we note that the difference-in-differences estimates are close to the simple pre-post comparison of means. Further, moving to a narrower estimation window (and mitigating potential confounders), the difference-in-differences specification yields similarly sized effects (18% overall and 46% for robbery, respectively). The remaining columns of Table 3 show the sensitivity of the main results to alternatively assigning the City of London to the treatment group after the introduction of the City Day Police in April 1832, the uncertainty group, and excluding the City of London from the analysis completely.
Unsurprisingly (as the treatment is distorted), the former attenuates the point estimates (though they remain significant), while the latter cases result in robbery point estimates only marginally different from the baseline. Our main finding that the Met led to significant and sizeable reductions in robbery (trials) is robust to alternative estimation strategies and windows.
[...] (1) and (2) respectively. We estimate a more flexible specification that interacts the treatment and uncertainty indicator with dummies for 2-year intervals before and after the introduction of the Met. To account for the mid-year timing of the Met's creation, we define a year from September to August. The purpose of these specifications is twofold: to use the leads to test the plausibility of the parallel trends assumption and to study the dynamic effects of creating the Met.
Were the effects immediate, and did they change over time (as officer quality increased with both experience and in recruiting)? The results are supportive of parallel trends for all crime categories, and for both treatment and uncertainty areas: The coefficients are not significantly different from zero in the years leading up to the reform. There is no evidence of a short or long-run effect for homicide or burglary. For robbery, the effect is immediate and persistent.
[...] while (4)-(6) consider the month by 1-mile distance band level (i.e. smaller geographic areas).
Since crime is a rarer event in these smaller units, we adopt an extensive margin measure of crime (any crime) for this table; we cannot use the same margin in the baseline aggregation due to a lack of variation (100% of treated areas have, on average, at least one crime per month).
We see the same pattern of results. A significant reduction in robbery trials (ranging from 11.5-16.8 percentage points) is seen after introducing the Met regardless of the window or level of aggregation. In contrast to the baseline, we do observe a sometimes significant reduction in the chance of burglary of 6-9 percentage points; precision is generally lost in the smallest window.
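For the event-study specification described earlier in this section, the September-to-August year and the 2-year interval dummies could be constructed along the following lines. The labelling convention (a year is named after the September in which it starts, and bin 0 begins in September 1829) and the function names are assumptions made for this sketch.

```python
from datetime import date

# Assumed: the first post-reform year runs September 1829 to August 1830.
REFORM_YEAR = 1829


def event_year(d):
    """Year running September to August, labelled by its starting September."""
    return d.year if d.month >= 9 else d.year - 1


def two_year_bin(d):
    """Index of the 2-year interval relative to the reform.

    Bin 0 covers September 1829 to August 1831, bin -1 covers
    September 1827 to August 1829, and so on in both directions.
    """
    return (event_year(d) - REFORM_YEAR) // 2
```

Interacting indicators for each bin with the treatment and uncertainty indicators, and omitting one pre-reform bin as the reference, would give leads and lags of the kind described.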
Finally, Appendix Table A3 demonstrates the robustness of the baseline results to a series of sensitivity tests, including: (i) baseline area specific time trends, (ii) excluding crimes reported to be 'somewhere' on a long street, which could lead to crimes being misclassified as treated offenses given our geocoding strategy, (iii) including only crimes for which we could identify the coordinates without having to refer to historical maps, and (iv) excluding offenses with missing crime dates (rather than instead assigning trial dates, as in the baseline).

Summary Statistics (Daily Police Reports)
The second part of the London analysis uses a simple pre-post design to analyze the daily crime reports described in Section 3.1. Though the raw data include nine offices, we exclude the Thames Police Office from the analysis as the Thames River Police are not in the jurisdiction of the Met and the composition and nature of crimes in the Thames jurisdiction (docks and water) is likely to differ from the surrounding offices. 39 Table 5 presents summary statistics for the remaining eight offices for the entire period, the pre-reform period (1828)

Main Empirical Specification and Results (Daily Police Reports)
Equation (2) [...]
To alleviate such concerns, we limit the sample period to the year before and after the reform for large parts of this analysis. A second concern is that having only one pre-period of data (January to April of 1828) limits our ability to say anything about pre-existing trends in crime. But, one argument made for the new police was rising crime rates -it would therefore be hard to imagine deterrence being confounded by a downward trend in crime. Moreover, the Old Bailey analysis found the results to be robust to the smaller time window and both pre-post and difference-in-differences models.
Table 6 presents the baseline pre-post results using the daily crime reports for each outcome: the number of all cleared and uncleared incidents (Panel A), any and number of informations (Panels B and C), any stolen property reports (Panel D), and number of charges (Panel E). Column (1) shows the raw pre-post difference when the sample is restricted to one year before and after the reform only (i.e. 1828 and 1830) including all crime categories. There is a significant reduction in the chance of observing any informations by 15 percentage points (32% relative to the 1828 mean), the number of informations by 0.302 (38%), and the likelihood of any stolen property incidents by 9.8 percentage points (25%). In contrast, there is an increase in the total number of charges by 0.88 (16.6%). Panel A shows that the combined effect on the total number of incidents is positive (0.34 or 5.1%) and marginally significant. We build up to the baseline specification by adding police office fixed effects in column (2) and calendar week and day of the week fixed effects in columns (3) and (4); these have little impact on the raw estimates. 40 Column (5) includes the daily reports for January to April of two additional post-reform years (1831 and 1832). For all outcomes, the point estimates increase while their sign and precision are unchanged.
We study possible reasons for this pattern in Table 7.
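The percentages quoted with the Table 6 estimates are effects relative to the 1828 mean, so each pair of numbers implies a baseline level. The short sketch below backs out those implied means from the figures reported in the text; the helper name is hypothetical, and the results are only approximate because the reported estimates and percentages are rounded.

```python
def implied_baseline(effect, percent_of_mean):
    """Baseline mean implied by an absolute effect and its size in percent."""
    return effect / (percent_of_mean / 100.0)


# (absolute change, percent of the 1828 mean), as reported in the text
reported = {
    "any informations": (0.15, 32.0),
    "number of informations": (0.302, 38.0),
    "any stolen property": (0.098, 25.0),
    "number of charges": (0.88, 16.6),
    "all incidents": (0.34, 5.1),
}

for outcome, (effect, pct) in reported.items():
    print(f"{outcome}: implied 1828 mean of about {implied_baseline(effect, pct):.2f}")
```

The implied means make the relative scale visible: charges are far more numerous per office-day than informations, which is why a smaller percentage increase in charges can outweigh larger percentage declines in uncleared incidents.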
To interpret these results, one must recall the potential differences between crime measures. Both informations and stolen property reports are proxies for criminal incidents when there is either no known suspect or the suspect has not been caught: these are uncleared crime incidents. For charges, on the other hand, there is always a known suspect in hand -in this sense, they are cleared crime incidents. The combined measure of cleared and uncleared incidents is our best proxy for total crime incidents. The above analysis finds that the introduction of the Met significantly reduced violent crime incidents (both cleared and uncleared), but for property crimes, there was a reduction in uncleared but increase in cleared offenses, with the magnitude of the latter being larger. That is, for property crime, it appears as if the apprehension/clearance effect of the police is greater than the deterrence/incapacitation channels. Why? One reason is that the physical presence of the Met officers walking the streets may have allowed them to apprehend property offenders, such as pickpockets, as crimes were being committed. Did the police actually deter crimes, or just substitute crimes from the uncleared to cleared categories? It is impossible to say for property crime, but the total reduction for violent crime suggests that there was at least some true reduction in criminal behavior. 41
40 Appendix Table A5 presents a number of robustness checks, including estimates: (i) at the weekly instead of the daily level, (ii) excluding incomplete weeks of data, as occur at the beginning of each year or in weeks with holidays, (iii) excluding one office at a time to rule out that our results are driven by one particular office, and (iv) based on alternative specifications, including logarithms of the dependent variable (where appropriate).
41 The crime reductions in the daily reports are not driven by a shift in reporting from the old offices to new police stations. As discussed earlier, there were no new police stations during this period (except Scotland Yard) and the magistrates housed in these offices were responsible for processing crimes both before and after the Met.

Extensions: Short- and Medium-Term Dynamics (Daily Police Reports)
This section aims to better understand the dynamic effects of creating the Met. As described in Section 2.1, there were two initial hiring waves, the inner divisions in September 1829 and the outer divisions in February 1830. 42 We take advantage of this two-stage initial hiring and estimate a specification that allows for different coefficients on the treatment variable in (i) January 1830 (after the introduction of the Met and before the second hiring wave), (ii) all other months in 1830 (after the second hiring wave), (iii) 1831 and (iv) 1832. That is, we estimate the baseline specification presented in equation (2), but decompose the treatment into multiple time periods. We can thus study the immediate effect of a large hiring wave in February 1830 (and thereby implicitly allow for heterogeneous effects of the two hiring stages) and whether the impact of the formation of the Met changes over time. Table 7 shows the results for the combination of all incidents in columns (1) -(3), the number of charges in (4) -(6), any informations in (7) - (9), and stolen property reports in column (10). There are two key takeaways. First, the point estimates generally increase over time, which may not be surprising given the increasing quality of police after the initial introduction of the Met, and the continued hiring. Second, while some of the reduction in uncleared crimes is immediate (for violent informations and stolen property reports), the significant increase in the clearance (charges) of property crimes does not kick in until the second wave. This may mean two things: (i) Visible police (notwithstanding low quality) may deter crime even if they do not increase clearance rates, and (ii) clearance rates may have been higher already in those places with the first wave of hiring than those locations affected by the second hiring wave.
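The decomposition of the treatment into periods can be sketched as a simple mapping from report dates to period labels. The labels and the function name are hypothetical; the sample years follow the text (1828 as the pre-reform year, then 1830, 1831 and 1832).

```python
from datetime import date


def treatment_period(d):
    """Assign a daily-report date to one of the estimation periods.

    'pre'         1828, before the Met
    '1830_jan'    January 1830, before the second hiring wave
    '1830_other'  remaining 1830 months, after the second hiring wave
    '1831'/'1832' later post-reform years
    """
    if d.year == 1828:
        return "pre"
    if d.year == 1830:
        return "1830_jan" if d.month == 1 else "1830_other"
    if d.year in (1831, 1832):
        return str(d.year)
    raise ValueError("date outside the transcribed sample years")
```

Replacing the single post-reform dummy with one indicator per non-'pre' label would then yield a separate coefficient for each period.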

Summary and Discussion of the London Metropolitan Police Findings
The first key takeaway of the London Met analyses is that there is convincing evidence of the [...] that a higher clearance rate completely offsets such an effect. Third, our pre-post analysis of the daily police report data supports these interpretations of the Old Bailey results: a significant reduction in (both cleared and uncleared) violent crimes, including robbery, is observed, while evidence of offsetting channels is seen for property crime: a reduction in uncleared crimes (which could include deterrence) but an increase in cleared crimes. We hesitate to say more than that the findings from the two analyses are consistent with each other, or to make direct comparisons of the Old Bailey and daily reports analyses: the selected Old Bailey offenses represent just a small subset (the most serious) of the wide range of offenses in the daily reports.
Fourth, we find little evidence of spill-overs in policing or crime displacement.

County Data
Our evaluation of the roll-out of county police forces uses manually transcribed archival records to measure police force creation and crime. Year of force formation and initial size for each county (see Appendix Table A6) were obtained from a Police History Society book (Stallion and Wall, 1999). Annual force size data are only available in the Judicial Statistics yearbooks after the 1856 mandatory creation of forces. Figure 9 illustrates [...]

A potential disadvantage of using trials to measure crime is that it may confound changes in prosecution behavior (in which the police may have played a significant role at the time) with changes in criminal behavior. However, Appendix Figure A5 demonstrates that all three crime measures available in the Judicial Statistics after 1857, i.e. trials (our measure), the total number of indictable crimes committed and the total number of individuals apprehended for indictable offenses, move in lock-step until the early 1890s. Another potential concern is the impact of the 1855 Criminal Justice Act on the number of trials. The Act gave judges the ability to summarily deal with larceny cases, which is reflected in the large decrease in the number of trials in the year before the mandatory creation of the police forces, specifically for property offenses (see Panel A of Figure 11). Given that this Act is a national shock (comparable figures by county are available upon request), our difference-in-differences approach mitigates related concerns.
Moreover, we estimate the effect of creating a force for two categories unaffected by that reform (violent and other) and for the early reforming counties using a sample period prior to 1855.
Finally, we use available census records from 1851 and 1861 to generate relevant control variables at the county level: the share male, married, native, in various age groups, unemployed or out of the labor force, and farmers. 46 We have coded the annual county population from the Judicial Statistics after 1857, and use the 1851 and 1841 censuses to estimate the population in earlier years. 47 We use this population variable to create crime rates.
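As an illustration of how a population series and a crime rate could be built from two census counts, the sketch below interpolates (or extrapolates) linearly between the 1841 and 1851 censuses and expresses trials per 1,000 population. The linear rule is an assumption, since the text does not state the estimation method, and the function names and input numbers are hypothetical.

```python
def estimate_population(year, pop_1841, pop_1851):
    """Linearly interpolate or extrapolate population from two census counts.

    Assumption: linear change per year; the paper's actual method is not given.
    """
    change_per_year = (pop_1851 - pop_1841) / 10
    return pop_1841 + change_per_year * (year - 1841)


def trials_per_1000(trials, population):
    """Crime (trial) rate per 1,000 population."""
    return 1000 * trials / population


# Hypothetical county: 180,000 people in 1841 and 200,000 in 1851.
pop_1846 = estimate_population(1846, 180_000, 200_000)
rate_1846 = trials_per_1000(380, pop_1846)
```

Any error in the estimated population feeds directly into the rate through its denominator.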

Sample Creation, Treatment Definition and Summary Statistics
We use a difference-in-differences model to estimate the impact of creating rural county police forces on crime. We restrict our sample to rural county jurisdictions for which we can both reliably identify the year of force creation and measure crime. [...] force for any or all of the fiscal year; for the former, the first treated year is typically only partially treated whereas for the latter, the first treated year is fully treated. The above-described treatment only captures whether there existed any professional county police force, but nothing about the quality of the force. One measure of quality explicitly used by the inspectors tasked with annually certifying each force is the relative size of the force, i.e. the number of people per officer in the county. We can measure this upon force formation, and use it to characterize whether the new force was sufficiently large. Finally, our baseline analysis uses a window of eight years before and after the earliest and latest reform years, respectively, i.e. 1832 to 1865, but we also conduct sensitivity checks with respect to the start and end years of the sample.

[...] other, respectively). 75% of the counties are in England and the average county population was close to 200,000 in 1858. It is also clear that the police forces became larger relative to the population over time: the ratio of people to police averaged 2,857 at the time of force formation but was down to 1,700 by 1858. In terms of characterizing early, mid and late reformers, Table 8 shows that early reformers were on average largest in terms of population and acreage, while the mid-reformers were smallest in both of these measures. In addition, the earliest reformers did not have the highest crime rate (based on the whole time period): the average crime rate per 1,000 population was 1.9 for early reformers, 2.5 for mid-reformers, and 1.5 for late reformers.

Empirical Approach: County Police Force Formation
The difference-in-differences model is presented in equation (3). For β to represent the causal effect of creating a professional force on crime, we make the usual parallel trends assumption that the change in crime (trial) rates in treated counties would have been the same as in control counties in the absence of the reform. Panel B of Figure 11 illustrates the plausibility of this assumption by presenting the average annual log charges separately for the early, mid and late reformers. Crime rates are remarkably parallel for these three groups. We more formally test the plausibility of the parallel trends assumption in an event study design allowing for differential effects leading up to the reform. Another identifying assumption is that the timing of police force formation is random. Anecdotally, this seems reasonable, at least for the earliest and latest reformers. The earliest reformers created a force right after the passage of the 1839 Act, but did not lobby for this Act or know it was coming.
The last reformers only created a force when they had to after the 1856 Act; again, (to the best of our knowledge) they did not know it was coming. We test this assumption in Section 4.5.
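For reference, a specification consistent with how the text and table notes describe the model (a Force indicator for county c in year t, with county and year fixed effects and a log trial outcome) would take the following form. Apart from β, Force, c and t, the symbols are assumed notation rather than the authors' own.

```latex
\ln(\text{Trials}_{ct}) = \beta \, \text{Force}_{ct} + \gamma_c + \lambda_t + \varepsilon_{ct}
```

Here γ_c and λ_t stand for county and year fixed effects and ε_ct for the error term; the outcome may equally be the log number of trials per capita.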
In analyzing the formation of county police forces, the same potential confounders of increased reporting and/or clearance rates exist as in London. Using trials to measure crime only allows us to estimate the combined effect of deterrence/incapacitation and these confounders. Without any measure of uncleared crime incidents, we can only detect a deterrence and/or incapacitation effect if it is larger than these offsetting channels: a null or increasing effect of police on charges does not rule out the existence of such a crime-reducing effect, but does not allow us to detect it in the data.

Table 9 presents the results of estimating the baseline specification for the estimation window 1832 to 1865. The dependent variable is the log number of trials in columns (1) and (2) and the log number of trials per capita (crime rate) in columns (3) and (4). Panel A considers all charges while panels B to D consider violent, property, and other charges, respectively. The variable of interest, Force, is equal to one in any county-year combination in which there exists a police force for at least part of the year (columns (1) and (3)) or all of the year (columns (2) and (4)), and equal to zero otherwise. The first insight from Table 9 is that the creation of a police force, on average, does not have a significant effect on overall, violent, property or other crime. Second, the estimates are comparable when using the log number of crimes or the log crime rate; going forward, we emphasize the log number given the measurement error concerns in the denominator of the crime rate. Third, estimates are comparable when defining treatment as having a police force for any or all of the year; going forward, we use the latter (which should if anything bias us against finding a crime-reducing effect).
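To make the mechanics of a difference-in-differences estimate with county and year fixed effects concrete, the sketch below applies the within (two-way demeaning) transformation to a balanced panel and recovers the coefficient on a Force dummy. It is a minimal illustration on synthetic, noise-free data: the counties, adoption years, fixed effects and the effect size are all invented, and the sketch omits standard errors (the paper clusters them by county).

```python
from collections import defaultdict


def two_way_demean(panel, idx):
    """Subtract county and year means (adding back the grand mean) for column idx."""
    by_county, by_year = defaultdict(list), defaultdict(list)
    for row in panel:
        by_county[row[0]].append(row[idx])
        by_year[row[1]].append(row[idx])
    county_mean = {c: sum(v) / len(v) for c, v in by_county.items()}
    year_mean = {t: sum(v) / len(v) for t, v in by_year.items()}
    grand_mean = sum(row[idx] for row in panel) / len(panel)
    return [row[idx] - county_mean[row[0]] - year_mean[row[1]] + grand_mean
            for row in panel]


def twfe_did(panel):
    """OLS slope of demeaned outcome on demeaned treatment (balanced panel only)."""
    y = two_way_demean(panel, 2)
    x = two_way_demean(panel, 3)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Synthetic panel: rows are (county, year, log_trials, force). All values invented.
adoption = {"A": 1841, "B": 1850, "C": 1858}
county_effect = {"A": 5.0, "B": 5.5, "C": 4.8}
true_beta = -0.2
panel = []
for county, first_year in adoption.items():
    for year in range(1832, 1866):
        force = 1.0 if year >= first_year else 0.0
        log_trials = county_effect[county] + 0.01 * (year - 1832) + true_beta * force
        panel.append((county, year, log_trials, force))

beta_hat = twfe_did(panel)
```

Because the synthetic outcome is exactly additive in county effects, year effects and the treatment, the estimate reproduces the invented effect; with real data the same transformation is what a county and year fixed effects regression performs on a balanced panel.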

The Effect of County Police Force Formation on Crime: Results and Robustness
The results in Table 9 show the effect of creating any police force, regardless of its quality.
Yet, some forces may have been in name only or thought to be insufficiently large by the inspectors. The lack of an overall effect of force formation on crime could be masking differential effects of forces of varying quality. We use the relative size of the force, i.e. the number of people per policeman, to assess whether the effect on crime depends on whether the force is sufficiently large. In studying that question in an expanded specification, we must rely on the added assumption that 'sufficiently large' is conditionally random. Though the 1839 Act recommended having 1,000 people per policeman, few (if any) forces initially achieved this standard. Some initial evidence regarding the determinants (or lack thereof) of force type can be seen in Table 8. Simply put, it is not just early reformers (maybe particularly motivated counties) that had a sufficiently large force (using a 1,500 threshold); rather, similar proportions of early (20%), mid (33%) and late (17%) reformers were sufficiently large at formation. We return to the determinants of the relative size of the new forces in the next section.

[...] (1) to 2,500 in column (5). Under the strictest and weakest thresholds, there are 10 and 30 sufficiently large forces, respectively. There is a differential impact of force size: Column (1) shows that creating a sufficiently large force with fewer than 1,500 people per policeman decreases the overall number of crimes by approximately 19%; this effect is seen for both violent and property crime (18% and 14%, respectively) but not for other offenses. In contrast, creating an insufficiently large force does not significantly affect crime overall; instead, it (insignificantly) increases the number of property and other crimes. It is the positive effect of such forces on the largest crime category of property offenses that is masking the crime-reducing effect of creating a sufficiently large force in the baseline regressions.
While the crime reducing effect of a sufficiently large force gets smaller as we relax the sufficiency threshold in columns (2) to (5), we still see an overall reduction in crime.
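The split of the single Force treatment into 'sufficiently large' and 'insufficiently large' dummies can be sketched as follows. The function names and the example county figures are hypothetical; the only elements taken from the text are the people-per-officer ratio measured at formation and the strict "fewer than the threshold" rule.

```python
def people_per_officer(population, officers):
    """Relative force size: number of people per police officer."""
    return population / officers


def split_treatment(force, population_at_formation, officers_at_formation,
                    threshold=1500):
    """Return (force_sufficient, force_insufficient) dummies for a county-year.

    `force` is the baseline treatment dummy (1 once a force exists). A force is
    'sufficiently large' if it had fewer than `threshold` people per officer
    at formation; both dummies are 0 before a force exists.
    """
    ratio = people_per_officer(population_at_formation, officers_at_formation)
    sufficient = ratio < threshold
    return (int(bool(force) and sufficient), int(bool(force) and not sufficient))


# Hypothetical county of 150,000 people at formation.
large_force = split_treatment(1, 150_000, 120)   # 1,250 people per officer
small_force = split_treatment(1, 150_000, 60)    # 2,500 people per officer
before_force = split_treatment(0, 150_000, 120)  # no force yet
```

Relaxing `threshold` toward 2,500 reclassifies more forces as sufficiently large, which is the variation the columns of the table explore.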
To study the dynamic effects of force creation, we estimate event-study specifications where we interact our treatments (creation of sufficiently and insufficiently large forces) with dummies for two-year intervals leading up to and following the reform. The omitted category is the two years immediately prior to the first fully treated (fiscal) year. The results are shown in Figure 12 for all crime categories combined, and for each offense category separately in Appendix Figure A6. The top and bottom panels of Figure 12 present the estimates for the sufficiently and insufficiently large forces, respectively; note that both come from the same regression. The following conclusions can be drawn. First, the negative effect of sufficiently large police force formation on crime is generally not immediate but starts around three years after the reform. Second, the crime-reducing effects of sufficiently large forces become larger in magnitude over time. Third, for forces that were insufficient in size upon creation, no negative effect on crime is seen in any of the eight years after the force is created. These event study specifications also provide tests of the plausibility of the parallel trends assumption and the 'random' timing of force creation: there are no significant differences in crime rates in the years leading up to the reform for either sufficiently or insufficiently large forces (neither overall nor by crime category). Additional robustness and identification tests are presented in the next section.
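The two-year event-time bins used in such a specification can be sketched as a small helper. The binning follows the description in the text and the Figure 12 notes (year 0 is the first fully treated fiscal year, the two years before it are the omitted category, and the endpoints pool nine or more years before and eight or more years after); the function name and label strings are hypothetical.

```python
def event_bin(year, first_full_year):
    """Label the two-year event-time interval that a calendar year falls into.

    Year 0 is the first fully treated fiscal year. Years -2 and -1 form the
    omitted (reference) category; the tails are pooled into open-ended bins.
    """
    rel = year - first_full_year
    if rel <= -9:
        return "<=-9"
    if rel >= 8:
        return ">=8"
    lower = (rel // 2) * 2  # floor to the even start of the interval
    if lower == -2:
        return "reference"
    return f"{lower}..{lower + 1}"
```

Each non-reference label would be interacted with the sufficiently and insufficiently large force dummies to trace out the dynamic effects.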

Sensitivity and Identification Tests for County Police Analysis
Appendix Table A8 presents a sensitivity analysis of our main finding that only the creation of a sufficiently large force visibly reduces crime (using the 1,500 threshold). The results are robust to (i) controlling for county population, England and inspection region dummies, and inspector specific and large county (above median acreage) specific time trends, (ii) reducing the sample period by three years on both sides of the window, (iii) breaking the sample into two periods: 1832 to 1849 (identified off early reformers) and 1850 to 1865 (identified off late reformers), and (iv) restricting the sample to the 36 English counties (excluding Wales).
We next turn to tests of the identifying assumptions of randomness in (i) the timing of force formation and (ii) the relative size of the created force. One potential question is whether forces were adopted as a result of crime being displaced from neighboring counties that had previously adopted a force. We first look at this directly in the context of the 1829 introduction of the London Met: Did this increase crime in the surrounding counties, and trigger force adoption? Though the patterns in Figure 9 suggest otherwise, Appendix Table A9 assesses [...]

Appendix Table A10 looks more broadly, i.e. for all counties, at the determinants of being an early reformer (adopted by 1840) in columns (1)-(6) and the year of adoption for all counties in columns (7)-(13). For the latter, the dependent variable is a dummy equal to one in the year a county creates a force and zero in prior years; counties exit the sample once a force is created, as there is no longer a choice to be made. 49 We consider the role of: the county's own lagged charge rate, whether the neighboring counties had forces (and their relative sizes), population, the lagged charge rate in neighboring counties, and the number of initial police officers in the county area who are not part of the county force (measured before the formation of the county force and including previously existing borough police). We find little evidence that these variables predict the timing of adoption. The most consistently significant variable is population: larger counties were significantly more likely to adopt early, but (i) the point estimates are close to zero, (ii) county fixed effects control for larger versus smaller counties, and (iii) population does not predict timing for later adopters. Having an insufficiently large neighboring force decreases the chance of adoption in a given year, raising the question of whether creating a force has spill-over effects on nearby counties, which we address shortly.
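The construction of the year-of-adoption sample, in which a county contributes one row per year until it creates a force and then drops out, can be sketched as below. The function name, the example county and its dates are hypothetical.

```python
def adoption_rows(county, adoption_year, first_sample_year):
    """Build (county, year, adopted) rows for a year-of-adoption regression.

    The dummy is 0 in every year before adoption and 1 in the adoption year;
    later years are dropped because no adoption choice remains.
    """
    return [(county, year, int(year == adoption_year))
            for year in range(first_sample_year, adoption_year + 1)]


# Hypothetical county observed from 1840 that creates a force in 1843.
rows = adoption_rows("X", 1843, 1840)
```

Stacking these rows across counties gives a sample in which each county's final observation is its adoption year.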
Appendix [...] through which a relatively large neighboring force can decrease local crime. One possibility is cooperation between forces, which to some extent would reflect 'true' spillovers in policing.
This was even a criterion that the inspectors evaluated; to the extent that a large force signals high quality, it could also be correlated with a high degree of cooperation. Another possible channel is that a large enough neighboring force decreases crime by incapacitating criminals who would have committed crimes in multiple counties. With respect to the increase in property crime due to a relatively small neighboring force, we can put forward one main channel: this small neighboring force does not have the capacity (or cooperative nature) to police criminals who commit their crimes elsewhere. That is, offenders may steal in one county but hide from the authorities in a neighboring (poorly policed) county. Anecdotal evidence of this is provided in the 1839 Police Commissioner's Report (page 280, paragraph 208): "The state of insecurity produced in guarded towns by the unprotected state of surrounding districts is not confined to the facilities of escape furnished to delinquents for crimes committed within the towns, but the subsistence given in the unprotected districts to the predatory classes who harbor in the towns increases the expense of guarding against them." 51

Discussion of County Police Force Formation Results
To summarize, the above analysis of the roll-out of professional county forces has five key findings. First, the creation of 'sufficiently large' county forces reduces trials overall and for both violent and property crime. Second, the formation of 'insufficiently large' forces does not have an observable crime (trial) reducing effect. Third, the effect of creating a sufficient force is not immediate and increases over time. Fourth, the introduction of the London Met did not displace crime to neighboring counties. Fifth, there are spill-over effects of neighboring county forces, with an insufficiently (sufficiently) large neighbor increasing (decreasing) 'local' crime.
51 https://babel.hathitrust.org/cgi/pt?id=uc1.a0000872473;view=1up;seq=7, last accessed April 28, 2019.
What do these findings tell us about why the creation of a county force decreases crime?
On the one hand, there are two main channels through which crime can be reduced: deterrence and incapacitation. On the other hand, creating a police force might increase measures of 'crime' through increased reporting of crime incidents and apprehensions. The net negative effect for sufficiently large forces suggests that deterrence and incapacitation outweigh reporting and apprehension channels. However, while (anecdotally) the aim of the new forces was deterrence, we cannot empirically disentangle it from incapacitation (the same is actually true in our London analysis). Finally, the increase in the size of the crime-reducing effect over time highlights the importance of quality: Force 'quality' clearly improved over time as people per officer ratios decreased, supervisors were increasingly hired, and experience was gained.
Linked to the notion of quality, what can we conclude about the impact of creating a relatively small police force? While there is no negative net effect on the number of charges brought to trial, we cannot rule out the possibility of deterrence and/or incapacitation. We simply cannot disentangle whether there is a null effect because a force had no effect at all or because the positive and negative channels offset each other.

Conclusion
This paper exploits two natural experiments in history, the introduction of the London Metropolitan Police in 1829 and the subsequent roll-out of county police forces throughout England and Wales, to estimate the effect on crime of the introduction of professional forces, which were for the first time explicitly tasked with deterrence. In London, we find evidence consistent with a crime-reducing effect, especially for violent crimes (including robbery); for property crimes, we find clear evidence of a reduction in uncleared crimes but also an increase in cleared crimes. Our county analysis finds that introducing 'sufficiently large' police forces reduced crime overall and across crime categories, while relatively small forces did not have a visible net crime-reducing effect.

[...] symmetric upon creation and collapse of a force. Nevertheless, given the lack of research to date on this margin, we believe our study fills a significant gap in the literature.
Finally, police forces in less developed countries today are being disbanded and new forces created in an effort to eliminate police corruption. 56 Our results may have important policy implications today with respect to institution building in these countries; one potential 'lesson' is that the quality of the institution plays a fundamental role.

Figure 4. Geocoded Data from the Old Bailey Proceedings
NOTES - The figure plots geocoded crime locations of murders, manslaughters, robberies and burglaries tried at the Old Bailey between 1821 and 1837. The two red circles mark a 7- and 15-mile radius from Charing Cross, respectively. Each dot represents a trial-defendant observation; the green dots represent crime locations inside the City of London (within the 7-mile radius) as well as outside the 7-mile radius, and the blue dots represent crime locations within the 7-mile radius and not in the City of London. The borders represent modern day postcode areas; the respective shapefiles were obtained from Maproom's UK Postcodes Shapefiles and contain OS, Royal Mail and National Statistics data.

Panel B. Change in Type of Police Witnesses at Trial (Share of Trials)
Panel C. Change in Type of Police Witnesses at Trial (Number of Trials)
NOTES - Panel A shows the annual share of homicide, robbery, and burglary trials at the Old Bailey from 1821 to 1837 with at least one police officer present as a witness. The black solid line represents trials for crimes located in the treatment group (within 7 miles from Charing Cross), the grey dashed line trials for crimes located in the control group (more than 7 miles from Charing Cross or in the City of London). Panel B (C) shows the annual share (number) of trials that, among the first five witnesses present at the trial, had at least one of either the new type (black) or the old type (

Panel B. Average Log Charges for Early, Mid and Late Reforming Counties
NOTES - Panel A shows the annual number of charges brought to trial in England and Wales, overall and by crime type and for all counties included in the analysis sample, i.e. excluding Middlesex, York, Suffolk, and Sussex. The red vertical line marks 1857, the year when the creation of a county police force became mandatory. Panel B shows the annual average log charges separately for early, mid and late reformers, again excluding the counties of Middlesex, York, Suffolk, and Sussex. The red vertical lines correspond to the earliest and latest years of reform implementation (1841 and 1858). The figures are based on data from the Judicial Statistics; see Sections 4.1 and 4.2 for details.

Panel B: Insufficiently Large Police Forces, Log-Level Specification, All Charges
NOTES - The above event-study figures are based on log-level regressions of offenses on sufficiently large (ratio<1,500) and insufficiently large (ratio>1,500) force dummies that are interacted with two-year intervals. All years eight or more years after police force formation and nine or more years before police force formation are combined, respectively. The omitted category is the period 1-2 years before the police force is created, where the first year (0) is defined as the first full fiscal year following the creation of a police force. The above figures show the estimated coefficients and 95% confidence intervals for the baseline specification with county and year fixed effects. The dots/lines correspond to the point estimates and 95% confidence intervals. The vertical line represents the two years before the police force is created (the omitted category). The dashed horizontal line represents the (average) diff-in-diff estimate.

NOTES - The table shows regression results for the first stage outcomes (dummy variables for any police witness at the trial, any "new" police witness, any "old" police witness, and whether any police was at the crime scene). Panel A shows pre-post specifications that include offense fixed effects. The regressions are based on data from the Old Bailey Proceedings Online and own transcriptions/calculations; the sample includes trials for robbery, burglary and homicide. See Section 3.2 for details. Robust standard errors are shown in parentheses below the coefficient. * p<0.1, ** p<0.05, *** p<0.01.

NOTES - The table shows the average number of monthly trials for crimes that took place before and after the introduction of the Metropolitan Police (and their difference), for all as well as each offense separately, as well as by area (separately and all areas together). The treated area includes trials for crimes located within 4 miles from Charing Cross, the uncertain area those located between 4 and 7 miles from Charing Cross, the control area those located more than 7 miles from Charing Cross and City of London those located in the City of London. Panel A shows the results for 1821-1837, Panel B for 1828-1832. The numbers are based on data from the Old Bailey Proceedings Online and own transcriptions/calculations; the sample includes trials for robbery, burglary and homicide. See the text for details. Statistical significance of the difference is based on corresponding before-after regressions. * p<0.1, ** p<0.05, *** p<0.01.

[...] (3), the treated area includes crimes located within 4 miles from Charing Cross, the uncertain area those located between 4 and 7 miles from Charing Cross, the control area those located more than 7 miles from Charing Cross and City of London those located in the City of London. In columns (4) to (5), the City of London is alternatively assigned to the treatment group after establishing their own police (1832), in columns (6) to (7) to the uncertainty group and in columns (8) to (9) the City of London is excluded from the analysis. Regressions are based on manually geocoded data from the Old Bailey Proceedings Online and own transcriptions/calculations; see the text for details. Robust standard errors are shown in parentheses below the coefficient. * p<0.1, ** p<0.05, *** p<0.01.

[...] Table 3 with alternative aggregation levels. In columns (1) to (3), the dependent variable is a dummy variable indicating whether there is any crime in a given week and area. In columns (4) to (6), the dependent variable is a dummy variable indicating whether there is any crime in a given month and distance band from Charing Cross. Distance bands are circles around Charing Cross: less than 1 mile, 1-2 miles, 2-3 miles, … , 13-14 miles and more than 14 miles. See Table 3 for further details on specification and data. Robust standard errors are shown in parentheses below the coefficient. * p<0.1, ** p<0.05, *** p<0.01.

NOTES - The table shows summary statistics for the analysis sample based on the daily crime reports described in more detail in Section 3.1. The first three columns show the number of observations, the mean and standard deviations for the different crime measures for the complete sample, the remaining columns separately for 1828 (one year pre-reform), 1830 (one year post-reform) and the years 1830-1832 (three years post-reform). The data was manually transcribed from the Report or Account of the Proceedings of the several Police Offices sourced from the National Archives (MEPO 4/12, 4/13, 4/15 and 4/17).

[...] (2) but allowing for separate coefficients by time after the introduction of the Met (note that the second wave of hiring, mainly in the outer divisions, occurred in February 1830). For a description of the underlying data, see Section 3.1. The dependent variable in columns (1) to (3) is the total number of any incidents (charges, informations, property stolen incidents), in columns (4) to (6) the number of charges, in columns (7) to (9) a dummy variable indicating whether there are any informations, and in column (10) a dummy variable indicating whether there are any stolen property. The top of each column indicates the years included in the sample and the crime category. The p-value corresponds to the test of equality of all four shown coefficients. Robust standard errors are shown in parentheses below the estimated coefficients. *** p<0.01, ** p<0.05, * p<0.1

[...] (3)), where the variable of interest Force is equal to one for a county c in any year t after which a county police force has been created. The year of police force formation is defined as the first year with a police force for any of the year in columns (1) and (3) and a police force for all of the year in columns (2) and (4). All specifications include county and year fixed effects. The baseline sample includes 48 counties for the years 1832-1865. Standard errors are clustered by county and shown in brackets below the coefficient. *** p<0.01, ** p<0.05, * p<0.1

[...] Table 9), where the variables of interest, Force Sufficiently Large and Force Insufficiently Large, are equal to one for a county c in any year t after which a sufficiently large or insufficiently large police force has been created. Sufficiency is defined according to the number of people per officer, and varies as indicated at the top of each column. The year of police force creation is defined as the first year with a police force for all of the fiscal year. All specifications include county and year fixed effects. The baseline sample includes 48 counties for the years 1832-1865. Standard errors are clustered by county and shown in brackets below the coefficient. *** p<0.01, ** p<0.05, * p<0.1

[...] regression results when estimating the effects of having a police force (at all or one that is sufficiently or insufficiently large) in a neighboring county. A sufficiently large force (whether it is a county's own or a neighbor's police force) is defined as a police force with fewer than 1,500 people per officer. Middlesex, though excluded from the analysis sample, is classified as a sufficiently large neighbor for those sharing a border after 1829. The year of police force formation is defined as the first year with a police force for all of the year. All specifications include county and year fixed effects. The baseline sample includes 48 counties for the years 1832-1865. Standard errors are clustered by county and shown in brackets below the coefficient. *** p<0.01, ** p<0.05, * p<0.1

NOTES - The table shows sensitivity analyses of the difference-in-differences estimation shown in columns (1) to (3) of Table 3 (see notes in that table for details on the baseline specification). The estimation windows are shown at the top of each column. Columns (1) to (2) add an area-specific annual trend; columns (3) and (4) exclude locations that were identified as "long streets" only (and potentially misclassified as treated); columns (5) and (6) exclude locations for which we had to refer to historical maps; columns (7) and (8) exclude observations for which the date of the actual crime is missing in the data and proxied by the session start date instead in the baseline estimation. Robust standard errors are shown in parentheses below the coefficient. *** p<0.01, ** p<0.05, * p<0.1

Appendix Table A4. Alternative Standard Errors (Old Bailey Data)
[...] Table 6. Columns (1) to (8) drop one office at a time from the regression sample; the excluded office is indicated at the top of each column. Columns (9) to (11) present the results when the data is aggregated at the weekly instead of the daily level for all weeks, complete weeks only and for all weeks but using the log instead of the level number of charges. Robust standard errors are shown in parentheses below the coefficient. *** p<0.01, ** p<0.05, * p<0.1

[...] Table 10), where the variables of interest Force Sufficiently and Insufficiently Large are equal to one for a county c in any year t after which a sufficiently large or insufficiently large force has been created, using a threshold of fewer than 1,500 people per officer. The year of force creation is defined as the first year with a force for all of the year. All specifications include county and year fixed effects. The baseline sample includes 48 counties for the years 1832-1865. The different specifications are indicated at the top and the bottom of the table, respectively. Standard errors are clustered by county and shown in brackets below the estimated coefficients. *** p<0.01, ** p<0.05, * p<0.

[...] (1) to (6) is a dummy variable indicating whether a county adopted a force in 1840 (i.e. an early adopter); the explanatory variables are lagged measures of crime and dummy variables for whether the neighboring county already had a police force (which in the case of early adoption implies being a neighboring county to Middlesex). The dependent variable in columns (7) to (13) is a dummy variable for all counties that is equal to zero until the year of police force formation and one in the year of police force formation. Standard errors (clustered by county in columns (7) to (13)) are shown in brackets below the estimated coefficient. *** p<0.01, ** p<0.05, * p<0.1