The diaspora model for human migration

Abstract
Migration’s impact spans various social dimensions, including demography, sustainability, politics, economy, and gender disparities. Yet the decision-making process behind migrants’ choice of destination remains elusive. Existing models primarily rely on population size and travel distance to explain the spatial patterns of migration flows, overlooking significant population heterogeneities. Paradoxically, migrants often travel long distances and to smaller destinations if their diaspora is present in those locations. To address this gap, we propose the diaspora model of migration, which incorporates intensity (the number of people moving to a country) and assortativity (the destination within the country). Our model considers only the existing diaspora sizes in the destination country, which determine the probability of migrants selecting a specific residence. Despite its simplicity, our model accurately reproduces the observed stable flow and distribution of migration in Austria (postal code level) and US metropolitan areas, yielding precise estimates of migrant inflow at various geographic scales. Given the increase in international migration, this study sheds light on migration flow heterogeneities, helping design more inclusive, integrated cities.


Introduction
Births, deaths and migration are the most relevant demographic components of population change, but migration is the most difficult to quantify, model and forecast [1]. Mass movements of people change the spatial distribution of the population and explain why some places grow faster than others [2,3]. The size of cities and hierarchy are heavily affected by migration [4,5,6,7,8]. Today, international migrants would form the fourth largest country in the world, and approximately 1% of the World's GDP is sent as international remittances [9,10]. Migration is a selective process that tends to attract young and highly skilled people into large cities, increasing the burden of human capital flight but easing economic disparities across borders [11,12,13]. It is a core strategy for coping with unemployment, violence, or disasters [14,15,16,17,18]. Migration eases the pressure of an ageing population and alters the gender imbalance [19]. Accurately predicting the number of individuals who will relocate and their precise destinations holds significant [...]
Various reasons explain why people with different backgrounds are attracted to distinct places. For example, an expensive neighbourhood receives wealthier people, students move to university towns, startups attract engineers, or cities such as Los Angeles attract artists. While numerous reasons could explain why an individual migrates to a new destination, we observe surprising regularities and predictability in migration flows. One explanation is related to the flow of information. Individuals are more attracted to places where they have more information, mainly acquired from their social networks. In this case, early migrants reduce uncertainty and provide adequate information for late arrivals, creating a self-reinforcing mechanism [45,21,46,47,48]. Most people move to places where they have pre-existing ties [49]. This process relates to one of the most fundamental forces of our social life, namely homophily, the tendency to interact with similar others [50]. Group identity based on race and ethnicity constructs leads to homophily and affinity between people [51]. Therefore, homophily can directly or indirectly affect people's decision to migrate to a specific neighbourhood. A prime example of such affinity is migration and co-location due to strong same-race and same-ethnic dating or marriage preferences [52,53].
This study examines the influence of homophilic preferences on international migration. We use the term diaspora to refer to a group of people from one nationality living elsewhere. Consequently, we distinguish individuals not solely by their geographic location but, more significantly, by their country of origin. Thus, diaspora encompasses similarities in race, ethnicity, and more. We will see that the diaspora is a much more accurate basis for modelling and predicting future migration than factors like distance or a city's population. Many social processes tend to be highly homophilic, and migration is no exception.
Here, we construct a novel migration model based on the pull impact of the diaspora. Instead of looking at population size, travel distance, or points of interest, our model uses only the diaspora size. We analyse two migration scenarios. First, we look at population registers in Austria and explore arrivals to the country from other parts of the world (SM A). Second, we use the international arrivals to metropolitan areas in the US to show the pull mechanism of diasporas (SM D). We show that migration is a highly homophilic process. Contrary to the principles of the gravity model, migrants travel long distances and go to small cities if there is a sizeable diaspora in the destination. We estimate that diasporas have a pulling impact, where 10,000 individuals will attract roughly 1,204 new arrivals yearly in the case of Austria. We show that diaspora size can accurately explain migration even at the neighbourhood level. The diaspora model is more precise than the gravity model, holding for both international arrivals at the postal code level in Austria and the metropolitan area level in the USA.

Diaspora migration model
Although there are many reasons why a migrant from some country chooses to move to some destination, similar reasons applied in the past to previous migrants from that country. Instead of observing and modelling the reasons, the principle we apply here is to look at how many people were already attracted to some destination and use it as a proxy to forecast future migrants (Figure 1). For a given destination, we model migration with a hierarchical model, where the first component captures how many people will arrive, and the second captures where they will move to [54]. The first component is the intensity of the migration flow [55,56,6]. The intensity of arrivals from country $i$ is modelled with a homogeneous rate $\lambda_i$. Thus, the hosting country expects $\lambda_i$ migrants from $i$ daily. The second component is the assortativity. Once a person decides to move, they choose a destination $j$ among $\nu$ options in the hosting country (for example, metropolitan areas or neighbourhoods) with a Multinomial distribution. The probability that a person arriving from $i$ goes to $j$ is $\pi_{ij}$, with $\sum_j \pi_{ij} = 1$. Combining the two components, intensity and assortativity, the arrivals to destination $j$ have a homogeneous rate $\lambda_i \pi_{ij}$. In essence, we model first how many people will move to a country and then how they choose a specific destination (SM B). We show that arrivals to a country can effectively be modelled by a constant rate, which depends on the country of origin (see the Methods).
We capture the degree to which migration is a homophilic process by looking at the size of the diaspora (Figure 1). We assume that $\lambda_i$ can be expressed as $\rho R_i$, where $R_i$ is the total diaspora from country $i$, so the arrival rate to the country depends only on the size of the diaspora and a pull rate $\rho$ that applies equally to all countries. Then, we assume that the assortativity can be expressed as $\pi_{ij} \propto R_{ij}$, where $R_{ij}$ is the diaspora from country $i$ in destination $j$ (so $R_i = \sum_j R_{ij}$). Combining both assumptions, arrivals from country $i$ to location $j$ have a rate $\rho R_{ij}$. Consequently, the expected number of migrants $M_{ij}(t)$ from $i$ to destination $j$ over $t$ days is

$$\mathbb{E}[M_{ij}(t)] = \rho R_{ij} t,$$

reflecting that arrivals depend on the size of the diaspora (details in the Methods). Ignoring other demographic processes (births and deaths), the diaspora model results in the conservation of assortativity, meaning that the way migrants have been distributed in the past among neighbourhoods will be the observed pattern in the future. Thus, within the time window considered, assortativity is stable (SM A).
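The relation described above (a daily rate of ρ times the local diaspora, accumulated over t days) can be illustrated with a minimal sketch. The destination names and diaspora sizes below are hypothetical and chosen only for illustration; the pull rate is the value reported for Austria.

```python
# Hypothetical diaspora sizes R_ij for one origin country i across three
# destinations j (illustrative numbers, not taken from the data).
diaspora = {"postcode_a": 5000, "postcode_b": 1200, "postcode_c": 300}

RHO = 3.29e-4  # daily pull rate reported for Austria
DAYS = 200     # length of the observation window


def expected_arrivals(diaspora_by_destination, rho, days):
    """Expected arrivals rho * R_ij * t for every destination j."""
    return {j: rho * r_ij * days for j, r_ij in diaspora_by_destination.items()}


expected = expected_arrivals(diaspora, RHO, DAYS)

# Adding the destination-level estimates recovers the country-level
# estimate rho * R_i * t, because R_i is the sum of the R_ij.
total_expected = sum(expected.values())
country_level = RHO * sum(diaspora.values()) * DAYS
```

Because the estimate is linear in the diaspora size, destination-level values can be summed to any coarser geographic unit without changing the total.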

Austrian migration dynamics
Migration data is often scarce and requires long periods of observation to distinguish between migration and other types of mobility [57,58]. Here, we use individualised data, which captures the primary residence of all foreign-born individuals in the country. Population registers capture all address changes and have become the primary source of migration data [6,59,60]. Population registers covering all arrivals to Austria before a fixed date (December 2022) and during the following 200 days (labelled as "arrivals") are used to test the model. The data contains information regarding 1.46 million foreign-born individuals living in Austria, and it is used to determine the size of the diaspora of all countries at the postal code level, $R_{ij}$ (SM A).
Arrivals to Austria are used to quantify the intensity and assortativity of the migration flow. In total, 111,244 individuals arrived in the country during the 200 observed days. The daily pull rate is around $\rho_{\mathrm{Aus}} = 3.29 \times 10^{-4}$ per person (see the Methods). Hence, we expect one arrival per day for every $1/\rho_{\mathrm{Aus}}$ ≈ 3,031 people, and the same applies to any diaspora and any destination considered (Figure 2). The diaspora model estimates the number of arrivals from any country to any destination at granular levels (data for each destination is shown by a disc in Figure 2 B, C). It gives consistent estimates for geographic units (neighbourhoods) that can be aggregated to larger areas (such as cities or provinces). To assess the predictive power of the diaspora model, we compare it with the gravity model for the top nationalities arriving in Austria (Figure 3). The mean square error of the gravity model is 2.85 times bigger than that of the diaspora model. The gravity model is particularly weak at differentiating nationalities and at small geographical areas, and it does not offer a clear metric for predicting arrivals (SM C).
Migrants are more likely to move to destinations with a significant diaspora, not necessarily places with a large population. However, places with a considerable diaspora also tend to have a large population, so the gravity model works relatively well in those limited cases. Nevertheless, the gravity model fails to capture details at small geographical scales. We compare the results of our model with a gravity model at the neighbourhood level. Vienna is divided into 23 districts (or "Bezirke"). They are numbered "outwards", so the first district is the city centre, and the 20th-23rd districts are suburbs. The districts tend to be highly heterogeneous regarding demographic and income compositions. Vienna's 10th district is known for being highly multicultural, attracting nearly 8.3 times more people from Serbia than from Germany. In contrast, the 7th district (known for being the trendy shopping district) attracted 1.4 times more people from Germany than from Serbia (Figure 4). There are many reasons why Germans are more likely to move to one neighbourhood in Vienna and Serbians to a different one, but similar reasons have applied to previous migrants.
Similar patterns are observed for other countries of origin and city neighbourhoods. People from South America, for example, are four times more likely to move to Miami than to Houston. However, the opposite applies to people from Central America, who move more frequently to Houston instead (Figure 5). This phenomenon has persisted for years, although both metropolitan areas have roughly the same population and are at a similar distance to both origins.
Migrants do not necessarily choose large or closer cities as their destination. For example, people from Africa are more likely to move to Washington, DC than they are to move to New York City, even though New York City's population is three times larger. On the contrary, people from Europe are two times more likely to move to New York City than to Washington, DC. The diaspora model can distinguish previous migrant populations and their assortativity, which allows us to estimate the inflow of migrants with high precision.
We test the diaspora model and compare it with the gravity model. For example, the diaspora model predicts 91,694 migrants to the Miami metropolitan area from all regions considered, whereas the gravity model predicts only around 39,492 migrants (SM D). We observed around 92,500 migrants. This underestimation from the gravity model is persistent in large metropolitan areas such as Washington, New York, Houston and San Francisco, where there is already a significant migrant population acting as a pulling force.

Discussion
Understanding population dynamics and the impact of the arriving migrants into a country is crucial to planning the provision of services and integrating people into the hosting society. Due to conflict, disasters, and a demographic expansion in the countries of origin, combined with the decline in the birth rates in other parts, the share of international migrants will keep increasing in the upcoming decades [9]. Thus, accurately modelling migration flows will enhance how migration is managed across countries.
We distinguish two components of migration: intensity, which captures the number of migrants, and assortativity, which captures their destination once they have decided to move to a new country. For some country of origin, the intensity is modelled as a homogeneous process, meaning that a similar number of arrivals is expected daily. Although there are minor fluctuations (fewer arrivals during the weekend, for example), and there may be some seasonality and other fluctuations, the overall perspective is that the number of arrivals is roughly stable. We show that the size of the diaspora in the country can approximate the intensity rate. Then, the assortativity captures how destinations with big diasporas attract more people from the same country of origin. Our model uses the diaspora's size as the only input and explains migration at small geographical units, such as neighbourhoods. It is impossible to know why many people have moved to a place, but those reasons persist over time and may apply to others. The principle is that most reasons for moving to a new neighbourhood remain and keep attracting similar people. Adding the arrivals at the neighbourhood level gives the estimate at the city level, which also may be combined to obtain an estimate at the state or country level. Thus, diaspora size can be a unique factor in predicting where migrants will move.

Our model does not predict migration shocks resulting, for example, from a crisis, as they fall outside the scope of its predictive abilities. However, dividing migration into intensity and assortativity enables modelling migration shocks by altering the intensity of arrivals. In the case of a shock elsewhere, the migration intensity will shift to an unknown value, but the assortativity will still explain where most people will be inclined to move. Nevertheless, when the arrival rate of migrants going through a crisis attains stable features (after a certain period, as for example in Syria and Ukraine), we can alter the intensity of arrivals and predict their expected assortativity.
People arriving from different countries go to specific neighbourhoods, so segregation is one of the unintended consequences of this process. Migrants do not necessarily seek to be surrounded by their diaspora, but frequently they are. The diaspora model of migration highlights one of the biggest challenges of migration. Minorities concentrate, fostering fewer interactions with others. This process has important implications, as integrating foreigners into national life is complicated when migrants form segregated communities. Our model helps illuminate the mechanisms by which migrants choose their destination and can guide policies for designing more inclusive and integrated societies. Governments and international organisations must support local authorities and implement strategies to improve migrant inclusion in urban areas [61].

Data
Arrivals and diaspora size for Austria were provided by the Federal Ministry of Interior of Austria (Bundesministerium für Inneres, "BMI"). The data includes all individuals in Austria who register their residence through the mandatory registration form called the "Meldezettel". We also have data on asylum seekers of all stages, whether seeking asylum, approved or rejected, and displaced migrants, for example, due to the Russian invasion of Ukraine. Our data does not cover short-term visitors, for example, tourists, who are not obligated to register. In addition, we cannot quantify or detect undocumented migrants. Thus, they are not included in our analysis.
Data for the US was obtained from the American Community Survey (Census data) [62]. The data contains the resident population and an estimate of the number of arrivals to each metropolitan area (391 areas in total). The data includes the person's residence the year before the survey but not previous years, so repeat migration is not observed. International arrivals are grouped into eight categories: four in America (North America, Central America, South America and the Caribbean), plus Africa, Asia, Europe, and Oceania. Data is available in yearly intervals aggregated in five-year periods, from the 2009-2013 to the 2015-2019 surveys.

Constructing a diaspora model for the migration
We decompose the migration process with two hierarchical components: the migration intensity (related to the number of migrants from some country) and the assortativity (related to the destination). We use a two-step hierarchical model to consider the two steps separately. This method is frequently used in other domains, where a random variable is modelled by considering ordered steps [54].

Modelling the migration intensity
We start with the number of migrants and assume it follows a Poisson distribution such that

$$M_i(t) \sim \mathrm{Pois}(\lambda_i t),$$

where $M_i(t)$ is the flow of migrants for a period of $t$ days from the country of origin $i$, with $i = 1, 2, \ldots, \mu$, and with a daily rate $\lambda_i$. The Poisson distribution is frequently used to model a variable that results from a counting process, such as migrants [58]. The distribution depends on a single rate, and it is used for ignoring short-term fluctuations and looking only at the more general pattern. The expected number of migrants until day $t$ is $\lambda_i t$, an expression for the cumulative number of arrivals from country $i$.
First, we test if a uniform rate works during $n$ days for estimating the arrival rate from different countries. The error term for day $t$ is $e_i(t) = M_i(t) - \lambda_i t$. The sum of squared errors over the observed days is $f(\lambda_i) = \sum_{t=0}^{n} e_i(t)^2$. By setting $f'(\lambda_i) = 0$ we obtain that

$$\lambda_i^\star = \frac{\sum_{t=0}^{n} t\, M_i(t)}{\sum_{t=0}^{n} t^2}. \tag{3}$$

Since $f(\lambda_i)$ is a continuous function and $f''(\lambda_i) = n(n+1)(2n+1)/3 > 0$, the value $\lambda_i^\star$ minimises the error. For a sufficiently large number of days and arrival rate, a Normal approximation to the Poisson distribution may be used to obtain the corresponding confidence interval, whose width is set by a parameter $\theta$. For a lower rate (or for fewer days), a Monte Carlo method may also be used to obtain plausible departures.
For the top migrant countries to Austria, the estimated arrivals are approximated by the constant rate (Figure 6). The intensity of migration for country $i$ gives the daily arrival rate for that country, $\lambda_i$. For $t$ days, the expected number of arrivals is $\lambda_i t$, plotted as a dashed line for each country. For a sufficiently large rate and number of days, a Normal approximation gives a 99% confidence interval, plotted for each country as the shaded triangle (with $\theta = 4$). The observed number of arrivals falls within the shaded triangle, so we do not reject a constant arrival rate for those countries of origin.
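The least-squares step above can be sketched in a few lines. Setting the derivative of the sum of squared errors to zero gives the rate as the sum of $t\,M_i(t)$ divided by the sum of $t^2$. The function name and the input series below are hypothetical and serve only as an illustration.

```python
def estimate_daily_rate(cumulative_arrivals):
    """Least-squares rate for cumulative counts M(0), M(1), ..., M(n).

    Minimises sum_t (M(t) - lam * t)^2, which gives
    lam = sum_t t * M(t) / sum_t t^2.
    """
    numerator = sum(t * m for t, m in enumerate(cumulative_arrivals))
    denominator = sum(t * t for t in range(len(cumulative_arrivals)))
    if denominator == 0:
        raise ValueError("need at least two days of observations")
    return numerator / denominator


# Hypothetical cumulative arrivals growing by exactly 14 people per day
# over 200 days: the estimator recovers the constant rate.
cumulative = [14 * t for t in range(201)]
rate = estimate_daily_rate(cumulative)

# A short, irregular (hypothetical) series gives a compromise rate.
irregular_rate = estimate_daily_rate([0, 10, 30, 40])
```

Because later days carry larger weights $t$, the estimate is driven mostly by the long-run slope of the cumulative curve rather than by early fluctuations.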
There are some fluctuations in the daily number of arrivals. For example, very few people arrive during the weekends. However, a fixed arrival rate works well for modelling the daily arrival of migrants and enables us to ignore minor fluctuations. Equation 3 may be used to estimate the daily arrival rate for migrants from different countries. Although Equation 3 estimates the arrival rate, we then aim to approximate the rate based on the size of the country's diaspora.
The motivation is to model whether a bigger diaspora results in more arrivals (as observed in Figure 2). We take the data for all countries, considering that

$$M_i(t) \sim \mathrm{Pois}(\rho R_i t),$$

where $R_i$ is the size of the diaspora from country $i$, and $\rho > 0$ is a fixed pull rate for all countries of origin that depends only on the arrivals and existing diasporas at the destination. Thus, we assume that for a given country, the flow depends on the size of the diaspora $R_i$ and some fixed pull rate $\rho$. Following the same logic, we obtain the value of $\rho$ by minimising the errors $e_i = M_i - \rho R_i t$. The sum of the squared errors over the $\mu$ countries is $g(\rho) = \sum_{i=1}^{\mu} e_i^2$. By setting $g'(\rho) = 0$, we obtain that

$$\rho^\star = \frac{\sum_{i=1}^{\mu} M_i R_i}{t \sum_{i=1}^{\mu} R_i^2}, \tag{5}$$

an estimate for the pull rate that depends on the arrivals and diaspora of all countries. The second derivative of $g$ is $2 \sum_{i=1}^{\mu} R_i^2 t^2 > 0$, so the value of $\rho^\star$ minimises the sum of the squared differences. Equation 5 depends on the size of the diaspora and the arrivals over $\mu$ countries, unlike Equation 3, which depends on the daily arrivals for a single country.
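As an illustration of this second least-squares step, the sketch below computes the pull rate from total arrivals and diaspora sizes across countries. Minimising the squared errors $M_i - \rho R_i t$ gives the sum of $M_i R_i$ divided by $t$ times the sum of $R_i^2$. The function name and all input numbers are hypothetical.

```python
def estimate_pull_rate(arrivals, diaspora, days):
    """Least-squares pull rate: sum_i M_i R_i / (t * sum_i R_i^2)."""
    if len(arrivals) != len(diaspora):
        raise ValueError("arrivals and diaspora must have the same length")
    numerator = sum(m * r for m, r in zip(arrivals, diaspora))
    denominator = days * sum(r * r for r in diaspora)
    if denominator == 0:
        raise ValueError("need a positive number of days and a non-empty diaspora")
    return numerator / denominator


# Hypothetical diaspora sizes and arrivals for three origin countries,
# observed over 200 days.
diaspora_sizes = [40000, 10000, 2500]
observed_arrivals = [2800, 600, 200]
rho_hat = estimate_pull_rate(observed_arrivals, diaspora_sizes, 200)

# If arrivals are exactly proportional to the diaspora, the estimator
# returns that proportionality constant (here 0.001 per person per day).
rho_exact = estimate_pull_rate([10, 20], [1000, 2000], 10)
```

Note that countries with large diasporas weigh more heavily in this estimate, since each term is multiplied by $R_i$.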
In the case of Austria, we obtain that $\rho_{\mathrm{Aus}} = 3.29 \times 10^{-4}$. Then, we can use the diaspora size and the estimated pull rate to express the arrival rate for country $i$ as

$$\lambda_i = \rho R_i. \tag{6}$$

Equation 6 gives the arrival rate from country $i$ considering the diaspora size of that country and a pull rate $\rho$ that applies to all countries equally. This method overestimates arrivals from some countries (for example, Serbia and Turkey) and underestimates others (for example, Romania and Germany) and, in general, results in higher error than Equation 3 (SM C). However, it gives an alternative expression to estimate rates that does not depend on the data for the daily arrivals to that country. The obtained value of $\rho_{\mathrm{Aus}} = 3.29 \times 10^{-4}$ reflects that one person is expected to arrive daily for every 3,031 individuals from any diaspora. For example, in Austria, there are around 42,580 people from Poland, so 14 people are expected each day, or 2,800 people during the 200 days of observation. The observed number of arrivals from Poland during that period was 2,787 migrants (SM A).
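The Poland example can be reproduced as a short worked calculation. All numbers come from the text; only the variable names are ours.

```python
RHO_AUS = 3.29e-4         # daily pull rate reported for Austria
poland_diaspora = 42_580  # people from Poland living in Austria
observation_days = 200

daily_rate = RHO_AUS * poland_diaspora          # about 14 arrivals per day
expected_total = daily_rate * observation_days  # about 2,800 arrivals

observed_total = 2_787
relative_gap = (expected_total - observed_total) / observed_total
```

With these inputs the modelled total differs from the observed one by well under 1%.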

Modelling the migration assortativity
Once the number of arrivals is known, we model their conditional destination in that country [30,63,64]. We assume that once $M_i(t) = m$ persons arrive, they decide to reside in a particular location $j$ (for example, a city in the destination country) depending on the size of the diaspora in the location $j$. Thus, we assume that the probability of a person from country $i$ moving to location $j$ follows

$$\pi_{ij} = \frac{R_{ij}}{R_i},$$

where $R_{ij}$ is the diaspora from country $i$ in destination $j$. The diaspora is such that $R_i = \sum_j R_{ij}$ is the overall size of the diaspora from country $i$. For example, if location $j$ has 10% of the diaspora from $i$, we assume that the probability that a migrant moves to $j$ among the $\nu$ destinations is also 10%. Destinations with bigger diaspora attract more migrants. The process is modelled as a Multinomial distribution:

$$\left(M_{i1}(t), \ldots, M_{i\nu}(t)\right) \mid M_i(t) = m \;\sim\; \mathrm{Multinomial}(m, \vec{\pi}_i),$$

where $M_{ij}(t)$ are the arrivals from $i$ in location $j$ and $\vec{\pi}_i = (\pi_{i1}, \pi_{i2}, \ldots, \pi_{i\nu})$ is the vector with entries $\pi_{ij}$ corresponding to the probability of choosing $j$ as their destination, where $\sum_k \pi_{ik} = 1$. A Multinomial distribution, conditional on a Poisson distribution, also follows a Poisson distribution with combined rates of arrivals and success [58]. Therefore, arrivals from country $i$ to location $j$ follow $M_{ij}(t) \sim \mathrm{Pois}(\pi_{ij} \lambda_i t)$.
If the daily rate of arrivals is known, then Equation 3 gives an estimate of the arrivals at a granular level. If the diaspora size from other countries is known, then Equation 6 gives an estimate that depends only on the diaspora size from country $i$ in destination $j$. Combining both, arrivals from country $i$ to location $j$ follow

$$M_{ij}(t) \sim \mathrm{Pois}(\rho R_{ij} t).$$

Other migration models focus only on the assortativity of individuals, meaning that they only differentiate the likelihood of moving between destinations but ignore the number of people moving. However, migration intensity is more relevant than assortativity. Modelling how many people will arrive in a country should be the first explicit component of migration models. Arrivals can be modelled with a constant rate so that they can be predicted within a reasonable period. Further, the arrival rate is strongly linked to their diaspora size. We estimate how many people will move to a country if we observe how many people already live there. The second element, related to assortativity, has been captured by size, job offerings and others. Yet, diaspora size explains assortativity more accurately and with smaller geographical units than metropolitan areas. Because large cities tend to have a high diaspora share, the gravity model works relatively well to capture a general trend. However, the gravity model fails at smaller geographical units. According to the gravity model, two neighbourhoods of similar size are equally attractive to migrants, but this is never the case. Ethnic neighbourhoods, such as "Chinatowns", are part of the cultural landscape in most cities.

Inferring diaspora size and subsequent migrations
In the case of international migration to metropolitan areas in the US, census data estimates the yearly arrivals between 2009 and 2019 [62]. It is disaggregated for eight regions of origin (Africa, Asia, the Caribbean, Central America, Europe, Northern America, Oceania, and South America). It gives the number of arrivals to 391 metropolitan areas and the countryside. We use the number of arrivals in one year to estimate arrivals in the subsequent one. Let $A_{ij}(t)$ be the number of arrivals from region $i$ to metropolitan area $j$. We assume that the arrivals in that year result from some fixed pull rate $\lambda$ and an unknown diaspora size $D_{ij}$. Then, the expected number of arrivals for the next year follows a Poisson distribution with rate $\lambda D_{ij} = A_{ij}$ for some value of $\lambda > 0$. Thus, the expected number of arrivals in one year is the observed number during the previous year.
We compare the estimated number of arrivals $E_{ij}$ to the observed number $A_{ij}$ in 2019. At the regional level, for example, the model expects 131,478 arrivals from Africa, with an interval between 130,767 and 132,188 arrivals. The observed number of migrants from Africa in 2019 was 131,943. For other regions, the estimated inflow is within a 3.5% difference from the observed number of arrivals, except for the case of South America. The yearly number of migrants from South America to the US has nearly doubled in seven years. This increasing intensity in the annual number of migrants would be better captured by considering the observed trend (more details in SM D).
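The interval quoted for Africa can be checked with a short sketch based on a Normal approximation to the Poisson distribution. The confidence level is not stated in the text; the value z = 1.96 (a 95% level) is our assumption, chosen because it appears to reproduce the reported bounds.

```python
import math


def normal_interval(rate, z=1.96):
    """Normal approximation to a Poisson count: rate +/- z * sqrt(rate).

    The default z = 1.96 is an assumption, not a value stated in the text.
    """
    half_width = z * math.sqrt(rate)
    return rate - half_width, rate + half_width


expected_africa = 131_478  # expected arrivals from Africa quoted in the text
lower, upper = normal_interval(expected_africa)

observed_africa = 131_943  # observed arrivals from Africa in 2019
inside = lower <= observed_africa <= upper
```

Under this assumption the bounds truncate to 130,767 and 132,188, and the observed count falls inside the interval.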

A - Data description and observation
The data corresponding to arrivals to Austria is provided in the Complex Effects of Migration Patterns on Supply Capacities project. Access to the data is restricted for security and privacy reasons, and only authorised researchers can view the data. For each person, the data includes their country of origin and the neighbourhood in which they have a registered residence. The data gives the location of the diaspora for each country at the postal code level. Data before November 26, 2022 (referred to as "December" in the manuscript) does not identify each arrival date. It gives the age, gender, residential status, and country of origin of $R$ = 1,466,113 migrants in Austria. For 200 days, the majority of arrivals to the country were captured at the moment when a person registered their residence through a registration form known as the "Meldezettel" with the Bundesministerium für Inneres (Federal Ministry of the Interior) when applying for some form of residence permit in the country. Visits planned for shorter periods (tourism) do not require registering and are not counted. Legally, migrants in Austria are classified according to their residence status. For example, migrants who plan to stay in the country for less than six months are classified as foreigners, but if they stay longer, they are classified as settled migrants with a residence permit. On the other hand, refugees fall under a different classification depending on their stage, for example, seeking asylum, approved or rejected. As of 14 June 2023, refugees account for only 13.72% of our data (Table 1). We apply the same analysis to all classifications. The data contains information describing the nationality and residence of 1.46 million people from 192 countries. As of November 26, 2022, Austria had 1,542,349 registered migrants from 192 countries. Around 95% disclosed their main address and are considered here. We analyse the arrivals for 263 days divided into two parts: 200 days to train and 63 days to test. Within the 200-day period of analysis, there were $A$ = 111,244 arrivals to the country, mainly from Ukraine, Romania, Germany and Syria. As of 14 June 2023 (after 200 days), around 75% of arrivals are from 15 countries (Table 2).

[Table 1. Columns: Migration Status, Percentage.]
We test whether a uniform daily arrival explains the observed number of migrants from the top countries of origin. A uniform daily arrival is not rejected for the top 12 countries. The observed arrivals fall within the modelled intervals (Figure 7). We also test whether postal codes with a larger diaspora attracted more migrants.

Table 2: The top countries of origin in Austria, in descending order of arrivals within the observation period of 200 days. Only migrants with registered main addresses. We list countries with arrivals percentages above 2%. The diaspora is the country's pre-existing population size before November 26, 2022.

B - Modelling intensity and assortativity
Let $M_i(t)$ be the number of arrivals from country $i$ since time $t = 0$. We assume that $M_i(t) \sim \mathrm{Pois}(\lambda_i t)$, so the expected number of arrivals during $t$ days is $\lambda_i t$. The Poisson distribution is frequently used to model discrete events (such as the number of arrivals) since it allows overlooking small perturbations or fluctuations and focuses on the more general picture, the daily arrivals. It depends on a single parameter, $\lambda_i$, known as the (daily) rate, which is the expected number of arrivals per day.
Once a person decides to move to some country, they decide on a specific location, which can be as general as states or provinces or as particular as neighbourhoods. The person chooses location $j$ with probability $\pi_{ij}$. The destination, conditional on observing $m$ arrivals, can be considered a Multinomial distribution with $\nu$ options. The vector $\vec{\pi}_i = (\pi_{i1}, \pi_{i2}, \ldots, \pi_{i\nu})$ captures the destination preferences for people from origin $i$. In particular, the decision of moving to destination $k$, with $k \in \{1, 2, \ldots, \nu\}$, is a Binomial distribution (with a probability of success $\pi_{ik}$ and with a probability of failure $1 - \pi_{ik}$). Thus, the number of arrivals to destination $k$, conditional on observing $m$ arrivals, follows

$$M_{ik}(t) \mid M_i(t) = m \;\sim\; \mathrm{Bin}(m, \pi_{ik}).$$

It is possible to show that a Binomial distribution, conditional on a Poisson distribution, is also a Poisson distribution [58]. It has a rate $\lambda_i \pi_{ik}$, which is the same rate but discounted by the probability $\pi_{ik}$. Thus, arrivals to destination $k$ are $M_{ik}(t) \sim \mathrm{Pois}(\lambda_i \pi_{ik} t)$. Modelling intensity and assortativity separately enables the disentangling of the process with a minimal set of parameters.
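The claim that a Binomial choice, conditional on a Poisson number of arrivals, is again Poisson can be checked numerically. The sketch below sums the conditional probabilities over the total number of arrivals and compares the result with the Poisson probability at the discounted rate. The rate and the destination share are hypothetical values, and the sum over totals is truncated at 100, an assumption that is harmless for a rate this small.

```python
import math


def poisson_pmf(k, rate):
    """Poisson probability of k events, computed in log space for stability."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


def binomial_pmf(k, n, p):
    """Binomial probability of k successes among n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def thinned_pmf(k, rate, p, max_total=100):
    """P(k arrivals at one destination), summing over the total arrivals m."""
    return sum(
        poisson_pmf(m, rate) * binomial_pmf(k, m, p)
        for m in range(k, max_total + 1)
    )


rate_i = 12.0  # hypothetical lambda_i * t
share_k = 0.1  # hypothetical pi_ik

# Largest difference between the two sides of the identity for k = 0..9.
largest_gap = max(
    abs(thinned_pmf(k, rate_i, share_k) - poisson_pmf(k, rate_i * share_k))
    for k in range(10)
)
```

The two sides agree up to floating-point error, which is what allows each destination to be modelled with its own independent Poisson rate.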
The assortativity of our model of migration estimates that people from $i$ move to location $j$ with probability $\pi_{ij} = R_{ij}/R_i$, where $R_{ij}$ is the diaspora size of country $i$ in location $j$, and $R_i = \sum_j R_{ij}$ is the total diaspora. After some period $t$, the size of the diaspora in location $j$ changes through births (rate $b$), deaths (rate $d$), the inflow and outflow due to internal movements (denoted $i$ and $o$), and the new arrivals (rate $\lambda$). We assume that the impact of internal migration of the diaspora is negligible (meaning that $i \approx o$) and that the birth and death rates are the same for all the diasporas. Because the new arrivals to each location are proportional to $R_{ij}$, every local diaspora then grows by the same factor, and the total diaspora $R_i$ also changes size by that factor due to the arrival of people. The assortativity after some period, $\pi_{ij} = R_{ij}/R_i$, therefore remains unchanged. Thus, the model conserves the distribution of the diaspora across destinations after the arrival of people is considered.
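The conservation of assortativity can be illustrated with a small numerical sketch. It considers only new arrivals (births, deaths and internal movements are ignored) and adds the expected arrivals, which are proportional to each local diaspora. The district labels and diaspora sizes are hypothetical.

```python
def shares(diaspora_by_destination):
    """Share of the total diaspora living in each destination (pi_ij)."""
    total = sum(diaspora_by_destination.values())
    return {j: r / total for j, r in diaspora_by_destination.items()}


def add_expected_arrivals(diaspora_by_destination, rho, days):
    """Grow each local diaspora by its expected arrivals rho * R_ij * t."""
    return {j: r + rho * r * days for j, r in diaspora_by_destination.items()}


# Hypothetical diaspora of one origin country across three districts.
before = {"district_7": 800.0, "district_10": 3200.0, "district_22": 1000.0}
after = add_expected_arrivals(before, rho=3.29e-4, days=200)

shares_before = shares(before)
shares_after = shares(after)
max_share_change = max(abs(shares_after[j] - shares_before[j]) for j in before)
```

Every destination grows by the same factor, so the total diaspora increases while the shares stay the same up to rounding error.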

C - Model comparison
In this section, we compare our model with the gravity model, one of the most prominent ways in which social mobility is analysed. The gravity model captures the impact of size at the origin and destination countries and their distance [26,27,20,65,66]. Gravity has been used, for example, to model trade between countries and cultural distances or frictions between distinct locations [26,27,23]. The gravity model, however, does not quantify the intensity of migration but gives only a description of the assortativity. One of its most significant drawbacks is that it does not consider any temporal dimension, so it only ranks destinations depending on their size. Because the gravity model provides neither the expected arrivals of migrants nor an analogue of our diaspora pull rate, we do not include it in the intensity error calculations.

Intensity
To assess the error in the expected arrivals of migrants, we use the 200 days of observations to construct a daily pull rate λ_i for every country (Equation 3) and predict the arrivals in the next nine weeks (63 days from 14 June 2023). We choose a time window in weeks instead of months because migration patterns and data registration go through a weekly cycle. We compare our estimate with the observed arrivals and find that our model predicts them with a margin of ±0.17 arrivals per country per day (Table 3). Additionally, we compute the daily pull rate of Austria ρ_Aus (Equation 5), estimate the arrivals for all countries, and find that this method predicts the observed arrivals with a margin of ±0.32 arrivals per country per day. Thus, using ρ_Aus, we get almost twice the error. However, we can rely on fewer data points.
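The two forecasting strategies can be illustrated with a short sketch. It rests on assumptions: the per-country rate is taken as observed arrivals divided by days observed, the country-wide rate as arrivals per diaspora member per day, and the error as a root mean squared difference expressed per day. These forms stand in for Equations 3 and 5 and may differ from them. All numbers and function names are hypothetical.

```python
import numpy as np

def daily_pull_rate(arrivals, days):
    """Per-country daily arrival rate: arrivals divided by days observed (assumed form)."""
    return np.asarray(arrivals, dtype=float) / days

def pooled_pull_rate(arrivals, diaspora, days):
    """Single country-wide rate: arrivals per diaspora member per day (assumed form)."""
    return np.sum(arrivals) / (np.sum(diaspora) * days)

def daily_rmse(predicted, observed, days):
    """Root mean squared error across countries, expressed per day (assumed form)."""
    err = (np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)) / days
    return float(np.sqrt(np.mean(err ** 2)))

# Made-up numbers for three hypothetical origin countries.
diaspora = np.array([200_000, 50_000, 10_000])   # pre-existing diaspora sizes
train_arrivals = np.array([4_000, 1_500, 100])   # arrivals over the 200 training days
test_arrivals = np.array([1_300, 450, 40])       # arrivals over the following 63 days
train_days, test_days = 200, 63

lam = daily_pull_rate(train_arrivals, train_days)             # one rate per country
rho = pooled_pull_rate(train_arrivals, diaspora, train_days)  # one rate for all countries

pred_lam = lam * test_days              # expected arrivals using the per-country rate
pred_rho = rho * diaspora * test_days   # expected arrivals using the pooled rate

print(daily_rmse(pred_lam, test_arrivals, test_days))
print(daily_rmse(pred_rho, test_arrivals, test_days))
```

The pooled rate needs only one parameter plus the diaspora sizes, which is the sense in which it relies on fewer data points; the per-country rates need a separate arrival history for each origin.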

Assortativity
Table 3: Top 10 error comparisons of the observed and modelled arrivals, where Arrivals are the observed arrivals, λ_i is the daily pull rate of every country, ρ_Aus is the daily pull rate of Austria, Arr(λ_i) are the modelled arrivals using λ_i, Arr(ρ_Aus) are the modelled arrivals using ρ_Aus, and Err(λ_i) and Err(ρ_Aus) are the squared errors of Arr(λ_i) and Arr(ρ_Aus), respectively. The errors are ranked in descending order of Err(λ_i) for nine weeks (63 days). (E_r)^2 is the square root of the sum of the squared error averaged over 63 days of observation and 192 countries.

For a fixed period, a country of origin i and a destination j, we have modelled the flow D_ij and compared it to the observed flow M_ij. We compute the mean square error as:

E_r = \frac{1}{\mu \nu} \sum_{i=1}^{\mu} \sum_{j=1}^{\nu} (D_{ij} - M_{ij})^2    (9)

where µ and ν are the numbers of possible origins and destinations. The mean squared error can be used to compare distinct models, where a smaller error means better performance.
The gravity model assumes that destination j with population P_j attracts population depending on its size, so we consider its assortativity as π^G_ij = f(P_j, D_ij) for some function f that takes the size of the destination and the distance between origin and destination (here D_ij denotes that distance, not the modelled flow). We construct a gravity model G such that once a person has decided to move to a country, they choose their destination depending on its size. Thus, we also consider that the destination is picked from a Multinomial distribution. Formally, we assume that once m people have moved from i, they will move to j with probability π^G_ij = P_j^α / Σ_k P_k^α, for some parameter α ≥ 0. We compare the diaspora and gravity models through their mean square error (Figure 8), analysing only their assortativity. That is, we assume m arrivals to some destination and distribute them according to the assortativity of the model considered. We consider the sum of the squared error terms of each model, where a smaller error means that the model describes the assortativity of the arrivals more accurately. The difference between the error terms of the gravity and the diaspora models is large, particularly for the countries with the highest number of arrivals. For example, for arrivals from Germany, the gravity model has a squared error of 280.97, roughly five times the 55.60 of the diaspora model.
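The comparison of the two assortativities can be sketched as follows. This is an illustration under stated assumptions, not the code used for the reported results: the populations, diaspora sizes and observed arrivals are made up, the function names are hypothetical, and the error is taken as the average squared difference over destinations for a single origin.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def diaspora_probs(diaspora_by_location):
    """Diaspora assortativity: share of the origin's diaspora living in each location."""
    r = np.asarray(diaspora_by_location, dtype=float)
    return r / r.sum()

def gravity_probs(population_by_location, alpha=1.0):
    """Gravity assortativity: proportional to destination population to the power alpha."""
    p = np.asarray(population_by_location, dtype=float) ** alpha
    return p / p.sum()

def mean_squared_error(modelled, observed):
    """Average squared difference between modelled and observed flows."""
    diff = np.asarray(modelled, dtype=float) - np.asarray(observed, dtype=float)
    return float(np.mean(diff ** 2))

def simulated_error(probs, observed, n_runs=100):
    """Distribute the observed total with Multinomial draws and average the error."""
    m = int(np.sum(observed))
    draws = rng.multinomial(m, probs, size=n_runs)
    return float(np.mean([mean_squared_error(d, observed) for d in draws]))

# Made-up data for one hypothetical origin and four destinations.
population = np.array([100_000, 50_000, 20_000, 5_000])  # destination populations
diaspora = np.array([200, 2_000, 100, 700])              # pre-existing diaspora sizes
observed = np.array([12, 130, 8, 50])                    # observed arrivals

err_diaspora = simulated_error(diaspora_probs(diaspora), observed)
err_gravity = simulated_error(gravity_probs(population, alpha=1.0), observed)
print(err_diaspora, err_gravity)
```

In this toy setting the observed arrivals were chosen to follow the diaspora, so the gravity error is much larger by construction; the sketch shows the mechanics of the comparison, not evidence for either model.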
We use the error defined in Equation 9 to calculate the error of each origin country over all destinations. We average the error over 100 simulations for both models and get the average per country (Table 4). In total, there are 192 countries and 2,221 possible destinations.

Table 4: Error comparison between the diaspora and the gravity model. We list the top 10 countries of origin for all postal codes, sorted in descending order according to the gravity model error, such that the Syria diaspora has the biggest gravity model error and the Poland diaspora has the lowest. E_r is the mean squared error.

A crucial aspect of migration models is considering different geographic levels. For example, detecting the number of arrivals at the province level is critical since some provisions, such as health or education, are frequently managed at that level. Forecasting the number of migrants also plays a critical role in smaller units such as cities and neighbourhoods. One of the most significant weaknesses of the gravity model is that it performs poorly at the neighbourhood level. For the arrivals to the 10th district of Vienna (Favoriten), the gravity model has a squared error of 4,925.15, but the diaspora model has a squared error of 500.76. Results show that the mean square error is 3.42 for the diaspora model but 9.77 for the gravity model. Thus, the average error of each destination for all countries is nearly three times bigger for the gravity model than for the diaspora model (Table 5).

Table 5: Top 10 error comparison between the diaspora and the gravity model per postal code for all countries of origin, sorted in descending order according to the gravity model error. The postal code 1100 (Favoriten, 10th district in Vienna) has the biggest gravity model error, and the postal code 1200 (Brigittenau, 20th district in Vienna) has the lowest. E_r is the mean squared error.

D - International migration to the USA
We conduct our analysis on 387 USA metropolitan areas (Mets). We exclude movements to the countryside and five metropolitan areas added in the 2019 census. The census data also limits us to only eight diasporas, into which the migrants' countries of origin are classified: Asia, Europe, Central America, South America, Africa, the Caribbean Islands, North America and Oceania. We use the census data from 2013 to 2018 to estimate the arrivals to our selected Mets in 2019.
The gravity model proves insufficient to predict the migration flows, underestimating arrivals in big metropolitan areas and overestimating them in small ones (Figure 9). We use the observed total arrivals in both the diaspora and the gravity model. We model the assortativity according to Met size in the gravity model and according to the average diaspora assortativity in the diaspora model (Table 6).

Figure 1: Illustration of the diaspora migration model. We divide migration into two separate components: intensity (related to the arrival of migrants) and assortativity (related to where migrants decide to go). The diaspora model uses the size of the pre-existing population (depicted as the people on the map, with different colours representing different backgrounds) to estimate a steady inflow of migrants (represented by the arrow thickness) and their distribution across two regions (marked as 1 and 2 on the map).

Figure 2: Observed intensity and assortativity of migrants. A - (Left) The intensity of arrivals against the pre-existing diaspora size. Each point represents a country of origin, the diaspora size is the number of existing migrants from that country of origin in Austria before December 2022, and the arrivals are the number of new migrants observed after 200 days. The black dotted line is the daily pull rate of Austria, ρ_Aus. We highlight arrivals from Ukraine, Germany, Syria and Serbia because they are among the countries of origin with the biggest diaspora size in Austria and have different economic and political backgrounds and migration histories (SM A). (Right) Observed cumulative arrivals of the top diasporas within our observation period, with different arrival rates λ_i. B - Assortativity of migrants from the top diasporas. Each point represents a postal code in Austria (other diasporas in SM A).

Figure 3: Model results and error estimation. A - Results of the diaspora model for migration (pink) compared to the gravity model (grey) and the empirical observations (black crosses) over 100 runs. The horizontal axis is the observed number of migrations in each postal code, while the vertical axis is the number of observed migrants. We only show postal codes with arrivals above five during the observation period. B - The mean square error of our diaspora model (pink) vs. gravity (grey) over 100 runs. C - Modelled vs. observed arrivals to Austria. The disc size is proportional to the pre-existing population at the postal code.

Figure 4: Vienna model results. a - Heat map of the observed arrivals in Vienna for the top four diasporas in Austria. b - Heat map of the diaspora model estimates in Vienna. c - Heat map of the gravity model estimates in Vienna. d - Spider plots of the top four diasporas in Austria, where each section is one of Vienna's 23 districts. The ratio between the modelled and the observed arrivals (the estimate ratio) is displayed for each district for both the gravity model in grey (G_ER) and the diaspora model in pink (D_ER). The inner circle (red) marks where the observed and the modelled arrivals are equal. Where the polygon lies inside the circle, the model underestimates the number of migrants; where it lies outside, the model overestimates it.

Figure 5: Top USA metropolitan areas. Results of the arrival flows of the top four metropolitan areas in the US: New York, Los Angeles, Chicago and Dallas. We plot the diaspora model estimates (red), the observed flows (blue), and the gravity model estimates (grey) for eight estimated diasporas. The diasporas are ranked according to their total arrival flow in the US in 2019. The smallest diaspora is from Oceania, with around 110,000 individuals, while the largest is from Asia, with more than 25 million migrants.

Figure 6: Estimating the pull rate (λ_i). The intensity of migration for country i gives the daily arrival rate for that country, λ_i. For t days, the expected number of arrivals is λ_i t, plotted as a dashed line for each country. For a sufficiently large rate and number of days, a Normal approximation gives a 99% confidence interval, plotted for each country as the shaded triangle (with θ = 4). The observed number of arrivals falls within the shaded triangle, so we do not reject a constant arrival rate for those countries of origin.

Figure 7: Intensity and assortativity of the top diasporas. The intensity of arrivals in the 200 days of observation of the top eight diasporas in Austria (left). The assortativity of the arrivals of the top eight diasporas (right).

Figure 8: Gravity and diaspora model descriptions. We divide migration into two components: intensity (related to the arrival of individuals) and assortativity (related to where migrants decide to go). The diaspora model of migration uses the size of the pre-existing population of a certain diaspora. The gravity model uses the total pre-existing population without accounting for diasporas and individual differences.

Figure 9: USA metropolitan areas. Results of the arrival flows of all 387 metropolitan areas in the US. We plot the diaspora model estimates (pink), the gravity model estimates (grey) and the observed flows (blue). The size of each observation varies with the size of the Met.

Table 1: Migrants' residential status as of 14 June 2023