Big Data to reduce segregation of Syrian refugees in Turkey

24 November, 2020
Image of a refugee camp

By Daniel Rhoads, doctoral candidate of the CoSIN3 (Complex Systems @ IN3) group.


What does it mean for two social groups to be integrated? Is it enough for them to live in the same places, or speak the same languages? Most of us would say there is more to it than that. Integration and segregation are highly complex issues, and they are challenging to measure. Yet a solid understanding of them is critical for facing many contemporary problems. A recent collaborative work by the IN3’s CoSIN3 group (Complex Systems @ IN3) and the TURBA Lab group (Urban Transformation and Global Change Laboratory) addresses this topic, harnessing Big Data and computational analysis to generate actionable insights aimed at improving the integration of Syrian refugees in Istanbul, Turkey, through the lens of behavioral segregation.

Worldwide, there are approximately 4.5 million refugees from the ongoing Syrian Civil War, and 3 million of them live in Turkey. With the future of their home country uncertain, the question of how to integrate these people smoothly into Turkish society in the long term has become increasingly crucial. With this situation in mind, Turkey’s largest mobile phone operator, Türk Telekom (TT), partnered with Istanbul’s Boğaziçi University to host the 2018 Data 4 Refugees (D4R) Challenge. The work by CoSIN3 and the TURBA Lab, “Measuring and Mitigating Behavioural Segregation Using Call Data Records,” grew out of this call and was selected as a challenge finalist.

The importance of behavioral segregation

Segregation has many dimensions. In social science, it has often been studied in terms of physical separation: Do people from distinct groups live in the same neighborhoods, or not? In what proportion? Of course, researchers have always known that segregation cannot be measured just by looking at where people live. If two groups are evenly mixed across a city or a territory, but people from the two groups never communicate with each other, we would still say that they are segregated.

Still, such measures of behavioral segregation have received less attention in the academic literature, not for lack of theory but for lack of data. While government censuses regularly provide statistics on the residential patterns of various groups, data relating to social interactions were, until recently, scarce. The dataset released by Türk Telekom for the D4R challenge is one example of how this situation is changing fast in the era of Big Data.

While media attention tends to center on refugee camps, 90% of Syrian refugees in Turkey live in urban areas, together with the local population. Many live in Istanbul, Turkey’s largest city. For this reason, although the dataset covered calls and SMS messages made across the entire country throughout 2017, the researchers chose to focus on Istanbul as a case study to develop their measure of behavioral segregation.

Using call data records to measure Syrian refugee segregation

Whenever we make a call or send an SMS, beyond the conversation we are having, we generate data, some of which are stored by mobile phone companies. These Call Data Records (CDRs) normally indicate the general location where the call was made, the time, and identifiers for the caller and the receiver, along with various other data points. In just the past few years, the use of CDR data has led to important insights concerning human social networks, mobility patterns, and epidemic spreading.
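To make the shape of such data concrete, here is a minimal Python sketch of a simplified CDR row and a helper that tallies calls between districts for each group. Every field name, the `CallRecord` type, and the `count_calls` helper are hypothetical, chosen for illustration only; the actual D4R dataset schema is not described in this article and may well differ.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """One simplified CDR row. All field names are hypothetical."""
    caller_id: str        # anonymised identifier of the caller
    receiver_id: str      # anonymised identifier of the receiver
    timestamp: datetime   # when the call or SMS took place
    origin_district: str  # general location of the caller
    dest_district: str    # general location of the receiver
    is_refugee: bool      # assumed group flag for the caller


def count_calls(records):
    """Count calls per (group, origin district, destination district)."""
    counts = Counter()
    for record in records:
        group = "refugee" if record.is_refugee else "local"
        counts[(group, record.origin_district, record.dest_district)] += 1
    return counts
```

Aggregating to district-level counts like this, rather than keeping individual rows, is also how such data are commonly shared in order to protect the privacy of callers.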

To understand the initial conditions in the study area, we compared the calling patterns of local and refugee residents in each of Istanbul’s 39 districts. We assume that the place where a person lives has an impact on how they behave, building on the literature on the social effects of spatial context. If refugees and locals living in the same district behave differently on average, we can infer that they are not socially integrated. The pattern of total calls made by each group (refugee and local), from one district to each of the other districts, was taken as that group’s communication ‘fingerprint’ in that district. Comparing the local and refugee communication patterns district-by-district through statistical analysis, we found a high level of behavioural segregation throughout the city.
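The fingerprint comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say which statistic was used, so the sketch swaps in the total variation distance between the two groups' normalised call shares as a stand-in. The function names and the toy call counts are hypothetical.

```python
def fingerprint(call_counts):
    """Convert raw call counts per destination district into shares that sum to 1."""
    total = sum(call_counts.values())
    if total == 0:
        raise ValueError("no calls recorded for this group")
    return {district: n / total for district, n in call_counts.items()}


def fingerprint_distance(local_counts, refugee_counts):
    """Total variation distance between two fingerprints.

    0 means both groups spread their calls identically across districts;
    1 means they never call the same districts. This is a stand-in measure,
    not necessarily the one used in the study.
    """
    p = fingerprint(local_counts)
    q = fingerprint(refugee_counts)
    districts = set(p) | set(q)
    return 0.5 * sum(abs(p.get(d, 0.0) - q.get(d, 0.0)) for d in districts)


# Toy counts of calls made from one district by each group (invented numbers).
local_calls = {"Fatih": 50, "Esenyurt": 50}
refugee_calls = {"Fatih": 90, "Esenyurt": 10}
distance = fingerprint_distance(local_calls, refugee_calls)
```

Normalising to shares before comparing matters because the local population is far larger than the refugee population, so raw call totals would differ even if both groups behaved identically.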

New possibilities of Big Data

These initial results provided us with a view of the current situation. To explore how it might be improved, we proposed a simulated reorganisation of where the refugee population lives, designed to minimise our measure of behavioural segregation. Simply put, we relocated some selected refugees from one district to another, redistributing their patterns of communication in the process. We based our approach on the sociological principle of homophily (that similar people are more likely to interact) and on the assumption that interaction improves integration.

The experiment was framed as a constrained optimisation problem, where an algorithm attempts to find the best solution (the lowest level of segregation) under certain conditions that keep the scenario realistic (e.g., the refugee population may not rise above 10%, the initial district-level maximum, in any district).
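As a rough illustration of the idea, the Python sketch below solves a deliberately simplified toy version of such a problem. It assumes each district has a fixed, hypothetical segregation score that does not change when people move, and it minimises the refugee-weighted mean score subject to the 10% cap. Because this toy objective is linear, filling the lowest-score districts first is enough to solve it. The study's actual objective and algorithm are not described here and are almost certainly more involved; all names and numbers below are invented.

```python
def district_capacity(local_pop, max_share_pct=10):
    """Largest refugee count that keeps refugees at or below the cap.

    r / (local_pop + r) <= pct / 100  is equivalent to
    r <= local_pop * pct / (100 - pct); integer maths avoids rounding issues.
    """
    return local_pop * max_share_pct // (100 - max_share_pct)


def mean_score(refugees, scores):
    """Refugee-weighted mean of the per-district segregation scores."""
    total = sum(refugees.values())
    return sum(scores[d] * n for d, n in refugees.items()) / total


def relocate(local_pop, refugees, scores, max_share_pct=10):
    """Reassign all refugees to minimise mean_score under the share cap.

    Assumes fixed per-district scores (a strong simplification).
    """
    total = sum(refugees.values())
    capacity = {d: district_capacity(local_pop[d], max_share_pct) for d in local_pop}
    if sum(capacity.values()) < total:
        raise ValueError("cap is too tight to house the refugee population")
    allocation = {d: 0 for d in local_pop}
    remaining = total
    # Fill districts in order of increasing score until everyone is placed.
    for d in sorted(local_pop, key=lambda name: scores[name]):
        placed = min(capacity[d], remaining)
        allocation[d] = placed
        remaining -= placed
    return allocation


# Toy city with three districts (invented numbers).
local_pop = {"A": 900, "B": 900, "C": 1800}
refugees = {"A": 90, "B": 60, "C": 0}
scores = {"A": 0.8, "B": 0.2, "C": 0.5}

new_refugees = relocate(local_pop, refugees, scores)
before = mean_score(refugees, scores)
after = mean_score(new_refugees, scores)
```

In this toy city the cap is what stops everyone from being placed in the single best district: district B fills up at 10% and the remainder spills over into C. A more realistic model would also update each district's score as its population mix changes and would limit how many people are moved.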

The results of our experiments showed that behavioural segregation by our measure was significantly reduced as a result of the change in refugee residential patterns. We compared the outcome of the experiment with available housing price data, and found that refugee housing costs would increase by an average of about 50 euros per household per month. These manageable figures open up the possibility of a rent subsidisation program to encourage refugees to move to more expensive districts where they might be more fully integrated, according to this analysis.

Of course, these sensitive social issues are influenced by many complex factors. The intention of the work is not to propose a specific policy, but rather to add new tools to the toolbox of policy-makers and stakeholders, taking full advantage of the new possibilities of Big Data, which are only beginning to be understood. We foresee many such studies in the future, particularly ones that fuse good sociological theory with up-and-coming computational tools and data sources.


Article by Daniel Rhoads, doctoral candidate of the CoSIN3 (Complex Systems @ IN3) group at the UOC’s Internet Interdisciplinary Institute (IN3).


Rhoads, D., Serrano, I., Borge-Holthoefer, J. et al. Measuring and mitigating behavioural segregation using Call Detail Records. EPJ Data Sci. 9, 5 (2020).
