Exploring the feasibility of ex-post harmonisation of religiosity items from the European Social Survey and the European Values Study
Measurement Instruments for the Social Sciences volume 4, Article number: 12 (2022)
This paper examines the feasibility of ex-post harmonisation strategies using European Values Study (EVS) Wave 5 (2017–2020) and European Social Survey (ESS) Round 9 (2018–2019) data across 17 countries. The study shows an empirical assessment of the comparability of four items measuring religious behaviours (belonging to a religious denomination at present/in the past, religious services attendance, and praying), captured in both surveys. The novelty of this paper lies in the analytical comparison of religiosity indicators that are rarely assessed from a comparative perspective.
The harmonisation strategy was based upon several analytical techniques that seek to determine similarities and differences between the selected items in terms of (a) their validity, by examining their correlations with a set of sociodemographic and substantive correlates, (b) their distributions, supplemented by visual comparisons and relevant statistical tests, and (c) item non-substantive shares. The findings pointed to the most consistency among the partial correlations, where individual religiosity produced the most differences between the surveys. Distributions produced the most discrepancies that also corresponded to less similarity across variable categories as gauged by Duncan’s index. This paper is descriptive and exploratory in its aim. It can be taken as a jumping-off point for future research where the time series of these two surveys, and potentially others, can be examined across aggregate levels (e.g. birth cohorts, countries).
The ex-post harmonisation of survey data from different sources is a blossoming field of research and consists of using existing data to build an integrated dataset in which the reliability and validity of the outcome measurements is preserved (Wolf et al., 2016). The methodological harmonisation procedures aim at obtaining compatible measurements of surveys over time, across countries, projects, or a combination of those (Dubrow & Tomescu-Dubrow, 2016; Wolf et al., 2016). In the survey research literature, harmonisation can occur at two different stages: (1) during the design of the survey (ex-ante harmonisation); (2) after data collection (ex-post harmonisation) (Wolf et al., 2016).
Pooling data from different data sources enhances possibilities of empirical investigations (Wolf et al., 2016). Recent examples of large-scale survey data ex-post harmonisation projects include the Survey Data Recycling project-SDR (Tomescu-Dubrow & Slomczynski, 2016) and the Church Attendance and Religious Change Pooled European (CARPE) dataset (Biolcati et al., 2020). The main methodological challenges reported in the literature point to variables that have different measurements or scales (especially scales in attitudinal questions), differences in classification schemes and/or numeric codes assigned to each category, or different question wordings, even when the underlying concept is the same (Tomescu-Dubrow & Slomczynski, 2014, 2016). However, this conceptual overlap across surveys is often assumed yet rarely tested empirically.
The European Social Survey (ESS) and the European Values Study (EVS)
The European Social Survey (ESS) and the European Values Study (EVS) are large, cross-national social survey projects that collect data in most European countries. Despite many differences between the two, there are also important similarities. Combined, the two data sources offer a complementary and substantial perspective to analyse value change in Europe. The EVS and the ESS questionnaires cover a variety of domains relevant for social sciences with high methodological rigour. Although conceptual coverage is similar, the use of particular indicators secures the specificity and the identity of each survey.Footnote 1 The set of questions in both questionnaires allows monitoring change and continuity of values and attitudes in the domains of family, gender roles, work, religion, immigration, politics, and society. Both surveys were traditionally carried out as face-to-face interviews to individuals selected with probability-based sampling designs. Both surveys target all residents in private households in a country, regardless of nationality and language. Challenges including increased costs and deteriorating response rates led the EVS to experimentally implement mixed-mode innovations in the 5th wave in six countries (Luijkx et al., 2021). Since 1981, the EVS has been conducted every 9 years (1990, 1999, 2008, 2017) whereas the ESS is a biannual survey which, since 2002, has fielded 10 rounds of data collection using a core module that is repeated over time, new and repeated rotating modules, as well as a comprehensive set of sociodemographic variables.Footnote 2 Some differences regarding fieldwork duration and organisation as well as slightly distinct criteria to define the target population exist between the two surveys (ESS, 2018; EVS, 2020b). All in all, both data sources are widely used in the scientific community to inform policies, and as an important source of methodological innovation in comparative survey research.
The current study
In the framework of the ESS-SUSTAIN-2Footnote 3 project, the EVS and the ESS investigated different scenarios for a potential collaboration in data collection, one of the potential scenarios being a joint data collection and, consequently, the need to create an integrated dataset including data from both surveys. Since the original questionnaires were not designed to be comparable, this paper will attempt to assess the comparability of selected items and determine whether ex-post harmonisation is feasible. In this context, devising harmonisation strategies to pool the two data sources without compromising the long-standing time series is a key element.
This study presents the first stages of evaluation of such a harmonisation process, in which data collected by the EVS (Wave 5) and the ESS (Round 9)Footnote 4 across 17 countries were used to examine the comparability and conceptual overlap of selected items. This paper presents the results of the methodological procedure implemented to (a) empirically investigate the conceptual overlap between matched items of ESS and EVS questionnaires in terms of validity of measurement, and (b) compare the distributions to understand differences in the actual measurement.
To illustrate the proposed methodological operationalisation, we focus on four item pairs tapping on religiosity, including (1) belonging to a religious denomination, (2) ever belonged to a religious denomination, (3) attendance at religious services, and (4) praying frequency. The criteria used to select items to be compared were based on similarity in question wording and attributes, interviewer role, answer categories, and showcards layout.Footnote 5 Moreover, the conceptual construct represented by these items covering distinct dimensions of religiosity informs the ongoing debate on the topic (Biolcati et al., 2020; Voas, 2009).
Previous experiences such as the SDR and the CARPE projects have shown that, although strategies to accomplish ex-post harmonisation differ widely, a common preoccupation lies in handling the methodological variation to guarantee high quality target variables (Biolcati et al., 2020; Tomescu-Dubrow & Slomczynski, 2016). The SDR project brings together information on equivalent measurement of political behaviour, social attitudes, and demographics from 22 international surveys worldwide, including 89 waves over 50 years (Tomescu-Dubrow & Slomczynski, 2014, 2016). The CARPE aggregated dataset includes a harmonised, country-level aggregate measure of church attendance across all available rounds of 5 data sources (ESS, EVS, Eurobarometer, International Social Survey Programme-ISSP, and World Values Survey-WVS) spanning over 45 European countries (Biolcati et al., 2020). Compared to these projects, our contribution consists of a shorter time span and a smaller number of surveys/waves including the latest available round of ESS and EVS data. However, compared to CARPE, we expand the comparison to multiple dimensions of religiosity, and focus on pooled data at the individual level. Our study also complements SDR by assessing the conceptual overlap of items prior to carrying out the ex-post harmonisation procedures. The relevance of the proposed empirical comparison is not only to shed light for future data harmonisation and cooperation between the two surveys but also (a) to provide information for researchers to understand whether survey data from different sources measure the same concepts and can confidently be pooled for analytical purposes and (b) for survey methodologists, to get insights into how differences in wording and methodology affect measurement and comparability.
In the following sections, we describe the criteria used to select the items and countries to be compared and the analytical plan. This effort purports to assess the feasibility of ex-post harmonisation and identify potential issues while exploring solutions during the process to ultimately offer insights for researchers interested in data harmonisation and its applicability.
Data and measures
In an initial phase of the project, members of both teams compared the items in English of each questionnaire and assessed their similarities based on 17 attributes clustered in four domains as shown in Table 2 in Appendix 1. These criteria were adapted from the Survey Quality Predictor (SQP), an online tool designed for predicting survey quality by using the information on the questions’ features (Saris & Gallhofer, 2014). The more shared attributes, the more compatible the pairs of items were considered. As a result, current and past belonging items achieved a consistency of 100% (indicating that all 17 attributes coincided in the EVS and ESS), followed by attendance at religious services (94%) and praying frequency (88%). Despite very similar question wording and response scales, religious services attendance differed on the label qualifiers (Table 4 in Appendix 1). The question wording for frequency of praying was very similar in both questionnaires and both response scales coincided, with the exception of one category (#5 which in ESS read as “Only on special holy days”, whereas in EVS “Several times a year” (see Table 5 in Appendix 1)). An overview of the source variables wording and measurement scales as well as the target items is provided in Tables 3, 4 and 5 in Appendix 1.
Regarding the correlates used to the compare the validity of the measurements, their selection was based on theoretical grounds suggesting that aspects of religiosity are different depending on sociodemographic characteristics such as age, sex, educational level, and income (Lemos et al., 2019; Schwadel, 2011, 2015). Notwithstanding variation in religiosity patterns across Europe, extant research has often pointed out some common trends. For instance, older people are more likely to attend church and to be more religious than younger individuals (Halman & Draulans, 2006; Vezzoni & Biolcati-Rinaldi, 2015; Voas, 2009). Substantial sex differences in religious practices and beliefs have also been documented in previous studies. Women tend to be more religious and to participate more in religious activities when compared to men due to cultural and socialisation mechanisms (Halman & Draulans, 2006). Higher education is likely to be negatively associated with church attendance and religious affiliation (Ruiter & van Tubergen, 2009). In addition, people who are not in paid employment are expected to be more religious than employed individuals, since those who are not working have more time available and fewer sources of identity and inclusion (Halman & Draulans, 2006). In turn, religious identity, the indicators of religious practice and religious belonging all correspond to empirically observable behaviours underlying the multidimensional concept of religiosity and therefore are frequently found to be strongly interconnected (Molteni & Biolcati, 2018).
The distributions of each correlate, disaggregated by survey and country, are displayed in Table 6 in Appendix 1. In order to achieve meaningful comparisons between surveys, the correlates were harmonised to maximise their similarities in terms of measurement and conceptual substantivity as shown in Table 7 in Appendix 1. As for the employment status variable, a target variable was generated which discerns between paid work (full time, part-time employees, and self-employed) and not in paid work (all the other categories). With regard to education, the source variables were recoded into a new indicator clustering the respondents based on their achieved level of education. The larger difference pertains to individual religiosity where the source variable for the EVS (3-point, nominal scale), with three categories, was reduced to two categories distinguishing those who identify themselves as religious from those who do not (not religious and atheist), to better mirror the ESS question tapping on the degree of religiosity (see Table 7 in Appendix 1).
Prior to the analysis, a joint dataset was created by combining a selection of items from the EVS 2017 Integrated dataset (2020b) and the ESS Integrated file Round 9 dataset (2018). A few steps were undertaken, as shown in Table 8 in Appendix 1, to increase the comparability of the data.
To examine the comparability of the items between the surveys, several analytical strategies were carried out. First, we examined the validity of the items by means of Spearman’s rank-order correlations, accompanied by 95% confidence intervals (CIs) obtained by bootstrapping (1000 repetitions), with a set of sociodemographic characteristics: age, sex, education, paid work, and individual religiosity. Spearman’s correlations were chosen given the ordinal nature of most of the variables. Partial correlations were used instead of regular bivariate associations to control for the impact of the other correlates. Consistency in direction and magnitude of the coefficients across the two surveys along with overlapping CIs will be considered indication of similar validity between the two surveys and therefore comparable in terms of conceptual overlap.
Second, to detect substantive differences in the measurements, the distributions of the items were compared visually (histograms) and using relevant statistical tests depending on the nature—ordinal or binary—of the variable (Mann-Whitney U test and chi-square tests). The Duncan Dissimilarity Index was used to quantify the size of the differences in the measurement, as this measure can be interpreted as the proportion of observations that would need to change categories of a certain item in one survey to have the same distribution as in the other survey. In line with previous research (Biemer et al., 2018), the threshold in this paper to consider measurements similar is at least .10.
In addition to the comparison of the distributions of substantive values non-substantive responses were compared for each pair of items, assessing the extent to which the proportion of non-substantive values is comparable between the two surveys for each of the countries (for this, z-tests were used). The amount of non-substantive answers (e.g. refusals, do not know responses—see Table 8 in Appendix 1 for the full list) has been previously used as an indicator of data quality (Aizpurua et al., 2018; Cernat & Revilla, 2020). In the other univariate and bivariate analyses, these non-substantive values were handled by using a listwise deletion approach which led to drop between 2.5% and 4% of the observations from the analyses, depending on the target variable and country.
All the analyses were conducted using weighted data.Footnote 6 Because of the differences in available weights for the two surveys, weighted analysis will be based on the best available weightsFootnote 7 for within-country analysis in each case.
Along with the variability stemming from the EVS and the ESS methodology, the country-specific operations introduce additional variation around the sampling frames, sample sizes, and response rates. Therefore, analyses were performed separately for each country. To aid the interpretation, countries were ranked based on their similarity, looking at the following cumulative set of criteria: (1) fieldwork conducted less than 1 year apart, (2) analogous sampling strategy,Footnote 8 (3) similar sample sizes, and (4) similar response rates. Only three countries satisfied all four criteria (Germany, Norway, and Slovenia). Once we loosened the resemblance of the response rates, three additional countries were added (Estonia, Poland, and Sweden). Similarly, when the sample size affinity criterion was set aside, Finland, the Netherlands, and Switzerland were included. Finally, once the requirement of similar sampling design was removed, eight more countries were incorporated (Austria, Denmark, France, Great Britain, Hungary, Italy, Montenegro, and Serbia). Consistency in the outputs (Spearman’s correlations, distributions, Duncan’s test, and non-substantive responses) is expected to be stronger in countries with more similar methodologies. In countries with large differences in the ways data are collected (e.g. different sampling strategy), differences in methods may interact with differences in measurement, magnifying—or perhaps concealing—discrepancies.
Currently belonging to a religious denomination
Spearman’s correlations (values and 95% CIs) for currently belonging to a religious denomination (binary variable) with other correlates are displayed in Fig. 1. Overall, the coefficients are consistent in size and direction across the surveys. Age, education, sex, and employment status resulted in negligible and weak associations with belonging to a religious denomination. When looking at the CIs, the size and direction of the correlation with age in Germany and Austria is significantly different between the two surveys, with an indication of lower validity in the EVS due to a negative correlation which goes against the expectation (Halman & Draulans, 2006). Differences also occur for education in Finland and employment status in Austria, but in this case only the EVS displayed correlations in the expected direction (Ruiter & van Tubergen, 2009). As for individual religiosity, the correlations are for the most part positive as expected, and sizeable, however with varying strengths across countries and surveys. When looking at the CIs, most differences across the surveys were yielded for eight countries (Norway, Germany, Sweden, Poland, Switzerland, Italy, Great Britain, and Denmark), including countries with similar methodologies such as similar sample frame, sample sizes, and response rates.
The distributions of the EVS and ESS items measuring current belonging to a religious denomination are displayed in Figure 5 in Appendix 2. Overall, the EVS presented higher shares of affiliated people compared to the ESS. In both surveys, Poland registered the greatest percentage of adherents, whereas the country with the lowest percentage was Estonia. To test whether differences across the groups were significant, chi-square tests were computed. The differences in percentages of people adhering to a denomination were statistically significant in 15 of the 17 countries. Looking at the Duncan’s test, indicating the proportion of cases that would need to change categories to have the same distributions, we found that in most countries (9) said proportion was lower than .10, if not .05, indicating a small difference across surveys. Larger proportions, however, were found in Finland (D = .26), Sweden (D = .22), Denmark (D = .22), and Serbia (D = .21). As for the proportion of non-substantive responses, which can be found in Table 9 in Appendix 2, it was generally low in both surveys (< 3%). EVS—Italy exhibited the highest share (2.6%) against Poland in the ESS (2.0%). These differences were significant at the .05 level only for Estonia, Italy, Norway, and Poland.
Ever belonged to a religious denomination
Turning to past religious belonging (binary variable) in Fig. 2, correlations were estimated only for the sociodemographic characteristics, and this question was only administered to those who indicated not belonging to a religious denomination at present, making the correlation with current individual religiosity meaningless. Starting with age, the differences between the Spearman’s correlations across the surveys are minimal, exception made for three countries: Germany yielded non overlapping CIs and with the EVS relationship being stronger compared to the ESS, whereas Denmark and Montenegro showed non-overlapping CIs and reversed coefficients directionality, with positive associations in the EVS. Moving to education, sex, and employment status, the coefficients yielded overall negligible and small associations. The coefficients for sex in Poland and employment status in Sweden differ greatly as indicated by the non-overlapping CIs and opposite directions, with the EVS associations being positive. When compared to current belonging to a denomination, it is worth noting that the CIs of the correlations for past religious belonging are wider due to smaller sample sizes.
As for the distributions (Figure 6 in Appendix 2), countries in the EVS, by and large, displayed greater percentages of respondents responding positively when compared to the ESS, which is consistent with current religious denomination. Within the EVS, Finland presented the highest share of people that claimed to have belonged to a denomination, whilst within the ESS Austria was the country with the greatest percentage. Conversely, Serbia in the EVS and Hungary in the ESS had the lowest percentage of past affiliations that were around 5% for both countries. The chi-square tests showed that the surveys were significantly different, at .05 level, across all the countries, except for Estonia. When looking at the similarity between the surveys with the Duncan’s index, less than 10% of the respondents in 5 countries would have to change response categories to achieve equal distributions in the other survey. The rest of the countries yielded Duncan’s indices greater than .10, with Finland’s being the highest (D = .62) and Montenegro (D = .12) the lowest, indicating substantial differences across surveys. Regarding the non-substantive responses for having ever belonged to a religious denomination, shares were very low and differences mostly not significant (see Table 9 in Appendix 2), exceptions made for Poland, the Netherlands, Hungary, and Montenegro.
Attendance at religious services
When looking at the correlation and associated 95% CIs for services attendance (ordinal variable, six categories) in Fig. 3, we see that the surveys did not deviate from each other when looking at age, education, and sex, showing negligible and small associations, which lends support to low yet comparable validity across the surveys. Moving to employment status, the coefficients for Hungary displayed opposite direction with non-overlapping CIs, with the ESS association crossing 0 and going against expectations (Halman & Draulans, 2006). When looking at individual religiosity, positive medium to strong correlations were registered across the groups indicating high validity, yet with differences across surveys with the ESS displaying significantly larger coefficients than the EVS in seven countries (Norway, Sweden, Poland, Finland, Italy, Great Britain, and Denmark).
The histogram in Figure 7 in Appendix 2 displays how, in both surveys, Poland produced the highest scores, while the lowest corresponded to Estonia in the EVS and the Netherlands in the ESS. In most of the countries the scores tended to pile up on the left side meaning that people tend to attend religious services from mostly never to only on holy days. The only exception to this trend is Poland that had skewed scores on the right-side indicating respondents that attend church once a week on average. The distributions were compared by means of a Mann-Whitney U test. When looking at Figure 7 in Appendix 2, distributions appear to be statistically different (z score significant at the .05 level) in 9 countries out of 17. In the countries with more comparable methodologies, distributions appear similar. The outcomes from the Duncan’s dissimilarity tests provided similarity across surveys in most of the countries (12), indicating that no more than 10% of the respondents would have to change response categories in the other survey. However, slightly larger coefficients were yielded for the other countries with Estonia (D = .15) being the highest and France (D = .11) the lowest. Regarding non-substantive answers, both the EVS and the ESS exhibited low percentage shares overall, with the exception of EVS-Serbia and ESS-Poland that had shares above 2%. The differences in non-substantive response across surveys were corroborated by statistical evidence only for Poland, Serbia, and Hungary as shown in Table 9 in Appendix 2.
The results of the partial correlations for frequency of praying (ordinal variable, seven categories), disaggregated by survey and country, are displayed in Fig. 4 where all the correlations—apart from individual religiosity—showed negligible and weak strength indicating low validity overall. The findings showed most consistency for correlations with education, sex, and employment status as indicated by coefficients’ similar magnitude and direction along with overlapping CIs. The correlation values and 95% CIs with age yielded differences in Poland, with the EVS displaying stronger associations than the ESS, whereas in Sweden we see opposite direction and non-overlapping CIs, with the EVS showing positive associations. The associations with individual religiosity were once again stronger, indicating higher validity than with the sociodemographic variables, yet displayed differences in size between the surveys in Slovenia, Poland, Switzerland, Italy, and Montenegro, with the ESS displaying larger coefficients than the EVS.
The distributions by survey and country in Figure 8 in Appendix 2 showed a tendency to select extreme answer categories on both sides, meaning that most respondents reported praying either never or every day.
Along with the visual inspection, a Mann-Whitney U test was carried out to determine whether there were differences in the distributions of praying frequency. The results showed that the surveys were significantly different (z score significant at .05 level) in 11 countries. In terms of Duncan’s Dissimilarity Index, 9 countries indicated high similarity between surveys with proportions lower than .10. The other countries yielded slightly larger indices as was the case for Norway (D = .11) and moderately larger such as Montenegro (D = .27). Overall, the non-substantive answers proportions ranged from very low (< 1%) to 3% for EVS in Montenegro and slightly above 8% for Poland in the ESS (Table 9 in Appendix 2). However, the differences between surveys were supported by statistical evidence only for Austria, Estonia, Hungary, Poland, and Montenegro.
Summary of results
Our findings point to several conclusions (see Table 1). Spearman’s partial correlations with sociodemographic characteristics showed low validity but also, mostly, consistency between the surveys, especially for past religious belonging, followed by praying frequency, religious services attendance, and lastly, current religious belonging. Validity was higher when looking at the correlations with individual religiosity (substantive item), however that is also where most inconsistencies between surveys emerged. This might be due to discrepancies in the wording and scales of the source variables, with the EVS being a 3-category item dichotomised for the purposes of the analyses, whereas the ESS variable was measured on an 11-point scale. The differences yielded by the correlations with religiosity might, therefore, have been affected by the large differences in the scales of the source variables.
The distributions of the items appeared different between surveys, although when looking at the proportion of ‘misplaced’ cases, these were not overwhelmingly high. Indeed, high similarity across answer categories, as indicated by Duncan’s index, were obtained mainly for, in order, religious services attendance, current belonging, and praying. Non-substantive answers were overall low (as expected in interviewer-administered surveys), and differences were statistically significant only in a few countries. The outcomes for each analytical step for every variable were mixed, as was the case for past religious belonging that presented most inconsistencies for the distributions of the percentages of people having ever belonged to a religious denomination and less similarity between categories in terms of Duncan’s coefficients, whereas the partial correlations with the sociodemographic variables and the amount of non-substantive responses yielded the least inconsistencies.
Country patterns are mixed too. Countries like Austria, Denmark, France, Great Britain, Hungary, Italy, Serbia, and Montenegro, with more differences in the methodology displayed differences across surveys in at least one of the consistency checks. Countries with minimal methodological differences such as Slovenia, Germany, and Norway also presented a few inconsistencies (see Table 1) but to a lesser extent. Specifically, Germany, Slovenia, and Norway were the most similar in sampling frames, sample sizes, and response rates. They yielded least inconsistencies across the four selected items in the proportion of non-substantive responses, followed by the partial correlations, Duncan’s dissimilarity test, and finally differences in distributions (assessed by chi-square and Mann-Whitney tests at .05 level). The countries similar only in sampling frames and sample sizes—Estonia, Poland, and Sweden -showed least differences in the partial correlations, then equally inconsistent findings across the Duncan’s dissimilarity test and the proportion of non-substantive responses, with the worst performance in the distribution tests. Finland, Netherlands, and Switzerland were only similar in their sampling frames; they did not display any differences in the proportion of non-substantive response, whereas, starting from the least differences, they presented inconsistencies in the partial correlations, the Duncan’s dissimilarity test, and then the distribution tests. Finally, the countries that had the most variation across all the methodological benchmarks (Austria, Denmark, France, Great Britain, Hungary, Italy, Montenegro, Serbia), showed least inconsistencies in the partial correlations, followed by the differences in the non-substantive responses, the Duncan’s Dissimilarity test, and the distribution tests. In particular, distributions were more comparable across surveys in those countries than in countries with larger methodological discrepancies.
This paper examines similarities and differences between EVS and ESS items measuring the same concepts around religiosity. For this purpose, we selected four pairs of items that showed similarities in their attributes and potentially high conceptual overlap. Data from 17 European countries were included based upon several criteria: (1) data collection within 1 year, (2) similar sampling frames, (3) similar sample sizes, and (4) similar response rates. The countries were ranked starting from those that showed minimal variation to those that have cumulatively more differences around these benchmarks. Discrepancies in the measurement of religiosity items between the surveys were assessed on the basis of validity, frequency distributions and item non-response.
This study points to three main findings. First, the inconsistencies around the frequency distributions posit challenges for researchers interested in pooling data to provide estimates at the population level. This is in contrast with the results of Biolcati et al. (2020), who found evidence of consistency across survey programmes in building a harmonised measurement of church attendance. It is possible, however, that the harmonisation procedure implemented in the CARPE project, in which the ordinal variables were transformed into numeric probabilities of attending religious services at least weekly, successfully minimised measurement differences that we detected by looking at the full distributions.
Second, this paper showed overall comparable validity of the measurements across surveys, indicating a high degree of conceptual equivalence of the selected items. This finding suggests that researchers interested in correlational studies involving indicators of religiosity might confidently use a combination of the two data sources.
Third, discrepancies between surveys were larger in countries with larger differences in methodologies, as expected. While our study fails to isolate the contribution of each methodological discrepancy to the observed outcomes, it provides an indication that measurements are similar across surveys, net of methodological differences in data collection. To further explore these aspects and identify the sources of variation, large-scale experimental designs might prove fruitful.
Despite its contributions to the literature, this research presents a number of shortcomings. First, the analyses concerned a selection of variables with similar properties, limiting the possibilities to generalise to less homogenous variables and constructs. Second, the validity assessment was carried out with a limited number of correlates, and the relationships between our variables of interest and the correlates were generally low. For future research avenues, it would be helpful to expand the scope to additional variables, and to model the relationships in ways that better fit the data (e.g. by using interactions). Last, the analyses were carried out only on a single wave/round, not making full use of the longitudinal nature of the surveys. In addition to this, multiple comparisons were conducted, increasing the probability of identifying false positives (differences between the two surveys that are due to chance). We adopted this conservative approach, while also relying on effect sizes and ranges in our interpretation but recognise that differences might have been amplified by not correcting for multiple testing.
Our outcomes lead to two main future implications for researchers interested in pooling the EVS and the ESS data. On the one hand, the inconsistencies around the frequency distributions posit challenges for researchers interested in estimates at the population level. On the other hand, this paper showed a high degree of conceptual equivalence of the selected items that provides ground for comparable results in terms of association among variables.
In general, the results presented in this paper are promising and show the potential to merge EVS and ESS data and their time series. The differences between different countries within the ESS and the EVS surveys are often bigger than differences within the same countries between the ESS and the EVS surveys. This paper is a contribution to the assessment of differences between countries in longitudinal surveys like the ESS and the EVS and yield the conclusion that we are on the right path. Further standardising and harmonising questions between international comparative surveys will increase future data quality.
Availability of data and materials
The datasets analysed during the current paper are available in (1) the European Social Survey repository available at https://doi.org/10.21338/NSD-ESS9-2018 and (2) the European Values Study repository available at https://doi.org/10.4232/1.13560. https://osf.io/r3swn/?view_only=cfd7a6496aae4925a26567fbcf8c3783.
With the difference that ESS targets the resident population from 15 years old, EVS from 18 years old.
Computer-assisted personal interviewing (CAPI) is used for ESS Round 9 and self-administered mode was allowed during the fieldwork of the latest ESS Round 10 due to challenges presented by the COVID-19 pandemic, with some countries collecting data using face-to-face interviews, and others self-administered modes (web and mail).
The ESS-SUSTAIN-2–European Social Survey Sustainability–is a €4.9m joint project (grant agreement number 676166) coordinated by the European Social Survey together with other 16 partner countries/research institutes across Europe. It aims at improving research methodology and analytical power of collected data, in a more cost-effective manner, and to achieve long-term sustainability. One focus point of SUSTAIN-2 is the exploration of bridging the survey infrastructures of the EVS and the ESS. Questionnaire and item comparison undertaken and reported in this paper will possibly allow EVS items to be collected using the ESS infrastructure. For more information see https://www.europeansocialsurvey.org/about/funding.html.
ESS Round 9 was preferred over ESS Round 10 due to the period of data collection being closer to that of EVS Wave 5.
This preliminary analytical step yielded a list of 24 substantive items with a potential conceptual overlap across the two surveys.
Except for the Mann-Whitney U test, which did not support the use of weighted data.
By sampling strategy we refer to using a similar sample frame so that the same register of individuals across the countries could be used for comparison. This way differences attributed to coverage errors (Ortmanns, 2020) and design effects (design weights are only available for five EVS countries) are minimised.
Aizpurua, E., Heiden, E. O., Park, K. H., Wittrock, J., & Losch, M. E. (2018). Investigating respondent multitasking and distraction using self-reports and interviewers’ observations in a dual-frame telephone survey. Survey Methods: Insights from the Field. https://doi.org/10.13094/SMIF-2018-00006.
Biemer, P. P., Murphy, J., Zimmer, S., Berry, C., Deng, G., & Lewis, K. (2018). Using bonus monetary incentives to encourage web response in mixed-mode household surveys. Journal of Survey Statistics and Methodology, 6(2), 240–261. https://doi.org/10.1093/jssam/smx015.
Biolcati, F., Molteni, F., Quandt, M., & Vezzoni, C. (2020). Church attendance and religious change pooled European dataset (CARPE): A survey harmonization project for the comparative analysis of long-term trends in individual religiosity. Quality & Quantity, 54, 1–25. https://doi.org/10.1007/s11135-020-01048-9.
Cernat, A., & Revilla, M. (2020). Moving from face-to-face to a web panel: Impacts on measurement quality. Journal of Survey Statistics and Methodology, 9(4), 745–763. https://doi.org/10.1093/jssam/smaa007.
Dubrow, J. K., & Tomescu-Dubrow, I. (2016). The rise of cross-national survey data harmonization in the social sciences: Emergence of an interdisciplinary methodological field. Quality & Quantity, 50(4), 1449–1467. https://doi.org/10.1007/s11135-015-0215-z.
ESS Round 9: European Social Survey Round 9 Data (2018). Data file edition 3.1. NSD - Norwegian Centre for Research Data, Norway – Data Archive and distributor of ESS data for ESS ERIC. https://doi.org/10.21338/NSD-ESS9-2018.
European Values Study (EVS) (2020a). European Values Study (EVS) 2017: Weighting data (2020/15; GESIS papers). https://doi.org/10.21241/ssoar.70113.
European Values Study (EVS) (2020b). European values study 2017: Integrated dataset (EVS 2017). GESIS Data Archive, Cologne. ZA7500 data file version 4.0.0. https://doi.org/10.4232/1.13560.
Halman, L., & Draulans, V. (2006). How secular is Europe? The British Journal of Sociology, 57(2), 263–288. https://doi.org/10.1111/j.1468-4446.2006.00109.x.
Kaminska, O. (2020). Guide to using weights and sample design indicators with ESS data contents. https://www.europeansocialsurvey.org/docs/methodology/ESS_weighting_data_1_1.pdf
Lemos, C. M., Gore, R. J., Puga-Gonzalez, I., & Shults, F. L. (2019). Dimensionality and factorial invariance of religiosity among Christians and the religiously unaffiliated: A cross-cultural analysis based on the International Social Survey Programme. PLoS One, 14(5), e0216352. https://doi.org/10.1371/journal.pone.0216352.
Luijkx, R., Jónsdóttir, G. A., Gummer, T., Ernst Stähli, M., Frederiksen, M., Ketola, K., … Wolf, C. (2021). The European values study 2017: On the way to the future using mixed-modes. European Sociological Review, 37(2), 330–346. https://doi.org/10.1093/esr/jcaa049.
Molteni, F., & Biolcati, F. (2018). Shifts in religiosity across cohorts in Europe: A multilevel and multidimensional analysis based on the European Values Study. Social Compass, 65(3), 413–432. https://doi.org/10.1177/0037768618772969.
Ortmanns, V. (2020). Explaining inconsistencies in the education distributions of ten cross-national surveys-the role of methodological survey characteristics. Journal of Official Statistics, 36(2), 379–409. https://doi.org/10.2478/jos-2020-0020.
Ruiter, S., & van Tubergen, F. (2009). Religious attendance in cross-national perspective: A multilevel analysis of 60 countries. American Journal of Sociology, 115(3), 863–895. https://doi.org/10.1086/603536.
Saris, W. E., & Gallhofer, I. N. (2014). Design, evaluation, and analysis of questionnaires for survey research, (2nd ed., ). Wiley.
Schwadel, P. (2011). Age, period, and cohort effects on religious activities and beliefs. Social Science Research, 40(1), 181–192. https://doi.org/10.1016/j.ssresearch.2010.09.006.
Schwadel, P. (2015). Explaining cross-national variation in the effect of higher education on religiosity. Journal for the Scientific Study of Religion, 54(2), 402–418. https://doi.org/10.1111/jssr.12187.
Tomescu-Dubrow, I., & Slomczynski, K.M. (2014). Democratic values and protest behavior: Data harmonization, measurement comparability, and multi-level modeling in cross-national perspective. Ask: Research and Methods, 23(1), 103–114. https://kb.osu.edu/bitstream/handle/1811/69606/1/ASK_2014_103_114.pdf
Tomescu-Dubrow, I., & Slomczynski, K. M. (2016). Harmonization of cross-national survey projects on political behavior: Developing the analytic framework of survey data recycling. International Journal of Sociology, 46(1), 58–72. https://doi.org/10.1080/00207659.2016.1130424.
Vezzoni, C., & Biolcati-Rinaldi, F. (2015). Church attendance and religious change in Italy, 1968–2010: A multilevel analysis of pooled datasets. Journal for the Scientific Study of Religion, 54(1), 100–118. https://doi.org/10.1111/jssr.12173.
Voas, D. (2009). The rise and fall of fuzzy fidelity in Europe. European Sociological Review, 25(2), 155–168. https://doi.org/10.1093/esr/jcn044.
Wolf, C., Schneider, S. L., Behr, D., & Joye, D. (2016). Harmonizing survey questions between cultures and over time. In C. Wolf, D. Joye, T. Smith, & Y. Fu (Eds.), The SAGE handbook of survey methodology, (pp. 502–524). SAGE Publications Ltd. https://dx.doi.org/10.4135/9781473957893.n33.
We would like to thank for their valuable feedback the members of the EVS Standing Group, the ESS Methods and Scientific Advisory Board, Ranjit K. Singh (GESIS), as well as the anonymous reviewers.
ESS-SUSTAIN-II project (grant number 871063).
Ethics approval and consent to participate
Data management and General Data Protection Regulation approved (TSB_RP182).
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Comparison of EVS Wave 5 and ESS Round 10 items for belonging to a religious denomination.
Comparison of EVS Wave 5 and ESS Round 10 items for having belonged to a religious denomination.
Comparison of EVS Wave 5 and ESS Round 10 items for attendance at religious services.
Comparison of EVS Wave 5 and ESS Round 10 items for praying frequency.
List of countries
Germany (DE), Slovenia (SI), Norway (NO), Estonia (EE), Sweden (SE), Poland (PL), Netherlands (NL), Switzerland (CH), Finland (FI), Great Britain (GB), Serbia (RS), Italy (IT), Hungary (HU), France (FR), Austria (AT), Denmark (DK), Montenegro (ME).
About this article
Cite this article
Aizpurua, E., Fitzgerald, R., de Barros, J.F. et al. Exploring the feasibility of ex-post harmonisation of religiosity items from the European Social Survey and the European Values Study. Meas Instrum Soc Sci 4, 12 (2022). https://doi.org/10.1186/s42409-022-00038-x