ecological fallacy, in epidemiology, failure in reasoning that arises when an inference is made about an individual based on aggregate data for a group. In ecological studies (observational studies of relationships between risk-modifying factors and health or other outcomes in populations), the aggregation of data results in the loss or concealment of certain details of information. Statistically, a correlation tends to be larger when an association is assessed at the group level than when it is assessed at the individual level. Because aggregate data sets conceal variation among the individuals within each group, a strong group-level association does not imply an equally strong association, or even one in the same direction, for individuals. There are a variety of examples of ecological fallacy; three are described in this article.
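The statistical point that group-level correlations tend to exceed individual-level ones can be illustrated with a small sketch. The numbers below are entirely hypothetical and were chosen so that individual deviations cancel within each group; `pearson` is a helper defined here for illustration, not part of any cited study.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: four groups centred on 1, 2, 3 and 4.
# Each individual departs from the group centre by +/-1 in exposure
# and +/-2 in outcome; the departures average to zero within a group.
centres = [1, 2, 3, 4]
deviations = [(-1, 2), (1, -2), (-1, -2), (1, 2)]

groups = {c: [(c + dx, c + dy) for dx, dy in deviations] for c in centres}

# Individual-level correlation: every person is one observation.
individuals = [person for members in groups.values() for person in members]
individual_r = pearson([x for x, _ in individuals], [y for _, y in individuals])

# Group-level correlation: each group is reduced to its two means.
mean_exposure = [sum(x for x, _ in m) / len(m) for m in groups.values()]
mean_outcome = [sum(y for _, y in m) / len(m) for m in groups.values()]
group_r = pearson(mean_exposure, mean_outcome)

print(f"individual-level r = {individual_r:.2f}")
print(f"group-level r      = {group_r:.2f}")
```

In this constructed case, averaging removes all within-group scatter, so the group means fall on a straight line while the individual-level correlation is only about 0.36.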
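In the first example, researchers want to study relationships between nativity (represented by the percentage of the population who are foreign-born) and literacy (represented by the percentage of the population who are literate), with calculations based on populations in various U.S. states. In such an investigation, the state-level correlation could misrepresent the relationship for individuals if foreign-born individuals tend to live in states where the native-born are more literate. The percentage foreign-born and the percentage literate would then rise together across states even if foreign-born individuals themselves were no more literate, or were less literate, than the native-born.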
In another example, in a study designed to examine relationships between diet, lifestyle, heart disease, and stroke, researchers found that the mean entry-level blood pressures and stroke mortality rates were inversely correlated for certain cohorts (study groups) of men aged 45 to 59 with 25-year follow-up. The finding was contrary to expectations. Subsequent analyses carried out at the individual level showed that the association between blood pressure and stroke mortality was strongly positive in most of the study groups. The explanation of this paradox is that within each cohort, individuals who died from a stroke tended to have had high blood pressure. However, when the individual values in each cohort were averaged and the averages were used to calculate the correlation, cohorts with higher average blood pressures may have turned out to have lower mortality rates simply because of the heterogeneity of correlations among the cohorts.
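The reversal described above, a positive association within every cohort alongside a negative association between cohort averages, can be reproduced with a minimal sketch. The three cohorts and all values below are invented for illustration and are not the study's data; `pearson` is a helper defined here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical cohorts: (blood pressure, died of stroke: 1 = yes, 0 = no).
# Within each cohort, deaths occur among the men with the highest pressures,
# but the cohort with the highest average pressure has the fewest deaths.
cohorts = {
    "A": [(120, 0), (125, 0), (130, 1), (135, 1), (140, 1), (145, 1)],
    "B": [(130, 0), (135, 0), (140, 0), (145, 1), (150, 1), (155, 1)],
    "C": [(140, 0), (145, 0), (150, 0), (155, 0), (160, 1), (165, 1)],
}

# Individual-level analysis, carried out separately within each cohort.
within_r = {
    name: pearson([bp for bp, _ in men], [died for _, died in men])
    for name, men in cohorts.items()
}

# Ecological analysis: one point per cohort (mean pressure, mortality rate).
mean_bp = [sum(bp for bp, _ in men) / len(men) for men in cohorts.values()]
mortality = [sum(died for _, died in men) / len(men) for men in cohorts.values()]
group_r = pearson(mean_bp, mortality)

for name, r in within_r.items():
    print(f"cohort {name}: within-cohort r = {r:+.2f}")
print(f"between-cohort r = {group_r:+.2f}")
```

Each within-cohort coefficient is positive, while the coefficient computed from the three cohort averages is negative, so an inference drawn from the averages alone would point in the wrong direction for individuals.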
In a third example, researchers found that death rates from breast cancer were significantly higher in countries where fat consumption was high than in countries where fat consumption was low. This is an association for aggregate data in which the unit of observation is the country. It does not follow that, within countries with more fat in the diet and higher rates of breast cancer, women who eat fatty foods are more likely to get breast cancer. The aggregate data cannot show whether the women who developed breast cancer were the ones with high fat intakes.
In order to determine whether ecological hypotheses generated by group-level analyses are true for individuals, individual-level data must be collected. For causal inference, individual data are required to account for population heterogeneity and confounding bias.