sampling error, in statistics, the difference between a true population parameter and an estimate of the parameter generated from a sample. Sampling error arises because samples contain only a fraction of the values in a population and are thus not perfectly representative of the entire set. The magnitude of sampling error is affected by numerous factors, including sample size and the population’s variability, and it can be quantified by measures such as the margin of error. However, sampling error does not encompass all error in an estimate and is distinct from non-sampling error, which is caused by methodological issues or other biases.
Although the true parameters of a population, such as the mean or other descriptive measures, are often unknown, these values can be inferred by using estimates derived from a sample of the population’s values. While steps can be taken to prevent unrepresentative samples, these estimates will ultimately differ from the parameter’s true value, as a sample does not contain exactly the same values as the population. The difference between the true population parameter and the estimate generated from a sample is known as the sampling error. Because the composition of a sample always differs from that of the overall population, sampling error exists in any value estimated from a sample. However, sampling error does not occur in cases where an entire population is included. For example, there is no sampling error in a census of the entire population, as any parameter calculated is the true population parameter.
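The definition can be illustrated with a minimal Python sketch. The population here is hypothetical (10,000 simulated values), and the variable names are chosen only for illustration; the sampling error is simply the sample estimate minus the true parameter, and it vanishes when the whole population is measured.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 values (an assumption for illustration).
population = [random.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.fmean(population)  # the population parameter

# A sample holds only a fraction of the values, so its mean differs.
sample = random.sample(population, 100)
sampling_error = statistics.fmean(sample) - true_mean

# A census includes every value, so the estimate equals the parameter.
census_error = statistics.fmean(population) - true_mean

print(f"sampling error with n=100: {sampling_error:+.3f}")
print(f"error in a census:         {census_error:+.3f}")
```

In practice the true mean is unknown, which is why the sampling error itself usually cannot be computed and must instead be characterized by other measures.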
The extent of sampling error in a given study depends on a number of factors, including sample size. The size of a sample has an inverse relationship with sampling error: a larger sample tends to produce a smaller sampling error because it contains a larger proportion of the population. As the sample size approaches the size of the population, the sampling error approaches zero. In addition, the variability of the values in the population affects the size of the sampling error. The higher the variation in a population, the more likely it is that a sample will also vary widely and be unrepresentative of the overall population.
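Both effects can be seen in a small simulation. The following is a sketch under stated assumptions: the two populations are simulated, and the helper `mean_abs_error` is a hypothetical name that averages the absolute sampling error of the sample mean over repeated draws.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

def mean_abs_error(population, n, trials=200):
    """Average absolute sampling error of the sample mean for samples of size n."""
    true_mean = statistics.fmean(population)
    errors = []
    for _ in range(trials):
        sample = random.sample(population, n)  # drawn without replacement
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)

# Hypothetical populations with the same center but different variability.
narrow = [random.gauss(100, 5) for _ in range(5_000)]
wide = [random.gauss(100, 30) for _ in range(5_000)]

small_n = mean_abs_error(wide, 25)           # small sample, variable population
large_n = mean_abs_error(wide, 400)          # larger sample, same population
census = mean_abs_error(wide, len(wide), 3)  # sample equals the population
low_variability = mean_abs_error(narrow, 25) # small sample, less variable population

print(f"n=25,  wide population:   {small_n:.3f}")
print(f"n=400, wide population:   {large_n:.3f}")
print(f"n=N,   wide population:   {census:.3f}")
print(f"n=25,  narrow population: {low_variability:.3f}")
```

The average error shrinks as the sample grows, reaches zero when the sample is the whole population, and is smaller for the less variable population at the same sample size.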
Furthermore, different sampling methods can have varied effects on the size of sampling error. Stratified sampling can reduce the impact of highly variable populations by dividing the set into homogeneous classes and randomly sampling from these subgroups, leading to a reduced sampling error. By contrast, cluster sampling, in which the population is divided into conveniently occurring clusters from which the sample is drawn, can lead to samples that do not evenly cover the population, resulting in a larger sampling error.
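The contrast between these methods can be sketched with a simulation. Everything here is an assumption made for illustration: the population consists of two simulated strata with very different means, the stratified design uses proportional allocation, and the clusters are deliberately extreme in that each one falls entirely within a single stratum. Real clusters are usually less uniform, so the gap shown is larger than would typically be observed.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical population made of two internally homogeneous strata.
stratum_a = [random.gauss(20, 2) for _ in range(3_000)]
stratum_b = [random.gauss(80, 2) for _ in range(7_000)]
population = stratum_a + stratum_b
true_mean = statistics.fmean(population)

# Clusters of 50 consecutive values; each lies wholly inside one stratum.
CLUSTER_SIZE = 50
clusters = [population[i:i + CLUSTER_SIZE]
            for i in range(0, len(population), CLUSTER_SIZE)]

def simple_random_estimate(n=100):
    return statistics.fmean(random.sample(population, n))

def stratified_estimate(n=100):
    # Proportional allocation: each stratum is sampled in proportion to its size.
    n_a = round(n * len(stratum_a) / len(population))
    n_b = n - n_a
    sample = random.sample(stratum_a, n_a) + random.sample(stratum_b, n_b)
    return statistics.fmean(sample)

def cluster_estimate(n=100):
    # Pick whole clusters at random and use every value inside them.
    chosen = random.sample(clusters, n // CLUSTER_SIZE)
    return statistics.fmean([x for cluster in chosen for x in cluster])

def mean_abs_error(estimator, trials=500):
    """Average absolute sampling error of an estimator over repeated draws."""
    return statistics.fmean([abs(estimator() - true_mean) for _ in range(trials)])

srs_err = mean_abs_error(simple_random_estimate)
stratified_err = mean_abs_error(stratified_estimate)
cluster_err = mean_abs_error(cluster_estimate)

print(f"simple random: {srs_err:.3f}")
print(f"stratified:    {stratified_err:.3f}")
print(f"cluster:       {cluster_err:.3f}")
```

All three designs use 100 values per sample, so the differences in average error come from how the values are selected rather than from how many are drawn.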
The magnitude of sampling error is commonly conveyed by standard error, a measure analogous to the standard deviation of an estimate across all possible samples of the same size. Standard error provides a quantitative measure of how far an estimate is expected to differ from the true population parameter. To calculate an estimate of the standard error of a sample mean, the sample’s standard deviation is divided by the square root of the sample size. Like sampling error, standard error is affected by the size of the sample drawn and the variability of its values. Standard error is used to calculate the margin of error, another measure used to quantify sampling error. Margins of error provide a range of values denoting an interval that the true parameter could fall within. A margin of error does not provide a fixed value of the sampling error; instead, its magnitude depends on a predetermined confidence level indicating the likelihood that the given range captures the true population parameter.
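The calculation described above can be written out in a few lines of Python. This is a minimal sketch: the sample is simulated, the function names are hypothetical, and the margin of error uses a normal critical value (about 1.96 at the 95 percent level), which is an assumption that works best for reasonably large samples; small samples are often handled with a t-distribution instead.

```python
import math
import random
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: sample SD / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def margin_of_error(sample, confidence=0.95):
    """Margin of error for the mean using a normal critical value (an assumption)."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * standard_error(sample)

random.seed(3)  # fixed seed so the sketch is repeatable
sample = [random.gauss(170, 8) for _ in range(64)]  # hypothetical measurements

mean = statistics.fmean(sample)
se = standard_error(sample)
moe_95 = margin_of_error(sample, 0.95)
moe_99 = margin_of_error(sample, 0.99)

print(f"mean = {mean:.2f}, standard error = {se:.2f}")
print(f"95% interval: {mean - moe_95:.2f} to {mean + moe_95:.2f}")
print(f"99% interval: {mean - moe_99:.2f} to {mean + moe_99:.2f}")
```

Raising the confidence level widens the interval for the same sample, which is why a margin of error is meaningful only alongside its confidence level.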
While sampling error exists in any case where a sample is used to estimate a parameter, it does not encompass all error that can be present in the process. Non-sampling error refers to any error unrelated to sampling. These sources of inaccuracy fall into one of two categories: systematic errors, which bias results in one direction, and variable errors, which randomly distort results but typically balance out in the long run. Like its sampling counterpart, non-sampling error reduces the accuracy of an estimated parameter; however, these sources of error are not attributable to differences across samples and cannot be reduced by increasing the sample size.
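The last point can be illustrated with a short simulation. The setup is hypothetical: a simulated population is measured with an instrument that adds a constant `BIAS` of 2 units to every recorded value, standing in for a systematic error. A larger sample shrinks the sampling error but leaves the bias in place.

```python
import random
import statistics

random.seed(4)  # fixed seed so the sketch is repeatable

# Hypothetical population and a hypothetical systematic measurement error.
population = [random.gauss(50, 10) for _ in range(20_000)]
true_mean = statistics.fmean(population)
BIAS = 2.0  # e.g. a miscalibrated instrument that reads 2 units too high

def estimate_error(n, bias=0.0):
    """Error of a sample mean when every recorded value is shifted by `bias`."""
    sample = random.sample(population, n)
    recorded = [x + bias for x in sample]
    return statistics.fmean(recorded) - true_mean

unbiased_large = estimate_error(5_000)           # sampling error only
biased_small = estimate_error(50, bias=BIAS)     # sampling error plus bias
biased_large = estimate_error(5_000, bias=BIAS)  # bias remains despite large n

print(f"n=5000, no bias:   {unbiased_large:+.3f}")
print(f"n=50,   with bias: {biased_small:+.3f}")
print(f"n=5000, with bias: {biased_large:+.3f}")
```

With the large sample, the unbiased estimate lands close to the true mean, while the biased estimate settles near the size of the bias rather than near zero.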
Non-sampling error can result from a variety of causes. Coverage error—which includes omissions, duplications, or misclassifications of sample units—can lead to distorted results. Additionally, selection bias can add to non-sampling error by creating a nonrandom sample that is consequently unrepresentative of the population. Other sources of non-sampling error include methodological flaws, data-processing mistakes, and misinterpretations of results.