2 November 2023 12:00 am
In Sri Lanka, it is common for the validity of sample surveys to be questioned, on the basis that the samples are “small” and cannot reasonably reflect the characteristics of the entire national population.
Examples of skeptical perceptions abound. A prominent Sinhala media personality on a popular TV channel questioned the validity of the findings of the Centre for Policy Alternatives’ (CPA) Economic Reform Index, stating “The sample size for this survey is 1,000…… it is wrong if the mindset of 22 million people is gauged by using 1,000 people”.
Similarly, when Verité Research shares the results of its quarterly Mood of the Nation poll, some question how a sample size of approximately 1,000 people can truly reflect the ‘mood of the nation’.
Sri Lanka has an adult population of approximately 14 million. It is generally not feasible to speak to every adult citizen to learn about important characteristics of the population, such as the unemployment rate or sentiments on the government. A census of the population is carried out not more than once in 10 years due to the vast amount of time and human resources required.
This makes sample surveys a useful tool. The statistical principle behind a sample survey is that a reasonable estimate of the characteristics of an entire population can be obtained by looking at the characteristics of a much smaller, randomly selected sample of that population.
But how large should a randomised sample survey be, to have a reasonable estimate of the characteristics of the entire population? That is a question best answered by statistical mathematics, not subjective perception. This FactCheck.lk Explainer sets out the objectively defined statistical approach to provide a scientific answer to the question.
How do statisticians define a reasonable estimation from a sample survey?
Statisticians use two criteria to mathematically determine the level of reasonableness of an estimation derived from a sample survey: (1) margin of error and (2) confidence level.
These two criteria depend on the sample size and how the sample is chosen (sample selection). Assessing whether these criteria are met allows for an objective evaluation of the survey’s reliability.
(1) The margin of error
This is a measurement of how much the average answer to a particular question in the survey sample may differ from the average answer of the entire population. The smaller the margin of error, the closer the sample result is to that of the entire population.
For example, suppose a survey shows that 50% of the sample supports a certain political candidate, with a margin of error of (+/-) 5 percentage points. This means the actual percentage of people in the population who support that candidate is expected to be anywhere between 45% and 55%. If the margin of error was only (+/-) 3 percentage points, then the expected range of support for the candidate in the population would be narrower, between 47% and 53%.
(2) The confidence level
This is a measurement of how confident we can be that the result for the entire population lies within the “results range” we get from the sample.
The “results range” is the range from the sample average minus the margin of error, to the sample average plus the margin of error. The confidence level is the probability that the selected sample will give a result that is within the margin of error of the result one would get from the entire population.
For example, suppose the above survey, with a margin of error of (+/-) 3 percentage points, has a confidence level of 95%. That means that if the survey were conducted 100 times using the same sample size and selection method, about 95 of the 100 surveys would be expected to produce results within (+/-) 3 percentage points of the result for the entire population. Put loosely, we can be 95% confident that the actual support for the candidate in the above example is between 47% and 53%.
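The "repeat the survey many times" idea can be tried out in a small simulation. The sketch below is only an illustration: it assumes a made-up population in which exactly 50% support the candidate, a simple random sample of 1,067 respondents (the size the standard formula gives for 3 points at 95%), and 2,000 repeated surveys. The function name and all parameter values are hypothetical.

```python
import random


def simulate_coverage(true_share=0.50, sample_size=1067, margin=0.03,
                      repetitions=2000, seed=1):
    """Return the fraction of simulated surveys whose estimate lands
    within `margin` of the true population share."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # Each respondent supports the candidate with probability true_share.
        supporters = sum(rng.random() < true_share for _ in range(sample_size))
        estimate = supporters / sample_size
        if abs(estimate - true_share) <= margin:
            hits += 1
    return hits / repetitions


coverage = simulate_coverage()
print(f"{coverage:.1%} of simulated surveys fell within 3 points of the true value")
```

Under these assumptions the printed share should come out close to 95%, which is what a 95% confidence level describes.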
What is the statistical standard for margin of error and confidence level?
Both the margin of error and the confidence level are ways of specifying how close the survey results are likely to be to the actual results of the entire population. Low margins of error and high confidence levels make survey results more likely to be close to those of the entire population. If Sri Lanka’s entire population were surveyed, the margin of error would be zero and the confidence level would be 100%. More generally, the larger the randomly selected sample, the lower the margin of error and/or the higher the confidence level.
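How quickly the margin of error shrinks as the sample grows can be shown with the textbook formula for a simple random sample. The sketch below assumes a 95% confidence level (z-score of 1.96) and the most cautious case, where opinion is split 50/50; the function name and the list of sample sizes are illustrative choices, not taken from any particular survey.

```python
import math

Z_95 = 1.96  # z-score corresponding to a 95% confidence level


def margin_of_error(sample_size, share=0.5, z=Z_95):
    """Margin of error, as a proportion, for a simple random sample.

    share=0.5 is the most cautious assumption: it gives the widest margin.
    """
    return z * math.sqrt(share * (1 - share) / sample_size)


for n in (100, 500, 1000, 2400, 10000):
    print(f"n = {n:>6}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

The margin falls from roughly 9.8 percentage points with 100 respondents to about 3.1 with 1,000, after which each additional respondent buys progressively less accuracy.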
In this context, what is an acceptable survey sample in terms of these two metrics? Sample surveys around the world, especially in relation to opinion polling and understanding population characteristics, have tended to carry a margin of error of (+/-) 3 percentage points and a confidence level of 95%.
Whether a survey achieves a particular statistical standard in terms of the margin of error and the confidence level depends on both the size of the sample and the method by which the sample is chosen.
Method of Random Selection
The margin of error and confidence level calculations are based on the sample being randomly selected from the population. Random selection techniques provide everyone in the population an approximately equal chance of being selected, and form samples that are likely to be a representative microcosm of the entire population.
What happens if this expectation of random selection is violated and there is a bias towards some group in the population in terms of sample selection? The results of the sample survey will also be biased towards the results of that group, which could be different from those of the entire population. For instance, if the survey was only conducted online, the sample would be biased towards those who have the capacity (devices and data) to access the internet. If only those able to speak English were selected, the sample results may be less representative of Sri Lanka’s entire population, which mostly speaks Sinhala or Tamil.
There are many methods to implement random selection. One method is a simple random sample: to randomly pick people from the entire population, like drawing a lottery ticket.
Another is to have a stratified random sample: to begin by stratifying the population (dividing the entire population into diverse non-overlapping groups), and then picking a simple random sample from each group. Often, in national sample surveys, a multi-stage stratified random sample can help ensure a more representative random selection.
For example, in Sri Lanka, the first stage of a multi-stage stratified random sample could consider the population as being divided into 25 districts. The second stage could be to select from within each district a certain number of Grama Niladhari (GN) divisions proportionate to the district populations (more GN divisions from districts with larger populations). The final stage could be to select a certain number of people from each GN division, proportionate to the GN division populations (more people from GN divisions with larger populations).
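The proportionate allocation described above can be sketched in a few lines. This is a simplified, single-stage illustration rather than a full multi-stage design: the three "districts", their population lists and the total sample of 100 are all made up, and the function names are hypothetical.

```python
import random


def allocate_proportionally(populations, total_sample):
    """Split total_sample across groups in proportion to their populations,
    using largest-remainder rounding so the parts add up exactly."""
    total_pop = sum(populations.values())
    exact = {name: total_sample * pop / total_pop
             for name, pop in populations.items()}
    allocation = {name: int(value) for name, value in exact.items()}
    leftover = total_sample - sum(allocation.values())
    # Give any remaining places to the groups with the largest fractions.
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


def draw_stratified_sample(frames, total_sample, seed=0):
    """Draw a simple random sample from each group, sized by population."""
    rng = random.Random(seed)
    populations = {name: len(people) for name, people in frames.items()}
    allocation = allocate_proportionally(populations, total_sample)
    return {name: rng.sample(frames[name], allocation[name]) for name in frames}


# Hypothetical lists of residents; the names and sizes are illustrative only.
frames = {
    "District A": [f"A-{i}" for i in range(6000)],
    "District B": [f"B-{i}" for i in range(3000)],
    "District C": [f"C-{i}" for i in range(1000)],
}
sample = draw_stratified_sample(frames, total_sample=100)
for name, people in sample.items():
    print(name, len(people))
```

In a multi-stage design the same allocation step would be repeated at each level, first choosing GN divisions within districts and then people within GN divisions.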
Minimum acceptable size of a randomly selected sample
The above is a basic explanation of sample surveys, to help readers objectively calculate the sample size that would be adequate for statistically valid results. It is possible to calculate the required sample size using the relevant mathematical equations; free online calculators can generate the result as well.
Sri Lanka has an adult population of c.14 million persons and c.5.7 million households. Both the equation and calculators, when applied, provide the following result for Sri Lanka: on a nationwide survey, a 95% confidence level and an error margin of 3 percentage points are achieved with a randomly selected sample of c. 1,000.
Thus, the claim that very large sample sizes are necessary to achieve reasonably accurate survey results is a popular misconception. Statistical science and maths lead us to an objective conclusion, that c. 1,000 is enough. Larger samples can be used, but the added benefit can be small relative to the added cost. For instance, more than doubling the sample to c. 2,400 will only reduce the maximum error margin from 3 to 2 percentage points. You can use this online calculator to test it for yourself: https://www.surveymonkey.com/mp/sample-size-calculator/.
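The same check can be sketched in code using the commonly taught sample-size formula for a proportion, with a finite population correction. The sketch assumes a simple random sample, a 95% confidence level (z-score of 1.96) and the most cautious 50/50 split of opinion; the function name is hypothetical, and online calculators may round slightly differently.

```python
import math

Z_95 = 1.96  # z-score corresponding to a 95% confidence level


def required_sample_size(margin, population=None, share=0.5, z=Z_95):
    """Smallest simple random sample that achieves the given margin of error.

    margin is a proportion (0.03 means 3 percentage points). If population
    is given, the finite population correction is applied.
    """
    n = (z ** 2) * share * (1 - share) / margin ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


ADULT_POPULATION = 14_000_000  # approximate figure quoted in the article

for margin in (0.03, 0.02):
    n = required_sample_size(margin, ADULT_POPULATION)
    print(f"+/- {margin * 100:.0f} percentage points: sample of {n:,}")
```

Under these assumptions the formula gives 1,068 respondents for 3 percentage points and 2,401 for 2 percentage points, in line with the c. 1,000 and c. 2,400 figures above. Changing the population from 14 million adults to 5.7 million households barely moves the result, which is why a population of millions does not call for a sample of millions.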
FactCheck.lk is a platform run by Verité Research.