How many times have you read about the results of studies that were claimed to be or not be statistically significant? How many times have you read that the frequency of some occurrence was found or not found to be statistically significantly above chance? It is as if the phrase “statistically significant” were considered some sort of seal of legitimacy. If it is statistically significant, it must be true, no? And if it is not statistically significant, it must be false, right?
The truth, of course, is more complex than that. Let’s address one issue that invalidates many studies regardless of whether their results are statistically significant or not: sample size.
Studies conducted with a small sample size, if not carefully designed, produce highly variable results. A very big red flag is when the results of a small study are unexpected. Studies with a small sample size can produce false results either in favor of or against the premise that they are addressing, although most of the time they fail to produce statistically significant results for an effect, even if the effect is real. This sample size issue is often not reported when the study is presented in the news or, if reported, it normally goes something like this:
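To make the point about small studies missing real effects concrete, here is a minimal sketch using an exact binomial calculation. The scenario is hypothetical and not taken from any real study: subjects guess a coin toss, chance is 50%, the true success rate is assumed to be 60%, and the test is one-sided at the conventional 5% level. The function names are invented for this illustration.

```python
from math import comb

def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def power_of_study(n, true_rate, chance=0.5, alpha=0.05):
    """Probability that a one-sided exact binomial test reaches significance.

    Assumes the real success rate is `true_rate` (a hypothetical value).
    """
    # Smallest success count whose upper-tail probability under pure chance
    # is at most alpha; any count at or above it is declared significant.
    threshold = next(k for k in range(n + 2) if upper_tail(n, k, chance) <= alpha)
    # Probability of reaching that count when the effect is real.
    return upper_tail(n, threshold, true_rate)

for n in (20, 200):
    print(n, round(power_of_study(n, true_rate=0.6), 3))
```

Under these assumptions, a study with 20 trials detects the real 60% effect only about one time in eight, while a study with 200 trials detects it most of the time.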
The authors caution that this was a study with a small sample size but (there is always a “but”) if this result can be repeated in a larger study then…
On the one hand, the authors acknowledge that the results of the study may be bogus due to the small sample size, but on the other hand the authors nevertheless find the results worthy enough to publicize or to acquiesce to the desire of journalists to publicize them!
So, are all studies with a small sample size bad, and should they be avoided?
Not necessarily. Sometimes scientists perform small studies intended to give the issue being tested a “look-see”. Sometimes studies with small sample sizes are all that is possible due to budgetary constraints or other constraints such as working with a rare disease. Scientists also perform small studies to gain experience in order to later design larger studies. But there is another reason a study with a small sample size may be justified. Consider the following statistical joke:
A scientist asks a group of 3 students to concentrate on teleporting through a solid wall. Of these 3 students, one manages to accomplish this feat. Excitedly, the scientist informs a colleague of the results of the experiment, but his colleague dismisses it outright, arguing that with a sample size of 3 you cannot possibly obtain any statistically significant results.
The premise of this joke is that one person achieving such a feat would be enough. No further sample (or even statistics for that matter) is required! This is intended to illustrate that the nature of the effect being evaluated is important. If scientists are only interested in detecting a very large effect, then a study with a small sample size may be totally justified.
But you may argue that a study with a large sample size is always more desirable, no? After all, you can’t go wrong if you perform a study with a large sample size, right?
Not only are studies with large sample sizes more expensive, time-consuming, and difficult to implement, but there is a seldom-discussed and surprising downside to large sample sizes, illustrated in the following example:
A researcher spends several years performing trials to evaluate whether human subjects can mentally affect the outcome of the toss of a coin as simulated by a computer. After accumulating and analyzing hundreds of thousands of trials, the researcher finds that human subjects can influence the computer mentally to produce heads as opposed to tails with a frequency of 50.0001%, and that this effect is statistically significantly above mere chance.
The problem outlined in the example above is that with a very large sample size you can detect almost anything, including random background fluctuations or equipment calibration imperfections, as statistically significant. An effect that small, whether statistically significant or not, may be meaningless and devoid of any practical significance.
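The dependence of statistical significance on sample size can be sketched with a one-sided z-test for a proportion, using the normal approximation. The numbers are hypothetical and chosen only for illustration: the observed heads rate is held fixed at 50.1% while the number of trials grows. The function name is invented for this sketch.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_p_value(observed_rate, n, chance=0.5):
    """p-value of a one-sided z-test that the true rate exceeds `chance`.

    Uses the normal approximation to the binomial, which is reasonable
    for the large n values used here.
    """
    standard_error = sqrt(chance * (1 - chance) / n)
    z = (observed_rate - chance) / standard_error
    return 1 - NormalDist().cdf(z)

# Same tiny observed effect (50.1% heads), increasingly large samples.
for n in (1_000, 100_000, 10_000_000):
    print(f"n = {n:>10,}  p-value = {one_sided_p_value(0.501, n):.3g}")
```

The observed effect never changes, yet the p-value goes from nowhere near significant at a thousand trials to overwhelmingly significant at ten million. The sample size, not the importance of the effect, is what moved.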
The dirty little secret of statistical analysis is that statistical significance cannot replace good judgement. Before a study, you have to ask what is the size of the effect that you want to detect and whether it would be of importance to detect an effect of that magnitude, if indeed it is present. These questions will lead to determining the correct sample size for the study, and in fact whether the study should be performed at all!
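As a rough sketch of how the chosen effect size drives the sample size, the standard normal-approximation formula for a one-sided test of a proportion against chance can be coded as follows. The 5% significance level and 80% power are common conventions assumed here, not values from this post, and the function name is invented for the illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

def required_sample_size(true_rate, chance=0.5, alpha=0.05, power=0.8):
    """Approximate number of trials needed to detect `true_rate` vs `chance`.

    One-sided test, normal approximation. `alpha` and `power` are
    conventional defaults assumed for this sketch.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value under chance
    z_power = NormalDist().inv_cdf(power)      # quantile for the desired power
    numerator = (z_alpha * sqrt(chance * (1 - chance))
                 + z_power * sqrt(true_rate * (1 - true_rate)))
    return ceil((numerator / (true_rate - chance)) ** 2)

# Hypothetical effect sizes: the smaller the effect, the larger the study.
for rate in (0.6, 0.51, 0.501):
    print(f"true rate {rate:.3f}: about {required_sample_size(rate):,} trials")
```

Each tenfold reduction in the effect requires roughly a hundredfold increase in the number of trials, which is why deciding beforehand what size of effect is worth detecting matters so much.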
To recap and answer the question posed in the title of this post: in statistics, whether sample size matters or not depends on the magnitude and importance of the effect being evaluated.
1) The detection of an important small effect may require a study with a large sample size.
2) The detection of an important large effect may be achieved with a carefully designed study employing a small sample size.
3) The detection of a small effect with a large sample size may be irrelevant if the effect is not important, regardless of statistical significance.
4) The detection of an effect with a small sample size in a study not carefully designed is likely to be a happenstance occurrence, regardless of statistical significance.
So next time you hear about whether something was statistically significant, inquire about sample size.
Image by Nick Youngson used here under an Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.