Statistical power is the probability of detecting an effect when that effect genuinely exists in the population. Other things being equal, a test based on a large sample has more statistical power than a test involving a small sample. There are also ways to increase power without increasing the sample size. Most published studies have low statistical power, which can lead to serious misinterpretation of results.
Last updated: Aug 9, 2022
Understanding statistical power requires some prior knowledge of descriptive and inferential statistics.
Statistical power (SP) is expressed in 3 different ways:
Fewer than 13% of 31,873 clinical trials published between 1974 and 2017 had adequate SP. Low SP makes a study's results questionable and poses potentially serious problems, including:
Studies with too much SP, also called “overpowered studies,” can also be problematic for the following reasons:
Statistical power is relevant only when the null hypothesis is actually false (i.e., a real effect exists that the test could detect), and it is determined by the following variables:
Alpha (α) is the chance of testing positive by a diagnostic test among those without the condition, causing a type I error or “false positive.” In hypothesis testing, it is the probability of rejecting a null hypothesis that is actually true.
Beta (β) is the chance of testing negative by a diagnostic test among those with the condition, causing a type II error or “false negative.” In hypothesis testing, it is the probability of failing to reject a null hypothesis that is actually false.
The relationship between alpha and beta is often depicted in graphs that show:
For a given sample size and effect size, there is an inverse relationship between alpha and beta. If beta is decreased:
The inverse relationship of alpha and beta can also be appreciated in a 2 x 2 contingency table that compares the positive and negative findings of reality versus a study.
| | Real positive findings | Real negative findings |
|---|---|---|
| Positive study findings | True positives (power, 1 – β) | False positives (type I error, α) |
| Negative study findings | False negatives (type II error, β) | True negatives |
Standard deviation is a measure of the amount of variation or dispersion of a set of values relative to the mean.
The sample size is the number of observations in a sample.
For a 2-sample, 2-tailed t-test with an alpha level of 0.05, the simple formula below gives the approximate sample size needed for a statistical power of 80% (beta = 0.2):
$$ n = \frac{16s^{2}}{d^{2}} $$

where n = size of each sample, s = standard deviation (assumed to be the same in each group), and d = difference to be detected. The mnemonic, as suggested by the originator of the formula, Robert Lehr, is “16 s-squared over d-squared.” (Note: “s-squared” is also known as the variance.)
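Lehr's rule can be checked against the normal-approximation formula it rounds, $n = 2(z_{1-\alpha/2} + z_{1-\beta})^{2} s^{2}/d^{2}$, in which the constant 2(1.96 + 0.84)² ≈ 15.7 is replaced by 16. The Python sketch below uses the standard library's `statistics.NormalDist`; the function names and the example values (s = 50, d = 10) are hypothetical, chosen only for illustration, and neither formula replaces an exact t-test power calculation.

```python
import math
from statistics import NormalDist


def lehr_sample_size(s: float, d: float) -> int:
    """Lehr's rule of thumb: n per group = 16 s^2 / d^2 (alpha = 0.05 two-sided, power = 80%)."""
    return math.ceil(16 * s**2 / d**2)


def normal_approx_sample_size(s: float, d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group for a 2-sample, 2-tailed comparison of means."""
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * s**2 / d**2)


# Hypothetical example: SD of 50 units, difference to detect of 10 units
print(lehr_sample_size(50, 10))           # 400 per group
print(normal_approx_sample_size(50, 10))  # 393 per group
```

Because 16 is slightly larger than 15.7, the shortcut errs on the side of a marginally larger sample.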
Examples:
Effect size (ES) is the standardized mean difference between 2 groups: the difference in means expressed in SD units, analogous to a Z-score of a standard normal distribution.
Calculating ES with Cohen’s d:
Cohen’s d is the most common (but imperfect) method to calculate ES. Cohen’s d = (estimated difference in the means) / (pooled estimated standard deviation), where, for 2 groups of equal size:
$$ SD_{pooled} = \sqrt{\frac{SD_{1}^{2} + SD_{2}^{2}}{2}} $$

If the SDs are equal in each group, then d = mean difference / SD. For example, if the difference is 150 and the SD is 50, then d = 150/50 = 3, which is a large ES.
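The pooled-SD formula and the worked example can be written as a few small Python functions. This is a sketch: `pooled_sd_equal_n`, `pooled_sd_weighted`, and `cohens_d` are hypothetical helper names, the unweighted average of the variances assumes equal group sizes, and the weighted version is the usual (n − 1)-weighted pooled SD, shown only to confirm that the 2 agree when group sizes are equal. The group means of 300 and 150 are illustrative; only their difference of 150 and the SD of 50 come from the example above.

```python
import math


def pooled_sd_equal_n(sd1: float, sd2: float) -> float:
    """Pooled SD for 2 groups of equal size: sqrt((SD1^2 + SD2^2) / 2)."""
    return math.sqrt((sd1**2 + sd2**2) / 2)


def pooled_sd_weighted(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled SD weighting each variance by its degrees of freedom (n - 1)."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(mean1: float, mean2: float, sd1: float, sd2: float) -> float:
    """Cohen's d: difference in means divided by the pooled SD (equal group sizes assumed)."""
    return (mean1 - mean2) / pooled_sd_equal_n(sd1, sd2)


# Worked example: difference of 150, SD of 50 in both groups
print(cohens_d(300, 150, 50, 50))  # 3.0

# With equal group sizes, the 2 pooling formulas agree
print(math.isclose(pooled_sd_equal_n(40, 60), pooled_sd_weighted(40, 10, 60, 10)))  # True
```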
Interpretation of Cohen’s d:
In summary, the SP will tend to be greater when:
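How SP responds to sample size, effect size, and alpha can be checked numerically. The Python sketch below uses a normal approximation to the power of a 2-sample, 2-tailed test; `approx_power` is a hypothetical helper, equal group sizes are assumed, and for small samples an exact calculation based on the noncentral t distribution would be more accurate.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution (mean 0, SD 1)


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a 2-sample, 2-tailed test of means via the normal approximation.

    d is the standardized effect size (Cohen's d); both groups have n_per_group subjects.
    """
    z_crit = _Z.inv_cdf(1 - alpha / 2)         # e.g. 1.96 for alpha = 0.05
    shift = abs(d) * (n_per_group / 2) ** 0.5  # mean of the test statistic if the effect is real
    # Probability that the statistic falls beyond either critical value
    return _Z.cdf(shift - z_crit) + _Z.cdf(-shift - z_crit)


base = approx_power(0.5, 64)                     # roughly 0.8
print(approx_power(0.5, 128) > base)             # True: larger sample
print(approx_power(0.8, 64) > base)              # True: larger effect size
print(approx_power(0.5, 64, alpha=0.10) > base)  # True: larger alpha (more false positives)
```

A smaller SD enters through d: for the same raw difference, a smaller SD gives a larger d and therefore more power.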
A power analysis answers 2 big questions:
The traditional minimum level of power is 80% (0.80), just as the somewhat arbitrary value of 5% (0.05) is the traditional alpha cut-off against which the p-value is compared.
A power level of 90% is preferable when feasible. Although it takes more resources, keep in mind that rerunning an underpowered study later would take even more.
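The extra cost of moving from 80% to 90% power can be estimated with the normal-approximation sample-size formula that underlies Lehr's rule, written here in terms of Cohen's d. In this Python sketch the effect size of 0.5 is a hypothetical value and `n_per_group` is a hypothetical helper; exact t-based calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, power: float, alpha: float = 0.05) -> int:
    """Normal-approximation sample size per group for a 2-sample, 2-tailed test with effect size d."""
    z = NormalDist()  # standard normal distribution
    return math.ceil(2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / d**2)


n80 = n_per_group(0.5, 0.80)
n90 = n_per_group(0.5, 0.90)
print(n80, n90)  # 63 85
```

In this example, raising power from 80% to 90% requires roughly a third more subjects per group, which is usually far cheaper than repeating an inconclusive study.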
A new fertilizer called “Grow-A-Lot” was given to a tomato farmer for a trial to determine whether plants treated with it produce more tomatoes per plant than unfertilized plants. The farmer picked 200 tomato seeds from a bucket of his usual planting seeds and divided them into 2 groups:
The null hypothesis is that both groups of plants would produce the same number of tomatoes per plant, whereas the alternative hypothesis would be that the plants receiving the fertilizer would produce a different number of tomatoes.
Trial 1 with large sample sizes:
The fertilized group produced on average twice as many tomatoes per plant (300) as the control group (150). There is also a small amount of overlap, since some plants in the control group outperformed the rest of their group, whereas some plants in the experimental group underperformed. A glance at the graph is enough to see an obvious difference, but a t-test was performed to confirm that the difference was statistically significant, yielding a very small p-value.
Even if the experiment were repeated 1000 times, it would be extremely unlikely that the farmer would, by chance, pick seeds mostly from the overlap region and obtain a different result. The large effect size alone gives this trial high SP.
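The “repeat the experiment 1000 times” thought experiment can be simulated directly; the fraction of replicates that reach significance estimates the SP. In this Python sketch the SD of 50 tomatoes per plant, the normal distribution of yields, and the even split of the 200 seeds into 2 groups of 100 are assumptions, a large-sample z-test (critical value 1.96) stands in for the t-test, and `simulate_power` is a hypothetical helper.

```python
import random


def _mean_var(xs):
    """Sample mean and sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate_power(mean_ctrl, mean_trt, sd, n, reps=1000, z_crit=1.96, seed=1):
    """Fraction of repeated experiments in which a large-sample 2-tailed z-test is significant.

    Yields are drawn from normal distributions with the same (hypothetical) SD in both groups.
    """
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    significant = 0
    for _ in range(reps):
        m_c, v_c = _mean_var([rng.gauss(mean_ctrl, sd) for _ in range(n)])
        m_t, v_t = _mean_var([rng.gauss(mean_trt, sd) for _ in range(n)])
        se = (v_c / n + v_t / n) ** 0.5  # standard error of the difference in means
        if abs((m_t - m_c) / se) > z_crit:
            significant += 1
    return significant / reps


# Trial 1: 150 vs 300 tomatoes per plant, hypothetical SD of 50, 100 plants per group
print(simulate_power(150, 300, 50, 100))  # essentially every replicate is significant
```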
Trial 2 with small sample sizes:
Because the effect is so large, the experiment would retain high SP even with far fewer seeds, and almost all t-tests would correctly give a significant (small) p-value.
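The claim that a large effect keeps SP high even with very few seeds can be checked by simulation, this time with Student's pooled-variance t-test, which suits small samples. The sketch assumes 5 seeds per group (so 8 degrees of freedom), normally distributed yields with a hypothetical SD of 50, and takes the 2-tailed critical value t = 2.306 for alpha = 0.05 from a standard t table; `pooled_t` and `simulate_small_trial` are hypothetical helpers.

```python
import random

N_PER_GROUP = 5     # hypothetical: 5 seeds per group, so df = 5 + 5 - 2 = 8
T_CRIT_DF8 = 2.306  # 2-tailed critical t at alpha = 0.05 with 8 degrees of freedom (t table)


def pooled_t(a, b):
    """Student's 2-sample t statistic using the pooled (equal-variance) SD."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (na + nb - 2)  # pooled variance
    return (mb - ma) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def simulate_small_trial(reps=1000, seed=2):
    """Fraction of repeated small trials (150 vs 300 tomatoes, SD 50) reaching significance."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(reps):
        ctrl = [rng.gauss(150, 50) for _ in range(N_PER_GROUP)]
        trt = [rng.gauss(300, 50) for _ in range(N_PER_GROUP)]
        if abs(pooled_t(ctrl, trt)) > T_CRIT_DF8:
            hits += 1
    return hits / reps


print(simulate_small_trial())  # the large majority of replicates are significant
```

The critical value is tied to 8 degrees of freedom, so changing the group size would also require a new value from the t table.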
A different fertilizer (“Grow-A-Little”) has a much smaller effect, producing an average of only 10 extra tomatoes per plant. Tomato production per plant will overlap more between the experimental and control groups, so the difference can be detected only by using larger sample sizes.
Trial 3 with large sample sizes and large SDs:
The sample sizes are large enough to counteract the small ES, making the difference statistically significant (p < 0.05). Note that, even though the difference is statistically significant, such a small difference may not be of practical relevance to the farmer.
Trial 4 with small sample sizes and large SDs:
Because of the small sample sizes, no statistically significant difference is found at the 0.05 level. The null hypothesis therefore cannot be rejected: the trial lacked a large enough effect size or sample size to detect the difference.
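Trials 3 and 4 can be contrasted with a normal-approximation power calculation. The SD of 50 tomatoes per plant and the group sizes (500 and 10 plants) are hypothetical values chosen for illustration; only the 10-tomato difference comes from the text, and `approx_power` is a hypothetical helper.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def approx_power(diff: float, sd: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a 2-sample, 2-tailed test of means (equal n and SD)."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    shift = abs(diff) / sd * (n_per_group / 2) ** 0.5  # difference divided by its standard error
    return _Z.cdf(shift - z_crit) + _Z.cdf(-shift - z_crit)


# "Grow-A-Little": 10 extra tomatoes per plant, hypothetical SD of 50 (ES = 0.2)
trial_3 = approx_power(10, 50, 500)  # large samples
trial_4 = approx_power(10, 50, 10)   # small samples
print(f"Trial 3 power: {trial_3:.2f}")  # high, close to 0.9
print(f"Trial 4 power: {trial_4:.2f}")  # low, under 0.1
```

With the same small effect, the large trial detects the difference most of the time, whereas the small trial almost always misses it.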
Trial 5 with small sample sizes and small SDs:
Because of the small SD, the difference is statistically significant at the 0.05 level. The SD is usually a fixed parameter of a population and cannot be changed, but the same result can effectively be obtained by increasing the sample size. A larger sample reduces the standard error of the mean (SD/√n), which diminishes the impact of a large but fixed SD and allows smaller differences between the groups to be detected.
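Why a larger sample can stand in for a smaller SD follows from the test statistic: the difference is divided by a standard error proportional to SD/√n, so halving the SD has the same effect as quadrupling n. The Python sketch below checks this with a normal-approximation power function; the 10-tomato difference comes from the Grow-A-Little example, while the SDs and group sizes are hypothetical and `approx_power` is a hypothetical helper.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def approx_power(diff: float, sd: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a 2-sample, 2-tailed test of means (equal n and SD)."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    shift = abs(diff) / sd * (n_per_group / 2) ** 0.5  # difference / standard error of difference
    return _Z.cdf(shift - z_crit) + _Z.cdf(-shift - z_crit)


small_sd = approx_power(10, 25, 10)  # SD halved, 10 plants per group
large_n = approx_power(10, 50, 40)   # SD unchanged, 4 times as many plants
print(math.isclose(small_sd, large_n))  # True: halving the SD is worth quadrupling n
```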
Investigators involved in designing a randomized clinical trial chose a sample size that would have 90% power of detecting a 20% difference between the control and experimental groups, with a significance level (2-sided) of 5%.
If, in actuality, there is no difference in the means, what is the chance that the study will find a statistically significant difference? What is this error called?
Answer: This is purely a terminology question, typical of board exams, with the power included as a distractor. Refer to the first multicolored graph above: if there is no difference between the 2 groups, there is only 1 bell curve, and the area beyond the alpha cut-off represents the false positives. The chance of finding a statistically significant difference is therefore 5%, and this is a type I (false positive) error, because any subject with a value in the alpha area still belongs to the same population.
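The answer can be confirmed by simulation: when both groups come from the same population, the share of “significant” results stays near alpha regardless of the power the study was designed for. This Python sketch assumes normally distributed data and uses a large-sample z-test with a critical value of 1.96; `false_positive_rate` is a hypothetical helper, and the seed, group size, and number of replicates are arbitrary choices.

```python
import random


def false_positive_rate(n=100, reps=2000, z_crit=1.96, seed=3):
    """Share of simulated studies declared 'significant' when both groups share one population.

    Both groups are drawn from the same normal distribution (mean 0, SD 1), so every
    rejection is a type I error. Uses a large-sample z-test (an approximation).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        ma, mb = sum(a) / n, sum(b) / n
        va = sum((x - ma) ** 2 for x in a) / (n - 1)
        vb = sum((x - mb) ** 2 for x in b) / (n - 1)
        z = (ma - mb) / ((va + vb) / n) ** 0.5
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


print(false_positive_rate())  # close to 0.05
```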
Does power increase/decrease/not change if beta is decreased?
Answer: Power increases if beta is decreased, because power = 1 – beta. Refer to the first multicolored graph.
Does power increase/decrease/not change if alpha is increased?
Answer: Power increases if alpha is increased, but doing so also increases the probability of false positives; thus, increasing alpha is not a favored way of increasing power. Refer to the first multicolored graph to see the relationship between alpha and power. In board exams, this question is often framed with a 2 x 2 contingency table of reality/truth versus study/test results, so it is important to understand how to calculate type I and type II errors.
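Reading alpha, beta, and power off a 2 x 2 table of reality versus study results is simple arithmetic within each reality column. The Python sketch below uses hypothetical counts (500 situations with a real effect and 500 without) and a hypothetical `TwoByTwo` class: alpha is computed within the real-negative column and beta within the real-positive column.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a 2 x 2 table of study findings (rows) versus reality (columns)."""
    true_pos: int   # positive study finding, real effect present
    false_pos: int  # positive study finding, no real effect (type I error)
    false_neg: int  # negative study finding, real effect present (type II error)
    true_neg: int   # negative study finding, no real effect

    def alpha(self) -> float:
        # Type I error rate: false positives among all real negatives
        return self.false_pos / (self.false_pos + self.true_neg)

    def beta(self) -> float:
        # Type II error rate: false negatives among all real positives
        return self.false_neg / (self.false_neg + self.true_pos)

    def power(self) -> float:
        # Power = 1 - beta: true positives among all real positives
        return self.true_pos / (self.true_pos + self.false_neg)


# Hypothetical counts: 500 situations with a real effect, 500 without
table = TwoByTwo(true_pos=400, false_pos=25, false_neg=100, true_neg=475)
print(table.alpha(), table.beta(), table.power())  # 0.05 0.2 0.8
```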
Does power increase/decrease/not change if the difference between the mean of the experimental group and that of the control group increases?
Answer: Power increases as the mean difference increases, which is another way of increasing the ES, because there is less overlap between the 2 distributions. See the first multicolored graph.
Does beta increase/decrease/not change if the difference between the mean of the experimental group and that of the control group increases?
Answer: Beta decreases if the mean difference increases, as there is less overlap between the 2 populations. See the first multicolored graph.
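A short calculation makes the direction of the change explicit: with the SD, sample size, and alpha held fixed, a larger mean difference moves the alternative distribution further from the null, so beta shrinks. The SD of 50, the 50 subjects per group, and the differences tried are hypothetical; `approx_beta` is a hypothetical helper based on the usual normal approximation, not an exact t-test calculation.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def approx_beta(diff: float, sd: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation type II error rate (beta = 1 - power) for a 2-sample, 2-tailed test."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    shift = abs(diff) / sd * (n_per_group / 2) ** 0.5
    power = _Z.cdf(shift - z_crit) + _Z.cdf(-shift - z_crit)
    return 1 - power


# Hypothetical SD of 50 and 50 subjects per group; only the mean difference varies
for diff in (5, 10, 20, 40):
    print(diff, round(approx_beta(diff, 50, 50), 3))  # beta shrinks as the difference grows
```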