Central tendency describes the central or typical value of a sample, and its measures are often referred to colloquially as “averages.” The most common measures of central tendency are the mean, median, and mode. Identifying the central value allows other values to be compared to it, showing the spread or clustering of the sample, which is known as the dispersion or distribution. Measures of dispersion are categorized in 2 groups: those based on percentiles and those based on the mean (most commonly the standard deviation). Analysis of the data distribution determines whether the data have a strong or a weak central tendency based on their dispersion. When the data distribution is symmetrical and the mean = median = mode, the data are said to have a normal distribution. Other types of distributions are possible as well, and these are known as nonnormal distributions.
Last updated: Sep 1, 2022
Measures of central tendency are single values that attempt to describe a data set by identifying the central or “typical” value for that data set.
Definition:
Mean is the sum of all measurements in a data set divided by the number of measurements in that data set.
Equation:
$$ Mean = \frac{Sum\ of\ all\ values\ in\ the\ data\ set}{Total\ number\ of\ values\ in\ data\ set} $$
$$ Mean = \frac{x_{1}+x_{2}+x_{3}+\dots+x_{n}}{n} $$
Example:
Find the mean of the following data set: 1, 1, 1, 3, 5, 5, 7, 19.
Answer: There are 8 numbers in this data set. To calculate the mean, add up all the numbers and divide by 8:
$$ Mean = \frac{1+1+1+3+5+5+7+19}{8}=\frac{42}{8}=5.25 $$
Definition:
After arranging the data from lowest to highest, the median is the middle value, separating the lower half from the upper half of the data set.
Equation:
To find the median, arrange the values from lowest to highest, then use the following equation to determine which “position” in the ordered list holds the median:
$$ Median\ position = \frac{n+1}{2} $$
where n = the number of values in the data set. If n is even, the position falls halfway between 2 values, and the median is the mean of those 2 values.
Example:
Find the median of the following data set: 1, 5, 1, 19, 3, 1, 7, 5.
Answer: There are 8 numbers in this data set. To find the median, first arrange the numbers in order: 1, 1, 1, 3, 5, 5, 7, 19. Next, determine which “position” represents the median. To do this, use the formula (n + 1) / 2. There are 8 numbers in this data set, so n = 8. Therefore, the median position is (8 + 1) / 2 = 4.5. The median lies between the 4th and 5th numbers, which are 3 and 5. Averaging these 2 values gives (3 + 5) / 2 = 4, so the median of this data set is 4.
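The position rule can also be sketched in code. The Python below is a minimal illustration rather than a reference implementation. The function name `median_by_position` is an assumption made for this example and does not come from the source.

```python
def median_by_position(values):
    """Return the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)              # arrange from lowest to highest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    position = (n + 1) / 2                # 1-based position of the median
    lower = ordered[int(position) - 1]    # value at the rounded-down position
    if position.is_integer():
        return lower                      # odd n: a single middle value
    upper = ordered[int(position)]        # even n: the next value up
    return (lower + upper) / 2            # average the 2 middle values


data = [1, 5, 1, 19, 3, 1, 7, 5]
print(median_by_position(data))  # 4.0, matching the worked example
```

For a data set with an odd number of values, the position is a whole number and the function returns that single middle value unchanged.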
Definition:
The mode is the value that occurs most frequently in the data set.
Example:
Find the mode of the following data set: 1, 5, 1, 19, 3, 1, 7, 5.
Answer: Identify the number that appears most often. This can be done by setting up a frequency table:
Data point | Frequency (how often the data point occurs in the sample) |
---|---|
1 | 3 |
3 | 1 |
5 | 2 |
7 | 1 |
19 | 1 |
The value 1 occurs 3 times, more often than any other value, so the mode of this data set is 1.
Mnemonic:
MOde is the value that is in the set MOst often.
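A frequency table like the one in the mode example can be built programmatically. The sketch below uses `collections.Counter` from the Python standard library. The helper name `find_modes` is hypothetical, and returning every tied value is a design choice of this example, not something the source specifies.

```python
from collections import Counter


def find_modes(values):
    """Return every value that shares the highest frequency, sorted."""
    frequency = Counter(values)           # maps each data point to its count
    highest = max(frequency.values())     # the largest count in the table
    return sorted(v for v, count in frequency.items() if count == highest)


data = [1, 5, 1, 19, 3, 1, 7, 5]

# Print the frequency table, one data point per line.
for value, count in sorted(Counter(data).items()):
    print(value, count)

print(find_modes(data))  # [1]
```

Returning a list keeps the result well defined when 2 or more values tie for the highest frequency.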
Type | Description | Example | Result |
---|---|---|---|
Mean | Total sum of numbers divided by number of values | (8 + 4 + 10 + 4 + 4 + 5 + 4 + 5 + 6) / 9 = 50 / 9 | ≈ 5.56 |
Median | Middle value that separates higher half from lower half | 4, 4, 4, 4, 5, 5, 6, 8, 10 | 5 |
Mode | Most frequent number | 4, 4, 4, 4, 5, 5, 6, 8, 10 | 4 |
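The 3 measures in the table can be checked with Python's standard `statistics` module. This is only a quick verification sketch. The variable names are chosen for this example.

```python
import statistics

# The 9 values used in the table's examples.
values = [8, 4, 10, 4, 4, 5, 4, 5, 6]

mean_value = statistics.mean(values)      # 50 / 9, roughly 5.56
median_value = statistics.median(values)  # 5th of the 9 sorted values
mode_value = statistics.mode(values)      # the most frequent value

print(round(mean_value, 2), median_value, mode_value)
```

Here the mean exceeds the median because the few larger values (8 and 10) pull the sum upward, while the median and mode depend only on order and frequency.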
Dispersion describes how widely the values in a data set are spread. Measures of dispersion include the range, quantiles (e.g., quartiles or percentiles), and the standard deviation.
Definition: The standard deviation (SD) is a measure of how far, on average, the observed values in a data set lie from the mean.
Equation:
Mathematically, the SD can be calculated using the following equation:
$$ \sigma = \sqrt{\frac{\sum (x_{i}-\mu )^{2}}{N}} $$
σ = population standard deviation
N = the size of the population
xᵢ = each value from the population
μ = the population mean
Calculations (using the equation):
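As one hedged illustration of such a calculation, the Python sketch below applies the population SD equation step by step to the data set from the earlier examples (1, 1, 1, 3, 5, 5, 7, 19). Treating those 8 values as a whole population is an assumption made for this example, and the function name `population_sd` is illustrative.

```python
import math


def population_sd(values):
    """Population standard deviation: sqrt(sum((x_i - mu)^2) / N)."""
    n = len(values)
    if n == 0:
        raise ValueError("the SD of an empty data set is undefined")
    mu = sum(values) / n                                # population mean
    squared_deviations = [(x - mu) ** 2 for x in values]
    variance = sum(squared_deviations) / n              # divide by N
    return math.sqrt(variance)


data = [1, 1, 1, 3, 5, 5, 7, 19]   # mean = 5.25, as computed earlier
print(round(population_sd(data), 2))  # 5.61
```

The sum of squared deviations for this data set is 251.5, so the variance is 251.5 / 8 = 31.4375 and the SD is its square root. The standard library's `statistics.pstdev` computes the same quantity.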
Data distribution describes how your data cluster (or don’t cluster). Data tend to cluster in certain patterns, known as distribution patterns. There is a “normal” distribution pattern, and there are multiple nonnormal patterns. Different statistical tests are used for different distribution patterns.
Normal distributions differ according to their mean and variance, but share the following characteristics:
Many processes follow a nonnormal distribution, which can result from natural variation or from errors in the data.
Common distributions:
Reasons why data may have a nonnormal distribution: