00:00
Next on our to-do list are the measures of
variability.
00:04
There are many ways to quantify variability.
00:06
However, we will focus on the most common
ones: variance, standard deviation,
and coefficient of variation.
00:14
In the field of statistics, we will
typically use different formulas when working
with population data and sample data.
00:20
Let's think about this for a bit.
00:23
When you have the whole population, each
data point is known.
00:26
So you are 100% sure of the measures you
are calculating.
00:30
When you take a sample of this population,
and you compute a sample statistic, it is
interpreted as an approximation of the
population parameter.
00:38
Moreover, if you extract ten different
samples from the same population, you will
get ten different measures.
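A quick way to see this for yourself is a small simulation. The sketch below is only an illustration: the population (the integers 1 to 1000), the sample size of 30, and the seed are all made-up assumptions, not values from the lesson.

```python
import random
import statistics

random.seed(0)  # arbitrary fixed seed so the run is repeatable

# Hypothetical population: the integers 1 to 1000.
population = list(range(1, 1001))
population_mean = statistics.mean(population)

# Draw ten samples of 30 values each and record every sample mean.
sample_means = [
    statistics.mean(random.sample(population, 30)) for _ in range(10)
]

print(f"population mean: {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i:2d} mean: {m:.2f}")
```

Each sample mean should land near the population mean, but the ten values will generally differ from one another.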
00:44
Statisticians have solved the problem by
adjusting the algebraic formulas for many
statistics to reflect this issue.
00:51
Therefore, we will explore both population
and sample formulas, as they are
both used. You must be asking yourself why
we saw only one formula each
for the mean, median, and mode.
01:03
Well, actually the sample mean is the
average of the sample data points,
while the population mean is the average of
the population data points.
01:12
So technically there are two different
formulas, but they are computed in the same
way. Okay.
01:19
Now, after this short clarification, it's
time to get on to variance.
01:24
Variance measures the dispersion of a set
of data points around their mean value.
01:30
Population variance, denoted by sigma squared,
is equal to the sum of squared differences
between the observed values and the
population mean divided by the total number
of observations.
01:42
Sample variance, on the other hand, is
denoted by S squared and is equal
to the sum of squared
01:48
differences between observed sample values
and the sample mean divided by
the number of sample observations minus one.
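Written out, the two definitions just described look like this. The symbols N (population size), n (sample size), mu (population mean), and x-bar (sample mean) are the usual conventions and are assumed here, since the lesson only names sigma squared and S squared.

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}
\qquad
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
```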
01:57
All right. When you are getting acquainted
with statistics, it is hard to grasp
everything right away.
02:03
Therefore, let's stop for a second to
examine the formula for the population
variance and try to clarify its meaning.
02:11
The main part of the formula is its
numerator.
02:13
So that's what we want to comprehend: the
sum of squared differences between the
observations and the mean.
02:20
Hmm. So the closer a number is to the mean, the
lower the result we will
obtain. Right?
And the further away from the mean it lies,
the larger this difference.
02:32
Easy. But why do we elevate to the second
degree?
Squaring the differences has two main
purposes.
02:41
First, by squaring the numbers, we always
get non-negative computations.
Without going too deep into the mathematics
of it,
02:48
it is intuitive that dispersion cannot be
negative.
02:51
Dispersion is about distance, and distance
cannot be negative.
02:56
If, on the other hand, we calculate the
difference and do not elevate to the second
degree, we would obtain both positive and
negative values that, when summed, would
cancel out, leaving us with no information
about the dispersion.
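The cancelling-out effect is easy to check. Here is a minimal sketch using a made-up data set of two, four, and six, chosen only for illustration.

```python
# Hypothetical data set, used only to illustrate the point.
data = [2, 4, 6]
mean = sum(data) / len(data)

# Plain differences from the mean: negatives and positives offset each other.
deviations = [x - mean for x in data]
raw_sum = sum(deviations)

# Squared differences are all non-negative, so nothing cancels.
squared_sum = sum(d ** 2 for d in deviations)

print(deviations)   # [-2.0, 0.0, 2.0]
print(raw_sum)      # 0.0
print(squared_sum)  # 8.0
```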
03:10
Second, squaring amplifies the effect of
large differences.
03:14
For example, if the mean is zero, and you
have an observation of 100, the squared
spread is 10,000.
03:22
All right, enough dry theory.
03:24
It is time for a practical example.
03:27
We have a population of five observations.
03:30
One, two, three, four and five.
03:33
Let's find its variance.
03:36
We start by calculating the mean: one plus
two plus three plus
four plus five, divided by five, equals three.
03:45
Then we apply the formula
03:47
we just saw: one minus three, squared, plus two
minus three, squared, plus three minus three,
squared, plus four minus three, squared, plus five
04:02
minus three, squared.
04:04
All of these components have to be divided
by five.
04:07
When we do the math, we get two.
04:10
So the population variance of the data set
is two.
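If you want to verify the arithmetic, the same steps can be sketched in a few lines of Python. The function name `population_variance` is just a label chosen for this illustration.

```python
def population_variance(values):
    """Sum of squared differences from the mean, divided by the count."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


data = [1, 2, 3, 4, 5]
result = population_variance(data)
print(result)  # 2.0
```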
04:14
But what about the sample variance?
This would only be suitable if we were told
that these five observations were a sample
drawn from a population.
04:23
So let's imagine that's the case.
04:26
The sample mean is once again three.
04:29
The numerator is the same, but the
denominator is going to be four instead of
five, giving us a sample variance of 2.5.
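The only change from the population case is the denominator. A minimal sketch, with `sample_variance` as an illustrative name, cross-checked against the `variance` function in Python's standard `statistics` module, which also divides by the number of observations minus one:

```python
import statistics


def sample_variance(values):
    """Sum of squared differences from the mean, divided by (count - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


sample = [1, 2, 3, 4, 5]
s2 = sample_variance(sample)
print(s2)                           # 2.5
print(statistics.variance(sample))  # 2.5
```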
04:38
To conclude the variance topic, we should
interpret the results.
04:42
Why is the sample variance bigger than the
population variance?
In the first case, we knew the population.
04:48
That is, we had all the data, and we
calculated the variance.
04:52
In the second case, we were told that one,
two, three, four and five was a
sample drawn from a bigger population.
05:00
Imagine the population of this sample were
these ten numbers:
one, one, one, two, three, four, five,
five, five and five.
05:11
Clearly the numbers are the same, but there
is a concentration around the two extremes
of the data set.
05:17
One and five.
05:19
The variance of this population is 2.96.
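You can reproduce this comparison with Python's standard `statistics` module. The sketch below takes the population as read out (three ones, a two, a three, a four, and four fives) and compares its variance with the two figures obtained from the values one to five.

```python
import statistics

# Imagined larger population, as read out in the lesson.
population = [1, 1, 1, 2, 3, 4, 5, 5, 5, 5]
# The five observations treated as a sample drawn from it.
sample = [1, 2, 3, 4, 5]

pop_var = statistics.pvariance(population)  # divides by N
naive = statistics.pvariance(sample)        # divides by n
corrected = statistics.variance(sample)     # divides by n - 1

print(f"population variance:          {pop_var:.2f}")
print(f"sample data, divided by n:    {naive:.2f}")
print(f"sample data, divided by n-1:  {corrected:.2f}")
```

Dividing by the number of observations minus one moves the estimate from 2 up to 2.5, closer to the 2.96 of the imagined population.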
05:24
So our sample variance has rightfully been
corrected upwards in order to reflect the
higher potential variability.
05:32
This is the reason why there are different
formulas for sample and population data.
05:38
This was a very important lesson, so please
make sure that you have understood it well.
05:43
You can reinforce what you learned here by
doing the exercise available in the course
resources section.
05:49
Remember, the subject of statistics is only
understood when practiced.
05:54
Thanks for watching.