00:00
All right. So we've learned that confidence
intervals based on small samples from
normally distributed populations are
calculated with the T statistic.
00:10
Let's check a similar example to the one we
saw earlier.
00:14
You are an aspiring data scientist and are
wondering what the mean data
scientist salary is.
00:22
This time, though, you do not have the
population variance.
00:26
In fact, you have a sample of only nine
compensations you found on Glassdoor and
have summarized the information in the
following table.
00:35
Okay. We've already calculated the sample
mean and standard error, which are
$92,533 and
$4,644, respectively.
00:48
Good, but we don't have one key piece of
information, the
population variance.
00:55
No problem.
00:56
As the good statisticians that we are, we
will use the Student's T
distribution. Here's the formula that allows
us to find a confidence
interval for the mean of a population with
an unknown variance.
01:10
Let's compare it with the formula we use
when the variance is known.
01:15
There are two key differences.
01:17
First, instead of a Z statistic, we have a T
statistic.
01:22
And second, instead of population standard
deviation, we have sample
standard deviation.
01:29
Otherwise, everything is the same.
01:33
So it shouldn't be that difficult to
remember.
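For reference, the two formulas being compared here are usually written as shown below. The notation is the standard textbook one and is an assumption, not copied from the slide: x-bar is the sample mean, sigma the population standard deviation, s the sample standard deviation, and n the sample size.

```latex
% Population variance known: Z statistic with population standard deviation
\bar{x} \pm z_{\alpha/2} \, \frac{\sigma}{\sqrt{n}}

% Population variance unknown: T statistic with sample standard deviation
\bar{x} \pm t_{n-1,\,\alpha/2} \, \frac{s}{\sqrt{n}}
```

In both cases the fraction on the right is the standard error of the mean.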
01:37
The logic behind constructing confidence
intervals in both cases is the same.
01:42
The only two inputs that change are the
statistic at hand and the standard
deviation. When population variance is
known, population
standard deviation goes with the Z
statistic.
01:55
When population variance is unknown, sample
standard deviation goes with the t
statistic. All right.
02:03
So we have the sample mean, standard
deviation, and sample size.
02:09
All we have to do is find the T statistic.
02:13
We will be able to obtain the T statistic
from the T table.
02:17
First, we need to specify the degrees of
freedom.
02:21
For the Student's T distribution, there are n
minus 1 degrees of freedom.
02:27
Our sample consists of nine observations.
02:30
So we have eight degrees of freedom.
02:33
Second, we have to find Alpha divided by
two.
02:38
Once again, this depends on the confidence
level that we want to obtain.
02:42
In this example, we are going to use a
confidence level of 95%.
02:47
This means that alpha is equal to 5%.
02:51
Therefore, half of alpha would be 2.5%.
02:57
You can now see that the associated
statistic is 2.31.
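If you do not have a T table at hand, the same lookup can be sketched in code. This is a minimal illustration using only the Python standard library, not the method used in the lesson: the helper names t_pdf, t_cdf and t_critical are hypothetical, and the critical value is found by integrating the T density numerically and bisecting on the result.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of the Student's T distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """CDF for x >= 0: 0.5 (the mass below zero) plus Simpson's rule on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence: float, df: int) -> float:
    """Two-sided critical value, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2  # 0.975 for a 95% confidence level
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 9                # sample size from the example
df = n - 1           # degrees of freedom
t_crit = t_critical(0.95, df)
z_crit = NormalDist().inv_cdf(0.975)  # Z value for comparison

print(f"df = {df}, t = {t_crit:.2f}, z = {z_crit:.2f}")
```

Rounded to two decimals, the T value matches the 2.31 read from the table, and it is larger than the corresponding Z value of 1.96.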
03:05
Note that some T tables you will find in
books or online, like this one, have a
CI row. The abbreviation stands for
Confidence Interval.
03:15
Instead of finding Alpha, we can just check
the 95% confidence interval and get
the same result.
03:21
Easy. We have all the
information needed, so we just plug in the
numbers.
03:31
What we get is a confidence interval from
$81,806
to $103,261.
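The plug-in step can be checked with a few lines of code. This is a small sketch using the rounded figures quoted in the lesson (mean 92,533, standard error 4,644, T statistic 2.31). The variable names are illustrative only.

```python
sample_mean = 92_533     # sample mean in dollars, as quoted in the lesson
standard_error = 4_644   # standard error of the mean, as quoted in the lesson
t_crit = 2.31            # T statistic for 8 degrees of freedom, 95% confidence

margin = t_crit * standard_error   # margin of error
lower = sample_mean - margin
upper = sample_mean + margin

print(f"95% CI: ${lower:,.2f} to ${upper:,.2f}")
```

With these rounded inputs the bounds come out at about $81,805 and $103,261. The lower bound is within a dollar of the figure quoted in the lesson, presumably because the lesson's calculation used unrounded inputs.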
03:41
Let's compare this result to the result for
the confidence interval with known
population variance. We got a 95% confidence
interval that was between $94,833 and
$105,568.
03:58
You can clearly see that when we know the
population variance, we get a narrower
confidence interval.
04:04
When we do not know the population variance,
there is a higher uncertainty that is
reflected by wider boundaries for our
interval.
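To put a number on "wider", here is a quick sketch that compares the widths of the two intervals quoted in the lesson. The tuple names and the width helper are illustrative only.

```python
# Interval bounds in dollars, as quoted in the two examples
known_variance_ci = (94_833, 105_568)     # Z-based interval
unknown_variance_ci = (81_806, 103_261)   # T-based interval


def width(interval: tuple[int, int]) -> int:
    """Distance between the upper and lower bound."""
    return interval[1] - interval[0]


known_width = width(known_variance_ci)
unknown_width = width(unknown_variance_ci)

print(f"known variance:   ${known_width:,}")
print(f"unknown variance: ${unknown_width:,}")
print(f"ratio: {unknown_width / known_width:.2f}")
```

The T-based interval from this example is roughly twice as wide as the Z-based one from the earlier example.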
04:11
Makes sense, doesn't it?
So what we learned today is that even when
we do not know the population
variance, we can still make predictions.
04:21
But they will be less precise.
04:24
Furthermore, the proper statistic for
estimating the confidence interval when the
population variance is unknown is the T
statistic and not the Z
statistic. All right, great.
04:35
Thanks for watching.