00:00
A confidence interval is the range within
which you expect the population parameter to
be. And its estimation is based on the data
we have in our sample.
00:11
There can be two main situations when we
calculate the confidence intervals for a
population: when the population variance is
known and when it is unknown.
00:20
Depending on which situation we are in, we
would use a different calculation method.
00:25
Now, the whole field of statistics exists
because we almost never have population
data. Even if we do have population data, we
may not be able to
analyze it. It may be so much that it
doesn't make sense to use it all at once.
00:40
Think about the data on people using the
Internet.
00:44
Google has something that approximates
population data, but even their data is not
population data.
00:49
There are people who are not a part of the
Google ecosystem in any way.
00:54
That can be done by using other browsers
like Opera and Safari, other
search engines like Bing or DuckDuckGo, or
video providers different from
YouTube. Furthermore, they can browse in
incognito mode.
01:07
These people are a part of the population of
people using the Internet, but Google doesn't
have much data on them.
01:14
As you can see, even the company that has
the most data doesn't
necessarily have population data.
01:21
So if Google wants to use statistical
methods to target them with Google Ads, they
will basically be using sample data with an
unknown population variance to guess
their preferences.
01:34
Ok. In this lesson, we will explore the
confidence intervals for a
population mean with a known variance.
01:42
An important assumption in this calculation
is that the population is normally
distributed. Even if it is not, you should
use a large sample and let the
central limit theorem do the normalization
magic for you.
01:54
Remember, if you work with a sample which is
large enough, you can assume
normality of sample means.
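To see that claim in action, here is a minimal simulation sketch. It assumes a made-up, clearly non-normal population (exponential with mean 1) and uses only Python's standard library. The population, the seed, and the number of repetitions are illustrative choices, not part of the lesson.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def sample_mean(n):
    """Mean of n draws from a skewed population (exponential with mean 1)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


# Draw many samples of size 30 and keep each sample's mean.
means = [sample_mean(30) for _ in range(10_000)]

center = statistics.fmean(means)  # should be close to the population mean, 1
spread = statistics.stdev(means)  # should be close to 1 / sqrt(30), about 0.18

# For a normal distribution, about 95% of values lie within 1.96 standard
# deviations of the center, so we check that share for the sample means.
share_within = sum(abs(m - center) <= 1.96 * spread for m in means) / len(means)

print(center, spread, share_within)
```

If the sample means behave roughly normally, the last number printed should land near 0.95, even though the individual observations are heavily skewed.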
02:01
All right. Let's say you want to become a
data scientist, and you're
interested in the salary you are going to
get.
02:09
Imagine you have certain information that
the population standard deviation of data
science salaries is equal to 15,000.
02:17
Furthermore, you know the salaries are
normally distributed, and your sample
consists of 30 salaries.
02:24
The formula for the confidence interval with
a known variance is given below.
02:29
The population mean will fall between the
sample mean minus
z of alpha divided by two times the standard
error.
02:40
And the sample mean plus z of alpha divided
by
two times the standard error.
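Written out, the interval just described can be sketched as follows. The symbols are assumptions chosen for illustration: x-bar is the sample mean, sigma is the known population standard deviation, n is the sample size, and sigma divided by the square root of n is the standard error.

```latex
\left[\; \bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\;\;
         \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \;\right]
```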
02:49
The sample mean is the point estimate.
02:51
You know all about the standard error
already.
02:54
So let's compute it using the formula.
02:58
What we have left is the so-called
reliability factor Z of
Alpha divided by two.
03:06
Z is the statistic that we've described
earlier, the standardized variable that has
a standard normal distribution.
03:13
Right. And what about Alpha?
This is the same alpha we had when we
defined our confidence level.
03:21
So for a confidence level of 95%, alpha will
be equal to
5%. Similarly, for a confidence level of
99%,
alpha would be equal to 1%.
03:33
It all fits into place now, doesn't it?
Let's go back to our example.
03:39
The sample mean is 100,200 and the standard
deviation is known to
be 15,000.
03:45
Thus the standard error is 2739.
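As a quick check of that number, here is a small sketch that assumes the standard error is the known standard deviation divided by the square root of the sample size. The variable names are illustrative.

```python
import math

sigma = 15_000  # known population standard deviation (from the example)
n = 30          # sample size (from the example)

# Standard error of the mean when the population standard deviation is known.
standard_error = sigma / math.sqrt(n)

print(round(standard_error))  # 2739
```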
03:52
Having calculated these values, we can take
the next step and choose our confidence
level. Common confidence levels are 90%, 95%
and
99%, with respective alphas of 10%, 5% and
1%.
04:08
Another way to put the value of alpha is
0.10, 0.05,
and 0.01 respectively.
04:18
Keep in mind that a 95% confidence interval
means you are sure that in
95% of the cases, the true population
parameter would fall into the
specified interval.
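One way to make "95% of the cases" concrete is a small simulation. This is only a sketch: it assumes a hypothetical true mean of 100,000, which in real life we would never know, draws many samples of 30 salaries, and counts how often the interval built from each sample contains that true mean.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_MEAN = 100_000  # hypothetical population mean, used only for this simulation
SIGMA = 15_000       # known population standard deviation (from the example)
N = 30               # sample size (from the example)
Z = 1.96             # critical value for a 95% confidence level

standard_error = SIGMA / math.sqrt(N)


def interval_covers_truth():
    """Draw one sample, build its 95% interval, and report if it holds the mean."""
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    x_bar = statistics.fmean(sample)
    lower = x_bar - Z * standard_error
    upper = x_bar + Z * standard_error
    return lower <= TRUE_MEAN <= upper


trials = 5_000
coverage = sum(interval_covers_truth() for _ in range(trials)) / trials

print(coverage)  # share of intervals that contain the true mean
```

The printed share should come out close to 0.95, which is what the confidence level promises over many repeated samples.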
04:29
Okay. The Z of alpha divided by two comes
from the so-called standard normal
distribution
table. It is best to first see it and then
comment on it.
04:40
Let's say that we want to find the values
for the 95% confidence interval.
04:45
Alpha is 0.05.
04:47
Therefore, we are looking for Z of alpha
divided by two, or
0.025.
04:56
In the table, this will match the value of
1 minus 0.025, or
0.975. The corresponding
Z comes from the sum of the row and column
table headers associated with this
cell. In our case, the value is 1.9
plus 0.06 or 1.96.
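If you don't have the table at hand, the same critical value can be looked up in code. This sketch assumes Python 3.8 or newer and uses `NormalDist` from the standard `statistics` module. Its `inv_cdf` method returns the z for which the given share of the standard normal distribution lies to the left.

```python
from statistics import NormalDist

alpha = 0.05  # 95% confidence level

# Same lookup as in the table: find z with 1 - alpha/2 = 0.975 of the
# standard normal distribution to its left.
z = NormalDist().inv_cdf(1 - alpha / 2)

print(round(z, 2))  # 1.96
```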
05:22
A commonly used term for the Z is critical
value.
05:24
So we have found the critical value for this
confidence interval.
05:29
Now we can easily substitute in the formula.
05:33
The final confidence interval becomes 94,833
to
105,568. The interpretation is the following.
05:45
We are 95% confident that the average data
scientist salary will be in the
interval between $94,833 and
$105,568.
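The substitution can be sketched in a few lines. The inputs are the ones from the example. The variable names are illustrative, and the printed bounds may differ from the narrated figures by a dollar or so, depending on how the intermediate values are rounded.

```python
import math

x_bar = 100_200  # sample mean (from the example)
sigma = 15_000   # known population standard deviation (from the example)
n = 30           # sample size (from the example)
z = 1.96         # critical value for a 95% confidence level

# Margin of error = critical value times standard error.
margin = z * sigma / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"{lower:,.0f} to {upper:,.0f}")
```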
06:00
Let's repeat the exercise using a higher
confidence level.
06:04
Say we want to be 99% certain of the outcome.
06:07
Alpha is 0.01.
06:11
We look at the table for the value of
1 minus 0.005,
which is equal to 0.995.
06:19
Bummer. There is no such value. When this
happens,
06:23
we just have to round to the nearest value
available.
06:27
The corresponding critical value is 2.5 plus
0.08.
06:32
Thus, 2.58.
06:35
We plug it into our formula once more and
the new confidence interval is
93,135 to
107,266. This means that we are
99% confident that the average data
scientist salary is going to lie in the
interval between $93,135 and
$107,266.
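To see how much rounding to the nearest table value matters, here is a sketch that computes the 99% interval twice: once with the table value 2.58 and once with the unrounded critical value from `statistics.NormalDist` (Python 3.8 or newer). The helper name `confidence_interval` is made up for this illustration.

```python
import math
from statistics import NormalDist

x_bar, sigma, n = 100_200, 15_000, 30  # inputs from the example
standard_error = sigma / math.sqrt(n)


def confidence_interval(z):
    """Return (lower, upper) for the given critical value z."""
    margin = z * standard_error
    return x_bar - margin, x_bar + margin


z_table = 2.58                              # rounded value read from the table
z_exact = NormalDist().inv_cdf(1 - 0.01 / 2)  # about 2.576

for label, z in (("table", z_table), ("exact", z_exact)):
    lower, upper = confidence_interval(z)
    print(f"{label}: {lower:,.0f} to {upper:,.0f}")
```

The two intervals differ by only about eleven dollars at each end, so rounding to the nearest table entry is a reasonable shortcut here.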
07:00
Please note that in this case, there is a
trade-off between the level of confidence we
chose and the estimation precision.
07:06
The interval we obtained is broader.
07:09
The opposite is also true.
07:11
A narrower confidence interval translates to
higher uncertainty.
07:15
Makes sense, right?
If we are trying to estimate the population
mean, and we are picking a larger interval,
we're increasing our chances of having an
interval that actually includes the mean
and vice versa.
07:29
If we want to be more specific about the
population mean range, this will take away
from our confidence about this statement.
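The trade-off can be sketched numerically by computing the interval width for the three common confidence levels. This assumes the same standard deviation and sample size as in the example. The function name `interval_width` is made up for this illustration.

```python
import math
from statistics import NormalDist

sigma, n = 15_000, 30  # inputs from the example
standard_error = sigma / math.sqrt(n)


def interval_width(confidence):
    """Full width of the interval (upper minus lower) at a confidence level."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * z * standard_error


widths = {level: interval_width(level) for level in (0.90, 0.95, 0.99)}

for level, width in widths.items():
    print(f"{level:.0%} confidence: width of about {width:,.0f}")
```

Each step up in confidence buys a wider, less precise interval, which is the trade-off described above.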
07:37
Okay. This lecture was a bit longer, but
very insightful.
07:42
Don't skip the exercises provided.
07:44
They will help you reinforce the knowledge
about this concept, which is fundamental for
everybody who wants to work with numbers in
their job.
07:53
In the next few lessons, we will study some
particular cases and teach you how to find
confidence intervals for them.
08:00
Thanks for watching.