00:01
Hi again. So you know what a hypothesis is
and you have an idea of how to
form the null and alternative hypotheses.
00:09
By the end of this lesson, we will
understand the reason why hypothesis testing
works. First, we must define the term
significance level.
00:19
Normally we aim to reject the null if it is
false, right?
00:23
However, as with any test, there is a small
chance that we could get it wrong and reject
a null hypothesis
00:29
that is true. The significance level is
denoted by alpha and is the
probability of rejecting the null hypothesis
if it is true.
00:37
So the probability of making this error.
00:41
Typical values for alpha are 0.01, 0.05, and
0.1.
00:49
It is a value that you select based on the
certainty you need.
00:53
In most cases, the choice of alpha is
determined by the context you are operating
in. But 0.05 is the most commonly used
value.
01:02
Let's explore an example.
01:04
Say you need to test if a machine is working
properly.
01:08
You would expect the test to make few or
no mistakes.
01:11
As you want to be very precise, you should
pick a low significance level, such as
0.01. The famous Coca-Cola glass bottle is
12 ounces. If the machine pours 12.1 ounces,
some of the liquid will
be spilled and the label will be damaged as
well.
01:29
So in certain situations, we need to be as
accurate as possible.
01:34
However, if we are analyzing humans or
companies, we would expect more random or at
least uncertain behavior and hence a higher
degree of error.
01:43
For instance, if we want to predict how much
Coca-Cola its consumers drink on average,
the difference between 12 ounces and 12.1
ounces will not be that crucial.
01:52
So we can choose a higher significance level
like 0.05 or
0.1. Now that we have an
idea about the significance level, let's get
to the mechanics of hypothesis
testing. Imagine you are consulting a
university and want to
carry out an analysis on how students are
performing on average.
02:15
The university dean believes that, on
average, students have a GPA of 70%.
02:20
Being the data driven researcher that you
are, you can't simply agree with his opinion.
02:25
So you start testing.
02:27
The null hypothesis is the population mean
grade is 70%.
02:33
This is a hypothesized value, and we denote
it with mu zero.
02:38
The alternative hypothesis is the population
mean grade is not 70%.
02:43
So mu zero differs from 70%.
02:48
All right. Assuming that the population of
grades is normally distributed, all grades
received by students should look this way.
02:56
That is the true population mean.
02:58
Now a test we would normally perform is the
Z test.
03:02
The formula is Z equals the sample mean
minus the hypothesized
mean divided by the standard error.
03:11
The idea is the following.
03:14
We are standardizing or scaling the sample
mean we got.
03:17
If the sample mean is close enough to the
hypothesized mean, then Z will be close to
zero. Otherwise, it will be far away from
it.
03:26
Naturally, if the sample mean is exactly
equal to the hypothesized mean, Z will be
zero. In all these cases, we would accept
the null hypothesis.
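The formula just described can be sketched in a few lines of Python. This is only an illustration: the function name `z_statistic` and all the numbers are hypothetical, and it assumes the population standard deviation is known, so that the standard error is the standard deviation divided by the square root of the sample size.

```python
import math

def z_statistic(sample_mean, hypothesized_mean, population_std, n):
    """Z = (sample mean - hypothesized mean) / standard error."""
    # Assumption: the standard error is population_std / sqrt(n)
    standard_error = population_std / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Hypothetical numbers: 36 students, population standard deviation of 9 points
z_far = z_statistic(sample_mean=73.0, hypothesized_mean=70.0, population_std=9.0, n=36)
z_equal = z_statistic(sample_mean=70.0, hypothesized_mean=70.0, population_std=9.0, n=36)

print(z_far)    # sample mean is 3 points above the hypothesized mean
print(z_equal)  # sample mean equals the hypothesized mean, so Z is zero
```

The further the sample mean sits from the hypothesized 70%, measured in standard errors, the further Z moves away from zero.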
03:37
Okay. The question here is the following.
03:40
How big should Z be for us to reject the null
hypothesis?
03:45
Well, there is a cutoff line.
03:47
Since we are conducting a two-sided or
two-tailed test, there are two cutoff lines, one
on each side. When we calculate Z, we will
get a value.
03:57
If this value falls into the middle part,
then we cannot reject the null.
04:01
If it falls outside in the shaded region,
then we reject the null hypothesis.
04:06
That is why the shaded part is called the
rejection region.
04:12
The area that is cut off actually depends on
the significance level.
04:17
Say the level of significance, alpha, is 0.05.
04:20
Then we have alpha divided by two, or 0.025,
on the left
side and 0.025 on the right side.
04:30
Now these are values we can check from the Z
table.
04:33
When alpha divided by two is 0.025, Z is 1.96.
04:38
So 1.96 on the right side and -1.96 on the
left side.
04:44
Therefore, if the value we get for Z from
the test is lower than
-1.96 or higher than 1.96, we will reject
the null hypothesis. Otherwise, we will accept
it.
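Instead of reading the cutoff from a printed Z table, it can be computed. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the function name `two_tailed_decision` and the test value of 2.0 are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def two_tailed_decision(z, alpha=0.05):
    """Return the positive cutoff and whether Z falls in the rejection region."""
    # Each tail holds alpha / 2, so the right cutoff leaves 1 - alpha / 2 to its left
    z_cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    reject = z < -z_cutoff or z > z_cutoff
    return z_cutoff, reject

z_cutoff, reject = two_tailed_decision(2.0)  # hypothetical test statistic
print(round(z_cutoff, 2), reject)  # 1.96 True
```

A value such as 1.5 would land in the middle part, so the same function would report that the null cannot be rejected.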
04:57
That's more or less how hypothesis testing
works.
05:01
We scale the sample mean with respect to the
hypothesized value.
05:06
If Z is close to zero, then we cannot reject
the null.
05:09
If it is far away from zero, then we reject
the null hypothesis.
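One way to see how this connects back to the significance level is a small simulation. The sketch below is an illustration under stated assumptions, not part of the lesson: it draws many samples from a hypothetical normal population in which the null really is true (mean 70, standard deviation 10), runs the two-tailed Z-test on each, and counts how often the null gets rejected. The function name and all parameters are made up for the example.

```python
import random
from statistics import NormalDist, mean

def rejection_rate_under_true_null(alpha=0.05, n=30, trials=5000, seed=0):
    """Share of samples for which a two-tailed Z-test rejects a true null."""
    rng = random.Random(seed)
    mu0, sigma = 70.0, 10.0  # hypothetical population; the null (mean = 70) is true
    z_cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (mean(sample) - mu0) / standard_error
        if abs(z) > z_cutoff:
            rejections += 1
    return rejections / trials

rate = rejection_rate_under_true_null()
print(rate)  # should land close to alpha
```

Because the null is true in every simulated sample, each rejection is an error, and the share of such errors should sit near the chosen alpha.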
05:15
OK. What about one-sided tests?
We have those too.
05:20
Let's take the example from the last lecture.
05:23
Paul says data scientists earn more than
125,000.
05:28
So H zero is: mu zero is bigger than
125,000. The alternative is that mu zero
is lower than or equal to 125,000.
05:42
Using the same level of significance, this
time
05:44
the whole rejection region is on the left.
05:48
So the rejection region has an area of
alpha.
05:52
Looking at the Z table, that corresponds to a
Z-score of
1.645, and since it is on the left, it
takes a minus sign.
06:01
Now when calculating our test statistic Z,
if we get a value lower than
-1.645, we would reject the null hypothesis
as we have
statistical evidence that the data scientist
salary is less than 125,000.
06:16
Otherwise, we would accept it.
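The left-tailed rule can be sketched the same way. The salary sample below is entirely hypothetical (a sample mean of 120,000 from 100 data scientists with a population standard deviation of 15,000); only the 125,000 and the 0.05 level come from the example.

```python
import math
from statistics import NormalDist

alpha = 0.05
# The whole rejection region is on the left, so the cutoff has area alpha to its left
z_cutoff = NormalDist().inv_cdf(alpha)  # about -1.645

# Hypothetical sample, for illustration only
sample_mean, hypothesized_mean = 120_000, 125_000
population_std, n = 15_000, 100

z = (sample_mean - hypothesized_mean) / (population_std / math.sqrt(n))
reject = z < z_cutoff

print(round(z_cutoff, 3), round(z, 2), reject)
```

With these made-up numbers, Z falls well to the left of the cutoff, so the null would be rejected.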
06:20
All right. To exhaust all possibilities,
let's explore another
one-tailed test. Say the university dean told you
that the average
GPA students get is lower than 70%.
06:33
In that case, the null hypothesis is: mu zero
is lower than
70%, while the alternative is: mu zero is
bigger than or equal to
70%. In this situation, the rejection region
is on the
right side. So if the test statistic is
bigger than the cutoff Z-score,
we would reject the null.
06:55
Otherwise, we wouldn't.
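The right-tailed rule, together with the two earlier cases, can be collected into one small helper. This is a minimal sketch: the function name `rejects_null` and the `tail` labels are hypothetical, and the cutoffs come from `statistics.NormalDist` in the Python standard library rather than from a printed Z table.

```python
from statistics import NormalDist

def rejects_null(z, alpha=0.05, tail="two-sided"):
    """Decide whether the test statistic Z falls in the rejection region."""
    std_normal = NormalDist()
    if tail == "two-sided":
        # alpha is split evenly between the two tails
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "left":
        # the whole rejection region sits on the left
        return z < std_normal.inv_cdf(alpha)
    if tail == "right":
        # the whole rejection region sits on the right
        return z > std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two-sided', 'left', or 'right'")

# The same hypothetical Z of 1.8 leads to different decisions depending on the test
print(rejects_null(1.8, tail="right"))      # True: 1.8 is beyond about 1.645
print(rejects_null(1.8, tail="two-sided"))  # False: 1.8 is inside about 1.96
```

The comparison shows why the direction of the test has to be fixed before looking at the data: the cutoff moves with it.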
06:58
Cool. That's all for now.
07:00
In a lesson or two, we'll start testing.
07:03
Just hold on a bit, and thanks for watching.