00:01
All right. Now that we've covered the
necessary theory, it is time for some
testing. We're going to explore two types of
tests: tests drawn from a
single population and tests drawn from multiple
populations.
00:14
This is very similar to confidence intervals
for a single population and confidence
intervals for two populations that we
covered previously.
00:23
In the next few videos, we will run tests
for a single mean with both known
variance and unknown variance.
00:31
Let's start with a test in which the
variance is known, shall we?
For this test, we will use our good old data
scientist salary example.
00:41
Here's the data set one more time.
00:44
By now, I hope you are able to calculate the
sample mean.
00:48
It is $100,200.
00:52
The population variance is known and its
standard deviation is equal to
15,000. Moreover, the sample size is
30. However, you saw that according to
Glassdoor,
the popular salary information website, the
mean data scientist salary is
113,000. The sample that is available on
Glassdoor is
based on self-reported numbers, and you
would like to see if its value is correct.
01:21
We need a two-sided test, as we are
interested in knowing whether the mean salary is
significantly less than that or
significantly more than that.
01:31
The null hypothesis is that the population mean
salary is
113,000. We denote it as
mu zero equals 113,000.
01:45
The alternative hypothesis is that the
population mean salary is different from
113,000. All right.
01:54
Formula time, almost.
01:57
Testing is done by standardizing the
variable at hand and comparing it to the
lowercase z, which follows a standard normal
distribution.
02:06
Remember standardization.
02:08
We learned about it in the previous section.
02:10
Back then, I told you it was very important.
02:13
And you will now see why.
02:16
For those that don't remember, I suggest
watching the video on standardization once
again. For the others, I will quickly go
through it.
02:24
We standardize a variable by subtracting the
mean and dividing by the standard
deviation. Since it is a sample, we use the
standard error.
02:34
Thus, the formula for standardization
becomes:
02:39
Capital Z is equal to the sample mean minus
the value of
interest from the null hypothesis divided by
the standard error.
02:50
In this way, we obtain a distribution with a
mean of zero and a standard
deviation of one.
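Written out, the formula just described can be sketched as follows. The symbols are assumptions chosen for illustration: x bar is the sample mean, mu zero is the value from the null hypothesis, sigma is the known population standard deviation, and n is the sample size, with the standard error taken as sigma divided by the square root of n.

```latex
Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
```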
02:57
This uppercase Z should not be mistaken for
the lowercase z.
03:02
The uppercase Z is the standardized variable
associated with the test and will be
called the z-score from now on.
03:11
The lowercase z is the one from the table
that we've talked about before, and
henceforth will be referred to as the
critical value.
03:20
All right. How does testing work?
Think about this.
03:25
The lowercase z is normally distributed with
a mean of zero and a standard deviation of one.
03:29
The uppercase Z is normally distributed with
a mean of x bar minus mu zero
and a standard deviation of one.
03:38
Standardization lets us compare the means.
03:41
The closer the difference between x bar and mu zero
is to zero, the closer the z-score
itself is to zero.
03:49
This implies a higher chance to accept the
null hypothesis.
03:54
Let's go back to the example.
03:56
So what is the value of our standardized
variable?
We plug in the numbers that we have from the
beginning of the lesson.
04:04
What we get is a z-score of -4.67.
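As a quick check, the calculation can be reproduced in a few lines of Python. This is only a sketch using the numbers quoted in the lesson; the variable names are made up for illustration, and the standard error is assumed to be the population standard deviation divided by the square root of the sample size.

```python
import math

# Values quoted in the lesson (variable names are illustrative assumptions)
sample_mean = 100_200        # sample mean salary
hypothesized_mean = 113_000  # mu zero, the Glassdoor figure under the null
population_std = 15_000      # known population standard deviation
sample_size = 30

# Standard error of the mean when the population variance is known
standard_error = population_std / math.sqrt(sample_size)

# Standardize: (sample mean - hypothesized mean) / standard error
z_score = (sample_mean - hypothesized_mean) / standard_error

print(round(z_score, 2))  # -4.67
```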
04:09
Now we will compare the absolute value of
-4.67 with the lowercase
z of alpha divided by two, where alpha is the
significance level.
04:19
Note that we use the absolute value, as it
is much easier to always compare positive
capital Z's with positive lowercase z's.
04:27
Moreover, some Z tables don't include
negative values.
04:31
You should be aware that the two statements
"-4.67 is lower than the
negative critical value" and "4.67
04:40
is higher than the positive critical
value" are equivalent.
04:44
Thus, our decision rule becomes: reject the null
hypothesis if the absolute value of the z-score is higher than
the absolute value of the critical value.
04:52
Using 5% significance,
04:54
our alpha is 0.05.
04:57
Since it is a two-sided test, we check the
table for z of 0.025.
The corresponding value is 1.96.
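Instead of reading the table, the same critical value can be looked up in code. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the helper name `critical_value` is a hypothetical one introduced only for this illustration.

```python
from statistics import NormalDist


def critical_value(alpha: float) -> float:
    """Two-sided critical value: the z that leaves alpha / 2 in the upper tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


print(round(critical_value(0.05), 2))  # 1.96
```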
05:07
The last thing we need to do is compare our
standardized variable to the critical value.
05:12
If the absolute value of the z-score is higher than 1.96, we would
reject the null hypothesis.
05:18
If it is lower, we will accept it.
05:22
4.67 is higher than 1.96.
05:25
Therefore, we reject the null hypothesis.
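The decision rule can also be expressed as a small function. This is a sketch under the assumptions of this lesson (two-sided test, known variance); `reject_null` is a hypothetical helper name, not something from the course materials.

```python
from statistics import NormalDist


def reject_null(z_score: float, alpha: float) -> bool:
    """Return True when |z-score| exceeds the two-sided critical value."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z_score) > critical


# z-score from the salary example, tested at 5% significance
print(reject_null(-4.67, 0.05))  # True
```

Taking the absolute value inside the function means the sign of the z-score never has to be handled separately.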
05:28
The answer is that at the 5% significance
level, we have rejected the null
hypothesis or, at 5% significance,
05:36
there is no statistical evidence that the
mean salary is $113,000.
05:43
There are many other ways to express this,
and you will probably hear more about this
later on in the course.
05:50
What if we had a different significance
level?
Using 1% significance,
05:54
we have an alpha of 0.01, so z of alpha
divided
by two is 2.58.
06:02
Once again, our z-score of 4.67 is higher
than
2.58. So we would reject the null hypothesis
even at
1% significance.
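To see both significance levels side by side, here is a short sketch that repeats the comparison for 5% and 1%. The z-score is the rounded value from the lesson, and the `results` dictionary is just an illustrative way to collect the outcome.

```python
from statistics import NormalDist

z_score = -4.67  # rounded z-score from the salary example

# Map each significance level to (critical value, reject the null?)
results = {}
for alpha in (0.05, 0.01):
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    results[alpha] = (round(critical, 2), abs(z_score) > critical)

print(results)  # {0.05: (1.96, True), 0.01: (2.58, True)}
```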
06:14
But how much further can we go before we
could not reject the null hypothesis anymore?
0.5%? 0.1%?
06:22
There is a special technique that allows us
to see what the significance level is, after
which we will be unable to reject the null
hypothesis.
06:30
We will see it in our next video.
06:33
Stay tuned.