00:01
Hi. And welcome back.
00:03
This section is based on the knowledge that
you acquired previously.
00:07
So if you haven't been through it, you may
have a hard time keeping up.
00:11
Make sure you have seen all the videos about
confidence intervals, distributions, Z
tables and T tables and have done all the
exercises.
00:20
If you've completed them already, you are
good to go.
00:24
Confidence intervals provide us with an
estimation of where the parameters are
located. However, when you are making a
decision, you need a yes or no
answer. The correct approach in this case is
to use a test.
00:38
In this section, we will learn how to
perform one of the fundamental tasks in
statistics: hypothesis testing.
00:46
Okay. There are four steps in data-driven
decision-making.
00:51
First, you must formulate a hypothesis.
00:56
Second, once you have formulated a
hypothesis, you will have to find the right
test for your hypothesis.
01:03
Third, you execute the test.
01:06
And fourth, you make a decision based on the
results.
01:11
Let's start from the beginning.
01:13
What is a hypothesis?
There are many ways to define it.
01:18
The most intuitive one I've seen is: a
hypothesis is an idea that can be tested.
01:26
This is not the formal definition, but it
explains the point very well.
01:31
Say I tell you that apples in New York are
expensive.
01:34
This is an idea or a statement, but it is not
testable until I have something to
compare it with.
01:42
For instance, if I define expensive as any
price higher than
$1.75 per pound, then it immediately becomes
a
hypothesis.
01:54
What's something that cannot be a hypothesis?
01:57
An example may be: would the USA do better or
worse under a Clinton
administration compared to a Trump
administration?
Statistically speaking, this is an idea, but
there is no data to test it.
02:11
Therefore it cannot be a hypothesis of a
statistical test.
02:16
Actually, it is more likely to be a topic of
another discipline.
02:21
Conversely, in statistics we may compare
different US presidencies that have already
been completed, such as the Obama
administration and the Bush administration,
as we have data on both.
02:33
All right. Let's get out of politics and get
into hypotheses.
02:38
Here's a simple topic that can be tested.
02:41
According to Glassdoor, the popular salary
information website, the mean data
scientist salary in the US is $113,000.
02:51
So we want to test if their estimate is
correct.
02:56
There are two hypotheses that are made.
02:58
These are the null hypothesis, denoted H0,
and the alternative hypothesis, denoted H1
or Ha.
03:08
The null hypothesis is the one to be tested,
and the alternative is everything
else. In our example, the null hypothesis
would be: the mean data scientist salary is
$113,000.
03:24
The alternative would be: the mean data
scientist salary is not $113,000. Now you
would want to check whether $113,000 is
close enough to the true mean as estimated
from our sample. If it is, you would accept
(or, more precisely, fail to reject) the
null hypothesis.
03:43
Otherwise, you would reject the null
hypothesis.
03:47
The concept of the null hypothesis is
similar to innocent until proven
guilty. We assume that the mean salary is
$113,000, and we try to prove otherwise.
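To make this concrete, here is a minimal sketch of the two-sided test in Python. Keep in mind the assumptions: the ten salaries are invented purely for illustration (they are not Glassdoor data), the 5% significance level is my choice, and 2.262 is the two-tailed T table value for 9 degrees of freedom. The helper name `one_sample_t_statistic` is hypothetical.

```python
import math
import statistics


def one_sample_t_statistic(sample, hypothesized_mean):
    """Return the t statistic for H0: population mean == hypothesized_mean."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_std = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    standard_error = sample_std / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# Hypothetical salaries in USD, invented for illustration only.
salaries = [98_000, 105_000, 110_000, 112_000, 115_000,
            117_000, 120_000, 121_000, 125_000, 130_000]

H0_MEAN = 113_000             # H0: mean salary is 113,000; H1: it is not
T_CRITICAL_TWO_SIDED = 2.262  # T table: 9 degrees of freedom, 5% level, two-tailed

t_stat = one_sample_t_statistic(salaries, H0_MEAN)

# Two-sided test: a large |t| in either direction counts against H0.
reject_null = abs(t_stat) > T_CRITICAL_TWO_SIDED
print(f"t = {t_stat:.3f}, reject H0: {reject_null}")
```

With this made-up sample, the t statistic stays inside the critical values, so the null hypothesis is not rejected. A different sample could easily lead to the opposite decision.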
04:01
Okay. This was an example of a two-sided or a
two-tailed test.
04:06
You can also form one-sided, or one-tailed,
tests.
04:10
Say your friend Paul told you that he thinks
data scientists earn more than
$125,000 per year.
04:18
You doubt him, so you design a test to see
who's right.
04:23
The null hypothesis of this test would be:
the mean data scientist salary is
more than $125,000.
04:32
The alternative will cover everything else.
04:34
That is, the mean data scientist salary is
less than or equal to $125,000. It is
important to note that the outcomes of tests
refer to the population parameter rather
than the sample statistic. So the result
that we get is for the population.
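The one-sided version can be sketched in the same way. Again, this rests on assumptions: the salaries are invented for illustration, the 5% significance level is my choice, and 1.833 is the one-tailed T table value for 9 degrees of freedom. The statistic is computed at the boundary value of $125,000, and only a strongly negative result counts as evidence against Paul's claim. The helper name `one_sample_t_statistic` is hypothetical.

```python
import math
import statistics


def one_sample_t_statistic(sample, reference_mean):
    """Return the t statistic of the sample mean relative to reference_mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - reference_mean) / standard_error


# Hypothetical salaries in USD, invented for illustration only.
salaries = [98_000, 105_000, 110_000, 112_000, 115_000,
            117_000, 120_000, 121_000, 125_000, 130_000]

PAUL_CLAIM = 125_000          # Paul: the mean salary is more than 125,000
T_CRITICAL_ONE_SIDED = 1.833  # T table: 9 degrees of freedom, 5% level, one-tailed

t_stat = one_sample_t_statistic(salaries, PAUL_CLAIM)

# Left-tailed test: only a sample mean far BELOW 125,000 counts against Paul.
reject_null = t_stat < -T_CRITICAL_ONE_SIDED
print(f"t = {t_stat:.3f}, reject H0: {reject_null}")
```

With this made-up sample, the statistic falls far into the left tail, so Paul's claim is rejected. Note that the conclusion is about the population mean, not about the ten salaries themselves.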
04:53
Another crucial consideration is that
generally the researcher is trying to reject
the null hypothesis.
05:00
Think about the null hypothesis as the
status quo and the alternative as the change
or innovation that challenges that status
quo.
05:09
In our example, Paul was representing the
status quo, which we were challenging.
05:15
Let me emphasize this once again.
05:18
In statistics, the null hypothesis is the
statement we are trying to reject.
05:21
Therefore, the null hypothesis is the
present state of affairs, while the
alternative is our personal opinion.
05:29
It truly is counterintuitive in the
beginning, but later on when you start doing
the exercises, you will understand the
mechanics.
05:38
Okay. After this lecture, there will be a
detailed comment on these two examples.
05:43
In addition, make sure you complete the quiz
questions, so you become confident with
forming hypotheses.
05:51
Thanks for watching.