00:00
So far, we have talked about confidence
intervals with population variances that are
either known or unknown.
00:07
However, we were considering only one
population.
00:11
In the next couple of lessons, we will
explore confidence intervals looking into two
populations. These cases are more important, as they have a wide range of real-world applications.
00:23
A few important distinctions need to be made
before we dive into this topic.
00:28
In some cases, the samples that we have
taken from the two populations will be
dependent on each other, and in others they
will be independent.
00:36
Dependent samples are easier.
00:38
You will experience this firsthand.
00:40
Dependent samples can occur in several
situations.
00:45
First, when we are researching the same
subject over time.
00:49
Examples are weight loss and blood samples.
00:52
Essentially, we are looking at the same
person before and after.
00:57
These two examples will be explored in detail in these lessons.
01:02
Another case in which we have dependent
samples is when investigating couples or
families, for instance, habits of husbands
and wives.
01:11
They are obviously dependent on each other,
as the time these people spend together at
home often coincides.
01:17
They watch TV together, eat dinner together, and often share the same household income.
01:24
Finally, we can have the same people, but in
samples relating to different things.
01:28
So instead of a before and after situation,
we are looking at cause and effect.
01:34
For example, when applying to university in the US, you sit the SAT, and based on it, you either get admitted or you don't.
01:42
The applicant is the same person.
01:44
However, the samples are different.
01:46
One relates to the SAT.
01:48
The other to the admittance outcome.
01:52
In terms of testing, we have one formula for confidence intervals for dependent samples, and dependent samples also come up in other statistical methods, like regressions, which we will study later on.
02:02
For now, let's stick with confidence
intervals.
02:05
Ok. When we have independent samples, we can further distinguish three cases: when the population variances are known, when they are unknown but assumed to be equal, and when they are unknown and assumed to be different.
02:22
Sounds a bit overwhelming, but don't worry.
02:25
In statistics, many concepts are similar to
each other, and you will quickly see that you
have already acquired the intuition that
allows you to understand these concepts
pretty fast.
02:37
All right. Let's get on to the topic of this
lesson.
02:41
The dependent samples.
02:44
This statistical test is often used when
developing medicine.
02:48
Let's say you have developed a pill that
increases the concentration of magnesium in
the blood. It is very promising, but there
is no data to support your claim.
02:58
After testing the drug in a laboratory, it
is time to see its actual effect on
people. What you would typically do is take
a sample of ten
people and test their magnesium levels
before and after taking the pill.
03:11
The two dependent samples are the magnesium
levels before and the magnesium levels after.
03:17
It is clear that it is the same people we
are testing.
03:20
Thus, the samples are dependent.
03:23
An important note is that the populations
are normally distributed.
03:27
Actually, when dealing with biology,
normality is so often observed that we
immediately assume that such variables are
normally distributed.
03:36
Okay. Back to the example.
03:39
Whenever you take a blood test, the
magnesium levels are stated in milligrams per
deciliter, and a healthy person would
usually have somewhere between 1.7 and
2.2 milligrams of magnesium per deciliter.
03:52
Here is a table that contains a sample of
ten people and their levels of magnesium
before and after taking the pill for some
time.
04:00
We've also added a column that calculates the difference in levels before and after taking the pill. Instead of dealing with two variables, we now have only one. In this way, the data looks like a single population, doesn't it?
Let's calculate the mean and the standard
deviation of the differences.
04:19
The mean is 0.33 and the standard deviation
is 0.45.
04:24
Moreover, we know that the sample size is
ten.
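If you want to reproduce these steps yourself, here is a minimal Python sketch of the same mechanics using NumPy; note that the before-and-after values below are synthetic placeholders for illustration, not the table from the lesson.

```python
import numpy as np

# Synthetic placeholder values (NOT the lesson's table), magnesium in mg/dL
before = np.array([2.00, 1.85, 1.90, 2.10, 1.95, 1.80, 2.05, 1.92, 1.88, 2.01])
after  = np.array([2.35, 2.10, 2.30, 2.40, 2.25, 2.15, 2.45, 2.20, 2.30, 2.38])

# One variable instead of two: the difference for each person
diff = after - before

n = diff.size              # number of pairs
d_bar = diff.mean()        # mean of the differences
s_d = diff.std(ddof=1)     # sample standard deviation (n - 1 in the denominator)

print(n, round(d_bar, 2), round(s_d, 2))
```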
04:28
The formula that would allow us to calculate
a confidence interval is the following.
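The formula itself appears on screen; for reference, the standard confidence interval for the mean of paired differences is

\bar{d} \pm t_{n-1,\,\alpha/2} \cdot \frac{s_d}{\sqrt{n}}

where \bar{d} is the mean of the differences, s_d is their sample standard deviation, n is the number of pairs, and t_{n-1,\alpha/2} is the critical value of Student's t distribution with n - 1 degrees of freedom.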
04:34
The population is normally distributed, but the sample we have contains only ten observations. Therefore, the distribution we will have to work with is Student's t, and the appropriate statistic is the t-statistic.
04:47
You can clearly see that it is the same as
the one for a single population with an
unknown variance.
04:53
Let's choose the level of confidence and
plug in the numbers.
04:57
As we have said many times.
04:58
95% confidence is one of the most common
levels.
05:02
And so we will use it here as well.
05:05
The critical t-value with nine degrees of freedom for a 95% confidence interval is 2.26. Now we have everything we need, and we can calculate the confidence interval.
05:18
It lies in the range between 0.01 and 0.65.
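As a quick check on the arithmetic, here is a small Python sketch that reproduces the interval from the summary statistics quoted above; it assumes SciPy is available for the critical t-value.

```python
import numpy as np
from scipy import stats

d_bar = 0.33   # mean of the differences (mg/dL)
s_d = 0.45     # sample standard deviation of the differences
n = 10         # number of people in the sample

# Critical value of Student's t with n - 1 = 9 degrees of freedom, 95% confidence
t_crit = stats.t.ppf(0.975, df=n - 1)   # roughly 2.26

margin = t_crit * s_d / np.sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(round(lower, 2), round(upper, 2))  # roughly 0.01 and 0.65
```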
05:23
How do we interpret this result?
Well, we can be 95% confident that the true mean of the difference falls within this interval. Moreover, the whole interval is positive.
05:34
This suggests that the true mean of the difference is positive.
05:38
Therefore, with 95% confidence, we can say that the level of magnesium in the test subjects' blood is higher after taking the pill.
05:46
The purpose of this test was to determine
whether the drug is effective.
05:50
Based on our small sample, it most likely
is.
05:54
All right. This shows you some of the
practical applications of inference.
05:59
Stay tuned for our next lesson, in which we
will explore confidence intervals on
independent samples.