00:02
Welcome back for lecture 8 in which
we'll discuss inference for paired data.
00:06
So let's start with an example to
motivate what we're gonna do here.
00:10
The question is: Do flexible work schedules
reduce the demand for resources?
The Lake County, Illinois health department
experimented with a flexible four-day work week.
00:19
So for a year, the department recorded the mileage driven
by 11 field workers on an ordinary five-day work week.
00:26
Then it switched to a flexible four-day work
week and recorded the mileage for another year.
00:31
So here are the data for the
11 people that they looked at.
00:34
So we have the five-day mileage and
the four-day mileage for each person.
00:38
Now we wanna perform inference on the
differences in the mean mileage.
00:42
So the question is: Can we
use a two sample t-test?
No, we can't. Why is that?
Because each observation or each set of
measurements was taken on the same person,
so each person has two
observations taken on them.
00:56
This means that the two
groups are not independent.
00:58
So we violate assumption 1 for
the two sample procedures.
01:02
So what do we do?
We call these types of data
paired data or matched pairs.
01:08
And one thing that we might think about
doing, is we look at the differences
in the four-day and the five-day
mileage for each individual
and then perform inference
on the differences.
01:18
Then what we have is we essentially have
one observation for each individual,
so we have one sample of
independent observations.
01:27
So what we do then, is once we have the differences,
we analyze the differences in the same way
that we would apply the one-sample
t-procedures that we discussed before.
01:37
As long as all the conditions
for that are satisfied.
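To make that idea concrete, here's a minimal Python sketch of turning paired measurements into one sample of differences. The mileage values below are made up purely for illustration (they are not the Lake County data), and the names five_day, four_day, and paired_differences are our own.

```python
from statistics import mean, stdev

def paired_differences(first, second):
    """Return first[i] - second[i] for each matched pair."""
    if len(first) != len(second):
        raise ValueError("paired data need the same number of observations")
    return [a - b for a, b in zip(first, second)]

# Hypothetical mileages for four workers (NOT the lecture's data).
five_day = [8000, 7500, 9100, 6400]
four_day = [7600, 7500, 8300, 6000]

d = paired_differences(five_day, four_day)  # one difference per person
d_bar = mean(d)   # sample mean of the differences
s_d = stdev(d)    # sample standard deviation of the differences
print(d, d_bar, round(s_d, 3))
```

From here on, only the list of differences matters, which is why the one-sample t-procedures apply.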
01:40
So let's take the five-day
minus the four-day mileage
and re-frame our data in such a way
that it just shows the differences.
01:48
So what we get is the data you see right here.
01:51
Each individual with a difference in
the five-day and the four-day mileage.
01:56
In order to carry out the paired t-procedures,
we have to have some conditions satisfied.
02:01
First of all, we need the
paired data condition.
02:04
Which simply says that our
data come in matched pairs.
02:07
Second, we need the
independence assumption.
02:10
So the differences have to be independent,
so they have to come from a random sample.
02:15
Third, the randomization condition.
02:17
The data must come from a random
sample or random assignment to groups.
02:21
So three and two often
take care of each other.
02:25
Four, the 10% condition.
02:27
The sample size has to be less
than 10% of the population size.
02:31
And five, the nearly
normal condition.
02:34
The differences have to show near normality in
order to use the t-procedures for the paired data.
02:41
So how do we carry out a paired t-test?
Well, it looks a lot like the one-sample t-test.
02:47
Let's let mu D be the
population mean difference.
02:51
Then we hypothesize that mu D is
equal to some hypothesized value, mu 0,
versus one of the three
standard alternatives.
02:58
That mu D is less than mu 0, mu D greater
than mu 0 or mu D is not equal to mu 0.
03:05
Let's talk about the mechanics.
03:07
We'll let S D be the sample standard
deviation for the differences.
03:11
Then the test statistic is given by d bar minus mu 0
divided by the standard error of d bar,
where d bar is the sample mean of the differences
and the standard error of d bar is given by
S D over the square root of the sample size.
03:28
Under the null hypothesis, the test
statistic follows the t-distribution
with n minus 1 degrees of freedom, just like
it did in the one-sample t-procedures.
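Written out in symbols, the mechanics we just described look roughly like this. The notation (s_d for the sample standard deviation of the differences, SE for the standard error) is our own shorthand for what was said aloud.

```latex
t = \frac{\bar{d} - \mu_0}{SE(\bar{d})},
\qquad
SE(\bar{d}) = \frac{s_d}{\sqrt{n}},
\qquad
t \sim t_{n-1} \ \text{under } H_0 : \mu_d = \mu_0
```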
03:38
So let's do the example on the
mileage data that we just looked at.
03:42
Do we have paired data?
Yes, we do.
03:45
We have two observations on each individual
so we can look at the differences.
03:50
Do we have independence?
The individuals are
likely to be independent.
03:55
The randomization condition is not stated explicitly
in the problem but we're going to assume this.
04:02
For the 10% condition, the Lake County Health
Department has more than 110 field workers,
so our 11 workers are less than 10% of the population
and we're good on the 10% condition.
04:10
Now to the right you see a
histogram of the differences.
04:14
And so for the nearly normal condition, we
have some problems with the normal assumption.
04:19
We have two peaks and a right skew.
04:21
So we have some problems with
the nearly normal condition.
04:24
But in order to get a feel for the mechanics
of the test, we're going to do it anyway.
So let's look at the mechanics.
04:30
Well first we need our summary
statistics for the differences.
04:32
We have the sample mean
difference is 982 miles.
04:36
The sample standard deviation is 1139.568
miles and our sample size is 11.
04:44
So in this test what we're assuming
initially is that there's no difference
between the mileage for the four-day
and the five-day work week.
04:50
So we're gonna assume that mu D is
zero, that's our null hypothesis.
04:55
Our test statistic then is d bar divided by
S D over the square root of n,
or 982 divided by 1139.568 over the
square root of 11, which gives us 2.858.
The significance level is 5%
and what we're looking to do
is to see if the five-day mileage on
average is greater than the four-day mileage.
05:20
So we reject the null hypothesis if our test statistic
takes a value greater than or equal to the critical value
t with 10 degrees of freedom at the .05 level, which is 1.812.
05:30
Our test statistic took a value of 2.858.
05:34
So we reject the null hypothesis and
conclude that there is evidence to suggest
that average mileage decreases during the
four-day work week versus the five-day work week.
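If you'd like to check that arithmetic yourself, here's a short Python sketch using only the summary statistics from the lecture. The function name paired_t_statistic is our own, and the critical value 1.812 is simply copied from the table rather than computed.

```python
import math

def paired_t_statistic(d_bar, s_d, n, mu_0=0.0):
    """t = (d_bar - mu_0) / (s_d / sqrt(n)), with n - 1 degrees of freedom."""
    standard_error = s_d / math.sqrt(n)
    return (d_bar - mu_0) / standard_error

# Summary statistics for the mileage differences (five-day minus four-day).
t_stat = paired_t_statistic(d_bar=982, s_d=1139.568, n=11)

# One-sided critical value for 10 degrees of freedom at the 5% level,
# taken from the lecture's table (not computed here).
t_crit = 1.812

reject = t_stat >= t_crit
print(round(t_stat, 3), reject)
```

The statistic comes out at about 2.858, above the cutoff, which matches the decision to reject the null hypothesis.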
05:43
What if we want a confidence
interval for the mean difference?
Well if the conditions for
the paired t-test are met,
then we can form a 100 times 1 minus alpha percent confidence
interval for the mean difference in the following way.
05:55
We take d bar plus or minus t* with n minus 1 degrees of
freedom times the standard error of the mean difference.
06:03
This is the same form as we had
in the one-sample t-interval.
06:08
For the mileage example, we wanna construct the
95% confidence interval for the mean difference.
06:13
So using the table, we
find that t* with 10 degrees of freedom is 2.228.
06:19
We found during the hypothesis test that d bar was
982 and the standard error of d bar was 343.593.
06:29
So when we form our confidence interval, we
take 982 plus or minus 2.228 times 343.593.
06:38
And what that gives us is an interval
of 216.475 up to 1,747.525 miles.
06:47
So what that tells us is that we are 95% confident
that the average mileage for the five-day work week
is between 216.475 and 1,747.525 miles
higher than that for the four-day work week.
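The interval can be reproduced the same way. This is a sketch under the same assumptions as before: the function name paired_t_interval is our own, and t* = 2.228 is taken from the table rather than computed.

```python
import math

def paired_t_interval(d_bar, s_d, n, t_star):
    """Return (lower, upper) for d_bar +/- t_star * s_d / sqrt(n)."""
    standard_error = s_d / math.sqrt(n)
    margin = t_star * standard_error
    return d_bar - margin, d_bar + margin

# Summary statistics from the mileage example; t_star is for 10 degrees
# of freedom and 95% confidence, copied from the lecture's table.
lower, upper = paired_t_interval(d_bar=982, s_d=1139.568, n=11, t_star=2.228)
print(round(lower, 1), round(upper, 1))
```

Because the code keeps the standard error unrounded, the endpoints may differ from the hand calculation in the last decimal place or so, but they agree to within rounding.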
07:04
With the paired t-test, there are a
bunch of things that can go wrong.
07:07
So here are some things that we want to avoid.
07:09
We don't want to use a two-sample
t-test when we have paired data
because we know that our groups are not
independent if we have paired data.
07:17
We don't want to use a paired t-procedure
when the data are not paired.
07:21
So those first two things kinda go together.
07:25
Don't forget to look out for outliers.
07:27
Outliers can indicate problems with
the nearly normal assumption.
07:31
And do not use side by side boxplots or
histograms to look for the difference
between the means of the paired groups because
they're not from two different groups.
07:39
So we're not doing this as we
would for a two-sample t-test.
07:44
So what have we done in this lecture?
Well, we examined the difference between
paired data and the type of data
that enables us to use a two-sample t-test.
07:53
We described how to carry out the paired t-test as well
as how to construct a paired t confidence interval
for the average difference in paired data.
08:04
We finished up by looking at
some things that can go wrong
and things that we wanna avoid when
we use the paired t-procedures.
08:11
This is the end of lecture 8 and I look
forward to seeing you back for lecture 9.