00:00
Before we continue, let's introduce a
concept: the sampling distribution.
00:06
Say you have the population of used cars in
a car shop.
00:09
We want to analyze the car prices and be
able to make some predictions about them.
00:14
Population parameters that may be of
interest include the mean car price, the standard
deviation of prices, covariance, and so on.
00:23
Normally in statistics we would not have
data on the whole population, but rather just
a sample. Let's draw a sample out of that
data.
00:32
The mean is $2,617.23.
00:38
Now a problem arises from the fact that if I
take another sample, I may get a completely
different mean: $3,201.34.
00:49
Then a third with a mean of
$2,844.33.
00:56
As you can see, the sample mean depends on
the observations included in the sample itself.
01:00
So taking a single value, as we did in
descriptive statistics, is definitely
suboptimal. What we can do is draw many,
many samples and
create a new data set made up of sample
means.
01:14
These values are distributed in some way.
01:16
So we have a distribution.
01:19
When we refer to a distribution
formed by samples, we use the term sampling
distribution. For our case, we can be even
more precise.
01:28
We are dealing with a sampling distribution
of the mean.
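To make this concrete, here is a minimal Python sketch of the process. Python itself, the lognormal price shape, the seed, and all the sizes below are illustrative assumptions, not the lecture's actual data:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical right-skewed population of used-car prices
# (the lognormal shape and its parameters are assumptions).
population = rng.lognormal(mean=7.9, sigma=0.35, size=100_000)

# Draw many samples (with replacement, so the draws stay independent)
# and record each sample's mean.
n_samples, sample_size = 10_000, 50
samples = rng.choice(population, size=(n_samples, sample_size))
sample_means = samples.mean(axis=1)  # the sampling distribution of the mean
```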
01:33
So far, so good.
01:35
Now, if we inspect these values closely, we
will realize that they are
different but are concentrated around a
certain value.
01:43
Right? For our case, somewhere around
$2,800.
01:48
Since each of these sample means is nothing
but an approximation of the population mean,
the value they revolve around is actually
the population mean itself.
01:58
Most probably, none of them is the population
mean, but taken together, they give us a
really good idea of where it lies.
02:05
In fact, if we take the average of those
sample means, we expect to get a very precise
approximation of the population mean.
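Continuing the sketch above, averaging the sample means should land very close to the true population mean:

```python
# The average of the sample means closely tracks the population mean;
# the two printed values should be nearly identical.
print(f"population mean:      {population.mean():.2f}")
print(f"mean of sample means: {sample_means.mean():.2f}")
```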
02:12
Great. Let me give you some more information.
02:17
Here's a plot of the distribution of the car
prices.
02:20
We haven't seen many distributions, but we
know that this is not a normal distribution.
02:26
It has a right skew, and that's about all we
can see.
02:30
Here's the big revelation.
02:33
It turns out that if we visualize the
distribution of the sample means, we get
something else. Something familiar,
something useful.
02:42
A normal distribution.
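In the sketch above, this is easy to check numerically (assuming scipy is available): the population of prices is strongly skewed, while the sample means are nearly symmetric.

```python
from scipy.stats import skew

print(skew(population))    # clearly positive: the prices are right-skewed
print(skew(sample_means))  # close to 0: roughly symmetric, bell-shaped
```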
02:45
And that's what the central limit theorem
states.
02:49
No matter the distribution of the population
(binomial, uniform, exponential,
or any other), the sampling distribution of
the mean will approximate a normal
distribution. Not only that, but its mean
is the same as the
population mean.
03:05
That's something we already noticed.
03:07
What about the variance?
Well, it depends on the size of the samples
we draw, but it is quite elegant.
03:14
It is the population variance divided by the
sample size.
03:19
Since the sample size is in the denominator,
the bigger the sample size, the lower the
variance. Or, in other words, the closer the
approximation we get.
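Using the earlier sketch once more, we can verify this relationship; the two values below should roughly agree:

```python
# CLT: variance of the sample means ~= population variance / sample size.
print(f"empirical variance of the means: {sample_means.var():.2f}")
print(f"population variance / n:         {population.var() / sample_size:.2f}")
```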
03:29
So if you are able to draw bigger samples,
your statistical results will be more
accurate. Usually, for the CLT to apply, we need a
sample
size of at least 30 observations.
03:42
Great. Finally, let's finish off with why
the central limit theorem
is so important.
03:49
As we already know, the normal distribution
has elegant statistics and an unmatched
applicability in calculating confidence
intervals and performing tests.
03:58
The central limit theorem allows us to
perform tests, solve problems, and make
inferences using the normal distribution,
even when the population is not normally
distributed. The discovery and proof of the
theorem revolutionized statistics
as a field, and we will be relying on it a
lot in the subsequent lectures.
04:17
That's all for now.
04:19
Thanks for watching.