00:02
Before crunching any numbers and making
decisions, we should introduce some key
definitions. The first step of every
statistical analysis you perform is to
determine whether the data you are dealing
with is a population or a sample.
00:16
A population is the collection of all items
of interest to our study and is usually
denoted with an uppercase N.
00:23
The numbers we've obtained when using a
population are called parameters.
00:28
A sample is a subset of the population and
is denoted with a lowercase n. The numbers
we obtain when working with a sample are
called statistics.
00:37
Now you know why the field we are studying
is called statistics.
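To make the distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the population is simulated rather than real, the values stand for hypothetical expected starting salaries in thousands, and the sizes N = 10,000 and n = 50 are arbitrary.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of N values (simulated, not real survey data).
N = 10_000
population = [random.gauss(60, 15) for _ in range(N)]

# A parameter is a number computed from the entire population.
population_mean = statistics.mean(population)

# A statistic is a number computed from a sample of size n.
n = 50
sample = random.sample(population, n)  # draws n items without replacement
sample_mean = statistics.mean(sample)

print(f"N = {N}: parameter (population mean) = {population_mean:.2f}")
print(f"n = {n}: statistic (sample mean)     = {sample_mean:.2f}")
```

The two means will usually be close but not identical, which is why we keep separate names for the population value and the sample value.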
00:41
Let's say we want to perform a survey of the
job prospects of the students studying at
New York University.
00:47
What is the population?
You can simply walk into New York University
and find every student, right?
Well, surely that would not be the
population of NYU students.
00:58
The population of interest includes not only
the students on campus, but also the ones at
home, on exchange abroad, distance education
students, part-time
students, even the ones who enrolled but are
still in high school.
01:11
Though exhaustive, even this list misses
someone.
01:15
Point taken. Populations are hard to define
and hard to observe in real
life. A sample, however, is much easier to
gather.
01:24
It is less time-consuming and less costly.
01:27
Time and resources are the main reasons we
prefer drawing samples to
analyzing an entire population.
01:35
So let's draw a sample then. As we first
wanted to do, we can just go to the NYU
campus.
01:42
Next, let's enter the canteen because we know
it will be full of people.
01:46
We can then interview 50 of them.
01:49
Cool. This is a sample drawn from the
population of NYU
students. Good job.
01:57
Populations are hard to observe and contact.
01:59
That's why statistical tests are designed to
work with incomplete data.
02:03
You will almost always be working with
sample data and making data-driven decisions
and inferences based on it.
02:10
All right. Since statistical tests are
usually based on sample data, samples are
key to accurate statistical insights.
02:17
They have two defining characteristics,
randomness and representativeness.
02:22
A sample must be both random and
representative for an insight to be precise.
02:28
A random sample is collected when each
member of the sample is chosen from the
population strictly by chance.
02:35
A representative sample is a subset of the
population that accurately reflects the
members of the entire population.
02:43
Let's go back to the sample we just
discussed, the 50 students from the NYU
canteen. We walked into the university
canteen and violated both
conditions. People were not chosen by
chance.
02:55
They were a group of NYU students who were
there for lunch.
02:59
Most members of the population did not even
get the chance to be chosen as they were not
in the canteen.
03:04
Thus, we conclude the sample was not random.
03:09
But was it representative?
Well, it represented a group of people, but
definitely not all students in the
university.
03:17
To be exact, it represented the people who
have lunch at the university canteen.
03:22
Had our survey been about job prospects of
NYU students who eat in the university
canteen, we would have done well.
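The canteen problem can be sketched in a few lines of Python. The student counts and the four study modes below are made up for illustration, and the rule that only on-campus students can be found in the canteen is a simplifying assumption.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 1,000 students as (student_id, study_mode) pairs.
modes = (["on_campus"] * 600 + ["distance"] * 200
         + ["exchange"] * 100 + ["part_time"] * 100)
population = list(enumerate(modes))

# Assumption: only on-campus students can be found in the canteen at lunch.
canteen_visitors = [s for s in population if s[1] == "on_campus"]
canteen_sample = random.sample(canteen_visitors, 50)

def share(students, mode):
    """Fraction of the given students that have the given study mode."""
    return sum(1 for _, m in students if m == mode) / len(students)

print(share(population, "distance"))      # 0.2
print(share(canteen_sample, "distance"))  # 0.0
```

Distance education students make up a fifth of this made-up population but never appear in the canteen sample, so the sample cannot reflect the population as a whole.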
03:30
Okay. You must be wondering how to draw a
sample that is both random and
representative. Well, the safest way would
be to get access to the student
database and contact individuals in a random
manner.
03:43
However, such surveys are almost impossible
to conduct without assistance from the
university. All right.
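If we did have access to such a database, drawing the sample could look roughly like the sketch below. The ID range, the sample size of 50, and the helper name draw_random_sample are all hypothetical. A real student database would be queried differently.

```python
import random

def draw_random_sample(student_ids, n, seed=None):
    """Pick n distinct student IDs, each with an equal chance of selection."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(list(student_ids), n)

# Hypothetical database export: one ID per enrolled student.
student_ids = range(1, 50_001)

sample_ids = draw_random_sample(student_ids, 50, seed=7)
print(len(sample_ids), "students selected to contact")
```

Because every ID has the same chance of being picked, nobody is excluded just for not being on campus, which is what the canteen sample got wrong.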
03:50
Throughout the course, we will explore both
sample and population statistics.
03:55
After completing this course, samples and
populations will be a piece of cake for
you. Thanks for watching.