00:01
This lesson will introduce you to the three
measures of central tendency.
00:05
Don't be scared by the terminology; we are
talking about the mean, median and mode.
00:10
Even if you are familiar with these terms,
please stick around as we will explore their
upsides and shortfalls.
00:16
Ready? Let's go.
00:19
The first measure we will study is the mean,
also known as the simple average.
00:24
It is denoted by the Greek letter mu (μ) for a
population and x-bar (x̄) for a sample.
00:29
These notations will come in handy in the next
section.
00:33
We can find the mean of a data set by adding
up all of its components and then dividing
the sum by the number of components. The mean is the most
common measure of central tendency,
but it has a huge downside.
00:44
It is easily affected by outliers.
00:47
Let's aid ourselves with an example.
00:51
These are the prices of pizza at 11
different locations in New York City and ten
different locations in LA.
00:58
Let's calculate the means of the two data
sets using the formula for the mean.
01:02
In NYC, we get $11, whereas
for LA just
$5.50. On average, pizza in New York can't be
twice
as expensive as in LA, right?
Correct. The problem is that in our sample
we have included one posh place in New
York where they charge $66 for pizza, and
this doubled the mean.
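To make the effect of the outlier concrete, here is a minimal Python sketch. The transcript does not list the individual prices, so the two lists below are assumed values, chosen only so that they reproduce the figures quoted in the lesson (11 and 10 locations, means of $11 and $5.50, one $66 pizza).

```python
from statistics import mean

# ASSUMED prices in dollars: the lesson does not list them.
# They are picked to match the summary figures quoted in the transcript.
nyc_prices = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11, 66]  # 11 locations, one outlier
la_prices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]       # 10 locations

# Mean = sum of the values divided by how many values there are.
nyc_mean = mean(nyc_prices)
la_mean = mean(la_prices)

# Recompute the NYC mean after dropping the single $66 observation.
nyc_mean_without_outlier = mean([p for p in nyc_prices if p != 66])

print(nyc_mean, la_mean, nyc_mean_without_outlier)
```

With these assumed values, removing the one $66 price brings the NYC mean down to the same level as LA, which is the sense in which a single outlier doubled it.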
01:25
What we should take away from this example is
that the mean alone is not enough to draw definite
conclusions. So how can we protect ourselves
from this issue? You guessed it.
01:36
We can calculate the second measure, the
median.
01:41
The median is basically the middle number in
an ordered data set.
01:45
Let's see how it works for our example.
01:47
In order to calculate the median, we have to
sort our data
01:50
in ascending order.
01:52
The median of the data set is the number at
position (n + 1) / 2 in the ordered list,
where n is the number of observations.
02:02
Therefore, with 11 observations, the median for NYC is at the
sixth position, or $6,
much closer to the observed prices than the
mean of $11.
02:11
Right. What about LA?
We have just ten observations in LA.
02:16
According to our formula, the median is at
position 5.5.
02:21
In cases like this, the median is the simple
average of the numbers at positions five and
six. Therefore, the median of LA prices is
$5.50.
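The position rule described above can be sketched in Python as follows. The helper name `median_by_position` is hypothetical, and the price lists are assumed values (the transcript does not list them), chosen to match the medians quoted in the lesson.

```python
from statistics import median

def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on the sorted data."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)          # ascending order
    n = len(ordered)
    position = (n + 1) / 2            # 1-based position in the ordered list
    if position.is_integer():
        # Odd n: the position lands exactly on one observation.
        return ordered[int(position) - 1]
    # Even n: average the two observations on either side of the position.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2

# ASSUMED prices in dollars, picked to match the figures in the lesson.
nyc_prices = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11, 66]  # n = 11, position 6
la_prices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]       # n = 10, position 5.5

print(median_by_position(nyc_prices), median_by_position(la_prices))
```

The standard library's `statistics.median` gives the same results; the hand-written version is here only to mirror the steps in the lesson.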
02:32
Okay. We have seen that the median is not
affected by extreme prices, which is good
when we have posh New York restaurants in a
street pizza sample, but we still don't get
the full picture.
02:43
We must introduce another measure, the mode.
02:47
The mode is the value that occurs most
often.
02:50
It can be used for both numerical and
categorical data, but we will stick to our
numerical example.
02:57
After counting the frequency of each value,
we find that the mode of New York pizza
prices is $3.
03:03
Now, that's interesting.
03:05
The most common price of pizza in NYC is
just $3, but the mean and
median led us to believe it was much more
expensive.
03:14
Okay. Let's do the same and find the mode of
LA pizza prices.
03:20
Hmm. Each price appears only once.
03:23
How do we find the mode, then?
Well, we say that there is no mode.
03:29
But can't I say that there are ten modes,
you may ask?
Sure you can, but it will be meaningless
with ten observations, and an experienced
statistician would never do that.
03:40
In general, you often have multiple modes.
03:42
Usually two or three modes are tolerable,
but more than that would defeat the purpose
of finding a mode.
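Counting frequencies to find the mode can be sketched in Python like this. The helper name `modes` is hypothetical, the price lists are assumed values (the transcript does not list them), and returning an empty list when every value appears once is a choice made here to follow the lesson's "no mode" convention.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    counts = Counter(values)              # value -> number of occurrences
    highest = max(counts.values())
    if highest == 1:
        # Every value appears exactly once: report "no mode".
        return []
    # Keep every value tied for the highest count (there may be several).
    return [value for value, count in counts.items() if count == highest]

# ASSUMED prices in dollars, picked to match the figures in the lesson.
nyc_prices = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11, 66]
la_prices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(modes(nyc_prices))  # only 3 appears twice
print(modes(la_prices))   # every price is unique
```

Note that Python's built-in `statistics.multimode` takes the other view: for data where each value appears once, it returns all of the values, which is the "ten modes" answer the lesson advises against.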
03:49
There is one last question that we haven't
answered.
03:52
Which measure is best?
The NYC and LA example shows us that the
measures of central tendency should be used
together rather than independently.
04:02
Therefore, there is no single best measure, but using only
one is
definitely the worst choice.
04:09
All right. Now you know about the mean,
median and mode.
04:13
In our next video, we will use that
knowledge to talk about skewness.
04:18
Stay tuned, and thanks for watching.