Univariate Measures: Central Tendency

by 365 Careers

    Learning Material 4
    • XLS
      2.7. Mean, median and mode lesson.xls
    • XLS
      2.7.Mean-median-and-mode-exercise.xls
    • XLS
      2.7.Mean-median-and-mode-exercise-solution.xls
    • PDF
      Download Lecture Overview
    Transcript

    00:01 This lesson will introduce you to the three measures of central tendency.

    00:05 Don't be scared by the terminology: we are talking about the mean, median and mode.

    00:10 Even if you are familiar with these terms, please stick around as we will explore their upsides and shortfalls.

    00:16 Ready? Let's go.

    00:19 The first measure we will study is the mean, also known as the simple average.

    00:24 It is denoted by the Greek letter mu (μ) for a population and x-bar (x̄) for a sample.

    00:29 These notions will come in handy in the next section.

    00:33 We can find the mean of a data set by adding up all of its components and then dividing the sum by the number of components. The mean is the most common measure of central tendency, but it has a huge downside.
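
As a minimal sketch of this formula in Python (the function name `mean` and the sample values are assumptions for illustration, not part of the lecture):

```python
def mean(values):
    """Simple average: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical data set, used only to exercise the function.
example = [2, 4, 9]
print(mean(example))  # (2 + 4 + 9) / 3 = 5.0
```

The same calculation applies to a population (μ) and to a sample (x̄); only the notation differs.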

    00:44 It is easily affected by outliers.

    00:47 Let's aid ourselves with an example.

    00:51 These are the prices of pizza at 11 different locations in New York City and ten different locations in LA.

    00:58 Let's calculate the means of the two data sets using the formula for the mean.

    01:02 In NYC, we get $11, whereas for LA just $5.50. On average, pizza in New York can't be twice as expensive as in LA, right? Correct. The problem is that in our sample we have included one posh place in New York where they charge $66 for pizza, and this doubled the mean.
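
The effect of the outlier can be checked numerically. The lecture's price table is not reproduced in this transcript, so the lists below are hypothetical values chosen only to be consistent with the quoted figures (11 NYC prices with a mean of $11 and one $66 outlier, 10 LA prices with a mean of $5.50):

```python
# Hypothetical prices in dollars, NOT the lecture's actual table.
nyc_prices = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11, 66]
la_prices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

nyc_mean = sum(nyc_prices) / len(nyc_prices)
la_mean = sum(la_prices) / len(la_prices)

# Drop the single posh restaurant and recompute the NYC mean.
nyc_typical = [p for p in nyc_prices if p != 66]
nyc_mean_without_outlier = sum(nyc_typical) / len(nyc_typical)

print(f"NYC mean: {nyc_mean:.2f}")                              # 11.00
print(f"LA mean: {la_mean:.2f}")                                # 5.50
print(f"NYC mean without $66: {nyc_mean_without_outlier:.2f}")  # 5.50
```

With these assumed values, removing one observation out of eleven halves the mean, which is the sensitivity to outliers the lecture describes.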

    01:25 What we should take away from this example is that the mean is not enough to make definite conclusions. So how can we protect ourselves from this issue? You guessed it.

    01:36 We can calculate the second measure, the median.

    01:41 The median is basically the middle number in an ordered data set.

    01:45 Let's see how it works for our example.

    01:47 To calculate the median, we first have to sort our data in ascending order.

    01:52 The median of the data set is the number at position (n + 1) / 2 in the ordered list, where n is the number of observations.

    02:02 Therefore, the median for NYC is at the sixth position, or $6, which is much closer to the observed prices than the mean of $11.

    02:11 Right. What about LA? We have just ten observations in LA.

    02:16 According to our formula, the median is at position 5.5.

    02:21 In cases like this, the median is the simple average of the numbers at positions five and six. Therefore, the median of LA prices is $5.50.
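
The position rule, including the averaging step for an even number of observations, can be sketched as follows. The function name `median` and the price lists are assumptions for illustration; the lists are hypothetical values consistent with the medians quoted above:

```python
def median(values):
    """Middle value of the sorted data, found at 1-based position (n + 1) / 2."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    position = (n + 1) / 2
    if position.is_integer():
        # Odd n: the position is a whole number; convert to a 0-based index.
        return ordered[int(position) - 1]
    # Even n: the position ends in .5, so average the two neighbouring values.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


# Hypothetical prices in dollars, NOT the lecture's actual table.
nyc_prices = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11, 66]  # n = 11, position 6
la_prices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]       # n = 10, position 5.5

print(median(nyc_prices))  # 6
print(median(la_prices))   # 5.5
```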

    02:32 Okay. We have seen that the median is not affected by extreme prices, which is good when we have posh New York restaurants in a street pizza sample, but we still don't get the full picture.

    02:43 We must introduce another measure, the mode.

    02:47 The mode is the value that occurs most often.

    02:50 It can be used for both numerical and categorical data, but we will stick to our numerical example.

    02:57 After counting the frequencies of each value, we find that the mode of New York pizza prices is $3.
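
Counting frequencies is straightforward with `collections.Counter` from the Python standard library. The price list is again a hypothetical one consistent with the quoted mode of $3, and the pizza styles are made up to show that the same approach works for categorical data:

```python
from collections import Counter

# Hypothetical prices in dollars, NOT the lecture's actual table.
nyc_prices = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11, 66]

frequencies = Counter(nyc_prices)
mode_price, occurrences = frequencies.most_common(1)[0]
print(mode_price, occurrences)  # 3 2

# The same counting works for categorical data (made-up labels).
styles = ["thin", "deep dish", "thin", "stuffed", "thin"]
mode_style, style_count = Counter(styles).most_common(1)[0]
print(mode_style, style_count)  # thin 3
```

Note that `most_common(1)` returns only one value, so on its own it does not reveal ties between several equally frequent values.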

    03:03 Now, that's interesting.

    03:05 The most common price of pizza in NYC is just $3, but the mean and median led us to believe it was much more expensive.

    03:14 Okay. Let's do the same and find the mode of LA pizza prices.

    03:20 Hmm. Each price appears only once.

    03:23 How do we find the mode, then? Well, we say that there is no mode.

    03:29 But can't I say that there are ten modes, you may ask? Sure you can, but it will be meaningless with ten observations, and an experienced statistician would never do that.

    03:40 In general, you often have multiple modes.

    03:42 Usually two or three modes are tolerable, but more than that would defeat the purpose of finding a mode.
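
One way to encode this convention is to return every value that ties for the highest frequency, and to report "no mode" when each value occurs only once. This is a sketch under that assumption; the function name `modes` and the data are hypothetical:

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values; an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Convention from the lecture: if several distinct values each appear
    # exactly once, calling all of them modes is meaningless.
    if top == 1 and len(counts) > 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


# Hypothetical prices in dollars, NOT the lecture's actual table.
la_prices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
nyc_prices = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11, 66]

print(modes(la_prices))        # []
print(modes(nyc_prices))       # [3]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```

The `sorted` call assumes the values can be compared with each other, which holds for prices but not for arbitrary mixed data.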

    03:49 There is one last question that we haven't answered.

    03:52 Which measure is best? The NYC and LA example shows that the measures of central tendency should be used together rather than independently.

    04:02 Therefore, there is no best, but using only one is definitely the worst.
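
In practice, the three measures can be reported side by side with Python's standard `statistics` module (`multimode` requires Python 3.8 or later). The helper name `summarize` and the price lists are assumptions for illustration; the lists are hypothetical values consistent with the figures quoted in this lesson:

```python
import statistics


def summarize(values):
    """Report mean, median and mode(s) together for one data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),
    }


# Hypothetical prices in dollars, NOT the lecture's actual table.
nyc_prices = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11, 66]
la_prices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

nyc_summary = summarize(nyc_prices)
la_summary = summarize(la_prices)
print(nyc_summary)
print(la_summary)
```

For the NYC list, the mean (11), median (6) and mode (3) disagree sharply, which signals the outlier. Note that `statistics.multimode` returns all ten LA prices because each occurs once, so the "no mode" interpretation has to be applied by the reader.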

    04:09 All right. Now you know about the mean, median and mode.

    04:13 In our next video, we will use that knowledge to talk about skewness.

    04:18 Stay tuned, and thanks for watching.


    About the Lecture

    The lecture Univariate Measures: Central Tendency by 365 Careers is from the course Statistics for Data Science and Business Analysis (EN).


    Author of lecture Univariate Measures: Central Tendency

     365 Careers


