
Univariate Measures: Data Dispersion (Variance)

by 365 Careers

    Learning Material 4
    • XLS
      2.9. Variance lesson.xls
    • XLS
      2.9.Variance-exercise.xls
    • XLS
      2.9.Variance-exercise-solution.xls
    • PDF
      Download Lecture Overview
    Transcript

    00:00 Next on our to-do list are the measures of variability.

    00:04 There are many ways to quantify variability.

    00:06 However, we will focus on the most common ones: variance, standard deviation, and coefficient of variation.

    00:14 In the field of statistics, we will typically use different formulas when working with population data and sample data.

    00:20 Let's think about this for a bit.

    00:23 When you have the whole population, each data point is known.

    00:26 So you are 100% sure of the measures you are calculating.

    00:30 When you take a sample of this population, and you compute a sample statistic, it is interpreted as an approximation of the population parameter.

    00:38 Moreover, if you extract ten different samples from the same population, you will get ten different measures.
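The point about different samples giving different measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population (the integers 1 to 1000), the sample size of 30, and the seed are all hypothetical choices, not part of the lecture.

```python
import random
import statistics

# Hypothetical population, chosen only for illustration: the integers 1..1000.
population = list(range(1, 1001))
population_mean = statistics.mean(population)  # known exactly: 500.5

rng = random.Random(42)  # fixed seed so the run is repeatable

# Draw ten samples of 30 observations each and compute each sample mean.
sample_means = [statistics.mean(rng.sample(population, 30)) for _ in range(10)]

for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.2f}")
print(f"population mean = {population_mean}")
```

Each sample mean lands near the population mean, but the ten values are not identical, which is why a sample statistic is read as an approximation of the population parameter.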

    00:44 Statisticians have solved the problem by adjusting the algebraic formulas for many statistics to reflect this issue.

    00:51 Therefore, we will explore both population and sample formulas, as they are both used. You must be asking yourself why there was only one formula each for the mean, median, and mode.

    01:03 Well, actually the sample mean is the average of the sample data points, while the population mean is the average of the population data points.

    01:12 So technically there are two different formulas, but they are computed in the same way. Okay.

    01:19 Now, after this short clarification, it's time to get on to variance.

    01:24 Variance measures the dispersion of a set of data points around their mean value.

    01:30 Population variance, denoted by sigma squared, is equal to the sum of squared differences between the observed values and the population mean, divided by the total number of observations.

    01:42 Sample variance, on the other hand, is denoted by s squared and is equal to the sum of squared

    01:48 differences between the observed sample values and the sample mean, divided by the number of sample observations minus one.
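Written out, the two definitions just described look as follows. The symbols are the conventional ones and are an assumption here, since the transcript only names sigma squared and s squared: N is the population size, n the sample size, mu the population mean, and x-bar the sample mean.

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}
\qquad\qquad
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
```

The numerators have the same form; only the denominator differs, N for a population and n - 1 for a sample.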

    01:57 All right. When you are getting acquainted with statistics, it is hard to grasp everything right away.

    02:03 Therefore, let's stop for a second to examine the formula for the population and try to clarify its meaning.

    02:11 The main part of the formula is its numerator.

    02:13 So that's what we want to comprehend: the sum of squared differences between the observations and the mean.

    02:20 Hmm. So the closer a number is to the mean, the lower the result we will obtain. Right? And the further away from the mean it lies, the larger this difference.

    02:32 Easy. But why do we elevate to the second degree? Squaring the differences has two main purposes.

    02:41 First, by squaring the numbers, we always get non-negative values.

    02:48 Without going too deep into the mathematics of it, it is intuitive that dispersion cannot be negative.

    02:51 Dispersion is about distance, and distance cannot be negative.

    02:56 If, on the other hand, we calculate the difference and do not elevate to the second degree, we would obtain both positive and negative values that, when summed, would cancel out, leaving us with no information about the dispersion.
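The cancellation described here is easy to check numerically. The sketch below uses five illustrative values (an assumption made for the demonstration) and compares the sum of raw differences from the mean with the sum of squared differences.

```python
# Five illustrative observations (assumed for this demonstration).
data = [1, 2, 3, 4, 5]
mean = sum(data) / len(data)  # 3.0

# Raw differences from the mean: negative and positive values cancel out.
deviations = [x - mean for x in data]
sum_of_deviations = sum(deviations)

# Squared differences are all non-negative, so nothing cancels.
squared_deviations = [d ** 2 for d in deviations]
sum_of_squares = sum(squared_deviations)

print(deviations)          # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(sum_of_deviations)   # 0.0
print(sum_of_squares)      # 10.0
```

The raw differences sum to zero regardless of how spread out the data are, so only the squared version carries information about dispersion.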

    03:10 Second, squaring amplifies the effect of large differences.

    03:14 For example, if the mean is zero, and you have an observation of 100, the squared spread is 10,000.
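A quick way to see the amplification is to square a few differences of growing size. The observations 1 and 10 below are hypothetical additions for comparison; only the mean of zero and the observation of 100 come from the example above.

```python
mean = 0
# 100 is the observation from the example; 1 and 10 are added for comparison.
observations = [1, 10, 100]

# Squared distance of each observation from the mean.
squared_spreads = [(x - mean) ** 2 for x in observations]

for x, sq in zip(observations, squared_spreads):
    print(f"observation {x}: squared difference = {sq}")
```

An observation 100 times further from the mean contributes 10,000 times more to the numerator, so a single distant point can dominate the variance.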

    03:22 All right, enough dry theory.

    03:24 It is time for a practical example.

    03:27 We have a population of five observations.

    03:30 One, two, three, four and five.

    03:33 Let's find its variance.

    03:36 We start by calculating the mean: (1 + 2 + 3 + 4 + 5) / 5 = 3.

    03:45 Then we apply the formula we just saw:

    03:47 (1 - 3)² + (2 - 3)² + (3 - 3)² + (4 - 3)²

    04:02 + (5 - 3)².

    04:04 All of these components have to be divided by five.

    04:07 When we do the math, we get two.

    04:10 So the population variance of the data set is two.
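The same calculation can be reproduced in Python as a check. This is a minimal sketch: the variable names are illustrative, and `statistics.pvariance` from the standard library is used only to confirm the hand-written formula.

```python
import statistics

population = [1, 2, 3, 4, 5]

# Population mean.
mu = sum(population) / len(population)  # 3.0

# Sum of squared differences from the mean, divided by the number of observations.
sigma_squared = sum((x - mu) ** 2 for x in population) / len(population)

print(sigma_squared)  # 2.0

# The standard library's population variance agrees with the manual result.
assert statistics.pvariance(population) == sigma_squared
```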

    04:14 But what about the sample variance? This would only be suitable if we were told that these five observations were a sample drawn from a population.

    04:23 So let's imagine that's the case.

    04:26 The sample mean is once again three.

    04:29 The numerator is the same, but the denominator is going to be four instead of five, giving us a sample variance of 2.5.
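As a sketch of the sample version, the only change from the population calculation is the denominator, n - 1 instead of n. Variable names are illustrative; `statistics.variance` is the standard library's sample variance and serves as a cross-check.

```python
import statistics

sample = [1, 2, 3, 4, 5]
n = len(sample)

# Sample mean.
x_bar = sum(sample) / n  # 3.0

# Same numerator as before, but divided by n - 1 instead of n.
numerator = sum((x - x_bar) ** 2 for x in sample)  # 10.0
s_squared = numerator / (n - 1)

print(s_squared)  # 2.5

# statistics.variance uses the n - 1 denominator as well.
assert statistics.variance(sample) == s_squared
```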

    04:38 To conclude the variance topic, we should interpret the results.

    04:42 Why is the sample variance bigger than the population variance? In the first case, we knew the population.

    04:48 That is, we had all the data, and we calculated the variance.

    04:52 In the second case, we were told that one, two, three, four and five was a sample drawn from a bigger population.

    05:00 Imagine the population of this sample were these ten numbers: one, one, one, two, three, four, five, five, five, and five.

    05:11 Clearly the numbers are the same, but there is a concentration around the two extremes of the data set.

    05:17 One and five.

    05:19 The variance of this population is 2.96.

    05:24 So our sample variance has rightfully been corrected upwards in order to reflect the higher potential variability.
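The comparison can be checked with the values exactly as read out in the lecture (three ones, a two, a three, a four, and four fives). This is a sketch with illustrative variable names; it computes the variance of the larger population and places it next to the two results for the values one to five.

```python
import math
import statistics

# The larger population, with values as read out in the lecture.
larger_population = [1, 1, 1, 2, 3, 4, 5, 5, 5, 5]

# The five observations treated as a sample from it.
sample = [1, 2, 3, 4, 5]

population_variance = statistics.pvariance(larger_population)
uncorrected = statistics.pvariance(sample)  # divides by n
corrected = statistics.variance(sample)     # divides by n - 1

print(f"variance of larger population: {population_variance:.2f}")
print(f"sample, divided by n:          {uncorrected:.2f}")
print(f"sample, divided by n - 1:      {corrected:.2f}")

# Dividing by n - 1 moves the estimate closer to the larger population's variance.
assert math.isclose(population_variance, 2.96)
assert uncorrected < corrected < population_variance
```

For this particular population, the n - 1 version (2.5) sits closer to 2.96 than the uncorrected value (2), which is the upward correction the lecture refers to.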

    05:32 This is the reason why there are different formulas for sample and population data.

    05:38 This was a very important lesson, so please make sure that you have understood it well.

    05:43 You can reinforce what you learned here by doing the exercise available in the course resources section.

    05:49 Remember, the subject of statistics is only understood when practiced.

    05:54 Thanks for watching.


    About the Lecture

    The lecture Univariate Measures: Data Dispersion (Variance) by 365 Careers is from the course Statistics for Data Science and Business Analysis (EN).


    Author of lecture Univariate Measures: Data Dispersion (Variance)

     365 Careers


