Data Visualization: Numeric Variables

by 365 Careers

    Learning Material 4
    • XLS
      2.4. Numerical variables. Frequency distribution table lesson.xls
    • XLS
      2.4.Numerical-variables.Frequency-distribution-table-exercise.xls
    • XLS
      2.4.Numerical-variables.Frequency-distribution-table-exercise-solution.xls
    • PDF
      Download Lecture Overview
    Transcript

    00:01 Ok, excellent.

    00:03 We already know how to create graphs and tables for categorical variables.

    00:07 In this lesson, we are going to do the same for numerical variables.

    00:11 And given that numerical data is the main focus of this course, we will spend a couple of lessons on this topic.

    00:17 Whenever we want to plot data, it is best to first order it in a table.

    00:21 So as we did with categorical variables, let's start by creating a frequency distribution table.

    00:27 Here's a list of 20 different numbers.

    00:30 If we arranged them in a frequency table like the one we used for categorical variables, we would obtain a table with 20 rows, each representing one number with a corresponding frequency of one, as each number occurs exactly once.

    00:43 This table would be impractical for any analysis, right? Well, when we deal with numerical variables, it makes much more sense to group the data into intervals and then find the corresponding frequencies.

    00:55 In this way, we make a summary of the data that allows for a meaningful visual representation. How do we choose these intervals? Generally, statisticians prefer to group the data into 5 to 20 intervals.

    01:10 This way the summary can be useful.

    01:12 However, this varies from case to case, and the correct choice of intervals largely depends on the amount of data we are working with.

    01:20 In our example, we will divide the data into five intervals of equal length.

    01:25 The simple formula that we use is as follows: the interval width is equal to the largest number minus the smallest number, divided by the number of desired intervals. In our case, the width of the interval should be (100 minus 1) divided by 5.

    01:41 The result is 19.8.
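The arithmetic above can be checked with a small sketch. The function name `interval_width` is hypothetical and only for illustration; the values 1, 100 and 5 are the ones quoted in the lesson.

```python
def interval_width(smallest, largest, n_intervals):
    """Raw interval width: (largest - smallest) / number of intervals."""
    return (largest - smallest) / n_intervals


# Values quoted in the lesson: smallest = 1, largest = 100, 5 intervals.
width = interval_width(1, 100, 5)
print(width)  # 19.8
```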

    01:45 Now we round this number up to 20 in order to reach a neater representation.

    01:50 Therefore, our intervals will be as follows.

    01:53 1 to 21, 21 to 41, 41 to 61, 61 to 81 and 81 to 101.

    02:03 Each interval has a width of 20.
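As a minimal sketch of how these bounds could be generated, assuming the raw width is rounded up to the next whole number and the first interval starts at the smallest value: the helper name `make_intervals` is hypothetical, not something from the lesson.

```python
import math


def make_intervals(smallest, largest, n_intervals):
    """Return (lower, upper) bounds for equal-width intervals."""
    raw_width = (largest - smallest) / n_intervals  # 19.8 for the lesson's values
    width = math.ceil(raw_width)                    # rounded up to 20
    return [
        (smallest + i * width, smallest + (i + 1) * width)
        for i in range(n_intervals)
    ]


intervals = make_intervals(1, 100, 5)
print(intervals)  # [(1, 21), (21, 41), (41, 61), (61, 81), (81, 101)]
```

Rounding up rather than to the nearest value guarantees that the last interval still reaches the largest number.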

    02:07 Okay. Let's try to construct the frequency distribution table.

    02:12 A number is included in a particular interval

    02:15 if that number is greater than the lower bound and less than or equal to the upper bound. As we can see from the table, there are two numbers in the first interval, four in the second, three in the third, six in the fourth, and five in the fifth. For many analyses, it is useful to calculate the relative frequency of the data points in each interval.
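The inclusion rule can be sketched as follows. The transcript does not reproduce the lesson's 20 numbers, so the `data` list here is a hypothetical sample chosen to range from 1 to 100 and to give the same counts as the lesson. The sketch also assumes that the first interval includes its lower bound, so that the smallest value is not left out; the lesson does not state this explicitly.

```python
# Hypothetical sample of 20 numbers (not taken from the transcript).
data = [1, 9, 22, 24, 32, 41, 44, 48, 57, 66,
        70, 73, 75, 76, 79, 82, 87, 89, 95, 100]
intervals = [(1, 21), (21, 41), (41, 61), (61, 81), (81, 101)]


def count_frequencies(values, intervals):
    """Count values with lower < v <= upper for each interval."""
    counts = []
    for i, (lower, upper) in enumerate(intervals):
        if i == 0:
            # Assumption: the first interval also includes its lower bound.
            n = sum(lower <= v <= upper for v in values)
        else:
            n = sum(lower < v <= upper for v in values)
        counts.append(n)
    return counts


frequencies = count_frequencies(data, intervals)
print(frequencies)  # [2, 4, 3, 6, 5]
```

Because the upper bound is inclusive and the lower bound is not, a value such as 41 is counted once, in the interval 21 to 41, and not again in 41 to 61.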

    02:36 As we said in a previous video, the relative frequency is the frequency of a given interval as part of the total.

    02:43 Let's add another column to our table and name it relative frequency.

    02:48 So the interval from 1 to 21 has an absolute frequency of two, but a relative frequency of two divided by the total of 20 numbers, which gives us 10%, and so on until we fill the table.

    03:02 All right. This is how we calculate relative frequencies.
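A short sketch of the same calculation for every interval, using the absolute frequencies quoted in the lesson (2, 4, 3, 6 and 5). The variable names are illustrative only.

```python
intervals = [(1, 21), (21, 41), (41, 61), (61, 81), (81, 101)]
frequencies = [2, 4, 3, 6, 5]  # absolute frequencies quoted in the lesson

total = sum(frequencies)  # 20 numbers in all
relative = [f / total for f in frequencies]

# Print the frequency distribution table with a relative-frequency column.
for (lower, upper), f, r in zip(intervals, frequencies, relative):
    print(f"{lower:>3} to {upper:<3}  {f:>2}  {r:.0%}")
```

The relative frequencies always sum to 1 (100%), which is a quick check that no value was missed or counted twice.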

    03:06 Now that we have summarized the raw data, we can start plotting it.


    About the Lecture

    The lecture Data Visualization: Numeric Variables by 365 Careers is from the course Statistics for Data Science and Business Analysis (EN).


    Author of lecture Data Visualization: Numeric Variables

     365 Careers


