00:01
Ok, excellent.
00:03
We already know how to create graphs and
tables for categorical variables.
00:07
In this lesson, we are going to do the same
for numerical variables.
00:11
And given that numerical data is the main
focus of this course, we will spend a couple
of lessons on this topic.
00:17
Whenever we want to plot data, it is best to
first organize it in a table.
00:21
So as we did with categorical variables,
let's start by creating a frequency
distribution table.
00:27
Here's a list of 20 different numbers.
00:30
If we arrange them in a frequency table like
the one we used for categorical variables, we
would obtain a table with 20 rows, each of
them representing one number with a
corresponding frequency of one, as each
number occurs exactly one time.
00:43
This table would be impractical for any
analysis, right?
Well, when we deal with numerical variables,
it makes much more sense to group the data
into intervals and then find the
corresponding frequencies.
00:55
In this way, we make a summary of the data
that allows for a meaningful visual
representation. How do we choose these
intervals?
Generally, statisticians prefer to divide
the data into 5 to 20 intervals.
01:10
This way the summary can be useful.
01:12
However, this varies from case to case, and
the correct choice of intervals largely
depends on the amount of data we are working
with.
01:20
In our example, we will divide the data into
five intervals of equal length.
01:25
The simple formula that we use is as
follows: the interval width is equal to the
largest number minus the smallest number,
divided by the number of desired
intervals. In our case, the width of the
interval should be 100 minus
one, divided by five.
01:41
The result is 19.8.
01:45
Now we want to round this number up in order
to reach a neater representation.
01:50
Therefore, our intervals will be as follows.
01:53
1 to 21, 21 to 41, 41 to
61, 61 to 81 and 81 to 101.
02:03
Each interval has a width of 20.
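
The width calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `interval_edges` is an assumption made for this example, and the inputs (smallest value 1, largest value 100, five intervals) are the ones quoted in the lesson.

```python
import math

def interval_edges(smallest, largest, n_intervals):
    """Return the edges of equal-width intervals.

    The raw width (largest - smallest) / n_intervals is rounded up
    to the next whole number, as done in the lesson.
    """
    raw_width = (largest - smallest) / n_intervals
    width = math.ceil(raw_width)
    # n_intervals intervals need n_intervals + 1 edges.
    return [smallest + i * width for i in range(n_intervals + 1)]

raw_width = (100 - 1) / 5
edges = interval_edges(1, 100, 5)
print(raw_width)  # 19.8
print(edges)      # [1, 21, 41, 61, 81, 101]
```

Rounding up rather than down guarantees that the last interval still reaches the largest value.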
02:07
Okay. Let's try to construct the frequency
distribution table.
02:12
A number is included in a particular
interval
02:15
if that number is greater than the lower
bound and less than or equal to the upper
bound. As we can see from the table, there
are two numbers in the first interval,
four in the second, three in the third, six
in the fourth, and five in the fifth
interval. For many analyses, it is useful to
calculate the relative
frequency of the data points in each
interval.
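
The inclusion rule for the intervals can be sketched as below. Two things here are assumptions and not part of the lesson: the list `sample` is made up for illustration (the lesson's actual 20 numbers are not reproduced here) and was chosen only so that the counts match the ones quoted, and the first interval is treated as also containing its lower bound, since otherwise the smallest value, 1, would fall outside every interval.

```python
def count_in_intervals(values, edges):
    """Count how many values fall in each interval (lower, upper]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            lower, upper = edges[i], edges[i + 1]
            # Rule from the lesson: greater than the lower bound,
            # less than or equal to the upper bound.
            in_interval = lower < v <= upper
            # Assumption: the first interval also includes its lower
            # bound, so the smallest value is not dropped.
            if i == 0 and v == lower:
                in_interval = True
            if in_interval:
                counts[i] += 1
                break
    return counts

# Hypothetical data, NOT the lesson's list.
sample = [1, 15, 23, 30, 35, 41, 45, 52, 60, 62,
          66, 70, 74, 78, 81, 85, 88, 92, 97, 100]
edges = [1, 21, 41, 61, 81, 101]
print(count_in_intervals(sample, edges))  # [2, 4, 3, 6, 5]
```

Note how 41 and 81 are counted in the interval they close, not the one they open.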
02:36
As we said in a previous video, the relative
frequency is the frequency of a given
interval as part of the total.
02:43
Let's add another column to our table and
name it relative frequency.
02:48
So the interval from 1 to 21 has an absolute
frequency of two, but
a relative frequency of two divided by the
total of 20 numbers, which gives us
10%, and so on until we fill the table.
03:02
All right. This is how we calculate relative
frequencies.
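
As a small sketch, the relative frequency column can be computed from the absolute frequencies quoted in the lesson (2, 4, 3, 6, 5). The function name `relative_frequencies` and the label strings are assumptions made for this example.

```python
def relative_frequencies(frequencies):
    """Divide each absolute frequency by the total number of observations."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

labels = ["1 to 21", "21 to 41", "41 to 61", "61 to 81", "81 to 101"]
absolute = [2, 4, 3, 6, 5]  # counts from the lesson's table
relative = relative_frequencies(absolute)

for label, freq, rel in zip(labels, absolute, relative):
    # Show each relative frequency as a whole-number percentage.
    print(f"{label:>9}  {freq:>2}  {rel:.0%}")
```

The relative frequencies always add up to 1, which is a quick way to check that the table was filled in correctly.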
03:06
Now that we have summarized the raw data, we
can start plotting it.