00:00
Hello and welcome to our third practical
example.
00:04
This time, we will explore the topic of the
gender pay gap.
00:09
We will test whether a particular company is
discriminating against some of its employees
on a gender basis.
00:16
Our fictitious company is called Spark
Fortress, Inc.
00:20
It is a big company with more than 5000
employees.
00:24
And here we will work with a sample of 174
of them.
00:29
We have data showing us their name, age,
gender, nationality,
ethnicity, tenure, department, position, and
annual
salary. I believe there is no need for
further
explanation of the data set.
00:45
We are going to test if there is a significant
difference in the salaries employees are paid
based on their gender.
00:52
It would be easier if we look at the problem
at hand in the following way: our
174-employee sample could be divided into
two subsamples,
one that is exclusively male and one that is exclusively female.
01:06
So we have two independent samples drawn from the same
population, although so far we've worked
only with different populations. If
the values in one sample reveal no
information about the other sample, then they
are considered independent.
01:23
There are different methodologies to conduct
such a study, and while regression analysis
is my preferred one, here we will use a
hypothesis test for the mean salary to
determine if there is reasonable evidence
for gender discrimination.
01:37
Let's state the two hypotheses.
01:41
H0: The average male salary is equal
to the average female
salary, or mu_m minus mu_f equals
zero. H1:
01:53
The average male salary differs from the
average female salary.
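Written in symbols, the two hypotheses can be stated as follows. The subscripted notation is an assumption chosen to match the spoken "mu m" and "mu f", with each mu denoting the population mean salary of that group.

```latex
H_0:\ \mu_m - \mu_f = 0
\qquad
H_1:\ \mu_m - \mu_f \neq 0
```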
01:59
OK. The test we should use is the t-test for
independent samples.
02:05
What about the salary population variances?
They are unknown, and we can assume they are
equal.
02:13
Let's construct a frequency distribution
table.
02:18
We have 98 females and 76 males.
02:23
These are our sample sizes.
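A frequency table like this one can be sketched in a few lines of Python. The list below is only a stand-in for the gender column of the data set: the counts of 98 and 76 come from the lesson, while the labels "Female" and "Male" are assumptions about how the column is coded.

```python
from collections import Counter

# Stand-in for the gender column of the 174-employee sample; only the
# counts (98 females, 76 males) come from the lesson.
genders = ["Female"] * 98 + ["Male"] * 76

# Count how many times each label occurs.
frequency = Counter(genders)

# Print the absolute and relative frequency of each gender.
for gender, count in frequency.items():
    print(f"{gender}: {count} ({count / len(genders):.1%})")
```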
02:26
Next, we should calculate the means and the
sample variances of the two samples that we
got. As we assume that the population
variances are equal, we should also
compute the pooled variance.
02:39
Here's the good old formula.
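The formula shown on screen is not part of this transcript. For reference, the standard pooled variance and the matching t statistic for two independent samples with equal population variances are given below. The symbols are assumptions: n, x-bar and s squared stand for the size, mean and variance of each sample, with subscripts m and f for males and females.

```latex
s_p^2 = \frac{(n_m - 1)\,s_m^2 + (n_f - 1)\,s_f^2}{n_m + n_f - 2},
\qquad
t = \frac{\bar{x}_m - \bar{x}_f}{\sqrt{s_p^2\left(\frac{1}{n_m} + \frac{1}{n_f}\right)}}
```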
02:44
And here is the ginormous result.
02:49
Finally, the t score for this test is
computed following the familiar expression.
02:56
We get a t score of 1.34.
03:00
The degrees of freedom are 172.
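The calculation can be sketched from summary statistics alone. This is a minimal illustration, not the exact spreadsheet used in the lesson: the function name `pooled_t_test` is hypothetical, and the means and variances passed in are made-up placeholders. Only the sample sizes of 76 and 98 come from the lesson.

```python
from math import sqrt

def pooled_t_test(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Return (pooled_variance, t_score, degrees_of_freedom) for two
    independent samples, assuming equal population variances."""
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # Standard error of the difference between the two sample means.
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    # The hypothesized difference under H0 is zero.
    t_score = (mean_a - mean_b) / std_error
    return pooled_var, t_score, df

# Hypothetical summary statistics (NOT the Spark Fortress figures);
# only the sample sizes 76 and 98 come from the lesson.
pooled_var, t_score, df = pooled_t_test(
    mean_a=52_000.0, var_a=400_000_000.0, n_a=76,  # males
    mean_b=50_000.0, var_b=350_000_000.0, n_b=98,  # females
)
print(df)  # 76 + 98 - 2 = 172
```

Passing the male sample first keeps the sign convention of the hypotheses, so a positive t score means the male sample mean is the higher one.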
03:05
As we said earlier, once we have surpassed
50 degrees of freedom, the Student's t
distribution almost completely overlaps with
the normal distribution.
03:14
Thus, the P values for a T score of 1.34 and
a z score of
1.34 will be virtually the same.
03:23
You already know how to use a p value
calculator, so I'll just give you the p
value. It's 0.182.
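If no p value calculator is at hand, the normal approximation mentioned above can be sketched with Python's standard library. The helper name `two_sided_p_value` is hypothetical. Because it uses the normal distribution instead of the t distribution with 172 degrees of freedom, the result is slightly below the 0.182 quoted in the lesson.

```python
from statistics import NormalDist

def two_sided_p_value(score):
    """Two-sided p value for a test statistic, using the standard
    normal distribution as an approximation to the t distribution."""
    return 2 * (1 - NormalDist().cdf(abs(score)))

p_value = two_sided_p_value(1.34)
print(round(p_value, 2))  # about 0.18
```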
03:34
The P value is much greater than all common
levels of significance.
03:39
We conclude that we cannot reject the null
hypothesis.
03:43
There isn't enough statistical evidence that
there is a wage gap in this firm.
03:48
Now, that's cool.
03:49
Spark Fortress seems like a nice place to
work at, but let's dig just a bit
deeper into this result.
03:58
Personally, I'm interested to know if there
is no wage gap at all, or maybe there is
one hidden beneath the aggregate values we
just investigated.
04:07
Sometimes it is a good idea to examine the
data set manually, and that's something we
didn't do in the beginning, but we should
have done.
04:16
Let's order the salaries from the largest to the
smallest.
04:22
We can see that the highest paid employee is
actually the president and CEO of the
company, Caroline Bold, who is female.
04:31
This may explain the egalitarian culture of
the company, but it may also mean that
her high salary biased our data.
04:40
What if, on average, it seems that women are
rewarded the same as men, but in
fact, very few of them are?
04:48
In such cases, I would normally further
segment the data.
04:54
Let's divide the employees into two more
groups, below 35 and
above 35.
05:00
This will give us valuable information about
the wage equality of younger and older
staff. I've created two more data sets that
are based
on the original one.
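The segmentation step could look like the sketch below. Everything in it is illustrative: the records, the field names and the helper functions are hypothetical, and the lesson does not say which group an employee aged exactly 35 belongs to, so placing them in the older group is an assumption.

```python
# Hypothetical records: the field names and values below are made up
# for illustration and are not taken from the Spark Fortress data.
employees = [
    {"name": "A", "age": 28, "gender": "Female", "salary": 41_000},
    {"name": "B", "age": 31, "gender": "Male", "salary": 40_500},
    {"name": "C", "age": 35, "gender": "Female", "salary": 52_000},
    {"name": "D", "age": 47, "gender": "Male", "salary": 68_000},
    {"name": "E", "age": 52, "gender": "Female", "salary": 61_000},
]

def split_by_age(records, cutoff=35):
    """Split records into (younger, older) groups around the cutoff.

    Assumption: employees aged exactly `cutoff` go to the older group.
    """
    younger = [r for r in records if r["age"] < cutoff]
    older = [r for r in records if r["age"] >= cutoff]
    return younger, older

def salaries_by_gender(records):
    """Map each gender label to the list of salaries in `records`."""
    groups = {}
    for r in records:
        groups.setdefault(r["gender"], []).append(r["salary"])
    return groups

younger, older = split_by_age(employees)
print(salaries_by_gender(younger))
print(salaries_by_gender(older))
```

Each age group then yields one male and one female salary list, which are the two samples for that group's test.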
05:11
Let's run the same tests as before, but this
time we will do it on our segmented
data. The hypotheses are the same.
05:20
What we get for these two tests is a T score
of 0.43 for employees
below 35 and 2.00 for employees over 35.
05:31
The corresponding P values are 0.668 and
0.048.
05:39
What these numbers mean is that in the group
below 35, there is virtually no wage
gap on a gender basis.
05:45
The p value is so big that we find no
evidence of discrimination going on in this group.
05:53
In the older group, however, the p value is
0.048.
05:59
This is very close to 0.05, but still below
it.
06:04
This implies that at the 5% significance level, we
reject the null hypothesis.
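The decision rule can be checked for all three tests at once. The p values below are the ones quoted in the lesson. The helper name `reject_null` and the choice of 0.01, 0.05 and 0.10 as the common significance levels are assumptions made for this sketch.

```python
# p values quoted in the lesson for the three tests.
P_VALUES = {"all employees": 0.182, "below 35": 0.668, "above 35": 0.048}

# Commonly used significance levels (an assumption for this sketch).
ALPHAS = (0.01, 0.05, 0.10)

def reject_null(p_value, alpha):
    """Reject H0 when the p value is below the significance level."""
    return p_value < alpha

for group, p in P_VALUES.items():
    decisions = {alpha: reject_null(p, alpha) for alpha in ALPHAS}
    print(group, decisions)
```

The older group's result is sensitive to the chosen level: 0.048 leads to rejection at 0.05 and 0.10, but not at 0.01.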
06:08
Therefore, a wage gap does exist for older
employees.
06:15
All right. This was a two sided test, so we
are not sure who gets more
money. Right? Well, do you remember the
nifty trick?
The T score of two is positive.
06:27
Therefore, the difference in pay is positive
in favor of males.
06:34
A limitation of this analysis is that we
omitted other factors such as position and
ethnicity. So we are not completely sure
what's going on in the firm, but we can
say that overall there is no statistically significant wage gap in
Spark Fortress and this is driven by
wage equality among young employees.
06:51
This is a good indicator, as it means that
the company is firmly moving towards
complete equality.
06:59
All right. Your homework is to conduct a
similar test that aims to capture if there is
racial discrimination in the firm.
07:06
You can find the data in the
resources for this lesson.
07:10
Good luck, and thanks for watching.