00:01
Hi, and welcome to our second practical example.
00:05
Let me introduce you to the topic.
00:07
You are a data analyst for Al Bundy Shop.
00:10
Al Bundy Shop is a US-based company that was established 30 years ago. Currently, it also
00:17
operates in Canada, the UK and Germany.
00:21
The firm sells mid- to high-end shoes ranging
00:23
in price from 120 to 200.
00:27
While the shoes are of high quality, you have lots and lots of
00:30
inventory that is never sold.
00:33
In other words, the shoes
00:34
collect dust on store shelves.
00:37
Inventory management is a very common
problem.
00:40
Many, if not most, shops
00:42
cannot determine the right number of items they
00:44
need to keep in stock.
00:46
The opposite problem arises too:
00:49
shops don't supply an adequate amount of goods and fail to meet the
00:52
demand in their market.
00:54
For instance,
00:55
you have surely
00:56
entered a shoe shop
00:57
but been unable to buy a specific pair of shoes because they did not have them in stock. In this example, we will examine the opposite problem:
01:06
having too
01:06
much inventory. This is a more significant problem for the company,
01:11
as it means the company has
01:12
invested in producing or purchasing the product
01:14
but was not able to sell it.
01:17
All right. One way to solve this
01:19
problem is by using confidence intervals.
01:22
We have seen
01:23
many examples, but here's a real-life one.
01:27
We can see a database
01:28
with the sales information of Al Bundy Shop for the years from
01:31
2014 to
01:33
2016. There's the invoice number, date, country,
01:37
product ID
01:38
and
01:39
shop, which depends
01:40
on the country. The gender column indicates whether the product is designed for men or women, as shoes differ greatly depending on gender.
01:48
Next, we have shoe size.
01:50
Apart from the US size system,
01:52
I have also included the European and the UK ones,
01:56
just so it is
01:56
easier for you to understand the data if you are used to other systems.
02:01
This file will be provided for you, and you can
02:03
check the shoe size conversion table if you would like to.
02:06
Finally, there's the unit price for that
02:09
sale and the discount that
02:10
applies.
02:12
Let's begin our analysis.
02:14
First, we should determine
02:16
whether this is sample or population data.
02:19
It is obviously a sample and not the population of sales,
02:22
given that we have just
02:23
three years of data.
02:26
Second, we want to get to know the data set better.
02:29
There are two
02:30
big
02:30
subgroups in our data:
02:32
men's shoes and women's shoes.
02:35
They are completely different, and bundling them together
02:38
when making predictions is going to yield misleading results.
02:42
Not only
02:43
do they differ by gender,
02:44
but there are also different shoe types and models.
02:48
Our problem is related to inventory
management.
02:51
Therefore, we should divide our inventory in
some way and then count the frequencies.
02:56
The frequencies will give us a better idea
02:59
of the data. A good way to do that is to divide the data by shoe size.
03:05
I would also like to see it country-wise.
03:07
We already noted that division by gender is
also needed.
03:11
So we have three dimensions: shoe size,
03:14
country
03:15
and gender.
03:17
A possible
03:17
solution is to create
03:18
two tables, one for men's shoes and one for women's
03:22
shoes, and then proceed normally.
03:24
That's what I'm going to do.
03:27
Here are the two tables in the file that is
03:30
provided in the resources section.
03:32
You can see the formula I used
03:34
to calculate the frequencies.
03:36
While Excel may be a bit sloppy, it is still
very powerful.
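The Excel formula itself is not reproduced here. As a rough equivalent, here is a minimal Python sketch that splits sales by gender and then counts pairs sold per country and size. The `Sale` record, its field names and the example rows are hypothetical stand-ins, not the columns or values of the course file.

```python
from collections import Counter
from typing import NamedTuple


class Sale(NamedTuple):
    # Hypothetical record: only the fields needed for the frequency tables.
    gender: str      # e.g. "Male" or "Female"
    country: str     # e.g. "Germany", "Canada"
    us_size: float   # shoe size in the US system


def frequency_tables(sales):
    """Split sales by gender, then count pairs sold per (country, size)."""
    tables = {}
    for sale in sales:
        table = tables.setdefault(sale.gender, Counter())
        table[(sale.country, sale.us_size)] += 1
    return tables


# Made-up rows, for illustration only.
sales = [
    Sale("Male", "Germany", 9.5),
    Sale("Male", "Germany", 9.5),
    Sale("Male", "Canada", 8.0),
    Sale("Female", "Germany", 7.0),
]
tables = frequency_tables(sales)
print(tables["Male"][("Germany", 9.5)])  # 2
```

One `Counter` per gender mirrors the "two tables" idea; a (country, size) pair that never occurs simply reads as 0.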
03:41
In order to use confidence intervals at all,
03:43
we must have normally distributed data, or at least an approximately normal distribution of the sample mean.
03:46
While this sounds restrictive, for all practical applications it isn't.
03:50
We can simply
03:51
apply our good old friend,
03:53
the Central
03:53
Limit Theorem. Whenever we are dealing with a sum or average of a large number of observations,
03:59
we can assume
04:00
that its distribution is approximately normal. In our case, we
04:03
are calculating
04:04
average sales for a period.
04:06
Given that Al Bundy
04:07
Shop has been operating for more than 30 years, the CLT applies,
04:11
and we can safely continue
04:13
with our inference.
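The Central Limit Theorem claim can be checked with a small simulation. This sketch assumes individual values follow a strongly skewed exponential distribution, a stand-in chosen for illustration and not a fit to the shop's data. It shows that the skewness of the sample mean shrinks toward 0, the value for a normal distribution, as more observations are averaged; for exponential data the theoretical skewness of the mean is 2/√n.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: third central moment divided by the variance to the power 1.5."""
    m = statistics.fmean(values)
    m2 = statistics.fmean((v - m) ** 2 for v in values)
    m3 = statistics.fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5


def skewness_of_sample_means(n, reps=5000, seed=0):
    """Simulate `reps` averages of n exponential draws and return their skewness."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    return skewness(means)


for n in (1, 10, 100):
    print(n, round(skewness_of_sample_means(n), 2))
# Expect values near 2, 0.63 and 0.2 (that is, 2 / sqrt(n)), up to simulation noise.
```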
04:15
Okay. We want to
04:17
estimate the number of shoes that are likely to be sold.
04:20
A 95% confidence interval will give us such information.
04:24
We will take the last 12 months of sales and make a prediction.
04:29
Let's do this only for men's shoes, as the problem is identical
04:32
for both genders.
04:34
Please note that since people
04:35
have different shoe sizes,
04:36
we will actually have to calculate 17
04:38
confidence intervals, one for each size.
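Each of those intervals needs 12 monthly observations for its size. The sketch below assumes the "last 12 months" is calendar year 2016, the final year in the file, and that men's sales are available as (date, US size) pairs; the function name and example rows are hypothetical. Months without a sale are kept as zeros rather than dropped, since dropping them would inflate the mean.

```python
from datetime import date


def monthly_counts(sales, sizes, year):
    """Return {size: [pairs sold in Jan, ..., Dec]} for one year.

    `sales` is an iterable of (sale_date, us_size) pairs for one gender.
    Sizes with no sales at all still get twelve zeros."""
    counts = {size: [0] * 12 for size in sizes}
    for sale_date, size in sales:
        if sale_date.year == year and size in counts:
            counts[size][sale_date.month - 1] += 1
    return counts


# Made-up rows, for illustration only.
sales = [
    (date(2016, 1, 5), 9.5),
    (date(2016, 1, 20), 9.5),
    (date(2016, 3, 2), 6.0),
    (date(2015, 12, 31), 9.5),  # outside the chosen year, ignored
]
counts = monthly_counts(sales, sizes=[6.0, 9.5, 16.0], year=2016)
print(counts[9.5])
```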
04:42
Let's get on to it.
04:44
First, we need to
04:45
calculate the means. Next, note that
04:50
we do not know the population variance,
04:52
and our sample consists of just 12
04:54
observations.
04:56
Therefore, we have to use the t statistic.
04:58
This problem falls under the lesson on a single population with unknown population variance.
05:04
We have a sample of 12 observations.
05:06
Therefore, we are looking for
05:07
the t statistic for a 95% confidence interval
05:10
with 11 degrees
05:12
of freedom. It is
05:14
2.20.
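Instead of a printed t table, the critical value can be computed. With SciPy installed, `scipy.stats.t.ppf(0.975, 11)` does it in one call; the sketch below avoids third-party packages by integrating the Student's t density numerically (Simpson's rule) and inverting the CDF by bisection. It is a teaching sketch, not a production-grade quantile function.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 by symmetry plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Two-sided critical value t with P(-t <= T <= t) = confidence."""
    target = 0.5 + confidence / 2  # e.g. 0.975 for a 95% interval
    lo, hi = 0.0, 50.0
    for _ in range(60):  # bisection works because t_cdf increases with x
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.95, 11), 3))  # 2.201
print(round(t_critical(0.95, 22), 3))  # 2.074
```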
05:17
Then, we compute the standard errors.
05:20
We are going to use the good old formula: the sample standard deviation divided by the square root of the sample size. While it is not
05:24
necessary to go through this step, as we have everything we need,
05:27
I will still show you the margins of error, as it may be interesting for some of you to examine.
05:34
Finally, the confidence intervals
05:36
are given by the following formula.
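Written out, this is the standard one-sample interval for an unknown population variance, where \bar{x} is the mean of the monthly sales for one size, s their sample standard deviation and n = 12 the number of months:

```latex
\bar{x} \pm t_{n-1,\,\alpha/2}\,\frac{s}{\sqrt{n}},
\qquad n = 12,\quad \alpha = 0.05,\quad t_{11,\,0.025} \approx 2.20
```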
05:39
And this is how
05:40
they look after the calculations
05:42
have been carried out.
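The spreadsheet steps (mean, standard error, margin of error, interval) can be sketched in a few lines of Python. The monthly figures below are invented for illustration and are not the course data; `t_crit` defaults to the table value for 11 degrees of freedom.

```python
import math
import statistics


def mean_confidence_interval(monthly_sales, t_crit=2.201):
    """95% t interval for the mean of monthly sales.

    t_crit defaults to the two-sided 95% value for 11 degrees of freedom
    (12 months); pass another value for other sample sizes or levels."""
    n = len(monthly_sales)
    mean = statistics.fmean(monthly_sales)
    se = statistics.stdev(monthly_sales) / math.sqrt(n)  # standard error s / sqrt(n)
    margin = t_crit * se                                  # margin of error
    return mean - margin, mean + margin


# Hypothetical monthly sales for one size (not the course data).
sales_one_size = [3, 5, 2, 4, 6, 3, 4, 5, 2, 4, 3, 5]
low, high = mean_confidence_interval(sales_one_size)
print(round(low, 2), round(high, 2))
```

`statistics.stdev` uses the n - 1 denominator, which matches the sample standard deviation in the formula.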
05:44
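The result we obtained can be interpreted as follows:
05:48
we are 95% confident
05:50
that the true population mean of
05:51
monthly sales for each shoe size lies within the respective interval. Strictly speaking, if we repeated this procedure many times, about 95% of the intervals built this way would contain the true mean.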
05:56
As we don't want to be low on stock,
05:57
a possible solution to the problem
05:59
is to stock the number of pairs closest
06:02
to the upper limit of the confidence interval.
06:05
In this way, you reduce the risk of running out of stock, and shoes won't be waiting
06:10
forever in your storage unit.
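The stocking rule, "the whole number closest to the upper limit", is just rounding. In this minimal sketch halves are rounded up, in keeping with not wanting to be low on stock; Python's built-in `round()` would instead round 2.5 down to 2. The intervals are hypothetical.

```python
import math


def pairs_to_stock(intervals):
    """Map each size to the whole number of pairs closest to the
    interval's upper limit, rounding exact halves up."""
    return {size: math.floor(upper + 0.5)
            for size, (lower, upper) in intervals.items()}


# Hypothetical intervals (pairs per month), for illustration only.
intervals = {6.0: (0.9, 2.3), 6.5: (0.2, 2.5), 16.0: (0.0, 0.0)}
print(pairs_to_stock(intervals))  # {6.0: 2, 6.5: 3, 16.0: 0}
```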
06:13
Therefore, we should get four pairs of men's shoes in size 6, three pairs
06:18
of men's shoes in size 6.5, and
06:20
so on. Above all, we should
06:22
be prepared for size 9.5, while size
06:25
16 won't
06:26
yield any sales.
06:27
Sorry for those of you who have size 16 feet; I know it is hard enough to find shoes already, but
06:33
well, this
06:34
company won't be selling any.
06:37
All right. We are almost done here.
06:40
Before you go, I'd like to show you
06:42
another application of confidence intervals.
06:44
Let's say we want to use a confidence interval to see if two shops are selling
06:48
the same number of
06:49
shoes.
06:51
Moreover, we want to know, with
06:52
95% confidence, by how much one shop outperforms the other in terms of sales.
07:00
You can see two
07:01
tables representing the sales of women's shoes in two German shops.
07:05
Their codes are GER1 and GER2.
07:09
Once again, we have
07:10
data for 2016.
07:12
Now, an assumption that we have to
07:13
make is that
07:14
the same
07:14
people don't buy shoes from different shops.
07:17
Logically, it makes sense that in the same year the same people don't go around different shops of the same brand to buy shoes.
07:25
Even if this happens, it is an exception and
not the norm.
07:29
Therefore, we can say that the two samples
are independent.
07:34
Once again, we don't know the population variances.
07:37
But given that this is the same market in the same country, we can assume they are equal.
07:43
This implies we are in the case of independent samples with population variances unknown but assumed to be equal.
07:50
Like in the previous
07:51
case,
07:52
we need to calculate the means
07:53
and sample variances.
07:56
Here we must also calculate
07:57
a pooled variance,
07:58
which is an unbiased
08:00
estimate of the common population variance.
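For reference, the pooled variance weights each sample variance by its degrees of freedom. With two samples of 12 months each, the weights are equal and it reduces to the plain average of the two sample variances:

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad n_1 = n_2 = 12 \;\Rightarrow\; s_p^2 = \frac{s_1^2 + s_2^2}{2}
```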
08:03
We are in the case where we have to use the t statistic with 12 + 12 - 2 = 22 degrees of freedom for a 95% confidence level.
08:13
What we get from the table is 2.07.
08:17
The respective margins of error are given by the well-known formula.
08:22
Finally, the 95% confidence intervals are
08:25
determined by the difference between the means and the margins of error.
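Here is a minimal Python sketch of the whole two-sample computation for one size, assuming independent samples with equal but unknown variances. The GER1/GER2 monthly figures are invented for illustration, and `t_crit` defaults to 2.074, the two-sided 95% critical value for 22 degrees of freedom.

```python
import math
import statistics


def pooled_difference_interval(sales_1, sales_2, t_crit=2.074):
    """95% interval for mean(sales_1) - mean(sales_2), assuming independent
    samples whose population variances are unknown but equal."""
    n1, n2 = len(sales_1), len(sales_2)
    v1, v2 = statistics.variance(sales_1), statistics.variance(sales_2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(pooled / n1 + pooled / n2)
    diff = statistics.fmean(sales_1) - statistics.fmean(sales_2)
    margin = t_crit * se
    return diff - margin, diff + margin


# Hypothetical monthly sales of one size in each shop (not the course data).
ger1 = [3, 5, 2, 4, 6, 3, 4, 5, 2, 4, 3, 5]
ger2 = [4, 3, 3, 5, 4, 2, 5, 4, 3, 4, 4, 3]
low, high = pooled_difference_interval(ger1, ger2)
print(round(low, 2), round(high, 2))
```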
08:29
What we get are these 15 intervals.
08:32
Zero is part of all of them.
08:35
With the exception of the unsellable size, all confidence intervals
08:39
start in the negatives and finish in the positives.
08:42
This implies that we cannot conclude that one shop sells significantly more shoes than the other for any size.
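The decision rule can be written as a three-way check. This sketch assumes each interval is for mean(GER1) minus mean(GER2); that sign convention is an assumption, not something stated for the course tables. A degenerate interval such as (0, 0) for a size with no sales also contains zero, so it falls under "no significant difference".

```python
def compare_shops(interval):
    """Read a 95% interval for mean(GER1) - mean(GER2)."""
    low, high = interval
    if low > 0:
        return "GER1 sells significantly more"
    if high < 0:
        return "GER2 sells significantly more"
    return "no significant difference"  # the interval contains zero


print(compare_shops((-0.76, 1.09)))  # no significant difference
```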
08:51
While it may seem that we have gained no great
08:53
insight,
08:53
that's not entirely true.
08:56
The confidence intervals that we got are not consistently in favor of one shop or the other having higher sales.
09:03
This is
09:03
evident from the fact that
09:05
some of them are mostly negative, while others are mostly positive.
09:09
This suggests that for some sizes GER1
09:12
is likely to sell more, while for others it is the other way around.
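"Mostly negative" versus "mostly positive" comes down to the sign of each interval's midpoint, which is the difference of the two sample means. This sketch, again assuming intervals for mean(GER1) minus mean(GER2), tallies which shop each size leans toward without claiming significance; the sizes and intervals are hypothetical.

```python
from collections import Counter


def leaning(interval):
    """Which shop the point estimate (the interval's midpoint) favours."""
    low, high = interval
    center = (low + high) / 2
    if center > 0:
        return "GER1"
    if center < 0:
        return "GER2"
    return "neither"


def tally_leanings(intervals):
    """Count how many sizes lean toward each shop."""
    return Counter(leaning(iv) for iv in intervals.values())


# Hypothetical intervals per size, for illustration only.
intervals = {6.0: (-1.8, 0.6), 7.0: (-0.4, 1.9), 8.0: (-0.76, 1.09), 11.0: (0.0, 0.0)}
print(tally_leanings(intervals))
```

A roughly even split between the two shops is what "not consistently in favor of one shop" looks like in numbers.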
09:17
However, our conclusion is that, using this methodology and level of confidence, we cannot really
09:22
identify which shop is selling more.
09:24
For practical purposes, we treat them as identical.
09:26
The insight that we get is that these two
shops are so balanced in terms of sales that
they may share the same warehouse or
exchange pairs of shoes to achieve better
results.
09:36
Furthermore, they can be bundled
09:38
together for any analysis, action or decision needed.
09:42
On average, we expect them to move together.
09:45
Moreover, if one noticeably outperforms the other in the future,
09:49
we can suspect that something that wasn't
09:50
observed before is going on, as we expect them to keep selling at the same level.
09:57
All right. Time to wrap up this lesson.
10:00
In the next section, we will examine
hypothesis testing.
10:05
Thanks for watching.