Interference Statistics: Practical Example

by 365 Careers

    Learning Material 4
    • XLS
      3.17. Practical example. Confidence intervals lesson.xls
    • XLS
      3.17.Practical-example.Confidence-intervals-exercise.xls
    • XLS
      3.17.Practical-example.Confidence-intervals-exercise-solution.xls
    • PDF
      Download Lecture Overview
    Transcript

    00:01 Hi and welcome to our second practical example.

    00:05 Let me introduce you to the topic.

    00:07 You are a data analyst for Al Bundy

    00:10 Shop. Al Bundy is a US-based company that was established 30 years ago. Currently, it also

    00:17 operates in Canada, the UK and Germany.

    00:21 The firm sells mid- to high-end shoes ranging

    00:23 from 120 to 200.

    00:27 While the shoes are of high quality, you have lots and lots of

    00:30 inventory that is never sold.

    00:33 In other words, the shoes

    00:34 collect dust on store shelves.

    00:37 Inventory management is a very common problem.

    00:40 Many, if not most, shops

    00:42 cannot determine the right number of items they

    00:44 need to keep in stock.

    00:46 The opposite problem arises too.

    00:49 Shops don't supply an adequate amount of goods and fail to meet the

    00:52 demand in their market.

    00:54 For instance,

    00:55 you have surely

    00:56 entered a shoe shop

    00:57 but were unable to buy a specific pair of shoes because they did not have them in stock. In this example, we will examine the opposite problem:

    01:06 having too

    01:06 much inventory. This is a more significant problem for the company,

    01:11 as it means the company has

    01:12 invested in producing or purchasing the product

    01:14 but was not able to sell it.

    01:17 All right. One way to solve this

    01:19 problem is by using confidence intervals.

    01:22 We have seen

    01:23 many examples, but here's a real-life one.

    01:27 We can see a database

    01:28 with the sales information about Al Bundy Shop for the years from

    01:31 2014 to

    01:33 2016. There's invoice number, date, country,

    01:37 product ID,

    01:38 and

    01:39 shop, which depends

    01:40 on the country. The cell "Gender" indicates if the product is designed for men or women, as shoes differ greatly depending on gender.

    01:48 Next, we have shoe size.

    01:50 Apart from the US size system,

    01:52 I have also included the European and the UK ones,

    01:56 just so it is

    01:56 easier for you to understand the data if you are used to other systems.

    02:01 This file will be provided for you, and you can

    02:03 check the shoe size conversion table if you would like to do that.

    02:06 Finally, there's the unit price for that

    02:09 sale and the discount that

    02:10 applies.

    02:12 Let's begin our analysis.

    02:14 First, we should determine

    02:16 if that's sample or population data.

    02:19 It is obviously a sample and not the population of sales,

    02:22 given that we have just

    02:23 three years of data.

    02:26 Second, we want to get to know the data set better.

    02:29 There are two

    02:30 big

    02:30 subgroups in our data:

    02:32 men's shoes and women's shoes.

    02:35 They are completely different, and bundling them together

    02:38 when making predictions is going to yield deceiving results.

    02:42 Not only

    02:43 do feet differ by gender,

    02:44 but also there are different shoe types and models.

    02:48 Our problem is related to inventory management.

    02:51 Therefore, we should divide our inventory in some way and then count the frequencies.

    02:56 The frequencies will give us a better idea

    02:59 of the data. OK. A good way to do that is to divide the data by shoe size.

    03:05 I would also like to see it country wise.

    03:07 We already noted that division by gender is also needed.

    03:11 So we have three dimensions: shoe size,

    03:14 country,

    03:15 and gender.

    03:17 A possible

    03:17 solution is to create

    03:18 two tables, one for men's shoes and one for women's

    03:22 shoes, and then proceed normally.

    03:24 That's what I'm going to do.

    03:27 Here are the two tables in the file that is

    03:30 provided in the resources section.

    03:32 You can see the formula I use

    03:34 to calculate the frequencies.

    03:36 While Excel may be a bit sloppy, it is still very powerful.
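
The frequency tables described above live in the lesson's Excel file. As a rough sketch of the same idea outside Excel, the snippet below counts sales per shoe size and country, split by gender. The records, the column order and the `frequency_table` helper are all made up for illustration and are not taken from the lesson's data.

```python
from collections import Counter

# Hypothetical sales records: (gender, country, US shoe size).
# These rows are invented for illustration; they are not the lesson's data.
sales = [
    ("Male", "United States", 9.5),
    ("Male", "United States", 9.5),
    ("Male", "Germany", 10.0),
    ("Female", "Germany", 8.0),
    ("Female", "Canada", 8.0),
    ("Male", "Canada", 9.5),
]


def frequency_table(records, gender):
    """Count sales per (size, country) pair for one gender."""
    return Counter(
        (size, country) for g, country, size in records if g == gender
    )


# One table per gender, as in the lesson.
men = frequency_table(sales, "Male")
women = frequency_table(sales, "Female")

for (size, country), count in sorted(men.items()):
    print(f"size {size:>4} | {country:<13} | {count}")
```

Keeping one table per gender mirrors the lesson's choice to avoid bundling men's and women's shoes together.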

    03:41 In order to use confidence intervals at all,

    03:43 we must have normally distributed data.

    03:46 While this sounds restrictive, for all practical applications it isn't.

    03:50 We can simply

    03:51 apply our good old friend,

    03:53 the Central

    03:53 Limit Theorem. Whenever we are in the presence of a sum or average of a large number of observations,

    03:59 we can assume

    04:00 normality. In our case, we

    04:03 are calculating

    04:04 average sales for a period.

    04:06 Given that Al Bundy

    04:07 Shop has been operating for more than 30 years, the CLT applies

    04:11 and we can safely continue

    04:13 with our inference.

    04:15 Okay. We want to

    04:17 estimate the number of shoes that are likely to be sold.

    04:20 A 95% confidence interval will give us such information.

    04:24 We will take the last 12 months of sales and make a prediction.

    04:29 Let's do this only for men's shoes, as the problem is identical

    04:32 for both genders.

    04:34 Please note that since people

    04:35 have different shoe sizes,

    04:36 we will actually have to calculate 17

    04:38 confidence intervals, one for each size.

    04:42 Let's get on to it.

    04:44 First, we need to

    04:45 calculate the means. Next,

    04:50 we do not know the population variance,

    04:52 and our sample consists of just 12

    04:54 observations.

    04:56 We have to use the t-statistic.

    04:58 This problem refers to the lesson on one population with population variance unknown.

    05:04 We have a sample of 12 observations.

    05:06 Therefore, we are looking for

    05:07 the t-statistic for a 95% confidence interval

    05:10 with 11 degrees

    05:12 of freedom. It is

    05:14 2.20. Next, we will

    05:17 compute the standard errors.

    05:20 We are going to use the good old formula. While it is not

    05:24 necessary to go through this step, as we have everything we need,

    05:27 I will still show you the margins of error, as it may be interesting for some of you to examine.

    05:34 Finally, the confidence intervals

    05:36 are given by the following formula.

    05:39 And this is how

    05:40 they look after the calculations

    05:42 have been carried out.
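
The steps just described (mean, standard error, margin of error, interval) can be sketched for a single shoe size as follows. This assumes the standard interval for one population with unknown variance: the sample mean plus or minus t times s divided by the square root of n. The monthly figures are invented; only the sample size of 12 and the t-value of 2.20 come from the lesson.

```python
import math
import statistics

# Hypothetical pairs sold per month for one shoe size over the last 12 months.
# Invented numbers, not the lesson's data.
monthly_sales = [2, 1, 0, 3, 2, 1, 2, 4, 1, 2, 3, 3]

# t-statistic for 95% confidence with 11 degrees of freedom, as quoted in the lesson.
T_CRIT = 2.20


def t_confidence_interval(sample, t_crit):
    """Return (lower, upper) for the mean when the population variance is unknown."""
    n = len(sample)
    mean = statistics.mean(sample)
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    std_error = statistics.stdev(sample) / math.sqrt(n)
    margin = t_crit * std_error
    return mean - margin, mean + margin


low, high = t_confidence_interval(monthly_sales, T_CRIT)
print(f"95% CI for mean monthly sales: ({low:.2f}, {high:.2f})")
```

In the lesson the same calculation is repeated once per shoe size, which is where the 17 intervals come from.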

    05:44 The result we obtained could be interpreted as follows.

    05:48 In 95% of the cases,

    05:50 the true population mean of

    05:51 sales for each shoe size will fall into the respective interval.

    05:56 As we don't want to be low on stock,

    05:57 a possible solution to the problem

    05:59 is to get as many pairs of shoes as the closest

    06:02 number to the upper limit of the confidence interval.

    06:05 In this way, you will be almost certain you won't run out of stock and shoes won't be waiting

    06:10 forever in your storage unit.

    06:13 Therefore, we should get four pairs of men's shoes size 6, three pairs

    06:18 of men's shoes size 6.5, and

    06:20 so on. Mostly, we should

    06:22 prepare ourselves with size 9.5, and size

    06:25 16 won't

    06:26 yield any sales.

    06:27 Sorry for those of you who have size 16 feet. I know it is hard enough to find shoes already, but,

    06:33 well, this

    06:34 company won't be selling any.
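
The stocking rule above (take the whole number closest to the upper limit of each interval) might be sketched like this. The interval values are placeholders, not the lesson's results, and the `stock_level` helper is a hypothetical name.

```python
# Hypothetical 95% confidence intervals (lower, upper) for sales, keyed by US size.
# Placeholder values for illustration only.
intervals = {
    6.0: (1.1, 3.6),
    6.5: (0.9, 3.2),
    16.0: (0.0, 0.0),
}


def stock_level(upper_limit):
    """Whole number of pairs closest to the upper limit (halves round up).

    Assumes a non-negative upper limit, which holds for sales counts.
    """
    return int(upper_limit + 0.5)


order = {size: stock_level(upper) for size, (_, upper) in intervals.items()}
print(order)
```

A more cautious variant would round up with `math.ceil` instead, at the cost of slightly more inventory.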

    06:37 All right. We are almost done here.

    06:40 Before you go, I'd like to show you

    06:42 another application of confidence intervals.

    06:44 Let's say we want to use a confidence interval to see if two shops are selling

    06:48 the same number of

    06:49 shoes.

    06:51 Moreover, we want to know, with

    06:52 95% confidence, by how much one shop outperforms the other in terms of sales.

    07:00 You can see two

    07:01 tables representing the sales of women's shoes in two German shops.

    07:05 Their codes are GER1 and GER2.

    07:09 Once again, we have

    07:10 data for 2016.

    07:12 Now, an assumption that we have to

    07:13 make is that

    07:14 the same

    07:14 people don't buy pairs of shoes from different shops.

    07:17 Logically, it makes sense that in the same year the same people don't go around different shops of the same brand to buy shoes.

    07:25 Even if this happens, it is an exception and not the norm.

    07:29 Therefore, we can say that the two samples are independent.

    07:34 Once again, we don't know the population variance,

    07:37 but given that this is the same market in the same country, we can assume it is equal.

    07:43 This implies we are in the case of independent samples with population variances unknown but assumed to be equal.

    07:50 Like in the previous

    07:51 case,

    07:52 we need to calculate the means

    07:53 and sample variances.

    07:56 Here we must calculate

    07:57 a pooled variance,

    07:58 which is an unbiased

    08:00 estimate of the population variance.

    08:03 We are in the case where we have to use the t-statistic with 12 plus 12 minus 2 degrees of freedom for 95% confidence.

    08:13 What we get from the table is 2.07.

    08:17 The respective margins of error are given by the well-known formula.

    08:22 Finally, the 95% confidence intervals are

    08:25 determined by the means and the margins of error.
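
For two independent samples with unknown but equal variances, the calculation described above can be sketched as follows. It assumes the usual pooled-variance formulas: the pooled variance weights each sample variance by its degrees of freedom, and the margin of error is t times the square root of (pooled variance / n1 + pooled variance / n2). The monthly figures are invented; the sample sizes of 12 each and the 22 degrees of freedom follow the lesson, and 2.07 is the table value for 22 degrees of freedom at 95% confidence.

```python
import math
import statistics

# Hypothetical monthly sales of one women's shoe size in the two shops.
# Invented numbers, not the lesson's data.
shop_1 = [3, 2, 4, 1, 3, 2, 5, 3, 2, 4, 3, 4]
shop_2 = [2, 3, 3, 2, 4, 1, 4, 3, 3, 2, 5, 3]

# t-statistic for 95% confidence with 12 + 12 - 2 = 22 degrees of freedom.
T_CRIT = 2.07


def pooled_variance(a, b):
    """Sample variances weighted by their degrees of freedom."""
    n_a, n_b = len(a), len(b)
    weighted = (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    return weighted / (n_a + n_b - 2)


def difference_interval(a, b, t_crit):
    """Return (lower, upper) for the difference of means, mean(a) - mean(b)."""
    s2 = pooled_variance(a, b)
    margin = t_crit * math.sqrt(s2 / len(a) + s2 / len(b))
    diff = statistics.mean(a) - statistics.mean(b)
    return diff - margin, diff + margin


low, high = difference_interval(shop_1, shop_2, T_CRIT)
print(f"95% CI for the difference in mean sales: ({low:.2f}, {high:.2f})")
if low < 0 < high:
    print("Zero is inside the interval: no significant difference for this size.")
```

The check on whether zero lies inside the interval is the comparison the lesson applies to each size.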

    08:29 What we get are these 15 intervals.

    08:32 Zero is a part of all of them.

    08:35 With the exception of the unsellable size 4, all confidence intervals

    08:39 start in the negatives and finish in the positives.

    08:42 This implies that we cannot conclude that one shop sells significantly more shoes than the other for any size.

    08:51 While it seems like we have no great

    08:53 insight,

    08:53 that's not entirely true.

    08:56 The confidence intervals that we got are not consistently in favor of one shop having higher sales or the other.

    09:03 This is

    09:03 evident from the fact that

    09:05 some of them are mostly negative, while others are mostly positive.

    09:09 That is to show that for some sizes GER1

    09:12 is likely to sell more, while for others vice versa.

    09:17 However, our decision was that using this methodology and level of confidence, we cannot really

    09:22 identify which shop is selling more.

    09:24 They are identical.

    09:26 The insight that we get is that these two shops are so balanced in terms of sales that they may share the same warehouse or exchange pairs of shoes to achieve better results.

    09:36 Furthermore, they can be bundled

    09:38 together for any analysis, action or decision needed.

    09:42 On average, we expect them to move together.

    09:45 Moreover, if one noticeably outperforms the other in the future,

    09:49 we may be sure that something that wasn't

    09:50 observed before is going on, as they are predicted to remain identical.

    09:57 All right. Time to wrap up this lesson.

    10:00 In the next section, we will examine hypothesis testing.

    10:05 Thanks for watching.


    About the Lecture

    The lecture Interference Statistics: Practical Example by 365 Careers is from the course Statistics for Data Science and Business Analysis (EN).


    Author of lecture Interference Statistics: Practical Example

     365 Careers

