00:00 All right. So we've learned that confidence intervals based on small samples from normally distributed populations are calculated with the t-statistic. 00:10 Let's check a similar example to the one we saw earlier. 00:14 You are an aspiring data scientist and are wondering what the mean data scientist salary is. 00:22 This time, though, you do not have the population variance. 00:26 In fact, you have a sample of only nine compensations you found on Glassdoor and have summarized the information in the following table. 00:35 Okay. We've already calculated the sample mean and standard error, which are $92,533 and $4,644, respectively. 00:48 Good, but we don't have one key piece of information, the population variance. 00:55 No problem. 00:56 As the good statisticians that we are, we will use the Student's t-distribution. Here's the formula that allows us to find a confidence interval for the mean of a population with an unknown variance. 01:10 Let's compare it with the formula we use when the variance is known. 01:15 There are two key differences. 01:17 First, instead of a z-statistic, we have a t-statistic. 01:22 And second, instead of population standard deviation, we have sample standard deviation. 01:29 Otherwise, everything is the same. 01:33 So it shouldn't be that difficult to remember. 01:37 The logic behind constructing confidence intervals in both cases is the same. 01:42 The only two inputs that change are the statistic at hand and the standard deviation. When population variance is known, population standard deviation goes with the z-statistic. 01:55 When population variance is unknown, sample standard deviation goes with the t-statistic. All right. 02:03 So we have the sample mean, the sample standard deviation, and the sample size. 02:09 All we have to do is find the t-statistic. 02:13 We will be able to obtain the t-statistic from the t-table. 02:17 First, we need to specify the degrees of freedom.
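Written out, the two formulas compared above take roughly the following form. The notation is the usual textbook one and is an assumption here, since the slide itself is not part of the transcript: \bar{x} is the sample mean, \sigma the population standard deviation, s the sample standard deviation, n the sample size, and \alpha is one minus the confidence level.

```latex
% Population variance known: z-statistic with the population standard deviation
\bar{x} \pm z_{\alpha/2} \, \frac{\sigma}{\sqrt{n}}

% Population variance unknown: t-statistic (n - 1 degrees of freedom)
% with the sample standard deviation
\bar{x} \pm t_{n-1,\,\alpha/2} \, \frac{s}{\sqrt{n}}
```

In both cases the fraction is the standard error, so the interval is always the sample mean plus or minus a critical value times the standard error.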
02:21 For the Student's t-distribution, there are n minus 1 degrees of freedom. 02:27 Our sample consists of nine observations. 02:30 So we have eight degrees of freedom. 02:33 Second, we have to find alpha divided by two. 02:38 Once again, this depends on the confidence level that we want to obtain. 02:42 In this example, we are going to use a confidence level of 95%. 02:47 This means that alpha is equal to 5%. 02:51 Therefore, half of alpha would be 2.5%. 02:57 You can now see that the associated statistic is 2.31. 03:05 Note that some t-tables you will find in books or online, like this one, have a CI row. The abbreviation stands for confidence interval. 03:15 Instead of finding alpha, we can just check the 95% confidence interval and get the same result. 03:21 Easy. We have all the information needed, so we just plug in the numbers. 03:31 What we get is a confidence interval from $81,806 to $103,261. 03:41 Let's compare this result to the result for the confidence interval with known population variance. We got a 95% confidence interval that was between $94,833 and $105,568. 03:58 You can clearly note that when we know the population variance, we get a narrower confidence interval. 04:04 When we do not know the population variance, there is a higher uncertainty that is reflected by wider boundaries for our interval. 04:11 Makes sense, doesn't it? So what we learned today is that even when we do not know the population variance, we can still make predictions. 04:21 But they will be less accurate. 04:24 Furthermore, the proper statistic for estimating the confidence interval when the population variance is unknown is the t-statistic and not the z-statistic. All right, great. 04:35 Thanks for watching.
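The plug-in step from the lecture can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `t_confidence_interval` is hypothetical, the critical value 2.31 is typed in from the t-table as quoted in the lecture rather than computed, and the z-value is included only to show why the t-based interval is wider for the same standard error. Because the sample mean and standard error are already rounded, the bounds may differ from the quoted figures by a dollar or so.

```python
from statistics import NormalDist


def t_confidence_interval(sample_mean, standard_error, t_critical):
    """Return (lower, upper) bounds: mean +/- critical value * standard error."""
    margin = t_critical * standard_error
    return sample_mean - margin, sample_mean + margin


# Figures quoted in the lecture.
sample_mean = 92_533       # sample mean salary, in dollars
standard_error = 4_644     # sample standard deviation / sqrt(n), in dollars
n = 9                      # number of observations

degrees_of_freedom = n - 1  # n - 1 for the Student's t-distribution
alpha = 1 - 0.95            # 95% confidence level
half_alpha = alpha / 2      # column to look up in the t-table

# Assumption: value read off a t-table at 8 degrees of freedom and
# alpha/2 = 0.025, rounded as in the lecture.
t_critical = 2.31

low, high = t_confidence_interval(sample_mean, standard_error, t_critical)
print(f"df = {degrees_of_freedom}, alpha/2 = {half_alpha:.3f}")
print(f"95% CI (t): ${low:,.0f} to ${high:,.0f}")

# For comparison only: the z critical value at the same confidence level
# is smaller, so the same standard error would give a narrower interval.
z_critical = NormalDist().inv_cdf(1 - half_alpha)
print(f"t critical = {t_critical}, z critical = {z_critical:.2f}")
```

With a library such as SciPy available, the critical value could be computed instead of read from a table; the table lookup is kept here to mirror the steps in the lecture.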
The lecture Confidence Intervals: Calculations in a Population With Unknown Variance by 365 Careers is from the course Statistics for Data Science and Business Analysis (EN).