Confidence Intervals: Calculations in a Population With Known Variance

by 365 Careers

    Learning Material 6
    • XLS
      3.9. Population variance known, z-score lesson.xls
    • XLS
      3.9. The z-table.xls
    • XLS
      3.9.Population-variance-known-z-score-exercise.xls
    • XLS
      3.9.Population-variance-known-z-score-exercise-solution.xls
    • XLS
      3.9.The-z-table.xls
    • PDF
      Download Lecture Overview
    Transcript

    00:00 A confidence interval is the range within which you expect the population parameter to be. And its estimation is based on the data we have in our sample.

    00:11 There can be two main situations when we calculate the confidence intervals for a population, when the population variance is known and when it is unknown.

    00:20 Depending on which situation we are in, we would use a different calculation method.

    00:25 Now, the whole field of statistics exists because we almost never have population data. Even if we do have population data, we may not be able to analyze it. There may be so much of it that it doesn't make sense to use it all at once.

    00:40 Think about people using the Internet.

    00:44 The data Google has approximates population data, but even their data is not complete.

    00:49 There are people who are not a part of the Google ecosystem in any way.

    00:54 That can be done by using other browsers like Opera and Safari, other search engines like Bing or DuckDuckGo, or video providers other than YouTube. Furthermore, they can browse in incognito mode.

    01:07 These people are a part of the population of people using the Internet, but Google doesn't have much data on them.

    01:14 As you can see, even the company that has the most data doesn't necessarily have population data.

    01:21 So if Google wants to use statistical methods to target them with Google Ads, they will basically be using sample data, with the population variance unknown, to guess their preferences.

    01:34 Ok. In this lesson, we will explore the confidence intervals for a population mean with a known variance.

    01:42 An important assumption in this calculation is that the population is normally distributed. Even if it is not, you should use a large sample and let the central limit theorem do the normalization magic for you.

    01:54 Remember, if you work with a sample which is large enough, you can assume normality of sample means.
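
The claim about large samples can be checked with a small simulation. This is a minimal sketch, not part of the lecture: the exponential population, the sample size of 50, and the number of trials are all illustrative assumptions.

```python
import math
import random
from statistics import fmean, stdev

random.seed(1)  # fixed seed so the run is repeatable

sample_size = 50   # "large enough" here is an illustrative choice
trials = 5_000

# Exponential population with rate 1: mean 1, standard deviation 1, clearly not normal
sample_means = [
    fmean([random.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(trials)
]

expected_se = 1 / math.sqrt(sample_size)   # sigma / sqrt(n)
observed_se = stdev(sample_means)

# Share of sample means within 1.96 standard errors of the true mean
within = sum(abs(m - 1.0) <= 1.96 * expected_se for m in sample_means) / trials

print(f"expected SE {expected_se:.4f}, observed SE {observed_se:.4f}")
print(f"share within 1.96 SE: {within:.3f}")
```

If the sample means behave roughly normally, the observed spread should sit close to sigma divided by the square root of n, and about 95% of the sample means should fall within 1.96 standard errors of the true mean.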

    02:01 All right. Let's say you want to become a data scientist, and you're interested in the salary you are going to get.

    02:09 Imagine you have certain information that the population standard deviation of data science salaries is equal to 15,000.

    02:17 Furthermore, you know the salaries are normally distributed, and your sample consists of 30 salaries.

    02:24 The formula for the confidence interval with a known variance is given below.

    02:29 The population mean will fall between the sample mean minus z of alpha divided by two times the standard error,

    02:40 and the sample mean plus z of alpha divided by two times the standard error.
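
Written out in symbols, the interval described in words reads as follows. The notation is an assumption, since the transcript only refers to the slide: \bar{x} is the sample mean, \sigma the known population standard deviation, n the sample size, and z_{\alpha/2} the reliability factor.

```latex
\left[\; \bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\;\; \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \;\right]
```

The fraction \sigma / \sqrt{n} is the standard error of the mean.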

    02:49 The sample mean is the point estimate.

    02:51 You know all about the standard error already.

    02:54 So let's compute it using the formula.

    02:58 What we have left is the so-called reliability factor Z of Alpha divided by two.

    03:06 Z is the statistic that we've described earlier, the standardized variable that has a standard normal distribution.

    03:13 Right. And what about Alpha? This is the same alpha we had when we defined our confidence level.

    03:21 So for a confidence level of 95%, alpha will be equal to 5%. Similarly, for a confidence level of 99%, alpha would be equal to 1%.
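
The relationship between the confidence level and alpha can be sketched in a few lines. The variable names are illustrative, and the values are rounded to two decimals to avoid floating-point noise.

```python
confidence_levels = [0.90, 0.95, 0.99]

# alpha is the probability left outside the interval: 1 - confidence level
alphas = {level: round(1 - level, 2) for level in confidence_levels}

for level, alpha in alphas.items():
    # alpha / 2 is the share in each tail of the distribution
    print(f"confidence {level:.0%}: alpha = {alpha:.2f}, alpha / 2 = {alpha / 2:.3f}")
```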

    03:33 It all fits into place now, doesn't it? Let's go back to our example.

    03:39 The sample mean is 100,200 and the standard deviation is known to be 15,000.

    03:45 Thus the standard error is 2739.
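
The standard error quoted here can be reproduced from the two inputs given in the lesson. This is a small sketch with illustrative variable names.

```python
import math

population_std = 15_000   # known population standard deviation (from the lesson)
sample_size = 30          # number of salaries in the sample (from the lesson)

# Standard error of the mean: sigma / sqrt(n)
standard_error = population_std / math.sqrt(sample_size)

print(round(standard_error))  # 2739
```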

    03:52 Having calculated these values, we can take the next step and choose our confidence level. Common confidence levels are 90%, 95% and 99%, with respective alphas of 10%, 5% and 1%.

    04:08 Another way to put the values of alpha is 0.1, 0.05, and 0.01, respectively.

    04:18 Keep in mind that a 95% confidence interval means you are sure that, in 95% of the cases, the true population parameter would fall into the specified interval.
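
One way to read "95% of the cases" is to draw many samples and count how often the interval built from each one contains the true mean. The sketch below does that under stated assumptions: the true mean of 100,000 is hypothetical and chosen only for illustration, while the standard deviation and sample size are the ones from the lesson.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the run is repeatable

true_mean = 100_000       # hypothetical population mean, chosen for illustration
population_std = 15_000   # known population standard deviation (from the lesson)
sample_size = 30
z_critical = NormalDist().inv_cdf(0.975)   # critical value for 95% confidence
margin = z_critical * population_std / math.sqrt(sample_size)

trials = 10_000
hits = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, population_std) for _ in range(sample_size)]
    sample_mean = fmean(sample)
    # Does this sample's interval contain the true mean?
    if sample_mean - margin <= true_mean <= sample_mean + margin:
        hits += 1

coverage = hits / trials
print(f"coverage: {coverage:.3f}")
```

The printed share should come out close to 0.95, though it varies a little from run to run if the seed is changed.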

    04:29 OK, the Z of Alpha comes from the so-called standard normal distribution table. It is best to first see it and then comment on it.

    04:40 Let's say that we want to find the values for the 95% confidence interval.

    04:45 Alpha is 0.05.

    04:47 Therefore, we are looking for Z of alpha divided by two, or Z of 0.025.

    04:56 In the table, this will match the value of 1 - 0.025, or 0.975. The corresponding Z comes from the sum of the row and column table headers associated with this cell. In our case, the value is 1.9 plus 0.06, or 1.96.
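
The table lookup can also be done in code. This sketch uses `NormalDist` from Python's standard library (3.8 or later) in place of the printed z-table, which is a substitution and not the method shown in the lecture.

```python
from statistics import NormalDist

alpha = 0.05

# Cumulative probability to look up in the z-table: 1 - alpha / 2
cumulative = 1 - alpha / 2

# inv_cdf inverts the standard normal CDF, which is what the table lookup does by hand
z_critical = NormalDist().inv_cdf(cumulative)

print(round(cumulative, 3))   # 0.975
print(round(z_critical, 2))   # 1.96
```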

    05:22 A commonly used term for the Z is critical value.

    05:24 So we have found the critical value for this confidence interval.

    05:29 Now we can easily substitute in the formula.

    05:33 The final confidence interval becomes 94,833 to 105,568. The interpretation is the following.

    05:45 We are 95% confident that the average data scientist salary will be in the interval between $94,833 and $105,568.
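
The substitution into the formula can be sketched as follows, using only the numbers given in the lesson. The variable names are illustrative.

```python
import math

sample_mean = 100_200     # sample mean (from the lesson)
population_std = 15_000   # known population standard deviation (from the lesson)
sample_size = 30
z_critical = 1.96         # table value for a 95% confidence level

standard_error = population_std / math.sqrt(sample_size)
margin = z_critical * standard_error

lower = sample_mean - margin
upper = sample_mean + margin
print(f"{lower:,.0f} to {upper:,.0f}")
```

The bounds land within about a dollar of the figures quoted in the lecture; the small difference depends on where the intermediate values are rounded.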

    06:00 Let's repeat the exercise using a higher confidence level.

    06:04 Say we want to be 99% certain of the outcome.

    06:07 Alpha is 0.01.

    06:11 We look at the table for the value of 1 - 0.005, which is equal to 0.995.

    06:19 Bummer. There is no such value. When this happens, we just have to round to the nearest value available.

    06:27 The corresponding critical value is 2.5 plus 0.08.

    06:32 Thus, 2.58.

    06:35 We plug it into our formula once more, and the new confidence interval runs from 93,135 to 107,265. This means that we are 99% confident that the average data scientist salary is going to lie in the interval between $93,135 and $107,265.
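
The 99% interval can be recomputed the same way. This sketch uses the rounded table value 2.58 from the lesson and, for comparison, the unrounded critical value from `NormalDist` in Python's standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

sample_mean = 100_200
standard_error = 15_000 / math.sqrt(30)

z_table = 2.58                          # rounded table value used in the lesson
z_exact = NormalDist().inv_cdf(0.995)   # about 2.5758, without table rounding

lower = sample_mean - z_table * standard_error
upper = sample_mean + z_table * standard_error
print(f"{lower:,.0f} to {upper:,.0f}")
print(f"exact critical value: {z_exact:.4f}")
```

Under these inputs the bounds land near 93,134 and 107,266, symmetric around the sample mean of 100,200. Differences of a dollar or so from quoted figures come from rounding intermediate values.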

    07:00 Please note that in this case, there is a trade-off between the level of confidence we chose and the estimation precision.

    07:06 The interval we obtained is broader.

    07:09 The opposite is also true.

    07:11 A narrower confidence interval translates to higher uncertainty.

    07:15 Makes sense, right? If we are trying to estimate the population mean, and we are picking a larger interval, we're increasing our chances of having an interval that actually includes the mean and vice versa.

    07:29 If we want to be more specific about the population mean range, this will take away from our confidence about this statement.
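
The trade-off can be made concrete by computing the interval width at each common confidence level. This is a sketch using the lesson's standard deviation and sample size; the critical values come from `NormalDist` in the standard library instead of the printed table.

```python
import math
from statistics import NormalDist

standard_error = 15_000 / math.sqrt(30)

widths = {}
for level in (0.90, 0.95, 0.99):
    # Critical value for this confidence level: z at cumulative 1 - alpha / 2
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    widths[level] = 2 * z * standard_error
    print(f"{level:.0%}: width = {widths[level]:,.0f}")
```

The width grows with the confidence level: asking for more confidence costs precision, and asking for a tighter interval costs confidence.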

    07:37 Okay. This lecture was a bit longer, but very insightful.

    07:42 Don't skip the exercises provided.

    07:44 They will help you reinforce the knowledge about this concept, which is fundamental for everybody who wants to work with numbers in their job.

    07:53 In the next few lessons, we will study some particular cases and teach you how to find confidence intervals for them.

    08:00 Thanks for watching.


    About the Lecture

    The lecture Confidence Intervals: Calculations in a Population With Known Variance by 365 Careers is from the course Statistics for Data Science and Business Analysis (EN).


    Author of lecture Confidence Intervals: Calculations in a Population With Known Variance

     365 Careers

