Data Visualization: Cross Tables and Scatterplots

by 365 Careers

    Learning Material 4
    • XLS
      2.6. Cross table and scatter plot lesson.xls
    • XLS
      2.6.Cross-table-and-scatter-plot-exercise.xls
    • XLS
      2.6.Cross-table-and-scatter-plot-exercise-solution.xls
    • PDF
      Download Lecture Overview
    Transcript

    00:00 So far we have covered graphs that represent only one variable, but how do we represent relationships between two variables? In this video, we'll explore cross tables and scatter plots.

    00:12 Once again, we have a division between categorical and numerical variables.

    00:18 Let's start with categorical variables.

    00:21 The most common way to represent them is using cross tables or, as some statisticians call them, contingency tables.

    00:28 Imagine you were an investment manager, and you manage stocks, bonds and real estate investments for three different investors.

    00:35 Each of them has a different idea of risk, and hence their money is allocated in a different way among the three asset classes. A cross table representing all the data looks like this.

    00:47 You can clearly see the rows showing the type of investment that's been made and the columns with each investor's allocation.

    00:54 It is a good practice to calculate the totals of each row and column, as it is often useful in further analysis.

    01:01 Notice that the subtotals of the rows give us total investment in stocks, bonds and real estate. On the other hand, the subtotals of the columns give us the holdings of each investor. Once we have created a cross table, we can proceed by visualizing the data on a plane.
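
    The row and column subtotals described above can be sketched in a few lines of Python. The transcript does not include the lecture's actual figures, so the amounts and investor names below are hypothetical placeholders.

```python
# Hypothetical allocations (in thousands); the lecture's real numbers
# are not part of the transcript.
investors = ["Investor A", "Investor B", "Investor C"]
allocations = {
    "Stocks":      {"Investor A": 96,  "Investor B": 185, "Investor C": 39},
    "Bonds":       {"Investor A": 181, "Investor B": 3,   "Investor C": 29},
    "Real estate": {"Investor A": 88,  "Investor B": 152, "Investor C": 142},
}

# Row subtotals: total invested in each asset class.
row_totals = {
    asset: sum(by_investor.values())
    for asset, by_investor in allocations.items()
}

# Column subtotals: total holdings of each investor.
column_totals = {
    investor: sum(allocations[asset][investor] for asset in allocations)
    for investor in investors
}

# Both sets of subtotals must add up to the same grand total.
grand_total = sum(row_totals.values())

# Print the cross table with a "Total" row and column.
print(f"{'':<12}" + "".join(f"{i:>12}" for i in investors) + f"{'Total':>8}")
for asset, by_investor in allocations.items():
    cells = "".join(f"{by_investor[i]:>12}" for i in investors)
    print(f"{asset:<12}{cells}{row_totals[asset]:>8}")
print(
    f"{'Total':<12}"
    + "".join(f"{column_totals[i]:>12}" for i in investors)
    + f"{grand_total:>8}"
)
```

    Checking that the row subtotals and the column subtotals sum to the same grand total is a quick sanity test for any cross table.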

    01:19 A very useful chart in such cases is a variation of the bar chart called the side-by-side bar chart.

    01:26 It represents the holdings of each investor in the different types of assets.

    01:30 Stocks are in green, bonds are in red, and real estate is in blue.

    01:35 The name of this type of chart comes from the fact that for each investor, the categories of assets are represented side by side.

    01:42 In this way, we can easily compare asset holdings for a specific investor or among investors. Easy, right? All graphs are very easy to create and read

    01:52 once you have identified the type of data you are dealing with and decided on the best way to visualize it.
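
    As a rough sketch of how a side-by-side bar chart is laid out, the code below computes where each bar goes and then, optionally, draws it with matplotlib. The holdings are hypothetical (the lecture's figures are not in the transcript), and the helper names `grouped_bar_positions` and `draw_side_by_side` are made up for this illustration. The colors follow the lecture: green for stocks, red for bonds, blue for real estate.

```python
BAR_WIDTH = 0.25

# Hypothetical holdings per investor (in thousands), one list per asset class.
investors = ["Investor A", "Investor B", "Investor C"]
holdings = {
    "Stocks": [96, 185, 39],
    "Bonds": [181, 3, 29],
    "Real estate": [88, 152, 142],
}
colors = {"Stocks": "green", "Bonds": "red", "Real estate": "blue"}


def grouped_bar_positions(n_groups, n_series, bar_width=BAR_WIDTH):
    """Return, for each series, the x position of its bar in every group.

    Groups are centred on 0, 1, 2, ...; the bars of one group sit side by
    side, placed symmetrically around the group centre.
    """
    offsets = [(s - (n_series - 1) / 2) * bar_width for s in range(n_series)]
    return [[group + offset for group in range(n_groups)] for offset in offsets]


def draw_side_by_side(holdings, investors, colors):
    """Draw the chart. matplotlib is imported here so only this function needs it."""
    import matplotlib.pyplot as plt

    positions = grouped_bar_positions(len(investors), len(holdings))
    fig, ax = plt.subplots()
    for (asset, values), xs in zip(holdings.items(), positions):
        ax.bar(xs, values, width=BAR_WIDTH, label=asset, color=colors[asset])
    ax.set_xticks(list(range(len(investors))))
    ax.set_xticklabels(investors)
    ax.set_ylabel("Holdings (hypothetical, thousands)")
    ax.legend()
    return fig
```

    With matplotlib installed, `draw_side_by_side(holdings, investors, colors).savefig("holdings.png")` would write the chart to a file. Grouping by investor on the horizontal axis is what makes the asset classes appear side by side for each investor.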

    01:58 Finally, we would like to conclude with a very important graph.

    02:02 The scatter plot.

    02:04 It is used when representing two numerical variables.

    02:08 For this example, we have gathered the reading and writing SAT scores of 100 individuals. Let me first show you the graph before analyzing it.

    02:18 All right. First, SAT scores by component range from 200 to 800 points, and that is why our data is bounded within the range of 200 to 800.

    02:28 Second, our vertical axis shows the writing scores, while the horizontal axis contains reading scores.

    02:36 Third, there are 100 students, and each student's results correspond to a specific point on the graph. Each point gives us information about a particular student's performance. For example, this is Jane.

    02:48 She scored 300 on writing, but 550 on the reading part.
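
    The layout just described (reading on the horizontal axis, writing on the vertical axis, both bounded by 200 and 800, one point per student) can be sketched as follows. Only Jane's scores come from the lecture; the other students are hypothetical, and `within_sat_range` and `draw_scatter` are illustrative helper names.

```python
SAT_MIN, SAT_MAX = 200, 800

# name -> (reading score, writing score).
# Jane's scores are from the lecture; every other entry is hypothetical.
students = {
    "Jane": (550, 300),
    "Student 2": (480, 500),
    "Student 3": (520, 530),
    "Student 4": (700, 690),
    "Student 5": (350, 330),
}


def within_sat_range(reading, writing):
    """Check that both component scores lie in the 200 to 800 range."""
    return SAT_MIN <= reading <= SAT_MAX and SAT_MIN <= writing <= SAT_MAX


def draw_scatter(points):
    """Draw the scatter plot. matplotlib is imported here so only this function needs it."""
    import matplotlib.pyplot as plt

    readings = [reading for reading, _ in points.values()]
    writings = [writing for _, writing in points.values()]
    fig, ax = plt.subplots()
    ax.scatter(readings, writings)
    ax.set_xlim(SAT_MIN, SAT_MAX)   # horizontal axis: reading
    ax.set_ylim(SAT_MIN, SAT_MAX)   # vertical axis: writing
    ax.set_xlabel("Reading score")
    ax.set_ylabel("Writing score")
    return fig
```

    With matplotlib installed, `draw_scatter(students)` returns a figure in which Jane appears at reading 550, writing 300.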

    02:54 Scatter plots usually represent lots and lots of observations.

    02:58 When interpreting a scatter plot,

    03:00 a statistician is not expected to look into single data points.

    03:03 He would be much more interested in getting the main idea of how the data is distributed.

    03:09 OK. The first thing we see is that there is an obvious uptrend.

    03:13 This is because lower writing scores are usually obtained by students with lower reading scores, and higher writing scores have been achieved by students with higher reading scores. This is logical, right? Students are more likely to do well on both because the two tasks are closely related.
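
    The lecture reads the uptrend off the chart by eye. One common way to put a number on such a trend, which the lecture does not use at this point, is the Pearson correlation coefficient: values near +1 mean that high reading scores go with high writing scores. The sketch below uses hypothetical scores, not the lecture's data set.

```python
from math import sqrt

# Hypothetical (reading, writing) scores; the lecture's 100 observations
# are not included in the transcript.
reading = [350, 400, 450, 480, 500, 520, 550, 600, 700, 750]
writing = [330, 420, 440, 500, 490, 530, 540, 620, 690, 760]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


r = pearson_r(reading, writing)
print(f"Correlation between reading and writing: {r:.2f}")
```

    For data with a clear uptrend like this, the coefficient comes out close to 1; a value near 0 would indicate no linear relationship.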

    03:29 Second, we notice a concentration of students in the middle of the graph with scores in the region of 450 to 550 on both reading and writing.

    03:38 Remember we said that scores can be anywhere between 200 and 800? Well, 500 is the average score one can get.

    03:45 So it makes sense that a lot of people fall into that area.

    03:50 Third, there is this group of people with both very high writing and reading scores. The exceptional students tend to be excellent at both components.

    04:00 This is less true for bad students as their performance tends to deviate when performing different tasks. Finally, we have Jane from a minute ago.

    04:08 She is far away from every other observation as she scored above average on reading but poorly on writing.

    04:15 This observation is called an outlier as it goes against the logic of the whole data set. We will learn more about outliers and how to treat them in our analysis later on in this course. So we have gone through the basics.
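
    The lecture leaves the treatment of outliers for later in the course. Purely as an illustration, and not as the course's method, the sketch below flags students whose gap between reading and writing is unusually far from the typical gap. The rule (more than two standard deviations from the mean gap) and all scores except Jane's are assumptions made for this example.

```python
from statistics import mean, pstdev

# (name, reading, writing). Jane's scores are from the lecture;
# all other records are hypothetical.
students = [
    ("Jane", 550, 300),
    ("S01", 350, 330),
    ("S02", 400, 420),
    ("S03", 450, 440),
    ("S04", 480, 500),
    ("S05", 500, 490),
    ("S06", 520, 530),
    ("S07", 550, 540),
    ("S08", 600, 620),
    ("S09", 700, 690),
    ("S10", 750, 760),
]


def flag_outliers(records, threshold=2.0):
    """Return names whose reading-minus-writing gap is far from the mean gap.

    'Far' means more than `threshold` population standard deviations;
    this is an assumed rule of thumb, not a definition from the lecture.
    """
    gaps = [reading - writing for _, reading, writing in records]
    centre = mean(gaps)
    spread = pstdev(gaps)
    return [
        name
        for (name, _, _), gap in zip(records, gaps)
        if abs(gap - centre) > threshold * spread
    ]


print(flag_outliers(students))
```

    On this hypothetical sample only Jane is flagged, which matches the lecture's reading of the chart: she sits far from the other observations.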

    04:28 We have covered populations, samples, types of variables, graphs and tables.

    04:35 And it is time for us to dive into the heart of descriptive statistics: measures of central tendency and variability.

    04:43 Thanks for watching.


    About the Lecture

    The lecture Data Visualization: Cross Tables and Scatterplots by 365 Careers is from the course Statistics for Data Science and Business Analysis (EN).


    Author of lecture Data Visualization: Cross Tables and Scatterplots

     365 Careers

