Mathematics

Scatter Graphs

Scatter graphs are used to display the relationship between two sets of data. Each data point is represented by a dot on the graph, with the horizontal axis typically showing one set of values and the vertical axis showing the other. This type of graph is useful for identifying patterns or trends in the data.

Written by Perlego with AI-assistance

11 Key excerpts on "Scatter Graphs"

  • Compassionate Statistics
    Applied Quantitative Analysis for Social Services (With exercises and instructions in SPSS)

    Chapter 11, will introduce correlations used for inferential purposes.

    Scattergrams/Scatterplots

    It would be perfectly legitimate to portray the existence of a numerical correlation between two variables in a standard table, as in Table 10.1.
    Unfortunately, the true nature of this correlation is not easily apparent in a table format. Only partially revealed in Table 10.1 is the fact that the data indicate (surprisingly) that the longer clients stayed in treatment at this agency, the worse became their attitude about that agency’s effectiveness. It is for this reason that most researchers create scattergrams, rather than tables, to present correlations used solely for descriptive purposes.
    Table 10.1   Table of the Length of Client Contact (in Weeks) and Client Attitude Toward Agency Effectiveness (10 = Very High, 1 = Very Low)

    Client #   Contact in Weeks   Attitude Toward Agency
    01          2                 10
    02          4                  8
    03          7                  2
    04          6                  3
    05          4                  6
    06          2                 10
    07          7                  2
    08         10                  1
    09          5                  5
    10          7                  3
    11          3                  8
    12          2                  8
    13          8                  3
    14          9                  1
    15          4                 10
    16          8                  2
    17          4                 10

    A scattergram, also called a scatterplot, offers a clear visual image of the intersection of the values contained in the two variables. Scattergrams are conceptually based on the image of the X axis and Y axis. You are undoubtedly familiar with this image since it is commonly used as a template in economics and business administration classes to display economic trends and forecasts.
    A scattergram literally pinpoints where individual cases (usually people) are placed on a grid bounded by the possible values of the two variables being analyzed, then scatters those points, thereby forming some variation of a pattern. These possible values start at an absolute zero point, where the X axis crosses the Y axis, and then continue to increase vertically and horizontally out from that zero point, as illustrated in Figure 10.4
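
    As a rough illustration of how cases are pinpointed on the grid, the Table 10.1 values can be placed on a text grid whose axes start at zero. This is only a sketch: the function name `text_scattergram` and the character-grid layout are assumptions made for illustration, not the book's method (the book works in SPSS).

```python
# Table 10.1 data: one (weeks, attitude) pair per client, in client order.
weeks = [2, 4, 7, 6, 4, 2, 7, 10, 5, 7, 3, 2, 8, 9, 4, 8, 4]
attitude = [10, 8, 2, 3, 6, 10, 2, 1, 5, 3, 8, 8, 3, 1, 10, 2, 10]


def text_scattergram(xs, ys, x_max=10, y_max=10):
    """Return a text grid; each marked cell shows how many cases fall on it."""
    counts = {}
    for x, y in zip(xs, ys):
        counts[(x, y)] = counts.get((x, y), 0) + 1

    lines = []
    for y in range(y_max, -1, -1):  # top row holds the highest Y value
        cells = [str(counts.get((x, y), ".")) for x in range(x_max + 1)]
        lines.append(f"{y:>2} | " + " ".join(f"{c:>2}" for c in cells))
    # X axis rule and tick labels, starting from zero.
    lines.append("   +" + "-" * (3 * (x_max + 1) + 1))
    lines.append("     " + " ".join(f"{x:>2}" for x in range(x_max + 1)))
    return "\n".join(lines)


print(text_scattergram(weeks, attitude))
```

    Read from left to right, the marked cells drift from the top of the grid toward the bottom, which is the negative pattern the table only partially reveals.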
  • Advanced Statistics for Physical and Occupational Therapy
    • Thomas Gus Almonroeder (Author)
    • 2022 (Publication Date)
    • Routledge (Publisher)
    As you progress through this chapter, you’ll learn more about how to both qualitatively and quantitatively examine the relationship between variables, as well as other key information that can be gleaned from a scatter plot. However, as a primer, let’s start by simply describing some general things that can be observed from the scatter plot in Figure 8.1. First, you’ll notice that there appears to be a general pattern to the data points, as they tend to start in the bottom-left corner and move toward the top-right corner of the scatter plot. This indicates that athletes with stronger lower bodies (i.e. greater one repetition maximum weights) tended to be able to jump higher. You’ll also notice that most data points appear to be located fairly closely to the line of best fit but that very few data points lie directly on the line. Finally, you’ll see that some of the data points are located farther from the line of best fit than others. For example, the athlete whose data point is circled didn’t appear to fit the general pattern observed for the sample, as his lower body strength was slightly below average, but he was able to jump higher than almost all of the other athletes in the study.
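
    The idea of a line of best fit, and of points lying nearer to or farther from it, can be sketched with an ordinary least-squares fit. The numbers below are hypothetical (they are not the data behind Figure 8.1), and `fit_line` is an illustrative helper name; the excerpt does not say how its line was fitted, so least squares is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical athletes: one repetition maximum (kg) and jump height (cm).
strength = [80, 95, 100, 110, 120, 135, 150]
jump = [38, 42, 60, 47, 50, 55, 61]

slope, intercept = fit_line(strength, jump)

# Residual = observed jump height minus the height predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(strength, jump)]

# Index of the athlete whose point lies farthest from the line.
farthest = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
print(slope > 0, strength[farthest], jump[farthest])
```

    In this made-up sample the point farthest from the line belongs to an athlete with slightly below-average strength and a high jump, mirroring the circled case described above.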
    As you can already see, scatter plots provide a tremendous amount of information. Examining a scatter plot is often done as a preliminary step to qualitatively visualize the nature of the relationship (or lack thereof) between variables before performing a more quantitative analysis. Scatter plots can also help us determine whether our data meets some of the key assumptions associated with bivariate correlation (discussed at the end of this chapter). Fortunately, scatter plots are fairly easy to generate and are often standard output when conducting a bivariate correlation analysis using most statistical analysis software packages. Scatter plots are also commonly included as figures in research articles that report the results of a bivariate correlation analysis.

    An introduction to correlation coefficients

    While examining a scatter plot allows us to qualitatively assess the relationship between two variables, correlation coefficients help to quantitatively characterize the nature of the relationship. The Pearson product-moment correlation coefficient, which is typically represented by a lowercase, italicized r and often referred to as the ‘r value’, is commonly used to quantitatively describe the relationship between two continuous variables. Throughout this chapter, any reference to a correlation coefficient, or r
  • Understanding Statistics
    • Bruce J. Chalmer (Author)
    • 2020 (Publication Date)
    • CRC Press (Publisher)
    11

    Describing Relationships Between Two Variables

    11.1  A scatterplot shows the shape of a relationship between two variables.

    Relationships between variables

    Before we discuss ways of describing relationships between variables, we need to consider why we should bother. The answer is the same as it was when we considered group differences on a single variable: Many scientific hypotheses can be stated in terms of the relationship between two variables. In fact, the issue of group differences can itself be considered in terms of a relationship between two variables and some of the same techniques apply, as we will see.
    What do we mean by a “relationship” between variables? To say that two variables are related means that knowledge of an individual's score on one variable changes our best guess about the individual's score on the other variable.
    For example, yield of corn per hectare and amount of rainfall during the growing season are presumably related to each other. If we knew nothing else about a given farm (besides the fact that corn was planted there), our best guess about the yield we might expect from that farm would simply be the average corn yield. But if we were given information about the rainfall at the farm, our guess would probably be affected. If we were told that there is almost no rain at the farm, then, in the absence of irrigation, we would expect a very low yield. If we were told that there is too much rain, we would similarly expect a low yield. If we were told that the amount of rain is just right for corn, we would expect a high yield.
    Figure 11.1 shows a graphical representation of this type of relationship. By drawing such a picture, we can characterize the “shape” of a relationship. In this case it is curvilinear: High corn yields are associated with moderate rainfall, with lower yields for very high or very low rainfall.
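
    The notion that knowing one variable changes our best guess about the other can be sketched numerically. The farm data, the band cut-offs, and the name `rainfall_band` are all hypothetical, chosen only to mimic the curvilinear shape described above.

```python
from statistics import mean

# Hypothetical farms: (growing-season rainfall in mm, yield in tonnes per hectare).
farms = [(100, 2.0), (150, 2.5), (400, 8.0), (450, 9.0),
         (500, 8.5), (900, 3.0), (950, 2.0)]


def rainfall_band(mm):
    """Assumed cut-offs for 'too little', 'about right', and 'too much' rain."""
    if mm < 300:
        return "low"
    if mm <= 700:
        return "moderate"
    return "high"


# Best guess with no rainfall information: the overall average yield.
overall_guess = mean(y for _, y in farms)

# Best guess once the rainfall band is known: the average yield within that band.
by_band = {}
for mm, y in farms:
    by_band.setdefault(rainfall_band(mm), []).append(y)
conditional_guess = {band: mean(ys) for band, ys in by_band.items()}

print(overall_guess, conditional_guess)
```

    Because the guess for moderate rainfall sits above the overall average while the guesses for low and high rainfall sit below it, the two variables are related in the sense defined above, even though the relationship is not a straight line.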

    Scatterplots

    Of course, Figure 11.1 is unrealistic. Even though rainfall undoubtedly does affect corn yield, it is not the only factor. Many other things matter also. Knowing the amount of rainfall would change our best guess about corn yield, but we still would not be able to predict corn yield perfectly
  • Introduction to Bayesian Statistics
    • William M. Bolstad, James M. Curran (Authors)
    • 2016 (Publication Date)
    • Wiley (Publisher)
    RMS, root mean square. It is not as affected by outliers as the variance is, but it is still quite affected. It inherits good mathematical properties and good combining properties from the variance. The standard deviation is the most widely used measure of spread. It is in the same units as mean, so its size is directly comparable to the mean.

    3.5 Displaying Relationships Between Two or More Variables

    Sometimes our data are measurements for two variables for each experimental unit. This is called bivariate data. We want to investigate the relationship between the two variables.

    Scatterplot

    The scatterplot is just a two-dimensional dotplot. Mark off the horizontal axis for the first variable, the vertical axis for the second. Each point is plotted on the graph. The shape of the “point cloud” gives us an idea as to whether the two variables are related, and if so, what is the type of relationship.
    When we have two samples of bivariate data and want to see if the relationship between the variables is similar in the two samples, we can plot the points for both samples on the same scatterplot using different symbols so we can tell them apart.
    EXAMPLE 3.3
    The Bears.mtw file stored in Minitab contains 143 measurements on wild bears that were anesthetized, measured, tagged, and released. Figure 3.11 shows a scatterplot of head length versus head width for these bears. From this we can observe that head length and head width are related: bears with wide heads tend to have long heads. We can also see that male bears tend to have larger heads than female bears.
    Figure 3.11
    Head length versus head width in black bears.

    Scatterplot Matrix

    Sometimes our data consists of measurements of several variables on each experimental unit. This is called multivariate data. To investigate the relationships between the variables, form a scatterplot matrix
  • Multivariate Analysis for the Behavioral Sciences, Second Edition
    • Kimmo Vehkalahti, Brian S. Everitt (Authors)
    • 2018 (Publication Date)
    • CRC Press (Publisher)
    When there are many variables measured on all the individuals in a study, an initial examination of all the separate pairwise scatterplots becomes difficult. For example, if 10 variables are available, there are 45 possible scatterplots. But all these scatterplots can be conveniently arranged into a scatterplot matrix that then aids in the overall comprehension and understanding of the data.
    A scatterplot matrix is defined as a square, symmetric grid of bivariate scatterplots. The grid has q rows and columns, each one corresponding to a different variable. Each of the grid’s cells shows a scatterplot of two variables. Variable j is plotted against variable i in the ijth cell, and the same variables appear in cell ji with the x- and y-axes of the scatterplots interchanged. The reason for including both the upper and lower triangles of the grid, despite the seeming redundancy, is that it enables a row and a column to be visually scanned to see one variable against all others, with the scales for the one variable lined up along the horizontal or the vertical.
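
    The bookkeeping behind such a grid can be sketched as follows. The helper `scatterplot_matrix_cells` and the variable names `v1` to `v10` are hypothetical; the sketch only enumerates which pair of variables each cell holds and draws nothing, and it leaves open which member of the pair goes on which axis.

```python
from itertools import combinations


def scatterplot_matrix_cells(variables):
    """Map each off-diagonal cell (i, j) to the pair of variables it displays."""
    cells = {}
    for i, var_i in enumerate(variables):
        for j, var_j in enumerate(variables):
            if i != j:  # diagonal cells would pair a variable with itself
                cells[(i, j)] = (var_i, var_j)
    return cells


variables = [f"v{k}" for k in range(1, 11)]  # q = 10 variables
cells = scatterplot_matrix_cells(variables)

# Distinct pairwise scatterplots: q * (q - 1) / 2.
distinct_pairs = list(combinations(variables, 2))

print(len(distinct_pairs), len(cells))
```

    With 10 variables there are 45 distinct pairs but 90 off-diagonal cells, because each pair appears twice with its axes interchanged, once in the upper triangle and once in the lower.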
    To illustrate the use of a scatterplot matrix, we shall use the data shown in Table 2.6. These data arise from an experiment in which five different types of electrode were applied to the arms of 16 subjects and the resistance measured (in kilohms). The experiment was designed to see whether all electrode types performed similarly. The scatterplot matrix for the data is shown in Figure 2.26; each of the scatterplots in the diagram has been enhanced by the addition of the linear fit of the y variable on the x variable. The diagram suggests the presence of several outliers, the most extreme of which is subject 15; the reason for the two extreme readings on this subject was that he had very hairy arms.
  • R in Action, Third Edition
    Data analysis and graphics with R and Tidyverse

    • Robert I. Kabacoff (Author)
    • 2022 (Publication Date)
    • Manning (Publisher)
    • Scatter plots and scatter plot matrices allow you to visualize relationships between quantitative variables two at a time. The plots can be enhanced with linear and loess fit lines showing trends.
    • When you’re creating a scatter plot based on a large volume of data, methods that plot densities rather than points are particularly useful.
    • The relationships among three quantitative variables can be explored using 3D scatter plots or 2D bubble charts.
    • Change over time can be described effectively with line charts.
    • Large correlation matrices are difficult to understand in table form, but easily explored via corrgrams (visual plots of correlation matrices).
    • The relationships between two or more categorical variables can be visualized with mosaic charts.
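
    To make the point about plotting densities rather than points concrete, here is a minimal sketch that bins a large simulated sample into a coarse grid and counts the points per cell. It is a simplified stand-in for the density-based plots the book produces in R, not a reproduction of them; the function `bin_counts`, the simulated data, and the 5 × 5 grid are assumptions.

```python
import random


def bin_counts(xs, ys, bins=5):
    """Count points per cell of a bins x bins grid spanning the data range.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        # Scale to [0, bins) and clamp the maximum value into the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        row = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        grid[row][col] += 1
    return grid


rng = random.Random(0)  # fixed seed so the sketch is repeatable
xs = [rng.gauss(0, 1) for _ in range(10_000)]
ys = [x + rng.gauss(0, 0.5) for x in xs]  # positively related to xs

grid = bin_counts(xs, ys)
for row in reversed(grid):  # print with the largest y values at the top
    print(" ".join(f"{count:>5}" for count in row))
```

    Ten thousand overlapping dots would be hard to read, whereas the 25 counts show where the points concentrate.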
  • Statistics for Psychologists
    An Intermediate Course

    does emerge, and a dependence of failure on temperature is revealed.
    To end the chapter on a less sombre note, and to show that misperception and miscommunication are certainly not confined to statistical graphics, see Figure 2.35.
    Fig. 2.35.  Misperception and miscommunication are sometimes a way of life. (© The New Yorker collection 1961 Charles E. Martin from cartoonbank.com. All Rights Reserved.)
    2.10.  Summary
    1. Graphical displays are an essential feature in the analysis of empirical data.
    2. In some cases a graphical “analysis” may be all that is required (or merited).
    3. Stem-and-leaf plots are usually more informative than histograms for displaying frequency distributions.
    4. Box plots display much more information about data sets and are very useful for comparing groups. In addition, they are useful for identifying possible outliers.
    5. Scatterplots are the fundamental tool for examining relationships between variables. They can be enhanced in a variety of ways to provide extra information.
    6. Scatterplot matrices are a useful first step in examining data with more than two variables.
    7. Beware graphical deception!
    Software Hints SPSS
    Pie charts, bar charts, and the like are easily constructed from the Graph menu. You enter the data you want to use in the chart, select the type of chart you want from the Graph menu, define how the chart should appear, and then click OK. For example, the first steps in producing a simple bar chart would be as follows.
    1. Enter the data you want to use to create the chart.
    2. Click Graph, then click Bar. When you do this you will see the Bar Charts
  • Statistical Data Analysis Explained
    Applied Environmental Statistics with R

    • Clemens Reimann, Peter Filzmoser, Robert Garrett, Rudolf Dutter (Authors)
    • 2011 (Publication Date)
    • Wiley (Publisher)
    This plot is typically displayed as an elongated rectangle. Each value is plotted at its correct position along the x-axis and at a position selected by chance (according to a random uniform distribution) along the y-axis (Figure 3.2, lower diagram). This simple graphic can provide important insight into structure in the data.
    Figure 3.2 Evolution of the one-dimensional scatterplot demonstrated using Sc as measured by instrumental neutron activation analysis (INAA) in the samples of the Kola C-horizon
    In Figure 3.2 (stacked and one-dimensional scatterplot) a significant feature is apparent that would be important to consider if this variable were to be used in a more formal statistical analysis. The data were reported in 0.1 mg/kg steps up to a value of 10 mg/kg and then rounded to full 1 mg/kg steps – this causes an artificial “discretisation” of all data above 10 mg/kg.

    3.2 The histogram

    One of the most frequently used diagrams to depict a data distribution is the histogram. It is constructed in the form of side-by-side bars. Within a bar each data value is represented by an equal amount of area. The histogram permits the detection at one glance as to whether a distribution is symmetric (i.e. the same shape on either side of a line drawn through the centre of the histogram) or whether it is skewed (stretched out on one side – right or left skewed). It is also readily apparent whether the data show just one maximum (unimodal) or several humps (multimodal distribution). The parts far away from the main body of data on either side of the histograms are usually called the tails. The length of the tails can be judged. The existence or non-existence of straggling data (points that appear detached from the main body of data) at one or both extremes of the distribution is also visible at one glance
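
    Returning to the one-dimensional scatterplot described at the start of this excerpt, its construction can be sketched in a few lines: each value keeps its true x position and receives a uniformly random y position. The function names `jitter_positions` and `report`, the seed, and the sample values are hypothetical; `report` is only an assumed reading of the reporting rule mentioned above (0.1 steps up to 10, whole steps beyond).

```python
import random


def jitter_positions(values, seed=0):
    """Pair each value (x position) with a uniform random y position in [0, 1]."""
    rng = random.Random(seed)  # seeded so the layout is repeatable
    return [(value, rng.uniform(0.0, 1.0)) for value in values]


def report(value):
    """Assumed reporting rule: 0.1 steps up to 10, whole-number steps above 10."""
    if value <= 10:
        return round(value, 1)
    return float(round(value))


# Hypothetical measurements in mg/kg, before reporting.
raw = [3.14, 7.77, 9.96, 12.7, 14.2, 18.6]
reported = [report(v) for v in raw]
points = jitter_positions(reported)

for x, y in points:
    print(f"x = {x:5.1f}   y = {y:.3f}")
```

    Plotted this way, reported values above 10 would line up in vertical stripes at whole numbers, which is how the discretisation becomes visible.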
  • Mathematics for Scientific and Technical Students
    • H. Davies, H.G. Davies, G.A. Hicks (Authors)
    • 2014 (Publication Date)
    • Routledge (Publisher)
    Chapter 9

    Determinants and matrices

       

    9.1 Representation of data

    The relationship between two quantities in engineering or science can be expressed graphically, and usually by a formula or equation.

    (a) Representation of data with a formula or equation

    A formula or equation is a convenient way of expressing the relationship between two quantities. In equations such as $y = 3x^2 + 2x - 1$, x is called the independent variable and y the dependent variable, so called because its value depends upon the x value. Such equations produce pairs of (x, y) values which can be used as Cartesian coordinates.
    In equations such as $r = 2\cos\theta$, θ is the independent variable and r the dependent variable; the pairs of (r, θ) values produced are called polar coordinates. Polar graphs are examined in Section 9.14.

    (b) Graphical representation of two quantities related by an equation

    For each value of x (or θ) a corresponding value of y (or r) is obtained from the equation. Each pair of values is used as the coordinates of a point on a plane. These points trace out a curve or straight line which is the graph representing the equation. The shape of the graph is a good indication of how one quantity depends on the other.
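
    Generating the coordinate pairs described above is straightforward to sketch for the two example equations. The function names are hypothetical, and the final helper uses the standard conversion x = r cos θ, y = r sin θ, which this excerpt does not itself state.

```python
import math


def cartesian_pairs(xs):
    """(x, y) pairs for y = 3x^2 + 2x - 1."""
    return [(x, 3 * x**2 + 2 * x - 1) for x in xs]


def polar_pairs(thetas):
    """(r, theta) pairs for r = 2 cos(theta), with theta in radians."""
    return [(2 * math.cos(theta), theta) for theta in thetas]


def polar_to_cartesian(r, theta):
    """Standard conversion of a polar point to Cartesian coordinates."""
    return (r * math.cos(theta), r * math.sin(theta))


print(cartesian_pairs([-1, 0, 1]))

# Sample theta over half a turn and convert each polar point for plotting.
thetas = [k * math.pi / 6 for k in range(7)]
for r, theta in polar_pairs(thetas):
    x, y = polar_to_cartesian(r, theta)
    print(f"theta = {theta:.3f}  r = {r:+.3f}  ->  (x, y) = ({x:+.3f}, {y:+.3f})")
```

    Plotting the converted points shows the shape of the polar graph: every point of r = 2 cos θ lies on the circle of radius 1 centred at (1, 0).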

    9.2 Cartesian and polar coordinates

    To produce graphs and engineering drawings either by hand or by computer it is necessary to have a method of locating the positions of points on paper or on screen. At least two numbers, called coordinates, are required to locate a point in a plane. Two systems are used:

    (a) Cartesian coordinates (x, y)

    This is the most commonly used system. Two perpendicular datum lines are used: the horizontal line is called the x-axis and the vertical line is called the y-axis, as shown in Fig. 9.1. The point of intersection of the two axes is called the origin O. Any point P is located by its perpendicular distance from the two axes.
    Fig. 9.1
  • R Visualizations
    Derive Meaning from Data

    Chapter 5 Visualize the Relation of Two Continuous Variables
    How do two or more variables relate to each other? Chapter 6 presents analyses of the relationship between two categorical variables. Here we focus on the relationship between continuous variables: As the values of one variable increase, the values of the other variables tend to either systematically increase (+ relationship) or systematically decrease (- relationship). Examples follow.
    Relationship, continuous variables: As the values of one variable increase, the values of the other variables tend to either increase, or decrease.
    Positive Relationship:
    • Food quality increases, customer satisfaction increases
    • Hotel occupancy rate increases, needed staff increases
    Negative (inverse) relationship:
    • Price decreases, sales volume increases
    • Time partying increases, grades decrease
    Positive and negative relationships can have the same magnitude, such as for linear relationships, assessed by the correlation coefficient. The sign of + or − indicates the direction of the relationship. The size of the coefficient indicates magnitude.
    Correlation coefficient: Indicates extent of a linear relationship of two variables, bounded by -1 and 1.
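
    The sign and magnitude properties just described can be checked with a short sketch of the Pearson correlation coefficient. The helper name `pearson_r` and the price and sales figures are hypothetical, chosen to echo the "price decreases, sales volume increases" example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled to lie in [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: unit price and units sold.
price = [10, 12, 14, 16, 18]
sales = [200, 180, 150, 140, 100]

r_negative = pearson_r(price, sales)                # inverse relationship
r_positive = pearson_r(price, [-s for s in sales])  # same data with y flipped

print(r_negative, r_positive)
```

    Flipping the sign of one variable reverses the direction of the relationship but leaves the size of the coefficient unchanged, which is the sense in which positive and negative relationships can have the same magnitude.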
    The essential visualization for the relationship between two continuous variables is the scatterplot. The paired data values for each observation plot as a single point. Define the two coordinates of each point as the values of both variables for the corresponding observation.
    Scatterplot: One plotted point for each pair of values for two variables.
    Quick Start: Scatterplot, Chapter 2.4, p. 38
    5.1  Enhance the Scatterplot
    5.1.1  The Ellipse
    A distribution of two variables, x and y, necessarily involves a third variable. For categorical variables, the third variable is their joint frequency at specific values of x and y. For continuous variables, the third variable is their joint density, z, which plots as a smooth curve. The plot extends into three dimensions with z represented as height. Consider the plot of a bivariate normal distribution, such as the two examples in Figure 5.1
  • Visualize This
    The FlowingData Guide to Design, Visualization, and Statistics

    • Nathan Yau (Author)
    • 2011 (Publication Date)
    • Wiley (Publisher)
    For the sake of simplicity, Washington, DC was removed from the dataset to see the rest of the data better. It is important, however, to consider the importance of the outliers in your data. This is discussed more in Chapter 7, “Spotting Differences.”
    It’s not bad for a base chart, and if the graphic were just for analysis, you could stop here. However, if you have more than an audience of one, you can improve the readability a lot with a few small changes, as shown in Figure 6-7.
    I put less emphasis on the surrounding box by getting rid of the thick border, and direct your attention to the curve by making it thicker and darker than the dots.
    Figure 6-7: Revised scatterplot on murder versus burglary
    Exploring More Variables
    Now that you’ve plotted two variables against each other, the obvious next step is to compare other variables. You could pick and choose the variables you want to compare and make a scatterplot for each pair, but that could easily lead to missed opportunities and ignoring interesting spots in the data. You don’t want that. So instead, you can plot every possible pair with a scatterplot matrix, as shown in Figure 6-8.
    Figure 6-8: Scatterplot matrix framework
    This method is especially useful during your data exploration phases. You might have a dataset in front of you but have no clue where to start or what it’s about. If you don’t know what the data is about, your readers won’t either.
    Tip: To tell a complete story, you have to understand your data. The more you know about your data, the better the story you can tell.
    The scatterplot matrix reads how you expect. It’s usually a square grid with all variables on both the vertical and horizontal. Each column represents a variable on the horizontal axis, and each row represents a variable on the vertical axis. This provides all possible pairs, and the diagonal is left for labels because there’s no sense in comparing a variable to itself.
    Create a Scatterplot Matrix
    Now come back to your crime data. You have seven variables, or rates for crime types, but in the previous example, you compared only two: murder and burglary. With a scatterplot matrix, you can compare all the crime types. Figure 6-9
  • Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.