Geography

Statistically Significant Data

Statistically significant data in geography refers to findings that would be unlikely to arise by chance alone. Significance indicates that an observed pattern or relationship is unlikely to be the product of random variation, such as sampling error, and so probably reflects a real effect. This gives geographers more confidence when drawing conclusions about the relationships between geographic phenomena.
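
As an illustration that is not drawn from the excerpts below, the idea of "unlikely to have occurred by chance" can be sketched with a permutation test. The rainfall figures, the function name `permutation_p_value` and the choice of 5,000 permutations are all hypothetical; the sketch simply asks how often a random relabelling of the sites produces a difference in means at least as large as the one observed.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical rainfall totals (mm) at upland and lowland sites.
upland = [112.0, 98.5, 120.3, 105.7, 117.9, 109.4]
lowland = [88.2, 93.1, 79.6, 90.4, 85.0, 96.3]

p = permutation_p_value(upland, lowland)
print(f"p = {p:.4f}")
```

A small p-value here would conventionally be read as a statistically significant difference, although the 0.05 threshold often used is a convention rather than a rule.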

Written by Perlego with AI-assistance

6 Key excerpts on "Statistically Significant Data"

  • Statistics for Geography and Environmental Science
    • Richard Harris, Claire Jarvis (Authors)
    • 2014 (Publication Date)
    • Routledge (Publisher)
    At their simplest, statistics are used to form basic numeric summaries of the processes, events or activities that the data represent. Yet, this is only the starting point for analysis. Statistics can also go beyond a sample of data, to help decide whether information gleaned from the sample is true more generally. Statistics ask if one ‘thing’ is related to another, or if the one causes the other.
    We do not claim that statistics are the only way of doing research. But, they are important tools for validating theory, making predictions, helping to make sense of the world, engaging in policy research, offering informed commentary about social and environmental issues, and to help make the case for change.
    Learning objectives
    By the end of this chapter you will be able to:
    • Appreciate the importance of studying statistics as a student of geography, environmental or geographical information science.
    • Define what is meant by geographical data and analysis.
    • Give a working definition of the difference between geographical and non geographical forms of analysis.
    • Summarise some debates surrounding the use of numbers and statistics in human and social geography.
    • Have an appreciation of the impact of spatial association when using statistics for geographical enquiry.

    1.1 Statistics: a brief introduction

    Students tend to be uneasy about statistics. This is hardly surprising. Statistics involve the language of mathematics, of formulae and notation – a language that is mystifying to new learners and worryingly intolerant of mistakes. Concepts such as ‘degrees of freedom’ are less intuitive than they ought to be and frequent use of the Greek alphabet is like shorthand for ‘keep away!’ Yes, learning statistics is a challenge. But the problem is deeper than that: statistics seem innately off-putting:
    [E]very year it seems that the majority of students, who apparently grasp many nonstatistical concepts with commensurate ease, struggle to understand statistics. (Dancey and Reidy 2004, p.1)
    Of course, there are exceptions. It is a generalisation to say, ‘Students tend to be uneasy about statistics.’ We have not met all students to ask them. Even if we had, each individual’s response to the question ‘do you like statistics?’ would depend on a range of factors including time of day, their understanding of the last stats class, peer pressure, other work commitments, whether they thought a ‘yes’ would get us to go away, and so on.
  • The Routledge Handbook of Planning Research Methods
    • Elisabete A. Silva, Patsy Healey, Neil Harris, Pieter Van den Broeck (Authors)
    • 2014 (Publication Date)
    • Routledge (Publisher)
    Bundling data into areal aggregates has the effect of suppressing some of the underlying variation. Large areas (in the sense of areas with large populations) may contain intra-area heterogeneity which is lost to the database. For example, the mean value for household income in a large area may conceal considerable intra-area variation about the mean. Smaller areas generally suffer less from this problem. However, there are consequences of working with areas with small populations. Data errors or small random fluctuations in numbers (e.g., in the number of cases of a disease or number of burglaries) will likely have a big effect on calculated rates and ratios. The result is that sampling errors are larger (because they are a function of population size). The effect of this is that extreme rates or ratios are often associated with areas with small populations but differences between areas may not be statistically significant – that is, differences are due to sampling error. Statistical significance, on the other hand, tends to be found when comparing rates or ratios based on large populations. These underlying properties have implications when seeking out statistically significant crime or disease hotspots. The situation is further complicated if a region is partitioned into areas that vary in population size. Given the foregoing remarks, this implies each observation is drawn from a different distribution, each with its own sampling variance, and for this reason data values may not be directly comparable. This is the problem of inter-area heteroscedasticity (or non-constant variance). 
    Two problems flow from this: first, maps of the attribute may contain misleading artefacts, particularly if there is a geography to the distribution of the areas with large and small populations; second, with reference again to the theory that underpins “classical” statistics, not only are data values not independent (as we have remarked) but also they are not drawn from the same probability distribution. Areal data are in general neither independent nor identically distributed.
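
The effect of population size on significance can be sketched numerically. The following is an illustration added here, not part of the excerpt: the case counts and populations are hypothetical, and the normal approximation to the binomial is an assumption that becomes unreliable for very small counts. Both pairs of areas have the same rates (2% versus 1%), yet only the pair with large populations gives a difference that exceeds the conventional 1.96 threshold.

```python
import math


def rate_with_interval(cases, population, z=1.96):
    """Return (rate, lower, upper) using the normal approximation."""
    rate = cases / population
    se = math.sqrt(rate * (1 - rate) / population)  # shrinks as population grows
    return rate, max(0.0, rate - z * se), rate + z * se


def two_rate_z(cases_1, pop_1, cases_2, pop_2):
    """z statistic for the difference between two rates (pooled standard error)."""
    p1, p2 = cases_1 / pop_1, cases_2 / pop_2
    pooled = (cases_1 + cases_2) / (pop_1 + pop_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / pop_1 + 1 / pop_2))
    return (p1 - p2) / se


# Hypothetical areas: identical rates, very different population sizes.
z_small = two_rate_z(4, 200, 2, 200)            # two small areas
z_large = two_rate_z(400, 20000, 200, 20000)    # two large areas

print(f"small areas: z = {z_small:.2f}")
print(f"large areas: z = {z_large:.2f}")
print("interval for small area:", rate_with_interval(4, 200))
print("interval for large area:", rate_with_interval(400, 20000))
```

The much wider interval around the small-area rate is the non-constant variance described above: each rate is estimated with a different precision.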
    This section has provided a brief overview of some of the important properties of spatial data, particularly aggregated data, which have an important influence on how we undertake statistical analysis of spatial data. For more details see Haining (2009). Other properties will surface in the context of specific problems in the next two sections.

    2.2 Exploratory spatial data analysis and spatial data mining

    Advances in computer technology have made it possible to explore and interrogate spatial data in new and innovative ways. Visualization of spatial data linking graphs (e.g., bivariate scatterplots, Moran scatterplots, added variable plots, boxplots) to maps has helped the research scientist to detect patterns in geographical data. Questions such as “show me all the areas on the map which have attribute values above the upper quartile” or “show me all the areas on the map with positive residuals from my regression analysis” have become straightforward to implement. Brushing is the term used when the analyst highlights a subset of cases in one graph and sees them highlighted in another graph or on a map; dynamic brushing is brushing using a moving window that updates responses as the user moves over the graph or map (Monmonier 1989). As we have noted, some caution needs to be exercised because area values may not be directly comparable even after controlling for population size differences, for example. Also, physical size is not necessarily correlated with population size, so that on a map of an English county which consists of both urban and rural areas, some of the census areas we may be most interested in, and with the largest populations, may be difficult to see because of their small physical sizes, and the overall visual impression may be skewed towards the rural areas. Geographers make use of cartograms to try to overcome this problem. For numerous examples of cartograms see www.worldmapper.org and www.sasi.group.shef.ac.uk/maps.
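
The first of the queries quoted above can be sketched in a few lines. This is an added illustration rather than part of the excerpt: the area codes and values are hypothetical, and the "inclusive" quantile method is one of several reasonable conventions for locating the upper quartile.

```python
from statistics import quantiles

# Hypothetical attribute values keyed by area code.
area_values = {
    "E01": 12, "E02": 41, "E03": 8, "E04": 22,
    "E05": 30, "E06": 5, "E07": 15, "E08": 20,
}


def above_upper_quartile(values_by_area):
    """Return the codes of areas whose value exceeds the upper quartile."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 and Q3.
    _, _, q3 = quantiles(list(values_by_area.values()), n=4, method="inclusive")
    return sorted(code for code, value in values_by_area.items() if value > q3)


selected = above_upper_quartile(area_values)
print(selected)
```

In an interactive system the selected codes would be used to highlight the matching polygons on the linked map.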
  • Practical Statistics for Geographers and Earth Scientists
    • Nigel Walford (Author)
    • 2011 (Publication Date)
    • Wiley (Publisher)
    2 Geographical data: quantity and content
    Chapter 2 starts to focus on geographical data and looks at the issue of how much to collect by examining the relationships between populations and samples and introduces the implications of sampling for statistical analysis. It also examines different types of geographical phenomena that might be of interest and how to specify attributes and variables. Examples of sampling strategies that may be successfully employed by students undertaking independent research investigations are explained in different contexts.
    Learning outcomes
    This chapter will enable readers to:
    • outline the relationship between populations and samples, and discuss their implications for statistical analysis;
    • decide how much data to collect and how to specify the attributes and variables that are of interest in different types of geographical study;
    • initiate data-collection plans for an independent research investigation in Geography and related disciplines using an appropriate sampling strategy.
    2.1 Geographical data
    What are geographical data? Answering this question is not as simple as it might at first appear. The geographical part implies that we are interested in things – entities, observations, phenomena, etc. – immediately above, below and on, or indeed comprising, the Earth’s surface. It also suggests that we are interested in where these things are, either as a collection of similar entities in their own right or as one type of entity having a spatial relationship with another type or types. For example, we may wish to investigate people in relation to settlements; rivers in respect of land elevation; farms in association with food-industry corporations; or volcanoes in connection with the boundaries of tectonic plates. The data part of the term ‘geographical data’ also raises some definitional issues. At one level the data can be thought of as a set of ‘facts and figures’ about ‘geographical things’. However, some would argue that the reduction of geographical entities to a list of ‘facts and figures’ in this fashion fails to capture the essential complexity of the phenomena. Thus, while it is possible to draw up a set of data items
  • Key Methods in Geography
    • Nicholas Clifford, Meghan Cope, Thomas Gillespie (Authors)
    • 2023 (Publication Date)
    The development of methods for space and time analysis is a process that is continuously evolving. The increasing availability of spatio-temporal data has led to a pressing need to develop and integrate methods that can be used to reveal useful patterns, and this has been facilitated by the democratization of GIS. GIS can easily integrate geospatial data of different spatial scales and temporal granularity for a wide range of applications. This chapter has reviewed the importance of exploratory spatial data analysis to deepen our ability to detect spatial and temporal trends inherent to the data. Finally, the chapter has discussed the importance of confirmatory statistical approaches to model the role of explanatory variables.
    Summary
    The past two decades have witnessed two critical changes in the area of spatial data: 1) an increasing availability of spatial and temporal-explicit data and 2) the democratization of Geographic Information Systems (GIS). Georeferenced (or spatial) data are unique and characterized by a set of latitude and longitude coordinates, and come in different formats (point/lines/area or raster). These datasets can be massive, and there is an increasing need for robust statistical and visualization methods that can integrate different dimensions, such as space and time. Traditional statistical methods help explore trends. Spatial point pattern techniques (K-function, Kernel Density Estimation) are well suited to identify the locations of clusters, which are particularly informative in an epidemiological context. Spatial statistical approaches have the potential to identify locally varying patterns when data are aggregated at the areal level, as is the case for several census units. It is particularly important to test for spatial autocorrelation before implementing a regression technique; indeed traditional non-spatial regression approaches do not capture the spatial variation of the phenomenon under study. Caution is recommended when the data exhibit temporal trends (e.g. epidemiological datasets). Suggestions are presented to visualize this information and incorporate it into regression techniques.
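
A test for spatial autocorrelation of the kind recommended above can be sketched with global Moran's I and a permutation test. This is an added illustration, not the chapter's own procedure: the 5 x 5 grid, the binary rook-contiguity weights, the 999 permutations and all function names are assumptions chosen to keep the example small.

```python
import random


def rook_weights(n_rows, n_cols):
    """Binary rook-contiguity weights for a regular grid, cells indexed row by row."""
    weights = {}
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    weights[(i, rr * n_cols + cc)] = 1.0
    return weights


def morans_i(values, weights):
    """Global Moran's I for a list of values and a dict {(i, j): w_ij}."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(weights.values())
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    return (n / s0) * (cross / sum(d * d for d in dev))


def moran_permutation_p(values, weights, n_permutations=999, seed=0):
    """One-sided pseudo p-value: share of random relabellings with I >= observed."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between value and location
        if morans_i(shuffled, weights) >= observed:
            count += 1
    return observed, (count + 1) / (n_permutations + 1)


# Hypothetical 5 x 5 grid with a west-to-east gradient (value = column index).
grid = [float(c) for r in range(5) for c in range(5)]
w = rook_weights(5, 5)
i_obs, p_value = moran_permutation_p(grid, w)
print(f"Moran's I = {i_obs:.2f}, pseudo p = {p_value:.3f}")
```

A clearly positive I with a small pseudo p-value suggests that neighbouring areas hold similar values, which is the situation in which a non-spatial regression is likely to mislead.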

    Note

    1 John Edwards had officially suspended his campaign on 30 January 2008, but his name was still on the ballot.
    Further Reading
    • Anselin (2011) ‘From SpaceStat to CyberGIS: Twenty years of spatial data analysis software’.
    • This provides a historical and critical review of the development of spatial data analysis software. It considers such things as the role of advances in methods, developments in cyberinfrastructure and the growth of open source.
  • Methods in Human Geography: A guide for students doing a research project
    • Robin Flowerdew, David M. Martin (Authors)
    • 2013 (Publication Date)
    • Routledge (Publisher)
    everything else being equal, are much more reliable. These can be obtained quite easily from popular statistical packages such as SPSS for Windows.

    Summary

    The analysis of numerical data can be an extremely valuable constituent of geographic research. It drives data collection and data presentation and it provides a powerful framework within which our understanding of spatial processes can be tested and developed. This chapter has described how the analysis of numerical data can be useful in summarising large amounts of data through descriptive statistics; how relatively new, and primarily visual, exploratory techniques can be useful in both formulating hypotheses and examining results; how we can infer aspects of a population from a sample or a process from a set of observations; and how model building and calibration can be used as a test of how well we understand the real world.
    One of the more confusing aspects of data analysis is deciding on what technique to use in a given situation. Whilst experience is useful here, and discussing your problem with a user-friendly statistician is highly recommended, this chapter also discusses how to narrow down the choice of technique fairly easily. It is noted that spatial data have certain properties that lend themselves to different types of analysis and many of these are highly visual which often makes their presentation in research projects more appealing.
    There are pitfalls, however, and care has to be taken to eliminate problems that could invalidate the conclusions reached from the analysis. A number of potential problems that might be encountered or might need some thought are outlined. These include the modifiable areal unit problem, spatial non-stationarity, spatial dependence, non-standard distributions and the identification of spurious relationships. However, these problems are also challenges and should not deter students from incorporating data analysis as a central theme of their projects. With due thought, the analysis of numerical data can provide a pivotal contribution to research.
  • Modelling Interactions Between Vector-Borne Diseases and Environment Using GIS
    • Hassan M. Khormi, Lalit Kumar (Authors)
    • 2015 (Publication Date)
    • CRC Press (Publisher)
    Chapter 4

    Spatial Data

    4.1 Introduction

    Spatial data is information that links the feature to a geographic location; that is, it links the feature to a position on Earth. Therefore, spatial data contains geographic information that enables specifying exactly where on Earth that feature is located. Because of this, spatial data is often also termed geospatial or georeferenced data. This ability to link the feature to an exact location on Earth is extremely important as it enables building relationships between that feature and other features around it. For example, a person suffering from dengue fever visits a doctor and is asked for the address where they live. This address now enables the doctor and anyone else using this data to see exactly where on Earth this person lives and to investigate the surrounding environmental conditions. The address is the spatial data and is denoted by coordinates on a map (Figure 4.1).
    Figure 4.1: Representation of spatial data on a map.
    Spatial data is stored as coordinates and topology and is most often accessed, manipulated, or analysed through geographic information systems (GIS).
    The traditional method for storing, analysing, and presenting spatial data is the map. For example a two-dimensional (2D) road map contains points, lines, and polygons that represent different objects, such as cities, roads, and district boundaries. The features in the map have specific locations and areas and visualise geographic information; therefore, they are considered spatial data. So, understanding maps and the way they are produced is essential for exploring the characteristics of spatial data (Heywood et al., 2002).
    Geographic data have two components: spatial and nonspatial. The spatial component contains the locational information; the nonspatial component contains attribute information, commonly called descriptive information. As an example, the dengue patient’s name, date of birth, age, marital status, and so on all become part of the nonspatial or attribute information. Figure 4.2
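
The split between spatial and nonspatial components can be sketched as a simple record type. This is an added illustration, not taken from the excerpt: the class names, the coordinates and the attribute values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Location:
    """Spatial component: where the feature is."""
    latitude: float
    longitude: float


@dataclass
class GeographicRecord:
    """One feature with its spatial and nonspatial (attribute) components."""
    location: Location
    attributes: dict = field(default_factory=dict)


# Hypothetical dengue patient record; coordinates are made up for illustration.
record = GeographicRecord(
    location=Location(latitude=21.4858, longitude=39.1925),
    attributes={"condition": "dengue fever", "age": 34, "marital_status": "single"},
)

print(record.location)
print(sorted(record.attributes))
```

Keeping the location separate from the attributes mirrors the way a GIS typically links a geometry to a row of descriptive information.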