Data Visualization
eBook - ePub

Charts, Maps, and Interactive Graphics

  1. 222 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

This is the age of data. There are more innovations and more opportunities for interesting work with data than ever before, but there is also an overwhelming amount of quantitative information being published every day. Data visualisation has become big business, because communication is the difference between success and failure, no matter how clever the analysis may have been. The ability to visualize data is now a skill in demand across business, government, NGOs and academia.

Data Visualization: Charts, Maps, and Interactive Graphics gives an overview of a wide range of techniques and challenges, while staying accessible to anyone interested in working with and understanding data.

Features:



  • Focusses on concepts and ways of thinking about data rather than algebra or computer code.
  • Features 17 short chapters that can be read in one sitting.
  • Includes chapters on big data, statistical and machine learning models, visual perception, high-dimensional data, and maps and geographic data.
  • Contains more than 125 visualizations, most created by the author.
  • Supported by a website with all code for creating the visualizations, further reading, datasets and practical advice on crafting the images.

Whether you are a student considering a career in data science, an analyst who wants to learn more about visualization, or the manager of a team working with data, this book will introduce you to a broad range of data visualization methods.

Cover image: Landscape of Change uses data about sea level rise, glacier volume decline, increasing global temperatures, and the increasing use of fossil fuels. These data lines compose a landscape shaped by the changing climate, a world in which we are now living. Copyright © Jill Pelto (jillpelto.com).

Data Visualization is by Robert Grant and is catalogued under Mathematics and Probability & Statistics.

Information

Year
2018
ISBN
9781351781749
Edition
1
III
Specific tasks
CHAPTER 7
Visual perception and the brain
IT HELPS TO KNOW A LITTLE about how the human brain processes visual information. It’s very popular to explain this in terms of evolution, even though it is largely speculative. Nevertheless, the great majority of our million years or so on earth involved finding things to eat and spotting predators before they spotted us. Sitting down and looking at data is a new preoccupation, but uses the same old hunter-gatherer apparatus (eyes and brain). We tend to notice only the very broadest outlines of our surroundings except for one or two things that stand out in some way and draw our attention.
As a first principle, any visualization should convey its information quickly and easily, and with minimal scope for misunderstanding. Unnecessary visual clutter makes more work for the reader’s brain to do, slows down the understanding (at which point they may give up) and may even allow some incorrect interpretations to creep in. You might hear this called chartjunk. The designer Edward Tufte encourages us to think about the data:ink ratio, which you should try to keep high at all times. Statistician William Cleveland was more specific: the plot region is the part of any visualization that has to be clutter-free. That is the space between axes where the data appear. Annotations, examples, and even just eye candy outside the plot region have less impact on understanding. Sometimes a key, showing what different colors stand for, can be placed in the plot region without intruding much on the reader’s attention.
Figure 7.1 A version of Figure 4.6 with a better data-to-ink ratio.
Although simplicity helps to avoid distraction, and some designers claim that a good visualization will require no explanation, not even axis labels, a legend or key, most experienced data analysts recognize that some explanation and guidance is essential. Complicated visualizations can make use of a “How to read this chart” paragraph, and talking the reader through how to identify and interpret one of the aspects of the data can be helpful.
Getting the reader to understand the visualization when they first see it is a different task from getting them to remember the image or its message. Some research has found that including relevant and witty chartjunk can actually help recall, but it has to be done carefully.
7.1 ATTENTION AND CLARITY
Often, a visualization tells a story or conveys one specific message out of a larger analysis. Scientific training discourages analysts from telling the reader what to think, but in dataviz it may be important. Such points of interest can be highlighted: a steady increase in a line chart or one out of a cloud of markers in a scatter plot, for example.
This can be done effectively without cluttering by using pre-attentive cues. These are features that our brains seem to be hardwired to detect. We can add unobtrusive features to our visualizations just to help tell the story like this. In Figure 7.2, one marker is highlighted by color, another by size, then part of a line chart by shading around it. Very little is needed to draw the eye. I challenge you not to look at those points!
Figure 7.2 Examples of pre-attentive features.
Crucially, these highlights have to be used sparingly. If there are too many of them, the reader will feel overloaded with information and they will no longer work. If you are making visualizations, be careful not to fall into a trap where you are very familiar with the data, so everything you create makes perfect sense to you. Experimentation and user testing will help you out. In Chapter 17, I am going to revisit some of these highlights and link them to everything else that surrounds the visualization.
We can also influence how the reader sees objects as being connected in some way. Good data visualization builds on the long-established Gestalt principles. The most obvious is that objects (like markers or lines) that are close together in a cluster and distinct from others farther away will be seen as connected. If we encode some of our variables as location or length then this follows naturally. But there are others that are not used so often:
• Draw subtle lines connecting the objects of interest together.
• Identify a group by a very distinct color and shape (for markers) or pattern and thickness (for lines).
• Enclose them in a shaded area, or surrounding oval or rectangle (more complex shapes will lose this effect).
Of course, it’s not always possible to connect objects in the visualization without clutter, but it is worth considering. As with the pre-attentive cues, don’t overload the reader. There should only be one group that gets connected per visualization for maximum impact, and going beyond this can backfire. If you have multiple messages, maybe you need multiple visualizations (or an interactive one).
Jittering takes objects that confusingly coincide on the visualization and moves them by small random amounts. Scatter plots with markers piled on top of one another now have a cloud of closely packed markers around a common point, and line charts with the same problem now have a bundle of lines moving closely together from one common end to another.
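As a rough illustration of jittering, the sketch below adds a small uniform random offset to each coordinate so that coinciding markers separate into a tight cloud. The function name `jitter`, the offset size, and the data values are all made up for this example; in practice the offset should be small relative to the spacing between distinct values.

```python
import random

def jitter(values, amount, seed=0):
    """Return a copy of values with uniform noise in [-amount, +amount] added."""
    rng = random.Random(seed)  # seeded so the same picture is drawn every time
    return [v + rng.uniform(-amount, amount) for v in values]

# Made-up data: several markers share exactly the same position.
x = [1, 1, 1, 2, 2, 3]
y = [4, 4, 4, 5, 5, 2]

x_jittered = jitter(x, amount=0.15, seed=1)
y_jittered = jitter(y, amount=0.15, seed=2)

for original, moved in zip(zip(x, y), zip(x_jittered, y_jittered)):
    print(original, "->", (round(moved[0], 2), round(moved[1], 2)))
```

The jittered coordinates are only for drawing. Any summaries or models should still be computed from the original values.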
Figure 7.3 Observed rates of bird seed consumption in my garden, and smoothed lines through the data using splines. robertgrantstats.co.uk/dataviz/birdfeeders
Smoothing is perhaps the opposite of jittering, in that a lot of information gets summarized into one simple impression. A curve wiggles through a scatter plot, tracking the markers, or through a chart with multiple lines, showing a summary of the patterns (Figure 7.3).
The important feature of smoothed curves is that the smoothness is not part of the data. In the bird feeder data of Figure 7.3, the consumption often changes, and the resulting lines are very rough series of steps up and down (the gray lines). Most readers would find it hard to see the overall pattern, but the smoothed lines make it easier: more seeds consumed in summer 2015 and spring 2016, less so in the winter and through the rest of 2016. We are compromising by bending the natural line of the data, with the intention of improving understanding.
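To make the trade-off concrete, here is a minimal sketch of one simple smoother, a centered moving average. This is a stand-in chosen for brevity, not the spline method used for Figure 7.3, and the step-like series is invented rather than taken from the bird feeder data. The `roughness` helper is likewise a hypothetical measure used only to show that the smoothed line changes less from point to point.

```python
def moving_average(values, window):
    """Centered moving average; the window shrinks near the ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def roughness(series):
    """Sum of absolute changes between neighbouring points."""
    return sum(abs(b - a) for a, b in zip(series, series[1:]))

# Made-up series of steps up and down, standing in for a rough observed rate.
observed = [2, 2, 6, 6, 3, 3, 7, 7, 4, 4, 1, 1]
smooth = moving_average(observed, window=5)

print([round(v, 2) for v in smooth])
print("roughness before:", roughness(observed))
print("roughness after:", round(roughness(smooth), 2))
```

A wider window gives a smoother line but bends it further from the observed values, which is the compromise described above.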
In Chapter 10, we’ll explore different techniques for smoothing in the context of models that predict one variable based on others. If your aim is not as formal as all that, and you just want to give a simplified impression, you could try a trick suggested by John Tukey that didn’t catch on: instead of small markers like circles, draw a vertical line for each data point. The overall shape will be apparent to readers but the central locations on the line will not be obvious (Figure 7.4).
Sometimes, there is a good reason for breaking a sequence of data into more than one smoothed line. For example, if you have economic data before and after the credit crunch of 2008, then you know from the context, even before you draw the data, that it could be represented as one smooth curve before and another smooth curve after the crunch.
Another simplifying trick which we will encounter in Chapter 11 is edge bundling, where lines connecting points together are artificially pulled together to reduce the spaghetti effect.
Semi-transparency (also known as opacity) is a great all-round tool for busy visualizations. It allows lines, markers and other objects that coincide to be seen, because those in the background show through slightly. When markers are piled on top of one another, they look extra dark compared to others on their own. Because semi-transparency is more like the real world, we get an impression of lines moving continuously over and under one another and are able to take in more information immediately. There are several images in this book with semi-transparency, such as Figure 8.2; even though there are many markers or lines, you can see where they pile up in greater numbers.
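The darkening of piled-up markers can be worked out directly. The sketch below assumes the usual "over" blending, where each marker with opacity `alpha` lets a fraction `1 - alpha` of whatever is behind it show through; the function name `stacked_opacity` and the opacity of 0.2 are illustrative choices, not values from the book.

```python
def stacked_opacity(alpha, layers):
    """Combined opacity of `layers` identical overlapping markers.

    Each marker lets a fraction (1 - alpha) of what is behind it show
    through, so n markers together let (1 - alpha) ** n through.
    """
    return 1 - (1 - alpha) ** layers

for n in (1, 2, 5, 10):
    print(n, "markers ->", round(stacked_opacity(0.2, n), 3))
```

Under this assumption the darkness rises quickly at first and then levels off, so very dense regions eventually look equally dark. A lower opacity leaves more room to distinguish them.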
Figure 7.4 Tukey’s smoothing by drawing vertical lines instead of points, applied to the train delay data from Chapter 2.
Colors, lengths, and areas are some of the attributes to which we have been encoding data. These are stimuli that get perceived by the brain. Not all stimuli have the same effect; the psychologist Stanley Smith Stevens showed that, if you double a length, it will be perceived accurately as twice the size of the original, but doubling an area is underestimated as 1.6 times bigger, while doubling the redness of a color is overestimated as 3.2 times bigger. This is why serious data visualization experts don’t like encoding things to area or color unless they are just ordinal (or you are happy for them to be understood as such).
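These figures are consistent with a power law, in which the perceived size of a change is the physical change raised to an exponent that depends on the attribute. The sketch below assumes exponents of 1.0 for length, 0.7 for area and 1.7 for redness; they are chosen here because they reproduce the numbers quoted above, and the function names are made up for the example.

```python
def perceived_ratio(stimulus_ratio, exponent):
    """Power law: perceived change = physical change ** exponent."""
    return stimulus_ratio ** exponent

def stimulus_for_perceived(target_ratio, exponent):
    """Physical change needed for the change to be perceived as target_ratio."""
    return target_ratio ** (1 / exponent)

# Assumed exponents, picked to match the figures quoted in the text.
EXPONENTS = {"length": 1.0, "area": 0.7, "redness": 1.7}

for attribute, exponent in EXPONENTS.items():
    seen = perceived_ratio(2, exponent)
    needed = stimulus_for_perceived(2, exponent)
    print(f"{attribute}: doubling looks like x{seen:.1f}; "
          f"to look doubled, multiply by {needed:.2f}")
```

Read the other way round, the same relationship suggests that an area has to grow by well over a factor of two before readers see it as doubled.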
7.2 CULTURAL ASSUMPTIONS
In many of the visualizations we’ve seen so far, time has been encoded to the horizontal location, with old data on the left and new data on the right. Why? This is an artifact of reading from left to right, and is so universal in dataviz that it is preferred even by writers of right-to-left alphabets like Arabic. Colors, too, do not have a universal meaning. Red is dangerous in some places and auspicious in others. It is a good idea not to assume your reader understands this sort of culture-specific encoding.
Some visualization formats are themselves cues to interpret the data in a specific way. For example, connections between data points have been visualized in the style of a subway map, and lists of items in the style of a periodic table of chemical elements. The trouble here is that, unless these are aimed wholly at city dwellers or chemists, not everyone will know what you are implying by the format. Although they are creative and fun, creators of these sorts of visualizations have sometimes been mocked for not having understood the thing they imitated. Do the distances between the subway stops represent anything? Is there actually periodicity in what looks like a periodic table, or is it just a glorified list?
7.3 LEARNING FROM OPTICAL ILLUSIONS
In data visualization, optical illusions are not just fun but actually give us some clues as to ways that people might misinterpret our work.
The café wall illusion (Figure 7.5, left) is one that may well affect data visualizations with blocks of color, causing lines to appear sloped when they are actually not. Lines entering and leaving shaded regions can also appear to bend (Figure 7.7), and wavy lines appear flatter or taller than they really are (Figure 7.5, right). Any visualization with high-contrast blocks of background color might be at risk from these effects.
Figure 7.5 The café wall illusion (left), where all lines are actually straight and either vertical or horizontal. A related illusion by Akiyoshi Kitaoka (right), where the two gray waves are identical in height. Left image by Wikipedia user “Fibonacci” - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=1788689. Right image by Akiyoshi Kitaoka, used with permission.
Figure 7.6 The Ebbingh...

Table of contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright Page
  5. Dedication
  6. Table of Contents
  7. List of Figures
  8. Preface
  9. SECTION I The basics
  10. SECTION II Statistical building blocks
  11. SECTION III Specific tasks
  12. SECTION IV Closing remarks
  13. Further Reading
  14. Index