Data Simplification
eBook - ePub

Data Simplification

Taming Information With Open Source Tools

  1. 398 pages
  2. English
  3. ePUB (mobile friendly)

About This Book

Data Simplification: Taming Information With Open Source Tools addresses the simple fact that modern data is too big and complex to analyze in its native form. Data simplification is the process whereby large and complex data is rendered usable. Complex data must be simplified before it can be analyzed, but the process of data simplification is anything but simple, requiring a specialized set of skills and tools.

This book provides data scientists from every scientific discipline with the methods and tools to simplify their data for immediate analysis or long-term storage in a form that can be readily repurposed or integrated with other data.

Drawing upon years of practical experience, and using numerous examples and use cases, Jules Berman discusses: the principles, methods, and tools that must be studied and mastered to achieve data simplification; open source tools, free utilities, and snippets of code that can be reused and repurposed to simplify data; natural language processing and machine translation as tools to simplify data; and data summarization and visualization, and the role they play in making data useful for the end user.

  • Discusses data simplification principles, methods, and tools that must be studied and mastered
  • Provides open source tools, free utilities, and snippets of code that can be reused and repurposed to simplify data
  • Explains how to best utilize indexes to search, retrieve, and analyze textual data
  • Shows the data scientist how to apply ontologies, classifications, classes, properties, and instances to data using tried-and-true methods

Frequently asked questions

Yes, you can access Data Simplification by Jules J. Berman in PDF and/or ePUB format, as well as other popular books in Computer Science & Data Processing. We have over one million books available in our catalogue for you to explore.

Information

Year: 2016
ISBN: 9780128038543

Chapter 1

The Simple Life

Abstract

The introduction discusses the insurmountable analytic obstacles created by collections of complex and inscrutable data. As it happens, this problem is not new. The natural history of civilization always seems to lead to a point where science and society become too bloated and burdensome to sustain further progress. A crucial point is reached when civilizations opt for simplification or nullification. This chapter reviews some of the most important simplification concepts, developed over the history of mankind, that have permitted civilization to attain its current state of activity. The chapter comes with a warning: simplify or stagnate.

Keywords

Simplification tools; Historical advances; Simplicity in civilizations; Complexity; Classifications; MIDI; POV-Ray; HTML; XML; Neuroscience

1.1 Simplification Drives Scientific Progress

Make everything as simple as possible, but not simpler.
Albert Einstein (see Glossary item, Occam's razor)
Advances in civilization have been marked by increasing complexity. To a great extent, modern complexity followed from the invention of books, which allowed us to build upon knowledge deposited by long-deceased individuals.
Because it is easy to admire complexity, it can be difficult to appreciate its opposite: simplification. Few of us want to revert to a simple, prehistoric lifestyle, devoid of the benefits of engines, electricity, automobiles, airplanes, mass production of food and manufactured items, and medical technology. Nonetheless, a thoughtful review of human history indicates that some of our greatest scientific advances involved simplifying complex activities (see Glossary item, Science). Here are just a few examples:
1. Nouns and names. By assigning specific names to individuals (eg, Turok Son of Stone, Hagar the Horrible), ancient humans created a type of shorthand for complex objects, thus releasing themselves from the task of providing repeated, detailed descriptions of the persons to whom they referred.
2. Classifications. Terms that apply to classes of things simplified our ability to communicate abstract concepts. The earliest classes may have been the names of species (eg, antelope) or families (eg, birds). In either case, class abstractions alleviated the need for naming every bird in a flock (see Glossary items, Abstraction, Species, Systematics, Taxonomy, and Classification).
3. Numerals. Early humans must have known that counting on fingers and toes can be confusing. Numbers simplified counting, and greatly extended the maximum conceivable value of a tally. Without an expandable set of integers, communicating "how much" and "how many" must have been an exasperating experience.
4. Glyphs, runes, stone tablets, and papyrus. Written language, and the media for preserving thoughts, relieved humans from committing everything to memory. The practice of writing things down simplified the task of recordkeeping and allowed ancient humans to create records that outlived the record-keepers (see Glossary item, Persistence).
5. Libraries. Organized texts (ie, books) and organized collections of texts (ie, libraries) simplified the accrual of knowledge across generations. Before there were books and libraries, early religions relied on the oral transmission of traditions and laws, an unreliable practice that invited impish tampering. The popularization of books marked the demise of oral traditions and the birth of written laws that could be copied, examined, discussed, and sometimes discarded.
6. Mathematics. Symbolic logic permitted ancient man to understand the real world through abstractions. For example, the number 2, a mathematical abstraction with no physical meaning, can apply to any type of object (eg, 2 chickens, 2 rocks, or 2 universes). Mathematics freed us from the tedious complexities of the physical realm, and introduced humans to a new world, ruled by a few simple axioms.
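Class abstraction, the second simplification in the list above, can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the animal names, the species, and the species-to-group mapping are all hypothetical. Individually named records are collapsed into counts per class, so the details of each individual drop out of the summary.

```python
from collections import Counter

# Hypothetical sightings: each record names an individual animal
# and its species. All names here are made up for illustration.
SIGHTINGS = [
    ("Goldie", "finch"),
    ("Pip", "finch"),
    ("Rusty", "robin"),
    ("Dash", "antelope"),
    ("Bramble", "robin"),
    ("Sunny", "finch"),
]

# A toy classification (hypothetical): species -> broader group.
SPECIES_TO_CLASS = {
    "finch": "bird",
    "robin": "bird",
    "antelope": "mammal",
}


def summarize_by_class(sightings, species_to_class):
    """Collapse individually named records into counts per class.

    The individual names are ignored; only the class survives.
    """
    return Counter(species_to_class[species] for _name, species in sightings)


if __name__ == "__main__":
    print(summarize_by_class(SIGHTINGS, SPECIES_TO_CLASS))
```

Six named individuals reduce to two class counts; nobody needs to remember that the third finch was called Sunny.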
The list of ancient simplifications can go on and on. In modern times, simplifications have sparked new scientific paradigms and rejuvenated moribund disciplines. In the information sciences, HTML, a new and very simple method for formatting text and linking web documents and other data objects across the Internet, has revolutionized communications and data sharing. Likewise, XML has revolutionized our ability to annotate, understand, and merge data objects. The rudiments of HTML and XML can be taught in a few minutes (see Glossary items, HTML, XML, Data object).
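As a minimal sketch of how XML annotation supports merging data objects, the Python below uses the standard library's `xml.etree.ElementTree` to combine two fragments that describe the same object. The `specimen` element, its `id` attribute, and the child tags are hypothetical. A real merge would also need rules for conflicting values; this sketch simply keeps the first value it sees for each tag.

```python
import xml.etree.ElementTree as ET

# Two hypothetical XML fragments describing the same data object,
# collected from different sources. Tags and values are illustrative only.
SOURCE_A = """
<specimen id="S001">
  <species>Homo sapiens</species>
  <collected>2015-06-01</collected>
</specimen>
"""

SOURCE_B = """
<specimen id="S001">
  <tissue>skin</tissue>
</specimen>
"""


def merge_specimens(*xml_texts):
    """Merge XML fragments that share the same id attribute.

    Child elements from later fragments are appended to the first
    fragment; a tag that is already present is not duplicated.
    """
    merged = None
    for text in xml_texts:
        element = ET.fromstring(text)
        if merged is None:
            merged = element
            continue
        if element.get("id") != merged.get("id"):
            raise ValueError("fragments describe different objects")
        existing = {child.tag for child in merged}
        for child in element:
            if child.tag not in existing:
                merged.append(child)
                existing.add(child.tag)
    return merged


if __name__ == "__main__":
    record = merge_specimens(SOURCE_A, SOURCE_B)
    # Each value is self-describing: its tag says what the value is.
    for child in record:
        print(child.tag, "=", child.text)
```

Because every value carries its own descriptive tag, the merge needs no external schema to know which field is which; the shared identifier tells it which fragments belong together.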
In the computer sciences, language compilers have greatly reduced the complexity of programming. Object-oriented programming languages have simplified programming even further. Modern programmers can be much more productive than their counterparts who worked just a few decades ago. Likewise, Monte Carlo and resampling methods have greatly simplified statistics, enabling general scientists to model complex systems with ease (see Sections 8.2 and 8.3 of Chapter 8). More recently, MapReduce has simplified calculations by dividing large and complex problems into simple problems, for distribution to multiple computers (see Glossary item, MapReduce).
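The MapReduce idea of splitting one large problem into many simple ones can be illustrated with the classic word-count example. This sketch, in plain Python, runs the map, shuffle, and reduce steps sequentially on a single machine. It is a toy under stated assumptions, not the API of any MapReduce framework, and the function names are hypothetical. In a real deployment each chunk would be mapped on a separate computer.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # Each chunk could, in principle, be mapped on a different computer;
    # here the chunks are simply processed one after another.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    documents = [
        "simplify or stagnate",
        "simplify the data",
        "the data is complex",
    ]
    print(word_count(documents))
```

The map step looks at one chunk at a time and the reduce step looks at one key at a time, so neither step ever has to hold the whole problem in view.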
The methods for sequencing DNA are much simpler today than they were a few decades ago, and projects that required the combined efforts of multiple laboratories over several years can now be accomplished in a matter of days or hours, within a single laboratory.
Physical laws and formulas simplify the way we understand the relationships among objects (eg, matter, energy, electricity, magnetism, and particles). Without access to simple laws and formulas, we could not have created complex products of technology (eg, computers, smartphones, and jet planes).

1.2 The Human Mind is a Simplifying Machine

Science is in reality a classification and analysis of the contents of the mind.
Karl Pearson in The Grammar of Science, 1900[1]
The unrestricted experience of reality is complex and chaotic. If we were to simply record all the things and events that we see when we take a walk on a city street or a country road, we would be overwhelmed by the magnitude and complexity of the collected data: images of trees, leaves, bark, clouds, buildings, bricks, stones, dirt, faces, insects, heat, cold, wind, barometric pressure, color, shades, sounds, loudness, harmonics, sizes and positions of things, relationships in space between the positions of different objects, movements, interactions, changes in shape, emotional responses, to name just a few.[2]
We fool ourselves into thinking that we can gaze at the world and see what is to be seen. In fact, what really happens is that light received by retinal receptors is processed by many neurons, in many pathways, and our brain creates a representation of the data that we like to call consciousness. The ease with which we can be fooled by optical illusions serves to show that we only "see" what our brains tell us to see, not what is really there. Vision is somewhat like sitting in a darkened theater and watching a Hollywood extravaganza, complete with special effects and stage props. Dreams are an example of pseudo-visual productions directed by our subconscious brains, possibly as an antidote to nocturnal boredom.
Life, as we experience it, is just too weird to go unchecked. We maintain our sanity by classifying complex data into simple categories of things that have defined properties and roles. In this manner, we can ignore the details and concentrate on patterns. When we walk down the street, we see buildings. We know that the buildings are composed of individual bricks, panes of glass, and girders of steel; but we do not take the time to inventory all the pieces of the puzzle. We humans classify data instinctively, and much of our perception of the world derives from the classes of objects that we have invented for ourselves.
What we perceive is dependent upon the choices we make, as we classify our world. If we classify animals as living creatures, just like ourselves, with many of the same emotional and cognitive features as we have, then we might be more likely to treat animals much the same way as we treat our fellow humans. If we classify animals as a type of food, then our relationships with animals might be...

Table of contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Dedication
  6. Foreword
  7. Preface
  8. Author Biography
  9. Chapter 1: The Simple Life
  10. Chapter 2: Structuring Text
  11. Chapter 3: Indexing Text
  12. Chapter 4: Understanding Your Data
  13. Chapter 5: Identifying and Deidentifying Data
  14. Chapter 6: Giving Meaning to Data
  15. Chapter 7: Object-Oriented Data
  16. Chapter 8: Problem Simplification
  17. Index