It's All Analytics - Part II
eBook - ePub

Designing an Integrated AI, Analytics, and Data Science Architecture for Your Organization

Scott Burk, David Sweenor, Gary Miner

  1. 240 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

About this book

Up to 70% or more of corporate analytics efforts fail, even after these corporations have made very large investments of time, talent, and money in developing what they thought were good data and analytics programs. Why? Because the executives, decision makers, and the entire analytics team have not considered the most important aspect of making these analytics efforts successful. In this Book II of the "It's All Analytics!" series, we describe two primary things: 1) what this "most important aspect" consists of, and 2) how to get this "most important aspect" at the center of the analytics effort and thus make your analytics program successful.

This Book II in the series is divided into three main parts:

Part I, Organizational Design for Success, discusses the need for complete alignment of the entire company or organization and its analytics team in order to make its analytics successful. This means attention to the culture: the company culture! To be successful, the CEOs and decision makers of a company or organization must be fully cognizant of the cultural focus on 'establishing a center of excellence in analytics'. Simply put, company culture is the most important aspect of a successful analytics program. The focus must be on innovation, as this is needed by the analytics team to develop successful algorithms that will lead to greater company efficiency and increased profits.

Part II, Data Design for Success, discusses data as the cornerstone of success with analytics. You can have the best analytics algorithms and models available, but if you do not have good data, efforts will at best be mediocre, if not a complete failure. Part II also goes further into data, describing volatile and non-volatile data memory storage, data structures, and data formats, and considering Cluster Computing, Data Swamps, Muddy Data, Data Marts, Enterprise Data Warehouse, Data Reservoirs, and Analytic Sandboxes, as well as Data Virtualization, Curated Data, Purchased Data, Nascent & Future Data, Supplemental Data, Meaningful Data, GIS (Geographic Information Systems) & Geo Analytics Data, Graph Databases, and Time Series Databases. Part II also considers Data Governance, including Data Integrity, Data Security, Data Consistency, Data Confidence, Data Leakage, Data Distribution, and Data Literacy.

Part III, Analytics Technology Design for Success, discusses analytics maturity and aspects of this maturity, such as Exploratory Data Analysis, Data Preparation, Feature Engineering, Building Models, Model Evaluation, Model Selection, and Model Deployment. Part III also goes into the nuts and bolts of modern predictive analytics, discussing such terms as Artificial Intelligence (AI), Machine Learning, and Deep Learning, and the more traditional aspects of analytics that feed into modern analytics, such as Statistics, Forecasting, Optimization, and Simulation. Part III also covers how to communicate and act upon analytics, which includes building a successful analytics culture within your company or organization.

All in all, if your company or organization needs to be successful using analytics, this book will give you the basics of what you need to know to make it happen.

Information

Year
2021
ISBN
9781000433999
Edition
1

II DESIGNING FOR DATA SUCCESS

Chapter 4 Data Design for Success

DOI: 10.4324/9780429343957-6
The world is one big data problem.
Andrew McAfee, principal research scientist, MIT
Currently, seven out of the ten highest valued global brands are data companies. Data as the new oil? Clearly. When you invest in data, its storage, its management and its analysis, you’re investing in innovation.
Thomas Harrer, CTO, IBM Systems

Introduction

We are now moving into the technology sections of the book. Designing for People and Processes is where most analytics programs fail; it is crucial to get this right. Nevertheless, getting this right assumes you have a firm technology foundation. You want to design your organization (people, process, culture, and knowledge) first and then align the technical architecture within the organization – we say for short – design and align.
It’s about the data…. Exactly! The world isn’t run by weapons anymore, or energy, or money, it’s run by little ones and zeroes, little bits of data. It’s all just electrons.
Sneakers
That line, one of Scott’s (author) favorite quotes, is from the movie Sneakers (see Lasker, et al., 1992). Cosmo (Ben Kingsley) and Martin (Robert Redford) are on a building rooftop in a daring standoff when Cosmo says, “It’s about the data…. Exactly! The world isn’t run by weapons anymore, or energy, or money, it’s run by little ones and zeroes, little bits of data. It’s all just electrons”. But no, it is really more than just the data! Data alone does nothing! Data by itself is just a cost!
Data costs include the cost to acquire it, the cost to store it, the legal liability of keeping it, and the potential risk of a data breach. We normally think that data is cheap, but when you consider the total cost of ownership, it is really quite expensive. And maybe only 15%–30% of data collected is ever used (see Priceonomics, August, 2019; Barrett, 2018); it sits there and does nothing (it is called “Dark Data”). So, it is only when your action is based on data that you gain value from it; therefore – It’s about the analytics and It’s All Analytics!
You can only offset the costs of all that data if you use it to the benefit of your organization.
As we are writing this, the country is in a crisis, a pandemic. For months after the COVID pandemic started in early 2020, models were not the problem. Lack of visibility was definitely not an issue. Lack of attention from the public was not the issue. The media continually reporting on it was not the issue. The Issue was Meaningful Data – The Issue was a Lack of Complete Data. Some countries were not reporting the numbers they had available. Some countries lacked the numbers due to economic conditions and poor infrastructure. The biggest issue was the lack of meaningful data around testing: we didn’t know the number of people being tested; all we knew was the number of positive COVID results. The key measures reported were total confirmed cases, total deaths, and total recovered. We did not know the number of people who already had the virus and thus most likely had immunity. We had no controls; we needed widespread testing of the general population to know the true incidence rate. The testing being done was heavily biased toward the sick, not representative of the population as a whole.
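The effect of testing mostly the sick can be illustrated with a small simulation. This is a minimal sketch, not an epidemiological model: the function name `simulate_testing` and every rate in it (prevalence, symptom probabilities, number of tests) are invented for illustration and are not taken from any reported COVID figures. It compares the positivity rate seen when only symptomatic people are tested with the rate seen in a random sample of the whole population.

```python
import random


def simulate_testing(population=100_000, true_prevalence=0.02,
                     p_symptoms_if_infected=0.6, p_symptoms_if_healthy=0.05,
                     tests=5_000, seed=0):
    """Compare positivity under symptom-driven testing with random testing.

    All default values are hypothetical and chosen only to illustrate bias.
    """
    rng = random.Random(seed)

    # Each person is a pair: (infected, symptomatic).
    people = []
    for _ in range(population):
        infected = rng.random() < true_prevalence
        p_sym = p_symptoms_if_infected if infected else p_symptoms_if_healthy
        people.append((infected, rng.random() < p_sym))

    # Biased testing: only people showing symptoms get tested.
    symptomatic = [p for p in people if p[1]]
    biased_sample = rng.sample(symptomatic, min(tests, len(symptomatic)))

    # Unbiased testing: a simple random sample of the whole population.
    random_sample = rng.sample(people, tests)

    def positivity(sample):
        # Fraction of the sample that is infected.
        return sum(infected for infected, _ in sample) / len(sample)

    return {
        "true_rate": positivity(people),
        "symptom_driven_positivity": positivity(biased_sample),
        "random_sample_positivity": positivity(random_sample),
    }


if __name__ == "__main__":
    for name, value in simulate_testing().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed rates, the symptom-driven positivity comes out several times higher than the true rate, while the random sample tracks it closely. That gap is the reason population-wide testing is needed to estimate the true incidence.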
In early 2021 economies are still being shut down. We have no idea when we will be completely open again, if ever. Will there be additional waves, will super COVID be an issue? We cannot make wise policy decisions without meaningful, quality data. We were crippled and still reeling:
We were crippled for lack of data, nothing else.
Until we have the right data, we cannot make the right decisions. This is fundamental. In a world that is swimming in data, we must have the right data to make effective decisions. The media, politicians, social media warriors, and others including respected epidemiologists, were all saying the models are wrong! Yes, but not because of the lack of science and modeling expertise – it is not “modeling” that is wrong – it is Lack of the Right Data!
Science, business, and virtually every intelligent human endeavor today requires good data. You cannot make good decisions without the right data.
COVID and the Need for the Right Data
COVID has proven that regardless of how many smart people are involved and how hard they work, you cannot overcome bad data!
Given the time and energy, I (Scott) would have loved to keep a journal, write, and blog about the gaps I saw in the reporting and in the “experts and news” commentary. But I had other commitments and did not write about it, with one exception: I did write a LinkedIn article very early in the pandemic (https://www.linkedin.com/pulse/covid-19-issue-modeling-scott-burk/, 4/10/20). It was prompted by frustration with the reporting and commentary about all the models being wrong. That commentary insinuated that the scientists presenting to the public didn’t know what they were doing; that “the models are wrong”, that “the analysis is wrong”, and that “the epidemiologists are wrong”. However, it was not the math or the underlying theory that was the problem. The statistical models were not as much to blame as the bias and insufficiency of the data. No matter how great a scientist, epidemiologist, or statistician you are, you cannot provide good analytics or make good decisions with bad data.
The next six chapters, Part 2 (this chapter plus Chapters 5–9), are about data strategy and data management. There is no “one size fits all” solution and therefore we do not make specific recommendations. What we do is lay out fundamental pieces, Lego blocks if you will, to choose and build upon. We do want to impart the importance of data and getting that data strategy right and then governing and protecting all those potentially valuable assets.

Why Is Data So Important?

A process is a series of actions or steps taken to achieve a particular end goal. Life is all about processes. We have biological and chemical processes that govern our health and well-being. We have business processes that impact the success or failure of our business. We have government agencies that have processes to provide oversight and administer. Anything that you desire to understand or improve involves a process. Data are just the artifacts of a process. Therefore, data are key to any consciously based improvement.
Billions and Data
Billions is an American television drama series that tells the story of hedge fund manager Bobby Axelrod (Damian Lewis) as he accumulates wealth and power in the world of high finance. Axelrod is very data driven (and for dramatic effect unscrupulous, no correlation implied). There is an interesting scene with Axelrod at a horse track where he says
So I started watching that (pointing to the leader/information board) instead of watching that (pointing to the track). The numbers told the story, they always do.
Yes, Axel, numbers are data, and data tells the story.

Data Is the Cornerstone of Improvement

You cannot improve what you do not measure.
You cannot even determine how you are doing until you compare at least two data points. “Everything is relative” is attributed to Albert Einstein. This is true in physics and most everything. All quantitative and qualitative assessment is by comparison, relating two or more things. A comparison of two groups or two periods of time results in one of three outcomes – things are the same, they are improving, or they are getting worse.
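The three possible outcomes of a comparison can be written down directly. The sketch below is only illustrative: the function name `compare`, the use of the mean as the summary, and the `tolerance` parameter are assumptions introduced here, not something the text prescribes.

```python
from statistics import mean


def compare(baseline, current, tolerance=0.0, higher_is_better=True):
    """Classify the change between two sets of measurements.

    Returns "same", "improving", or "worse", based on the difference in means.
    `tolerance` is a hypothetical threshold below which a change is ignored.
    """
    diff = mean(current) - mean(baseline)
    if abs(diff) <= tolerance:
        return "same"
    improved = diff > 0 if higher_is_better else diff < 0
    return "improving" if improved else "worse"


if __name__ == "__main__":
    # Two periods of the same (made-up) metric, e.g. weekly sales.
    print(compare([10, 12], [14, 16]))
    # A metric where lower is better, e.g. defect counts.
    print(compare([5, 5], [7, 7], higher_is_better=False))
```

Without at least two sets of measurements there is nothing to pass to `compare`, which is the point of the paragraph above: a single data point cannot tell you how you are doing.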
Even if we do not tangibly collect the data and even if we do not record the data anywhere but instead only mentally note it, we still use the same process of comparison in trying to determine cause and effect. The data exists if nowhere but our minds. We are making thousands of comparisons each day to determine our relative progress toward our stated or unstated goals and intentions. That is human evolution. That is why data are so important.
Data is at the heart of
  1. Maintaining status quo
  2. Improving on the status quo
  3. Determining how I am doing
In fact, the saying goes, data is the new oil! As we will see, you can even buy data (or at least rent it). Most data are NOT captured anywhere. Data are artifacts of process and observation. They may be recorded or unrecorded. Humans observe. Machines observe. Activity and process drive data, observation drives data.
In analytics we often¹ go beyond these subjective determinations and require stored data to perform the analysis. We focus on capturing and storing data in this chapter.

Processes Are Everywhere

We often do not think in these terms, but we participate in and observe thousands of processes and systems in a day. Virtually everything we do is a process, from personal routines like getting ready for work or planning a vacation to a workout routine. Our work day is full of hundreds of processes, regardless of our work function. A process is just a series of steps to accomplish an objective.
A system is a set of interrelated processes. We often think in terms of computer or mechanical systems. These systems are composed of thousands of processes. Each and every one of these processes generates an observable result. Each observable result can be recorded. Recorded results are data.
¹ Exceptions exist where the data are generated synthetically by an algorithm, such as simulation or Bayesian statistics; see Burk and Miner (2020).

The Problem – Issues with Data Continue to Persist

Sourcing, merging, cleansing, and making data ready for analytics development and analytics production are the real bottlenecks these days, but it has been that way for more than 30 years. Today, even with all the progress in technology, getting and cleaning data still consumes up to 80%–90% of an analyst’s or data scientist’s time, thus limiting productivity.
We always seem to be on the horizon of overcoming these challenges, but they don’t materialize. One of the current promises is using AI methods themselves to help in feature extraction and data preparation for AI work. A prefilter, so to speak, to remediate data problems.
By 2025, it’s estimated that the amount of data will double every 12 hours. Thanks to sensors and wireless connectivity, connected devices are generating oceans of data each day.
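To get a feel for what a 12-hour doubling time would imply, the arithmetic can be sketched as follows. This takes the quoted estimate at face value and assumes steady exponential growth; the function name `growth_factor` is introduced here only for illustration.

```python
def growth_factor(hours, doubling_time_hours=12):
    """Factor by which a quantity grows after `hours`, assuming it doubles
    every `doubling_time_hours` (steady exponential growth)."""
    return 2 ** (hours / doubling_time_hours)


if __name__ == "__main__":
    print(f"After one day:  x{growth_factor(24):,.0f}")
    print(f"After one week: x{growth_factor(24 * 7):,.0f}")
```

One day contains two doublings and one week contains fourteen, so the factors are 2² and 2¹⁴ respectively. Growth of that kind quickly outpaces any fixed storage or data-preparation capacity.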
There are improvements in systems and processes that treat data. However, data is getting more complex, the growth of data is increasing, and interesting problems to solve with analytics are exploding. Our data manage...

Table of contents

  1. Cover
  2. Title Page
  3. Copyright Page
  4. Dedication Page
  5. Table of Contents
  6. Foreword and Tribute to the Authors
  7. Preface
  8. Authors
  9. SECTION I Designing for Organizational Success
  10. SECTION II Designing for Data Success
  11. SECTION III Designing for Analytics Success
  12. Index
Citation styles for It's All Analytics - Part II

APA 6 Citation

Burk, S., Sweenor, D., & Miner, G. (2021). It’s All Analytics - Part II (1st ed.). Taylor and Francis. Retrieved from https://www.perlego.com/book/2806917/its-all-analytics-part-ii-designing-an-integrated-ai-analytics-and-data-science-architecture-for-your-organization-pdf (Original work published 2021)

Chicago Citation

Burk, Scott, David Sweenor, and Gary Miner. (2021) 2021. It’s All Analytics - Part II. 1st ed. Taylor and Francis. https://www.perlego.com/book/2806917/its-all-analytics-part-ii-designing-an-integrated-ai-analytics-and-data-science-architecture-for-your-organization-pdf.

Harvard Citation

Burk, S., Sweenor, D. and Miner, G. (2021) It’s All Analytics - Part II. 1st edn. Taylor and Francis. Available at: https://www.perlego.com/book/2806917/its-all-analytics-part-ii-designing-an-integrated-ai-analytics-and-data-science-architecture-for-your-organization-pdf (Accessed: 15 October 2022).

MLA 7 Citation

Burk, Scott, David Sweenor, and Gary Miner. It’s All Analytics - Part II. 1st ed. Taylor and Francis, 2021. Web. 15 Oct. 2022.