Big Data Analytics

Harnessing Data for New Business Models

Soraya Sedkaoui, Mounia Khelfaoui, Nadjat Kadi

  1. 306 pages
  2. English

About This Book

This volume explores the diverse applications of advanced tools and technologies of the emerging field of big data and their evidential value in business. It examines the role of analytics tools and methods of using big data in strengthening businesses to meet today's information challenges and shows how businesses can adapt big data for effective business practices.

This volume shows how big data and data analytics are being adopted more frequently and more effectively, especially in companies that are looking for new methods to develop smarter capabilities and tackle challenges in dynamic processes. Many illustrative case studies highlight how companies in every sector are now focusing on harnessing data to create a new way of doing business.

Information

Year
2021
ISBN
9781000290530

PART I Big Data: Opportunities and Challenges

CHAPTER 1 Big Data: An Overview

MALIKA BAKDI¹ and WASSILA CHADLI²
¹ Senior Researcher, National High School of Statistics and Applied Economics (ENSSEA), Koléa, Algeria, E-mail: [email protected]
² National High School of Statistics and Applied Economics (ENSSEA), Koléa, Algeria, E-mail: [email protected]

ABSTRACT

This chapter focuses on a new trend in processing and analyzing large data sets, i.e., big data. It has become an imperative approach, particularly with the massive explosion of data on the Internet (videos, photos, messages, social networks, e-commerce transactions, etc.) and the widespread use of connected devices (smartphones and tablets). In this research, we attempt to present the designs, architectures, and applications of the big data phenomenon.

1.1 INTRODUCTION

Data and algorithms are shaping a new world that represents a form of culmination for computing and, more precisely, a new way of controlling information. With more than 95% of the world's data having been created in recent years, it is important to recognize that it is not whoever has the best algorithm who wins, but whoever has the most data; and not just any type of data, since only reliable data count. As a result, large amounts of data will continue to be accumulated, since the efficiency of our algorithms depends on the data we process.
The major problem with this large amount of data is that it becomes very difficult to work with, especially with traditional database processing tools [4]. Today, companies are facing an exponential increase in data volume. To give a more precise idea, volumes can reach several petabytes (10^15 bytes) or even zettabytes (10^21 bytes).
As expected, the amount of data created and managed has grown exponentially over the past few years. Hence, we can imagine how huge the amount of data created in the coming years will be, since data of a diverse nature can be acquired from logs, social media, e-commerce transactions, etc. Undoubtedly, many companies want to take advantage of this data, whether collected by themselves or publicly available, such as web data or open data. However, traditional technologies are not designed to cope with such a massive data explosion; big data technologies make it possible to process this exponentially growing data.
In this work, we present theoretical research about big data. It should be mentioned that 2012 was the year of the big data buzz, when the notion was popularized; companies are now dealing with large volumes of data to be processed, which presents a technical and economic challenge.
The objective of the present work is to answer the following questions: What is big data? Why are we interested in big data? And what is the revolutionary technology adopted by big data?
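To make these orders of magnitude concrete, the following minimal Python sketch converts between decimal (SI) storage units. The names SI_UNITS, to_bytes, and convert are illustrative assumptions introduced here, not part of the chapter.

```python
# Decimal (SI) storage units expressed as powers of ten, in bytes.
# SI_UNITS, to_bytes and convert are hypothetical helpers for illustration.
SI_UNITS = {
    "kilobyte": 10**3,
    "megabyte": 10**6,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}


def to_bytes(amount, unit):
    """Convert an amount expressed in an SI unit to a number of bytes."""
    return amount * SI_UNITS[unit]


def convert(amount, from_unit, to_unit):
    """Convert an amount from one SI unit to another."""
    return to_bytes(amount, from_unit) / SI_UNITS[to_unit]


if __name__ == "__main__":
    # One petabyte in bytes, and one zettabyte expressed in petabytes.
    print(to_bytes(1, "petabyte"))
    print(convert(1, "zettabyte", "petabyte"))
```

On this scale, one zettabyte corresponds to one million petabytes, which gives a sense of the gap between the two figures quoted above.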

1.2 BIG DATA: CONCEPT AND DEFINITION

Much has been said about volume in explanations of big data, and it is indeed one of the most important aspects of the concept. A classic definition proposed by Gartner involves three dimensions (as shown in Figure 1.1).
FIGURE 1.1 The three V’s of big data.
Source: Authors’ creation.
The first dimension is volume: the massive explosion of data that must be processed and analyzed. The second dimension is variety, which corresponds to the difficulty of processing and analyzing data and, more precisely, of effectively combining new data sources that are more diverse and of multiple natures. Variety thus distinguishes big data from traditional data analysis, since big data analyzes data sets from different sources [8]. The third dimension is velocity, which corresponds to the speed with which data are generated, processed, and stored.
It is clear that individuals and companies generate large amounts of data in a very short time, but there is a time lag between the generation of these data and their processing. The arrival of big data technology makes the job easier by giving us the advantage of processing data while it is being generated.
The explanation of big data does not rest exclusively on these three dimensions: IBM has added two further dimensions to sharpen it, namely veracity and value. Veracity is the ability to have reliable data; for example, data generated by spambots are not worthy of confidence. Another example is that of Mexico, where fake Twitter accounts were used during the presidential elections.
The fifth dimension is value: the big data approach only makes sense if it serves strategic objectives related to individuals and the company, with the purpose of creating added value, regardless of the field of activity. Thus, the success of a big data project is largely correlated with the creation of added value and new knowledge. The explanation of big data has been extended by five further V's: validity, vulnerability, volatility, visualization, and variability.
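The idea of processing data while it is being generated, rather than after it has all been collected, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sensor_readings is a hypothetical data source and running_mean is an illustrative helper, not the API of any big data framework.

```python
def running_mean(stream):
    """Yield the mean of all values seen so far, one result per incoming value.

    Only a counter and a running total are kept, so the full data set
    never has to be stored before a result becomes available.
    """
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


def sensor_readings():
    """Hypothetical data source; a real stream would not need to end."""
    for value in [10, 20, 30, 40]:
        yield value


if __name__ == "__main__":
    # An updated result is printed as soon as each reading arrives.
    for mean in running_mean(sensor_readings()):
        print(mean)
```

The design choice here is that the state kept between readings has constant size, which is what allows results to keep pace with the data as it arrives.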

1.3 BIG DATA IN DIGITS

One of the fundamental reasons for the existence of the big data phenomenon is the current extent to which information can be generated and made available [5]. Data are growing rapidly, especially those produced by smart objects, whose number was expected to exceed 50 billion worldwide in 2020. According to predictions, 40,000 billion data will be generated [14].
It is estimated that 90% of the data collected since the beginning of humanity were generated over the last two years alone; 70% of these data are created by individuals, although it is companies that store and manage 80% of them.
Following this exponential trend, countries became aware of the importance of big data; in 2012, the U.S. announced 200 million dollars in funding for research related to big data. In parallel, the big data market generated revenue of $8.9 billion in 2014. Amazon, for instance, is reported to generate 30% of its revenue through cross-selling [12].

1.3.1 BIG DATA ORIGIN

According to Fermigier [6], big data comes in particular from:
  • The Web: Access logs, social networks, e-commerce, indexing, storage of documents, photos, videos, linked data, etc. (e.g., Google processed 24 petabytes of data per day with MapReduce in 2009).
  • The Internet and Connected Objects: RFID, sensor networks, telephone call logs.
  • Science: Genomics, astronomy, subatomic physics (e.g., the German Climate Research Centre manages a database of 60 petabytes).
  • Business: e.g., Transaction history in a chain of hypermarkets.
  • Personal Data: e.g., Medical records.
  • Public Data: Open data.

1.3.2 BIG DATA PIONEERS

New big data technologies, which are growing massively, have become essential for many companies wishing to know their suppliers and customers better. The booming big data market includes several actors offering specific services [7].
Major web stakeholders, including the search engines Yahoo and Google as well as social media such as Facebook, also offer big data solutions. In 2004, Google proposed MapReduce, a programming model for processing large amounts of data in parallel. In 2014, Google announced its replacement by Google Cloud Dataflow, a SaaS solution.
Yahoo, for its part, became one of the main contributors to the Hadoop project after hiring Doug Cutting, its creator. The search engine also created Hortonworks, a company dedicated entirely to the development of Hadoop.
Amazon, the American online retail giant, is also one of the pioneers of big data. Since 2009, it has provided companies with tools such as Amazon Web Services (AWS) and Elastic MapReduce, better known as EMR. The latter is accessible to everyone since its use does not require any skill in installing and tuning Hadoop clusters [8].
Every day, users and individuals produce a massive amount of data. This data presents many opportunities for companies. The sheer volume of big data has led to the creation of new technologies that facilitate its growth and development; these can be broadly categorized into two main families.
On the one hand, there are storage technologies, driven particularly by the deployment of cloud computing. On the other hand, there are suitably adapted processing technologies, especially new databases adapted to unstructured data (Hadoop) and high-performance computing modes (MapReduce). Figure 1.2 summarizes the main technologies that support the deployment of big data.
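The MapReduce idea can be illustrated with the classic word-count example. The following is a minimal single-process Python sketch of the map, shuffle, and reduce steps; it is not Google's implementation nor the Hadoop API, and all function names are assumptions chosen for illustration. In a real deployment, the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases sequentially over a list of documents."""
    pairs = []
    for document in documents:
        # Each document could be mapped independently on a different machine.
        pairs.extend(map_phase(document))
    groups = shuffle_phase(pairs)
    # Each key could likewise be reduced independently.
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big analytics"]))
```

Because each document is mapped on its own and each key is reduced on its own, the work can be split across machines without the steps needing to share state, which is what makes the model suitable for very large data sets.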
FIGURE 1.2 Big data technology.

1...
