eBook - ePub

Microsoft Big Data Solutions

Name: Microsoft Big Data Solutions
Author: Adam Jorgensen, James Rowland-Jones, John Welch, Dan Clark, Christopher Price, Brian Mitchell

Adam Jorgensen, James Rowland-Jones, John Welch, Dan Clark, Christopher Price, Brian Mitchell

English
ePUB (apto para móviles)
Disponible en iOS y Android

eBook - ePub

Microsoft Big Data Solutions

Adam Jorgensen, James Rowland-Jones, John Welch, Dan Clark, Christopher Price, Brian Mitchell

Detalles del libro

Vista previa del libro

Índice

Citas

Información del libro

Tap the power of Big Data with Microsoft technologies

Big Data is here, and Microsoft's new Big Data platform is a valuable tool to help your company get the very most out of it. This timely book shows you how to use HDInsight along with HortonWorks Data Platform for Windows to store, manage, analyze, and share Big Data throughout the enterprise. Focusing primarily on Microsoft and HortonWorks technologies but also covering open source tools, Microsoft Big Data Solutions explains best practices, covers on-premises and cloud-based solutions, and features valuable case studies.

Best of all, it helps you integrate these new solutions with technologies you already know, such as SQL Server and Hadoop.

Walks you through how to integrate Big Data solutions in your company using Microsoft's HDInsight Server, HortonWorks Data Platform for Windows, and open source tools
Explores both on-premises and cloud-based solutions
Shows how to store, manage, analyze, and share Big Data through the enterprise
Covers topics such as Microsoft's approach to Big Data, installing and configuring HortonWorks Data Platform for Windows, integrating Big Data with SQL Server, visualizing data with Microsoft and HortonWorks BI tools, and more
Helps you build and execute a Big Data plan
Includes contributions from the Microsoft and HortonWorks Big Data product teams

If you need a detailed roadmap for designing and implementing a fully deployed Big Data solution, you'll want Microsoft Big Data Solutions.

Preguntas frecuentes

¿Cómo cancelo mi suscripción?

Simplemente, dirígete a la sección ajustes de la cuenta y haz clic en «Cancelar suscripción». Así de sencillo. Después de cancelar tu suscripción, esta permanecerá activa el tiempo restante que hayas pagado. Obtén más información aquí.

¿Cómo descargo los libros?

Por el momento, todos nuestros libros ePub adaptables a dispositivos móviles se pueden descargar a través de la aplicación. La mayor parte de nuestros PDF también se puede descargar y ya estamos trabajando para que el resto también sea descargable. Obtén más información aquí.

¿En qué se diferencian los planes de precios?

Ambos planes te permiten acceder por completo a la biblioteca y a todas las funciones de Perlego. Las únicas diferencias son el precio y el período de suscripción: con el plan anual ahorrarás en torno a un 30 % en comparación con 12 meses de un plan mensual.

¿Qué es Perlego?

Somos un servicio de suscripción de libros de texto en línea que te permite acceder a toda una biblioteca en línea por menos de lo que cuesta un libro al mes. Con más de un millón de libros sobre más de 1000 categorías, ¡tenemos todo lo que necesitas! Obtén más información aquí.

¿Perlego ofrece la función de texto a voz?

Busca el símbolo de lectura en voz alta en tu próximo libro para ver si puedes escucharlo. La herramienta de lectura en voz alta lee el texto en voz alta por ti, resaltando el texto a medida que se lee. Puedes pausarla, acelerarla y ralentizarla. Obtén más información aquí.

¿Es Microsoft Big Data Solutions un PDF/ePUB en línea?

Sí, puedes acceder a Microsoft Big Data Solutions de Adam Jorgensen, James Rowland-Jones, John Welch, Dan Clark, Christopher Price, Brian Mitchell en formato PDF o ePUB, así como a otros libros populares de Informatik y Data-Warehousing. Tenemos más de un millón de libros disponibles en nuestro catálogo para que explores.

Información

Editorial

Wiley

Año

2014

ISBN

9781118729557

Edición

Categoría

Informatik

Categoría

Data-Warehousing

Part I
What Is Big Data?

In This Part

Chapter 1: Industry Needs and Solutions
Chapter 2: Microsoft's Approach to Big Data

Chapter 1
Industry Needs and Solutions

What You Will Learn in This Chapter

Finding Out What Constitutes “Big Data”
Appreciating the History and Origins of Hadoop
Defining Hadoop
Understanding the Core Components of Hadoop
Looking to the Future with Hadoop 2.0

This first chapter introduces you to the open source world of Apache and to Hadoop, one of the most exciting and innovative platforms ever created for the data professional. In this chapter we're going to go on a bit of a journey. You're going to find out what inspired Hadoop, where it came from, and its future direction. You'll see how from humble beginnings two gentlemen have inspired a generation of data professionals to think completely differently about data processing and data architecture.

Before we look into the world of Hadoop, though, we must first ask ourselves an important question. Why does big data exist? Is this name just a fad, or is there substance to all the hype? Is big data here to stay? If you want to know the answers to these questions and a little more, read on. You have quite a journey in front of you…

What's So Big About Big Data?

The world has witnessed explosive, exponential growth in recent times. So, did we suddenly have a need for big data? Not exactly. Businesses have been tackling the capacity challenge for many years (much to the delight of storage hardware vendors). Therefore the big in big data isn't purely a statement on size.

Likewise, on the processing front, scale-out solutions such as high-performance computing and distributed database technology have been in place since the last millennium. There is nothing intrinsically new there either.

People also often talk about unstructured data, but, really, this just refers to the format of the data. Could this be a reason we “suddenly” need big data? We know that web data, especially web log data, is born in an unstructured format and can be generated in significant quantities and volume. However, is this really enough to be considered big data?

In my mind, the answer is no. No one property on its own is sufficient for a project or a solution to be considered a big data solution. It's only when you have a cunning blend of these ingredients that you get to bake a big data cake.

This is in line with the Gartner definition of big data, which they updated in Doug Laney's publication, The Importance of Big Data: A Definition (Gartner, 2012): “High volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”

What we do know is that every CIO on the planet seems to want to start a big data project right now. In a world of shrinking budgets, there is this sudden desire to jump in with both feet into this world of big data and start prospecting for golden nuggets. It's the gold rush all over again, and clearly companies feel like they might miss out if they hesitate.

However, this is a picture that has been sharpening its focus for several years. In the buildup to this ubiquitous acceptance of big data, we've been blessed with plenty of industry terms and trends, web scale, new programming paradigms of “code first,” and of course, to the total disgust of data modelers everywhere, NoSQL. Technologies such as Cassandra and MongoDB are certainly part of the broader ecosystem, but none have resonated as strongly with the market as Hadoop and big data. Why? In essence, unless you were Facebook, Google, Yahoo!, or Bing, issues like web scale really didn't apply.

It seems as though everyone is now building analytics platforms, and that, to be the king of geek chic, requires a degree in advanced statistics. The reason? Big data projects aren't defined by having big data sets. They are shaped by big ideas, by big questions, and by big opportunities. Big data is not about one technology or even one platform. It's so much more than that: It's a mindset and a movement.

Big data, therefore, is a term that underpins a raft of technologies (including the various Hadoop projects, NoSQL offerings, and even MPP Database Systems, for example) that have been created in the drive to better analyze and derive meaning from data at a dramatically lower cost and while delivering new insights and products for organizations all over the world. In times of recession, businesses look to derive greater value from the assets they have rather than invest in new assets. Big data, and in particular Hadoop, is the perfect vehicle for doing exactly that.

A Brief History of Hadoop

Necessity is the mother of invention, and Hadoop is no exception. Hadoop was created to meet the need of web companies to index and process the data tsunami courtesy of the newfangled Internetz. Hadoop's origins owe everything to both Google and the Apache Nutch project. Without one influencing the other, Hadoop might have ended up a very different animal (joke intended). In this next section, we are going to see how their work contributed to making Hadoop what it is today.

Google

As with many pioneering efforts, Google provided significant inspiration for the development that became known as Hadoop. Google published two landmark papers. The first paper, published in October 2003, was titled “The Google File System,” and the second paper, “MapReduce: Simplified Data Processing on Large Clusters,” published just over a year later in December 2004, provided the inspiration to Doug Cutting and his team of part-time developers for their project, Nutch.

MapReduce was first designed to enable Google developers to focus on the large-scale computations that they were trying to perform while abstracting away all the scaffolding code required to make the computation possible. Given the size of the data set they were working on and the duration of tasks, the developers knew that they had to have a model that was highly parallelized, was fault tolerant, and was able to balance the workload across a distributed set of machines. Of course, the Google implementation of MapReduce worked over Google File System (GFS); Hadoop Distributed File System (HDFS) was still waiting to be invented.

Google has since continued to release thought-provoking, illuminating, and inspirational publications. One publication worthy of note is “BigTable: A Distributed Storage System for Structured Data.” Of course, they aren't the only ones. LinkedIn, Facebook, and of course Yahoo! have all contributed to the big data mind share.

There are similarities here to the SIGMOD papers published by various parties in the relational database world, but ultimately it isn't the same. Let's look at an example. Twitter has open-sourced Storm—their complex event processing engine—which has recently been accepted into the Apache incubator program. For relational database vendors, this level of open sharing is really quite unheard of. For more details about storm head over to Apache: http://incubator.apache.org/projects/storm.html.

Nutch

Nutch was an open source crawler-based search engine built by a handful of part-time developers, including Doug Cutting. As previously mentioned Cutting was inspired by the Google publications and changed Nutch to take advantage of the enhanced scalability of the architecture promoted by Google. However, it wasn't too long after this that Cutting joined Yahoo! and Hadoop was born.

Nutch joined the Apache foundation in January 2005, and its first release (0.7) was in August 2005. However, it was not until 0.8 was released in July 2006 that Nutch began the transition to Hadoop-based architecture.

Nutch is still very much alive and is an actively contributed-to project. However, Nutch has now been split into two codebases. Ver...

Índice

Cover
Table of Contents
Part I: What Is Big Data?
Part II: Setting Up for Big Data with Microsoft
Part III: Storing and Managing Big Data
Part IV: Working with Your Big Data
Part V: Big Data and SQL Server Together
Part VI: Moving Your Big Data Forward
Introduction
End User License Agreement

Estilos de citas para Microsoft Big Data Solutions

APA 6 Citation

Jorgensen, A., Rowland-Jones, J., Welch, J., Clark, D., Price, C., & Mitchell, B. (2014). Microsoft Big Data Solutions (1st ed.). Wiley. Retrieved from https://www.perlego.com/book/1003266/microsoft-big-data-solutions-pdf (Original work published 2014)

Chicago Citation

Jorgensen, Adam, James Rowland-Jones, John Welch, Dan Clark, Christopher Price, and Brian Mitchell. (2014) 2014. Microsoft Big Data Solutions. 1st ed. Wiley. https://www.perlego.com/book/1003266/microsoft-big-data-solutions-pdf.

Harvard Citation

Jorgensen, A. et al. (2014) Microsoft Big Data Solutions. 1st edn. Wiley. Available at: https://www.perlego.com/book/1003266/microsoft-big-data-solutions-pdf (Accessed: 14 October 2022).

MLA 7 Citation

Jorgensen, Adam et al. Microsoft Big Data Solutions. 1st ed. Wiley, 2014. Web. 14 Oct. 2022.