Learning Apache Spark 2
eBook - ePub

Learning Apache Spark 2

  1. 356 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
eBook - ePub

Learning Apache Spark 2

Book details
Book preview
Table of contents
Citations

About This Book

Learn about the fastest-growing open source project in the world, and find out how it revolutionizes big data analyticsAbout This Book• Exclusive guide that covers how to get up and running with fast data processing using Apache Spark• Explore and exploit various possibilities with Apache Spark using real-world use cases in this book• Want to perform efficient data processing at real time? This book will be your one-stop solution.Who This Book Is ForThis guide appeals to big data engineers, analysts, architects, software engineers, even technical managers who need to perform efficient data processing on Hadoop at real time. Basic familiarity with Java or Scala will be helpful.The assumption is that readers will be from a mixed background, but would be typically people with background in engineering/data science with no prior Spark experience and want to understand how Spark can help them on their analytics journey.What You Will Learn• Get an overview of big data analytics and its importance for organizations and data professionals• Delve into Spark to see how it is different from existing processing platforms• Understand the intricacies of various file formats, and how to process them with Apache Spark.• Realize how to deploy Spark with YARN, MESOS or a Stand-alone cluster manager.• Learn the concepts of Spark SQL, SchemaRDD, Caching and working with Hive and Parquet file formats• Understand the architecture of Spark MLLib while discussing some of the off-the-shelf algorithms that come with Spark.• Introduce yourself to the deployment and usage of SparkR.• Walk through the importance of Graph computation and the graph processing systems available in the market• Check the real world example of Spark by building a recommendation engine with Spark using ALS.• Use a Telco data set, to predict customer churn using Random Forests.In DetailSpark juggernaut keeps on rolling and getting more and more momentum each day. Spark provides key capabilities in the form of Spark SQL, Spark Streaming, Spark ML and Graph X all accessible via Java, Scala, Python and R. Deploying the key capabilities is crucial whether it is on a Standalone framework or as a part of existing Hadoop installation and configuring with Yarn and Mesos.The next part of the journey after installation is using key components, APIs, Clustering, machine learning APIs, data pipelines, parallel programming. It is important to understand why each framework component is key, how widely it is being used, its stability and pertinent use cases.Once we understand the individual components, we will take a couple of real life advanced analytics examples such as 'Building a Recommendation system', 'Predicting customer churn' and so on.The objective of these real life examples is to give the reader confidence of using Spark for real-world problems.Style and approachWith the help of practical examples and real-world use cases, this guide will take you from scratch to building efficient data applications using Apache Spark.You will learn all about this excellent data processing engine in a step-by-step manner, taking one aspect of it at a time.This highly practical guide will include how to work with data pipelines, dataframes, clustering, SparkSQL, parallel programming, and such insightful topics with the help of real-world use cases.

Frequently asked questions

Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.
At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.
Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Yes, you can access Learning Apache Spark 2 by Muhammad Asif Abbasi in PDF and/or ePUB format, as well as other popular books in Computer Science & Data Processing. We have over one million books available in our catalogue for you to explore.

Information

Year
2017
ISBN
9781785889585
Edition
1

Learning Apache Spark 2


Learning Apache Spark 2

Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: March 2017
Production reference: 1240317
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78588-513-6
www.packtpub.com

Credits

Authors
Muhammad Asif Abbasi
Copy Editor
Safis Editing
Reviewers
Prashant Verma
Project Coordinator
Nidhi Joshi
Commissioning Editor
Veena Pagare
Proofreader
Safis Editing
Acquisition Editor
Tushar Gupta
Indexer
Tejal Daruwale Soni
Content Development Editor
Mayur Pawanikar
Graphics
Tania Dutta
Technical Editor
Karan Thakkar
Production Coordinator
Nilesh Mohite

About the Author

Muhammad Asif Abbasi has worked in the industry for over 15 years in a variety of roles from engineering solutions to selling solutions and everything in between. Asif is currently working with SAS a market leader in Analytic Solutions as a Principal Business Solutions Manager for the Global Technologies Practice. Based in London, Asif has vast experience in consulting for major organizations and industries across the globe, and running proof-of-concepts across various industries including but not limited to telecommunications, manufacturing, retail, finance, services, utilities and government. Asif is an Oracle Certified Java EE 5 Enterprise architect, Teradata Certified Master, PMP, Hortonworks Hadoop Certified developer, and administrator. Asif also holds a Master's degree in Computer Science and Business Administration.

About the Reviewers

Prashant Verma started his IT carrier in 2011 as a Java developer in Ericsson working in telecom domain. After couple of years of JAVA EE experience, he moved into Big Data domain, and has worked on almost all the popular big data technologies, such as Hadoop, Spark, Flume, Mongo, Cassandra,etc. He has also played with Scala. Currently, He works with QA Infotech as Lead Data Enginner, working on solving e-Learning problems using analytics and machine learning.
Prashant has also worked on Apache Spark for Java Developers, Packt as a Technical Reviewer.
I want to thank Packt Publishing for giving me the chance to review the book as well as my employer and my family for their patience while I was busy working on this book.

www.packtpub.com

For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
www.packtpub.com
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

  • Fully searchable across every book published by Packt
  • Copy and paste, print, and bookmark content
  • On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review at the website where you acquired this product.
If you'd like to join our team of regular reviewers, you can email us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Preface

This book will cover the technical aspects of Apache Spark 2.0, one of the fastest growing open-source projects. In order to understand what Apache Spark is, we will quickly recap a the history of Big Data, and what has made Apache Spark popular. Irrespective of your expertise level, we suggest going through this introduction as it will help set the context of the book.

The Past

Before going into the present-day Spark, it might be worthwhile understanding what problems Spark intend to solve, and especially the data movement. Without knowing the background we will not be able to predict the future.
"You have to learn the past to predict the future."
Late 1990s: The world was a much simpler place to live, with proprietary databases being the sole choice of consumers. Data was growing at quite an amazing pace, and some of the biggest databases boasted of maintaining datasets in excess of a Terabyte.
Early 2000s: The dotcom bubble happened, meant companies...

Table of contents

  1. Learning Apache Spark 2