AWS Administration - The Definitive Guide - Second Edition
eBook - ePub

  1. 358 pages
  2. English

About This Book

Leverage this step-by-step guide to build a highly secure, fault-tolerant, and scalable Cloud environment

  • Learn how to leverage various Amazon Web Services (AWS) components and services to build a secure, reliable, and robust environment to host your applications on.
  • Delve into core AWS service offerings with hands-on tutorials, real-world use case scenarios, and best practices.
  • A self-paced, systematic, and step-by-step guide to learning and implementing AWS in your own environment.

Who This Book Is For

This book is for those who want to learn and leverage the rich plethora of services provided by AWS. Although no prior experience with AWS is required, it is recommended that you have some hands-on experience of Linux, Web Services, and basic networking.

What You Will Learn

  • Take an in-depth look at what's new with AWS, along with how to effectively manage and automate your EC2 infrastructure with AWS Systems Manager
  • Deploy and scale your applications with ease using AWS Elastic Beanstalk and Amazon Elastic File System
  • Secure and govern your environments using AWS CloudTrail, AWS Config, and AWS Shield
  • Learn the DevOps way using a combination of AWS CodeCommit, AWS CodeDeploy, and AWS CodePipeline
  • Run big data analytics and workloads using Amazon EMR and Amazon Redshift
  • Learn to back up and safeguard your data using AWS Data Pipeline
  • Get started with the Internet of Things using AWS IoT and AWS Greengrass

In Detail

Many businesses are moving from traditional data centers to AWS because of its reliability, vast service offerings, lower costs, and high rate of innovation. AWS can be used to accomplish a variety of both simple and tedious tasks. Whether you are a seasoned system admin or a rookie, this book will help you to learn all the skills you need to work with the AWS cloud.

This book guides you through some of the most popular AWS services, such as EC2, Elastic Beanstalk, EFS, CloudTrail, Redshift, EMR, Data Pipeline, and IoT using a simple, real-world, application-hosting example. This book will also enhance your application delivery skills with the latest AWS services, such as CodeCommit, CodeDeploy, and CodePipeline, to provide continuous delivery and deployment, while also securing and monitoring your environment's workflow. Each chapter is designed to provide you with maximal information about each AWS service, coupled with easy to follow, hands-on steps, best practices, tips, and recommendations.

By the end of the book, you will be able to create a highly secure, fault-tolerant, and scalable environment for your applications to run on.

Style and approach

This in-depth and insightful guide is filled with easy-to-follow examples, real-world use cases, best practices, and recommendations that will help you design and leverage some of the most commonly used AWS services.

Information

Author
Yohan Wadia
Year
2018
ISBN
9781788477178
Edition
2

Powering Analytics Using Amazon EMR and Amazon Redshift

In the previous chapter, we learned about two really useful services that developers can leverage to build highly scalable and decoupled applications in the cloud: Amazon SNS and Amazon SQS.
In this chapter, we will be turning things up a notch and exploring two amazingly powerful AWS services that are ideal for processing and running large-scale analytics and data warehousing in the cloud: Amazon EMR and Amazon Redshift.
Keeping this in mind, let's have a quick look at the various topics that we will be covering in this chapter:
  • Understanding the AWS analytics suite of services with an in-depth look at Amazon EMR, along with its use cases and benefits
  • Introducing a few key EMR concepts and terminologies, along with a quick getting started tour
  • Running a sample workload on EMR, using steps
  • Introducing Amazon Redshift
  • Getting started with an Amazon Redshift cluster
  • Working with Redshift databases and tables
  • Loading data from Amazon EMR into Amazon Redshift
So without any further ado, let's get started right away!

Understanding the AWS analytics suite of services

With big data growing and its adoption across organizations on the rise, many cloud providers now offer a plethora of services specifically designed to run massive computations and analytics on large volumes of data. AWS is one such provider, having invested heavily in big data and analytics with a host of services that offer ready-to-use frameworks, business insights, and data warehousing solutions. Here is a brief explanation of the AWS analytics suite of services:
  • Amazon EMR: Amazon Elastic MapReduce or EMR is a quick and easy to use service that provides users with a scalable, managed Hadoop ecosystem and framework. You can leverage EMR to process vast amounts of data without having to worry about configuring the underlying Hadoop platform. We will be learning and exploring more on EMR in the subsequent sections of this chapter.
  • Amazon Athena: Amazon Athena takes big data processing up a notch by providing a standard SQL interface for querying data that is stored directly on Amazon S3. With Athena, you do not have any underlying hardware to manage or maintain; it is all managed by AWS itself. This serverless approach makes Athena ideal for processing data that does not require any complex ETL processing. All you need to do is create a schema, point Athena to your data on Amazon S3, and start querying it using simple SQL syntax.
  • Amazon Elasticsearch Service: Amazon Elasticsearch Service provides a managed deployment of the popular open source search and analytics engine: Elasticsearch. This service comes in really handy when you wish to process streams of data originating from various sources such as logs generated from instances, and so on.
  • Amazon Kinesis: Unlike the other services discussed so far, Amazon Kinesis is more of a streaming service provided by AWS. You can use Amazon Kinesis to push vast amounts of data originating from multiple sources into one or more streams that can be consumed by other AWS services for analytics and other data processing tasks.
  • Amazon QuickSight: Amazon QuickSight is an extremely cost-effective business insights solution that can be used to perform fast ad hoc analysis on data.
  • Amazon Redshift: Amazon Redshift is a petabyte-scale data warehousing solution provided by AWS that you can leverage for analyzing your data, using an existing set of tools. We will be learning more about Redshift a bit later during this chapter.
  • AWS Data Pipeline: Moving large amounts of data between AWS services can be difficult, especially when the data sources vary. AWS Data Pipeline makes it easier to transfer data between different AWS storage and compute services, and it also helps with the initial transformation and processing of data. You can even use Data Pipeline to transfer data reliably from an on-premises location into AWS storage services.
  • AWS Glue: AWS Glue is a managed ETL (Extract, Transform and Load) service recently launched by AWS. Using AWS Glue greatly simplifies the process of preparing, extracting, and loading data from large datasets into an AWS storage service.
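To make the Athena workflow described in the list above a little more concrete (create a schema, point it at data on Amazon S3, then query it with SQL), here is a minimal sketch in Python that only builds the two SQL statements as strings. It does not call AWS. The helper names, the `weblogs.requests` table, its columns, and the `s3://example-bucket/logs` location are all hypothetical and chosen purely for illustration; the sketch also assumes the files are comma-delimited text.

```python
def create_table_sql(database, table, columns, s3_location):
    """Return a DDL statement describing comma-delimited files stored on S3."""
    # Use a folder-style prefix ending in "/" for the table location.
    location = s3_location if s3_location.endswith("/") else s3_location + "/"
    column_defs = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_defs}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'"
    )


def top_values_sql(database, table, column, limit=10):
    """Return a query that counts rows per distinct value of one column."""
    return (
        f"SELECT {column}, COUNT(*) AS hits\n"
        f"FROM {database}.{table}\n"
        f"GROUP BY {column}\n"
        "ORDER BY hits DESC\n"
        f"LIMIT {limit}"
    )


# Hypothetical schema for web server logs kept in a hypothetical bucket.
ddl = create_table_sql(
    "weblogs",
    "requests",
    [("request_time", "string"), ("status", "int"), ("bytes_sent", "bigint")],
    "s3://example-bucket/logs",
)
query = top_values_sql("weblogs", "requests", "status")

print(ddl)
print(query)
```

The two statements could then be pasted into the Athena query editor, one after the other. Keeping the schema definition separate from the query mirrors the idea that the data itself never moves: only the table definition points at it.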
With this brief overview of the AWS analytics suite of services, let's now move forward and get started with understanding a bit more about Amazon EMR!

Introducing Amazon EMR

As mentioned earlier, Amazon EMR is a managed service that provides big data analytics frameworks, such as Apache Hadoop and Apache Spark, straight out of the box and ready for use. Amazon EMR supports a variety of use cases such as batch processing, big data analytics, low-latency querying, and data streaming; you can even use EMR as a large datastore itself!
With Amazon EMR, there is very little underlying infrastructure to manage on your part. You simply have to decide the number of instances you initially want to run your EMR cluster on and start consuming the framework for analytics and processing. Amazon EMR provides you with features that enable you to scale your infrastructure based on your requirements, without affecting the existing setups. Here is a brief look at some of the benefits that you can obtain by leveraging Amazon EMR for your own workloads:
  • Pricing: Amazon EMR relies on EC2 instances to spin up your Apache Hadoop or Apache Spark clusters. Although you can vary costs by selecting the instance types for your cluster from large to extra large and so on, the best part of EMR is that you can also choose a combination of on-demand, reserved, and spot EC2 instances based on your setup, giving you flexibility at significantly lower costs.
  • Scalability: Amazon EMR provides you with a simple way of scaling running workloads, depending on their processing requirements. You can resize your cluster or its individual components as you see fit and, additionally, configure one or more instance groups for guaranteed instance availability and processing.
  • Reliability: Although you, as an end user, have to specify the initial instances and their sizes, AWS ultimately ensures the reliability of the cluster by swapping out instances that have either failed or are likely to fail in due course.
  • Integration: Amazon EMR integrates with other AWS services to meet your cluster's additional storage, networking, and security requirements. You can use services such as Amazon S3 to store both the input and the output data, AWS CloudTrail for auditing the requests made to EMR, Amazon VPC to ensure the security of your launched EMR instances, and much more!
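The pricing and scalability points above can be sketched as a small cluster layout that mixes on-demand and spot capacity across instance groups. The Python below only builds plain dictionaries and never contacts AWS. The field names follow the shape that the `InstanceGroups` list takes in the EMR API, on the assumption that you would eventually hand the result to a client such as boto3's `run_job_flow`; the helper names, the `m4.large` instance type, and the node counts are hypothetical. The `MASTER`, `CORE`, and `TASK` roles correspond to the node roles introduced in the next section.

```python
def build_instance_groups(core_count, spot_task_count,
                          instance_type="m4.large", spot_bid_usd=None):
    """Describe one master, some on-demand core nodes, and optional spot task nodes."""
    groups = [
        {
            "Name": "Master",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            "Name": "Core",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]
    if spot_task_count > 0:
        task_group = {
            "Name": "Task (spot)",
            "InstanceRole": "TASK",
            "Market": "SPOT",
            "InstanceType": instance_type,
            "InstanceCount": spot_task_count,
        }
        if spot_bid_usd is not None:
            # The bid price is carried as a string in this layout.
            task_group["BidPrice"] = f"{spot_bid_usd:.3f}"
        groups.append(task_group)
    return groups


def count_by_market(groups):
    """Total the number of instances requested per purchasing option."""
    totals = {}
    for group in groups:
        market = group["Market"]
        totals[market] = totals.get(market, 0) + group["InstanceCount"]
    return totals


# Hypothetical layout: 1 master + 2 core nodes on demand, 4 task nodes on spot.
groups = build_instance_groups(core_count=2, spot_task_count=4)
print(count_by_market(groups))  # {'ON_DEMAND': 3, 'SPOT': 4}
```

Keeping the master and core groups on demand while placing only the task group on spot is one common way to trade cost against stability: losing spot capacity then slows processing down rather than taking the whole cluster with it.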
With these details in mind, let's move an inch closer to launching our very own EMR cluster by first visiting some of its key concepts and terminologies.

Concepts and terminologies

Before we get started with Amazon EMR, it is important to understand some of its key concepts and terminologies, starting out with clusters and nodes:
  • Clusters: Clusters are the core functioning component in Amazon EMR. A cluster is a group of EC2 instances that together can be used to process your workloads. Each instance within a cluster is termed a node, and each node has a different role to perform within the cluster.
  • Nodes: Amazon EMR distinguishes between cluster instances by assigning each of them one of these three roles:
    • Master node: An instance that is responsible for the overall manageability, working and monitoring of your cluster. The master node takes care of all the data and task distributions that occur within the cl...

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. Packt Upsell
  4. Contributors
  5. Preface
  6. What's New in AWS?
  7. Managing EC2 with Systems Manager
  8. Introducing Elastic Beanstalk and Elastic File System
  9. Securing Workloads Using AWS WAF
  10. Governing Your Environments Using AWS CloudTrail and AWS Config
  11. Access Control Using AWS IAM and AWS Organizations
  12. Transforming Application Development Using the AWS Code Suite
  13. Messaging in the Cloud Using Amazon SNS and Amazon SQS
  14. Powering Analytics Using Amazon EMR and Amazon Redshift
  15. Orchestrating Data using AWS Data Pipeline
  16. Connecting the World with AWS IoT and AWS Greengrass
  17. Other Books You May Enjoy