- 358 pages
- English
- ePUB (mobile friendly)
AWS Administration - The Definitive Guide - Second Edition
About This Book
Leverage this step-by-step guide to build a highly secure, fault-tolerant, and scalable Cloud environment
- Learn how to leverage various Amazon Web Services (AWS) components and services to build a secure, reliable, and robust environment to host your applications on.
- Delve into core AWS service offerings with hands-on tutorials, real-world use case scenarios, and best practices.
- A self-paced, systematic, and step-by-step guide to learning and implementing AWS in your own environment.
Who This Book Is For
This book is for those who want to learn and leverage the rich plethora of services provided by AWS. Although no prior experience with AWS is required, it is recommended that you have some hands-on experience of Linux, Web Services, and basic networking.
What You Will Learn
- Take an in-depth look at what's new with AWS, along with how to effectively manage and automate your EC2 infrastructure with AWS Systems Manager
- Deploy and scale your applications with ease using AWS Elastic Beanstalk and Amazon Elastic File System
- Secure and govern your environments using AWS CloudTrail, AWS Config, and AWS Shield
- Learn the DevOps way using a combination of AWS CodeCommit, AWS CodeDeploy, and AWS CodePipeline
- Run big data analytics and workloads using Amazon EMR and Amazon Redshift
- Learn to back up and safeguard your data using AWS Data Pipeline
- Get started with the Internet of Things using AWS IoT and AWS Greengrass
In Detail
Many businesses are moving from traditional data centers to AWS because of its reliability, vast service offerings, lower costs, and high rate of innovation. AWS can be used to accomplish a variety of both simple and tedious tasks. Whether you are a seasoned system admin or a rookie, this book will help you to learn all the skills you need to work with the AWS cloud.
This book guides you through some of the most popular AWS services, such as EC2, Elastic Beanstalk, EFS, CloudTrail, Redshift, EMR, Data Pipeline, and IoT using a simple, real-world, application-hosting example. This book will also enhance your application delivery skills with the latest AWS services, such as CodeCommit, CodeDeploy, and CodePipeline, to provide continuous delivery and deployment, while also securing and monitoring your environment's workflow. Each chapter is designed to provide you with maximal information about each AWS service, coupled with easy-to-follow, hands-on steps, best practices, tips, and recommendations.
By the end of the book, you will be able to create a highly secure, fault-tolerant, and scalable environment for your applications to run on.
Style and approach
This in-depth and insightful guide is filled with easy-to-follow examples, real-world use cases, best practices, and recommendations that will help you design and leverage some of the most commonly used AWS services.
Powering Analytics Using Amazon EMR and Amazon Redshift
- Understanding the AWS analytics suite of services with an in-depth look at Amazon EMR, along with its use cases and benefits
- Introducing a few key EMR concepts and terminologies, along with a quick getting started tour
- Running a sample workload on EMR, using steps
- Introducing Amazon Redshift
- Getting started with an Amazon Redshift cluster
- Working with Redshift databases and tables
- Loading data from Amazon EMR into Amazon Redshift
Understanding the AWS analytics suite of services
- Amazon EMR: Amazon Elastic MapReduce, or EMR, is a quick, easy-to-use service that provides users with a scalable, managed Hadoop ecosystem and framework. You can leverage EMR to process vast amounts of data without having to worry about configuring the underlying Hadoop platform. We will explore EMR further in the subsequent sections of this chapter.
- Amazon Athena: Amazon Athena takes big data processing up a notch by providing a standard SQL interface for querying data that is stored directly on Amazon S3. With Athena, you do not have any underlying hardware to manage or maintain; it is all managed by AWS itself. This serverless approach makes Athena ideal for processing data that does not require any complex ETL processing. All you need to do is create a schema, point Athena to your data on Amazon S3, and start querying it using simple SQL syntax.
- Amazon Elasticsearch Service: Amazon Elasticsearch Service provides a managed deployment of Elasticsearch, the popular open source search and analytics engine. This service comes in handy when you wish to process streams of data originating from various sources, such as logs generated by instances.
- Amazon Kinesis: Unlike the other services discussed so far, Amazon Kinesis is a streaming service. You can use Amazon Kinesis to push vast amounts of data originating from multiple sources into one or more streams, which other AWS services can then consume for analytics and other data processing.
- Amazon QuickSight: Amazon QuickSight is an extremely cost-effective business insights solution that can be used to perform fast ad hoc analysis on data.
- Amazon Redshift: Amazon Redshift is a petabyte-scale data warehousing solution provided by AWS that you can leverage for analyzing your data, using an existing set of tools. We will be learning more about Redshift later in this chapter. The services are depicted here:
- AWS Data Pipeline: Moving large amounts of data between AWS services can be difficult, especially when the data sources vary. AWS Data Pipeline makes it easier to transfer data between different AWS storage and compute services, and also helps with the initial transformation and processing of data. You can even use Data Pipeline to reliably transfer data from an on-premises location into AWS storage services.
- AWS Glue: AWS Glue is a managed ETL (Extract, Transform and Load) service recently launched by AWS. Using AWS Glue greatly simplifies the process of preparing, extracting, and loading data from large datasets into an AWS storage service.
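As a rough illustration of the Athena workflow described in the list above (define a schema, point Athena at data on Amazon S3, and query it with SQL), the following Python sketch builds the parameters for Athena's StartQueryExecution API and submits them through boto3. It is a minimal sketch, not the book's own example: the `weblogs` database, the `access_logs` table, and the `example-athena-results` bucket are hypothetical names, and the submit step assumes boto3 is installed and AWS credentials are configured.
```python
from typing import Any, Dict


def build_athena_query_request(sql: str, database: str, output_s3_uri: str) -> Dict[str, Any]:
    """Build the keyword arguments for Athena's StartQueryExecution call."""
    if not output_s3_uri.startswith("s3://"):
        raise ValueError("output_s3_uri must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }


def run_query(request: Dict[str, Any]) -> str:
    """Submit the query and return its execution ID (requires AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    athena = boto3.client("athena")
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical database, table, and bucket names, used for illustration only.
request = build_athena_query_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="weblogs",
    output_s3_uri="s3://example-athena-results/queries/",
)
# query_id = run_query(request)  # uncomment once credentials are configured
```
Keeping the request builder separate from the network call makes it possible to inspect or unit test the parameters without touching AWS.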
Introducing Amazon EMR
- Pricing: Amazon EMR relies on EC2 instances to spin up your Apache Hadoop or Apache Spark clusters. You can vary costs by selecting larger or smaller instance types for your cluster, and you can also combine on-demand, reserved, and spot EC2 instances based on your setup, which gives you flexibility at significantly lower costs.
- Scalability: Amazon EMR provides a simple way to scale running workloads to match their processing requirements. You can resize your cluster or its individual components as you see fit and, additionally, configure one or more instance groups to guarantee instance availability and processing.
- Reliability: Although you, as an end user, specify the initial instances and their sizes, AWS ultimately ensures the reliability of the cluster by swapping out instances that have failed or are expected to fail in due course.
- Integration: Amazon EMR integrates with other AWS services to meet your cluster's additional storage, network, and security requirements. You can use Amazon S3 to store both the input and the output data, AWS CloudTrail to audit the requests made to EMR, Amazon VPC to secure your launched EMR instances, and much more.
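The pricing and scalability points above can be sketched as a cluster definition that mixes on-demand and spot instance groups. The Python below builds a request for the EMR RunJobFlow API as exposed by boto3; treat it as an illustration under stated assumptions rather than a recommended configuration. The cluster name, release label, instance types and counts, and the `example-emr-logs` bucket are hypothetical choices, and the role names assume the default EMR roles already exist in the account.
```python
from typing import Any, Dict, List

VALID_ROLES = {"MASTER", "CORE", "TASK"}
VALID_MARKETS = {"ON_DEMAND", "SPOT"}


def instance_group(name: str, role: str, market: str,
                   instance_type: str, count: int) -> Dict[str, Any]:
    """Describe one EMR instance group and the purchasing option it uses."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role}")
    if market not in VALID_MARKETS:
        raise ValueError(f"unknown market: {market}")
    return {
        "Name": name,
        "InstanceRole": role,
        "Market": market,
        "InstanceType": instance_type,
        "InstanceCount": count,
    }


def build_cluster_request(name: str, release_label: str, log_uri: str,
                          groups: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build the keyword arguments for EMR's RunJobFlow call."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,  # cluster logs are written to Amazon S3
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": groups,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumption: the default EMR roles have been created in this account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request: Dict[str, Any]) -> str:
    """Start the cluster and return its ID (requires AWS credentials)."""
    import boto3  # imported lazily so the builders work without boto3 installed

    return boto3.client("emr").run_job_flow(**request)["JobFlowId"]


# Hypothetical sizing: stable nodes on demand, extra task capacity on spot.
groups = [
    instance_group("master", "MASTER", "ON_DEMAND", "m5.xlarge", 1),
    instance_group("core", "CORE", "ON_DEMAND", "m5.xlarge", 2),
    instance_group("task-spot", "TASK", "SPOT", "m5.xlarge", 4),
]
request = build_cluster_request(
    name="analytics-demo",
    release_label="emr-5.36.0",
    log_uri="s3://example-emr-logs/",
    groups=groups,
)
spot_nodes = sum(g["InstanceCount"] for g in groups if g["Market"] == "SPOT")
# cluster_id = launch(request)  # uncomment once credentials are configured
```
Placing only the task group on spot capacity is one common way to cut costs, since losing those instances reduces processing capacity without affecting the nodes that coordinate the cluster.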
Concepts and terminologies
- Clusters: Clusters are the core functioning component in Amazon EMR. A cluster is a group of EC2 instances that together can be used to process your workloads. Each instance within a cluster is called a node, and each node has a different role to perform within the cluster.
- Nodes: Amazon EMR distinguishes between cluster instances by assigning each of them one of these three roles:
- Master node: An instance that is responsible for the overall manageability, working and monitoring of your cluster. The master node takes care of all the data and task distributions that occur within the cl...
Table of contents
- Title Page
- Copyright and Credits
- Packt Upsell
- Contributors
- Preface
- What's New in AWS?
- Managing EC2 with Systems Manager
- Introducing Elastic Beanstalk and Elastic File System
- Securing Workloads Using AWS WAF
- Governing Your Environments Using AWS CloudTrail and AWS Config
- Access Control Using AWS IAM and AWS Organizations
- Transforming Application Development Using the AWS Code Suite
- Messaging in the Cloud Using Amazon SNS and Amazon SQS
- Powering Analytics Using Amazon EMR and Amazon Redshift
- Orchestrating Data using AWS Data Pipeline
- Connecting the World with AWS IoT and AWS Greengrass
- Other Books You May Enjoy