Effective Amazon Machine Learning
eBook - ePub

  • 306 pages
  • English
  • ePUB (mobile friendly)
  • Available on iOS & Android

About This Book

Learn to leverage Amazon's powerful platform for your predictive analytics needs.

  • Create great machine learning models that combine the power of algorithms with interactive tools without worrying about the underlying complexity
  • Learn the "What's next?" of machine learning: machine learning on the cloud, with this unique guide
  • Create web services that allow you to perform affordable and fast machine learning on the cloud

Who This Book Is For

This book is intended for data scientists and managers of predictive analytics projects; it will teach beginner- to advanced-level machine learning practitioners how to leverage Amazon Machine Learning and complement their existing Data Science toolbox. No substantive prior knowledge of Machine Learning, Data Science, statistics, or coding is required.

What You Will Learn

  • Use the Amazon Machine Learning service from scratch for predictive analytics
  • Gain hands-on experience of key Data Science concepts
  • Solve classic regression and classification problems
  • Run projects programmatically via the command line and the Python SDK
  • Leverage the Amazon Web Services ecosystem to access extended data sources
  • Implement streaming and advanced projects

In Detail

Predictive analytics is a complex domain requiring coding skills, an understanding of the mathematical concepts underpinning machine learning algorithms, and the ability to create compelling data visualizations. Now that AWS has simplified machine learning, this book will help you bring predictive analytics projects to fruition in three easy steps: data preparation, model tuning, and model selection.

This book will introduce you to the Amazon Machine Learning platform and will implement core data science concepts such as classification, regression, regularization, overfitting, model selection, and evaluation. Furthermore, you will learn to leverage the Amazon Web Services (AWS) ecosystem for extended access to data sources, implement real-time predictions, and run Amazon Machine Learning projects via the command line and the Python SDK. Towards the end of the book, you will also learn how to apply these services to other problems, such as text mining, and to more complex datasets.

Style and approach

This book includes use cases you can relate to. In a very practical manner, you will explore the various capabilities of Amazon Machine Learning services, allowing you to implement them in your environment with consummate ease.

Effective Amazon Machine Learning by Alexis Perrier is available in PDF and ePUB format, alongside other popular books in Computer Science & Data Processing.

Information

Year
2017
ISBN
9781785881794
Edition
1

Command Line and SDK

Using the AWS web interface to manage and run your projects is time-consuming. In this chapter, we move away from the web interface and start running our projects via the command line with the AWS Command Line Interface (AWS CLI) and the Python SDK with the Boto3 library.
The first step will be to drive a whole project via the AWS CLI, uploading files to S3, creating datasources, models, evaluations, and predictions. As you will see, scripting will greatly facilitate using Amazon ML. We will use these new abilities to expand our Data Science powers by carrying out cross-validation and feature selection.
So far we have split our original dataset into three chunks: training, validation, and testing. However, we have seen that model selection can depend strongly on how the data is split: shuffle the data, and a different model may emerge as the best one. Cross-validation is a technique that reduces this dependency by averaging model performance over several data splits. It involves creating many datasources for training, validation, and testing, which would be time-consuming through the web interface. The AWS CLI will allow us to quickly spin up new datasources and models and carry out cross-validation effectively.
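The repetitive part of cross-validation is carving each fold out of the same S3 file, which Amazon ML supports through the DataRearrangement parameter of its datasource-creation calls: a percentBegin/percentEnd slice selects the evaluation rows, and complement selects the training rows. The helper below is a minimal sketch (function name is illustrative, not from the book) that builds those rearrangement strings for k folds:

```python
import json

def fold_rearrangements(k):
    """Build (training, evaluation) DataRearrangement JSON strings for k folds.

    Each fold holds out one contiguous 100/k percent slice for evaluation
    and trains on its complement.
    """
    folds = []
    step = 100 // k
    for i in range(k):
        begin, end = i * step, (i + 1) * step
        evaluation = json.dumps(
            {"splitting": {"percentBegin": begin, "percentEnd": end}})
        training = json.dumps(
            {"splitting": {"percentBegin": begin, "percentEnd": end,
                           "complement": True}})
        folds.append((training, evaluation))
    return folds

# Each string would be passed as the DataRearrangement field of the
# data spec, either to the CLI (aws machinelearning
# create-data-source-from-s3) or to the equivalent Boto3 call.
for training, evaluation in fold_rearrangements(5):
    print(training, evaluation)
```

With k=5 this yields the 0-20, 20-40, ..., 80-100 percent evaluation slices, each paired with its 80 percent training complement.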
Another important technique in data science is feature selection. A large number of features in your dataset, whether the result of intensive feature engineering or simply present in the original data, can hurt the model's performance. It is often possible to significantly improve a model's predictive power by selecting and retaining only the best and most meaningful features while rejecting the less important ones. There are many feature selection methods; we will implement a simple and efficient one called recursive feature selection. The AWS Python SDK, accessible via the Boto3 library, will allow us to build the wrapper code around Amazon ML required for recursive feature selection.
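Recursive feature selection repeatedly evaluates models on shrinking feature subsets, dropping at each round the feature whose removal hurts the score least. The Amazon ML round trip itself (creating a model restricted to a subset, evaluating it, reading back the metric) needs live datasources, so the sketch below captures only the selection loop's pure logic; function names are illustrative, and the `score` callback stands in for the Boto3-driven evaluation:

```python
def candidate_subsets(features):
    """Yield each subset obtained by dropping exactly one feature."""
    for i in range(len(features)):
        yield features[:i] + features[i + 1:]

def recursive_selection(features, score):
    """Greedy backward elimination: repeatedly keep the subset whose
    score is best after removing one feature, until one remains.

    `score(subset)` stands in for the Amazon ML round trip: create a
    model restricted to `subset`, evaluate it, return the metric
    (e.g. AUC). Returns the list of retained subsets per round.
    """
    best = list(features)
    history = [list(best)]
    while len(best) > 1:
        best = max(candidate_subsets(best), key=score)
        history.append(list(best))
    return history

# Toy scorer: pretend features 'a' and 'c' are the informative ones.
useful = {"a", "c"}
scorer = lambda subset: sum(f in useful for f in subset) / len(subset)
print(recursive_selection(["a", "b", "c", "d"], scorer))
```

In the real wrapper, each `score` call would create an ML model and an evaluation via Boto3 and wait for their completion, which is exactly the kind of bookkeeping the SDK makes scriptable.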
In this chapter, you will learn the following:
  • How to handle a whole project workflow through the AWS command line and the AWS Python SDK:
    • Managing data uploads to S3
    • Creating and evaluating models
    • Making and exporting the predictions
  • How to implement cross-validation with the AWS CLI
  • How to implement Recursive Feature Selection with the AWS Python SDK

Getting started and setting up

Creating a performant predictive model from raw data requires much trial and error: creating new features, cleaning up data, and trying out new parameters for the model are all needed to ensure its robustness. There is a constant back and forth between the data, the models, and the evaluations. Scripting this workflow, either via the AWS CLI or with the Boto3 Python library, will give us the ability to speed up the create, test, select loop.

Using the CLI versus SDK

Besides the UI, AWS offers several ways to interact with its services: the CLI, APIs, and SDKs in several languages. Note that the AWS CLI and SDKs do not cover all AWS services; Athena SQL, for instance, being a new service, was not yet included in the AWS CLI or in any of the AWS SDKs at the time of writing.
The AWS Command Line Interface or CLI is a command-line shell program that allows you to manage your AWS services from your shell terminal. Once installed and set up with proper permissions, you can write commands to manage your S3 files, AWS EC2 instances, Amazon ML models, and most AWS services.
Generally speaking, a software development kit, or SDK for short, is a set of tools that can be used to develop software applications targeting a specific platform. In short, the SDK is a wrapper around an API. Where an API holds the core interaction methods, the SDK includes debugging support, documentation, and higher-level functions and methods. The API can be seen as the lowest common denominator that AWS supports and the SDK as a higher-level implementation of the API.
AWS SDKs are available in 12 different languages including PHP, Java, Ruby, and .NET. In this chapter, we will use the Python SDK.
Using the AWS CLI or SDK requires setting up our credentials, which we'll do in the following section.

Installing AWS CLI

In order to set up your CLI credentials, you need your access key ID and your secret access key. You have most probably downloaded and saved them in a previous chapter. If that's not the case, you should simply create new ones from the IAM console (https://console.aws.amazon.com/iam).
Navigate to Users, select your IAM user name, and click on the Security credentials tab. Choose Create Access Key and download the CSV file. Store the keys in a secure location; we will need them in a few minutes to set up the AWS CLI. But first, we need to install it.
Docker environment: this tutorial will help you use the AWS CLI within a Docker container: https://blog.flowlog-stats.com/2016/05/03/aws-cli-in-a-docker-container/. A Docker image for running the AWS CLI is available at https://hub.docker.com/r/fstab/aws-cli/.
There is no need to rewrite the AWS documentation on how to install the AWS CLI; it is complete, up to date, and available at http://docs.aws.amazon.com/cli/latest/userguide/installing.html. In a nutshell, installing the CLI requires you to have Python and pip already installed.
Then, run the following:
 $ pip install --upgrade --user awscli 
Add AWS to your $PATH:
 $ export PATH=~/.local/bin:$PATH 
Reload the bash configuration file (this is for OSX):
 $ source ~/.bash_profile 
Check that everything works with the following command:
 $ aws --version 
You should see something similar to the following output:
 aws-cli/1.11.47 Python/3.5.2 Darwin/15.6.0 botocore/1.5.10 
Once installed, we need to configure the AWS CLI. Type the following:
 $ aws configure 
Now input the access keys you just created:
 $ aws configure

AWS Access Key ID [None]: ABCDEF_THISISANEXAMPLE

AWS Secret Access Key [None]: abcdefghijk_THISISANEXAMPLE
Default region name [None]: us-west-2
Default output format [None]: json
Choose the region that is closest to you and the format you prefer (JSON, text, or table). JSON is the default format.
The aws configure command creates two files: a config file and a credentials file. On OSX, the files are ~/.aws/config and ~/.aws/credentials. You can directly edit these files to change your access or configuration. You will need to create different profiles if you need to access multiple AWS accounts. You can do so via the aws configure command:
 $ aws configure --profile user2 
You can also do so directly in the config and credential files:
 ~/.aws/config

[default]
output = json
region = us-east-1

[profile user2]
output = text
region = us-west-2
You can edit the credentials file as follows:
 ~/.aws/credentials

[default]
aws_access_key_id = ABCDEF_THISISANEXAMPLE
aws_secret_access_key = abcdefghijk_THISISANEXAMPLE

[user2]
aws_access_key_id = ABCDEF_ANOTHERKEY
aws_secret_access_key = abcdefghijk_ANOTHERKEY
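Both files use the standard INI format, so you can inspect them programmatically with Python's configparser; note that the config file prefixes non-default sections with `profile `, while the credentials file does not. The snippet below parses a sample credentials string with placeholder keys rather than touching your real ~/.aws/credentials:

```python
import configparser

# A credentials file with a default profile and a second one, laid out
# as `aws configure --profile user2` would write it (placeholder keys).
sample = """
[default]
aws_access_key_id = ABCDEF_THISISANEXAMPLE
aws_secret_access_key = abcdefghijk_THISISANEXAMPLE

[user2]
aws_access_key_id = ABCDEF_ANOTHERKEY
aws_secret_access_key = abcdefghijk_ANOTHERKEY
"""

parser = configparser.ConfigParser()
parser.read_string(sample)

# List each profile and its access key ID.
for profile in parser.sections():
    print(profile, parser[profile]["aws_access_key_id"])
```

This is also how tools built on the SDK discover your profiles: each `[section]` name is a profile you can select with `--profile` on the CLI.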
Refer to the AWS CLI setup page for more in-depth information:
http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html

Picking up CLI syntax

The overall for...

Table of contents

  1. Title Page
  2. Credits
  3. About the Author
  4. About the Reviewer
  5. www.PacktPub.com
  6. Customer Feedback
  7. Dedication
  8. Preface
  9. Introduction to Machine Learning and Predictive Analytics
  10. Machine Learning Definitions and Concepts
  11. Overview of an Amazon Machine Learning Workflow
  12. Loading and Preparing the Dataset
  13. Model Creation
  14. Predictions and Performances
  15. Command Line and SDK
  16. Creating Datasources from Redshift
  17. Building a Streaming Data Analysis Pipeline