Machine Learning with R Quick Start Guide
A beginner's guide to implementing machine learning techniques from scratch using R 3.5
- 250 pages
- English
About This Book
Learn how to use R to apply powerful machine learning methods and gain insight into real-world applications using clustering, logistic regression, random forests, support vector machines, and more.
Key Features
- Use R 3.5 to implement real-world examples in machine learning
- Implement key machine learning algorithms to understand the working mechanism of smart models
- Create end-to-end machine learning pipelines using modern libraries from the R ecosystem
Book Description
Machine Learning with R Quick Start Guide takes you on a data-driven journey that starts with the very basics of R and machine learning. It gradually builds upon core concepts so you can handle the varied complexities of data and understand each stage of the machine learning pipeline.
From data collection to implementing Natural Language Processing (NLP), this book covers it all. You will implement key machine learning algorithms to understand how they are used to build smart models. You will cover techniques such as clustering, logistic regression, random forests, support vector machines, and more. You will also look at more advanced topics such as training neural networks and topic modeling.
By the end of the book, you will be able to apply the concepts of machine learning, deal with data-related problems, and solve them using the powerful yet simple language that is R.
What you will learn
- Introduce yourself to the basics of machine learning with R 3.5
- Get to grips with R techniques for cleaning and preparing your data for analysis and for visualizing your results
- Learn to build predictive models with the help of various machine learning techniques
- Use R to visualize data spread across multiple dimensions and extract useful features
- Use interactive data analysis with R to get insights into data
- Implement supervised and unsupervised learning, and NLP using R libraries
Who this book is for
This book is for graduate students, aspiring data scientists, and data analysts who wish to enter the field of machine learning and are looking to implement machine learning techniques and methodologies from scratch using R 3.5. A working knowledge of the R programming language is expected.
Predicting Failures of Banks - Multivariate Analysis
- Logistic regression
- Regularized methods
- Testing a random forest model
- Gradient boosting
- Deep learning in neural networks
- Support vector machines
- Ensembles
- Automatic machine learning
Logistic regression
set.seed(1234)
LogisticRegression=glm(train$Default~.,data=train[,2:ncol(train)],family=binomial())
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(LogisticRegression)
##
## Call:
## glm(formula = train$Default ~ ., family = binomial(), data = train[,
## 2:ncol(train)])
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -3.9330  -0.0210  -0.0066  -0.0013   4.8724
##
## Coefficients:
##                   Estimate   Std. Error z value Pr(>|z|)
## (Intercept) -11.7599825009 6.9560247460  -1.691   0.0909 .
## UBPRE395     -0.0575725641 0.0561441397  -1.025   0.3052
## UBPRE543      0.0014008963 0.0294470630   0.048   0.9621
## ....                 .....         ....    ....     ....
## UBPRE021     -0.0114148389 0.0057016025  -2.002   0.0453 *
## UBPRE023      0.4950212919 0.2459506994   2.013   0.0441 *
## UBPRK447     -0.0210028916 0.0192296299  -1.092   0.2747
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2687.03 on 7090 degrees of freedom
## Residual deviance: 284.23 on 6982 degrees of freedom
## AIC: 502.23
##
## Number of Fisher Scoring iterations: 13
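A fitted logistic regression is usually judged by scoring rows it has not seen. The following is a minimal sketch on synthetic data, not the bank dataset from the chapter: the data frame toy, its columns ratio_a and ratio_b, the 70/30 split, and the 0.5 cutoff are all assumptions made for illustration.

```r
# Synthetic stand-in for the bank data (hypothetical columns).
set.seed(1234)
n <- 500
toy <- data.frame(ratio_a = rnorm(n), ratio_b = rnorm(n))

# Draw the 0/1 response from log-odds that depend on both ratios.
toy$Default <- rbinom(n, size = 1,
                      prob = plogis(-1 + 1.5 * toy$ratio_a - toy$ratio_b))

# Hold out 30% of the rows for testing.
test_idx  <- sample(n, size = 0.3 * n)
toy_train <- toy[-test_idx, ]
toy_test  <- toy[test_idx, ]

# With the response named in the formula, `.` means every other column.
toy_fit <- glm(Default ~ ., data = toy_train, family = binomial())

# type = "response" returns probabilities rather than log-odds.
toy_prob <- predict(toy_fit, newdata = toy_test, type = "response")

# Turn probabilities into classes with an assumed 0.5 cutoff.
toy_pred <- as.integer(toy_prob > 0.5)

# Confusion matrix of observed versus predicted classes.
print(table(observed = toy_test$Default, predicted = toy_pred))
```

The cutoff is a modeling choice. With a rare event such as a bank failure, a threshold below 0.5 may be more appropriate.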
Regularized methods
- Lasso
- Ridge
- Elastic net
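The three methods can be read as special cases of a single penalized objective. The formula below uses the parameterization common to glmnet-style solvers, which is assumed here rather than taken from the excerpt: \ell is the log-likelihood, N the number of observations, \lambda the penalty strength, and \alpha the mixing parameter.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\,\ell(\beta_0,\beta)
  \;+\; \lambda\left[\,\alpha\,\lVert\beta\rVert_1
  \;+\; \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\,\right]
```

Setting \alpha = 1 leaves only the L1 term (lasso), \alpha = 0 leaves only the L2 term (ridge), and any value in between gives an elastic net.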
library(h2o)
h2o.init()
train$Default<-as.factor(train$Default)
test$Default<-as.factor(test$Default)
as.h2o(train[,2:ncol(train)],destination_frame="train")
as.h2o(test[,2:ncol(test)],destination_frame="test")
h2o.ls()
## key
## 1 test
## 2 train
- Gaussian regression
- Poisson regression
- Binomial regression (classification)
- Multinomial classification
- Gamma regression
- Ordinal regression
- model_id: The name by which the model can later be referenced.
- training_frame: The dataset used to build and train the model.
- validation_frame: The dataset used to check the accuracy of the model.
- nfolds: The number of folds to use for cross-validation. In our case, the nfolds value is 5.
- seed: The seed for the random number generator (RNG), which the algorithm uses for any component that requires random numbers.
- response_column: The column to use as the dependent variable. In our case, the column is named Default.
- ignored_columns: The variables to exclude from the training process. In our case, all of the variables are considered relevant, so none are ignored.
- ignore_const_cols: A flag that tells the package to ignore constant variables.
- family: This specifies the model type. In our case, we want to train a regression model, so the family sho...
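The parameters above might be combined in a call like the following. This is a hedged sketch, not the call from the book. In the R interface, h2o.glm takes the response as y and the predictors as x, so response_column and ignored_columns appear here in that form. The wrapper name fit_regularized_glm, the model_id value, the seed, family = "binomial" (chosen because Default is a two-level factor), and the use of alpha and lambda_search are all assumptions.

```r
# Sketch only: the function body runs nothing until it is called, and it
# expects a running H2O cluster (h2o.init()) holding frames keyed
# "train" and "test", as created earlier in the text.
fit_regularized_glm <- function(alpha = 1) {
  train_hex <- h2o::h2o.getFrame("train")
  test_hex  <- h2o::h2o.getFrame("test")

  h2o::h2o.glm(
    y = "Default",                                     # response column
    x = setdiff(h2o::colnames(train_hex), "Default"),  # no columns ignored
    training_frame    = train_hex,
    validation_frame  = test_hex,
    model_id          = "glm_sketch",  # hypothetical name
    nfolds            = 5,
    seed              = 1234,          # assumed value
    ignore_const_cols = TRUE,
    family            = "binomial",    # assumed: Default is a two-level factor
    alpha             = alpha,         # 1 = lasso, 0 = ridge, between = elastic net
    lambda_search     = TRUE           # assumed: let H2O search over lambda
  )
}
```

Calling fit_regularized_glm(alpha = 0) would then request a ridge fit, and fit_regularized_glm(alpha = 0.5) an elastic net, under the same assumptions.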
Table of contents
- Title Page
- Copyright and Credits
- About Packt
- Contributors
- Preface
- R Fundamentals for Machine Learning
- Predicting Failures of Banks - Data Collection
- Predicting Failures of Banks - Descriptive Analysis
- Predicting Failures of Banks - Univariate Analysis
- Predicting Failures of Banks - Multivariate Analysis
- Visualizing Economic Problems in the European Union
- Sovereign Crisis - NLP and Topic Modeling
- Other Books You May Enjoy