eBook - ePub

Machine Learning for Cybersecurity Cookbook

Name: Machine Learning for Cybersecurity Cookbook
Author: Emmanuel Tsukerman

Over 80 recipes on how to implement machine learning algorithms for building security systems using Python

Emmanuel Tsukerman

346 pages
English
ePUB (mobile friendly)
Available on iOS and Android

About this book

Learn how to apply modern AI to create powerful cybersecurity solutions for malware, pentesting, social engineering, data privacy, and intrusion detection

Key Features

Manage data of varying complexity to protect your system using the Python ecosystem
Apply ML to pentesting, malware, data privacy, intrusion detection systems (IDS), and social engineering
Automate your daily workflow by addressing various security challenges using the recipes covered in the book

Book Description

Organizations today face a major threat in terms of cybersecurity, from malicious URLs to credential reuse, and having robust security systems can make all the difference. With this book, you'll learn how to use Python libraries such as TensorFlow and scikit-learn to implement the latest artificial intelligence (AI) techniques and handle challenges faced by cybersecurity researchers.

You'll begin by exploring various machine learning (ML) techniques and tips for setting up a secure lab environment. Next, you'll implement key ML algorithms such as clustering, gradient boosting, random forest, and XGBoost. The book will guide you through constructing classifiers and features for malware, which you'll train and test on real samples. As you progress, you'll build self-learning, reliant systems to handle cybersecurity tasks such as identifying malicious URLs, spam email detection, intrusion detection, network protection, and tracking user and process behavior. Later, you'll apply generative adversarial networks (GANs) and autoencoders to advanced security tasks. Finally, you'll delve into secure and private AI to protect the privacy rights of consumers using your ML models.

By the end of this book, you'll have the skills you need to tackle real-world problems faced in the cybersecurity domain using a recipe-based approach.

What you will learn

Learn how to build malware classifiers to detect suspicious activities
Apply ML to generate custom malware to pentest your security
Use ML algorithms with complex datasets to implement cybersecurity concepts
Create neural networks to identify fake videos and images
Secure your organization from one of the most common threats: insider threats
Defend against zero-day threats by constructing an anomaly detection system
Detect web vulnerabilities effectively by combining Metasploit and ML
Understand how to train a model without exposing the training data

Who this book is for

This book is for cybersecurity professionals and security researchers who are looking to implement the latest machine learning techniques to boost computer security, and gain insights into securing an organization using red and blue team ML. This recipe-based book will also be useful for data scientists and machine learning developers who want to experiment with smart techniques in the cybersecurity domain. Working knowledge of Python programming and familiarity with cybersecurity fundamentals will help you get the most out of this book.

Information

Publisher: Packt Publishing
Year: 2019
ISBN: 9781838556341
Topics: Computer Science, Cyber Security

Automatic Intrusion Detection

An intrusion detection system monitors a network or a collection of systems for malicious activity or policy violations. Any malicious activity or violation caught is stopped or reported. In this chapter, we will design and implement several intrusion detection systems using machine learning. We will begin with the classical problem of detecting spam email. We will then move on to classifying malicious URLs. We will take a brief detour to explain how to capture network traffic, so that we may tackle more challenging network problems, such as botnet and DDoS detection. We will construct a classifier for insider threats. Finally, we will address the example-dependent, cost-sensitive, radically imbalanced, and challenging problem of credit card fraud.

This chapter contains the following recipes:

Spam filtering using machine learning
Phishing URL detection
Capturing network traffic
Network behavior anomaly detection
Botnet traffic detection
Feature engineering for insider threat detection
Employing anomaly detection for insider threats
Detecting DDoS
Credit card fraud detection
Counterfeit bank note detection
Ad blocking using machine learning
Wireless indoor localization

Technical requirements

The following are the technical prerequisites for this chapter:

Wireshark
PyShark
costcla
scikit-learn
pandas
NumPy

Code and datasets may be found at https://github.com/PacktPublishing/Machine-Learning-for-Cybersecurity-Cookbook/tree/master/Chapter06.

Spam filtering using machine learning

Spam (unwanted email) constitutes around 60% of global email traffic. Although spam detection software has progressed since the first spam message in 1978, anyone with an email account knows that spam continues to be a time-consuming and expensive problem. Here, we provide a recipe for spam-ham (non-spam) classification using machine learning.

Getting ready

Preparation for this recipe involves installing the scikit-learn package with pip. The command is as follows:

pip install scikit-learn

In addition, extract spamassassin-public-corpus.7z into a folder named spamassassin-public-corpus.

How to do it...

In the following steps, we build a classifier for wanted and unwanted email:

Unzip the spamassassin-public-corpus.7z dataset.

Specify the path of your spam and ham directories:

import os

spam_emails_path = os.path.join("spamassassin-public-corpus", "spam")
ham_emails_path = os.path.join("spamassassin-public-corpus", "ham")
# Each directory is paired with its class label: spam -> 0, ham -> 1
labeled_file_directories = [(spam_emails_path, 0), (ham_emails_path, 1)]

Create labels for the two classes and read the emails into a corpus:

email_corpus = []
labels = []

for class_files, label in labeled_file_directories:
    files = os.listdir(class_files)
    for file in files:
        file_path = os.path.join(class_files, file)
        try:
            with open(file_path, "r") as currentFile:
                # Read the whole email and drop line breaks
                email_content = currentFile.read().replace("\n", "")
                email_content = str(email_content)
                email_corpus.append(email_content)
                labels.append(label)
        except Exception:
            # Skip files that cannot be read or decoded as text
            pass
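The loop above silently drops any file it cannot read, so it is not obvious how many emails actually reach the corpus. The following is a minimal sketch of a variant that records the skipped paths. The helper name `read_labeled_emails`, the explicit `utf-8` encoding, and the temporary demo files are assumptions made for illustration and are not part of the original recipe. Because the recipe relies on the platform's default encoding, a different encoding may skip a different set of files and change the reported results.

```python
import os
import tempfile


def read_labeled_emails(labeled_dirs, encoding="utf-8"):
    """Read every file in each (directory, label) pair; track unreadable files."""
    corpus, labels, skipped = [], [], []
    for directory, label in labeled_dirs:
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            try:
                with open(path, "r", encoding=encoding) as handle:
                    # Same normalisation as the recipe: drop line breaks
                    content = handle.read().replace("\n", "")
            except (UnicodeDecodeError, OSError):
                skipped.append(path)
                continue
            corpus.append(content)
            labels.append(label)
    return corpus, labels, skipped


# Demo on throwaway files (hypothetical data, not the SpamAssassin corpus)
with tempfile.TemporaryDirectory() as root:
    spam_dir = os.path.join(root, "spam")
    ham_dir = os.path.join(root, "ham")
    os.makedirs(spam_dir)
    os.makedirs(ham_dir)
    with open(os.path.join(spam_dir, "offer.txt"), "w", encoding="utf-8") as f:
        f.write("Buy now\nLimited offer")
    with open(os.path.join(spam_dir, "broken.bin"), "wb") as f:
        f.write(b"\xff\xfe\x00bad")  # not valid UTF-8, so it is skipped
    with open(os.path.join(ham_dir, "note.txt"), "w", encoding="utf-8") as f:
        f.write("See you at noon")

    corpus, labels, skipped = read_labeled_emails([(spam_dir, 0), (ham_dir, 1)])
    print(len(corpus), "emails read,", len(skipped), "skipped")
```

Reporting the number of skipped files makes it easier to judge whether the corpus that reaches the classifier is representative of the dataset on disk.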

Train-test split the dataset:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    email_corpus, labels, test_size=0.2, random_state=11
)

Train an NLP pipeline on the training data:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn import tree

nlp_followed_by_dt = Pipeline(
    [
        ("vect", HashingVectorizer(input="content", ngram_range=(1, 3))),
        ("tfidf", TfidfTransformer(use_idf=True)),
        ("dt", tree.DecisionTreeClassifier(class_weight="balanced")),
    ]
)
nlp_followed_by_dt.fit(X_train, y_train)

Evaluate the classifier on the testing data:

from sklearn.metrics import accuracy_score, confusion_matrix

y_test_pred = nlp_followed_by_dt.predict(X_test)
print(accuracy_score(y_test, y_test_pred))
print(confusion_matrix(y_test, y_test_pred))

The following is the output:

0.9761620977353993
[[291   7]
 [ 13 528]]
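The printed confusion matrix can be unpacked by hand. scikit-learn puts true labels on the rows and predicted labels on the columns, in sorted label order, so with the labels assigned in Step 2 the first row is spam (0) and the second is ham (1). The sketch below only redoes the arithmetic on the numbers shown above. The variable names are illustrative.

```python
# Confusion matrix from the recipe output:
# rows = true label, columns = predicted label, order = 0 (spam), 1 (ham)
confusion = [[291, 7],
             [13, 528]]

spam_as_spam, spam_as_ham = confusion[0]
ham_as_spam, ham_as_ham = confusion[1]
total = spam_as_spam + spam_as_ham + ham_as_spam + ham_as_ham

# Fraction of all test emails classified correctly
accuracy = (spam_as_spam + ham_as_ham) / total
# Of the true spam emails, how many were caught
spam_recall = spam_as_spam / (spam_as_spam + spam_as_ham)
# Of the emails flagged as spam, how many really were spam
spam_precision = spam_as_spam / (spam_as_spam + ham_as_spam)
# Share of legitimate emails wrongly flagged as spam
ham_false_positive_rate = ham_as_spam / (ham_as_spam + ham_as_ham)
# Share of spam in the test set, as a rough check on class balance
spam_share = (spam_as_spam + spam_as_ham) / total

print(f"test emails:             {total}")
print(f"accuracy:                {accuracy:.4f}")
print(f"spam recall:             {spam_recall:.4f}")
print(f"spam precision:          {spam_precision:.4f}")
print(f"ham false-positive rate: {ham_false_positive_rate:.4f}")
print(f"spam share of test set:  {spam_share:.4f}")
```

For a mail filter, the ham false-positive rate is often the number to watch, since a legitimate email sent to the spam folder usually costs more than a spam message that slips through.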

How it works…

We start by preparing a dataset consisting of raw emails (Step 1), which the reader can examine by looking at the dataset. In Step 2, we specify the paths of the spam and ham emails and assign a label to each directory (0 for spam, 1 for ham). In Step 3, we read all of the emails into an array and create a matching array of labels. Next, we train-test split our dataset (Step 4), and then fit an NLP pipeline on the training portion in Step 5. Finally, in Step 6, we test our pipeline on the held-out emails. The accuracy is about 97.6%. Since the dataset is relatively balanced (roughly 36% of the test emails are spam), there is no need to use special metrics to evaluate success.
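To make the first two stages of the Step 5 pipeline more concrete, here is a toy, standard-library-only illustration of the idea behind them: word n-grams are hashed into a fixed number of buckets, and the bucket counts are then re-weighted by TF-IDF and L2-normalised. This is a sketch under simplifying assumptions, not scikit-learn's implementation: it uses MD5 and 32 buckets instead of scikit-learn's signed MurmurHash3 over 2**20 features, and the function names and sample texts are invented for the example. The IDF formula is the smoothed form that `TfidfTransformer` uses by default.

```python
import hashlib
import math
import re
from collections import Counter

N_BUCKETS = 32  # deliberately tiny so collisions are easy to see


def word_ngrams(text, ngram_range=(1, 3)):
    """Lower-case the text and return all word n-grams in the given range."""
    tokens = re.findall(r"\w\w+", text.lower())
    low, high = ngram_range
    return [
        " ".join(tokens[i:i + n])
        for n in range(low, high + 1)
        for i in range(len(tokens) - n + 1)
    ]


def bucket(term, n_buckets=N_BUCKETS):
    """Map a term to a bucket index with a deterministic hash."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_buckets


def hashed_counts(text):
    """Count how many n-grams of the text fall into each bucket."""
    return Counter(bucket(term) for term in word_ngrams(text))


def tfidf(docs):
    """Weight hashed counts by smoothed IDF, then L2-normalise each vector."""
    counts = [hashed_counts(doc) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(b for c in counts for b in c)
    vectors = []
    for c in counts:
        weighted = {
            b: tf * (math.log((1 + n_docs) / (1 + doc_freq[b])) + 1)
            for b, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in weighted.values())) or 1.0
        vectors.append({b: w / norm for b, w in weighted.items()})
    return vectors


docs = ["win free money now", "meeting notes attached", "free money free money"]
vectors = tfidf(docs)
for doc, vec in zip(docs, vectors):
    print(doc, "->", len(vec), "non-zero buckets")
```

Because the hash fixes the dimensionality in advance, no vocabulary has to be stored or fitted, which keeps memory bounded even with n-grams up to length three. The trade-off is that distinct n-grams can collide in the same bucket and that features can no longer be mapped back to words.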

Phishing URL detection

A phishing website is a website that tries to obtain your account password or other personal information by making you think that you are on a legitimate website. S...

Table of contents

Title Page
Copyright and Credits
About Packt
Contributors
Preface
Machine Learning for Cybersecurity
Machine Learning-Based Malware Detection
Advanced Malware Detection
Machine Learning for Social Engineering
Penetration Testing Using Machine Learning
Automatic Intrusion Detection
Securing and Attacking Data with Machine Learning
Secure and Private AI
Appendix
Other Books You May Enjoy

Citation styles for Machine Learning for Cybersecurity Cookbook

APA 6 Citation

Tsukerman, E. (2019). Machine Learning for Cybersecurity Cookbook (1st ed.). Packt Publishing. Retrieved from https://www.perlego.com/book/1284230/machine-learning-for-cybersecurity-cookbook-over-80-recipes-on-how-to-implement-machine-learning-algorithms-for-building-security-systems-using-python-pdf (Original work published 2019)

Chicago Citation

Tsukerman, Emmanuel. (2019) 2019. Machine Learning for Cybersecurity Cookbook. 1st ed. Packt Publishing. https://www.perlego.com/book/1284230/machine-learning-for-cybersecurity-cookbook-over-80-recipes-on-how-to-implement-machine-learning-algorithms-for-building-security-systems-using-python-pdf.

Harvard Citation

Tsukerman, E. (2019) Machine Learning for Cybersecurity Cookbook. 1st edn. Packt Publishing. Available at: https://www.perlego.com/book/1284230/machine-learning-for-cybersecurity-cookbook-over-80-recipes-on-how-to-implement-machine-learning-algorithms-for-building-security-systems-using-python-pdf (Accessed: 14 October 2022).

MLA 7 Citation

Tsukerman, Emmanuel. Machine Learning for Cybersecurity Cookbook. 1st ed. Packt Publishing, 2019. Web. 14 Oct. 2022.