Machine Learning for Cybersecurity Cookbook
eBook - ePub

Machine Learning for Cybersecurity Cookbook

Over 80 recipes on how to implement machine learning algorithms for building security systems using Python

Emmanuel Tsukerman

  1. 346 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

About this book

Learn how to apply modern AI to create powerful cybersecurity solutions for malware, pentesting, social engineering, data privacy, and intrusion detection

Key Features

  • Manage data of varying complexity to protect your system using the Python ecosystem
  • Apply ML to pentesting, malware, data privacy, intrusion detection systems (IDS), and social engineering
  • Automate your daily workflow by addressing various security challenges using the recipes covered in the book

Book Description

Organizations today face a major threat in terms of cybersecurity, from malicious URLs to credential reuse, and having robust security systems can make all the difference. With this book, you'll learn how to use Python libraries such as TensorFlow and scikit-learn to implement the latest artificial intelligence (AI) techniques and handle challenges faced by cybersecurity researchers.

You'll begin by exploring various machine learning (ML) techniques and tips for setting up a secure lab environment. Next, you'll implement key ML algorithms such as clustering, gradient boosting, random forest, and XGBoost. The book will guide you through constructing classifiers and features for malware, which you'll train and test on real samples. As you progress, you'll build self-learning, reliant systems to handle cybersecurity tasks such as identifying malicious URLs, spam email detection, intrusion detection, network protection, and tracking user and process behavior. Later, you'll apply generative adversarial networks (GANs) and autoencoders to advanced security tasks. Finally, you'll delve into secure and private AI to protect the privacy rights of consumers using your ML models.

By the end of this book, you'll have the skills you need to tackle real-world problems faced in the cybersecurity domain using a recipe-based approach.

What you will learn

  • Learn how to build malware classifiers to detect suspicious activities
  • Apply ML to generate custom malware to pentest your security
  • Use ML algorithms with complex datasets to implement cybersecurity concepts
  • Create neural networks to identify fake videos and images
  • Secure your organization from one of the most popular threats – insider threats
  • Defend against zero-day threats by constructing an anomaly detection system
  • Detect web vulnerabilities effectively by combining Metasploit and ML
  • Understand how to train a model without exposing the training data

Who this book is for

This book is for cybersecurity professionals and security researchers who are looking to implement the latest machine learning techniques to boost computer security, and gain insights into securing an organization using red and blue team ML. This recipe-based book will also be useful for data scientists and machine learning developers who want to experiment with smart techniques in the cybersecurity domain. Working knowledge of Python programming and familiarity with cybersecurity fundamentals will help you get the most out of this book.

Information

Year
2019
ISBN
9781838556341

Automatic Intrusion Detection

An intrusion detection system monitors a network or a collection of systems for malicious activity or policy violations. Any malicious activity or violation caught is stopped or reported. In this chapter, we will design and implement several intrusion detection systems using machine learning. We will begin with the classical problem of detecting spam email. We will then move on to classifying malicious URLs. We will take a brief detour to explain how to capture network traffic, so that we may tackle more challenging network problems, such as botnet and DDoS detection. We will construct a classifier for insider threats. Finally, we will address the example-dependent, cost-sensitive, radically imbalanced, and challenging problem of credit card fraud.
This chapter contains the following recipes:
  • Spam filtering using machine learning
  • Phishing URL detection
  • Capturing network traffic
  • Network behavior anomaly detection
  • Botnet traffic detection
  • Feature engineering for insider threat detection
  • Employing anomaly detection for insider threats
  • Detecting DDoS
  • Credit card fraud detection
  • Counterfeit bank note detection
  • Ad blocking using machine learning
  • Wireless indoor localization

Technical requirements

The following are the technical prerequisites for this chapter:
  • Wireshark
  • PyShark
  • costcla
  • scikit-learn
  • pandas
  • NumPy
Code and datasets may be found at https://github.com/PacktPublishing/Machine-Learning-for-Cybersecurity-Cookbook/tree/master/Chapter06.

Spam filtering using machine learning

Spam emails (unwanted emails) constitute around 60% of global email traffic. Although spam detection software has progressed since the first spam message in 1978, anyone with an email account knows that spam continues to be a time-consuming and expensive problem. Here, we provide a recipe for spam-ham (non-spam) classification using machine learning.

Getting ready

Preparation for this recipe involves installing the scikit-learn package with pip. The command is as follows:
pip install scikit-learn
In addition, extract spamassassin-public-corpus.7z into a folder named spamassassin-public-corpus.
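Before running the recipe, it can help to confirm that the archive was extracted into the layout the code expects. The following is a minimal sketch, assuming the corpus folder contains spam and ham subdirectories as the recipe's paths suggest; the helper name count_corpus_files is hypothetical and not part of the book's code.

```python
import os


def count_corpus_files(corpus_root, classes=("spam", "ham")):
    """Return {class_name: number_of_files} for each class subdirectory.

    Hypothetical helper: raises FileNotFoundError if a subdirectory is missing.
    """
    counts = {}
    for name in classes:
        class_dir = os.path.join(corpus_root, name)
        if not os.path.isdir(class_dir):
            raise FileNotFoundError(f"Expected directory not found: {class_dir}")
        # Count regular files only, ignoring any nested directories
        counts[name] = sum(
            os.path.isfile(os.path.join(class_dir, entry))
            for entry in os.listdir(class_dir)
        )
    return counts
```

Calling count_corpus_files("spamassassin-public-corpus") from the working directory should then report how many files each class contains, or fail early if the extraction went to a different location.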

How to do it...

In the following steps, we build a classifier for wanted and unwanted email:
  1. Unzip the spamassassin-public-corpus.7z dataset.
  2. Specify the path of your spam and ham directories:
import os

spam_emails_path = os.path.join("spamassassin-public-corpus", "spam")
ham_emails_path = os.path.join("spamassassin-public-corpus", "ham")
labeled_file_directories = [(spam_emails_path, 0), (ham_emails_path, 1)]
  3. Create labels for the two classes and read the emails into a corpus:
email_corpus = []
labels = []

for class_files, label in labeled_file_directories:
    files = os.listdir(class_files)
    for file in files:
        file_path = os.path.join(class_files, file)
        try:
            with open(file_path, "r") as currentFile:
                # Flatten each email into a single line of text
                email_content = currentFile.read().replace("\n", "")
                email_corpus.append(email_content)
                labels.append(label)
        except (UnicodeDecodeError, OSError):
            # Skip files that cannot be opened or decoded as text
            pass
  4. Train-test split the dataset:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    email_corpus, labels, test_size=0.2, random_state=11
)
  5. Train an NLP pipeline on the training data:
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn import tree

nlp_followed_by_dt = Pipeline(
    [
        ("vect", HashingVectorizer(input="content", ngram_range=(1, 3))),
        ("tfidf", TfidfTransformer(use_idf=True)),
        ("dt", tree.DecisionTreeClassifier(class_weight="balanced")),
    ]
)
nlp_followed_by_dt.fit(X_train, y_train)
  6. Evaluate the classifier on the testing data:
from sklearn.metrics import accuracy_score, confusion_matrix

y_test_pred = nlp_followed_by_dt.predict(X_test)
print(accuracy_score(y_test, y_test_pred))
print(confusion_matrix(y_test, y_test_pred))
The following is the output:
0.9761620977353993
[[291   7]
 [ 13 528]]
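The printed confusion matrix can be read more closely by deriving per-class rates from it. The sketch below hard-codes the matrix from the output above and assumes scikit-learn's convention (rows are true labels, columns are predicted labels) together with the recipe's labeling of spam as 0 and ham as 1; the variable names are illustrative only.

```python
import numpy as np

# Confusion matrix copied from the recipe output.
# Rows: true class, columns: predicted class; index 0 = spam, 1 = ham.
cm = np.array([[291, 7],
               [13, 528]])

# Correct predictions lie on the diagonal
accuracy = np.trace(cm) / cm.sum()

# Of all true spam, the fraction predicted as spam
spam_recall = cm[0, 0] / cm[0].sum()

# Of all emails predicted as spam, the fraction that really is spam
spam_precision = cm[0, 0] / cm[:, 0].sum()

# Of all true ham, the fraction wrongly predicted as spam
ham_flagged_rate = cm[1, 0] / cm[1].sum()

# Share of spam in the test set, a quick check on class balance
spam_share = cm[0].sum() / cm.sum()

print(f"accuracy:         {accuracy:.4f}")
print(f"spam recall:      {spam_recall:.4f}")
print(f"spam precision:   {spam_precision:.4f}")
print(f"ham flagged rate: {ham_flagged_rate:.4f}")
print(f"spam share:       {spam_share:.4f}")
```

The accuracy computed this way agrees with the value printed by accuracy_score, which is a useful consistency check when copying results by hand.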

How it works…

We start by preparing a dataset consisting of raw emails (Step 1), which you can inspect by opening the extracted files. In Step 2, we specify the paths of the spam and ham emails and assign a label to each directory. We then read all of the emails into a list and build a matching list of labels in Step 3. Next, we train-test split our dataset (Step 4) and fit an NLP pipeline on the training portion in Step 5. Finally, in Step 6, we evaluate the pipeline on the test portion and obtain an accuracy of about 97.6%. The test set contains 298 spam and 541 ham emails, so the classes are relatively balanced and there is no need to use special metrics to evaluate success.
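To see the pipeline from Step 5 in isolation, the following sketch fits the same three stages on a handful of made-up messages. The example texts are invented for illustration and do not come from the SpamAssassin corpus, so the predictions say nothing about real-world performance; random_state=0 is an addition that only makes the toy run repeatable.

```python
from sklearn import tree
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

# Invented toy messages; label 0 = spam, 1 = ham, matching the recipe.
toy_emails = [
    "win a free prize now click here",
    "cheap pills free offer click now",
    "meeting agenda for monday attached",
    "please review the quarterly report",
]
toy_labels = [0, 0, 1, 1]

toy_pipeline = Pipeline(
    [
        # Hash word 1- to 3-grams into a fixed-width sparse vector
        ("vect", HashingVectorizer(input="content", ngram_range=(1, 3))),
        # Reweight the hashed counts by inverse document frequency
        ("tfidf", TfidfTransformer(use_idf=True)),
        # Classify the weighted vectors with a decision tree
        ("dt", tree.DecisionTreeClassifier(class_weight="balanced", random_state=0)),
    ]
)
toy_pipeline.fit(toy_emails, toy_labels)

# Inspect the features that reach the classifier (all stages but the last)
features = toy_pipeline[:-1].transform(toy_emails)
print(features.shape)

# Predictions on the training messages themselves
print(toy_pipeline.predict(toy_emails))
```

Because HashingVectorizer is stateless, the feature width is fixed in advance (2**20 columns by default) and does not depend on the vocabulary of the training emails, which is one reason it suits a corpus of raw, unnormalized messages.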

Phishing URL detection

A phishing website is a website that tries to obtain your account password or other personal information by making you think that you are on a legitimate website. S...

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Contributors
  5. Preface
  6. Machine Learning for Cybersecurity
  7. Machine Learning-Based Malware Detection
  8. Advanced Malware Detection
  9. Machine Learning for Social Engineering
  10. Penetration Testing Using Machine Learning
  11. Automatic Intrusion Detection
  12. Securing and Attacking Data with Machine Learning
  13. Secure and Private AI
  14. Appendix
  15. Other Books You May Enjoy