Machine Learning with the Elastic Stack
Gain valuable insights from your data with Elastic Stack's machine learning features, 2nd Edition
- 450 pages
- English
About This Book
Discover expert techniques for combining machine learning with the analytic capabilities of Elastic Stack and uncover actionable insights from your data
Key Features
- Integrate machine learning with distributed search and analytics
- Preprocess and analyze large volumes of search data effortlessly
- Operationalize machine learning in a scalable, production-worthy way
Book Description
Elastic Stack, previously known as the ELK stack, is a log analysis solution that helps users ingest, process, and analyze search data effectively. With the addition of machine learning, a key commercial feature, the Elastic Stack makes this process even more efficient. This updated second edition of Machine Learning with the Elastic Stack provides a comprehensive overview of Elastic Stack's machine learning features for time series data analysis as well as for classification, regression, and outlier detection.
The book starts by explaining machine learning concepts in an intuitive way. You'll then perform time series analysis on different types of data, such as log files, network flows, application metrics, and financial data. As you progress through the chapters, you'll deploy machine learning within Elastic Stack for logging, security, and metrics. Finally, you'll discover how data frame analysis opens up a whole new set of use cases that machine learning can help you with.
By the end of this Elastic Stack book, you'll have hands-on machine learning and Elastic Stack experience, along with the knowledge you need to incorporate machine learning in your distributed search and data analysis platform.
What you will learn
- Find out how to enable the ML commercial feature in the Elastic Stack
- Understand how Elastic machine learning is used to detect different types of anomalies and make predictions
- Apply effective anomaly detection to IT operations, security analytics, and other use cases
- Utilize the results of Elastic ML in custom views, dashboards, and proactive alerting
- Train and deploy supervised machine learning models for real-time inference
- Discover various tips and tricks to get the most out of Elastic machine learning
Who this book is for
If you're a data professional looking to gain insights into Elasticsearch data without having to rely on a machine learning specialist or custom development, then this Elastic Stack machine learning book is for you. You'll also find this book useful if you want to integrate machine learning with your observability, security, and analytics applications. Working knowledge of the Elastic Stack is needed to get the most out of this book.
Section 1 - Getting Started with Machine Learning with Elastic Stack
- Chapter 1, Machine Learning for IT
- Chapter 2, Enabling and Operationalization
Chapter 1: Machine Learning for IT
- Overcoming the historical challenges in IT
- Dealing with the plethora of data
- The advent of automated anomaly detection
- Unsupervised versus supervised ML
- Using unsupervised ML for anomaly detection
- Applying supervised ML to data frame analytics
Overcoming the historical challenges in IT
Dealing with the plethora of data
- Filter/search: Some tools allow the user to define searches to help trim down the data into a more manageable set. While extremely useful, this capability is most often used in an ad hoc fashion once a problem is suspected. Even then, the success of this approach usually hinges on the user knowing what they are looking for and on their level of experience, both in having lived through similar past situations and in the search technology itself.
- Visualizations: Dashboards, charts, and widgets are also extremely useful to help us understand what data has been doing and where it is trending. However, visualizations are passive and require being watched for meaningful deviations to be detected. Once the number of metrics being collected and plotted surpasses the number of eyeballs available to watch them (or even the screen real estate to display them), visual-only analysis becomes less and less useful.
- Thresholds/rules: To get around the requirement that someone physically watch the data before a problem can be caught, many tools allow the user to define rules or conditions that get triggered upon known conditions or known dependencies between items. However, it is unlikely that you can realistically define all appropriate operating ranges or model all of the actual dependencies in today's complex and distributed applications. Plus, the amount and velocity of changes in the application or environment could quickly render any static rule set useless. Analysts find themselves chasing down many false positive alerts, setting up a "boy who cried wolf" paradigm that leads to resentment of the tools generating the alerts and skepticism of the value that alerting could provide.
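To make the weakness of static rules concrete, here is a minimal Python sketch. It is an illustration only and is not how Elastic ML models data: the metric values, the threshold of 70, and the trailing-window z-score check are all assumptions chosen for the example. It compares a fixed threshold with a baseline that follows the recent history of the metric.

```python
from statistics import mean, stdev


def static_alerts(values, threshold):
    """Return the indices where a value exceeds a fixed threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def rolling_alerts(values, window=5, z_limit=3.0):
    """Return the indices where a value deviates from the mean of the
    preceding `window` values by more than `z_limit` standard deviations.
    This is a simple stand-in for an adaptive baseline."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(values[i] - mu) / sigma > z_limit:
            alerts.append(i)
    return alerts


# Hypothetical metric: the normal level shifts from about 50 to about 80
# (for example, after a deployment), and one genuine spike occurs at index 14.
series = [50, 51, 49, 50, 52, 51, 50, 80, 81, 79, 80, 82, 81, 80, 140, 80, 81]

print("static threshold alerts:", static_alerts(series, threshold=70))
print("rolling baseline alerts:", rolling_alerts(series))
```

With this data, the fixed threshold fires on every point after the level shift, while the rolling check fires once when the level changes and once at the spike. The rule written for the old operating range becomes a source of false positives as soon as the environment changes.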
The advent of automated anomaly detection
- Timeliness: Notification of an outage, breach, or other significant anomalous situation should arrive as quickly as possible so that the situation can be mitigated. The cost of downtime or the risk of a continued security compromise is minimized if remedied or contained quickly. Algorithms that cannot keep up with the real-time nature of today's IT data have limited value.
- Scalability: As mentioned earlier, the volume, velocity, and variation of IT data continue to explode in modern IT environments. Algorithms that inspect this vast data must be able to scale linearly with the data to be usable in a practical sense.
- Efficiency: IT budgets are often highly scrutinized for wasteful spending, and many organizations are constantly being asked to do more with less. Tacking on an additional fleet of supercomputers to run algorithms is not practical. Rather, the solution must be able to run on modest commodity hardware with typical specifications.
- Generalizability: While highly specialized data science is often the best way to solve a specific information problem, the diversity of data in IT environments drives a need for something that can be broadly applicable across most use cases. Reusability of the same techniques is much more cost-effective in the long run.
- Adaptability: Ever-changing IT environments will quickly render a brittle algorithm useless. Having to manually train and retrain the ML model would only introduce yet another time-wasting venture that cannot be afforded.
- Accuracy: We already know that alert fatigue from legacy threshold and rule-based systems is a r...
Table of contents
- Machine Learning with the Elastic Stack
- Second Edition
- Preface
- Section 1 - Getting Started with Machine Learning with Elastic Stack
- Chapter 1: Machine Learning for IT
- Chapter 2: Enabling and Operationalization
- Section 2 - Time Series Analysis - Anomaly Detection and Forecasting
- Chapter 3: Anomaly Detection
- Chapter 4: Forecasting
- Chapter 5: Interpreting Results
- Chapter 6: Alerting on ML Analysis
- Chapter 7: AIOps and Root Cause Analysis
- Chapter 8: Anomaly Detection in Other Elastic Stack Apps
- Section 3 - Data Frame Analysis
- Chapter 9: Introducing Data Frame Analytics
- Chapter 10: Outlier Detection
- Chapter 11: Classification Analysis
- Chapter 12: Regression
- Chapter 13: Inference
- Appendix: Anomaly Detection Tips
- Other Books You May Enjoy