eBook - ePub

Building Data Streaming Applications with Apache Kafka

Author: Manish Kumar, Chanchal Singh


278 pages
English
ePUB (mobile friendly)


About This Book

Design and administer fast, reliable enterprise messaging systems with Apache Kafka

About This Book

• Build efficient real-time streaming applications in Apache Kafka to process streams of data
• Master the core Kafka APIs to set up Apache Kafka clusters and start writing message producers and consumers
• A comprehensive guide to help you get a solid grasp of Apache Kafka concepts, with practical examples

Who This Book Is For

If you want to learn how to use Apache Kafka and the different tools in the Kafka ecosystem in the easiest possible manner, this book is for you. Some programming experience with Java is required to get the most out of this book.

What You Will Learn

• Learn the basics of Apache Kafka from scratch
• Use the basic building blocks of a streaming application
• Design effective streaming applications with Kafka using Spark, Storm, and Heron
• Understand the importance of a low-latency, high-throughput, and fault-tolerant messaging system
• Carry out effective capacity planning when deploying your Kafka application
• Understand and implement the best security practices

In Detail

Apache Kafka is a popular distributed streaming platform that acts as a messaging queue or an enterprise messaging system. It lets you publish and subscribe to a stream of records, and process them in a fault-tolerant way as they occur.

This book is a comprehensive guide to designing and architecting enterprise-grade streaming applications using Apache Kafka and other big data tools. It includes best practices for building such applications, and tackles some common challenges such as how to use Kafka efficiently and handle high data volumes with ease. This book first takes you through the types of messaging systems and then provides a thorough introduction to Apache Kafka and its internal details.

The second part of the book takes you through designing streaming applications using various frameworks and tools such as Apache Spark, Apache Storm, and more. Once you grasp the basics, we will take you through more advanced concepts in Apache Kafka such as capacity planning and security.

By the end of this book, you will have all the information you need to be comfortable with using Apache Kafka, and to design efficient streaming data applications with it.

Style and approach

A step-by-step, comprehensive guide filled with practical and real-world examples.


Information

Publisher: Packt Publishing
Year: 2017
ISBN: 9781787287631
Topic: Computer Science
Subtopic: Data Processing

Building Storm Applications with Kafka

In the previous chapter, we learned about Apache Spark, a near real-time processing engine that processes data in micro-batches. But for very low-latency applications, where even a few seconds of delay can cause serious problems, Spark may not be a good fit. You would need a framework that can handle millions of records per second and that processes data record by record, rather than in batches, to achieve lower latency. In this chapter, we will learn about the real-time processing engine Apache Storm. Storm was originally created at BackType and was open-sourced by Twitter after Twitter acquired BackType; it later became an Apache project.

In this chapter, we will learn about:

Introduction to Apache Storm
Apache Storm architecture
Brief overview of Apache Heron
Integrating Apache Storm with Apache Kafka (Java/Scala example)
Use case (log processing)

Introduction to Apache Storm

Apache Storm is used for highly latency-sensitive applications, where even a delay of one second can mean huge losses. Many companies use Storm for fraud detection, building recommendation engines, detecting suspicious activity, and so on. Storm is stateless; it uses ZooKeeper for coordination and keeps its important metadata there.

Apache Storm is a distributed real-time processing framework that processes one event at a time and can handle over a million records per second per node. The input stream can be bounded or unbounded; in both cases, Storm can process it reliably.
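To make the contrast between record-at-a-time and micro-batch processing concrete, here is a minimal plain-Java sketch. It is not Storm or Spark code, and the class names `RecordAtATimeProcessor` and `MicroBatchProcessor` are hypothetical. For simplicity, the micro-batch version cuts batches by record count, whereas Spark Streaming cuts them by time interval; either way, a record waits in a buffer until its whole batch is processed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration (not Storm's API): each record is handled as soon as it arrives.
class RecordAtATimeProcessor<T> {
    private final Consumer<T> handler;

    RecordAtATimeProcessor(Consumer<T> handler) {
        this.handler = handler;
    }

    // Called once per incoming record; processing happens immediately.
    void onRecord(T record) {
        handler.accept(record);
    }
}

// Hypothetical micro-batch processor: records wait in a buffer until the batch is full.
class MicroBatchProcessor<T> {
    private final int batchSize;
    private final Consumer<List<T>> batchHandler;
    private final List<T> buffer = new ArrayList<>();

    MicroBatchProcessor(int batchSize, Consumer<List<T>> batchHandler) {
        this.batchSize = batchSize;
        this.batchHandler = batchHandler;
    }

    void onRecord(T record) {
        buffer.add(record);
        // Nothing is processed until the batch fills up; this waiting time is extra latency.
        if (buffer.size() >= batchSize) {
            batchHandler.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

The first class hands every record to its handler immediately, while the second holds records back until the batch is complete. That buffering delay is what a record-at-a-time engine such as Storm avoids.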

Storm cluster architecture

Storm also follows the master-slave architecture pattern, where Nimbus is the master and Supervisors are the slaves:

Nimbus: The master node of the Storm cluster; all other nodes in the cluster are called worker nodes. Nimbus distributes the application code among the worker nodes and assigns tasks to them. It also monitors for worker failures, and if a worker fails, it reassigns that worker's tasks to other workers.
Supervisors: Supervisors run the tasks assigned by Nimbus and report their available resources. Each worker node runs exactly one supervisor, which starts and manages one or more worker processes on that node.
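The division of labour between Nimbus and the workers can be modelled in a few lines of plain Java. This is a hypothetical, greatly simplified sketch, not Storm's actual scheduler: `SimpleNimbusScheduler` spreads task IDs round-robin over the worker slots it knows about and, when a worker fails, hands that worker's tasks to the remaining ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, greatly simplified model of Nimbus's role; not Storm's real scheduler.
class SimpleNimbusScheduler {
    // worker id (e.g. "supervisor1:6700") -> task ids currently assigned to it
    private final Map<String, List<Integer>> assignments = new LinkedHashMap<>();

    SimpleNimbusScheduler(List<String> workers) {
        for (String worker : workers) {
            assignments.put(worker, new ArrayList<>());
        }
    }

    // Spread tasks evenly over the live workers, round-robin.
    void assign(List<Integer> taskIds) {
        List<String> workers = new ArrayList<>(assignments.keySet());
        if (workers.isEmpty()) {
            throw new IllegalStateException("no live workers to assign tasks to");
        }
        for (int i = 0; i < taskIds.size(); i++) {
            assignments.get(workers.get(i % workers.size())).add(taskIds.get(i));
        }
    }

    // When a worker fails, drop it and reassign its tasks to the remaining workers.
    void onWorkerFailure(String failedWorker) {
        List<Integer> orphaned = assignments.remove(failedWorker);
        if (orphaned == null || orphaned.isEmpty()) {
            return;
        }
        assign(orphaned);
    }

    Map<String, List<Integer>> assignments() {
        return assignments;
    }
}
```

For example, with workers `w1`, `w2`, and `w3` and tasks 1 to 6, `w2` receives tasks 2 and 5; if `w2` then fails, those two tasks move to `w1` and `w3`.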

Storm architecture

Remember we said that Storm is stateless; both Nimbus and the supervisors save their state in ZooKeeper. Whenever Nimbus receives a request to run a Storm application, it looks up the available resources in ZooKeeper and then schedules the tasks on the available supervisors. It also saves progress metadata to ZooKeeper, so if Nimbus fails and restarts, it knows where to resume.
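The following sketch illustrates why keeping state outside Nimbus lets it restart safely. `MetadataStore`, `InMemoryMetadataStore`, `StatelessNimbus`, and the `/assignments/` path are all hypothetical; the in-memory map only stands in for ZooKeeper, and a real deployment would talk to ZooKeeper through a client library instead.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for ZooKeeper: a shared key-value store that outlives any Nimbus process.
interface MetadataStore {
    void put(String path, List<String> value);
    Optional<List<String>> get(String path);
}

class InMemoryMetadataStore implements MetadataStore {
    private final Map<String, List<String>> data = new HashMap<>();

    public void put(String path, List<String> value) {
        data.put(path, List.copyOf(value));
    }

    public Optional<List<String>> get(String path) {
        return Optional.ofNullable(data.get(path));
    }
}

// Hypothetical "stateless" Nimbus: everything it needs to resume lives in the store.
class StatelessNimbus {
    private final MetadataStore store;

    StatelessNimbus(MetadataStore store) {
        this.store = store;
    }

    // Record which supervisors a topology's tasks were scheduled on.
    void schedule(String topology, List<String> supervisors) {
        store.put("/assignments/" + topology, supervisors);
    }

    // After a restart, read the assignment back instead of rescheduling from scratch.
    List<String> recover(String topology) {
        return store.get("/assignments/" + topology).orElse(List.of());
    }
}
```

Because `StatelessNimbus` holds no data of its own, a freshly started instance that points at the same store recovers exactly the assignments its predecessor wrote.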

The concept of a Storm application

An Apache Storm application consists of two types of components:

Spout: A spout reads a stream of data from an external source system and passes it into the topology for further processing. A spout can be either reliable or unreliable.
- Reliable spout: A reliable spout can replay data if it fails during processing. To do this, the spout waits for an acknowledgement of each event it has emitted. This may cost more processing time, but it is extremely helpful for applications that cannot afford to lose a single record, such as...
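Here is a minimal plain-Java model of the reliable-spout behaviour described above, assuming a simple queue as the external source. In real Storm, a spout emits a tuple with a message ID through `SpoutOutputCollector.emit(tuple, messageId)`, and the framework later calls the spout's `ack(Object msgId)` or `fail(Object msgId)`. The class `ReliableSpoutModel` below is hypothetical and only mimics that contract: emitted records stay pending until acknowledged, and failed ones are re-emitted under the same ID.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical plain-Java model of a reliable spout; in real Storm the
// framework calls ack(msgId) / fail(msgId) on the spout for you.
class ReliableSpoutModel {
    private final Queue<String> source;                               // stands in for the external system
    private final Deque<Long> replayQueue = new ArrayDeque<>();       // failed ids to re-emit first
    private final Map<Long, String> pending = new LinkedHashMap<>();  // emitted but not yet acked
    private long nextId = 0;

    ReliableSpoutModel(Queue<String> source) {
        this.source = source;
    }

    // Emit the next record, replaying failed records before reading new ones.
    // Returns the message id, or -1 if there is nothing to emit.
    long nextTuple() {
        Long replayId = replayQueue.poll();
        if (replayId != null) {
            return replayId; // same id; its payload is still held in 'pending'
        }
        String record = source.poll();
        if (record == null) {
            return -1;
        }
        long id = nextId++;
        pending.put(id, record);
        return id;
    }

    String payload(long msgId) {
        return pending.get(msgId);
    }

    // The tuple was fully processed downstream: forget it.
    void ack(long msgId) {
        pending.remove(msgId);
    }

    // Processing failed or timed out: schedule the same record for replay.
    void fail(long msgId) {
        if (pending.containsKey(msgId) && !replayQueue.contains(msgId)) {
            replayQueue.add(msgId);
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

The price of this guarantee is the bookkeeping visible in `pending`: every in-flight record is held in memory until it is acknowledged, which is the extra cost the text refers to.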

Table of contents

Title Page
Copyright
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface
Introduction to Messaging Systems
Introducing Kafka the Distributed Messaging Platform
Deep Dive into Kafka Producers
Deep Dive into Kafka Consumers
Building Spark Streaming Applications with Kafka
Building Storm Applications with Kafka
Using Kafka with Confluent Platform
Building ETL Pipelines Using Kafka
Building Streaming Applications Using Kafka Streams
Kafka Cluster Deployment
Using Kafka in Big Data Applications
Securing Kafka
Streaming Application Design Considerations