- 249 pages
- English
Mastering Apache Storm
About This Book
Master the intricacies of Apache Storm and develop real-time stream processing applications with ease
- Exploit the real-time processing functionality offered by Apache Storm, such as parallelism and data partitioning
- Integrate Storm with other Big Data technologies such as Hadoop, HBase, and Apache Kafka
- An easy-to-understand guide to creating distributed applications with Storm
Who This Book Is For
If you are a Java developer who wants to enter the world of real-time stream processing applications using Apache Storm, then this book is for you. No previous experience in Storm is required, as this book starts from the basics. After finishing this book, you will be able to develop not-so-complex Storm applications.
What You Will Learn
- Understand the core concepts of Apache Storm and real-time processing
- Follow the steps to deploy a multi-node Storm cluster
- Create Trident topologies to support various message-processing semantics
- Share your cluster effectively using Storm scheduling
- Integrate Apache Storm with other Big Data technologies such as Hadoop, HBase, Kafka, and more
- Monitor the health of your Storm cluster
In Detail
Apache Storm is a real-time Big Data processing framework that processes large amounts of data reliably, guaranteeing that every message will be processed. Storm scales with your data as it grows, making it an excellent platform for solving your big data problems. This extensive guide takes you from the basics to the advanced topics of Storm.
The book begins with a detailed introduction to real-time processing and where Storm fits in to solve these problems. You'll get an understanding of deploying Storm on clusters by writing a basic Storm Hello World example. Next, we'll introduce you to Trident, and you'll get a clear understanding of how you can develop and deploy a Trident topology. We cover topics such as monitoring, Storm parallelism, the scheduler, and log processing in a very easy-to-understand manner. You will also learn how to integrate Storm with other well-known Big Data technologies such as HBase, Redis, Kafka, and Hadoop to realize the full potential of Storm.
With real-world examples and clear explanations, this book will give you a thorough mastery of Apache Storm. You will be able to use this knowledge to develop efficient, distributed real-time applications that cater to your business needs.
Style and approach
This easy-to-follow guide is full of examples and real-world applications to help you get an in-depth understanding of Apache Storm. This book covers the basics thoroughly and also delves into the intermediate and slightly advanced concepts of application development with Apache Storm.
Storm Deployment, Topology Development, and Topology Options
- Deployment of the Storm cluster
- Program and deploy the word count example
- Different options of the Storm UI: kill, activate, deactivate, and rebalance
- Walkthrough of the Storm UI
- Dynamic log level settings
- Validating the Nimbus high availability
Storm prerequisites
Installing Java SDK 7
- Download the Java SDK 7 RPM from Oracle's site (http://www.oracle.com/technetwork/java/javase/downloads/index.html).
- Install the Java jdk-7u<version>-linux-x64.rpm file on your CentOS machine using the following command:
sudo rpm -ivh jdk-7u<version>-linux-x64.rpm
- Add the following environment variable in the ~/.bashrc file:
export JAVA_HOME=/usr/java/jdk<version>
- Add the bin directory of the JDK to the PATH system environment variable in the ~/.bashrc file:
export PATH=$JAVA_HOME/bin:$PATH
- Run the following command to reload the bashrc file on the current login terminal:
source ~/.bashrc
- Check the Java installation as follows:
java -version
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)
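The environment checks in the steps above can be sketched as a small shell helper. This is a minimal sketch, not part of the JDK or of Storm: the function name check_java_env is hypothetical, and it only inspects the JAVA_HOME and PATH variables of the current shell.

```shell
# Hypothetical helper (not a JDK tool): report whether JAVA_HOME is set
# and whether its bin directory appears as an entry in PATH.
check_java_env() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
    return 1
  fi
  # Wrap PATH in colons so the first and last entries match the same pattern.
  case ":$PATH:" in
    *":$JAVA_HOME/bin:"*)
      echo "ok"
      ;;
    *)
      echo "$JAVA_HOME/bin is not on PATH"
      return 1
      ;;
  esac
}
```

Running check_java_env after source ~/.bashrc should print ok if both export lines took effect. It does not confirm that a JDK actually exists at that location, so java -version remains the real test.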
Deployment of the ZooKeeper cluster
- Download the latest stable ZooKeeper release from the ZooKeeper site (http://zookeeper.apache.org/releases.html). At the time of writing, the latest version is ZooKeeper 3.4.6.
- Once you have downloaded the latest version, extract it. Next, set up the ZK_HOME environment variable to make the rest of the setup easier.
- Point the ZK_HOME environment variable to the unzipped directory. Create the configuration file, zoo.cfg, at the $ZK_HOME/conf directory using the following commands:
cd $ZK_HOME/conf
touch zoo.cfg
- Add the following properties to the zoo.cfg file:
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
- tickTime: This is the basic unit of time in milliseconds used by ZooKeeper. It is used to send heartbeats, and the minimum session timeout will be twice the tickTime value.
- dataDir: This is the directory in which ZooKeeper stores the snapshots of its in-memory database and the transaction log.
- clientPort: This is the port on which ZooKeeper listens for client connections.
- initLimit: This is the number of tickTime intervals that followers are allowed to take to connect and sync to a leader node.
- syncLimit: This is the number of tickTime values that a follower can take to sync with the leader node. If the sync does not happen within this time, the follower will be dropped from the ensemble.
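Because initLimit and syncLimit are expressed in ticks, the actual time windows follow from multiplying them by tickTime. The following is a minimal sketch of that arithmetic for the sample values above. The function names zk_prop and zk_timeouts and the output labels are hypothetical, and the script reads a temporary file rather than a real $ZK_HOME/conf/zoo.cfg.

```shell
# Read one property value (for example tickTime) from a zoo.cfg-style file.
# $1 = config file, $2 = property name
zk_prop() {
  sed -n "s/^$2=//p" "$1" | head -n 1
}

# Print the time windows, in milliseconds, implied by the tick-based settings.
zk_timeouts() {
  tick_time=$(zk_prop "$1" tickTime)
  init_limit=$(zk_prop "$1" initLimit)
  sync_limit=$(zk_prop "$1" syncLimit)
  echo "min_session_timeout_ms=$((2 * $tick_time))"
  echo "init_window_ms=$(($init_limit * $tick_time))"
  echo "sync_window_ms=$(($sync_limit * $tick_time))"
}

# Usage with the sample values from the configuration above,
# written to a temporary file for illustration only.
cfg=$(mktemp)
printf '%s\n' 'tickTime=2000' 'initLimit=5' 'syncLimit=2' > "$cfg"
zk_timeouts "$cfg"
rm -f "$cfg"
```

With the sample configuration, the minimum session timeout is 4000 ms, followers get 10000 ms to connect and sync to the leader at startup, and a follower that falls more than 4000 ms behind is dropped.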
Table of contents
- Title Page
- Copyright
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Customer Feedback
- Preface
- Real-Time Processing and Storm Introduction
- Storm Deployment, Topology Development, and Topology Options
- Storm Parallelism and Data Partitioning
- Trident Introduction
- Trident Topology and Uses
- Storm Scheduler
- Monitoring of Storm Cluster
- Integration of Storm and Kafka
- Storm and Hadoop Integration
- Storm Integration with Redis, Elasticsearch, and HBase
- Apache Log Processing with Storm
- Twitter Tweet Collection and Machine Learning