Cassandra High Availability
eBook - ePub

  1. 186 pages
  2. English

About This Book

Apache Cassandra is a massively scalable, peer-to-peer database designed for 100 percent uptime, with deployments in the tens of thousands of nodes supporting petabytes of data.

This book offers practical insight into building highly available, real-world applications with Apache Cassandra. It starts with the fundamentals, explaining how Cassandra's architecture allows it to achieve 100 percent uptime where other systems struggle. You'll gain a solid understanding of data distribution, replication, and Cassandra's highly tunable consistency model. This is followed by an in-depth look at Cassandra's robust support for multiple data centers and at how to scale out a cluster. Next, the book turns to application design, with chapters on the native driver and data modeling. Lastly, you'll find out how to steer clear of common antipatterns and take advantage of Cassandra's ability to fail gracefully.

Information

Year: 2014
ISBN: 9781783989126
Edition: 1
Subtopic: Data mining

Table of Contents

Cassandra High Availability
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Errata
Piracy
Questions
1. Cassandra's Approach to High Availability
ACID
The monolithic architecture
The master-slave architecture
Sharding
Master failover
Cassandra's solution
Cassandra's architecture
Distributed hash table
Replication
Replication across data centers
Tunable consistency
The CAP theorem
Summary
2. Data Distribution
Hash table fundamentals
Distributing hash tables
Consistent hashing
The mechanics of consistent hashing
Token assignment
Manually assigned tokens
vnodes
How vnodes improve availability
Adding and removing nodes
Node rebuilding
Heterogeneous nodes
Partitioners
Hotspots
Effects of scaling out using ByteOrderedPartitioner
A time-series example
Summary
3. Replication
The replication factor
Replication strategies
SimpleStrategy
NetworkTopologyStrategy
Snitches
Maintaining the replication factor when a node fails
Consistency conflicts
Consistency levels
Repairing data
Balancing the replication factor with consistency
Summary
4. Data Centers
Use cases for multiple data centers
Live backup
Failover
Load balancing
Geographic distribution
Online analysis
Analysis using Hadoop
Analysis using Spark
Data center setup
RackInferringSnitch
PropertyFileSnitch
GossipingPropertyFileSnitch
Cloud snitches
Replication across data centers
Setting the replication factor
Consistency in a multiple data center environment
The anatomy of a replicated write
Achieving stronger consistency between data centers
Summary
5. Scaling Out
Choosing the right hardware configuration
Scaling out versus scaling up
Growing your cluster
Adding nodes without vnodes
Adding nodes with vnodes
How to scale out
Adding a data center
How to scale up
Upgrading in place
Scaling up using data center replication
Removing nodes
Removing nodes within a data center
Decommissioning a data center
Other data migration scenarios
Snitch changes
Summary
6. High Availability Features in the Native Java Client
Thrift versus the native protocol
Setting up the environment
Connecting to the cluster
Executing statements
Prepared statements
Batched statements
Caution with batches
Handling asynchronous requests
Running queries in parallel
Load balancing
Failing over to a remote data center
Downgrading the consistency level
Defining your own retry policy
Token awareness
Tying it all together
Falling back to QUORUM
Summary
7. Modeling for High Availability
How Cassandra stores data
Implications of a log-structured storage
Understanding compaction
Size-tiered compaction
Leveled compaction
Date-tiered compaction
CQL under the hood
Single primary key
Compound keys
Partition keys
Clustering columns
Composite partition keys
The importance of the storage model
Understanding queries
Query by key
Range queries
Denormalizing with collections
How collections are stored
Sets
Lists
Maps
Working with time-series data
Designing for immutability
Modeling sensor data
Queries
Time-based ordering
Using a sentinel value
Satisfying our queries
When time is all that matters
Working with geospatial data
Summary
8. Antipatterns
Multikey queries
Secondary indices
Secondary indices under the hood
Distributed joins
Deleting data
Garbage collection
Resurrecting the dead
Unexpected deletes
The problem with tombstones
Expiring columns
TTL antipatterns
When null does not mean empty
Cassandra is not a queue
Unbounded row growth
Summary
9. Failing Gracefully
Knowledge is power
Monitoring via Java Management Extensions
Using OpsCenter
Choosing a management toolset
Logging
Cassandra logs
Garbage collector logs
Monitoring node metrics
Thread pools
Column family statistics
Finding latency outliers
Communication metrics
When a node goes down
Marking a downed node
Handling a downed node
Handling slow nodes
Backing up data
Taking a snapshot
Incremental backups
Restoring from a snapshot
Summary
Index

Cassandra High Availability

Copyright © 2014 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: December 2014
Production reference: 1221214
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78398-912-6
www.packtpub.com

Credits

Author
Robbie Strickland
Reviewers
Richard Low
Jimmy Mårdell
Rob Murphy
Russell Spitzer
Commissioning Editor
Kunal Parikh
Acquisition Editors
Richard Harvey
Owen Roberts
Content Development Editors
Samantha Gonsalves
Azharuddin Sheikh
Technical Editor
Ankita Thakur
Copy Editors
Pranjali Chury
Merilyn Pereira
Project Coordinator
Sanchita Mandal
Proofreaders
Simran Bhogal
Maria Gould
Ameesha Green
Paul Hindle
Indexer
Rekha Nair
Graphics
Sheetal Aute
Disha Haria
Abhinash Sahu
Production Coordinator
Alwin Roy
Cover Work
Alwin Roy

About the Author

Robbie Strickland got involved in the Apache Cassandra project in 2010, and he initially went into production with the 0.5 release. He has made numerous contributions over the years, including his work on drivers for C# and Scala, and multiple contributions to the core Cassandra codebase. In 2013, he became the very first certified Cassandra developer, and in 2014, DataStax selected him as an Apache Cassandra MVP.
While this is Robbie's first published technical book, he has been an active speaker and writer in the Cassandra community and is the founder of the Atlanta Cassandra Users Group. Other examples of his writing can be found on the DataStax blog, and he has conducted numerous webinars and spoken at many conferences over the years.
