eBook - ePub

Cassandra High Availability

ISBN: 9781783989126

Robbie Strickland

186 pages
English

About this book

Apache Cassandra is a massively scalable, peer-to-peer database designed for 100 percent uptime, with deployments in the tens of thousands of nodes supporting petabytes of data.

This book offers practical insight into building highly available, real-world applications using Apache Cassandra. It starts with the fundamentals, helping you understand how Cassandra's architecture allows it to achieve 100 percent uptime when other systems struggle to do so. You'll gain a solid understanding of data distribution, replication, and Cassandra's highly tunable consistency model. This is followed by an in-depth look at Cassandra's robust support for multiple data centers and at how to scale out a cluster. Next, the book turns to application design, with chapters on the native driver and data modeling. Lastly, you'll find out how to steer clear of common antipatterns and take advantage of Cassandra's ability to fail gracefully.

Information

Publisher: Packt Publishing
Year: 2014
eBook ISBN: 9781783989126
Topic: Computer Science
Subtopic: Data Mining

Cassandra High Availability

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Errata

Piracy

Questions

1. Cassandra's Approach to High Availability

ACID

The monolithic architecture

The master-slave architecture

Sharding

Master failover

Cassandra's solution

Cassandra's architecture

Distributed hash table

Replication

Replication across data centers

Tunable consistency

The CAP theorem

Summary

2. Data Distribution

Hash table fundamentals

Distributing hash tables

Consistent hashing

The mechanics of consistent hashing

Token assignment

Manually assigned tokens

vnodes

How vnodes improve availability

Adding and removing nodes

Node rebuilding

Heterogeneous nodes

Partitioners

Hotspots

Effects of scaling out using ByteOrderedPartitioner

A time-series example

Summary

3. Replication

The replication factor

Replication strategies

SimpleStrategy

NetworkTopologyStrategy

Snitches

Maintaining the replication factor when a node fails

Consistency conflicts

Consistency levels

Repairing data

Balancing the replication factor with consistency

Summary

4. Data Centers

Use cases for multiple data centers

Live backup

Failover

Load balancing

Geographic distribution

Online analysis

Analysis using Hadoop

Analysis using Spark

Data center setup

RackInferringSnitch

PropertyFileSnitch

GossipingPropertyFileSnitch

Cloud snitches

Replication across data centers

Setting the replication factor

Consistency in a multiple data center environment

The anatomy of a replicated write

Achieving stronger consistency between data centers

Summary

5. Scaling Out

Choosing the right hardware configuration

Scaling out versus scaling up

Growing your cluster

Adding nodes without vnodes

Adding nodes with vnodes

How to scale out

Adding a data center

How to scale up

Upgrading in place

Scaling up using data center replication

Removing nodes

Removing nodes within a data center

Decommissioning a data center

Other data migration scenarios

Snitch changes

Summary

6. High Availability Features in the Native Java Client

Thrift versus the native protocol

Setting up the environment

Connecting to the cluster

Executing statements

Prepared statements

Batched statements

Caution with batches

Handling asynchronous requests

Running queries in parallel

Load balancing

Failing over to a remote data center

Downgrading the consistency level

Defining your own retry policy

Token awareness

Tying it all together

Falling back to QUORUM

Summary

7. Modeling for High Availability

How Cassandra stores data

Implications of a log-structured storage

Understanding compaction

Size-tiered compaction

Leveled compaction

Date-tiered compaction

CQL under the hood

Single primary key

Compound keys

Partition keys

Clustering columns

Composite partition keys

The importance of the storage model

Understanding queries

Query by key

Range queries

Denormalizing with collections

How collections are stored

Sets

Lists

Maps

Working with time-series data

Designing for immutability

Modeling sensor data

Queries

Time-based ordering

Using a sentinel value

Satisfying our queries

When time is all that matters

Working with geospatial data

Summary

8. Antipatterns

Multikey queries

Secondary indices

Secondary indices under the hood

Distributed joins

Deleting data

Garbage collection

Resurrecting the dead

Unexpected deletes

The problem with tombstones

Expiring columns

TTL antipatterns

When null does not mean empty

Cassandra is not a queue

Unbounded row growth

Summary

9. Failing Gracefully

Knowledge is power

Monitoring via Java Management Extensions

Using OpsCenter

Choosing a management toolset

Logging

Cassandra logs

Garbage collector logs

Monitoring node metrics

Thread pools

Column family statistics

Finding latency outliers

Communication metrics

When a node goes down

Marking a downed node

Handling a downed node

Handling slow nodes

Backing up data

Taking a snapshot

Incremental backups

Restoring from a snapshot

Summary

Index

Cassandra High Availability

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: December 2014

Production reference: 1221214

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78398-912-6

www.packtpub.com

Credits

Author

Robbie Strickland

Reviewers

Richard Low

Jimmy Mårdell

Rob Murphy

Russell Spitzer

Commissioning Editor

Kunal Parikh

Acquisition Editors

Richard Harvey

Owen Roberts

Content Development Editors

Samantha Gonsalves

Azharuddin Sheikh

Technical Editor

Ankita Thakur

Copy Editors

Pranjali Chury

Merilyn Pereira

Project Coordinator

Sanchita Mandal

Proofreaders

Simran Bhogal

Maria Gould

Ameesha Green

Paul Hindle

Indexer

Rekha Nair

Graphics

Sheetal Aute

Disha Haria

Abhinash Sahu

Production Coordinator

Alwin Roy

Cover Work

Alwin Roy

About the Author

Robbie Strickland got involved in the Apache Cassandra project in 2010, and he initially went into production with the 0.5 release. He has made numerous contributions over the years, including his work on drivers for C# and Scala, and multiple contributions to the core Cassandra codebase. In 2013, he became the very first certified Cassandra developer, and in 2014, DataStax selected him as an Apache Cassandra MVP.

While this is Robbie's first published technical book, he has been an active speaker and writer in the Cassandra community and is the founder of the Atlanta Cassandra Users Group. Other examples of his writing can be found on the DataStax blog, and he has conducted numerous webinars and spoken at many conferences over the years.
