Table of Contents
Securing Hadoop
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Errata
Piracy
Questions
1. Hadoop Security Overview
Why do we need to secure Hadoop?
Challenges for securing the Hadoop ecosystem
Key security considerations
Reference architecture for Big Data security
Summary
2. Hadoop Security Design
What is Kerberos?
Key Kerberos terminologies
How Kerberos works?
Kerberos advantages
The Hadoop default security model without Kerberos
Hadoop Kerberos security implementation
User-level access controls
Service-level access controls
User and service authentication
Delegation Token
Job Token
Block Access Token
Summary
3. Setting Up a Secured Hadoop Cluster
Prerequisites
Setting up Kerberos
Installing the Key Distribution Center
Configuring the Key Distribution Center
Establishing the KDC database
Setting up the administrator principal for KDC
Starting the Kerberos daemons
Setting up the first Kerberos administrator
Adding the user or service principals
Configuring LDAP as the Kerberos database
Supporting AES-256 encryption for a Kerberos ticket
Configuring Hadoop with Kerberos authentication
Setting up the Kerberos client on all the Hadoop nodes
Setting up Hadoop service principals
Creating a keytab file for the Hadoop services
Distributing the keytab file for all the slaves
Setting up Hadoop configuration files
HDFS-related configurations
MRv1-related configurations
MRv2-related configurations
Setting up secured DataNode
Setting up the TaskController class
Configuring users for Hadoop
Automation of a secured Hadoop deployment
Summary
4. Securing the Hadoop Ecosystem
Configuring Kerberos for Hadoop ecosystem components
Securing Hive
Securing Hive using Sentry
Securing Oozie
Securing Flume
Securing Flume sources
Securing Hadoop sink
Securing a Flume channel
Securing HBase
Securing Sqoop
Securing Pig
Best practices for securing the Hadoop ecosystem components
Summary
5. Integrating Hadoop with Enterprise Security Systems
Integrating Enterprise Identity Management systems
Configuring EIM integration with Hadoop
Integrating Active-Directory-based EIM with the Hadoop ecosystem
Accessing a secured Hadoop cluster from an enterprise network
HttpFS
HUE
Knox Gateway Server
Summary
6. Securing Sensitive Data in Hadoop
Securing sensitive data in Hadoop
Approach for securing insights in Hadoop
Securing data in motion
Securing data at rest
Implementing data encryption in Hadoop
Summary
7. Security Event and Audit Logging in Hadoop
Security Incident and Event Monitoring in a Hadoop Cluster
The Security Incident and Event Monitoring (SIEM) system
Setting up audit logging in a secured Hadoop cluster
Configuring Hadoop audit logs
Summary
A. Solutions Available for Securing Hadoop
Hadoop distribution with enhanced security support
Automation of a secured Hadoop cluster deployment
Cloudera Manager
Zettaset
Different Hadoop data encryption options
Dataguise for Hadoop
Gazzang zNcrypt
eCryptfs for Hadoop
Securing the Hadoop ecosystem with Project Rhino
Mapping of security technologies with the reference architecture
Infrastructure security
OS and filesystem security
Application security
Network perimeter security
Data masking and encryption
Authentication and authorization
Audit logging, security policies, and procedures
Security Incident and Event Monitoring
Index
Copyright © 2013 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2013
Production Reference: 1181113
Published by Packt Publishing
Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78328-525-9
www.packtpub.com
Author
Sudheesh Narayanan
Reviewers
Mark Kerzner
Nitin Pawar
Acquisition Editor
Antony Lowe
Commissioning Editor
Shaon Basu
Technical Editors
Amit Ramadas
Amit Shetty
Project Coordinator
Akash Poojary
Proofreader
Ameesha Green
Indexer
Rekha Nair
Graphics
Sheetal Aute
Ronak Dhruv
Valentina D'silva
Disha Haria
Abhinash Sahu
Production Coordinator
Nilesh R. Mohite
Cover Work
Nilesh R. Mohite
Sudheesh Narayanan is a Technology Strategist and Big Data Practitioner with expertise in technology consulting and implementing Big Data solutions. With over 15 years of IT experience in Information Management, Business Intelligence, Big Data & Analytics, and Cloud & J2EE application development, he provided his expertise in architecting, designing, and developing Big Data products, Cloud management platforms, and highly scalable platform services. His expertise in Big Data includes Hadoop and its ecosystem components, NoSQL databases (MongoDB, ...