Apache Mahout Clustering Designs

  1. 130 pages
  2. English
  3. ePUB (mobile friendly)

About This Book

Explore clustering algorithms used with Apache Mahout.

About This Book
• Use Mahout for clustering datasets and gain useful insights
• Explore the different clustering algorithms used in day-to-day work
• A practical guide to creating and evaluating your own clustering models using real-world datasets

Who This Book Is For
This book is for developers who want to try out clustering on large datasets using Mahout. It will also be useful for readers who don't have a background in Mahout but have basic programming knowledge and are familiar with the basics of machine learning and clustering. It will help if you already know clustering techniques from some other tool.

What You Will Learn
• Explore clustering algorithms and cluster evaluation techniques
• Learn different types of clustering and distance measuring techniques
• Perform clustering on your data using K-Means clustering
• Discover how canopy clustering is used as a preprocessing step for K-Means
• Use the Fuzzy K-Means algorithm in Apache Mahout
• Implement Streaming K-Means clustering in Mahout
• Learn Mahout's Spectral K-Means clustering implementation

In Detail
As more and more organizations discover the uses of big data analytics, interest in platforms that provide storage, computation, and analytic capabilities has increased. Apache Mahout caters to this need and paves the way for implementing complex machine learning algorithms to better analyse your data and gain useful insights from it.
Starting with an introduction to clustering algorithms, this book provides insight into Apache Mahout and the different algorithms it uses for clustering data. It gives a general introduction to algorithms such as K-Means, Fuzzy K-Means, and StreamingKMeans, and shows how to use Mahout to cluster your data with a particular algorithm. You will study the different types of clustering and learn how to use Apache Mahout with real-world datasets to implement and evaluate your clusters.
This book discusses cluster improvement and visualization using Mahout APIs, and also explores model-based clustering and topic modelling using the Dirichlet process. Finally, you will learn how to build and deploy a model for production use.

Style and approach
This book is a hands-on guide with examples using real-world datasets. Each chapter begins by explaining the algorithm in detail and then shows how to use Mahout for that algorithm with example datasets.

Information

Year: 2015
ISBN: 9781783284443
Edition: 1

Table of Contents

Apache Mahout Clustering Designs
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Understanding Clustering
The clustering concept
Application of clustering
Understanding distance measures
Understanding different clustering techniques
Hierarchical methods
The partitioning method
The density-based method
Probabilistic clustering
Algorithm support in Mahout
Clustering algorithms in Mahout
Installing Mahout
Building Mahout code using Maven
Setting up the development environment using Eclipse
Setting up Mahout for Windows users
Preparing data for use with clustering techniques
Summary
2. Understanding K-means Clustering
Learning K-means
Running K-means on Mahout
Dataset selection
Executing K-means
The clusterdump result
Visualizing clusters
Summary
3. Understanding Canopy Clustering
Running Canopy clustering on Mahout
The Canopy generation phase
The Canopy clustering phase
Running Canopy clustering
Using the Canopy output for K-means
Visualizing clusters
Working with CSV files
Summary
4. Understanding the Fuzzy K-means Algorithm Using Mahout
Learning Fuzzy K-means clustering
Running Fuzzy K-means on Mahout
Dataset
Creating a vector for the dataset
Vector reader
Visualizing clusters
Summary
5. Understanding Model-based Clustering
Learning model-based clustering
Understanding Dirichlet clustering
Topic modeling
Running LDA using Mahout
Dataset selection
Steps to execute CVB (LDA)
Summary
6. Understanding Streaming K-means
Learning Streaming K-means
The Streaming step
The BallKMeans step
Using Mahout for streaming K-means
Dataset selection
Converting CSV to a vector file
Running Streaming K-means
Summary
7. Spectral Clustering
Understanding spectral clustering
Affinity (similarity) graph
Getting graph Laplacian from the affinity matrix
Eigenvectors and eigenvalues
The spectral clustering algorithm
Normalized spectral clustering
Mahout implementation of spectral clustering
Summary
8. Improving Cluster Quality
Evaluating clusters
Extrinsic methods
Intrinsic methods
Using DistanceMeasure interface
Summary
9. Creating a Cluster Model for Production
Preparing the dataset
Launching the Mahout job on the cluster
Performance tuning for the job
Summary
Index

Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: September 2015
Production reference: 1240915
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78328-443-6
www.packtpub.com

Credits

Author
Ashish Gupta
Reviewers
Siva Prakash
Tharindu Rusira
Commissioning Editor
Akram Hussain
Acquisition Editors
Vivek Anantharaman
Divya Poojari
Content Development Editor
Susmita Sabat
Technical Editor
Namrata Patil
Copy Editor
Merilyn Pereira
Project Coordinator
Judie Jose
Proofreader
Safis Editing
Indexer
Rekha Nair
Graphics
Abhinash Sahu
Production Coordinator
Manu Joseph
Cover Work
Manu Joseph

About the Author

Ashish Gupta has been working in the field of software development for the last 10 years. He has worked at companies such as SAP Labs and Caterpillar as a software developer. While working for a start-up that used social media to predict potential customers for new fashion apparel, he developed an interest in machine learning. Since then, he has worked on big data technologies and machine learning in several industries, including retail, finance, and insurance. He is passionate about learning new technologies and sharing that knowledge with others. He is also the author of Learning Apache Mahout Classification, Packt Publishing. He has organized many boot camps for Apache Mahout and the Hadoop ecosystem.