Python Natural Language Processing Cookbook
eBook - ePub

Over 50 recipes to understand, analyze, and generate text for implementing language processing tasks

Zhenya Antić

  1. 284 pages
  2. English
  3. ePUB (mobile-friendly)

Book information

Get to grips with solving real-world NLP problems, such as dependency parsing, information extraction, topic modeling, and text data visualization

Key Features

  • Analyze varying complexities of text using popular Python packages such as NLTK, spaCy, sklearn, and gensim
  • Implement common and not-so-common linguistic processing tasks using Python libraries
  • Overcome the common challenges faced while implementing NLP pipelines

Book Description

Python is the most widely used language for natural language processing (NLP) thanks to its extensive tools and libraries for analyzing text and extracting computer-usable data. This book will take you through a range of techniques for text processing, from basics such as parsing the parts of speech to complex topics such as topic modeling, text classification, and visualization.

Starting with an overview of NLP, the book presents recipes for dividing text into sentences, stemming and lemmatization, removing stopwords, and part-of-speech tagging to help you prepare your data. You'll then learn ways of extracting and representing grammatical information, such as dependency parsing and anaphora resolution, discover different ways of representing semantics using bag-of-words, TF-IDF, word embeddings, and BERT, and develop skills for text classification using keywords, SVMs, LSTMs, and other techniques. As you advance, you'll also see how to extract information from text, implement unsupervised and supervised techniques for topic modeling, and perform topic modeling of short texts, such as tweets. Additionally, the book shows you how to develop chatbots using NLTK and Rasa and how to visualize text data.

By the end of this NLP book, you'll have developed the skills to use a powerful set of tools for text processing.

What you will learn

  • Become well-versed with basic and advanced NLP techniques in Python
  • Represent grammatical information in text using spaCy, and semantic information using bag-of-words, TF-IDF, and word embeddings
  • Perform text classification using different methods, including SVMs and LSTMs
  • Explore different techniques for topic modeling such as K-means, LDA, NMF, and BERT
  • Work with visualization techniques, such as named entity (NER) displays and word clouds, for different NLP tools
  • Build a basic chatbot using NLTK and Rasa
  • Extract information from text using regular expression techniques and statistical and deep learning tools

Who this book is for

This book is for data scientists and professionals who want to learn how to work with text. Intermediate knowledge of Python will help you to make the most out of this book. If you are an NLP practitioner, this book will serve as a code reference when working on your projects.

Information

Year
2021
ISBN
9781838987787
Edition
1
Category
Data Processing

Chapter 1: Learning NLP Basics

While working on this book, I focused on including recipes that would be useful in a wide variety of natural language processing (NLP) projects. They range from the simple to the advanced and deal with everything from grammar to visualizations, and many of them include options for languages other than English. I hope you find the book useful.
Before we can get on with the real work of NLP, we need to prepare our text for processing. This chapter will show you how to do that. By the end of the chapter, you will be able to turn a piece of text into a list of its words, each with its part of speech and its lemma or stem, with very frequent words removed.
NLTK and spaCy are two important packages that we will be working with in this chapter and throughout the book.
The recipes included in this chapter are as follows:
  • Dividing text into sentences
  • Dividing sentences into words: tokenization
  • Parts of speech tagging
  • Stemming
  • Combining similar words: lemmatization
  • Removing stopwords

Technical requirements

Throughout this book, I will be showing examples that were run using an Anaconda installation of Python 3.6.10. To install Anaconda, follow the instructions here: https://docs.anaconda.com/anaconda/install/.
After you have installed Anaconda, use it to create a virtual environment:
conda create -n nlp_book python=3.6.10 anaconda
conda activate nlp_book
Then, install spaCy 2.3.0 and NLTK 3.4.5, pinning the versions so that the results match the book:
pip install nltk==3.4.5
pip install spacy==2.3.0
After you have installed spaCy and NLTK, install the models needed to use them. For spaCy, use this:
python -m spacy download en_core_web_sm
Use Python commands to download the necessary model for NLTK:
python
>>> import nltk
>>> nltk.download('punkt')
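Before running the recipes, you may want to confirm that the packages and models are actually in place. The script below is a minimal sketch and is not part of the book's repository; the helper names `installed_version` and `check_environment` are hypothetical. It relies on `nltk.data.find` raising `LookupError` when a resource has not been downloaded and on `spacy.util.is_package` reporting whether a model is installed as a Python package.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__, or None if it cannot be imported or has none."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_environment():
    """Report which packages and models used in this chapter are available."""
    report = {
        "nltk": installed_version("nltk"),
        "spacy": installed_version("spacy"),
        "punkt": False,
        "en_core_web_sm": False,
    }
    if report["nltk"] is not None:
        import nltk
        try:
            # Raises LookupError if the punkt data was never downloaded
            nltk.data.find("tokenizers/punkt")
            report["punkt"] = True
        except LookupError:
            pass
    if report["spacy"] is not None:
        import spacy.util
        # True if en_core_web_sm is installed as a Python package
        report["en_core_web_sm"] = spacy.util.is_package("en_core_web_sm")
    return report


if __name__ == "__main__":
    # Compare the printed versions with the pinned ones (NLTK 3.4.5, spaCy 2.3.0)
    for name, status in check_environment().items():
        print("{}: {}".format(name, status))
```

Running it from inside the activated environment prints one line per package or model; a `None` or `False` points to the installation step that still needs to be run.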
All the code that is in this book can be found in the book's GitHub repository: https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook.
Important note
The files in the book's GitHub repository should be run using the -m option from the main directory that contains the code subfolders for each chapter. For example, you would use it as follows:
python -m Chapter01.dividing_into_sentences

Dividing text into sentences

When we work with text, we can work with text units at different scales: the document itself (such as a newspaper article), the paragraph, the sentence, or the word. Sentences are the main unit of processing in many NLP tasks. In this recipe, I will show you how to divide text into sentences.
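To make these scales concrete, here is a deliberately naive sketch that uses only the standard library; it is not the method used in this recipe. It splits a short passage from the beginning of the book into paragraphs on blank lines, into sentences with a regular expression, and into words made of ASCII letters and apostrophes. The paragraph break is inserted for illustration only and is not in the original text.

```python
import re

# Opening lines of the book; the blank line is added just to create two paragraphs
document = (
    "To Sherlock Holmes she is always _the_ woman. "
    "I have seldom heard him mention her under any other name.\n\n"
    "In his eyes she eclipses and predominates the whole of her sex."
)

# Paragraph level: blocks separated by a blank line
paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]

# Sentence level: naive split after ., ! or ? when followed by whitespace
sentences = [s for p in paragraphs for s in re.split(r"(?<=[.!?])\s+", p)]

# Word level: runs of ASCII letters and apostrophes (punctuation and "_" dropped)
words = [w for s in sentences for w in re.findall(r"[A-Za-z']+", s)]

print(len(paragraphs), len(sentences), len(words))  # prints: 2 3 31
```

Each level is built from the one above it, which is also how the recipes in this chapter proceed, although they use trained tokenizers rather than regular expressions.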

Getting ready

We will be using the text of The Adventures of Sherlock Holmes. You can find the whole text in the book's GitHub repository (see the sherlock_holmes.txt file). For this recipe, we will need only the beginning of the book, which is in the sherlock_holmes_1.txt file.
To do this task, you will need the nltk package and its punkt sentence tokenizer model, both described in the Technical requirements section.
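Why use a trained tokenizer instead of splitting on periods? According to the NLTK documentation, punkt is an unsupervised model that learns abbreviations, collocations, and words that tend to start sentences. The sketch below, with a hypothetical `naive_split` helper and an invented example sentence, shows how a plain regular expression breaks on abbreviations.

```python
import re


def naive_split(text):
    """Split after '.', '!' or '?' followed by whitespace; abbreviations are not handled."""
    return re.split(r"(?<=[.!?])\s+", text)


# Invented example text containing two common abbreviations
text = "Mr. Holmes was seated at breakfast. Dr. Watson had just come in."
print(naive_split(text))
# ['Mr.', 'Holmes was seated at breakfast.', 'Dr.', 'Watson had just come in.']
```

Two sentences come back as four pieces. Passing the same string to the punkt tokenizer's `tokenize` method is the alternative; how well it handles a given abbreviation depends on the pretrained model, so it is worth checking on your own data.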

How to do it…

We will now divide the text of The Adventures of Sherlock Holmes, outputting a list of sentences:
  1. Import the nltk package:
    import nltk
  2. Read in the book text; the with statement closes the file automatically:
    filename = "sherlock_holmes_1.txt"
    with open(filename, "r", encoding="utf-8") as file:
        text = file.read()
  3. Replace newlines with spaces:
    text = text.replace("\n", " ")
  4. Initialize an NLTK tokenizer. This uses the punkt model we downloaded previously:
    tokenizer = nltk.data.load("tokenizers/punkt/english.pickle")
  5. Divide the text into sentences:
    sentences = tokenizer.tokenize(text)
    The resulting list, sentences, has all the sentences in the first part of the book:
    ['To Sherlock Holmes she is always _the_ woman.', 'I have seldom heard him mention her under any other name.', 'In his eyes she eclipses and predominates the whole of her sex.', 'It was not that he felt any emotion akin to love for Irene Adler.', 'All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.', 'He was, I take it, the most perfect reasoning and observing machine that the world has seen, but as a lover he would have placed himself in a false position.', 'He never spoke of the softer passions, save with a gibe and a sneer.', 'They were admirable things for the observer—excellent for drawing the veil from men's motives and actions.', 'But for the trained reasoner to admit such intrusions into his own delicate and finely adjusted temperament was to introduce a distracting factor which might throw a doubt upon all his mental results.', 'Grit in a sensitive instrument, or a crack in one of his own high-power lenses, would not be more disturbing than a strong emotion in a nature such as his.', 'And yet there was but one woman to him, and that woman was the late Irene Adler, of dubious and questionable memory.']
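Step 3 is needed because a plain-text book file is usually hard-wrapped, so one sentence can span several lines; this is an assumption about sherlock_holmes_1.txt, typical of Project Gutenberg texts. The sketch below compares the recipe's `replace` with a hypothetical `normalize_whitespace` helper that also collapses the double spaces left behind by blank lines.

```python
def normalize_whitespace(text):
    """Join hard-wrapped lines and collapse any run of whitespace into one space."""
    # str.split() with no arguments splits on spaces, tabs and newlines
    # and drops empty strings, so re-joining leaves single spaces only.
    return " ".join(text.split())


# Hard-wrapped sample modeled on the book's first lines, with a paragraph break
wrapped = ("To Sherlock Holmes she is always _the_ woman. I have\n"
           "seldom heard him mention her under any other name.\n"
           "\n"
           "In his eyes")

replaced = wrapped.replace("\n", " ")      # step 3 as written in the recipe
collapsed = normalize_whitespace(wrapped)  # variant that also collapses runs

print(repr(replaced))   # note the two spaces in "name.  In"
print(repr(collapsed))
```

Which one to use is a design choice: the recipe's `replace` preserves the original spacing, while collapsing whitespace gives cleaner strings if you later display or compare sentences.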

How it works…

In step 1, we import the nltk package. In step 2...

Contents

  1. Python Natural Language Processing Cookbook
  2. Contributors
  3. Preface
  4. Chapter 1: Learning NLP Basics
  5. Chapter 2: Playing with Grammar
  6. Chapter 3: Representing Text – Capturing Semantics
  7. Chapter 4: Classifying Texts
  8. Chapter 5: Getting Started with Information Extraction
  9. Chapter 6: Topic Modeling
  10. Chapter 7: Building Chatbots
  11. Chapter 8: Visualizing Text Data
  12. Other Books You May Enjoy
Citation styles for Python Natural Language Processing Cookbook

APA 6 Citation

Antić, Z. (2021). Python Natural Language Processing Cookbook (1st ed.). Packt Publishing. Retrieved from https://www.perlego.com/book/2381899/python-natural-language-processing-cookbook-over-50-recipes-to-understand-analyze-and-generate-text-for-implementing-language-processing-tasks-pdf (Original work published 2021)

Chicago Citation

Antić, Zhenya. (2021) 2021. Python Natural Language Processing Cookbook. 1st ed. Packt Publishing. https://www.perlego.com/book/2381899/python-natural-language-processing-cookbook-over-50-recipes-to-understand-analyze-and-generate-text-for-implementing-language-processing-tasks-pdf.

Harvard Citation

Antić, Z. (2021) Python Natural Language Processing Cookbook. 1st edn. Packt Publishing. Available at: https://www.perlego.com/book/2381899/python-natural-language-processing-cookbook-over-50-recipes-to-understand-analyze-and-generate-text-for-implementing-language-processing-tasks-pdf (Accessed: 15 October 2022).

MLA 7 Citation

Antić, Zhenya. Python Natural Language Processing Cookbook. 1st ed. Packt Publishing, 2021. Web. 15 Oct. 2022.