Deep Learning for Search

Author: Tommaso Teofili

328 pages
English
ePUB (mobile-friendly)


About this book

Summary
Deep Learning for Search teaches you how to improve the effectiveness of your search by implementing neural network-based techniques. By the time you're finished with the book, you'll be ready to build amazing search engines that deliver the results your users need and that get better as time goes on!

Foreword by Chris Mattmann.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Technology
Deep learning handles the toughest search challenges, including imprecise search terms, badly indexed data, and retrieving images with minimal metadata. And with modern tools like DL4J and TensorFlow, you can apply powerful DL techniques without a deep background in data science or natural language processing (NLP). This book will show you how.

About the Book
Deep Learning for Search teaches you to improve your search results with neural networks. You'll review how DL relates to search basics like indexing and ranking. Then, you'll walk through in-depth examples to upgrade your search with DL techniques using Apache Lucene and Deeplearning4j. As the book progresses, you'll explore advanced topics like searching through images, translating user queries, and designing search engines that improve as they learn!

What's inside

Accurate and relevant rankings
Searching across languages
Content-based image search
Search with recommendations

About the Reader
For developers comfortable with Java or a similar language and search basics. No experience with deep learning or NLP needed.

About the Author
Tommaso Teofili is a software engineer with a passion for open source and machine learning. As a member of the Apache Software Foundation, he contributes to a number of open source projects, ranging from information retrieval (such as Lucene and Solr) to natural language processing and machine translation (including OpenNLP, Joshua, and UIMA). He currently works at Adobe, developing search and indexing infrastructure components and researching natural language processing, information retrieval, and deep learning. He has presented search and machine learning talks at conferences including Berlin Buzzwords, the International Conference on Computational Science, ApacheCon, EclipseCon, and others. You can find him on Twitter at @tteofili.

Table of Contents

PART 1 - SEARCH MEETS DEEP LEARNING

Neural search
Generating synonyms

PART 2 - THROWING NEURAL NETS AT A SEARCH ENGINE

From plain retrieval to text generation
More-sensitive query suggestions
Ranking search results with word embeddings
Document embeddings for rankings and recommendations

PART 3 - ONE STEP BEYOND

Searching across languages
Content-based image search
A peek at performance

Information

Publisher: Manning
Year: 2019
ISBN: 9781638356271
Subject: Computer Science
Subtopic: Neural Networks

Part 1. Search meets deep learning

Setting up search engines to effectively react to users’ needs isn’t an easy task. Traditionally, many manual tweaks and adjustments had to be made to a search engine’s internals to get it to work decently on a real collection of data. On the other hand, deep neural networks are very good at learning useful information from vast amounts of data. In this first part of the book, we’ll start looking into how a search engine can be used in conjunction with a neural network to get around some common limitations and provide users with a better search experience.

Chapter 1. Neural search

This chapter covers

A gentle introduction to search fundamentals
Important problems in search
Why neural networks can help search engines be more effective

Suppose you want to learn something about the latest research breakthroughs in artificial intelligence. What will you do to find information? How much time and work does it take to get the facts you’re looking for? If you’re in a (huge) library, you can ask the librarian what books are available on the topic, and they will probably point you to a few they know about. Ideally, the librarian will suggest particular chapters to browse in each book.

That sounds easy enough. But the librarian generally comes from a different context than you do, meaning you and the librarian may have different opinions about what’s significant. The library could have books in various languages, or the librarian might speak a different language. Their information about the topic could be outdated, given that “latest” is a fairly relative point in time, and you don’t know when the librarian last read anything about artificial intelligence, or if the library regularly receives publications in the field. Additionally, the librarian may not understand your inquiry properly. The librarian may think you’re talking about intelligence from the psychology perspective,¹ requiring a few iterations back and forth before you understand one another and get to the pieces of information you need.

¹ This happened to me in real life.

Then, after all this, you might discover the library doesn’t have the book you need; or the information may be spread among several books, and you have to read them all. Exhausting!

Unless you’re a librarian yourself, this is what often happens nowadays when you search for something on the internet. Although we can think of the internet as a single huge library, there are many different librarians out there to help you find the information you need: search engines. Some search engines are experts in certain topics; others know only a subset of a library, or only a single book.

Now imagine that someone (let’s call him Robbie) who already knows both the library and its visitors can help you communicate with the librarian in order to better find what you’re looking for. That will help you get your answers more quickly. For example, Robbie can help the librarian understand a visitor’s inquiry by providing additional context. Robbie knows what the visitor usually reads about, so he skips all the books about psychology. Also, having read a lot of the books in the library, Robbie has better insight into what’s important in the field of artificial intelligence. It would be extremely helpful to have advisors like Robbie to help search engines work better and faster, and help users get more useful information.

This book is about using techniques from a machine learning field called deep learning (DL) to build models and algorithms that can influence the behavior of search engines, to make them more effective. Deep learning algorithms will play the role of Robbie, helping the search engine to provide a better search experience and to deliver more precise answers to end users.

One important thing to note is that DL isn’t the same as artificial intelligence (AI). As you can see in figure 1.1, AI is a huge research field; machine learning is only part of it, and DL, in turn, is a sub-area of machine learning. Basically, DL studies how to make machines “learn” using the deep neural network computing model.

Figure 1.1. Artificial intelligence, machine learning, and deep learning

1.1. Neural networks and deep learning

The goal of this book is to enable you to use deep learning in the context of search engines, to improve the search experience and results. Even if you’re not going to build the next Google search, you should be able to learn enough to use DL techniques within small or medium-sized search engines to provide a better experience to users. Neural search should help you automate work that you’d otherwise have to perform manually. For example, you’ll learn how to automate extraction of synonyms from search engine data, avoiding manual editing of synonym files (chapter 2). This saves time while improving search effectiveness, regardless of the specific use case or domain. The same is true for having good related-content suggestions (chapter 6). In many cases, users are satisfied with a combination of plain search together with the ability to navigate related content. We’ll also cover some more-specific use cases, such as searching content in multiple languages (chapter 7) and searching for images (chapter 8).
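To make the idea of automated synonym extraction a bit more concrete, here is a minimal sketch that does not come from the book. It assumes every word already has a vector representation. The toy vectors, along with the class and method names `SynonymCandidates`, `cosine`, and `nearest`, are hypothetical. Words whose vectors point in similar directions (high cosine similarity) are proposed as synonym candidates, so nobody has to type them into a synonym file by hand. Learning such word representations from real data is covered in chapters 2 and 5.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SynonymCandidates {

    // Cosine similarity between two vectors of equal length:
    // 1.0 means same direction, values near 0 mean unrelated.
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns up to k words whose vectors are most similar to the target's vector,
    // ordered from most to least similar.
    public static List<String> nearest(Map<String, double[]> vectors, String target, int k) {
        double[] t = vectors.get(target);
        List<String> others = new ArrayList<>();
        for (String w : vectors.keySet()) {
            if (!w.equals(target)) {
                others.add(w);
            }
        }
        others.sort(Comparator.comparingDouble((String w) -> cosine(t, vectors.get(w))).reversed());
        return others.subList(0, Math.min(k, others.size()));
    }

    public static void main(String[] args) {
        // Hypothetical toy vectors; a real system would learn them from the indexed text.
        Map<String, double[]> vectors = new LinkedHashMap<>();
        vectors.put("aeroplane", new double[]{0.9, 0.1, 0.0});
        vectors.put("airplane", new double[]{0.85, 0.15, 0.05});
        vectors.put("plane", new double[]{0.8, 0.2, 0.1});
        vectors.put("banana", new double[]{0.0, 0.1, 0.95});
        System.out.println(nearest(vectors, "aeroplane", 2));
    }
}
```

With these toy values, running `main` prints `[airplane, plane]`, and "banana" is left out because its vector points in a very different direction. The quality of the candidates depends entirely on how good the learned vectors are.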

The only requirement for the techniques we’ll discuss is that you have enough data to feed into neural networks. But it’s difficult to define the boundaries of “enough data” in a generic way. Instead, let’s summarize the minimum number of documents (text, images, and so on) that are generally needed for each of the problems addressed in the book: see table 1.1.

Table 1.1. Per-task requirements for neural search techniques

Task	Minimum number of documents (approximate range)	Chapter(s)
Learning word representations	1,000–10,000	2, 5
Text generation	10,000–100,000	3, 4
Learning document representations	1,000–10,000	6
Machine translation	10,000–100,000	7
Learning image representations	10,000–100,000	8

Note that table 1.1 isn’t meant to be strictly adhered to; these are numbers drawn from experience. For example, even if a search engine contains fewer than 10,000 documents, you can still try to implement the neural machine translation techniques from chapter 7, but keep in mind that it may be harder to get high-quality results (for example, accurate translations).

As you read the book, you’ll learn a lot about DL as well as all the required search fundamentals to implement these DL principles in a search engine. So if you’re a search engineer or a programmer willing to learn neural search, this book is for you.

You aren’t expected to know what DL is or how it works, at this point. You’ll get to know more as we look at some specific algorithms one by one, when they become useful for solving particular types of search problems. For now, I’ll start you off with some basic definitions. Deep learning is a field of machine learning where computers are capable of learning to represent and recognize things incrementally, by using deep neural networks. Deep artificial neural networks are a computational paradigm originally inspired by the way the brain is organized into graphs of neurons (although the brain is much more complex than an artificial neural network). Usually, information flows into neurons in an input layer, then through a network of hidden neurons (forming one or more hidden layers), and then out through neurons in an output layer. Neural networks can also be thought of as black boxes: smart functions that can transform inputs into outputs, based on what each network has been trained for. A common neural network has at least one input layer, one hidden layer, and one output layer. When a network has more than one hidden layer, we call the network deep. In figure 1.2, you can see a deep neural network with two hidden layers.

Figure 1.2. A deep neural network with two hidden layers
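The flow described above (input layer, two hidden layers, output layer) can be sketched in plain Java to show how a deep network acts as a function that turns inputs into outputs. This is a minimal illustration, not code from the book, and it rests on several assumptions. The layer sizes (3, 4, 4, 2), the sigmoid activation, the random weights, and the class name `TinyDeepNetwork` are all hypothetical choices. The network is untrained, so its outputs carry no meaning until training adjusts the weights, and training is not shown here.

```java
import java.util.Arrays;
import java.util.Random;

public class TinyDeepNetwork {

    // Logistic sigmoid activation: squashes any real value into (0, 1).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // One fully connected layer: out[j] = sigmoid(b[j] + sum_i in[i] * w[i][j]).
    static double[] layer(double[] in, double[][] w, double[] b) {
        double[] out = new double[b.length];
        for (int j = 0; j < b.length; j++) {
            double sum = b[j];
            for (int i = 0; i < in.length; i++) {
                sum += in[i] * w[i][j];
            }
            out[j] = sigmoid(sum);
        }
        return out;
    }

    // Feeds the input through every layer in order; the last layer's activations are the output.
    public static double[] forward(double[] input, double[][][] weights, double[][] biases) {
        double[] activations = input;
        for (int l = 0; l < weights.length; l++) {
            activations = layer(activations, weights[l], biases[l]);
        }
        return activations;
    }

    // Small random weights (in x out), as in a network that has not been trained yet.
    static double[][] randomWeights(int in, int out, Random rnd) {
        double[][] w = new double[in][out];
        for (double[] row : w) {
            for (int j = 0; j < out; j++) {
                row[j] = rnd.nextGaussian() * 0.5;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] sizes = {3, 4, 4, 2}; // input, hidden layer 1, hidden layer 2, output
        double[][][] weights = new double[sizes.length - 1][][];
        double[][] biases = new double[sizes.length - 1][];
        for (int l = 0; l < sizes.length - 1; l++) {
            weights[l] = randomWeights(sizes[l], sizes[l + 1], rnd);
            biases[l] = new double[sizes[l + 1]]; // zero biases
        }
        double[] output = forward(new double[]{1.0, 0.5, -0.5}, weights, biases);
        System.out.println(Arrays.toString(output)); // two values, each in (0, 1)
    }
}
```

Because there are two hidden layers between the input and output layers, this toy network counts as "deep" under the definition above. Libraries such as Deeplearning4j provide the same building blocks, along with the training algorithms that actually learn the weights.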

Before going into more detail about neural networks, let’s take a step back. I said deep learning is a subfield of machine learning, which is part of the broader area of artificial intelligence. But what is machine learning?

1.2. What is machine learning?

An overview of basic machine learning concepts is useful here before diving into DL and search specifics. Many of the concepts that apply to learning with artificial neural networks, such as supervised and unsupervised learning, training, and predicting, come from machine le...