An Introduction to Duplicate Detection
eBook - PDF

An Introduction to Duplicate Detection

  1. English
  2. PDF
  3. Available on iOS & Android
eBook - PDF

An Introduction to Duplicate Detection

Book details
Table of contents
Citations

About This Book

With the ever increasing volume of data, data quality problems abound. Multiple, yet different representations of the same real-world objects in data, duplicates, are one of the most intriguing data quality problems. The effects of such duplicates are detrimental; for instance, bank customers can obtain duplicate identities, inventory levels are monitored incorrectly, catalogs are mailed multiple times to the same household, etc. Automatically detecting duplicates is difficult: First, duplicate representations are usually not identical but slightly differ in their values. Second, in principle all pairs of records should be compared, which is infeasible for large volumes of data. This lecture examines closely the two main components to overcome these difficulties: (i) Similarity measures are used to automatically identify duplicates when comparing two records. Well-chosen similarity measures improve the effectiveness of duplicate detection. (ii) Algorithms are developed to perform on very large volumes of data in search for duplicates. Well-designed algorithms improve the efficiency of duplicate detection. Finally, we discuss methods to evaluate the success of duplicate detection. Table of Contents: Data Cleansing: Introduction and Motivation / Problem Definition / Similarity Functions / Duplicate Detection Algorithms / Evaluating Detection Success / Conclusion and Outlook / Bibliography

Frequently asked questions

Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.
At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.
Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Yes, you can access An Introduction to Duplicate Detection by Felix Nauman,Melanie Herschel in PDF and/or ePUB format, as well as other popular books in Computer Science & Computer Networking. We have over one million books available in our catalogue for you to explore.

Information

Publisher
Springer
Year
2022
ISBN
9783031018350

Table of contents

  1. Cover
  2. Synthesis Lectures on Data Management
  3. Copyright Page
  4. Title Page
  5. Contents
  6. Data Cleansing: Introduction and Motivation
  7. Problem Definition
  8. Similarity Functions
  9. Duplicate Detection Algorithms
  10. Evaluating Detection Success
  11. Conclusion and Outlook
  12. Bibliography
  13. Authors' Biographies