Social Media Content Analysis
eBook - ePub

Natural Language Processing and Beyond

  1. 199 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

Social media platforms are ubiquitous in daily life and are steadily transforming the ways people communicate, socialize and conduct business. However, the growing popularity of social media has also led to the widespread dissemination of unreliable information. This in turn seriously pollutes the global social media environment, which is harmful to humanity. For example, President Donald Trump used social media strategically to win the 2016 US presidential election, but many of the messages he delivered over social media were found to be unproven, if not untrue. This problem must be addressed at all costs and as soon as possible, which makes the analysis of social media content a pressing issue and a timely and important research subject worldwide. However, the short and informal nature of social media messages renders conventional content analysis, which is based on natural language processing (NLP), ineffective. This volume collects highly relevant scientific articles published by the authors in international conferences and journals, and is divided into three parts: (I) search and filtering; (II) opinion and sentiment analysis; and (III) event detection and summarization. The book presents the latest advances in NLP technologies for social media content analysis, especially content on microblogging platforms such as Twitter and Weibo.

Contents:

  • Search and Filtering:
    • Ranking Model Selection and Fusion for Effective Microblog Search (Z Wei, W Gao, T El-Ganainy, W Magdy and K-F Wong)
    • Microblog Search and Filtering with Real-Time Dynamics Based on BM25 (W Gao, Z Wei and K-F Wong)
    • Exploring Tweets Normalization and Query Time Sensitivity for Twitter Search (Z Wei, W Gao, L Zhou, B Li and K-F Wong)
    • A Hierarchical Knowledge Representation for Expert Finding on Social Media (Y Li, W Li and S Li)
    • Twitter Hyperlink Recommendation with User-Tweet-Hyperlink Three-Way Clustering (D Gao, R Zhang, W Li and Y Hou)
    • Detect Rumors Using Time Series of Social Context Information on Microblogging (J Ma, W Gao, Z Wei, Y Liu and K-F Wong)
    • An Empirical Study on Uncertainty Identification in Social Media Context (Z Wei, J Chen, W Gao, B Li, L Zhou, Y He and K-F Wong)
    • Detecting Semantic Uncertainty by Learning Hedge Cues in Sentences Using an HMM (X Li, W Gao and J W Shavlik)
  • Opinion and Sentiment Analysis:
    • A Unified Graph Model for Sentence-Based Opinion Retrieval (B Li, L Zhou, S Feng and K-F Wong)
    • Intersubjectivity and Sentiment: From Language to Knowledge (L Gui, R Xu, Y He, Q Lu and Z Wei)
    • Event-Driven Emotion Cause Extraction with Corpus Construction (L Gui, R Xu, D Wu, Q Lu and Y Zhou)
    • Learning Task Specific Distributed Paragraph Representations Using a 2-Tier Convolutional Neural Network (T Chen, R Xu, Y He and X Wang)
    • Build Emotion Lexicon from Microblogs by Combining Effects of Seed Words and Emoticons in a Heterogeneous Graph (K Song, S Feng, W Gao, D Wang, L Chen and C Zhang)
    • Personalized Sentiment Classification Based on Latent Individuality of Microblog Users (K Song, S Feng, W Gao, D Wang, G Yu and K-F Wong)
    • Efficient Feedback-Based Feature Learning for Blog Distillation as a Terabyte Challenge (D Gao, W Li and R Zhang)
    • Inferring Topic-Dependent Influence Roles of Twitter Users (C Chen, D Gao, W Li and Y Hou)
    • A Constrained Multi-View Clustering Approach to Influence Role Detection (C Chen, D Gao, W Li and Y Hou)
  • Event Detection and Summarization:
    • Using Content-Level Structures for Summarizing Microblog Repost Trees (J Li, W Gao, Z Wei, B Peng and K-F Wong)
    • Utilizing Microblogs for Automatic News Highlights Extraction (Z Wei and W Gao)
    • Gibberish, Assistant, or Master? Using Tweets Linking to News for Extractive Single-Document Summarization (Z Wei and W Gao)
    • Using Tweets to Help Sentence Compression for News Highlights Generation (Z Wei, Y Liu, C Li and W Gao)
    • Joint Topic Modeling for Event Summarization Across News and Social Media Streams (W Gao, P Li and K Darwish)
    • Automatic Twitter Topic Summarization with Speech Acts (R Zhang, W Li, D Gao and Y Ouyang)
    • Sequential Summarization: A Full View of Twitter Trending Topics (D Gao, W Li, X Cai, R Zhang and Y Ouyang)
    • TGSum: Build Tweet Guided Multi-Document Summarization Dataset (C Chen, W Li, S Li, F Wei and M Zhou)
    • Topic Extraction from Microblog Posts Using Conversation Structures (J Li, M Liao, W Gao, Y He and K-F Wong)
    • Exploiting Community Emotion for Microblog Event Detection (G Ou, W Chen, T Wang, Z Wei, B Li, D Yang and K-F Wong)
    • Tracking Sentiment and Topic Dynamics from Social Media (Y He, C Lin, W Gao and K-F Wong)

Readership: Academics and researchers in social media content analysis and natural language processing.
Keywords: Social Media; Natural Language Processing; Information Retrieval; Content Analysis

Social Media Content Analysis, by Kam-Fai Wong, Wei Gao, Ruifeng Xu and Wenjie Li, is available in PDF and/or ePUB format and is listed under Computer Science & Natural Language Processing.

Information

Publisher: WSPC
Year: 2017
ISBN: 9789813223622

Part I
Search and Filtering

Chapter 1

Ranking Model Selection and Fusion for Effective Microblog Search

Zhongyu Wei¹, Wei Gao², Tarek El-Ganainy², Walid Magdy² and Kam-Fai Wong¹

¹Department of Systems Engineering & Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR
{zywei, kfwong}@se.cuhk.edu.hk

²Qatar Computing Research Institute, Hamad Bin Khalifa University, P.O. Box 5825, Doha, Qatar
{wgao, telganainy, wmagdy}@qf.org.qa

Re-ranking has been shown to have a positive impact on the effectiveness of microblog search. Yet existing approaches have mostly focused on using a single ranker to learn a better ranking function with respect to various relevance features. Given the different rank learners available (such as learning-to-rank algorithms), in this work we study an orthogonal problem in which multiple learned ranking models form an ensemble for re-ranking the retrieved tweets, rather than relying on a single ranking model, in order to achieve higher search effectiveness. We explore the use of query-sensitive model selection and rank fusion methods based on the result lists produced by multiple rank learners. Based on the TREC microblog datasets, we found that our selection-based ensemble approach can significantly outperform the system that uses the single best ranker, and that it also has a clear advantage over the rank fusion approach that combines the results of all the available models.
This chapter has been published as a workshop paper in the SIGIR 2014 Workshop on Social Media Retrieval and Analysis (SoMeRA) [31].

1. Introduction

In recent years, microblogging services have become increasingly popular on the Internet. For example, Twitter has billions of online users who exchange information of interest every day in the form of short messages called tweets, each limited to 140 characters. Because tweets are so timely, breaking news and current events are captured and propagated much faster over this platform than over traditional news feeds on the Web. Users are therefore willing to search the enormous collection of online microblogs to satisfy their information needs on ongoing topics.
However, topical ad hoc search is so far not the most popular search behavior on Twitter. A human factors study [29] found that Twitter users mainly search to get updates about entities or celebrities, find friends, gain insight into certain hashtags, and so on. This is not only because of the social nature of the service, but also because of the generally low quality of microblogs. The latter is a big obstacle for ad hoc search, since the strict length limit and the colloquial expressions in the posts can cause a serious word mismatch problem.
It has been found that, in general, two people use the same term to describe the same concept less than 20% of the time [6]. The word mismatch problem is more severe for short casual queries (like microblog queries) than for long elaborate ones [33]. If documents are very brief, as tweets are, the risk of query terms failing to match words observed in relevant documents is even larger [7]. The problem not only hinders the retrieval of relevant documents, but also naturally produces bad rankings of the relevant documents that are retrieved [5].
In microblog search, techniques such as query or document expansion have been used to address word mismatch and improve retrieval effectiveness. Among others, reports have shown that ranking models learned from various relevance features for re-ranking the top retrieval results can typically improve the final results [9, 15, 22]. However, all of the reported re-ranking methods focused merely on feature engineering, and none on the ranking technique itself. Moreover, the applied models are all based on a single ranker, which is typically query-insensitive and may not be universally suitable for different types of queries.
In this chapter, we study how to improve the re-ranking of microblog search results by leveraging the ranked lists from multiple rankers. We examine some state-of-the-art post-retrieval re-ranking approaches and their variants: (1) we choose the best single ranking model among all the candidate models; (2) for each query, we select the best-performing ranking model from the candidate models in a query-sensitive manner; (3) we aggregate the ranked lists of all the available candidate models using different fusion techniques; (4) instead of selecting the single best ranker for each query or fusing the results of all candidate models, we explore different fusion techniques to combine the outputs of the top-k ranking models selected on a query-by-query basis. We compare these approaches on the TREC Microblog datasets. Experimental results show that query-sensitive selection together with an ensemble of multiple rankers can achieve statistically significant improvements over the baselines.
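To make the idea of aggregating ranked lists (approach 3 above) concrete, here is a minimal sketch of one common fusion technique, reciprocal rank fusion. This excerpt does not name the fusion techniques the authors used, so the choice of method, the function name `reciprocal_rank_fusion`, the constant `k=60` and the toy tweet IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of tweet IDs into a single ranking.

    Each tweet receives sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. k=60 is a commonly used default (an assumption
    here, not a value taken from the chapter).
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, tweet_id in enumerate(ranking, start=1):
            scores[tweet_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by tweet ID for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Hypothetical outputs of three rankers for the same query.
ranker_a = ["t1", "t2", "t3"]
ranker_b = ["t2", "t1", "t4"]
ranker_c = ["t2", "t3", "t1"]

fused = reciprocal_rank_fusion([ranker_a, ranker_b, ranker_c])
print(fused)
```

In this toy input, the tweet that two of the three rankers place first ends up at the top of the fused list, and a tweet seen by only one ranker ends up at the bottom, which is the consensus effect that rank fusion aims for.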

2. Related Work

Several studies have investigated the nature of microblog search compared to other search tasks. Naveed et al. [24] illustrated the challenges of microblog retrieval, where documents are very short and typically focused on a single topic. Teevan et al. [29] highlighted the differences between microblog queries and Web search queries: first, microblog queries represent users' interest in finding updates about a given event or person, as opposed to relevant pages on a given topic in Web search; second, microblog queries are much shorter (only 1.64 words on average) than Web queries (3.08 words on average).
TREC introduced a track for ad hoc microblog search in 2011 [18, 25, 28]. Many different approaches were proposed, but only a few of them achieved good retrieval effectiveness. Typical methods can be summarized as using query or document expansion for retrieval, performing post-retrieval re-ranking based on various relevance features, or a combination of both. Many of the effective approaches used learning-to-rank algorithms [19] for re-ranking [9, 15, 22]. However, these works focused only on feature engineering, and none of them examined the ranking techniques more deeply. They suffered from the following issues: (1) some work had a very small training set that is not sufficient to learn a powerful ranker [9]; (2) some used only a very limited feature set with just 10 or so features [15, 22]; (3) all of them simply employed a single ranking model, which is query-insensitive. This leaves much room for further improvement through more sophisticated techniques.

3. Ranker Selection and Fusion

To improve the effectiveness of re-ranking retrieved tweets, we present three considerations that differ from previous work: (1) instead of employing only one ranking model, we can resort to multiple ranking models and combine their results for re-ranking; (2) model selection can be query-sensitive, aiming to choose multiple top rankers on an unsupervised, query-by-query basis to improve re-ranking over the entire topic set; (3) a metasearch fusion algorithm can be adopted to aggregate the preferences of multiple ranking models, based on either the selected top ranking models or all available ranking models.
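The three considerations can be sketched together as a small pipeline: pick the top-k rankers for one query, then fuse only their outputs. The sketch below assumes that a per-query quality estimate for each ranker is already available (the `estimated_quality` dictionary is hypothetical; how the chapter derives such an estimate is not shown in this excerpt) and uses CombSUM with min-max normalization as the fusion step, which is an illustrative choice rather than the authors' stated method.

```python
def min_max_normalize(scores):
    """Rescale one ranker's scores to [0, 1] so rankers are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def select_and_fuse(ranker_scores, estimated_quality, top_k=2):
    """Select the top_k rankers for one query and fuse them with CombSUM.

    ranker_scores: {ranker_name: {tweet_id: score}} for a single query.
    estimated_quality: {ranker_name: float}, a hypothetical per-query
        estimate of how well each ranker suits this query.
    """
    # Query-sensitive selection: keep the rankers with the best estimates.
    chosen = sorted(estimated_quality, key=estimated_quality.get,
                    reverse=True)[:top_k]
    # CombSUM: add up the normalized scores from the chosen rankers only.
    fused = {}
    for name in chosen:
        for doc, s in min_max_normalize(ranker_scores[name]).items():
            fused[doc] = fused.get(doc, 0.0) + s
    ranking = sorted(fused, key=lambda d: (-fused[d], d))
    return chosen, ranking


# Toy scores for one query from three hypothetical rankers.
scores = {
    "ranker_a": {"t1": 2.0, "t2": 1.0, "t3": 0.0},
    "ranker_b": {"t1": 0.1, "t2": 0.9, "t3": 0.5},
    "ranker_c": {"t1": 5.0, "t2": 3.0, "t3": 9.0},
}
quality = {"ranker_a": 0.42, "ranker_b": 0.47, "ranker_c": 0.20}

chosen, ranking = select_and_fuse(scores, quality, top_k=2)
print(chosen, ranking)
```

Setting `top_k=1` reduces this to selecting a single ranker per query, and setting `top_k` to the number of rankers reduces it to fusing all available models, so the same function covers both ends of the design space described above.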
Inspired by the supervised model selection strategy [26], we propose a re-ranking system that selects multiple ranking models and combines their results on a per-query basis. Suppose we have learned a set o...

Table of contents

  1. Cover
  2. Halftitle
  3. Title
  4. Copyright
  5. Preface
  6. Contents
  7. About the Editors
  8. Part I: Search and Filtering
  9. Part II: Opinion and Sentiment Analysis
  10. Part III: Event Detection and Summarization