Chapter 1
Introduction to Social Networks
In recent years, with the development of Internet technology, online social networks have gradually changed peopleâs lives. Facebook, Twitter, Sina Weibo, etc., have become more and more popular. So far, there are more than 2.2 billion and 500 million registered users on Facebook and Twitter, respectively. Every day, hundreds of millions of people spend a lot of time on the social network platforms to share, communicate, connect, interact, and create massive information, which has directly contributed to the arrival of the era of big data. Large amounts of data is generated in the online social networks, including usersâ sharing, relationships, and interactions, etc. These data provides an opportunity for us to learn about the patterns of interaction between users, to detect network events, and to predict user behaviors. Thus, massive social network data has great research value and huge market applications. Social network mining, which is a new research field with rapid growth, has become a hot research topic. In this research area, social connections are important and inseparable features of social networks. Compared with traditional data mining, we need some new methodologies to analyze and mine the social network data which are related to the social psychology, statistics, spectral analysis, probabilistic theory, graph theory, and graph mining, and so on.
1.1Social Networks
A social network consists of a set of social actors, a set of social relationships, and other social interactions between actors. With the rapid development of Internet technology, there have been a large number of social platforms, such as Facebook, Twitter, Sina Weibo, YouTube, etc. These social platforms have worked as social sensors to sense usersâ behaviors. Every day billions of people spend a lot of time communicating on social platforms, thus generating a lot of user behavior data. These data form an online social network.
Nowadays, online social networks have become important channels for people to obtain and spread information, to make friends, and as the sources of entertainment. Because of the complexity of social network structures and the massive information, it is also a frontier research field in computational science, management science, psychology science, behavior science, and sociology. As a result, the analysis and mining of social networks have already influenced the political, economic, and culture development of countries. It has played an increasingly important role in the field of business intelligence, social management, academic research, and certain other fields.
Through social network mining, we can analyze the usersâ behaviors and preferences to help businesses recommend products better. Sitaram Asur of HP Labs has succeeded in predicting the movieâs box office by analyzing Twitter data (Asur and Huberman, 2010). Analysis of user location data can help policemen to find users with abnormal behaviors, and thus maintain law and order. We can effectively mobilize public opinion for monitoring and eliminate negative social network information through a study of social network information dissemination. Real-time analysis of network robustness also allows us to detect the internal and external events in the social networks. In Tunisia and Egypt, the governmentsâ opponents have used social networks for revolutionary propaganda, making a seemingly powerful regime collapse in half a month. Therefore, online social network research has been of significant concern at home and abroad in recent years.
At present, the main research objectives of social network mining are as follows:
(1)Analysis of the topological characteristics of social networks.
(2)User behavior analysis.
(3)Socialization recommendation.
(4)Community detection.
(5)Information dissemination in social networks.
1.2Challenges of Social Network Mining
Social network mining has attracted the attentions of both academia and industry, and it has produced many research results. However, social network mining is still in its infancy. With the development of online social networks, social network mining has been facing the following challenges:
Large scale: Social networks are usually large-scale networks with millions of vertices, which are complex and have massive amounts of data. But most of the existing representative methods are only tested on small-scale social networks, and thus the scalability of these methods is poor. Therefore, how to design an accurate and scalable method is a problem that needs to be solved.
Heterogeneity: Data heterogeneity is a great challenge in social network mining. User attributes and behaviors can differ vastly across social networks due to the different site designs. In many situations, we need to incorporate the heterogeneous user behaviors to build a reasonable model. It is a natural question how to model the heterogeneous user behaviors.
Uncertainty: The same users may provide inconsistent information for the same attribute in different social networks. This could be due to the input errors or usersâ intentional omission. These situations often need a combination of natural language processing, machine learning, data mining, and some other technologies to help deal with them, which thus poses the challenges of user behavior mining and analysis.
Mutual influence: In many traditional problems, research objects are treated as the independent entities. However, individuals in social networks are connected to one another, and their behaviors also impact each other. Social network has the power to dramatically influence our choices, actions, thoughts, feelings, even our desires. In social network-related researches, we cannot analyze and mine user behaviors independently.
Fragmented data: Todayâs information-based dynamic platforms provide a convenient way for users to soak in numerous accounts of different online social networks simultaneously. Moreover, user behaviors are usually fragmented, with uncertainty, incompleteness, and varying levels of quality. This fragmented data brings many challenges to us since each piece provides some limited information, but not the whole picture.
1.3Chapter Organization
The rest of this book organized as follows. In Chapter 2, we introduced some basic concepts of network modeling. In Chapter 3, we develop R-energy as a new metric of network robustness based on the spectral analysis of network structures. In Chapter 4, we propose an unsupervised method, Collective Network Linkage, to link users across heterogeneous social networks. In Chapters 5 and 6, we detect dense sub-structures from bipartite graphs and signed graphs.
Chapter 2
Network Modeling
We live in a connected world. The transportation network has made travel much more convenient; the financial network has made peopleâs consumption and financial management very easy; and communication networks have helped spread information very quickly. Network brings peopleâs lives closer and promotes the development of industries rapidly. These connected networks can be represented by the different graphs. The information form networks can help us to predict connections, make user portraits, recommend production, etc. In this chapter, we will introduce some basic concepts about graphs in network mining.
2.1The Types of Networks
In a network, every individual can be represented as a vertex (node) of a graph, and two individuals who interact each other can be connected with an edge in the graph. Social networks may be represented as many different types of graphs.
2.1.1Graph
In this section, we list some common notations used in this book.
Definition 2.1 (Graph). A graph G is a binary tuple, denoted as G = (V, E), where V is the vertex set, and E â V Ă V is the edge set.
Graph represents the friendships between individuals (users, people, etc.), which are modeled as or vertices, and the connection or relationship between two vertices is called an edge. In some social platforms, we may care about the directions of relationships. For example, Twitter promotes two-way communication with its following and follower friends, it does not force a connection between them, i.e., you can follow an account, but it does not have to follow you, and vice versa. Furthermore, we distinguish the graphs as directed and undirected graphs. This is described below.
Definition 2.2 (Directed and Undirected Graphs). Given a graph G = (V, E),
(1)graph G is a directed graph if âi â j â {1, 2, . . . , |V|} such that (vi, vj) â E, but (vj, vi) â E;
(2)graph G is an undirected graph, for 1 †i â j †|V|, if (vi, vj) â E, we have (vj, vi) â E.
In some cases, we may care about the strength of a relationship between two vertices. For example, if two individuals ...