Big Data Analytics
A Social Network Approach

Mrutyunjaya Panda, Ajith Abraham, Aboul Ella Hassanien

  1. 316 pages
  2. English
About This Book

Social networking has increased drastically in recent years, resulting in an increased amount of data being created daily. Furthermore, diversity of issues and complexity of the social networks pose a challenge in social network mining. Traditional algorithm software cannot deal with such complex and vast amounts of data, necessitating the development of novel analytic approaches and tools.

This reference work deals with social network aspects of big data analytics. It covers theory, practices and challenges in social networking. The book spans numerous disciplines like neural networking, deep learning, artificial intelligence, visualization, e-learning in higher education, e-healthcare, security and intrusion detection.

CHAPTER 1
Linkage-based Social Network Analysis in Distributed Platform
Ranjan Kumar Behera,1,* Monalisa Jena,2 Debadatta Naik,1 Bibudatta Sahoo1 and Santanu Kumar Rath1
1 National Institute of Technology, Rourkela, 769008, India.
2 Fakir Mohan University, Balasore, 756019, India.
* Corresponding author: [email protected]
Introduction
A social network is a platform on which a large number of social entities communicate by sharing views on many topics, posting multimedia files and exchanging messages. Structurally, it is a group of users connected through their social relationships. A social network can be modeled as a non-linear data structure such as a graph, in which each node represents a user and each edge represents a relationship between two users. Facebook, Twitter, YouTube, LinkedIn and Wikipedia are some of the most popular social network platforms, where billions of users interact every day. Along with these interactions, exabytes of structured, unstructured and semi-structured data are generated continuously on these websites (Hey et al. 2009). This is why the term ‘big data’ is associated with social networks. Research on social networks began with social scientists analyzing human social behavior, continued with mathematicians analyzing complex network theory, and now includes computer scientists analyzing the generated data to extract useful information. Social scientists, physicists and mathematicians deal mainly with the structural analysis of social networks, while computer scientists deal mainly with data analysis. Social networks are among the most important sources of data for analytics, assessment, sentiment analysis, online interaction and content sharing (Pang et al. 2008). In the early days of social networking, information was posted on homepages, and only a few internet users could interact through them. Today, however, an enormous number of activities are carried out through social networks, which leads to a huge accumulation of data. Social networks enable users to exchange messages within a fraction of a second, regardless of their geographical location.
Many individuals, organizations and even government officials now follow social network structure and media data to extract useful information for their benefit. Because the data generated from social networks are huge, complex and heterogeneous, analyzing them is a computationally challenging task. However, big data technology allows analysts to sift accurate and useful information from this vast amount of data.
Social network analysis is one of the emerging research areas in the era of big data analytics. A social network consists of linkage data and content data (Knoke and Yang 2008). Linkage data can be modeled as a graph structure that depicts the relationships between nodes, whereas content data take the form of structured, semi-structured or unstructured data (Aggarwal 2011), consisting of text, images, numerical data, video, audio, etc. Social network analysis can be broadly classified into two categories: Social Media Analysis (He et al. 2013) and Social Structure Analysis (Gulati 1995).
A huge amount of data is generated from social networks at every moment, and the size of the generated data is increasing at an exponential rate. Storing, processing and analyzing such huge, complex and heterogeneous data is one of the most challenging tasks in the field of data science. All the data created on a social network website can be considered social media. The availability of massive amounts of online media data provides a new impetus to statistical and analytical study in data science. It also opens several directions of research in which statistical and computational power play a major role in analyzing large-scale data. Structure-based analysis is found to be more challenging, because the network structure is more complex than the media data. A number of real-time applications are based on structural analysis, where the linkage information in the network plays a vital role. A social network is a network of relationships between nodes, where each node corresponds to a user and each link corresponds to an interaction between two users. The interaction may be friendship, kinship, liking, collaboration, co-authorship or any other kind of relationship (Zhang et al. 2007). The basic idea behind every social network is an information network in which groups of users either post and share common information or exchange information among themselves. The concept of a social network is not restricted to particular types of information network; it can be any kind of network between social entities in which information is generated continuously. A number of analyses can be carried out using the structural information of the network to identify the importance of each node or to reveal hidden relationships between users.
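The node-and-link model described above can be sketched as a small adjacency structure. This is a minimal illustration only: the class name `SocialGraph`, its methods and the sample users are hypothetical, and relationships are assumed to be symmetric (undirected), which the text does not require.

```python
from collections import defaultdict


class SocialGraph:
    """Undirected social network stored as an adjacency map (illustrative)."""

    def __init__(self):
        # Maps each user to {neighbour: relationship label}.
        self._adj = defaultdict(dict)

    def add_relationship(self, u, v, kind="friendship"):
        """Record a labelled, symmetric link between users u and v."""
        self._adj[u][v] = kind
        self._adj[v][u] = kind

    def neighbours(self, u):
        """Return the users directly linked to u, in sorted order."""
        return sorted(self._adj.get(u, {}))

    def degree(self, u):
        """Return the number of users directly linked to u."""
        return len(self._adj.get(u, {}))

    def relationship(self, u, v):
        """Return the label of the link between u and v, or None."""
        return self._adj.get(u, {}).get(v)


# Hypothetical sample data using relationship types named in the text.
g = SocialGraph()
g.add_relationship("alice", "bob", "friendship")
g.add_relationship("alice", "carol", "co-authorship")
g.add_relationship("bob", "dave", "kinship")
```

Storing the relationship label on each link keeps the linkage data separate from any content data attached to the users, which is the distinction the chapter draws.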
Before discussing research in social network analysis in detail, it is useful to point out certain structural properties that real-world social networks follow. Among them are the small-world phenomenon and the power-law degree distribution (scale-free network), both of which were described well before the advent of online social networks. The small-world phenomenon was reported by Stanley Milgram in 1967 and examined further in his experiments with Jeffrey Travers. It says that most people in the world are connected through a small number of acquaintances, which led to the theory known as six degrees of separation (Kleinfeld 2002, Shu and Chuang 2011). According to this theory, any pair of actors on the planet is separated by at most six degrees of acquaintance. This theory is now an inherent principle of today’s large-scale social networks (Watts and Strogatz 1998). A number of experiments have been carried out by social scientists to test the six-degrees-of-separation principle. One of them used MSN Messenger data and showed that the average path length between any two MSN Messenger users is 6.6. Real-world social networks are also observed to follow a power-law degree distribution, which implies that most nodes in the network have a low degree and a few nodes have a high degree. The fraction of nodes having k connections to other nodes depends on the value of k and a constant parameter. It can be defined mathematically as follows (Barabási and Albert 1999):
$$F(k) \propto k^{-\gamma} \tag{1}$$
where F(k) is the fraction of nodes having out-degree k and γ is the scale parameter, typically ranging from 2 to 3. A scale-free network is a network that follows the power-law degree distribution. Real-world social networks are therefore scale-free in nature, unlike random networks, in which the degree distribution among the nodes is random. Traditional tools are inefficient at handling the huge amount of unstructured data generated by large-scale social networks. Apart from the generated online social media data, the structure of the social network itself is complex and difficult to analyze. A distributed platform such as Spark (Gonzalez et al. 2014) or Hadoop (White 2012) may be suitable for analyzing large-scale social networks efficiently.
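As a rough illustration of the degree distribution F(k) discussed above, the following sketch computes the fraction of nodes with each degree from an edge list and fits a straight line to log F(k) against log k. The function names and the sample edges are hypothetical. The code assumes an undirected network (so it uses plain degree rather than out-degree) and returns the negated slope, which equals γ when F(k) decays as k to the power −γ. A least-squares fit on a log-log plot is a crude estimator and is shown only to make the definition concrete.

```python
from collections import Counter
import math


def degree_distribution(edges):
    """Return {k: fraction of nodes with degree k} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())
    return {k: c / n for k, c in sorted(counts.items())}


def estimate_gamma(dist):
    """Negated least-squares slope of log F(k) versus log k.

    Needs at least two distinct degrees with non-zero fractions.
    """
    pts = [(math.log(k), math.log(f)) for k, f in dist.items() if k > 0 and f > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx


# Hypothetical toy network: one hub and several low-degree nodes.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]
dist = degree_distribution(edges)
```

On a network this small the fitted exponent is not meaningful; the estimate only becomes informative on networks with many nodes and a wide range of degrees.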
Research Issues in Social Network
Social networks have become an inevitable part of human life. Their fundamental characteristic is that they are rich in content and linkage data. The size of a social network increases rapidly as millions of users, along with their relationships, are added dynamically at every moment (Borgatti and Everett 2006), and the volume of data generated through the network grows exponentially. Social network analysis is based on either the linkage data or the content data of the network. A number of data mining and artificial intelligence techniques can be applied to these data to extract useful patterns and knowledge. Because the data are huge, complex and heterogeneous, traditional tools are inefficient at handling them. Big data techniques may help in analyzing the ever-increasing content data of the social network. In this chapter, however, we focus mainly on the structural analysis of the social network. Structural analysis can be carried out on either a static or a dynamic network. Static analysis is possible only when the structure of the network does not change frequently, as in a bibliographic or co-authorship network, where author collaborations and citation counts increase slowly over time. In static analysis, the entire network is processed in batch mode. Static analysis is easier than dynamic analysis, where the structure of the network changes at every moment.
A large number of research problems arise in the context of structural analysis of the network. Because the structure of social networks changes at an exponential rate in the era of the Internet, the attributes involved in the dynamics of the network also change. Modeling the dynamic structure of a social network is a challenging task, as the network parameters change more rapidly than expected. For example, according to the small-world phenomenon, any two entities are supposed to be separated by a small number of acquaintances, but this property may not hold consistently as the structure of the network changes. Verifying such structural properties in dynamic networks has attracted great interest in recent years.
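One simple way to check the small-world property on a single snapshot of a network is to measure its average shortest-path length with breadth-first search. The sketch below is illustrative only: the function names and the sample snapshot are hypothetical, the network is assumed to be undirected and unweighted, and the all-pairs approach shown here is practical only for small graphs, not for the large-scale networks the chapter targets.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return {node: hop count from source} for every node reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of distinct, connected nodes."""
    total = 0
    pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical snapshot: a chain of four users, a - b - c - d.
snapshot = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}
```

For a dynamic network, the same measurement could be repeated on successive snapshots to see whether the average separation stays small as links are added and removed.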
Linkage-based Social Network Analysis
Centrality Analysis
Centrality analysis is the process of identifying the most important or influential nodes in the network (Borgatti and Everett 2006). The meaning of importance may differ from application to application. In one context a given node may be the most influential, while in another context the same node may not have a high influence factor. For example, on a social network website it may be inferred that at one point in time Obama is the most powerful person in the world, while at another point in time he might not be that powerful. Centrality analysis can help in finding the relative importance of a person in the network. The word importance has many meanings, which leads to different kinds of centrality measures. Evaluating centrality values for each node in a ...
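To make the idea of a centrality measure concrete, the sketch below computes normalized degree centrality, one standard measure in which a node's importance is the fraction of other nodes it is directly linked to. It is offered as an example of the general notion only and is not necessarily one of the measures the chapter goes on to define. The function names and the sample network are hypothetical, and the network is assumed to be undirected.

```python
def degree_centrality(adj):
    """Return {node: degree / (n - 1)} for an undirected adjacency map."""
    n = len(adj)
    if n < 2:
        return {u: 0.0 for u in adj}
    # Self-loops and duplicate neighbours are ignored when counting links.
    return {u: len(set(nbrs) - {u}) / (n - 1) for u, nbrs in adj.items()}


def most_central(adj):
    """Return the node with the highest degree centrality (first one on ties)."""
    scores = degree_centrality(adj)
    return max(scores, key=scores.get)


# Hypothetical four-user network in which "ann" is linked to everyone else.
network = {
    "ann": ["bob", "cat", "dan"],
    "bob": ["ann"],
    "cat": ["ann", "dan"],
    "dan": ["ann", "cat"],
}
scores = degree_centrality(network)
```

Other notions of importance, such as how often a node lies on paths between other nodes, would rank the same users differently, which is why the choice of measure depends on the application.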
