Data Science

How to empirically test data from social (media) networks using probability distributions

CcHUB, as a leading social innovation hub in Africa, acquired the ability to gather and find meaning in data from social networks such as Facebook and Twitter. This enabled us to deploy platforms that use social network analysis techniques to determine connectedness and social ties between individuals and communities. These connections provide insights that can be pivotal in times of crisis (SNA Linked) and point to immediate opportunities for trade flows across countries (SCI Trade).

Like many other networks around us, a network built from social media data follows a power-law degree distribution (Lubos Takac, 2012). The aim of this article is to empirically test network data from social (media) networks by estimating the best-fitting power-law model and then testing its statistical plausibility with bootstrapping, a resampling technique.

Social network analysis techniques determine connectedness and social ties between users by using centrality measures. These measures are commonly summarised through the degree distribution, a statistic that counts the number of vertices of each degree. A power-law distribution describes the 80-20 rule (the Pareto principle, closely related to Zipf's law) that governs various domains. In social network theory, a small number of vertices are assumed to be incident to a large number of edges (Bilal Khan, 2015).
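To make the degree distribution concrete, here is a minimal sketch in Python (standard library only). The edge list is a hypothetical toy network, not data from the article, and the function name `degree_distribution` is our own choice for illustration.

```python
from collections import Counter

def degree_distribution(edges):
    """Map each degree to the number of vertices that have that degree.

    `edges` is an undirected edge list of (u, v) pairs.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count how many vertices share each degree value.
    return dict(Counter(degree.values()))

# Hypothetical toy network: one hub ("a") tied to several low-degree vertices.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("b", "c")]
dist = degree_distribution(edges)
```

In this toy network a single vertex holds most of the edges while the rest have one or two each, which is the pattern a power-law degree distribution describes at scale.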

The {poweRlaw} package for R was used to test for the presence of a power-law distribution in social connectedness index (SCI) data obtained from the Facebook Data for Good database. The generated p-values range between 0.33 and 0.78. Because these values are well above 0.05, at the 5% level of significance we cannot reject the null hypothesis that the SCI data are generated from a power-law distribution. Thus, data that measure the social connectedness between different geographical locations are consistent with a power-law degree distribution.
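The logic of such a bootstrap goodness-of-fit test can be sketched in a few lines of Python. This is not the {poweRlaw} implementation and not the code used for the results above. It is a simplified illustration that assumes a continuous power law and a fixed, known lower cutoff `xmin` (a fuller procedure would also estimate the cutoff), and it runs on synthetic data rather than the SCI data. All function names are our own.

```python
import math
import random

def fit_alpha(xs, xmin):
    """Maximum-likelihood exponent of a continuous power law for values >= xmin."""
    tail = [x for x in xs if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_distance(xs, xmin, alpha):
    """Kolmogorov-Smirnov distance between the empirical tail and the fitted CDF."""
    tail = sorted(x for x in xs if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        d = max(d, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return d

def sample_power_law(n, xmin, alpha, rng):
    """Draw n values from a continuous power law by inverse-transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def bootstrap_p_value(xs, xmin, n_sims=200, seed=0):
    """Return (alpha, p): the fitted exponent and the bootstrap p-value.

    p is the share of synthetic power-law samples whose KS distance to their
    own fit is at least as large as the KS distance observed for the data.
    """
    rng = random.Random(seed)
    alpha = fit_alpha(xs, xmin)
    observed = ks_distance(xs, xmin, alpha)
    n = sum(1 for x in xs if x >= xmin)
    exceed = 0
    for _ in range(n_sims):
        synthetic = sample_power_law(n, xmin, alpha, rng)
        d = ks_distance(synthetic, xmin, fit_alpha(synthetic, xmin))
        if d >= observed:
            exceed += 1
    return alpha, exceed / n_sims

# Synthetic example: data drawn from a power law with a known exponent of 2.5.
rng = random.Random(42)
data = sample_power_law(500, xmin=1.0, alpha=2.5, rng=rng)
alpha_hat, p_value = bootstrap_p_value(data, xmin=1.0)
```

A p-value above the chosen significance level (0.05 here) means the data look no worse than samples that really do come from the fitted power law, so the power-law hypothesis is not rejected. It does not prove that a power law is the only model that fits.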
