Social Media and Statistical Methods: Are they reconcilable?

Social media has been the buzzword in recent times, and I feel analytics companies, bored with structured data and its familiar metrics, are exploring new realms of data that are not so structured. In the process, companies are using different methods to find patterns within the chaos. If you have read the earlier blog post, you have already been introduced to the analytics lifecycle. Here I will talk about the techniques used in social media analytics. Notable among them are the social media tools that vendors have built for different platforms, the APIs of the platforms themselves, and machine learning and other methods used for text analysis. But have we ever thought about using statistical methods, which are well established in the field of analytics, to evaluate and find insights in the data we get from these platforms? That is what I am going to talk about now.

It is common knowledge that statistical methods are prevalent in the field of analytics, and I see no reason why they cannot be used in the social media analytics space. Over the years, multivariate regression models, logistic regression, survival analysis, inter-correlation matrices, factor analysis, chi-square automatic interaction detection (CHAID) and cluster analysis have dominated analytics in every segment you can think of: credit card analytics, risk analytics, retail management, corporate intelligence and supply chain. It is time we used them for social media analytics. That said, using statistical methods for social media is not a new concept, and some notable research has been done already. Some research papers worth mentioning in this context are:

  • “Predicting the Future with Social Media” by Bernardo Huberman and Sitaram Asur, where they use regression models to predict the success of Hollywood movies.
  • Golder et al. used the log-likelihood test to show how the number of messages sent versus the number of users sending these messages on Facebook follows power-law and thin-tailed exponential distributions. They also give a lot of insight into the times at which Facebook is used and the social variations in that use.
  • Another notable piece of research, in the context of Twitter, is by Meeyoung Cha and colleagues, who use Spearman’s rank correlation coefficient to identify top influencers and their relative influence ranks. They find that a million followers does not necessarily imply huge influence, hence the aptly named paper “Measuring User Influence in Twitter: The Million Follower Fallacy”.
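
To make the rank-correlation idea from the last paper a little more concrete, here is a minimal Python sketch of Spearman's rank correlation coefficient. The follower and retweet counts are made-up numbers for illustration only, not data from the paper, and the function names are my own. It is a sketch of the general technique, not the authors' code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical accounts (illustrative numbers, not real data).
followers = [1_200_000, 850_000, 40_000, 15_000, 500_000, 3_000]
retweets = [300, 1_500, 2_200, 90, 700, 40]

rho = spearman(followers, retweets)
print(f"Spearman rho between follower rank and retweet rank: {rho:.3f}")
```

A value close to +1 would mean that ranking accounts by followers gives nearly the same order as ranking them by retweets. On this made-up data rho comes out at roughly 0.49, so the two rankings agree only loosely, which is the kind of pattern the “million follower fallacy” refers to.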

There is a rich academic literature that applies well-known statistical methods to social media data, and all analytics companies deploying social media strategies will be able to leverage it. In my next post I will talk about the publications of Bernardo Huberman in detail, and in upcoming posts I will talk about how we can use them in social media analytics.

A few thoughts that cross my mind: Can we use tree modeling to find the probability of a blog post going viral? Can we find the exact influence of a person who operates on multiple platforms using common statistical methods? Is it possible to understand the trend of conversations by looking at the time series data? The point I am trying to make is that I believe we “can” develop advanced social media analytics techniques, as other established analytics streams have, and the time is now! Let me know what you think about it.

About Arindam Mondal

Social media analytics professional with a focus on statistical analysis.
This entry was posted in Advanced Analytics.

2 Responses to Social Media and Statistical Methods: Are they reconcilable?

  1. Eddie Gear says:

    Running a website is more of a numbers game. It's data, conversions, leads and so on. Today, there are already hundreds of startups focused on measuring social media engagement metrics. It's all about what methodology you apply and how you measure your performance in the social media space.

  2. harsht says:

    Statistical analysis fundamentally assumes the independence of all observations on the predictor variables. So you assume that Person A and Person B do not influence one another on any of the independent variables. So you cannot put statistics like the number of followers (or any network measure, for that matter) into a regression model, for example. Having said that, a variety of techniques have been developed to overcome these limitations: ERGM, P*, SIENA; the list is growing.
