Social media has been the buzzword in recent times. I feel analytics companies have grown bored with structured data and its familiar metrics, and are exploring new realms of data that are not so structured. In the process, companies are using different methods to find patterns within the chaos. If you have read the earlier blog post, you have already been introduced to the analytics lifecycle. Here I will talk about the techniques used in social media analytics. The notable ones are the social media tools that vendors have built for different platforms, the APIs of the platforms themselves, and machine learning and other methods used for text analysis. But have we ever thought about using the statistical methods that are well established in analytics to evaluate the data we get from these platforms and find insights in it? That is what I am going to talk about now.
It is common knowledge that statistical methods are prevalent in analytics, and I see no reason why they cannot be used in the social media analytics space. Over the years, multivariate regression models, logistic regression, survival analysis, inter-correlation matrices, factor analysis, chi-square automatic interaction detection (CHAID) and cluster analysis have dominated every analytics segment you can think of: credit card analytics, risk analytics, retail management, corporate intelligence and supply chain. It is time we used them for social media analytics. I should point out that using statistical methods for social media is not a new concept, and some notable research has already been done. Some research papers worth mentioning in this context are:
- “Predicting the Future with Social Media” by Bernardo Huberman and Sitaram Asur, where they use regression models to predict the success of Hollywood movies
- Golder et al. showed how the number of messages sent versus the number of users sending these messages on Facebook follows power-law and thin-tailed exponential distributions, using the log-likelihood test. They also give many insights into the times at which Facebook is used and how that usage varies socially
- Another notable piece of research, in the context of Twitter, is by Meeyoung Cha et al., where they use Spearman’s rank correlation coefficient to identify top influencers and their relative influence ranks. They find that a million followers does not necessarily imply huge influence, hence the apt title of their paper: “Measuring User Influence in Twitter: The Million Follower Fallacy”
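To make the last of these ideas concrete, here is a minimal sketch of Spearman’s rank correlation in plain Python. It is not the method or the data from the Cha et al. paper: the follower and retweet counts are invented purely for illustration, and the function names (`average_ranks`, `pearson`, `spearman`) are my own. The idea is to rank the accounts by each measure and then correlate the two sets of ranks; a coefficient well below 1 would suggest that a large following and actual engagement do not line up neatly.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical accounts (made-up numbers, not real Twitter data).
followers = [1_200_000, 850_000, 40_000, 15_000, 300_000, 5_000]
retweets = [300, 2_500, 1_800, 90, 4_000, 150]

rho = spearman(followers, retweets)
print(f"Spearman rank correlation: {rho:.3f}")
```

Because the coefficient works on ranks rather than raw counts, a handful of accounts with enormous follower numbers cannot dominate the result, which is one reason a rank-based measure suits heavily skewed social media data.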
There is a rich academic literature applying well-known statistical methods to social media data, and every analytics company deploying a social media strategy can leverage it. In my next post I will talk about the publications of Bernardo Huberman in detail, and in the upcoming posts I will talk about how we can use them in social media analytics.
A few questions on my mind: Can we use tree modeling to find the probability of a blog post going viral? Can we measure a person’s influence exactly, using common statistical methods, when he or she is active on multiple platforms? Is it possible to understand the trend of conversations by looking at time series data? The point I am trying to make is that I believe we “can” develop advanced analytics techniques for social media, just as in other established analytics streams, and the time is now! Let me know what you think about it.
- “Advanced Analytics predictions for 2010” by James Kobielus, Forrester