Flaw of Averages

When working with numbers representing many people, one often employs the average to describe the people represented by the numbers. However, this approach can lead to serious problems due to the flaw of averages, which is that findings based on the average are wrong on average. In fact, often when dealing with people, the average person…
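A small numeric sketch can make the flaw concrete. The profit model, capacity, margin, and demand scenarios below are invented for illustration and do not come from the article; the point is only that a result computed from the average input can differ from the average of the results.

```python
from statistics import mean


def profit(demand, capacity=100, margin=10.0):
    """Hypothetical model: sales are capped by capacity, so profit is nonlinear in demand."""
    return margin * min(demand, capacity)


# Made-up demand scenarios whose average is exactly 100.
demands = [60, 80, 100, 120, 140]

# Plan based on the average demand.
profit_at_average = profit(mean(demands))

# What actually happens on average across the scenarios.
average_profit = mean(profit(d) for d in demands)

print(profit_at_average)  # 1000.0
print(average_profit)     # 880.0
```

Because high demand cannot be served beyond capacity while low demand still hurts, the plan built on the average overstates what is earned on average in this toy setup.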

The Illusion of Data Validity: Why Numbers About People Are Likely Wrong

This reflection article addresses a difficulty faced by scholars and practitioners working with numbers about people, which is that those who study people want numerical data about these people. Unfortunately, time and time again, this numerical data about people is wrong. Addressing the potential causes of this wrongness, we present examples of analyzing people numbers,…

When mere correlations are not enough: The Granger Causality test

In most data science-related problems, datasets consist of multiple variables, in which independent variables might depend on other independent variables. When the variables in a dataset represent observations at different times, we call the dataset a time series. The time interval in these datasets may be hourly, daily, weekly, monthly, quarterly, annually, etc. One…
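The excerpt stops before describing the test itself, so the following is only a minimal sketch of the standard one-lag formulation, not the article's own code. It compares a model that predicts y from its own past with a model that also uses the past of x, and forms an F statistic from the two residual sums of squares. The helper names (`ols_rss`, `granger_f_lag1`) and the synthetic series are assumptions made for illustration.

```python
import random


def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit, solved via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, t in zip(rows, targets)
    )


def granger_f_lag1(x, y):
    """F statistic for the hypothesis that lagged x adds nothing to predicting y (one lag)."""
    targets = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - 3  # observations minus parameters in the unrestricted model
    return (rss_r - rss_u) / (rss_u / dof)


# Synthetic data in which x drives y with a one-step delay, but not the reverse.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(300)]
y = [0.0]
for t in range(1, 300):
    y.append(0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 0.5))

f_x_to_y = granger_f_lag1(x, y)  # expected to be large
f_y_to_x = granger_f_lag1(y, x)  # expected to be small
print(f_x_to_y, f_y_to_x)
```

In practice the statistic is compared against an F distribution with 1 and `dof` degrees of freedom, and a library routine that handles multiple lags is usually preferable to a hand-rolled solver like this one.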

SIG-KM International Research Symposium 2022 Keynote

Super time at the SIG-KM International Research Symposium 2022, hosted online by the University of North Texas. Thanks to Jeff Allen and to Lu An for the invitation! Thanks also to the Center for Studies of Information Resources (CSIR) of Wuhan University (WHU) and other academic sponsors for their support of the symposium!  

Comparison of Google Analytics and SimilarWeb for Web Analytics

Approaches to collecting website analytics data can be grouped by the focus of data collection efforts, resulting in the emergence of three general methodologies, namely: user-centric, site-centric, and network-centric. Two industry standard and popular web analytics platforms are Google Analytics and SimilarWeb. Google Analytics is a site-centric service, and SimilarWeb is a user-centric service that…

User-centric, site-centric, and network-centric: Approaches to collecting website analytics data

Approaches to collecting website analytics data can be grouped by the focus of data collection efforts, resulting in the emergence of three general methodologies, namely: (a) user-centric, (b) site-centric, and (c) network-centric. The central traits of each are as follows. User-centric: Web analytics data is gathered via a panel of users, which is tracked by software installed on…

SegmentSizeEstimator, a research tool of the Acua Platform

Wondering what factors contribute to high levels of online engagement? In our research, we have found that one of the most reliable predictors of the level of engagement for an ad, online content, or social media post for a given channel is simply the size of the target population. For example, we’ve ranked viewers of YouTube channels…

Engineers, Aware! Commercial Tools Disagree on Social Media Sentiment

For segmentation, one often needs to use sentiment analysis services. Large commercial sentiment analysis tools are often deployed in software engineering due to their ease of use. However, it is not known how accurate these tools are, or whether the sentiment ratings given by one tool agree with those given by another tool. We use…
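One common way to quantify whether two tools agree is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is an illustration only: the excerpt does not say which agreement measure the study uses, and the labels for the two hypothetical tools are made up.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Share of items on which both raters give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each rater labeled independently at their own rates.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters always use the same single label
    return (observed - expected) / (1.0 - expected)


# Made-up sentiment labels from two hypothetical tools on the same eight posts.
tool_a = ["pos", "pos", "neg", "neu", "neg", "pos", "neu", "neg"]
tool_b = ["pos", "neu", "neg", "neu", "pos", "pos", "neg", "neg"]

kappa = cohens_kappa(tool_a, tool_b)
print(round(kappa, 3))  # 0.429
```

Here the tools agree on 5 of 8 posts (62.5%), but after discounting chance agreement the kappa is about 0.43, which shows how raw agreement can flatter the comparison.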

What Really Matters?: Characterizing and Predicting User Engagement of News Postings Using Multiple Platforms, Sentiments, and Topics

This research characterizes user engagement of approximately 3,000,000 news postings of 53 news outlets and 50,000,000 associated user comments during 8 months on 5 social media platforms (i.e. Facebook, Instagram, Twitter, YouTube, and Reddit). We investigate the effect of sentiments and topics on user engagement across four levels of user engagement expressions (i.e. views, likes,…

Measuring 9 emotions of news posts from 8 news organizations across 4 social media platforms for 8 months

Using Plutchik’s wheel of emotions framework, we identify the emotional content of 133,487 social media posts and the audience’s emotional engagement expressed in 2,824,162 comments on those posts. We measure nine emotions (anger, anticipation, anxiety, disgust, joy, fear, sadness, surprise, trust) and two sentiments (positive and negative) using two extraction resources (EmoLex, LIWC) for eight…