{"id":365,"date":"2022-09-29T17:28:45","date_gmt":"2022-09-29T14:28:45","guid":{"rendered":"https:\/\/quecst.qcri.org\/blog\/?p=365"},"modified":"2022-10-26T13:28:46","modified_gmt":"2022-10-26T10:28:46","slug":"when-mere-correlations-are-not-enough-the-granger-causality-test","status":"publish","type":"post","link":"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/","title":{"rendered":"When mere correlations are not enough: The Granger Causality test"},"content":{"rendered":"<p>In most data science problems, datasets consist of multiple variables, and variables treated as independent might in fact depend on one another.<\/p>\n<p>When the variables in a dataset are observations recorded at successive points in time, we call it a time series dataset.<\/p>\n<p>The interval between observations may be hourly, daily, weekly, monthly, quarterly, annually, etc.<\/p>\n<p>One way to quantify these multivariate relationships is linear regression. A regression model might indicate a strong relationship between two or more variables, yet these variables may be unrelated in reality. In this case, predictions based on these relationships fail, because domain knowledge is needed to tell genuine relationships from spurious ones [1]. For instance, researchers might build multiple linear regression models without knowing the nature of the relationship between the variables. Suppose such a model produces a high R-squared value. 
In that case, the resulting model might further mislead interpretation and produce poor predictions or forecasts [1].<\/p>\n<p>Consider the following time-series graph: variable X directly influences variable Y, but with a lag (i.e., a time difference) of 5 between X and Y, so a correlation matrix computed on values at the same time steps would not reveal the relationship [2].<\/p>\n<figure id=\"attachment_366\" aria-describedby=\"caption-attachment-366\" style=\"width: 706px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-366\" src=\"https:\/\/quecst.qcri.org\/blog\/wp-content\/uploads\/2022\/09\/Granger_Image.png\" alt=\"When mere correlations are not enough: The Granger Causality test\" width=\"706\" height=\"390\" srcset=\"https:\/\/acua.qcri.org\/blog\/wp-content\/uploads\/2022\/09\/Granger_Image.png 706w, https:\/\/acua.qcri.org\/blog\/wp-content\/uploads\/2022\/09\/Granger_Image-300x166.png 300w\" sizes=\"(max-width: 706px) 100vw, 706px\" \/><figcaption id=\"caption-attachment-366\" class=\"wp-caption-text\">When mere correlations are not enough: The Granger Causality test<\/figcaption><\/figure>\n<p>For example, let X be an increase in positive COVID-19 cases in a city and Y an increase in the number of people hospitalized. For better forecasting, we would like to know whether there is a causal relationship between X and Y [1].<\/p>\n<p>To address this issue, Prof. Clive W. J. 
Granger, recipient of the 2003 Nobel Prize in Economics, developed the causality concept to improve forecasting performance [4].<\/p>\n<p>It is essentially an econometric hypothesis test of whether the past values of one variable, up to a particular lag, are useful in forecasting another variable in multivariate time series data.<\/p>\n<p>The requirements for performing the Granger Causality test include the following:<\/p>\n<ul>\n<li>Testing whether the variables are stationary: stationarity is a prerequisite for the Granger Causality test, so the data should have a constant mean, constant variance, and no seasonal component.<\/li>\n<li>If the data is not stationary, transform it with first-order or, if needed, second-order differencing.<\/li>\n<li>Do not proceed with the Granger Causality test if the data is still not stationary after second-order differencing.<\/li>\n<\/ul>\n<p>Performing hypothesis testing: to test whether Y Granger-causes X with p lags, we compare two regressions. The unrestricted model (UM) predicts X from the past values of both X and Y, X<sub>t<\/sub> = c + \u03b2<sub>1<\/sub>X<sub>t\u22121<\/sub> + \u22ef + \u03b2<sub>p<\/sub>X<sub>t\u2212p<\/sub> + \u03b1<sub>1<\/sub>Y<sub>t\u22121<\/sub> + \u22ef + \u03b1<sub>p<\/sub>Y<sub>t\u2212p<\/sub> + \u03b5<sub>t<\/sub>, while the restricted model (RM) drops all the lagged Y terms. The hypotheses are:<\/p>\n<ul>\n<li>Null Hypothesis (H<sub>0<\/sub>): Y does not \u201cGranger cause\u201d X, i.e., \u03b1<sub>1<\/sub> = \u03b1<sub>2<\/sub> = \u22ef = \u03b1<sub>p<\/sub> = 0.<\/li>\n<li>Alternative Hypothesis (H<sub>A<\/sub>): Y does \u201cGranger cause\u201d X, i.e., at least one of the lagged Y coefficients is significant.<\/li>\n<\/ul>\n<p>Calculate the F-statistic using the following equations:<\/p>\n<ul>\n<li>F<sub>p, n\u22122p\u22121<\/sub> = (estimate of explained variance) \/ (estimate of unexplained variance)<\/li>\n<li>F<sub>p, n\u22122p\u22121<\/sub> = ((SSE<sub>RM<\/sub> \u2212 SSE<sub>UM<\/sub>) \/ p) \/ (SSE<sub>UM<\/sub> \/ (n \u2212 2p \u2212 1))<\/li>\n<\/ul>\n<p>where n is the number of observations, p is the number of lags, and SSE<sub>RM<\/sub> and SSE<sub>UM<\/sub> are the sums of squared errors of the restricted and unrestricted models, respectively.<\/p>\n<p>For a given lag p, reject the null hypothesis if the p-value of the F-test is below the chosen significance level (commonly 0.05).<\/p>\n<p>Once all the requirements are met, perform the test in both directions, X<sub>t<\/sub> \u2192 Y<sub>t<\/sub> and Y<sub>t<\/sub> \u2192 X<sub>t<\/sub>, and try different lags (p). The optimal lag can be determined using the Akaike information criterion (AIC) [1].<\/p>\n<p>Now that you have some background on this statistical causality test, the rest of this blog post shows how to perform the Granger Causality test in Python using the <a href=\"https:\/\/dl.dropboxusercontent.com\/s\/xmov0qhbrgxqdqv\/commentCounts.csv?dl=0\">yearly volume of toxic comments and links from Reddit<\/a> (available <a href=\"https:\/\/dl.dropboxusercontent.com\/s\/xmov0qhbrgxqdqv\/commentCounts.csv?dl=0\">here<\/a>) [5].<\/p>\n<p>In this Reddit comments dataset, the time series covers the years 2005 to 2020; the first variable is the total number of toxic comments, and the second is the total number of links in comments.<\/p>\n<p>First, we import the required libraries:<\/p>\n<pre style=\"color: #000000; background: #ffffff;\"><span style=\"color: #696969;\">#Import the required libraries <\/span>\n<span style=\"color: #800000; font-weight: bold;\">import<\/span> 
matplotlib<span style=\"color: #808030;\">.<\/span>pyplot <span style=\"color: #800000; font-weight: bold;\">as<\/span> plt\n<span style=\"color: #800000; font-weight: bold;\">import<\/span> seaborn <span style=\"color: #800000; font-weight: bold;\">as<\/span> sns\n<span style=\"color: #800000; font-weight: bold;\">import<\/span> numpy <span style=\"color: #800000; font-weight: bold;\">as<\/span> np\n<span style=\"color: #800000; font-weight: bold;\">import<\/span> pandas <span style=\"color: #800000; font-weight: bold;\">as<\/span> pd\n<\/pre>\n<p>Then, we read the downloaded dataset (keep the file in the same directory as your script) as follows:<\/p>\n<pre style=\"color: #000000; background: #ffffff;\"><span style=\"color: #696969;\">#Read the data and print the contents of the file<\/span>\n<span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span><span style=\"color: #0000e6;\">'Redditor comments:'<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span><span style=\"color: #0000e6;\">\"=============\"<\/span><span style=\"color: #808030;\">)<\/span>\ndf <span style=\"color: #808030;\">=<\/span> pd<span style=\"color: #808030;\">.<\/span>read_csv<span style=\"color: #808030;\">(<\/span><span style=\"color: #0000e6;\">\"commentCounts.csv\"<\/span><span style=\"color: #808030;\">)<\/span> \n<span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span>df<span style=\"color: #808030;\">)<\/span>\n<\/pre>\n<p>The first requirement for conducting the Granger Causality test is to check whether the dataset is stationary. 
For that, we can conduct an Augmented Dickey-Fuller (ADF) test on each variable and check its test statistic and p-value. The null hypothesis of the ADF test is that the series has a unit root (i.e., it is not stationary), so a p-value above 0.05 means we cannot treat the series as stationary:<\/p>\n<pre style=\"color: #000000; background: #ffffff;\"><span style=\"color: #800000; font-weight: bold;\">from<\/span> statsmodels<span style=\"color: #808030;\">.<\/span>tsa<span style=\"color: #808030;\">.<\/span>stattools <span style=\"color: #800000; font-weight: bold;\">import<\/span> adfuller\n<span style=\"color: #696969;\"># ADF test on TotalLinks: result[0] is the test statistic, result[1] the p-value, result[4] the critical values<\/span>\nresult <span style=\"color: #808030;\">=<\/span> adfuller<span style=\"color: #808030;\">(<\/span>df<span style=\"color: #808030;\">[<\/span><span style=\"color: #0000e6;\">'TotalLinks'<\/span><span style=\"color: #808030;\">]<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span>f<span style=\"color: #0000e6;\">'Test Statistics: {result[0]}'<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span>f<span style=\"color: #0000e6;\">'p-value: {result[1]}'<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span>f<span style=\"color: #0000e6;\">'critical_values: {result[4]}'<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">if<\/span> result<span style=\"color: #808030;\">[<\/span><span style=\"color: #008c00;\">1<\/span><span style=\"color: #808030;\">]<\/span> <span style=\"color: #44aadd;\">&gt;<\/span> <span style=\"color: #008000;\">0.05<\/span><span style=\"color: #808030;\">:<\/span>\n    <span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span><span style=\"color: #0000e6;\">\"Series is not stationary\"<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">else<\/span><span style=\"color: #808030;\">:<\/span>\n    <span 
style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span><span style=\"color: #0000e6;\">\"Series is stationary\"<\/span><span style=\"color: #808030;\">)<\/span>\nresult <span style=\"color: #808030;\">=<\/span> adfuller<span style=\"color: #808030;\">(<\/span>df<span style=\"color: #808030;\">[<\/span><span style=\"color: #0000e6;\">'TotalToxic'<\/span><span style=\"color: #808030;\">]<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span>f<span style=\"color: #0000e6;\">'Test Statistics: {result[0]}'<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span>f<span style=\"color: #0000e6;\">'p-value: {result[1]}'<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span>f<span style=\"color: #0000e6;\">'critical_values: {result[4]}'<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">if<\/span> result<span style=\"color: #808030;\">[<\/span><span style=\"color: #008c00;\">1<\/span><span style=\"color: #808030;\">]<\/span> <span style=\"color: #44aadd;\">&gt;<\/span> <span style=\"color: #008000;\">0.05<\/span><span style=\"color: #808030;\">:<\/span>\n    <span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span><span style=\"color: #0000e6;\">\"Series is not stationary\"<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">else<\/span><span style=\"color: #808030;\">:<\/span>\n    <span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span><span style=\"color: #0000e6;\">\"Series is stationary\"<\/span><span style=\"color: 
#808030;\">)<\/span>\n<\/pre>\n<p>The result shows that the links series (TotalLinks) is not stationary. Thus, we have to make the data stationary by performing first-order differencing, which replaces each yearly value with its change from the previous year:<\/p>\n<pre style=\"color: #000000; background: #ffffff;\"><span style=\"color: #696969;\"># First-order differencing; diff() leaves NaN in the first row, which dropna() removes<\/span>\ndf_transformed <span style=\"color: #808030;\">=<\/span> df<span style=\"color: #808030;\">.<\/span>diff<span style=\"color: #808030;\">(<\/span><span style=\"color: #808030;\">)<\/span><span style=\"color: #808030;\">.<\/span>dropna<span style=\"color: #808030;\">(<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #696969;\"># Also drop the first row of the original data so that it lines up with df_transformed<\/span>\ndf <span style=\"color: #808030;\">=<\/span> df<span style=\"color: #808030;\">.<\/span>iloc<span style=\"color: #808030;\">[<\/span><span style=\"color: #008c00;\">1<\/span><span style=\"color: #808030;\">:<\/span><span style=\"color: #808030;\">]<\/span>\n<\/pre>\n<p>By repeating the ADF test on the differenced data, we can check whether the transformed series are stationary:<\/p>\n<pre style=\"color: #000000; background: #ffffff;\">result <span style=\"color: #808030;\">=<\/span> adfuller<span style=\"color: #808030;\">(<\/span>df_transformed<span style=\"color: #808030;\">[<\/span><span style=\"color: #0000e6;\">'TotalLinks'<\/span><span style=\"color: #808030;\">]<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span>f<span style=\"color: #0000e6;\">'Test Statistics: {result[0]}'<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span>f<span style=\"color: #0000e6;\">'p-value: {result[1]}'<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span>f<span style=\"color: #0000e6;\">'critical_values: {result[4]}'<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">if<\/span> result<span 
style=\"color: #808030;\">[<\/span><span style=\"color: #008c00;\">1<\/span><span style=\"color: #808030;\">]<\/span> <span style=\"color: #44aadd;\">&gt;<\/span> <span style=\"color: #008000;\">0.05<\/span><span style=\"color: #808030;\">:<\/span>\n    <span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span><span style=\"color: #0000e6;\">\"Series is not stationary\"<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">else<\/span><span style=\"color: #808030;\">:<\/span>\n    <span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span><span style=\"color: #0000e6;\">\"Series is stationary\"<\/span><span style=\"color: #808030;\">)<\/span>\nresult <span style=\"color: #808030;\">=<\/span> adfuller<span style=\"color: #808030;\">(<\/span>df_transformed<span style=\"color: #808030;\">[<\/span><span style=\"color: #0000e6;\">'TotalToxic'<\/span><span style=\"color: #808030;\">]<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span>f<span style=\"color: #0000e6;\">'Test Statistics: {result[0]}'<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span>f<span style=\"color: #0000e6;\">'p-value: {result[1]}'<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span>f<span style=\"color: #0000e6;\">'critical_values: {result[4]}'<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">if<\/span> result<span style=\"color: #808030;\">[<\/span><span style=\"color: #008c00;\">1<\/span><span style=\"color: #808030;\">]<\/span> <span style=\"color: #44aadd;\">&gt;<\/span> <span style=\"color: 
#008000;\">0.05<\/span><span style=\"color: #808030;\">:<\/span>\n    <span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span><span style=\"color: #0000e6;\">\"Series is not stationary\"<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #800000; font-weight: bold;\">else<\/span><span style=\"color: #808030;\">:<\/span>\n    <span style=\"color: #800000; font-weight: bold;\">print<\/span><span style=\"color: #808030;\">(<\/span><span style=\"color: #0000e6;\">\"Series is stationary\"<\/span><span style=\"color: #808030;\">)<\/span>\n<\/pre>\n<p>Both tests show that the variables are now stationary, which means that we can perform the Granger Causality test in both directions. Note that grangercausalitytests checks whether the series in the second column Granger-causes the series in the first column, and it reports results for every lag from 1 to maxlag:<\/p>\n<pre style=\"color: #000000; background: #ffffff;\"><span style=\"color: #800000; font-weight: bold;\">from<\/span> statsmodels<span style=\"color: #808030;\">.<\/span>tsa<span style=\"color: #808030;\">.<\/span>stattools <span style=\"color: #800000; font-weight: bold;\">import<\/span> grangercausalitytests\n<span style=\"color: #696969;\"># Does TotalToxic (2nd column) Granger-cause TotalLinks (1st column)?<\/span>\ngrangercausalitytests<span style=\"color: #808030;\">(<\/span>df_transformed<span style=\"color: #808030;\">[<\/span><span style=\"color: #808030;\">[<\/span><span style=\"color: #0000e6;\">'TotalLinks'<\/span><span style=\"color: #808030;\">,<\/span> <span style=\"color: #0000e6;\">'TotalToxic'<\/span><span style=\"color: #808030;\">]<\/span><span style=\"color: #808030;\">]<\/span><span style=\"color: #808030;\">,<\/span> maxlag<span style=\"color: #808030;\">=<\/span><span style=\"color: #008c00;\">4<\/span><span style=\"color: #808030;\">)<\/span>\n<span style=\"color: #696969;\"># Does TotalLinks (2nd column) Granger-cause TotalToxic (1st column)?<\/span>\ngrangercausalitytests<span style=\"color: #808030;\">(<\/span>df_transformed<span style=\"color: #808030;\">[<\/span><span style=\"color: #808030;\">[<\/span><span style=\"color: #0000e6;\">'TotalToxic'<\/span><span style=\"color: #808030;\">,<\/span> <span style=\"color: #0000e6;\">'TotalLinks'<\/span><span style=\"color: #808030;\">]<\/span><span 
style=\"color: #808030;\">]<\/span><span style=\"color: #808030;\">,<\/span> maxlag<span style=\"color: #808030;\">=<\/span><span style=\"color: #008c00;\">4<\/span><span style=\"color: #808030;\">)<\/span>\n<\/pre>\n<p>Suppose that we pick a lag value of 3. In that case, the test is significant in both directions: toxic comments Granger-cause links, and links Granger-cause toxic comments. Thus, the Granger Causality test indicates that in user-generated comments, there is a bidirectional predictive relationship between the existence of links and toxic comments [5].<\/p>\n<p>Of course, the Granger Causality test is not suitable for every data science problem; like any other statistical hypothesis test, it has strengths and limitations, which we summarize as follows:<\/p>\n<p><strong>Strengths of the Granger Causality test<\/strong><\/p>\n<ul>\n<li>It is simple to compute and can be applied in many domains.<\/li>\n<li>It provides a much more rigorous rule for causation (or information flow) than simply observing a high correlation with some lag-lead relationship [3].<\/li>\n<li>When time information is available, it characterizes the underlying spatiotemporal dynamics of variables rather than mere correlations [3].<\/li>\n<\/ul>\n<p><strong>Limitations of the Granger Causality test<\/strong><\/p>\n<ul>\n<li>Granger causality only shows that past values of one variable help predict another; it provides no insight into the mechanism behind the relationship, so it is not true causality in the &#8220;cause and effect&#8221; sense [1].<\/li>\n<li>Sampling the data too infrequently or too frequently can produce misleading test results [2]. 
Data properties can also rule out the test: in the Reddit posts collection from [5], around 99% of posts had links, so it was not possible to perform the same Granger Causality test on posts as on comments.<\/li>\n<li>The Granger Causality test cannot be performed on non-stationary data.<\/li>\n<\/ul>\n<p>The entire script is available on <a href=\"https:\/\/colab.research.google.com\/drive\/1nqkQeEJdubwItCIIhKR4gvu_Bv3FdvNt?usp=sharing\">Google Colab<\/a>.<\/p>\n<p>Happy coding!<\/p>\n<p><strong>References: <\/strong><\/p>\n<ol>\n<li>Padav, &#8220;Granger Causality in Time Series Explained with Chicken and Egg problem,&#8221; <em>Analytics Vidhya<\/em>, Aug. 22, 2021. <a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2021\/08\/granger-causality-in-time-series-explained-using-chicken-and-egg-problem\/\">https:\/\/www.analyticsvidhya.com\/blog\/2021\/08\/granger-causality-in-time-series-explained-using-chicken-and-egg-problem\/<\/a> (accessed Sep. 28, 2022).<\/li>\n<li>\u201cGranger causality,\u201d Wikipedia. Aug. 03, 2022. Accessed: Sep. 29, 2022. [Online]. Available: <a href=\"https:\/\/en.wikipedia.org\/w\/index.php?title=Granger_causality&amp;oldid=1102158801\">https:\/\/en.wikipedia.org\/w\/index.php?title=Granger_causality&amp;oldid=1102158801<\/a><\/li>\n<li>Zhang, &#8220;Constructing ecological interaction networks by correlation analysis: hints from community sampling,&#8221; <em>Network Biology<\/em>, vol. 1, pp. 81\u201398, Sep. 2011.<\/li>\n<li>Mukherjee, P., and Jansen, B. J., &#8220;<a href=\"http:\/\/www.bernardjjansen.com\/uploads\/2\/4\/1\/8\/24188166\/jansen_conversing_and_searching.pdf\" target=\"_blank\" rel=\"noopener\">Conversing and searching: the causal relationship between social media and web search<\/a>,&#8221; <em>Internet Research<\/em>, vol. 27, no. 5, pp. 1209\u20131226, Jan. 2017, doi: <a href=\"https:\/\/doi.org\/10.1108\/IntR-07-2016-0228\">10.1108\/IntR-07-2016-0228<\/a>.<\/li>\n<li>Almerekhi, H., Kwak, H., and Jansen, B. J., &#8220;<a href=\"https:\/\/peerj.com\/articles\/cs-1059\/\" target=\"_blank\" rel=\"noopener\">Investigating toxicity changes of cross-community redditors from 2 billion posts and comments<\/a>,&#8221; <em>PeerJ Comput. Sci.<\/em>, vol. 8, p. e1059, Aug. 2022, doi: <a href=\"https:\/\/doi.org\/10.7717\/peerj-cs.1059\">10.7717\/peerj-cs.1059<\/a>.<\/li>\n<\/ol>\n<p>(Blog Post Author: <a href=\"https:\/\/www.linkedin.com\/in\/hind-almerekhi\/\" target=\"_blank\" rel=\"noopener\">Hind Almerekhi<\/a>)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In most data science-related problems, datasets consist of multiple variables, in which independent variables might depend on other independent variables. When the variables in datasets represent observations at different times, we call this dataset a time series set. The time interval in these data sets may be hourly, daily, weekly, monthly, quarterly, annually, etc. One&hellip; <a class=\"more-link\" href=\"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/\">Continue reading <span class=\"screen-reader-text\">When mere correlations are not enough: The Granger Causality test<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[63],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v19.13 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>When mere correlations are not enough: The Granger Causality test - Team Acua<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta 
property=\"og:title\" content=\"When mere correlations are not enough: The Granger Causality test - Team Acua\" \/>\n<meta property=\"og:description\" content=\"In most data science-related problems, datasets consist of multiple variables, in which independent variables might depend on other independent variables. When the variables in datasets represent observations at different times, we call this dataset a time series set. The time interval in these data sets may be hourly, daily, weekly, monthly, quarterly, annually, etc. One&hellip; Continue reading When mere correlations are not enough: The Granger Causality test\" \/>\n<meta property=\"og:url\" content=\"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/\" \/>\n<meta property=\"og:site_name\" content=\"Team Acua\" \/>\n<meta property=\"article:published_time\" content=\"2022-09-29T14:28:45+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-10-26T10:28:46+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/quecst.qcri.org\/blog\/wp-content\/uploads\/2022\/09\/Granger_Image.png\" \/>\n<meta name=\"author\" content=\"Jim Jansen\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Jim Jansen\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/\"},\"author\":{\"name\":\"Jim Jansen\",\"@id\":\"https:\/\/acua.qcri.org\/blog\/#\/schema\/person\/e3bb7a0b58349e548e8940716694c215\"},\"headline\":\"When mere correlations are not enough: The Granger Causality test\",\"datePublished\":\"2022-09-29T14:28:45+00:00\",\"dateModified\":\"2022-10-26T10:28:46+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/\"},\"wordCount\":1070,\"publisher\":{\"@id\":\"https:\/\/acua.qcri.org\/blog\/#organization\"},\"keywords\":[\"Granger Causality\"],\"articleSection\":[\"Customer segmentation\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/\",\"url\":\"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/\",\"name\":\"When mere correlations are not enough: The Granger Causality test - Team 
Acua\",\"isPartOf\":{\"@id\":\"https:\/\/acua.qcri.org\/blog\/#website\"},\"datePublished\":\"2022-09-29T14:28:45+00:00\",\"dateModified\":\"2022-10-26T10:28:46+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/acua.qcri.org\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"When mere correlations are not enough: The Granger Causality test\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/acua.qcri.org\/blog\/#website\",\"url\":\"https:\/\/acua.qcri.org\/blog\/\",\"name\":\"Team Acua\",\"description\":\"Audience, Customer, and User Analytics\",\"publisher\":{\"@id\":\"https:\/\/acua.qcri.org\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/acua.qcri.org\/blog\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/acua.qcri.org\/blog\/#organization\",\"name\":\"Team Acua\",\"url\":\"https:\/\/acua.qcri.org\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/acua.qcri.org\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/acua.qcri.org\/blog\/wp-content\/uploads\/2022\/10\/cropped-cropped-logo.png\",\"contentUrl\":\"https:\/\/acua.qcri.org\/blog\/wp-content\/uploads\/2022\/10\/cropped-cropped-logo.png\",\"width\":1466,\"height\":770,\"caption\":\"Team 
Acua\"},\"image\":{\"@id\":\"https:\/\/acua.qcri.org\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/acua.qcri.org\/blog\/#\/schema\/person\/e3bb7a0b58349e548e8940716694c215\",\"name\":\"Jim Jansen\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/acua.qcri.org\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/a4f97370631247bb1aed9a897d658981?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/a4f97370631247bb1aed9a897d658981?s=96&d=mm&r=g\",\"caption\":\"Jim Jansen\"},\"sameAs\":[\"https:\/\/quecst.qcri.org\/blog\"],\"url\":\"https:\/\/acua.qcri.org\/blog\/author\/jjansenacm-org\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"When mere correlations are not enough: The Granger Causality test - Team Acua","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/","og_locale":"en_US","og_type":"article","og_title":"When mere correlations are not enough: The Granger Causality test - Team Acua","og_description":"In most data science-related problems, datasets consist of multiple variables, in which independent variables might depend on other independent variables. When the variables in datasets represent observations at different times, we call this dataset a time series set. The time interval in these data sets may be hourly, daily, weekly, monthly, quarterly, annually, etc. 
One&hellip; Continue reading When mere correlations are not enough: The Granger Causality test","og_url":"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/","og_site_name":"Team Acua","article_published_time":"2022-09-29T14:28:45+00:00","article_modified_time":"2022-10-26T10:28:46+00:00","og_image":[{"url":"https:\/\/quecst.qcri.org\/blog\/wp-content\/uploads\/2022\/09\/Granger_Image.png"}],"author":"Jim Jansen","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Jim Jansen","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/#article","isPartOf":{"@id":"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/"},"author":{"name":"Jim Jansen","@id":"https:\/\/acua.qcri.org\/blog\/#\/schema\/person\/e3bb7a0b58349e548e8940716694c215"},"headline":"When mere correlations are not enough: The Granger Causality test","datePublished":"2022-09-29T14:28:45+00:00","dateModified":"2022-10-26T10:28:46+00:00","mainEntityOfPage":{"@id":"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/"},"wordCount":1070,"publisher":{"@id":"https:\/\/acua.qcri.org\/blog\/#organization"},"keywords":["Granger Causality"],"articleSection":["Customer segmentation"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/","url":"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/","name":"When mere correlations are not enough: The Granger Causality test - Team 
Acua","isPartOf":{"@id":"https:\/\/acua.qcri.org\/blog\/#website"},"datePublished":"2022-09-29T14:28:45+00:00","dateModified":"2022-10-26T10:28:46+00:00","breadcrumb":{"@id":"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/acua.qcri.org\/blog\/when-mere-correlations-are-not-enough-the-granger-causality-test\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/acua.qcri.org\/blog\/"},{"@type":"ListItem","position":2,"name":"When mere correlations are not enough: The Granger Causality test"}]},{"@type":"WebSite","@id":"https:\/\/acua.qcri.org\/blog\/#website","url":"https:\/\/acua.qcri.org\/blog\/","name":"Team Acua","description":"Audience, Customer, and User Analytics","publisher":{"@id":"https:\/\/acua.qcri.org\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/acua.qcri.org\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/acua.qcri.org\/blog\/#organization","name":"Team Acua","url":"https:\/\/acua.qcri.org\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/acua.qcri.org\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/acua.qcri.org\/blog\/wp-content\/uploads\/2022\/10\/cropped-cropped-logo.png","contentUrl":"https:\/\/acua.qcri.org\/blog\/wp-content\/uploads\/2022\/10\/cropped-cropped-logo.png","width":1466,"height":770,"caption":"Team Acua"},"image":{"@id":"https:\/\/acua.qcri.org\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/acua.qcri.org\/blog\/#\/schema\/person\/e3bb7a0b58349e548e8940716694c215","name":"Jim 
Jansen","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/acua.qcri.org\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/a4f97370631247bb1aed9a897d658981?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a4f97370631247bb1aed9a897d658981?s=96&d=mm&r=g","caption":"Jim Jansen"},"sameAs":["https:\/\/quecst.qcri.org\/blog"],"url":"https:\/\/acua.qcri.org\/blog\/author\/jjansenacm-org\/"}]}},"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/acua.qcri.org\/blog\/wp-json\/wp\/v2\/posts\/365"}],"collection":[{"href":"https:\/\/acua.qcri.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/acua.qcri.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/acua.qcri.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/acua.qcri.org\/blog\/wp-json\/wp\/v2\/comments?post=365"}],"version-history":[{"count":6,"href":"https:\/\/acua.qcri.org\/blog\/wp-json\/wp\/v2\/posts\/365\/revisions"}],"predecessor-version":[{"id":510,"href":"https:\/\/acua.qcri.org\/blog\/wp-json\/wp\/v2\/posts\/365\/revisions\/510"}],"wp:attachment":[{"href":"https:\/\/acua.qcri.org\/blog\/wp-json\/wp\/v2\/media?parent=365"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/acua.qcri.org\/blog\/wp-json\/wp\/v2\/categories?post=365"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/acua.qcri.org\/blog\/wp-json\/wp\/v2\/tags?post=365"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}