Approaches to collecting website analytics data can be grouped by the focus of the data collection effort, which yields three general methodologies: (a) user-centric, (b) site-centric, and (c) network-centric. The central traits of each are as follows.
- User-centric: Web analytics data is gathered via a panel of users, who are tracked by software installed on their computers, such as a plugin for a web browser [5–8]. For example, when users install an extension to their browser, they agree in the license agreement that data on the websites they visit will be processed and analyzed. The primary advantage is that the user-centric approach does not rely on cookies or tags (i.e., snippets of information placed by a server in a user’s web browser to keep track of the user) but on direct observation. An additional advantage is the ability to compare web analytics data across multiple websites. The challenge is recruiting and incentivizing a sufficiently large user panel that is a representative sample of the online population. Due to this challenge, only a few companies have recruited sizeable user panels (e.g., Alexa). Another disadvantage is privacy: many users are not willing to share information on every website they visit, so some may try to mask their actual online actions from the tracking plugin.
- Site-centric: Web analytics data is gathered via software on a specific website [9–16]. Most websites use a site-centric approach for analytics data gathering, typically employing cookies and/or tagging pages on the website (e.g., Google Analytics, Adobe Analytics). The primary advantage of this approach is that counting events and actions (e.g., pages viewed, times accessed) is relatively straightforward. Another advantage is that users do not need to install specific software beyond the browser. However, there are disadvantages. First, site-centric software focuses on cookies and tags, so its counts (i.e., the measures of the cookies and tags) may not reflect actual people or people’s actions on the website. Instead, site-centric approaches measure the number of cookies dropped or tags fired as proxies for people or interactions. Second, this approach is susceptible to bots (i.e., autonomous programs that pretend to be real users) and other forms of analytics inflation tactics, such as click fraud. Finally, site-centric analytics usually represent just one website and are only accessible to the owner of that website, making the site-centric approach not widely available for business intelligence, marketing, advertising, or other tasks requiring web analytics data from a large number of sites.
- Network-centric: Web analytics data is gathered by observing and collecting traffic in the network [18, 19]. There are various techniques for network-centric web analytics data gathering, with the most common being data purchased or acquired directly from Internet service providers (ISPs). Other data gathering methods include leveraging search traffic, search engine rankings, paid search, and backlinks [20, 21]. The main advantage of the network-centric approach is that one can relatively easily collect analytics concerning many websites. Also, the setup is comparatively easy, as neither users nor websites are required to install any software. The major disadvantage is that there is no information about the onsite actions of the users. A second disadvantage is that major ISPs do not freely share their data, so acquiring it can be expensive. However, companies can acquire other network-centric data at a more reasonable cost (e.g., from SpyFu or SEMRush, two common industry tools for search marketing), albeit with substantial computational, programming, and storage requirements.
Of course, one can use a combination of these methods, but these are the three general approaches, and much academic research leverages one or more of them [23–26].
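The proxy problem noted for the site-centric approach can be made concrete with a small illustration. The Python sketch below is not taken from any analytics product; the `Hit` record, the sample data, and the function names are hypothetical, and the `person` and `is_bot` fields stand in for ground truth that a real site-centric tool would not have. It shows how a count of distinct cookies can diverge from the number of actual people.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One tag firing on a page (hypothetical record layout)."""
    cookie_id: str  # identifier the site's tag stored in the browser
    person: str     # ground truth, unknown to a real site-centric tool
    is_bot: bool    # ground truth, also unknown in practice


def page_views(hits):
    """Tags fired: every hit counts once."""
    return len(hits)


def unique_visitors(hits):
    """What a cookie-based tool could report: distinct cookie IDs."""
    return len({h.cookie_id for h in hits})


def actual_people(hits):
    """Distinct humans behind the hits, excluding bots."""
    return len({h.person for h in hits if not h.is_bot})


# Illustrative data only (assumed for this sketch).
HITS = [
    Hit("c1", "alice", False),   # Alice on her laptop
    Hit("c2", "alice", False),   # Alice on her phone gets a new cookie
    Hit("c3", "bob", False),
    Hit("c3", "bob", False),     # Bob returns with the same cookie
    Hit("c4", "crawler", True),  # a bot that accepts cookies
]

if __name__ == "__main__":
    print("page views:     ", page_views(HITS))
    print("unique visitors:", unique_visitors(HITS))
    print("actual people:  ", actual_people(HITS))
```

In this toy data set the cookie-based visitor count exceeds the number of real people, because one person appears under two cookies and a bot is counted as a visitor. Real deployments involve further effects, such as cookie deletion and shared devices, that this sketch does not model.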
Jansen, B. J., Jung, S.G., and Salminen, J. (2022) Measuring user interactions with websites: A comparison of two industry standard analytics approaches using data of 86 websites. PLoS ONE. 17(5): e0268212. https://doi.org/10.1371/journal.pone.0268212