Approaches to collecting website analytics data can be grouped by where the data collection effort is focused, which yields three general methodologies: user-centric (e.g., panels of users whose browsing is tracked), site-centric (e.g., tracking on the website itself), and network-centric (e.g., traffic data from ISPs).
Two popular, industry-standard web analytics platforms are Google Analytics and SimilarWeb. Google Analytics is a site-centric service, whereas SimilarWeb is a user-centric service that also combines all three general methodologies.
The data collection, analysis, and reporting algorithms of Google Analytics are proprietary. However, enough is known about them to validate their use as industry-standard and state-of-the-art. The techniques of the tagging process are well known, although implementations may differ in some nuances. Google Analytics employs statistical data sampling for some reports, so the values in those reports may not result from an analysis of the complete data. However, the general data sampling approach is documented in reasonable detail, and the subsampling it describes is an industry-standard methodology.
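To make the idea of sampling concrete, the following is a minimal Python sketch of estimating report totals from a uniform random sample of sessions, scaling the sample counts by the inverse of the sampling rate. It illustrates the general technique only and is not Google Analytics' proprietary procedure; the `sampled_estimate` function and its input format (one pageview count per session) are hypothetical.

```python
import random
from typing import Dict, Sequence


def sampled_estimate(
    pageviews_per_session: Sequence[int], sample_rate: float, seed: int = 0
) -> Dict[str, float]:
    """Estimate total sessions and pageviews from a random sample of sessions.

    Each session is kept independently with probability `sample_rate`, and the
    sample totals are scaled by 1 / sample_rate (an inverse-probability,
    Horvitz-Thompson style estimator). Hypothetical illustration only.
    """
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    rng = random.Random(seed)
    # Bernoulli sampling: rng.random() is in [0, 1), so a rate of 1.0 keeps everything.
    sample = [pv for pv in pageviews_per_session if rng.random() < sample_rate]
    scale = 1.0 / sample_rate
    return {
        "sessions": len(sample) * scale,
        "pageviews": sum(sample) * scale,
        "sample_size": float(len(sample)),
    }


if __name__ == "__main__":
    # Synthetic sessions for illustration only, not real analytics data.
    gen = random.Random(1)
    sessions = [gen.randint(1, 10) for _ in range(100_000)]
    full = sampled_estimate(sessions, 1.0)
    approx = sampled_estimate(sessions, 0.1)
    print("complete:", full["pageviews"], "sampled (10%):", approx["pageviews"])
```

A smaller `sample_rate` makes a report cheaper to compute at the cost of higher variance in the estimates, which is why sampled values can differ somewhat from an analysis of the complete data.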
SimilarWeb is a service that provides web analytics data for one or multiple websites. It uses a mix of user-, site-, and network-centric data collection approaches to triangulate data, reportedly collecting and analyzing billions of data points per day. SimilarWeb's philosophy is that each method has strengths and weaknesses, so the best practice is to triangulate multiple algorithms and data sources, a well-respected approach in data collection and analysis.
The data collection, analysis, and reporting algorithms of SimilarWeb are also proprietary, but again, enough is known to validate the general implementation as state-of-the-art. SimilarWeb's foundational principle of triangulating user-, site-, and network-centric data is academically sound; triangulation of data and methods is widely used and advocated by scholars. SimilarWeb's data collection, analysis, and reporting methodology is outlined in reasonable detail, although, as with Google Analytics, the proprietary specifics are not provided. From the ample documentation that is available, the general approach is to collect data from three primary sources: (a) a worldwide user panel, reportedly numbering 400 million users at the time of posting; (b) analytics tracking on specific websites; and (c) ISP and other traffic data. These sources are supplemented with publicly available datasets (e.g., population statistics). The datasets overlap (i.e., web analytics data captured by one collection method also appears in one or both of the other methods). Using the collected data together with the public datasets, SimilarWeb applies statistical techniques and ensemble machine learning to generate web analytics estimates. These estimates can then be compared against the overlapping data to make algorithmic adjustments to the predictions. This approach is more complex than that of Google Analytics; however, SimilarWeb's scope of covering multiple websites also requires a more complicated approach. In sum, the general techniques employed by SimilarWeb are standard, academically sound, and state-of-the-art.
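The idea of adjusting predictions against overlapping data can be illustrated with a deliberately simple stand-in for the ensemble methods SimilarWeb describes: fit a scale factor for each source on sites where measured values are available, then combine the calibrated per-source estimates in a weighted average that favors sources with smaller calibration error. This is a hypothetical Python sketch, not SimilarWeb's algorithm; the source names, the functions, and the inverse-error weighting are all assumptions.

```python
from statistics import fmean
from typing import Dict, List, Tuple


def fit_scale(estimates: List[float], truths: List[float]) -> float:
    """Least-squares factor k minimizing sum((truth - k * estimate) ** 2)."""
    num = sum(e * t for e, t in zip(estimates, truths))
    den = sum(e * e for e in estimates)
    if den == 0:
        raise ValueError("need at least one non-zero estimate")
    return num / den


def calibrate_sources(
    overlap: Dict[str, List[Tuple[float, float]]]
) -> Dict[str, Tuple[float, float]]:
    """Map each source to (scale, weight) using (estimate, measured) pairs.

    The weight is the inverse of the mean squared calibration residual, so a
    source whose calibrated estimates track the measured values more closely
    counts for more. The small epsilon avoids division by zero.
    """
    params = {}
    for name, pairs in overlap.items():
        ests = [e for e, _ in pairs]
        truths = [t for _, t in pairs]
        k = fit_scale(ests, truths)
        mse = fmean((t - k * e) ** 2 for e, t in pairs)
        params[name] = (k, 1.0 / (mse + 1e-9))
    return params


def triangulate(
    site_estimates: Dict[str, float], params: Dict[str, Tuple[float, float]]
) -> float:
    """Combine one site's raw per-source estimates into a single weighted figure."""
    acc = 0.0
    total_weight = 0.0
    for name, raw in site_estimates.items():
        k, w = params[name]
        acc += w * k * raw
        total_weight += w
    return acc / total_weight
```

Weighting by inverse mean squared error treats the sources' errors as independent; sources that share biases would need a richer model, which is one reason production systems rely on more elaborate ensembles.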
Verification of Analytics for a Single Website
In general, because Google Analytics is a site-centric web analytics platform, it is a reasonable service to use for a single website that one owns and has access to. However, it may be worthwhile to compare the Google Analytics values with those from SimilarWeb (or other website analytics services), as the latter are the values that outsiders see for the website.
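A verification check of this kind might be sketched as follows: for one's own site, compute the relative gap between each Google Analytics metric and the corresponding third-party value, and flag metrics whose gap exceeds a chosen tolerance. The function name, the metric keys, and the 25% default tolerance are hypothetical choices for illustration.

```python
from typing import Dict, Tuple


def compare_metrics(
    ga: Dict[str, float], third_party: Dict[str, float], tolerance: float = 0.25
) -> Dict[str, Tuple[float, bool]]:
    """Compare one site's Google Analytics figures with a third-party service.

    Returns {metric: (relative_gap, within_tolerance)}, where relative_gap is
    (third_party - ga) / ga. A negative gap means the third-party value is lower.
    Metrics missing from either dict, or with a GA value of zero, are skipped.
    """
    report = {}
    for metric, ga_value in ga.items():
        if metric not in third_party or ga_value == 0:
            continue
        gap = (third_party[metric] - ga_value) / ga_value
        report[metric] = (gap, abs(gap) <= tolerance)
    return report


if __name__ == "__main__":
    # Made-up figures for illustration, not data from the study.
    ga = {"visits": 120_000, "unique_visitors": 80_000, "pages_per_visit": 3.2}
    sw = {"visits": 96_000, "unique_visitors": 56_000, "pages_per_visit": 2.0}
    for metric, (gap, ok) in compare_metrics(ga, sw).items():
        print(f"{metric}: {gap:+.1%} {'ok' if ok else 'check'}")
```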
Estimating Google Analytics Metrics for Multiple Websites
In our research, the differences between Google Analytics and SimilarWeb metrics are systematic (i.e., the differences stay relatively constant), notably for total visits and unique visitors. This means that if you have Google Analytics values for one site, you can apply a similar adjustment to the SimilarWeb values of other websites to obtain reasonable estimates of what Google Analytics would report for them. This technique is valuable in competitive analysis, where you compare multiple sites against a known website and want Google Analytics values for all of them. However, SimilarWeb generally reports more conservative metrics than Google Analytics, especially for onsite interactions, so if you rely solely on SimilarWeb, the analytics measures may be lower than Google Analytics would report. Decisions based on these metrics need to account for this.
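The adjustment can be sketched in Python under one explicit assumption: that the systematic difference is proportional, so the ratio of Google Analytics to SimilarWeb values measured on the known site carries over to the other sites. If the difference were instead constant in absolute terms, one would add the difference rather than multiply by the ratio. The function names and the figures in the usage example are hypothetical, not data from the study.

```python
from typing import Dict


def calibration_ratios(
    known_ga: Dict[str, float], known_sw: Dict[str, float]
) -> Dict[str, float]:
    """Per-metric ratio GA / SimilarWeb on the site where both are available.

    Assumes the GA-SimilarWeb gap is proportional; metrics with a zero or
    missing SimilarWeb value are skipped.
    """
    return {m: known_ga[m] / known_sw[m] for m in known_ga if known_sw.get(m)}


def estimate_ga(
    other_sites_sw: Dict[str, Dict[str, float]], ratios: Dict[str, float]
) -> Dict[str, Dict[str, float]]:
    """Scale each other site's SimilarWeb figures by the known site's ratios."""
    return {
        site: {m: v * ratios[m] for m, v in metrics.items() if m in ratios}
        for site, metrics in other_sites_sw.items()
    }


if __name__ == "__main__":
    # Hypothetical figures for illustration, not data from the study.
    ratios = calibration_ratios(
        {"visits": 120_000, "unique_visitors": 80_000},
        {"visits": 96_000, "unique_visitors": 64_000},
    )
    competitors = {"competitor-a.example": {"visits": 40_000, "unique_visitors": 20_000}}
    print(estimate_ga(competitors, ratios))
```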
In terms of using the two platforms, SimilarWeb provides conservative analytics results relative to Google Analytics, and the two tools can be used in a complementary manner in various contexts, especially when one has data for one website and needs analytics data for other websites.
Jansen, B. J., Jung, S.-G., and Salminen, J. (2022). Measuring user interactions with websites: A comparison of two industry standard analytics approaches using data of 86 websites. PLoS ONE, 17(5), e0268212. https://doi.org/10.1371/journal.pone.0268212