Measuring pageviews - A principal guide to proper implementation on websites, applications and video streams

Updated by Amalie

Introduction

We have received several questions about how to properly implement pageview measurement tags in native app environments, where the definition of what constitutes a pageview is not as clear-cut as in classic web environments.

Our aim with this document is to discuss the overall principles of what constitutes a pageview. We also discuss how that definition can be implemented, not only in native applications but also in modern web frameworks.

Even with these considerations in place, we will still encounter edge cases where it is unclear whether an event constitutes a pageview. We therefore outline some overall guidelines that can be used to assess whether an edge-case event qualifies as a pageview.

Challenges with the traditional pageview assumption

Classic web measurement works under the assumption that each pageview corresponds to a full page loaded from a web server, and that each page load runs some analytics tracking code which sends a pageview to a back-end server to track the event.

In-app measurement is (in some cases) challenged because apps do not operate with a similar pageview taxonomy.

Our observation, however, is that this is not an in-app-specific issue. The www-based web has also changed a lot in the last five years, and more and more websites do not fit this traditional pageview model.

The cause is the growing use of Single Page Applications (SPAs), which update content in place instead of loading a new page from the server for each view.

To give a specific example, consider 'mail.google.com' (Gmail). Most people who use Gmail in their browser keep it open in the background and switch to it every once in a while to see if they have any new messages. When they do, they click on the message to read it.

The vast majority of Gmail users almost never reload the page, which raises a few important questions from an analytics point of view:

  • If a user loads Gmail once, and then uses it hundreds of times over the next few days without reloading, should that really only be considered one pageview?
  • If a user clicks the logo to refresh the content (or uses pull-to-refresh in the mobile version of the app), should that be considered a pageview? Is that usage functionally different from refreshing the page to load new content?
  • What about when the user opens a new message, should that be considered a new pageview?

If two users visit Gmail the exact same number of times per day, but one of them reloads the site every time and the other leaves it open in a background tab, should those two usage patterns result in significantly different pageview counts?

These questions illustrate that, for more and more websites, the traditional definition of a pageview is challenged, not from a technical point of view but from a methodological one. It is perfectly possible to measure each and every event initiated by a user, but we need to establish a methodological framework that defines what counts as the equivalent of a traditional pageview whenever we work with SPAs.

Translating pageviews to in-app measurement

A typical assumption is that in-app measurement is troubled by the lack of pageviews while web measurement is not. In reality, in-app measurement is divided into two camps, just as web measurement is divided into traditional websites and SPAs.

Apps can either be native apps or hybrid apps:

A native app is one that is installed directly onto the smartphone and can, depending on the nature of the app, work in most cases without internet connectivity. Native apps are typically installed through an application store (such as Google Play or Apple’s App Store). They are developed specifically for one platform and can take full advantage of the device features: they can run faster by harnessing the power of the processor and can access specific hardware such as GPS. Native apps operate like SPAs. They do not consist of pages that are loaded sequentially from a server; instead, the user navigates through a series of views, often located on the same page.

Hybrid apps are part native apps and part web apps. Like native apps, they live in an app store and can take advantage of some device features available. But like web apps, they rely on HTML being rendered in a browser, with the caveat that the browser is embedded within the app.

Often, companies build hybrid apps as wrappers for an existing web page. In that way, they hope to get a presence in the app store without spending significant effort developing a separate app. Hybrid apps are popular because they allow cross-platform development: the same HTML code components can be reused on different mobile operating systems, which significantly reduces development costs. This allows publishers in particular to reduce costs by maintaining a single HTML-based content base that is served to all users, whether they access the www-version through a browser or through an application downloaded from an app store.

Hybrid apps can either require online connectivity to execute a full page load (which is easily classified as a pageview) or be an SPA where pages are cached for the user to access offline. In the latter case, we need to develop a methodology for measuring pageviews rather than page loads.

The media app landscape in Finland consists of a mix of both.

We believe the issue at hand is not web versus in-app, but SPA versus traditional web. Both traditional websites and hybrid apps loading URL content can be measured by a traditional analytics setup. The challenge lies in measuring SPAs, both on the web and in hybrid apps.

Web                        | App
Single Page Application    | Single Page Application
Traditional Website        | Hybrid App loading URLs

To measure SPAs correctly, both on websites and in apps, we need to talk about methodology.

What (historically) Constitutes a Pageview?

Ironically enough, while we call the metric a “Page” “View”, we have historically actually been measuring “Page” “Loads”. A web page is loaded, an analytics script fires, and a pageview event is sent to the web analytics server.

A Modern Implementation Approach

Instead of tracking how many times a page was loaded, we should track how many times it was viewed. We can do this with the Page Visibility API, which has been around for quite some time and is well supported in all modern browsers on both desktop and mobile. As it turns out, tracking how often the page was viewed rather than how often it was loaded elegantly handles a surprising number of cases that fail under the current model:

  • When users leave an app in a background tab and switch to it again hours or days later without reloading.
  • When users leave a tab open as a reference and switch to it often for quick access to the content (again, without reloading the page).
  • When users open a page in a background tab and then forget about it (never actually viewing the content).

The Page Visibility API consists of the document.visibilityState property and the visibilitychange event. With these two pieces, one can ensure that pageviews are never sent unless the page’s visibilityState is actually visible. One can also send pageviews when a user returns to the site after it has been in a background tab for a while, by listening for visibilitychange events. The Page Visibility API, therefore, solves the problem of how to track pageviews on apps that never need to be reloaded.
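
As a minimal sketch of this idea (not any vendor's actual implementation), the TypeScript below defers a pageview until the page is visible. The helper name sendPageviewWhenVisible and the VisibilitySource interface are our own illustrative assumptions; only visibilityState and the visibilitychange event come from the Page Visibility API.

```typescript
// Minimal shape of the parts of `document` used here, declared locally so the
// sketch can also run against a test double outside a browser (assumption).
interface VisibilitySource {
  visibilityState: string; // "visible" or "hidden"
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

// Hypothetical helper: calls `send` exactly once, as soon as the page is visible.
function sendPageviewWhenVisible(doc: VisibilitySource, send: () => void): void {
  if (doc.visibilityState === "visible") {
    send(); // visible right away: send immediately
    return;
  }
  // Opened in a background tab: wait until the user actually looks at it.
  const onChange = (): void => {
    if (doc.visibilityState === "visible") {
      doc.removeEventListener("visibilitychange", onChange);
      send();
    }
  };
  doc.addEventListener("visibilitychange", onChange);
}
```

In a browser, the global document object should be usable as the doc argument. A page that is opened in a background tab and never viewed sends nothing.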

The second part of the solution is the History API, which (now that it is supported in all modern browsers) is the de facto way developers implement navigation in SPAs. As a result, analytics tools can look for changes to the URL and send pageviews whenever that happens. This allows SPAs to be tracked in exactly the same way traditional sites are tracked.
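
One way an analytics script might notice such URL changes is to wrap history.pushState, since calling pushState does not fire any event by itself. The sketch below is an illustration under our own assumptions: observePushState and HistoryLike are hypothetical names, and a real setup would also need to handle replaceState and the popstate event (fired on back/forward navigation).

```typescript
// Minimal shape of `window.history` used by this sketch (assumption).
interface HistoryLike {
  pushState(data: unknown, unused: string, url?: string | null): void;
}

// Hypothetical helper: wraps pushState so `onNavigate` runs after every call
// that carries a URL. Returns a function that restores the original method.
function observePushState(
  history: HistoryLike,
  onNavigate: (url: string) => void,
): () => void {
  const original = history.pushState;
  history.pushState = function (data, unused, url) {
    // Keep the native behaviour first, with the correct `this`.
    original.call(history, data, unused, url);
    if (url != null) onNavigate(String(url));
  };
  return () => {
    history.pushState = original;
  };
}
```

The callback receives the URL exactly as the application passed it, so relative URLs still need to be resolved before they are compared or reported.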

Technical details

While we need to discuss the relevant definitions when we embark on the implementation phase, the basic idea for tracking pageviews with the Page Visibility and History APIs is as follows (and these steps can be applied to any website, regardless of whether it is a traditional content site, SPA, or PWA):

  1. When the page loads, send a pageview if the visibility state is visible.
  2. If the visibility state is not visible, wait for the visibility state to change to visible and send the pageview at that point.
  3. If the visibility state changes from hidden to visible and enough time has passed since the previous interaction by this user, send a pageview.
  4. If the URL changes (just the pathname or search parts, not the hash part since that is used for anchor links) send a pageview.
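
The four steps can be sketched as a small decision function. This is an illustration under our own assumptions: the names TrackerState, TrackerEvent and decide are hypothetical, what counts as "enough time" is passed in as a flag rather than decided here, and URL changes that happen while the page is still hidden are not treated specially.

```typescript
type Visibility = "visible" | "hidden";

interface TrackerState {
  initialPageviewSent: boolean; // has the page-load pageview gone out yet?
  lastUrl: string;              // URL (without hash) of the latest pageview
}

type TrackerEvent =
  | { kind: "load"; visibility: Visibility; url: string }
  | { kind: "visibilitychange"; visibility: Visibility; enoughTimePassed: boolean }
  | { kind: "urlchange"; url: string };

interface Decision {
  send: boolean;       // should a pageview be sent for this event?
  state: TrackerState; // state to carry into the next event
}

// Drops the hash part, which is used for anchor links within the same page.
function stripHash(url: string): string {
  const i = url.indexOf("#");
  return i === -1 ? url : url.slice(0, i);
}

function decide(state: TrackerState, event: TrackerEvent): Decision {
  switch (event.kind) {
    case "load": {
      // Step 1: send at load only if visible; otherwise step 2 applies later.
      const visible = event.visibility === "visible";
      return {
        send: visible,
        state: { initialPageviewSent: visible, lastUrl: stripHash(event.url) },
      };
    }
    case "visibilitychange": {
      if (event.visibility !== "visible") return { send: false, state };
      // Step 2: the deferred page-load pageview.
      if (!state.initialPageviewSent) {
        return { send: true, state: { ...state, initialPageviewSent: true } };
      }
      // Step 3: a return to the page after enough time has passed.
      return { send: event.enoughTimePassed, state };
    }
    case "urlchange": {
      // Step 4: only pathname/search changes count, not hash-only changes.
      const next = stripHash(event.url);
      if (next === state.lastUrl) return { send: false, state };
      return { send: true, state: { ...state, lastUrl: next } };
    }
  }
}
```

Keeping the decision separate from the browser wiring makes each rule easy to test and to adjust per publisher.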

The third step above is the most important one, and it is also the most ambiguous. The question is: how long is “enough time” since the previous user interaction?

On the one hand, you would not want to track every visibility state change as a new pageview, since it is common for users to switch frequently between tabs (in fact, some apps work best when used in multiple tabs at the same time and expect a lot of tab switching).

On the other hand, you want to capture the fact that a user is returning to your site or application after not using it for a while (i.e. a separate usage instance rather than a single, continuous usage instance).

Luckily, most analytics tools already define a way to differentiate between distinct usage instances: sessions.

A session is a group of interactions that take place within a given time frame, and a session ends when some predetermined timeout period has passed. For example, by default, in Google Analytics, a session ends after 30 minutes of inactivity.

So, getting back to the third step in the list above, our proposal is that if a user’s session has timed out and the page’s visibility state changes from hidden to visible, a new pageview should be sent. Visibility-state changes that occur in the middle of a session should not be considered distinct pageviews (though they can still be tracked as events if that information is relevant).
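
A minimal sketch of this session rule follows. The names SessionState, isSessionExpired and shouldSendPageviewOnReturn are hypothetical, the 30-minute timeout simply mirrors the Google Analytics default mentioned above, and treating the return itself as an interaction that restarts the inactivity clock is our own assumption.

```typescript
// Assumption: 30 minutes, mirroring the Google Analytics default cited above.
const SESSION_TIMEOUT_MS = 30 * 60 * 1000;

interface SessionState {
  lastInteractionMs: number; // timestamp of the user's most recent interaction
}

function isSessionExpired(
  session: SessionState,
  nowMs: number,
  timeoutMs: number = SESSION_TIMEOUT_MS,
): boolean {
  return nowMs - session.lastInteractionMs >= timeoutMs;
}

// Call when the visibility state changes from hidden to visible.
// Returns true when a new pageview should be sent; a false result could
// still be logged as an ordinary event.
function shouldSendPageviewOnReturn(
  session: SessionState,
  nowMs: number,
  timeoutMs: number = SESSION_TIMEOUT_MS,
): boolean {
  const expired = isSessionExpired(session, nowMs, timeoutMs);
  // Assumption: the return itself counts as an interaction.
  session.lastInteractionMs = nowMs;
  return expired;
}
```

In a browser, nowMs would typically come from Date.now(), and lastInteractionMs would also be updated on clicks, scrolls and other tracked interactions.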

This is a well-known method for measuring SPAs with common web analytics tools such as Google Analytics. The term used is 'virtual pageviews' (i.e. pageviews initiated by the user but not necessarily triggering a page load from the web server).

A good description of such a setup with Google Tag Manager and Google Analytics can be found here.
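
To make the term concrete, a virtual pageview in Google Analytics 4 can be sent as a manual page_view event through gtag.js. The sketch below assumes a GA4 property with gtag.js already loaded. The wrapper sendVirtualPageview and the Gtag type are our own hypothetical names, while the page_view event and its page_location and page_title parameters are part of GA4.

```typescript
// Shape of the gtag() call used here; the real function is provided by gtag.js.
type Gtag = (
  command: "event",
  eventName: string,
  params: Record<string, string>,
) => void;

// Hypothetical wrapper: reports one virtual pageview for the given view.
function sendVirtualPageview(gtag: Gtag, location: string, title: string): void {
  gtag("event", "page_view", {
    page_location: location, // full URL of the view being reported
    page_title: title,
  });
}
```

When pageviews are sent manually like this, the automatic pageview that the tag sends on page load usually needs to be reviewed as well, so that the first view is not counted twice.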

But even with a modern implementation approach, edge cases will appear. And since the FIAM measurement is all about introducing a common standard, it is important that edge cases are addressed in a consistent manner across publishers and applications.

Trying to create a comprehensive list of all potential edge cases would be a tremendous task, and the result would probably be extremely hard to navigate. Instead, we have created a set of implementation guidelines, or principles, that can be used to assess each edge case.

Implementation guidelines

Regular websites and webpages will automatically be handled by the SAK script provided by AudienceProject. Only use this implementation guide if you have special circumstances that deviate from a regular pageview-based implementation.

  • A page should only count as a pageview if the user has made an active decision (e.g. clicked a link or similar) to make the page available.
  • As a rule of thumb, a user-initiated pageview can only be counted if the user-driven action results in an update of at least 40% of the content on the page in question.
  • The same rule of thumb applies to applications. A user action can only count as a pageview if a significant share of the screen content is renewed as a result of a user-driven action. A minimum of 40% of the content on the page should be changed.
  • Publications that utilize frames or derivatives of frames must use the content frame for measuring pageviews, not the navigation menu frame, top frame, hidden frames or similar constructs.
  • Pages with embedded auto-refresh must only be counted once.
  • User-initiated page refresh can be counted if the user-driven action results in an update of at least 40% of the content on the page in question.
  • Pages with automatic redirection should not be tagged. If the user has made an active decision to navigate to the redirect page, a tag can be placed on the destination page of the redirect, but not on pages automatically skipped due to the redirect.
  • Pop-ups or overlays should only be tagged if the user makes an active decision to open the pop-up or overlay by clicking a link or similar navigation component.
  • Streaming: As a rule of thumb, pageview events should only be triggered if a user-driven action is initiated to change the channel and/or video stream.
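
Several of the guidelines above can be expressed as a simple check. The sketch below is our own reading of the rules, not part of the SAK script: the CandidateEvent fields and the function qualifiesAsPageview are hypothetical, and how the share of changed content is estimated is left to the implementer.

```typescript
// The 40% rule of thumb from the guidelines above.
const MIN_CONTENT_CHANGE = 0.4;

interface CandidateEvent {
  userInitiated: boolean;      // active decision: click, tap, channel change, ...
  isRedirectHop: boolean;      // page skipped automatically during a redirect
  changedContentShare: number; // 0..1, share of page/screen content replaced
}

function qualifiesAsPageview(e: CandidateEvent): boolean {
  // Auto-refresh and unrequested pop-ups are not user-driven.
  if (!e.userInitiated) return false;
  // Only the destination page of a redirect may be tagged.
  if (e.isRedirectHop) return false;
  // At least 40% of the content must be renewed by the action.
  return e.changedContentShare >= MIN_CONTENT_CHANGE;
}
```

For example, a pull-to-refresh that replaces half of the screen qualifies, while an automatic refresh of the same content does not.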
