FAQ


For general information about FIAM and the relationship between FIAM, the measurement members and AudienceProject, please visit the FIAM website FAQ.

How are audiences measured in FIAM?

Whereas classical digital analytics systems count different browsers, devices or applications, the new FIAM measurement makes it possible to calculate genuine reach based on people. The method is based on a large national panel and the device map created with it, on the basis of which the reporting of different visitors clearly produces the best results of the modern calculation methods in use. Using the panelists, the FIAM data is weighted to the Finnish online population figures produced by Statistics Finland for different age groups and genders.

Measurement data is collected every time the measurement script on the measured site is executed and makes a call to AP's collection interface. In practice, the call is executed on every page view or, in the case of applications, e.g. when a new article is loaded onto the user's screen. The methodology does not depend on devices (applications are also measured with this method), operating systems (iOS does not cause a problem) or browsers (Safari is not a problem).

In order to correctly classify the data of different customers and sections of media services, a unique identifier (t-code) has been created for each publisher, which is added to the measurement pixel call (https://visitanalytics.userreport.com/hit.gif?t=[t-code]). In this way, the data produced by the measurement call can be attributed to the right media site and classified as belonging to a certain section in the results.
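
As a minimal sketch of how such a pixel call can be formed (the t-code value below is a made-up placeholder, not a real identifier):

    from urllib.parse import urlencode

    BASE_URL = "https://visitanalytics.userreport.com/hit.gif"

    def build_hit_url(t_code: str, extra_params: dict | None = None) -> str:
        """Build the pixel URL that attributes a measurement call to a t-code."""
        params = {"t": t_code}
        if extra_params:
            params.update(extra_params)
        return f"{BASE_URL}?{urlencode(params)}"

    # "example-media-frontpage" is a made-up placeholder, not a real t-code.
    print(build_hit_url("example-media-frontpage"))
    # https://visitanalytics.userreport.com/hit.gif?t=example-media-frontpage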

The information collected from individual page loads is decisive for the AudienceReport measurement. The main data collected are date (UTC), time (UTC), IP address, URL address, URL parameters, User-Agent and the AP cookie (if available). The importance of this data for measurement and reporting is summarized below, with a sketch of such a record after the list:

  1. date and time
    determines which measurement period the event belongs to
    helps to detect incorrect simultaneous measurement calls
  2. IP address
    identifying the user's geographic location
    traffic quality assessment, e.g. bots, proxy servers, etc.
    identifying individual people and different devices
    identifying traffic from the same user on the same device
  3. URL address
    from which domain the measurement call originated
  4. URL parameters
    includes e.g. the above-mentioned identifier (t-code) and any other additional parameters
  5. User-Agent
    technical description of the browser and terminal device, used e.g. to identify the user's terminal device type (e.g. computer, tablet, phone or smart TV) or browser name and version
    identifying traffic from the same user on the same device
  6. AP's cookie
    used e.g. to link impressions to a panelist, also in native applications
    an important factor in the calculation of overlapping use and reach of different media (unique reach)
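
As an illustration of the six fields listed above, one collected measurement call could be represented roughly as follows; the field names are assumptions for this sketch, not AP's actual schema:

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class HitRecord:
        timestamp_utc: datetime  # 1. assigns the event to a measurement period
        ip_address: str          # 2. geolocation, traffic quality, user/device matching
        url: str                 # 3. the domain the measurement call originated from
        url_params: dict         # 4. the t-code and any additional parameters
        user_agent: str          # 5. device type, browser name and version
        ap_cookie: str | None    # 6. links impressions to panelists (if available)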

How is reach calculated in FIAM?

Below is a simplified description of the calculation of reach. A detailed description is circulated as a PDF via MMF.

Reach is calculated in multiple steps, the first being a Deep Learning algorithm that predicts the total number of reached devices based on the log lines consented for data processing. These devices are then connected to individual unique persons using our extensive panel knowledge, demographic information, the IP addresses of the traffic, and more. Finally, the unique persons are extrapolated to all log lines.

Each panelist has their own weighting factor, which is used in the calculations.

Relative reach

The relative reach is obtained as a share of the absolute reach based on the panelists in the target group. The share is calculated by taking the sum of the weighting factors of all validated panelists in the target group and dividing it by the sum of the weighting factors of all identified panelists.
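
A toy illustration of this share, with invented weighting factors and panelist flags:

    # Each dict is one identified panelist; all values are invented.
    panelists = [
        {"weight": 1.2, "in_target_group": True,  "reached": True},
        {"weight": 0.8, "in_target_group": True,  "reached": False},
        {"weight": 1.0, "in_target_group": False, "reached": True},
    ]

    # Sum of the weighting factors of the validated panelists in the target group...
    reached_in_target = sum(p["weight"] for p in panelists
                            if p["in_target_group"] and p["reached"])
    # ...divided by the sum of the weighting factors of all identified panelists.
    all_identified = sum(p["weight"] for p in panelists)

    relative_reach = reached_in_target / all_identified
    print(f"{relative_reach:.2%}")  # 40.00%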

Unique persons

The number of individual people who have used the measured media. The total number of page loads (impressions) in the review period is known with certainty. Each impression contains information about device, location and time. A Deep Learning algorithm connects this information into device-specific browsing patterns, combining multiple impressions into a single device.

Knowledge of device ownership distributions from both our panel and Statistics Finland then enables us to distribute the devices onto people using a probabilistic model. From this step on, the number of unique persons is known.
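
A rough sketch of this step, assuming an invented device-ownership distribution (in reality it is derived from the panel and Statistics Finland):

    # P(a reached person used k of the reached devices); values invented.
    devices_per_person = {1: 0.6, 2: 0.3, 3: 0.1}

    predicted_devices = 1_000  # output of the device-reach step above

    avg_devices = sum(k * p for k, p in devices_per_person.items())  # 1.5
    unique_persons = predicted_devices / avg_devices
    print(round(unique_persons))  # 1000 / 1.5 -> 667 persons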

The model is robust against changes to the cookie environment of the internet, because cookie information is not used to predict device reach. The cookie is used to recognise panelists, and therefore, to get a correct conversion from device reach to human reach, we use our extensive panel. However, by using our proprietary Knowledge Graph we can connect multiple cookie identifiers to the same panelist and use it to identify household traffic from the same panelist based on IP address, greatly increasing our predictive power even in the absence of cookie identification. This makes our model more robust against ad blockers, Apple's ITP-type automatic cookie disposal protocols, general cookie death, etc.

Frequency

With the total human reach known, and the number of pageviews being a technical metric that can be counted directly, the frequency is found by dividing the pageviews by the total human reach.
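
For example, with invented figures:

    pageviews = 5_400_000        # technically counted page loads
    total_human_reach = 900_000  # modeled number of unique persons

    frequency = pageviews / total_human_reach
    print(frequency)  # 6.0 pageviews per reached person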

Total reach

When the number of people reached by the traffic consented for measurement is known, together with the ratio of consented to non-consented traffic, the human reach can be extrapolated according to this ratio so that it covers all of the traffic rather than just the consented traffic.
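
A minimal sketch of this extrapolation, with an invented consent rate and reach figure:

    consented_reach = 720_000  # persons reached, modeled from consented traffic
    consent_rate = 0.80        # share of all traffic that is consented

    # Scale the consented human reach up so it covers all of the traffic.
    total_reach = consented_reach / consent_rate
    print(round(total_reach))  # 900000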

What is the panel used at FIAM like?

The AudienceProject panel used in FIAM differs significantly from the panels used by the visitor measurement systems that previously operated in Finland. People recruited to the panel do not need to install tracking programs on their computers or mobile devices, as AP's methodology is based on identifying panelists and measuring their internet usage in other ways. Panelists are identified passively, using e.g. e-mails, device IDs, cookies, IP addresses, combinations of these, and other information obtained from the terminal device.

How are panelists recruited?

AudienceProject offers various online survey tools for companies, organizations and other website owners, which can be used e.g. for evaluating user satisfaction with services. AP also uses these online surveys to recruit panelists: at the end of the survey, consent is asked for participation in AP's panel.

As part of the survey, the panelists are asked 9 basic demographic questions. These background variables form the basis of the demographic variables of the AP measurement. The 9 basic demographic variables are:

  1. Country
  2. ZIP code
  3. Gender
  4. Age
  5. Employment situation
  6. Level of education
  7. The number of people in the household
  8. Whether there are children in the household or not
  9. Household income

You can access an example survey from this link: https://www.userreport.com/#__urp=test_invite

How is the representativeness of the panelists ensured?

Representativeness is ensured in the panel through multiple methods. The first and most important is a very diverse recruitment process, in which both publishers and other sources contribute to recruitment. In addition, the panelists are weighted to match the demographic distributions of age, gender, geography and education, ensuring full representativeness on those demographics.

No one can join the panel on their own initiative: people can only join at AP's invitation. Because of this, all network users have more or less the same probability of receiving an invitation, which guarantees the randomness of the sample, vital in all research. It also minimizes the participation of so-called professional panelists. The problem with professional panelists is, first of all, that the various questions become so familiar to them that it affects their answers. In addition, they may focus mainly on the pursuit of various prizes, which can degrade the quality of the answers. To prevent this, AP generally motivates the panelists only with short text formats and modest prizes. Only if the panelist is also a member of AP's research panel do they participate in the gift card draw.

Information on the nine basic variables always remains included in the respondent data. As a result, the representativeness of surveys can always be assessed in relation to the population. The relative distribution of the panelists in these basic variables initially corresponds quite closely to the distribution of the Finnish population. In addition to that, the representativeness and quality of the data is secured by weighting the material to Finnish population data using statistical methods.

In addition to the built-in verification and error correction (e.g. checking the validity of postal codes), the system performs automatic checks on the answers before the respondent is added to the panel. The automatic check includes a logical check, which analyzes the internal consistency of the answers and rejects from the panel, for example, a respondent who has answered that they are 16 years old and live alone with two children. Those who answered too quickly or skipped over questions are also excluded from the panel.

If there is no trace of the panelist in the measurement (at least one opened site or advertisement) for 90 days, the panelist will be removed from AP's panel.

How well is the panel suited for measuring smaller media?

AP's panel offers good observation volumes even for smaller media. Due to the methodological choices made in determining reach, a small number of panelists on a small site has only a minor effect on the predicted reach, so the model is accurate even for small sites. Applying certain target groups to small sites will trigger a warning if the sample size is not large enough to accurately predict the in-target-group reach. This is because the relative share of reach in the target group is based purely on the panelist sample, and is therefore sensitive to target group selections.

How is the panel information weighted?

AP panelists are weighted according to age, gender, education and ZIP code, and are re-weighted every two weeks. The weighting applies only to the panel and individual panelists; measured traffic (census) is not weighted in any way.
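
As a highly simplified illustration of this kind of weighting, using four invented demographic cells (AP's actual procedure covers age, gender, education and ZIP code and is described in the method PDF):

    # Known population shares vs. observed panel shares per cell; all invented.
    population_share = {"women_15_34": 0.25, "men_15_34": 0.25,
                        "women_35_plus": 0.26, "men_35_plus": 0.24}
    panel_share = {"women_15_34": 0.30, "men_15_34": 0.20,
                   "women_35_plus": 0.28, "men_35_plus": 0.22}

    # A panelist's weight scales their cell up or down to the population share.
    weights = {cell: population_share[cell] / panel_share[cell]
               for cell in population_share}
    print(weights)  # under-represented cells get weights > 1, over-represented < 1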

A more detailed technical description can be found in AP's method description, circulated as a PDF via MMF.

How is the panelists' privacy protected?

Privacy protection is strongly built into all of AP's operations. The consent of the panelists is first asked once when participating in the online survey, and then again from those participating in the research panel (i.e. those used in FIAM).

Panelists can also leave the panel at any time via AP's website. More information on the subject at: https://privacy.audienceproject.com/fi-FI/for-users

How is bot traffic taken into account in FIAM?

There are two types of bot traffic:

  1. Good bots (crawlers, spiders).
    e.g. bots that gather information for search engines.
    usually easily identifiable through User-Agents and can be filtered out.
  2. Bad bots that, e.g.:
    copy content from other sites.
    click on banners or download content to generate false usage data. The purpose is to earn money, e.g. by producing artificial ad banner clicks.
    create automatic spam messages in articles or comment fields.
    try to get past security measures by impersonating a real user, or try to take over sites.

Bad bots can try to disguise themselves as good bots by changing their User-Agent signature. To identify such misbehaving bots, AP verifies that the IP addresses of claimed search-engine bots actually belong to the official search engines' IP ranges.

Bot traffic is filtered out of the measurement using the IAB Spiders & Bots list.
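
A sketch of this two-step check; the User-Agent tokens and the IP range below are illustrative only, while production filtering relies on the IAB Spiders & Bots list and the search engines' published ranges:

    from ipaddress import ip_address, ip_network

    KNOWN_BOT_UA_TOKENS = ("googlebot", "bingbot", "crawler", "spider")
    GOOGLEBOT_RANGES = [ip_network("66.249.64.0/19")]  # example range only

    def classify_hit(user_agent: str, ip: str) -> str:
        ua = user_agent.lower()
        if any(token in ua for token in KNOWN_BOT_UA_TOKENS):
            # Claims to be a good bot: verify the IP really belongs to it.
            if any(ip_address(ip) in net for net in GOOGLEBOT_RANGES):
                return "good_bot"  # filtered out of the measurement
            return "bad_bot"       # spoofed User-Agent, filtered out
        return "human"             # kept in the measurement

    print(classify_hit("Mozilla/5.0 (compatible; Googlebot/2.1)", "66.249.66.1"))
    # good_bot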

Does FIAM measure net visitor numbers?

Yes, FIAM provides net visitor numbers, in which users using different devices are counted only once. It is also possible to calculate net figures for different media combinations.

Which end devices are included in the measurement?

Practically all internet-connected terminal devices that Finns use to consume the participating companies' content are currently included in the measurement. AudienceProject offers the possibility to measure both browser-based traffic and native applications, and the implementation of the measurement in different user interfaces, including quality assurance, has been done in close cooperation with the publishers. With the launch of FIAM 2.0, Big Screens (addressable TV devices such as smart TVs) are also included in the measurement.

What should be tagged, and what if tagging has not been done?

All online content and applications should be tagged to ensure the correctness of the measurement and that the number of visitors is not underestimated. As part of the onboarding process, a quality control check is done by comparing the numbers in the system with the publisher's own digital analytics data (page views and screen views) to ensure tag coverage. More information can be found in the online implementation instructions and in the validation instructions.

Why do the new figures differ from the figures of the previous system?

The previous AudienceProject method of determining reach was based on panelist frequency, while the new methodology is much more advanced and robust against small sample-size variations and varying cross-site panelist recognition rates. Small sites will therefore experience a significant change, while larger sites will experience a more modest one. The largest sites will see little change, because for them the old method was already fairly accurate. In any case, the measurement method in use today produces much more accurate and precise results than before.

Read more in the Migration FAQ.

How are sessions calculated in FIAM and why are the numbers different from our own analytics?

The definition of a session in an online service or mobile application is ultimately just an agreement on how technically measured consumer behavior is interpreted as distinct visits. Classically, the definition has been that a break of more than half an hour in the use of the site or service starts a new session, and AudienceProject also works on this definition. The definition of a session also includes other technical boundary conditions depending on the measurement tool, and in addition there are usually differences between measurement systems in the technical implementation of the measurement and/or the interpretation of the data, which explain the differences in the results. In the end, the question is not which system's session numbers are correct from a publisher's point of view, but which session definition is used in FIAM compared to the publisher's own analytics.
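
As a sketch of the classic half-hour rule, counting sessions from one user's event timestamps (timestamps invented):

    from datetime import datetime, timedelta

    SESSION_GAP = timedelta(minutes=30)  # the classic half-hour break

    def count_sessions(timestamps: list[datetime]) -> int:
        """Count sessions: a gap longer than SESSION_GAP starts a new one."""
        sessions, previous = 0, None
        for ts in sorted(timestamps):
            if previous is None or ts - previous > SESSION_GAP:
                sessions += 1
            previous = ts
        return sessions

    events = [datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 10),
              datetime(2024, 1, 1, 11, 0)]
    print(count_sessions(events))  # 2: the long morning break starts a new session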

How and how often is FIAM data reported?

The Media Titles and Media Title Sections of FIAM media that are ready for publication are published on the Toplist and publicly available via the Results section of FIAM's web pages. 

The public reporting takes place weekly: the previous week's figures are always published on the Thursday of the following week. Members of the FIAM community additionally have the option of getting daily reports on the previous day's data via the internal tool Kits Explorer.

To understand the process of the public reporting, visit the article describing the weekly routines of a FIAM Publisher.

Why does the page views between my media and its sections not match?

In the FIAM measurement community, the question is sometimes raised why the page views of a media and its sections do not match, meaning that the total volume of sub-section page views may not always match the overall volume at the media title level, i.e. it might be lower.

This is a feature of the FIAM measurement, not a bug. The FIAM measurement concept includes the possibility of measuring and reporting section page views and other metrics, but it is perfectly OK to measure only a part of the media level page views under the sub-sections.

Technically, the measurement system built by AudienceProject differs from many traditional digital analytics tools in that sub-section page views are sent as their own measurement calls, instead of allocating one call to multiple measured units through parameters included in the call. In other words, each sub-section is its own independent tracking point and entity, whose reach and other metrics are calculated from the data specifically collected for it. Conversely, the media level is calculated from the data collected for that measurement point's identifier, not as a roll-up or aggregation of the sub-section page views.
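
A toy illustration of why the section totals need not add up to the media-title total; the t-codes are invented:

    # Each page load fires the media-level call; only tagged section pages
    # additionally fire a section-level call with their own t-code.
    hits = [
        {"t": "media-title"},                         # front page, no section tag
        {"t": "media-title"}, {"t": "section-news"},  # article in the news section
        {"t": "media-title"}, {"t": "section-sport"}, # article in the sport section
    ]

    media_level = sum(1 for h in hits if h["t"] == "media-title")
    section_level = sum(1 for h in hits if h["t"].startswith("section-"))
    print(media_level, section_level)  # 3 2 -> sections sum below the title level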

In terms of quality assurance, the sub-sections are also an independent entity, so the validation of the numbers required for inclusion in the FIAM list is done separately for the sections.

Why does the reach result of my media not match the reach of my media title?

Generally we don't see this very often, but in rare cases the reach of a 'child' (media or section) may not match the reach result of the 'parent' (media title or section title).

The reason is that the (device) reach is computed top-down. This methodological decision has the benefit of enabling us to extrapolate reach onto traffic from user agents with no consented events; the trade-off is that it is the primary cause of the edge cases described above.

While the results may look illogical, they are not the result of any bug or error, but an edge case of a statistical model in which this can happen.

We generally expect to see two (rare) cases of this:

Case 1 / The combined reach of the 'children' (media or section) does not add up to the reach result of the 'parent' (media title or section title)

In this case, the top-down approach causes the system to estimate a higher reach at the 'parent' level than across the combined 'children'.

Case 2 / The reach of a 'child' (media or section) appears to be higher than the reach of the 'parent' (media title or section title)

In this case, the result is caused by the top-down approach in combination with the deduplication between 'children': the deduplication may decrease rather than increase the (device) reach of some children, resulting in rare cases where a 'child' reach is higher than the 'parent' reach.
