Google Analytics: When are you compliant with GDPR?

On September 21, 2022, the Danish Data Protection Agency (DPA) concluded “the tool (Google Analytics) cannot, without more, be used lawfully. Lawful use requires the implementation of supplementary measures in addition to the settings provided by Google” (press release).

This has spurred a debate amongst many of our clients on how to proceed, and there are many ways to do so. In this article we focus on how you could proceed with Google Analytics, and mainly on what to do from a technical perspective as well as from a diligence perspective to avoid violating GDPR. It does not cover every possible safeguard, but it should give sufficient insight to make those assessments.

GDPR compliance isn’t just a technical solution

European DPAs (and court rulings) have focused mainly on the technical side. If you just install Google Analytics and apply its built-in features for anonymizing visitors, the solution still passes the IP address, and can also pass events that contain other information about the visitor’s true identity.

A technical solution can mitigate the passing of the IP address, but it cannot account for every way in which personal information can be passed to Google Analytics.

For example:

  • If you go to a website with an internal search and type in your name, phone number or email address, it will likely produce a page URL that contains what you typed in, and that URL is sent along with the pageview event to Google Analytics
  • Some businesses have failsafes in place that scramble the URL, either in your browser or within the tracking solution, which prevents this from making its way to Google Analytics
  • But setting up a technical solution that catches everything is not an easy undertaking, and you will likely not cover all eventualities
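
A failsafe of the kind described above could look roughly like the following. This is a minimal sketch and not a complete solution: the parameter names in `FREE_TEXT_PARAMS`, the two patterns and the `REDACTED` marker are all assumptions made for illustration, and the sketch only looks at query parameters, not at the URL path.

```typescript
// Query parameters assumed to carry free text typed by the visitor (hypothetical names).
const FREE_TEXT_PARAMS: string[] = ["q", "query", "search"];

// Rough detectors for values that look like personal data (assumed patterns, not exhaustive).
const PII_PATTERNS: RegExp[] = [
  /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/gi, // email addresses
  /\+?\d[\d\s-]{6,}\d/g, // phone-number-like digit runs
];

// Replace every match of every pattern with a fixed marker.
function scrub(value: string): string {
  return PII_PATTERNS.reduce((text, pattern) => text.replace(pattern, "REDACTED"), value);
}

// Return a copy of the URL that is safer to attach to a pageview event.
function redactUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  // Copy the entries first so the parameters are not modified while iterating.
  const entries: Array<[string, string]> = [];
  url.searchParams.forEach((value, key) => entries.push([key, value]));
  for (const [key, value] of entries) {
    // Free-text parameters are replaced wholesale; all others are pattern-scrubbed.
    const isFreeText = FREE_TEXT_PARAMS.indexOf(key) !== -1;
    url.searchParams.set(key, isFreeText ? "REDACTED" : scrub(value));
  }
  return url.toString();
}
```

The phone pattern will also hit long order numbers and similar digit runs, which illustrates the point of the last bullet: patterns like these reduce the risk, but they do not catch everything.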

The easier solution is to be diligent, and view GDPR compliance as something that requires:

  • A technical setup that gives you control over any data passed, and that avoids passing the full IP address to whichever tracking system you have in place
  • Routines for regularly vetting whether the event data itself contains fragments of personal data (names, phone numbers, addresses, emails, social security numbers and such)
  • Routines for removing any data from such incidents, and for patching the tracking solution or the website to prevent this from recurring

Why supplementary measures are a required part of the solution

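GDPR article 4 defines personal data as any information relating to an identifiable person, and names location data and online identifiers among the ways a person can be identified. EU courts have found that an IP address can give location data precise enough to point to a physical address, which makes the possibility of identifying an individual person high. Thus, passing the full IP address to Google Analytics means passing personal data, and doing so without safeguards is a violation of GDPR.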

Even if you set “anonymizeIp=true” (a feature that sets the last octet of the IP address to 0, so that it carries no location data more precise than roughly a city-wide area), you are still violating GDPR unless you add supplementary measures.

The problem with this feature lies in how the internet works by default when you use a browser.

Whenever you send data from your browser, the default is to include your IP address and your userAgent. The main reason is that if the data you send is a request to get something in return, that information is needed to know where to send what you requested, and to make adjustments if necessary. (Another reason is that it helps the recipient tell whether you are a real person or a bot trying to spam them.)

So “anonymizeIp” is just an instruction passed alongside this data, requesting that the IP address be anonymized enough to no longer carry exact location data after processing.
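
To make concrete what the requested anonymization amounts to for an IPv4 address, here is a minimal sketch. The function name `anonymizeIpv4` is hypothetical and this is not Google’s implementation; it only illustrates the truncation described above.

```typescript
// Zero the last octet of an IPv4 address, e.g. a.b.c.d becomes a.b.c.0.
function anonymizeIpv4(ip: string): string {
  const octets = ip.split(".");
  if (octets.length !== 4) {
    throw new Error(`Not an IPv4 address: ${ip}`);
  }
  octets[3] = "0"; // the last octet is what narrows the location down the most
  return octets.join(".");
}
```

Note that a function like this needs the full address as its input: whoever runs it has already received the visitor’s complete IP address.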

illustration of data passing from a visitor to google analytics with the visitors ip address
Whenever information is passed from a browser, the IP address is included by default, which is why European Data Protection Agencies recommend that a proxy be put in place

Thus, if we pass data from the user with the “anonymizeIp=true” feature turned on, the full IP address is still included every time data is passed from the user to Google Analytics.

  • Universal Analytics passes this information directly to Google Analytics servers in the US for all processing (this violates both GDPR and ‘Schrems II’[1])
  • Google Analytics 4 first passes data to servers inside the EU, where the IP address is processed (removing the last section, which is what places a person inside a fairly small area or at a street address). This does not violate ‘Schrems II’[1]. The rest of the information (with the processed IP address) is passed to US servers for further processing[2], but if personal data manages to be included, that still violates ‘Schrems II’

[1] ‘Schrems II’ is the common nickname for the 2020 ruling in which the Court of Justice of the EU, in a case brought by Max Schrems (founder of NOYB), invalidated the “Privacy Shield” (the agreement between the US and the EU that allowed personal data to be transferred to and processed on US-based servers). Max Schrems had previously won a case against the preceding agreement between the EU and the US (‘Schrems I’ invalidated the first such deal, known as “Safe Harbour”).

[2] If you uphold GDPR as well as the Google Analytics terms of service, you should not include even fragments of personally identifiable information in the data that is processed on US-based servers, and thus not breach Schrems II, which struck down the deal that allowed personal information to be passed to US servers. But because you passed the IP address to Google, you are still in violation of GDPR, since you are sending personal data.

Because of this, Google has the means to identify the user even with the “anonymizeIp” feature turned on, and as a business using Google Analytics you cannot guarantee that they do not use this information, since you cannot document it. This is one of the main reasons why European DPAs have stated that Google Analytics with its own features is not enough to comply with GDPR.

A proxy is required for any tracking system

It doesn’t matter what tracking system you use. If personal data is passed directly from the visitor’s browser to the end recipient, then unless you own that endpoint you are giving personal data, as classified by GDPR, to some other entity, and you do not have sufficient control to ensure that it is not stored somewhere.

So even if you have adhered to the requirements and installed a functioning cookie-consent, you only have consent to place a cookie, not to pass or store personal data. Thus, just swapping out Google Analytics for another tracking system will not fix the core issue of the IP address making its way through, unless you add supplementary measures in the form of a proxy.

illustration of data passing from a visitor to google analytics via a proxy, removing the visitors IP address
With a proxy, the IP address follows the data to your server, where it is anonymized and the server replaces the visitor as sender (meaning the server’s full IP address is what Google Analytics receives)
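
The core of such a proxy can be sketched as a function that builds the request your server sends onward. This is a sketch under stated assumptions, not a full proxy: the `IncomingHit` and `OutgoingHit` types, the `UPSTREAM_ENDPOINT` URL and the header allowlist are all hypothetical, and the actual network calls are left out.

```typescript
// What the proxy received from the visitor's browser (hypothetical shape).
interface IncomingHit {
  clientIp: string;
  headers: Record<string, string>;
  body: string;
}

// What the proxy sends onward under its own IP address (hypothetical shape).
interface OutgoingHit {
  endpoint: string;
  headers: Record<string, string>;
  body: string;
}

const UPSTREAM_ENDPOINT = "https://analytics.example.com/collect"; // hypothetical URL

// Only these headers are passed on; everything else is dropped (assumed allowlist).
const FORWARDED_HEADERS: string[] = ["content-type", "user-agent"];

function buildOutgoingHit(hit: IncomingHit): OutgoingHit {
  const headers: Record<string, string> = {};
  for (const [key, value] of Object.entries(hit.headers)) {
    const lower = key.toLowerCase();
    if (FORWARDED_HEADERS.indexOf(lower) !== -1) {
      headers[lower] = value;
    }
  }
  // hit.clientIp is deliberately never copied, and headers such as
  // X-Forwarded-For or Cookie are not on the allowlist.
  return { endpoint: UPSTREAM_ENDPOINT, headers, body: hit.body };
}
```

An allowlist is used instead of a blocklist so that a header nobody thought of is dropped by default. The body is passed through untouched here, so any personal data inside the event payload would still need to be handled separately.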

The IP address is also passed when the scripts that run Google Analytics are downloaded

A regional court in the EU (Munich, Germany) found that a company website which used Google Fonts was in violation of GDPR, because loading the fonts from Google’s servers passed the visitor’s IP address to Google.

There is no ruling yet (that we are aware of) stating that downloading the scripts required to run Google Analytics is in itself a violation of GDPR. But you are sending visitors’ IP addresses to Google in order to download these scripts.

illustration of downloading google analytics scripts passing ip address to google
Downloading gtag.js / analytics.js still sends the visitor’s full IP address to Google Analytics servers

If you have the proxy in place (this is a feature in Google Tag Manager), you can route the download request via your server as well (there is an API for this named ‘getGoogleScript’).

illustration of downloading google analytics scripts via your proxy to hide the visitors IP address from Google
By enabling getGoogleScript, the server proxies both the request for and the return of the Google Analytics scripts, disconnecting the visitor from Google Analytics servers

We recommend that our clients implement this. Even if it has not yet been found to violate GDPR, we find that disconnecting the visitor completely from Google Analytics is simply prudent when the option to do so exists.

The last consideration for your technical setup: the online identifier

In and of itself, an online identifier is a string of some static letters and random numbers that is unique to one browser on one device. Google Analytics places a cookie (the default name is ‘_ga’; in Google Analytics its value can be found under ‘clientID’) that includes a random string of numbers, which is unique to one visitor (one browser on one device).

By default, the clientID is stored for 2 years after the last time it produced any data (not all operating systems, browsers or settings allow this). So any data sent to your Google Analytics database from one browser on one device is tied to this unique string for as long as it exists.

This unique string contains no information about who the visitor is. But the more data you store alongside a unique online identifier, the higher the risk is that the data, when combined, will allow you to find the visitor’s true identity.

However, you should by now have a cookie-consent on your website, where visitors give their consent for this online identifier to be placed. While this is not something any business can hide behind if personal information makes it through, it is our experience that this makes the online identifier something that businesses make individual decisions about (how much risk they are willing to accept).

What to consider

If you are a business that relies heavily on your website’s performance – you probably apply attribution models to see longer user journeys when defining what to do more and less of. In order to do this, you need the online identifier to persist.

But if you do not apply attribution models and have no plans to do so, then it might just be easier to limit this risk and thus mitigate the severity of a potential GDPR violation.

In such a scenario, here are some options you may want to consider:

Removing the online identifier:

Only applicable for GA4 (not Universal Analytics). Instead of identifying individuals, pass the data they produce without tying it to any one unique identifier. This will result in your attribution failing (including last-click), even if machine learning tries to mitigate this.

Removing the uniqueness of online identifiers:

Place online identifiers that are not unique to each visitor, but that instead apply to more people, e.g., the date of their first visit + the referrer. This will give a first-click attribution model, but might interfere with last-click attribution if you attempt to override the referrer with the cookie values (if you process it afterwards, you might keep both first- and last-click attribution, and other attribution models can be applied with machine learning).
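
A shared identifier of this kind could be built as follows. This is a minimal sketch: the function name `cohortId`, the `date.hostname` format and the `direct` / `unknown` fallbacks are assumptions for illustration.

```typescript
// Build an identifier shared by everyone who first arrived on the same day
// from the same referring site.
function cohortId(firstVisit: Date, referrer: string): string {
  const day = firstVisit.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  let source = "direct"; // no referrer at all
  if (referrer !== "") {
    try {
      // Keep only the hostname: paths and query strings can carry personal data.
      source = new URL(referrer).hostname;
    } catch {
      source = "unknown"; // referrer was not a parseable URL
    }
  }
  return `${day}.${source}`;
}
```

How many visitors end up sharing one value depends on your traffic. On a low-traffic site a given day and referrer may still map to a single person, so the value should be checked against real volumes before it is treated as non-unique.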

Giving up information on returning visitors:

You can also go for a variation of what CNIL (the French DPA) suggested, for example by setting the user cookie to expire 30 minutes after the session has ended (so there are no returning visitors). Last-click attribution will persist, but machine learning will have a hard time building any other attribution model you can rely on.

This is the model that has been applied by Oslo Municipality in Norway.
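
With gtag.js, one way to approximate this is through its cookie settings. The sketch below assumes the gtag.js fields `cookie_expires` (in seconds) and `cookie_update`; the measurement ID is a placeholder, and the call is only made when a global `gtag` function is actually present on the page.

```typescript
type GtagFn = (...args: unknown[]) => void;

const MEASUREMENT_ID = "G-XXXXXXX"; // placeholder, not a real property

const shortLivedCookieConfig = {
  cookie_expires: 30 * 60, // lifetime in seconds: 30 minutes
  cookie_update: true, // renew the expiry on each page load
};

// gtag is defined by the gtag.js snippet in the browser; skip the call elsewhere.
const gtagFn = (globalThis as unknown as { gtag?: GtagFn }).gtag;
if (typeof gtagFn === "function") {
  gtagFn("config", MEASUREMENT_ID, shortLivedCookieConfig);
}
```

With renewal turned on, the cookie lives until 30 minutes after the visitor’s last page load, which is what makes a later return visit look like a new visitor.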

Upholding the intent behind GDPR:

Place unique online identifiers, but have mechanisms and routines in place to ensure that no other personal identifier is passed with the data, to the point where the data you help generate cannot in and of itself be used to reverse-engineer the visitor’s (likely) identity.

NB! This means taking a bigger risk (even with cookie-consent in place).

Removing Google Analytics from defining this value:

Since the EU has shown distrust toward Google Analytics where personal data is concerned, you can also consider taking over the definition of the unique online identifier:

  • Your proxy can set httpOnly cookies, which are only accessible to the server and the visitor (not to the Google Analytics scripts). This removes some uncertainty about the Google Analytics scripts being able to pass this information outside the proxy
  • Or you can pseudonymise the visitors with programs like Tracedock (removing Google Analytics’ definition of the cookies)
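
For the first option, the cookie is set through a response header from your own server. A minimal sketch of building that header, where the function name `serverCookie`, the cookie name and the chosen attributes are assumptions for illustration:

```typescript
// Build a Set-Cookie header value that page scripts cannot read.
function serverCookie(name: string, value: string, maxAgeSeconds: number): string {
  return [
    `${name}=${encodeURIComponent(value)}`,
    `Max-Age=${maxAgeSeconds}`,
    "Path=/",
    "HttpOnly", // hidden from JavaScript running on the page
    "Secure", // only sent over HTTPS
    "SameSite=Lax",
  ].join("; ");
}

// Example: a hypothetical "visitor_id" cookie that lives for 30 minutes.
const exampleHeader = serverCookie("visitor_id", "abc123", 30 * 60);
```

The browser sends this cookie back to your server with every request, so the proxy can attach the identifier to the data it forwards without any script on the page ever seeing it.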

NB! This does not negate the risk of keeping unique identifiers, but it does reduce what Google Analytics can define. By and large, however, this should be considered a smoke screen (it changes nothing about the data being tied to one unique value per visitor). Also, Tracedock markets that it can identify the same user across browsers and devices, meaning that you actually take on an even bigger risk where GDPR violations are concerned, as digital fingerprinting becomes more available.

Repeat: The setup is not sufficient to comply with GDPR

As a business that tracks visitors on your website, you need to ensure that the information (pageviews, event parameters, referrer information and the userAgent) which gets sent to Google Analytics or any other tracking platform, does not in itself pass any fragments that can be stitched together to define who the visitors are.

The Google Analytics Terms of Service clearly state that such information should not be passed to them, and GDPR means that if you do pass such information, you as the controller are liable for the violation. This again is the case with any tracking solution you use. If the data can be stitched together to identify a visitor’s true identity, then you are in fact violating GDPR (and Schrems II if you use Google Analytics or many other mainstream tracking technologies).

Therefore, ensuring regular audits of the collected data is a required part of GDPR compliance. Here you should check if:

  • Page views might have collected URL-strings that contain fragments or full information of the likely identity of the visitor,
  • Event data might have picked up on something elsewhere that can identify the visitor,
  • Information combined about the visitor between event, referrer, userAgent, anonymized location and pageview data enables the likely identification of any visitor

If you come across this, you need routines for deleting that data, and for patching either the tracking setup or, better yet, the website to prevent it from happening again. You should also document this.
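
Part of such an audit can be automated. The sketch below scans exported rows for values that look like personal data. The `Row` shape, the field name used in the example and the two detectors are assumptions for illustration, and a scan like this only finds obvious patterns; it does not replace a manual review of combined data.

```typescript
type Row = Record<string, string>;

interface Finding {
  row: number; // index of the row in the export
  field: string; // field in which the suspect value was found
  kind: string; // which detector matched
}

// Rough detectors (assumed patterns, not exhaustive).
const DETECTORS: Array<{ kind: string; pattern: RegExp }> = [
  { kind: "email", pattern: /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i },
  { kind: "phone", pattern: /\+?\d[\d\s-]{6,}\d/ },
];

// URLs are often percent-encoded; fall back to the raw value if decoding fails.
function safeDecode(value: string): string {
  try {
    return decodeURIComponent(value);
  } catch {
    return value;
  }
}

function findSuspectValues(rows: Row[]): Finding[] {
  const findings: Finding[] = [];
  rows.forEach((row, index) => {
    for (const [field, value] of Object.entries(row)) {
      const decoded = safeDecode(value);
      for (const detector of DETECTORS) {
        if (detector.pattern.test(decoded)) {
          findings.push({ row: index, field, kind: detector.kind });
        }
      }
    }
  });
  return findings;
}
```

Each finding points to a row and field to review, delete and trace back to its source, so that the website or tracking setup can be patched.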

While this does not negate a GDPR violation that has already occurred, documenting it does show that you take your responsibility as a controller seriously, and that you are actively working to keep it from happening again.

So, GDPR compliance is not a “set it and forget it” scenario. It requires an investment in both time and routines, as well as a technical setup, which combined work towards ensuring that the data does not identify any individuals.

Switching to another platform is rarely a solution

Several businesses have already opted to replace Google Analytics with another tracking platform as their way of limiting their GDPR exposure. However, even though Google Analytics has been the target in courtrooms and for Data Protection Agencies, the principles that make it a target apply to any tracking solution. It is simply easier to target the #1 platform globally, since that will resonate with more businesses than targeting one that is rarely used.

But businesses do not become compliant merely by changing which system tracks visitors’ behaviour and owns the servers all that data is stored on. They only shift the risk somewhere else.

As of now, the only viable solution is to get supplementary measures in place, as well as routines for regularly assessing whether you are compliant (and acting on it). As far as platforms go, the only solution out there where you can guarantee that you have full control over the entire process is Snowplow Analytics. But changing to it does not limit your exposure to GDPR; it only enables you to say with 100% certainty that you deleted personal data (if GDPR has been violated, it has still been violated).

Curamando is ready to help

We have helped businesses set up this technical solution, and others similar to it for other platforms. We also have routines to perform checks if data contains personal information or fragments thereof, and a lot of experience on how to solve such issues.
