Google Analytics Data Quality Checklist

Google Analytics is interesting. It has the lowest barrier to entry of any analytics tool (because it’s free) but is deceptively complex. As analysts, we should understand that no implementation is perfect. You can hire a team of consultants to audit your implementation and fix the low-hanging fruit, but your business isn’t stagnant. It’s fluid. Even if it’s as fluid as molasses, you will still need to monitor your data quality. I can’t write a blog post that will solve that problem for you.

Instead, I want to help you solve the problems that consultants tackle when they first start an engagement. When I conduct site audits for clients, there is a set of consistent recommendations that are very “out-of-the-box”. The logic is consistent, and brands just as consistently overlook them. Instead of paying someone to go in and look at this fundamental stuff, here’s how you can do it yourself… and I’m assuming you already know how to create goals, leverage eCommerce reports, enable demographic reports, set up internal search, and connect GA to developer tools and AdWords.

Create Filtered and Unfiltered views

[Image: views]

Historical data in Google Analytics cannot be changed. Once it’s in the reports… that’s it. It’s in your best interest to ensure that you have a backup plan in case you screw something up. That’s why it’s a best practice to have a Filtered view and an Unfiltered view. The Filtered view will have all of your “clean” data. This will be your primary view that you use to conduct analysis. The Unfiltered view (in addition to being your insurance policy) will also help you build out your filtered view.

Let’s say you force your URLs to lowercase in your Pages report. Some of the filters that you may want to implement are case sensitive. If you can’t see the original case of these strings, those filters won’t work. You get the idea. That doesn’t mean you can’t enable eCommerce reports in the Unfiltered view (you should), and it doesn’t mean you can’t create goals or link AdWords (you should). All it means is that you keep the raw data as-is: no trimming of query strings, no forcing lowercase, no excluding internal traffic.

Force the Request URI to lowercase

[Image: casesensitive]

See the problem here? When you’re analyzing user behavior on the homepage, you might be missing a significant chunk of traffic. This could affect how you interpret content affinity and campaign performance/reconciliation. This doesn’t just impact your reporting. Google Analytics imposes some restrictions on data volume:

Daily processed tables store a maximum of 50k rows for standard Analytics and 75k rows for Google Analytics 360.

This means that if you exceed 50,000 unique page names in a day on standard Analytics (or 50,000 unique values of any dimension), the remaining results will be bucketed into this “(other)” label:

[Image: other]

Gross. So how do you fix it?

Go to Filters and create a Custom Filter. From there, click on the “Lowercase” option from the radio buttons lower on the page:

[Image: lowercase]

Select “Request URI” from the list and save. That’s it. Ensure you’re saving this filter on the Filtered view and not applying it to the Unfiltered view.
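To make the effect of that filter concrete, here is a minimal sketch of what lowercasing does to a set of report rows. The page paths and counts are made up for illustration, and the function only imitates the filter on exported data; it is not how Google Analytics implements it.

```python
from collections import Counter

# Hypothetical pageview counts keyed by Request URI, as they might appear
# in an Unfiltered view where case variants are reported as separate rows.
raw_pageviews = {
    "/": 1200,
    "/Index.html": 340,
    "/index.html": 910,
    "/Blog/Post-1": 75,
    "/blog/post-1": 410,
}

def lowercase_request_uri(pageviews):
    """Merge rows whose Request URI differs only by letter case."""
    merged = Counter()
    for uri, views in pageviews.items():
        merged[uri.lower()] += views
    return dict(merged)

cleaned = lowercase_request_uri(raw_pageviews)
print(cleaned)
```

Five rows collapse into three, and each page's traffic ends up on a single line. Fewer distinct rows also means you are further from the daily row limit.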

Remove useless query strings

[Image: qstringsreport]

So you’re now forcing things lowercase. Your work here is done. NOT SO FAST! Looks like we found more duplicate line items because of those pesky query strings. Time to knock those out. Query strings are split into name/value pairs (/index?name=value&name=value&name=value). You’ll need to take an inventory of all of the NAMES of query strings and get rid of them in the View Settings page in your Admin section:

[Image: qstrings]

It’s CaSe SeNsItIvE, so make sure you’re searching for these strings in the Unfiltered view! That means listing “qstring1” will NOT work if the query string in the URL is actually “qString1”. It DOESN’T matter if you have a lowercase filter applied to one of your views; the names still have to match the original case. I found the most effective way to take inventory of the query string names is to simply search for the following: \?

The backslash that precedes the question mark escapes that character. Google Analytics’ search defaults to Regular Expressions, so this will literally search for a question mark. From there, I just keep a text editor file open and add to the list. It’s tedious, but you really should only have to do a large-scale cleansing once.
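If you would rather not build that list by hand, the inventory can be roughly sketched in a few lines, assuming you have exported or copied the page paths from the Unfiltered view. The paths and parameter names below are hypothetical, and joining the names with commas is my assumption about the format the View Settings field expects.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical page paths copied from the Unfiltered view's Pages report.
pages = [
    "/index?qString1=abc&ref=nav",
    "/index?qstring1=abc",
    "/products?sessionid=991&sort=price",
    "/about",
]

# Same pattern as the report search: the backslash escapes the question mark.
has_query = re.compile(r"\?")

def inventory_query_names(paths):
    """Return the distinct query parameter names, preserving their case."""
    names = set()
    for path in paths:
        if not has_query.search(path):
            continue  # no query string on this page
        query = urlsplit(path).query
        for name, _value in parse_qsl(query, keep_blank_values=True):
            names.add(name)
    return sorted(names)

names = inventory_query_names(pages)
print(",".join(names))
```

Note that “qString1” and “qstring1” come out as two separate entries, which is exactly the case-sensitivity trap described above.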

Eliminate Spam… all of it

[Image: botfiltering]

Spam in your GA account is annoying as hell. Let’s go ahead and assume you have already gone into your View Settings and checked the box above. If you haven’t… do it. It’s not perfect, but Google works hard to keep its bot filter up-to-date. So how do we identify and exclude the rest of the spam? Spam bots are primarily vehicles to get you to click on bogus websites – like best-seo-stuff.com:

[Image: spambot]

How do we know this is spam? 241 sessions, 241 new users, 100% bounce rate and 100% new sessions are a dead giveaway. Even when a popular site links to your own website, these metrics are basically not possible.
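That rule of thumb can be sketched as a quick check over exported referral rows. The row structure, the extra sources, and the `min_sessions` cutoff are my own assumptions for illustration; only the 241-session example comes from the report above.

```python
from dataclasses import dataclass

@dataclass
class ReferralRow:
    source: str
    sessions: int
    new_users: int
    bounces: int

def looks_like_spam(row, min_sessions=20):
    """Flag sources where every session is both new and a bounce.

    min_sessions is a hypothetical cutoff so that tiny, legitimate
    referrers with a handful of visits are not flagged by accident.
    """
    if row.sessions < min_sessions:
        return False
    all_new = row.new_users == row.sessions
    all_bounced = row.bounces == row.sessions
    return all_new and all_bounced

rows = [
    ReferralRow("best-seo-stuff.com", sessions=241, new_users=241, bounces=241),
    ReferralRow("news.example.com", sessions=530, new_users=412, bounces=301),
    ReferralRow("tiny.example.org", sessions=3, new_users=3, bounces=3),
]

suspects = [row.source for row in rows if looks_like_spam(row)]
print(suspects)
```

Treat anything this flags as a candidate to review rather than proof of spam.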

So from here you have a few options. You can create a filter to exclude this domain; however, this spam traffic is often triggered without anything actually visiting your site. That means it isn’t associated with your hostname (for instance, my hostname is jimalytics.com). Because of that, it’s often easier to create a filter that includes only traffic on your hostname:

[Image: hostname]

This will cover 99% of bot issues.
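Here is a rough sketch of the logic behind a hostname include filter, applied to a few made-up hits. The regular expression (an optional “www.” in front of jimalytics.com) and the hit records are assumptions for illustration; your own pattern should list every hostname you legitimately track.

```python
import re

# Assumption: the site is served from jimalytics.com with an optional "www."
# prefix. Adjust the pattern to cover every hostname you legitimately track.
VALID_HOSTNAME = re.compile(r"^(www\.)?jimalytics\.com$")

def keep_hit(hostname):
    """Imitate an Include filter on Hostname: keep only matching hits."""
    return bool(VALID_HOSTNAME.search(hostname))

# Hypothetical hits; spam that never touched the site has no valid hostname.
hits = [
    {"hostname": "jimalytics.com", "source": "google"},
    {"hostname": "www.jimalytics.com", "source": "(direct)"},
    {"hostname": "(not set)", "source": "best-seo-stuff.com"},
    {"hostname": "best-seo-stuff.com", "source": "best-seo-stuff.com"},
]

kept = [hit for hit in hits if keep_hit(hit["hostname"])]
print([hit["source"] for hit in kept])
```

The advantage of including rather than excluding is that you don't have to chase each new spam domain as it shows up.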

Exclude internal traffic

[Image: ipfilter]

You do a lot of testing. Your developers do a lot of testing. Your boss and employees do a lot of testing. There’s a surprising amount of traffic that comes from internal resources. If you don’t have one already, create a filter that excludes your office’s IP address (or range of addresses). Talk to your development or IT team to understand what that range is.
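Once IT hands you the range, it can help to sanity-check which addresses it actually covers before you build the filter. This is a small sketch using Python's standard `ipaddress` module; the networks below are reserved documentation ranges standing in for a real office range, so treat them as placeholders.

```python
import ipaddress

# Placeholder ranges (reserved for documentation); swap in the ranges your
# IT team gives you. A /32 entry represents a single address.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]

def is_internal(ip):
    """Return True if the address falls inside any internal range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in INTERNAL_NETWORKS)

for candidate in ["203.0.113.42", "198.51.100.17", "198.51.100.18"]:
    label = "internal" if is_internal(candidate) else "external"
    print(candidate, label)
```

If an address you expected to be internal comes back as external, the range is probably wrong or incomplete, and the filter would have missed it too.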

Takeaways

By NO means is this list comprehensive. There are other considerations that you might want to take into account, like cross-domain tracking, effective channel groupings and campaign tracking, and enhanced eCommerce, among others. What are some other out-of-the-box items that you look for when you conduct an audit?
