Sessions are becoming worthless. You heard it here first, folks! All-you-can-eat fear-mongering! The title is pretty sensational because I’m not really talking about ALL sessions. Well, I sorta am. Okay, let’s break things down a bit. Spam bots exist, right? Cool, glad we got that out of the way. But really – bots suck. Headless browsers crawl your site and inflate your visits – usually showing up as direct traffic. The bots that aren’t direct traffic are hopefully already filtered out.
So there was this bot that cropped up around July of 2014. It blasts a specific page of a site through proxies, grabs a retargeting cookie, then goes back to the spammer’s site and collects that sweet CPM nectar. Ad services are aware of it, and they offer credit to mitigate it. Whatever. Not my problem, talk to our media team… but this bogus data in [Google/Adobe/Coremetrics/WebTrends/Hit Counter] is really starting to piss me off. These bots have gotten really obnoxious – they’re faking ISPs, cities, various browsers, etc. To trick the media companies, they need to APPEAR human. That means it becomes MUCH more difficult (if not impossible) to determine whether a session was from a bot or a human. Since that July bot, I’ve been seeing loads of direct-traffic sessions from every version of IE.
Why is this a problem? Well, it screws with our session count and you can’t filter it out. That means your executive dashboard that shows traffic volume is wrong. Sure, you can segment this traffic out of your reports – basically just NOT include direct traffic – but direct is typically a HUGE traffic source on your site. I’m cool with segmenting it out for the purpose of analysis, but using sessions at a high level to measure total traffic to the site is unreliable. Sessions are also what folks are used to, which makes dropping that number that much harder.
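If you do go the segmentation route, the arithmetic itself is trivial – here’s a minimal sketch in Python, using channel names and daily numbers I made up purely for illustration:

```python
# Hypothetical daily session counts by channel (illustrative numbers only).
daily_sessions = {
    "direct": 12000,   # the channel the proxy bots inflate
    "organic": 18000,
    "paid": 7000,
    "referral": 3000,
}

# Total sessions -- the number the executive dashboard shows.
total = sum(daily_sessions.values())

# Sessions with direct traffic excluded: a crude bot-resistant trend line,
# at the cost of throwing away real direct visitors too.
non_direct = total - daily_sessions["direct"]

print(total, non_direct)  # 40000 28000
```

Which is exactly the problem: the “clean” number quietly drops a huge chunk of legitimate visitors along with the bots.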
What do we do about this? Some jokers will say “Estimate the average sessions for each day year-over-year and subtract the sessions from the current year and…” no. FORGET that. That’s a huge waste of time. Holy cow, I can’t believe you even suggested that. Some websites have a “bot checker” interstitial page that waits a few seconds before the actual site loads. You know who implemented that? Steve from IT probably insisted that an interstitial page was a good idea after clubbing the Design/UX team over the head with a 20lb salmon. Not viable. And if Steve is reading this, don’t quit your day job… and I’m calling the police… and why salmon?
In the end, there are three options: stop them on the back end, stop them on the front end, or don’t stop them at all. I’m going to cover the first two together because the solution is roughly the same: find some consistent identifier for these bots, which requires building or buying something and hoping the bots never evolve to mimic human behavior. CAPTCHAs have taught me that this effort is pretty fruitless. Don’t get me wrong – it’s great for mitigating the stress on your server(s), but from a web analytics perspective it means we’re seeing spikes of traffic until these bots are recognized and stopped.
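For what it’s worth, this is the kind of “consistent identifier” rule I mean – a sketch in Python, where every signal and threshold is an assumption I invented for illustration. The whole point of the paragraph above is that rules like these stop working the moment the bots evolve:

```python
# A naive server-side bot heuristic: flag a hit as suspicious when it
# carries a known headless/automation user-agent marker, or when a
# referrer-less IP hammers the site past a threshold. These signals and
# thresholds are illustrative assumptions, not a real detection product.
from collections import Counter

HEADLESS_MARKERS = ("phantomjs", "headless", "python-requests")  # assumed list
hits_per_ip = Counter()

def looks_like_bot(ip, user_agent, referrer, max_hits_per_ip=100):
    hits_per_ip[ip] += 1
    if any(marker in user_agent.lower() for marker in HEADLESS_MARKERS):
        return True
    if not referrer and hits_per_ip[ip] > max_hits_per_ip:
        return True
    return False

print(looks_like_bot("10.0.0.1", "PhantomJS/1.9", ""))                          # True
print(looks_like_bot("10.0.0.2", "Mozilla/5.0 (Windows NT 6.1)", "https://example.com"))  # False
```

The July-2014 bot already fakes user agents and rotates proxies, so both checks above are trivially defeated – which is exactly why I don’t think building this is a winning strategy.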
Perhaps the best solution is to do nothing. Crazy, I know. However, there isn’t a heck of a lot we CAN do to stop these bots from looking like real traffic. I think this is a great opportunity to back away from focusing on total sessions. The goal of these bots is to land on the site, grab a cookie, and leave (cookie bandits). If bots are an issue for you, a short-term solution may be to trend non-bounce sessions. Segmentation is always key for wrapping context around metrics… but long-term I see this issue growing. Protecting ourselves against bots means we’ll have to communicate differently with our business partners. It means we can’t just dump a session count on our stakeholders and say “Here ya go” – it means we need to communicate distinct actions. We need to communicate user behavior in terms of anticipated BEHAVIOR. Bouncing is not a behavior. It’s an outcome of NOT behaving in a way we might expect a human to behave.
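The non-bounce trend is easy to pull if you have session-level data. A minimal sketch, assuming hypothetical session records and the common one-pageview definition of a bounce (your tool’s definition may differ):

```python
from collections import defaultdict

# Hypothetical session-level records: (date, pageviews_in_session).
sessions = [
    ("2015-01-01", 1), ("2015-01-01", 1), ("2015-01-01", 4),
    ("2015-01-02", 1), ("2015-01-02", 3), ("2015-01-02", 2),
]

# Count only sessions with more than one pageview. The cookie bandits
# land once and leave, so they fall out of this trend entirely.
non_bounce_by_day = defaultdict(int)
for date, pageviews in sessions:
    if pageviews > 1:
        non_bounce_by_day[date] += 1

print(dict(non_bounce_by_day))  # {'2015-01-01': 1, '2015-01-02': 2}
```

It’s not perfect – you lose legitimate one-page visits too – but as a short-term trend line it’s far harder for a hit-and-run bot to inflate.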
So what does this mean for the session? It means looking at the session in the aggregate is dying. It SHOULD be dying. It’s 2015, not 2005. I know folks will roll their eyes and say “We just communicate sessions because our stakeholders understand it.” Ironically, it’s not what characterizes business performance. We can track actual behavior, so let’s communicate that way. Easier said than done? If bot traffic continues to proliferate, you may not have a choice.