Your career site data is ruined by bots

If you regularly analyse the performance of your career site, you might have seen some odd numbers: spikes of traffic with very low session durations and a high bounce rate, traffic from locations you would not expect, or traffic to pages that normally get few visits. That is probably bot traffic, and it can heavily skew the outcome of your career site analytics. So how can you prevent that?

What are bots?

Let’s start with the basics. Bots are software programs that operate on the Internet and perform repetitive tasks. They are very common: more than half of Internet traffic consists of bots scanning content, interacting with webpages, chatting with users, or looking for attack targets. The best-known one is Googlebot, which indexes content from all over the web so it can be used in Google Search.

Some bots, like Googlebot, are useful. Others are ‘bad’ and are meant to break into insecure websites, bring a website down (a DDoS attack) or scrape content without consent.

Bots are also common in recruitment, mostly via job boards. To stay relevant, those job boards need to list all recent job posts. Since those posts are spread amongst thousands of sites and need to be refreshed constantly, the boards use bots to fetch them. And that is exactly where the problem lies.

Why do they mess up your analysis?

One common analysis is calculating the conversion rates from the job detail page to the apply form to a finished application. With that analysis, you can figure out how well candidates move through the funnel and how smooth the process is. It’s a great, pragmatic way to verify the performance of your funnel, and splitting the numbers up per source also lets you compare sources against each other.

Bots fetch as many job postings as possible, but they can’t apply. That means your job detail page and apply form visits will be artificially high, whilst your count of finished applications stays realistic.

As a result, your measured conversion rates are lower than they actually are, and when you compare your average conversion rate to the conversion rate of a single source, the comparison will be skewed.
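To make the skew concrete, here is a small illustration with made-up numbers (the figures are assumptions for the example, not data from this article):

```javascript
// Illustrative numbers only: bots inflate job page views but never
// finish an application, dragging the measured conversion rate down.
const jobPageViews = 10000;        // total views, including bots
const botViews = 4000;             // assumed bot share of those views
const finishedApplications = 300;  // bots can't apply, so this is human

const measuredRate = finishedApplications / jobPageViews;           // what GA shows
const realRate = finishedApplications / (jobPageViews - botViews);  // human-only rate

console.log(`measured: ${(measuredRate * 100).toFixed(1)}%`); // 3.0%
console.log(`real:     ${(realRate * 100).toFixed(1)}%`);     // 5.0%
```

With 40% bot traffic, the funnel looks almost half as effective as it actually is for real candidates.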

The same goes for looking at general website interaction metrics like bounce rate, session duration and pages/session. Traffic from bots generally has a session duration of just a few seconds and an extremely high bounce rate, skewing your averages.

It is good to understand that whilst nothing on your site is broken, the impact of bots on your analysis can be huge. We’ve seen bot traffic taking up a quarter to half of total career site traffic, which will greatly influence your averages.

How to fix this?

I’m going to assume you use Google Analytics. First up, make sure you enable the built-in Bot Filtering.


However, most bot traffic will still slip through. Recognising this traffic can be quite tricky, as there may be a lot of it, and bots try to blend in with regular traffic to avoid being blocked. That also means there is no complete solution.


One way is to filter based on the browser User-Agent, which is basically the make and model of a browser. Most bots use Linux as their operating system, which we can recognise and filter out at the Google Analytics view level. Since the raw User-Agent is not a standard dimension in Google Analytics, you’ll need a custom dimension for it. In Google Tag Manager, create a Variable of the type JavaScript Variable and add the following:
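The original screenshot is not available here, but a minimal sketch of such a variable could look like this (the helper name `getUserAgent` and the non-browser fallback are my own additions; in GTM’s Custom JavaScript variable field you would paste only the anonymous function itself):

```javascript
// Sketch of a GTM variable that exposes the raw User-Agent string,
// so it can be sent to Google Analytics as a custom dimension.
var getUserAgent = function () {
  // navigator is provided by the browser; fall back to '' elsewhere
  return (typeof navigator !== 'undefined' && navigator.userAgent) || '';
};
```

Alternatively, a Variable of the built-in JavaScript Variable type can simply reference the global `navigator.userAgent`.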


Then add this variable to the Google Analytics Settings variable in GTM and create a matching custom dimension in Google Analytics. Based on that custom dimension, you can identify bot traffic and filter it out at view level. I’d advise creating a new view for this, so you can compare your original view and the filtered view.

One way to do that is to filter out all traffic from computers running Linux, as that is the go-to OS for bots. However, you will also filter out genuine traffic coming from talent using Linux. That should not be a big chunk of your traffic, as Linux takes up roughly 2%-3% of global computer usage and has no mobile presence.
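As a sketch, the matching logic for such a filter could look like this (the helper `isDesktopLinux` is hypothetical; in Google Analytics itself you would express this as a filter pattern on the custom dimension). One caveat: Android User-Agents also contain the word “Linux”, so the check excludes those to avoid dropping mobile visitors:

```javascript
// Rough check for desktop Linux, the OS most bots report. Android
// User-Agents also contain "Linux", so those are explicitly excluded
// to keep genuine mobile traffic in the data.
function isDesktopLinux(userAgent) {
  return /Linux/.test(userAgent) && !/Android/.test(userAgent);
}

isDesktopLinux('Mozilla/5.0 (X11; Linux x86_64)');          // true
isDesktopLinux('Mozilla/5.0 (Linux; Android 10; Pixel 3)'); // false
isDesktopLinux('Mozilla/5.0 (Windows NT 10.0; Win64)');     // false
```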

Another option is to create a segment that requires sessions to last 2 or more seconds, meaning you segment out sessions of 0 or 1 second. This will segment out most bots, but could also segment out genuine, low-quality traffic that you’d like to investigate. The downside of using a segment is that you always need to apply the segment when you open Google Analytics, as that defaults to the All Users segment.


It is important to understand the impact of bot traffic on your analysis: skewed averages can hugely influence both career site engagement metrics (bounce rate, session duration, etc.) and conversion rates. There is no ideal, easy-to-implement solution, but the two approaches mentioned should get you a long way. Knowing this happens and taking it into account is, in this case, possibly more important than completely fixing it.
