One of our largest customers was seeing a 47% traffic difference between their server logs and the Bango data. Relying on web server logs for your data is tricky: you need to account for redirects, post backs, image and style sheet requests, spiders and bots, because none of these are real page views. Even so, you should be able to get the two datasets to agree within a few percent.
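To illustrate the kind of clean-up involved, here is a minimal sketch that counts only plausible page views from parsed web server log entries. It assumes the logs have already been parsed into a hypothetical `LogEntry` record. The static-file extensions and bot markers are illustrative assumptions, not a complete rule set.

```python
from dataclasses import dataclass


# Hypothetical parsed log entry; a real log parser would populate these fields.
@dataclass
class LogEntry:
    method: str
    path: str
    status: int
    user_agent: str


# Illustrative lists only; real-world filtering needs far more complete rules.
STATIC_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".css", ".js", ".ico")
BOT_MARKERS = ("bot", "spider", "crawler", "slurp")


def is_page_view(entry: LogEntry) -> bool:
    """Return True if a log entry plausibly represents a real page view."""
    if 300 <= entry.status < 400:  # redirects
        return False
    if entry.method.upper() == "POST":  # post backs
        return False
    path = entry.path.split("?", 1)[0].lower()
    if path.endswith(STATIC_EXTENSIONS):  # images, style sheets, scripts
        return False
    ua = entry.user_agent.lower()
    if any(marker in ua for marker in BOT_MARKERS):  # spiders and bots
        return False
    return True


def count_page_views(entries) -> int:
    """Count the entries that survive the page-view filter."""
    return sum(1 for e in entries if is_page_view(e))
```

Substring matching on user agents is crude and will miss bots that disguise themselves. That gap is exactly the kind of problem the investigation below uncovered.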
To find out why the discrepancy was happening, I started to investigate. In my view, the first step in any data investigation is to pick an hour and compare the two datasets for that hour. The best thing to compare to begin with is page views by user agent. This shows whether the discrepancy is spread across many user agents or whether one user agent (i.e. one device) is causing the problem. For this key customer, the entire discrepancy came from a single iPhone 3.13 user agent. The customer was seeing 20x as many page views from that user agent as from any other user agent, and 20x as many as Bango was seeing. All other user agents matched on page views within a few percent.
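The hour-by-user-agent comparison can be sketched as follows. This assumes each dataset has been reduced to a list of user-agent strings, one per page view, for the same hour. The function name and the 5% default tolerance are hypothetical choices standing in for "within a few percent".

```python
from collections import Counter


def compare_by_user_agent(server_uas, analytics_uas, tolerance=0.05):
    """Compare page-view counts per user agent for the same hour.

    server_uas / analytics_uas: iterables of user-agent strings, one per page view.
    Returns (user_agent, server_count, analytics_count) tuples for user agents
    whose counts differ by more than `tolerance` (relative to the larger count),
    sorted by absolute difference, largest first.
    """
    server = Counter(server_uas)
    analytics = Counter(analytics_uas)
    flagged = []
    # Look at every user agent seen in either dataset; a missing key counts as 0.
    for ua in server.keys() | analytics.keys():
        s, a = server[ua], analytics[ua]
        if abs(s - a) > tolerance * max(s, a):
            flagged.append((ua, s, a))
    flagged.sort(key=lambda row: abs(row[1] - row[2]), reverse=True)
    return flagged
```

If one user agent dominates the flagged list, as in the case described above, the problem is localised to one device rather than to a systematic counting difference.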
Having discovered that one user agent was the source of the problem, we were able to home in on why this might be. The customer's software allocated a unique ID to each visitor based on cookies (and, as we know, cookies can be an unreliable way of measuring real traffic). Using this unique ID and digging into their data, I saw that almost all of the page views from the iPhone 3.13 user agent were coming from the same person or thing. Whatever was causing these page views was sending the cookies back to the server and generating over 300 page views a minute at times, far more than any human could achieve.
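A rate check like the one that exposed this source can be sketched with a sliding 60-second window per unique ID. The sketch assumes events are available as hypothetical `(unique_id, timestamp_in_seconds)` pairs. The `max_per_minute` default of 100 is an assumed threshold for "more than a human could plausibly generate", not a figure from the investigation.

```python
from collections import defaultdict, deque


def find_high_rate_ids(events, max_per_minute=100):
    """Return {unique_id: peak_views_in_any_60s_window} for IDs over the limit.

    events: iterable of (unique_id, timestamp_in_seconds), in any order.
    """
    by_id = defaultdict(list)
    for uid, ts in events:
        by_id[uid].append(ts)

    flagged = {}
    for uid, times in by_id.items():
        times.sort()
        window = deque()  # timestamps within the 60 s ending at the current view
        peak = 0
        for ts in times:
            window.append(ts)
            # Drop views that are 60 seconds or more older than the current one.
            while window[0] <= ts - 60:
                window.popleft()
            peak = max(peak, len(window))
        if peak > max_per_minute:
            flagged[uid] = peak
    return flagged
```

Using a sliding window rather than fixed clock minutes avoids missing a burst that happens to straddle a minute boundary.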
Although it was hard to pinpoint exactly what this was (a denial of service attack or a leftover botnet being the most likely possibilities), it was clear that whatever was causing the discrepancy was not human. After we removed the page views generated by this non-human source from the customer's data, the two datasets matched within a few percent.
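The final step, removing the non-human source and re-checking the gap, might look like this sketch. It assumes page views are hypothetical `(unique_id, timestamp)` tuples. It also assumes discrepancy is measured relative to the server-log count, which is only one possible definition.

```python
def remove_ids(page_views, excluded_ids):
    """Drop page views whose unique ID (first tuple element) is excluded."""
    return [pv for pv in page_views if pv[0] not in excluded_ids]


def discrepancy_pct(server_count, analytics_count):
    """Percentage difference relative to the server-log count (one possible definition)."""
    if server_count == 0:
        return 0.0
    return abs(server_count - analytics_count) / server_count * 100
```

For example, 900 views from a single ID plus 1,000 genuine views give a 48% gap against an analytics count of 980. Excluding that ID brings the gap down to 2%.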
What this investigation showed is that basing analytics on real users is key to accuracy, and the best way to get a true picture of your mobile site's performance. Bango provides analytics data from real users. We don't include bots and spiders, so our customers can have peace of mind when measuring their mobile sites. We allocate a unique user ID to every mobile visitor, delivering 8x greater accuracy than any other analytics solution.