"Brand-X" Browsers -- User Agent Strings
A Note on User Agent Identifiers and Browser Statistics
Whenever anyone gives statistics purporting to show what percentage of users are using which browsers, the figures (if they're not just somebody's wild guess) are probably derived from analysis of the user agent identifiers of visitors to a Web site. This identifier is sent by the browser in the User-Agent header of each HTTP request, and is a string that usually gives the name and version of the browser being used. Unfortunately, there is no real consistency in the format of this string, which makes analysis very difficult and statistics suspect.
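To see why raw counts mislead, here is a minimal Python sketch (a hypothetical helper, not taken from any particular log-analysis package) that tallies the leading product token of each user agent in Apache "combined"-format log lines, where the user agent is the last double-quoted field. The sample log lines are made up for illustration.

```python
import re
from collections import Counter

# In Apache's "combined" log format, the user agent is the last
# double-quoted field on each line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

def product_token_counts(log_lines):
    """Count the leading product token (text before the first '/') of each UA."""
    counts = Counter()
    for line in log_lines:
        m = UA_FIELD.search(line)
        if m:
            counts[m.group(1).split("/", 1)[0]] += 1
    return counts

# Made-up sample lines for illustration only.
sample = [
    '10.0.0.1 - - [01/Nov/2013:10:00:00 -0500] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '10.0.0.2 - - [01/Nov/2013:10:00:05 -0500] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0"',
    '10.0.0.3 - - [01/Nov/2013:10:00:09 -0500] "GET / HTTP/1.1" 200 512 "-" '
    '"Opera/9.80 (Windows NT 6.1; U; en) Presto/2.2.15 Version/10.00"',
]
print(dict(product_token_counts(sample)))  # → {'Mozilla': 2, 'Opera': 1}
```

An MSIE visitor and a Firefox visitor both land in the "Mozilla" bucket, which is exactly the kind of result that makes naive statistics suspect.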
Netscape (back when it actually existed as a distinct browser) always used "Mozilla" as its name in these strings, but many, if not most, other browsers "lie" and also identify themselves as "Mozilla". This practice got established quite a few years ago, during the 1990s "browser wars", because other browser makers wanted to get past browser-detection scripts on sites that disabled Netscape-specific enhancements whenever any other browser was being used. So they identified themselves as Mozilla/2.0 (compatible; RealBrowserName), even though they weren't always truly compatible with Netscape. One of the browsers doing this was MSIE, which used strings like Mozilla/2.0 (compatible; MSIE 2.0). When MSIE gained enough market share to become the "browser to imitate" for many of the Brand X's, you started seeing strings like Mozilla/3.0 (compatible; MSIE 3.0; RealBrowserName), which is pretending to be MSIE pretending to be Netscape. There was much debate among developers and testers of Mozilla in its early days about what to do with its own user agent string, which starts with "Mozilla/5.0" even though this did not correspond to the actual version number of any Mozilla-based browser until many years later, when Firefox 5.0 came out (rapidly succeeded by 6.0 under their current rapid-release strategy). Some wanted a "clean start" by changing the opening word to something else (even though the old pre-Firefox Mozilla Suite, once the flagship project of the Mozilla organization, was actually the only browser that could honestly call itself "Mozilla"), while others were deathly afraid to make the slightest alteration (even to change the version number with each release, as Netscape always did) lest it discombobulate "browser sniffers" and lock Mozilla users out of sites.
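The layering described above can be unpeeled mechanically. This is a minimal Python sketch, assuming exactly the simplified "Mozilla/x.y (compatible; ...)" format quoted above; `identity_layers` is a hypothetical name, and real-world strings also carry platform tokens (such as "Windows NT 5.1") inside the same parentheses, which this sketch makes no attempt to filter out.

```python
import re

# Matches the simplified "Mozilla/x.y (compatible; ...)" format quoted above.
COMPAT_RE = re.compile(r"^(Mozilla/\d+\.\d+) \(compatible; ([^)]*)\)")

def identity_layers(ua: str) -> list:
    """Split a 'compatible' UA string into the identities it claims, outermost first.

    Platform tokens in the same parentheses are NOT filtered out; this is
    only an illustration of the nesting, not a real browser detector.
    """
    m = COMPAT_RE.match(ua)
    if not m:
        # Not the compatible pattern: just report the leading product token.
        return [ua.split(" ", 1)[0]]
    inner = [tok.strip() for tok in m.group(2).split(";") if tok.strip()]
    return [m.group(1)] + inner

print(identity_layers("Mozilla/2.0 (compatible; MSIE 2.0)"))
# → ['Mozilla/2.0', 'MSIE 2.0']
print(identity_layers("Mozilla/3.0 (compatible; MSIE 3.0; RealBrowserName)"))
# → ['Mozilla/3.0', 'MSIE 3.0', 'RealBrowserName']
```

Each extra layer is one more browser being imitated, which is why "the first name in the string" tells you so little.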
So it seems like we're stuck for the indefinite future with user agent strings that get further and further away from honestly describing the browser name and version they represent, and contain increasing amounts of fossilized deadwood that can't be removed because some site, somewhere, allegedly depends on its presence.
I think browsers that "spoof" others like this are doing the cause of independent browsers a disservice. In the short run, such dodges help users get around clueless browser detection on Web sites, but in the long run they show those same clueless webmasters statistics that confirm their belief that "everybody uses [fill in currently popular browser]", even if a large chunk of those users are really using something else while pretending to use the popular browser. (One site claims that, using a test page that both logged the presence of "MSIE" in user agent strings and used a "conditional-comments" proprietary Microsoftism to load a particular stylesheet only in true MSIE browsers, it found that fully 18% of browsers claiming to be "MSIE" actually were not.) Thus, I have all the browsers I use configured to send a completely honest user agent string wherever this is an available option (e.g., my copy of Opera used an "Opera" string with no mention of Mozilla or MSIE, even before they made this the default), and I wish this were the default for all browsers, with a "spoofing" string, if available at all, offered only as a settable option for the special purpose of visiting a site that otherwise doesn't work.
Speaking of Opera, after a long time of defaulting to a "spoofing" identifier, they finally got honest and started using a logical user agent string beginning with "Opera/x.xx". But after a while of this, they found a new idiocy to perpetrate: when Opera reached version 10.0, becoming the first major browser to get to a double-digit version number, they found that some moronic browser-sniffers couldn't handle such a number and looked at only one digit, reading it as either version 1 or version 0 of Opera and demanding that users upgrade before using their site. So the Opera people had to start lying once again, this time starting their strings with "Opera/9.80" and adding a "Version/10.00" later in the string with the real version. Is this a temporary workaround they'll eventually be able to drop, or are they stuck with it permanently? Will other browsers that reach 10.0 have to do similar things in the future? How many different version numbers will Firefox wind up with? (It already has several: the meaningless "Mozilla/5.0"; a Gecko version number that sits in the "rv:" parameter rather than in the "Gecko/" token you might naively expect to hold it, a token that used to carry the build date until Firefox 5.0 replaced it with yet another unchanging fossilized element, "20100101"; and an actual Firefox version number following "Firefox/". They did, however, trim a good deal of fat out of the Firefox user agent string as of version 5.0, though they still left some historical nonsense for "compatibility" with other browsers. On the other hand, when they finally did reach version 10.0, they got there without any special mucking-around with the user agent string to accommodate double-digit versions.)
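Here is a minimal Python sketch of both sides of the Opera problem, plus the multiple Firefox version fields. All function names are hypothetical. `naive_opera_version` deliberately reproduces the one-digit bug described above; the "hypothetical" Opera string is what Opera 10 would have sent without the workaround, and the other strings are illustrative examples of the formats described in the text.

```python
import re
from typing import Dict, Optional

def naive_opera_version(ua: str) -> Optional[int]:
    """The broken approach described above: read ONE digit after "Opera/"."""
    i = ua.find("Opera/")
    if i == -1:
        return None
    return int(ua[i + len("Opera/")])  # "Opera/10.00" yields 1, not 10

def opera_version(ua: str) -> Optional[str]:
    """Prefer the "Version/" token; otherwise take the full number after "Opera/"."""
    m = re.search(r"Version/(\d+(?:\.\d+)*)", ua) or re.search(r"Opera/(\d+(?:\.\d+)*)", ua)
    return m.group(1) if m else None

def firefox_version_fields(ua: str) -> Dict[str, str]:
    """Collect the separate version-like fields found in a Firefox string."""
    patterns = {
        "mozilla": r"^Mozilla/([\d.]+)",
        "rv": r"rv:([\d.]+)",
        "gecko": r"Gecko/(\d+)",
        "firefox": r"Firefox/([\d.]+)",
    }
    fields = {}
    for name, pattern in patterns.items():
        m = re.search(pattern, ua)
        if m:
            fields[name] = m.group(1)
    return fields

hypothetical = "Opera/10.00 (Windows NT 6.1; U; en) Presto/2.2.15"
actual = "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.2.15 Version/10.00"
print(naive_opera_version(hypothetical))  # → 1
print(opera_version(actual))              # → 10.00
print(firefox_version_fields(
    "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0"))
# → {'mozilla': '5.0', 'rv': '10.0', 'gecko': '20100101', 'firefox': '10.0'}
```

Against the actual "Opera/9.80" string, the naive sniffer sees 9 and is satisfied, which is precisely why Opera froze that number.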
There seems to be no end to the degree of foolishness that gets committed in the name of browser identification.
Google Chrome, for instance, uses
Yet another bit of idiocy is in the platform section of Windows user agent strings, and this one is squarely Microsoft's fault: Windows versions identify themselves as "Windows NT [some version number]" even though NT itself has been obsolete for at least a decade. Newer Windows versions like XP, Vista, and Windows 7 were made to call themselves later versions of NT so that software designed for that ancient Windows variety would keep working, and that has been kept up indefinitely. To reach the pinnacle of ridiculousness, their marketing department picked the name "Windows 7" because that number was what the version number had made its way up to (even though Windows version numbers had been kept pretty well hidden from end users since the really ancient Windows 3.1)... but then, when Win7 actually was released, the M$ techies thumbed their noses at the marketing types by giving that version the internal numbering (visible in user-agent strings and the like) of "Windows NT 6.1", thus making the "7" a misnomer (misnumber?). When they come out with Windows 8, what number will that actually be? (6.2, apparently.)
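Because the marketing name and the NT number diverge, a log analyzer needs a lookup table. This is a minimal Python sketch with a deliberately partial, hypothetical table covering only the versions discussed here (plus Windows 2000); unknown numbers fall through as the raw "Windows NT x.y" text rather than being guessed.

```python
import re
from typing import Optional

# Partial mapping from the NT version in the UA string to the marketing name.
NT_NAMES = {
    "5.0": "Windows 2000",
    "5.1": "Windows XP",
    "6.0": "Windows Vista",
    "6.1": "Windows 7",
    "6.2": "Windows 8",
}

def windows_name(ua: str) -> Optional[str]:
    """Return the marketing name for a "Windows NT x.y" token, if any."""
    m = re.search(r"Windows NT (\d+\.\d+)", ua)
    if not m:
        return None  # not a Windows NT-family UA string
    # Unknown versions are reported as-is rather than guessed at.
    return NT_NAMES.get(m.group(1), "Windows NT " + m.group(1))

print(windows_name("Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0"))
# → Windows 7
```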
All of this makes it very hard to identify what browsers are really being used. To make it even harder, a few browsers actually let the user change the user-agent string, and some users put in "None Of Your Business", a joke name like "Nutscrape", or random garbage characters. For my own Web log analysis, I use a Perl routine I developed that attempts, as best I can, to parse out the true browser being used (modified every time I run into another browser that does it differently), but it's not perfect. So don't put too much trust in anybody's browser usage statistics. (And that isn't even considering the various Web caching systems that make all site hit counts suspect, or the fact that any stats based on hits to inline images like counters or ad banners will exclude text-mode browsers, browsers with image loading turned off, accesses by users with filtering programs that skip loading online ads, etc.)
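The Perl routine mentioned above isn't reproduced here. As a rough idea of one general approach, this hypothetical Python sketch applies ordered pattern rules where the first match wins, so a browser is tested before the browser it imitates (Opera, whose older default strings claimed to be MSIE, before MSIE). The rule list is illustrative and far from complete, and the Opera test string is modeled on that older spoofing format.

```python
import re

# Ordered rules: the first match wins. A browser must be tested before
# the browser it imitates, e.g. Opera (which once spoofed MSIE) before MSIE.
RULES = [
    ("Opera", re.compile(r"Opera")),
    ("MSIE", re.compile(r"MSIE \d")),
    ("Firefox", re.compile(r"Firefox/\d")),
]

def guess_browser(ua: str) -> str:
    """Guess the real browser behind a UA string using the ordered RULES."""
    for name, pattern in RULES:
        if pattern.search(ua):
            return name
    if ua.startswith("Mozilla/"):
        return "unknown (claims Mozilla)"
    return "unknown"  # e.g. a user-edited string

print(guess_browser("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50"))
# → Opera
print(guess_browser("None Of Your Business"))
# → unknown
```

Swapping the first two rules would misreport every spoofing Opera user as MSIE, which is one way the inflated "MSIE" numbers described earlier arise.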
Also note that "user-agents" are not fully synonymous with browsers. Browsers are user agents, but so are some other things, such as indexing robots. So some of the weird names like "Scooter" you might see in your logs are not "brand X" browsers, but indexers from a search engine. Be hospitable to them or you won't get indexed, or you'll get indexed under something inappropriate (try a Google search for "Unsupported Browser" some time, and see how many sites that were rude to the Googlebot got indexed under their "Get a better browser, loser" brushoff page rather than their real content). Unfortunately, spammers also have robots that go through Web sites harvesting e-mail addresses to annoy.
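Separating robots from browsers in a log can start with a simple keyword check. This Python sketch uses a hypothetical keyword list: "Scooter" and "Googlebot" come from the text above, while the generic words are common in robot names but by no means exhaustive. Substring matching can also misfire, and robots that disguise themselves as browsers (as many spam harvesters do) will slip through.

```python
import re

# Hypothetical, incomplete keyword list for spotting robot user agents.
ROBOT_RE = re.compile(r"bot|crawl|spider|Scooter", re.IGNORECASE)

def looks_like_robot(ua: str) -> bool:
    """True if the UA string contains a common robot keyword."""
    return bool(ROBOT_RE.search(ua))

print(looks_like_robot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
# → True
print(looks_like_robot("Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0"))
# → False
```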
Other user agents include programs to download a site for offline browsing or to generate a map or outline of a site. Others are "download managers," such as Go!Zilla and SmartDownload, which take over when the user starts to download an executable file from the Web, managing the download process and giving the ability to resume an aborted download. You may see any of these turn up in your logs along with browsers.
Make your site better by looking at other sites that show, by example, what not to do!
NOTE: The inclusion of a site in my "Hall of Shame" links should not be construed as any sort of personal attack on the site's creator, who may be a really great person, or even an attack on the linked Web site as a whole, which may be a source of really great information and/or entertainment. Rather, it is simply to highlight specific features (intentional or accidental) of the linked sites which cause problems that could have been avoided by better design. If you find one of your sites is linked here, don't get offended; improve your site so that I'll have to take down the link!
(See also somebody else's User-Agent Detection Hall of Shame, which has a similar idea to this, in blog form!)
Well, at least none of the sites below put users in jail for using the "wrong" browser!
This page was first created 24 Sep 1998, and was last modified 01 Nov 2013.