Dan's Web Tips: "Brand-X" Browsers -- User Agent Strings

A Note on User Agent Identifiers and Browser Statistics

Whenever anyone gives statistics purporting to identify what percentage of users are using which browsers, the numbers (if they're not just somebody's wild guess) are probably taken from analysis of the user agent identifiers of visitors to a Web site. This identifier is part of the HTTP protocol, and is a string that usually gives the name and version of the browser being used. Unfortunately, there is no real consistency in the format of this string, which makes analysis very difficult and statistics suspect.

Netscape (back when it actually existed as a distinct browser) always used "Mozilla" as its name in these strings, but many other browsers "lie" and also identify themselves as "Mozilla". This practice got established during the 1990s "browser wars", because the other browser makers wanted to get past the browser detection on sites that disabled Netscape-specific enhancements when any other browser was being used. So they identified themselves as Mozilla/2.0 (compatible; RealBrowserName) -- even though they weren't always truly compatible with Netscape.

One of the browsers doing this was MSIE, which used strings like Mozilla/2.0 (compatible; MSIE 2.0). When MSIE got enough market share to be the "browser to imitate" for many of the Brand X's, you started seeing strings like Mozilla/3.0 (compatible; MSIE 3.0; RealBrowserName), which is pretending to be MSIE pretending to be Netscape.
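The nesting described above can be unpicked mechanically, at least for strings that follow exactly the pattern quoted. The following Python sketch is only an illustration: the function name `innermost_identity` and the "last token wins" rule are assumptions made here, not anything a browser or standard defines. Real strings often append platform tokens after the browser name, so this rule would misfire on them.

```python
import re

# Matches the historical "Mozilla/x.y (compatible; token; token...)" shape.
COMPAT_RE = re.compile(
    r"^Mozilla/(?P<claimed>[\d.]+) \(compatible; (?P<tokens>[^)]*)\)"
)

def innermost_identity(ua: str) -> str:
    """Return the last 'compatible' token, assumed (hypothetically) to be
    the browser that is really behind the layers of pretending."""
    m = COMPAT_RE.match(ua)
    if m is None:
        # Not a "compatible" string: fall back to the first product token.
        return ua.split(" ", 1)[0]
    tokens = [t.strip() for t in m.group("tokens").split(";") if t.strip()]
    # With no tokens at all, report the claimed Mozilla version as-is.
    return tokens[-1] if tokens else "Mozilla/" + m.group("claimed")

for ua in (
    "Mozilla/2.0 (compatible; MSIE 2.0)",
    "Mozilla/3.0 (compatible; MSIE 3.0; RealBrowserName)",
):
    print(ua, "->", innermost_identity(ua))
```

For the two example strings from the text, this reports "MSIE 2.0" and "RealBrowserName" respectively. Anything fancier needs per-browser special cases, which is exactly the problem.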
There was much debate among developers and testers of Mozilla in its early days on what to do about its user agent string (which starts with "Mozilla/5.0" even though this does not correspond to the actual version number of any Mozilla-based browser to date). Some wanted a "clean start" by changing its opening word to something else (even though the old pre-Firefox Mozilla Suite, once the flagship project of the Mozilla organization, was actually the only browser that could honestly call itself "Mozilla"). Others were deathly afraid to make the slightest alteration (even to change the version number with each release as Netscape always did) lest it discombobulate "browser sniffers" and lock Mozilla users out of sites.

So it seems like we're stuck for the indefinite future with user agent strings that get further and further away from honestly describing the browser name and version they represent, and contain increasing amounts of fossilized deadwood that can't be removed because some site, somewhere, allegedly depends on its presence.

I think browsers that "spoof" others like this are doing the cause of independent browsers a disservice. In the short run, such dodges help users get around clueless browser detection in Web sites, but in the long run they cause those same clueless webmasters to see statistics that confirm their belief that "everybody uses [fill in currently popular browser]", even if a large chunk of those users are really using something else but pretending to be using the popular browser. (One site claims that fully 18% of browsers claiming to be "MSIE" actually are not. It used a test page that both logged the presence of "MSIE" in user agent strings and used a "conditional-comments" proprietary Microsoftism to cause a particular stylesheet to load only in true MSIE browsers.)
Thus, I have all the browsers I use configured to use a completely honest user agent string wherever this is an available option (e.g., my copy of Opera used an "Opera" string with no mention of Mozilla or MSIE, even before they made this the default). I wish that this were the default for all browsers, with a "spoofing" string, if available at all, present only as a settable option for the special purpose of going to a site that otherwise doesn't work.

Speaking of Opera, after a long time of defaulting to a "spoofing" identifier, they finally got honest and started using a logical user agent string with "Opera/x.xx". But, after a while of this, they found a new idiocy to contend with: when they reached version 10.0, as the first major browser to get to a double-digit version number, they found that some moronic browser-sniffers couldn't handle such a number and looked at only one digit, reading it as either version 1 or version 0 of Opera and demanding that users upgrade before using their site. So the Opera people had to start lying once again, this time starting their strings with "Opera/9.80" and adding a "Version/10.00" later in the string with the real version.

Is this a temporary workaround they'll eventually be able to drop, or are they stuck permanently this way? Are other browsers that reach 10.0 going to have to do similar things in the future? How many different version numbers will Firefox wind up with? (It's got several already: the meaningless "Mozilla/5.0"; a Gecko version number that sits in the "rv:" parameter rather than in the "Gecko" token where you might naively expect it, since that token holds the build date instead; and an actual Firefox version number that follows "Firefox".)
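To make the Opera story concrete, here is a small Python sketch of both sides: a deliberately broken sniffer that reads a single digit, and a reader that prefers the "Version/" token when it is present. The function names and the sample strings (in particular the parenthesized platform part) are made up for illustration, and the sketch assumes only the "Opera/9.80 ... Version/10.00" layout described above.

```python
import re
from typing import Optional

def naive_major(ua: str) -> int:
    """Deliberately buggy sniffer: reads only ONE digit after 'Opera/'."""
    m = re.search(r"Opera/(\d)", ua)
    return int(m.group(1)) if m else -1

def opera_version(ua: str) -> Optional[str]:
    """Return the real Opera version, preferring a later 'Version/' token."""
    lead = re.match(r"Opera/(\d+\.\d+)", ua)
    if lead is None:
        return None  # string does not start with an honest Opera token
    real = re.search(r"Version/(\d+\.\d+)", ua)
    return real.group(1) if real else lead.group(1)

honest = "Opera/10.00 (Example OS)"                     # hypothetical string
workaround = "Opera/9.80 (Example OS) Version/10.00"    # hypothetical string

print(naive_major(honest))        # the broken sniffer sees "version 1"
print(naive_major(workaround))    # the workaround makes it see 9
print(opera_version(workaround))  # a careful reader recovers 10.00
```

The workaround only helps the broken sniffers. Everyone who parsed the string correctly now has to learn about the second token.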
There seems to be no end to the degree of foolishness that gets committed in the name of browser identification.
Google Chrome, for instance, uses [user agent string missing]

All of this makes it very hard to identify what browsers are really being used. To make it even harder, there are a few browsers that actually let the user change the user-agent string, and some users put in None Of Your Business, a joke name like Nutscrape, or random garbage characters. For the use of my own Web log analysis, I use a Perl routine I developed that attempts, as best I can, to parse out the true browser being used (modified every time I run into another browser that does it differently), but it's not perfect.

So don't put too much trust in anybody's browser usage statistics. (And that isn't even considering various Web caching systems that make all site hit counts suspect, and the fact that any stats based on hits to inline images like counters or ad banners will exclude text-mode browsers, browsers with image loading turned off, and accesses by users with filtering programs that skip loading online ads, etc.)

Try My CGI Browser Detection Now!

Also note that "user-agents" are not fully synonymous with browsers. Browsers are user agents, but so are some other things, such as indexing robots. So some of the weird names like "Scooter" you might see in your logs are not "brand X" browsers, but indexers from a search engine. Be hospitable to them or you won't get indexed, or you'll get indexed under something inappropriate (try a Google search for "Unsupported Browser" some time, and see how many sites that were rude to the Googlebot got indexed under their "Get a better browser, loser" brushoff page rather than their real content). Unfortunately, spammers also have robots that go through Web sites harvesting e-mail addresses to annoy. Other user agents include programs to download a site for offline browsing or to generate a map or outline of a site.
Others are "download managers," such as Go!Zilla and SmartDownload, which take over when the user starts to download an executable file from the Web, managing the download process and giving the ability to resume an aborted download. You may see any of these turn up in your logs along with browsers.

Hall of Shame

Make your site better by looking at other sites that show, by example, what not to do!

NOTE: The inclusion of a site in my "Hall of Shame" links should not be construed as any sort of personal attack on the site's creator, who may be a really great person, or even an attack on the linked Web site as a whole, which may be a source of really great information and/or entertainment. Rather, it is simply to highlight specific features (intentional or accidental) of the linked sites which cause problems that could have been avoided by better design. If you find one of your sites is linked here, don't get offended; improve your site so that I'll have to take down the link!

Well, at least none of the sites below put users in jail for using the "wrong" browser!
This page was first created 24 Sep 1998, and was last modified 10 Jul 2010.