Dan's Web Tips:

Titles, META Tags, LINK tags, and Search Engine Robots

Identify Your Pages!

TIP: Give users of your site, and search-engine robots indexing it, a clear picture of what your site and each individual page in it are about by properly using the TITLE element and other metadata.

Several useful pieces of metadata (data that describes your other data) can be placed within the HEAD section of your HTML documents. The TITLE element gives the page a name. LINK tags let you express some interrelationships among your documents. META tags express some other information about your pages. Here's how to use them!

Example showing syntax:

<HEAD>
  <TITLE>MegaCorp, Inc.</TITLE>
  <META NAME="description" CONTENT="A massive corporation that's trying to totally dominate every conceivable market, including the Internet.">
  <META NAME="keywords" CONTENT="corporation, business, big, powerful, dominant, domination, Internet">
  <LINK REL="StyleSheet" href="css/style.css" type="text/css">
  <LINK REV="Made" HREF="mailto:webmaster@megacorp.example">
</HEAD>

Title Element

The TITLE line in the HEAD of your page is very important, and not just because it's required by the HTML specification (a page won't validate if it's missing). It identifies your page at the top of a user's browser window, in a browser's session history, in users' bookmark lists, and in search engines that index your site automatically. Every page of your site needs a title, and it should identify both your site as a whole and the specific page within it even if read out of context. Thus, the page with the June newsletter for MegaCorp distributors might have <TITLE>MegaCorp Distributor Newsletter: June, 1997</TITLE>. If you just say "June Newsletter", somebody looking at a bookmark list or search engine result won't know what newsletter it is (or what year!), while if you just use the title "MegaCorp, Inc." on all your pages, users can't tell them apart in their history list!

Apparently some search engines weight the TITLE very heavily, so including as many "good keywords" as possible in it can help your search engine ranking. But don't go overboard with this; the TITLE ought to also look decent in bookmark lists and the browser title bar, so don't shove in some godawful mess that crams in every keyword you can think of; that's unaesthetic and may actually be viewed as "spamdexing" and be penalized by search engines. But if you can come up with a reasonably concise, logical title that happens to feature a few good, relevant words, that's nice.

(And please resist the urge to begin your TITLE with "Welcome to..."! That's so ubiquitous on the Web it's almost a cliché, and it adds nothing useful for the purposes of bookmarking and site indexing.)

WARNING: Watch out for silly default values of the TITLE element inserted by various authoring tools. Be sure you don't publish pages with a title like "Untitled Document", "Default Page", "Home Page" (Whose home page?), a raw filename, the name of the authoring tool or the data format it uses (e.g., "Shockwave"), etc. If necessary, use a plain text editor to put reasonable titles in place of the stupid ones your development tools inflict on your site.

One final note on titles: I've run into all sorts of silly and careless titles inflicted on pages out of ignorance or laziness. I never, however, expected that any site would purposely change its titles from meaningful ones that are different on every page, into silly, mindlessly repeated ones that are the same throughout an entire section of the site. But then, when I happened to view the source of a page within the ETrade site, I found this amazing snippet of code, complete with a revealing comment:

<title>E*TRADE Financial - Accounts</title>  

So, apparently, they had their site full of "page specific titles", until somebody named Serena insisted that they be dumbed down. Congratulations, Serena, whoever you are; you're hereby inducted into the Dan's Web Tips Hall of Shame!

META Tags

The META tags let you provide additional information about your page for use by search engines and indexing "robots". The description field should contain a brief description of the page, while the keywords field can have a list of words separated by commas to be used in indexing. Unfortunately, due to extensive abuse by spamdexers, search engines don't make much use of these tags nowadays, which is a shame because they could be very useful if used honestly and logically. Still, it does no harm to use them, and some of the indexers do still make use of them. However, Google is the leading search engine now, and it does not make much use of META elements (though the description element does on occasion get displayed in the search results), so many people these days say it's a waste of time to use them at all. On the other hand, as Lillian Hellman famously said, "I cannot and will not cut my conscience to fit this year's fashion." I have always made, and always will make, my decisions about what to do on my sites in keeping with my own sense of logical structure, rather than pandering to what's supported or encouraged by the trendy browser or search engine of the moment. Occasionally, the real world catches up to me!

As with the TITLE, pick META tags for each page that describe its content. Don't just use the same tags on all pages; that will produce search engine results that consist of a whole bunch of identical entries like:

MegaCorp, Inc.: A massive corporation that's trying to totally dominate every conceivable market, including the Internet.

where they actually go to different pages with specific functions like listing the names of the board of directors, a list of the subsidiaries, and descriptions of each new industry MegaCorp is about to take over and dominate. Use specific titles and descriptions on each page to give them more descriptive entries. It's probably best to use the longest keywords list on the main home page, because that's the page you want to get the best indexing in search engines so users mostly enter there (unless they're searching for something very narrowly-defined that's found on another page). Don't just repeat the same keywords on every page, or users will get lots of pages in your site that aren't even relevant to the specific topic they're looking for, just because you mindlessly replicated a keyword that really just belonged in its own section and in the main home page.
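As a rough sketch of what page-specific metadata might look like, here are HEAD sections for two pages of the fictional MegaCorp site. The page topics come from the paragraph above, but the exact titles, descriptions, and keyword lists are invented for illustration; the only point is that each page describes itself rather than repeating the home page's text.

```html
<!-- Hypothetical page listing the board of directors -->
<HEAD>
<TITLE>MegaCorp, Inc.: Board of Directors</TITLE>
<META NAME="description" CONTENT="Names and brief biographies of the members of MegaCorp's board of directors.">
<META NAME="keywords" CONTENT="MegaCorp, board of directors, executives">
</HEAD>

<!-- Hypothetical page listing the subsidiaries -->
<HEAD>
<TITLE>MegaCorp, Inc.: Subsidiaries</TITLE>
<META NAME="description" CONTENT="A list of the companies owned by MegaCorp, with links to each subsidiary's own site.">
<META NAME="keywords" CONTENT="MegaCorp, subsidiaries, companies">
</HEAD>
```

Each of these would live in its own file; they are shown together here only for comparison.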

TIP: Choose your titles and descriptions carefully for each page of your site... don't just mindlessly copy something repeatedly!

When you're putting separate META descriptions on each page, don't just copy the TITLE of each page onto the META DESCRIPTION line! I've seen people do that, and it's got to be the silliest thing to do with the description field... even sillier than copying the same description on all pages in your site. When a user sees the description in a search engine results page, it will be shown underneath the title, so there's no sense in repeating it. Come up with a different description, preferably in complete (but concise) sentences with correct spelling, grammar, punctuation, and capitalization, to complement your title rather than echoing it. Keep it brief, though, since many search engines will truncate it if it's long.

The Wrong Way:

<TITLE>MegaCorp New Product Releases</TITLE>
<META NAME="description" CONTENT="MegaCorp New Product Releases">

See how silly this looks in a search engine:

MegaCorp New Product Releases: MegaCorp New Product Releases

The Right Way:

<TITLE>MegaCorp New Product Releases</TITLE>
<META NAME="description" CONTENT="Descriptions of the hot new products MegaCorp has released recently, such as the MegaNet Astrogator Web browser.">

This comes out much better:

MegaCorp New Product Releases: Descriptions of the hot new products MegaCorp has released recently, such as the MegaNet Astrogator Web browser.

Don't Wait... Do It Now!

TIP: Put your titles and descriptions in first, when you start work on a site... don't leave them blank until later! You never know when you'll wind up indexed!

One more note on TITLE and META tags: They should be the first things you put into your web page, even if you haven't developed the rest of the content yet! When you put anything up on your web server, even if it just says "Under Construction" as a placeholder for a site that you're planning on putting there eventually, don't put it up there with missing, empty, or incorrect titles and descriptions. Some search engine may come along and index the site, and you'll end up listed in a search engine with the title "Add Title Here Later" and a description that's an empty string, because you put that in as a placeholder, intending to replace it with something more meaningful later. And once the search engine indexes the site, it may not come back and re-scan it for many months. Don't think the search engines won't find your site even if you haven't actually submitted it yourself; if anybody, anywhere puts in a link to your site, the web-crawling robots at various search engines can follow it and get to your page. And if you have your own domain name, the robots might query registries and registrars via the "WhoIs" database, looking for sites to index. So take the time at the start of your development project to put in a title and description that reflect what the site is going to be about (and if you don't know what the site is going to be about, why are you developing it in the first place?). You can refine and edit it later as the development proceeds, but put something meaningful there right away.

If you really are unable to come up with content for the META Description field, just leave out the meta lines altogether rather than putting a blank placeholder META tag as some developers do; an empty description is worse than none at all, since it encourages the search-engine description of your site to be the null string, while an omitted META tag lets the search engine generate its own description from the text content of the page.
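To make the difference concrete, here is a small sketch of the two cases. The title is a made-up example for the fictional MegaCorp site; the first snippet shows the blank placeholder to avoid, and the second simply leaves the description out.

```html
<!-- Avoid: a blank placeholder invites an empty description in search results -->
<HEAD>
<TITLE>MegaCorp Distributor Newsletter</TITLE>
<META NAME="description" CONTENT="">
</HEAD>

<!-- Better, if you have nothing to say yet: omit the description line entirely -->
<HEAD>
<TITLE>MegaCorp Distributor Newsletter</TITLE>
</HEAD>
```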

If you have access to the root directory of the domain name your site is in, you can prevent robots from indexing an under-construction site by using a robots.txt file. If you do this, you don't need to worry about the content of the TITLE and META tags until the site is "live". (See this document on robots exclusion standard or this tutorial for information on the syntax of this file.) Don't forget to remove the robots.txt entry for a site once you're finished with it and want the search engines to index it!

Other META and LINK Tags

The title, description, and keywords aren't the only things you can put in your page headers. There are some more things you can do in META and LINK tags between the <HEAD> and </HEAD> tags of your pages. Here are a few more:

<link rel="canonical" href="http://some-site-url.example/">
The first truly new and different metadata type to be added to the canon of Web document header tags in a long time, this was agreed upon by Google, Yahoo, and Microsoft in 2009. Its purpose is to let you specify what form of the address of a particular page you want to be used, indexed, bookmarked, linked to, and so on. This can be important when a page can be reached by many different addresses, perhaps due to the site being dynamically generated by programs that sprinkle query and session IDs into the URLs which make them different for different users. This tag can specify which of the many possible addresses you actually want to be the "real" address of the page, and search engines that support the tag will thus index only that form rather than diluting your search position with multiple alternative URLs. (This won't help against spamdexers who purposely try to flood search results with multiple copies of the same thing.) Originally, the "canonical" URL had to be in the same domain as the current URL the tag was accessed under, but this requirement has since been loosened so that the tag can be used to point to different domains, which may be handy if you have several domain names pointing at the same content (perhaps an old address you used to advertise the site under, and a new one you're using now). It can also be used to try to get search listings consistent about whether or not to include the "www." at the front of your address, even though the site can be accessed with or without it (assuming you configured the server this way, which is a sensible thing to do).
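For instance, a sketch assuming a hypothetical MegaCorp product page that a dynamic site serves at addresses such as http://www.megacorp.example/products.html?sessionid=12345 (the URL and the "sessionid" parameter name are invented for illustration). Whatever address the page was fetched under, its HEAD would name the one clean form you want indexed:

```html
<HEAD>
<TITLE>MegaCorp New Product Releases</TITLE>
<!-- Same tag on every variant of the page, always pointing at the preferred address -->
<link rel="canonical" href="http://www.megacorp.example/products.html">
</HEAD>
```

The same approach would cover the "www." question: pick one form of the hostname and use it in the canonical href on every page.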

<META name="robots" content="noindex,nofollow">
Another way of telling robots not to index things in your site, though I don't think it's quite as widely supported as the robots.txt standard. If "noindex" is in the content, it's telling robots not to index the current page; if "nofollow" is there, it's telling them not to follow links from it. "noindex,follow" would tell them not to index the current page but to follow links from it; this is a useful value in pages marking obsolete parts of your site and linking people to relevant parts that are still maintained. (Its "cousin", the rel="nofollow" attribute for the A tag, is more frequently used, to prevent search engines from following particular links, and, in particular, to keep such links from giving improved page rank to their destinations and thus supposedly discouraging link-spamming. Some people think this sucks.)
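Here is a minimal sketch of the "noindex,follow" case, assuming a hypothetical obsolete catalog page on the MegaCorp site (the title, wording, and file name are made up). The page asks robots to skip it but still follow its link to the maintained page:

```html
<HTML>
<HEAD>
<TITLE>MegaCorp: Old Product Catalog (No Longer Maintained)</TITLE>
<!-- Don't index this page, but do follow the links on it -->
<META name="robots" content="noindex,follow">
</HEAD>
<BODY>
<P>This catalog is obsolete. Please see the
<A HREF="products.html">current product releases</A> instead.</P>
</BODY>
</HTML>
```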

<META name="resource-type" content="document">
Ummm, I guess this says that this page is a document... instead of what? Actually, I'm not really sure what this means... a hosting provider I used a long time ago suggested I include this line for the benefit of its internal search engine, and I've carried it over on my pages ever since, but it's probably useless. (I've since found out that this tag was originated in site-index.pl, an indexing script that was popular around 1994-95, which also originated the keywords and description META tags. Those two caught on, but the resource-type one didn't. Other than "document", the other possible value was "service", to denote non-static pages such as search engines and guestbooks.)

<META name="distribution" content="global">
Another line I've had on my pages since their first hosting provider... it suggests that you want your pages indexed globally and not just on a local search engine. Probably ignored, though. (This also is from site-index.pl, and can be set to "local" or "global"; a third value of "Internal Use" has been used by some people to denote pages that shouldn't be indexed at all, but was never implemented in any known indexer; you're better off using the robots META tag for that.)

<META name="geo.position" content="26.367559;-80.12172">
<META name="geo.region" content="US-FL">
<META name="geo.placename" content="Boca Raton, FL">
Tags used by the GeoTags site to index pages geographically. (geo.position also works in A2B). The position tag contains the decimal latitude and longitude of your location. There is a proposed Internet standard for such geographical meta tags.

<META name="ICBM" content="26.367559, -80.12172">
<META name="DC.title" content="THE NAME OF YOUR SITE">
Tags used by the GeoURL site, which is similar to the above GeoTags site. However, the GeoTags META tags above will also work. (Both geo.position and ICBM work in A2B.) The DC.title tag is from the Dublin Core, a set of standardized META tags.

<LINK REL="StyleSheet" href="css/style.css" type="text/css">
LINK tags are used to indicate resources related to the page; in this case, you are telling the browser where to find the stylesheet. The href attribute works the same as that in the A tag, and can have any absolute or relative URL. Be sure your server is configured to send the stylesheet with MIME type text/css; while MSIE is notorious for second-guessing MIME types, other browsers can be pickier.

<LINK rev="Made" href="mailto:webmaster@webtips.dan.info">
<LINK rel="Up" href="./">
<LINK rel="Top" href="./">
<LINK rel="Next" href="logical.html">
<LINK rel="Prev" href="subdir.html">
<LINK rel="First" href="intro.html">
<LINK rel="Last" href="misc.html">
These have been suggested in the HTML specs for many years as a method to suggest related pages and provide a contact e-mail address to reach the webmaster, but actual browser support was hard to find -- the Lynx text browser supported them, and a few other obscure browsers, but none of the common ones, until a version of Mozilla added support for LINK tags, creating a navigation bar for them if you enable that feature. However, the later and more popular Firefox chose not to carry over this feature; you need to download an addon to get it.

Links with a rel attribute are suggesting an outbound relation from the current page to the linked one, while ones with a rev attribute suggest an inbound relation from the linked resource to the current page, but this concept can be very confusing to understand -- for instance, the "Prev" link is rel rather than rev even though its effect is to "go back", since it's still a "forward link" to the "previous" page -- a link with rev="Prev" would be saying that this page is the previous page of the linked one, and would hence be equivalent to rel="Next". Clear? Generally, only rel links are used except for the traditional rev="Made" link to the webmaster e-mail address.
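A short sketch may make the rel/rev distinction easier to see. Assume a three-part sequence with the hypothetical file names chapter1.html, chapter2.html, and chapter3.html; these lines would go in the HEAD of chapter2.html:

```html
<!-- The usual way: outbound relations from this page -->
<LINK rel="Prev" href="chapter1.html">
<LINK rel="Next" href="chapter3.html">

<!-- Says "this page is the Prev of chapter3.html", which means the same
     thing as the rel="Next" line above, but is rarely written this way -->
<LINK rev="Prev" href="chapter3.html">
```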

<LINK rel="shortcut icon" href="favicon.ico" type="image/x-icon">
Tells the browser where to find an icon to use in conjunction with the page in bookmark lists, the URL bar, etc. While MSIE automatically looks for "favicon.ico" in the root directory of your site (which some webmasters find annoying if they have no desire to use an icon and don't like getting lots of "404 Not Found" errors in their log), some other browsers, like Mozilla, are more considerate and only look for an icon if it's explicitly linked. The link needn't be to a file named "favicon.ico"; you can call it anything you want and put it anywhere. Mozilla even supports icons of different types (e.g., JPEG graphics), though MSIE only supports the Windows icon format. Mozilla allows the rel attribute to be "icon" (or any phrase containing "icon" as a word), while MSIE looks only for "shortcut icon". Technically, a rel attribute with multiple words is saying that the link is of multiple types, with each word being a different type identifier, but I think the Microsoft programmers were just clueless enough to think that spaces could be used within type names, and that "shortcut icon" is what they think the type is.
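As an illustration of the points above, here are two variations; the file names and the "images" directory are invented for this sketch. The first keeps the Windows icon format but uses a different name and location, and the second uses a JPEG with the plain "icon" type, which per the description above would only be picked up by browsers such as Mozilla:

```html
<!-- Windows icon format, but not named favicon.ico and not in the root directory -->
<LINK rel="shortcut icon" href="/images/megacorp.ico" type="image/x-icon">

<!-- A JPEG icon with rel="icon"; MSIE would ignore this one -->
<LINK rel="icon" href="/images/megacorp-logo.jpg" type="image/jpeg">
```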

In addition to the rel and rev attributes, a LINK tag can have a title attribute to give a description of the link, and an hreflang attribute to indicate, with standard two-letter language codes, what language the destination page is in ("en" for English and "es" for Spanish, for instance).
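A brief sketch of both attributes follows. The title text and the Spanish file name are assumptions made up for the example, and rel="Alternate" is the HTML 4 link type for an alternate version of the same document, which isn't otherwise covered on this page:

```html
<!-- title describes the linked page -->
<LINK rel="Next" href="logical.html" title="Logical structure of Web pages">

<!-- hreflang says the destination is in Spanish -->
<LINK rel="Alternate" href="index.es.html" hreflang="es" title="Spanish version">
```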

<META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
This is a commonly-used (mostly because it's stuck in against the author's will by WYSIWYG editors) META tag to set the character encoding; I discuss this on the "WYSIWYG" page linked above, and also on my page on characters, and prefer not to use this tag myself, as this information is better supplied by real server headers. However, these days even the "standards gods" at W3C advocate using it, so that when your pages are accessed without server headers (e.g., when saved to a hard disk) they still have an indication of what character encoding is used. In general, http-equiv attributes are used to supply the equivalent of server-supplied HTTP headers in the form of HTML tags. Another commonly-used one is the "META Refresh" tag, which I discuss elsewhere.

More Search Engine Notes

Not all search engines use META tags (and in fact, the weighting search engines give to these tags has decreased over the years due to heavy misuse of the tags in attempts to artificially enhance a site's position), so it's still important to have some text content in your page that's relevant to the topic of your site, so that you get indexed with appropriate keywords. This can be a real problem with highly-graphical sites that have little text. Use of ALT text helps in this regard.

If you insist on using frames (blecch!), be sure to have a <NOFRAMES> section with content for non-frame-using browsers; this not only keeps the site from looking totally blank to browsers that don't do frames, it will also get you indexed in search engines, most of which ignore frames too. Be sure to put the TITLE and META tags in the main index file along with your frameset definition. (You'd be surprised how many people put this stuff in all their individual frame files, but leave it out of the main index even though that may be the only file that ever gets scanned by search engines that don't recognize frames!) And don't make the NOFRAMES content consist entirely of a sentence like "Your browser doesn't support frames! Get a better browser, loser!" This is not only rude, but it might just turn up in a search engine as the description of your site! And then there are the sites that include a completely empty NOFRAMES element, except for an empty BODY element within it; these invariably (at least as far as I've seen) set a background color via an attribute in the BODY tag, showing a peculiar preoccupation on the part of the site designer over exactly what color the completely blank page is to be displayed on noframes-enabled browsers, or more likely showing that some HTML-excreting program out there was perversely developed to include this little bit.
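Putting those points together, here is a rough sketch of a frameset index file for the fictional MegaCorp site. The frame file names, column widths, and body text are all invented for illustration; what matters is that the TITLE and META tags sit in this file, and that the NOFRAMES section holds real content and links rather than a complaint about the browser.

```html
<HTML>
<HEAD>
<TITLE>MegaCorp, Inc.</TITLE>
<META NAME="description" CONTENT="A massive corporation that's trying to totally dominate every conceivable market, including the Internet.">
</HEAD>
<FRAMESET COLS="25%,75%">
  <FRAME SRC="menu.html" NAME="menu">
  <FRAME SRC="main.html" NAME="content">
  <NOFRAMES>
  <BODY>
    <H1>MegaCorp, Inc.</H1>
    <P>MegaCorp is a massive corporation with products in every conceivable market.</P>
    <UL>
      <LI><A HREF="products.html">New product releases</A></LI>
      <LI><A HREF="subsidiaries.html">Our subsidiaries</A></LI>
    </UL>
  </BODY>
  </NOFRAMES>
</FRAMESET>
</HTML>
```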

Spamdexing

Some people, unsatisfied with the search engine positions their sites get through honest indexing of their contents, try to artificially enhance their standing through manipulative techniques like excessive keyword repeating in the META tags, extra keywords in an invisible color in the main body, huge keyword lists in the ALT attribute of every image, and a page full of keywords with a "META Refresh" that immediately goes on to the real home page. These sorts of things are collectively referred to as "spamdexing," due to their similarity to "spamming" of e-mail and newsgroups in an attempt to push your message on possibly unwilling recipients. None of these things are a good idea. They not only reduce the quality of the search engines for all other users, but they will probably boomerang against the people who use them as the search engine programmers get smarter and make their search robots screen out such manipulation. Already, many such techniques are detected by the popular search engines, and they may end up not indexing your site at all if they detect that you're trying to artificially enhance your position. So don't try it!

TITLE elements designed to manipulate search engine position might also be less useful for the other purposes of that element, such as identifying a page in bookmarks and the browser history. That seems to be what happened with Epinions, which used to use the author-supplied title of each specific review as the TITLE element for that review, but later changed all of them to a more generic "Compare Prices and Read Reviews on [Product Name] at Epinions.com". Even worse, the Ultimate Band List was at one point redesigned to make all of its pages have the title "Free Music Download, MP3 Music, Music Chat, Music Video, Music CD, ARTISTdirect Network", not giving any indication of what particular page it is, or even what specific site -- it named the parent group of sites (ARTISTdirect) rather than the specific one (UBL) being accessed. This sort of thing most likely was crafted to enhance their site's position under the sorts of searches their advertisers most cared about, at the expense of having titles that aren't particularly informative in distinguishing the different reviews within a particular product. I would stay away from that sort of thing.

There's never any end to the procession of underhanded techniques some people are willing to use in order to get an artificial burst of popularity for their site, probably one that doesn't come close to deserving it. Some really annoying things I've run into lately include "spamming" of online guestbooks using automated scripts posting the same generic praise, accompanied by a link to the author's site, to lots of different guestbooks around the Web; and even "spamming" of Web referrer logs, where webmasters keep track of what sites are linking to them. This latter technique is advertised by Easy Web Promotion -- Sleazy Web Promotion is more like it. It consists of having a script "fake" accesses to various sites with referrer URLs pointing at their client's site; then when the webmasters try the URLs to see who's out there linking to them, they wind up in the sites being advertised. Even better for the advertiser, some "blogs" (short for "web log", a trendy new type of site that consists of the site author's day-by-day commentary) have a public page showing referrer URLs, so this sort of spamming might result in links showing up publicly.

Another thing that's sometimes detected and penalized by search engines is the tactic of serving a different version of a site to search robots than to regular browsers, by checking the user agent string of visitors. This can be detected by a robot visiting the site twice, once with a user agent string indicating a robot, and once with one mimicking a popular browser, and comparing the resulting pages. So if you're tempted to use "browser sniffing" to modify your page depending on what user agent requests it, here's a reason to avoid it, other than the accessibility problems you might cause by trying to second-guess your users and their many possible browser configurations.

Incidentally, without the slightest bit of spamdexing or other sneaky and underhanded tactics, and without contriving my content to the eccentricities of particular search engines or paying any submission service big bucks, I've managed to do very well for myself in the area of search engine indexing. Many of my personal pages are well-placed in search engines under a number of relevant keywords. Two particularly good examples are my fan site about Mexican entertainer Tatiana, which is presently #2 in Google under the query "tatiana" (after a site about an Australian athlete of that name), and my fan site about the singer Tiffany which is #2 in Google under the query "tiffany" (immediately after the site of the famous jeweler Tiffany & Co., which I'm sure spent a heck of a lot more developing and promoting their site than I did mine). (Note: Those were the positions these pages held at the time I wrote this, but they ebb and flow with time, sometimes reaching the #1 spot and sometimes dropping out of the top ten altogether; there's no telling where they might be when you check now!) These are astounding achievements, if I can toot my own horn a little, given the hundreds of thousands of pages that include these names. How did I achieve this placement without spending a cent on it? Actually, darned if I know for sure, but I think having relevant content and structuring it logically without lots of gee-whiz graphical gimmickry getting in the way of the plain text content were what put my sites over the top.

Links

Reference Info:

Search Engine Placement Hints & Tips:

Search Engines & Meta Tags
Search Engine Watch -- some information on how to improve your rankings with search engines.
Spider-Food: info on search engines.
Robots Exclusion Standard to keep web pages from being indexed if you don't want them.
Northern Webs Search Engine Tutorial
The Art of Business Web Site Promotion
Bruce Clay has some online tips on Web site promotion and search engine placement, including an interesting chart of the interrelationships among the different search engines, directories, and paid listing services, and a page on search engine optimization.
Submit Corner -- search engine placement tips and tools.
SearchEngines.com -- various hints about search engines.
Google Ranking Tips
Google's own page of webmaster info
About Search Engine Optimisation
Rank Rage -- on spam and the search engine marketer

Tools

Robot exclusion file validator
Another robots.txt validator
Robots.txt checking tool
TracerLock -- monitor search engines to see new sites that appear under given keywords
Site Analysis -- free automated analyzer that shows if you're making effective use of titles and meta tags for search engine positioning.
Search Engine Optimization Tools
Poodle -- a tool that shows you your site as Google probably sees it.

Commentary

Meta-Crap -- a critique of the concept of "metadata" supplied by Web authors to aid in the indexing of their pages. The author says that this is mostly useless because the Web is full of lying and cheating "marketing types" on one hand, and lazy and stupid people incapable of inserting correct metadata if their life depends on it on the other.
The Google AdWords Happening -- A "Web Artist" makes creative use of Google's advertising feature until they kick him out.
Search engine optimization is the 21st-century version of phrenology
