Titles, META Tags, LINK Tags, and Search Engine Robots
Identify Your Pages!
TIP: Give users of your site, and search-engine robots
indexing it, a clear picture of what your site and each individual page
in it are about by properly using the TITLE element and other metadata.
Several useful pieces of metadata (data that describes your other data)
can be placed within the HEAD section of your HTML documents.
The TITLE element gives the page a name. LINK tags let you
express some interrelationships among your documents. META tags express
some other information about your pages. Here's how to use them!
Example showing syntax:
<HEAD>
<TITLE>MegaCorp, Inc.</TITLE>
<META NAME="description" CONTENT="A massive corporation that's trying to totally dominate
every conceivable market, including the Internet.">
<META NAME="keywords" CONTENT="corporation, business, big, powerful, dominant, domination, Internet">
<LINK REL="StyleSheet" href="css/style.css" type="text/css">
<LINK REV="Made" HREF="mailto:webmaster@megacorp.example">
</HEAD>
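For a concrete picture of how a program reading your page might pull these pieces back out of the HEAD, here's a minimal sketch using Python's standard html.parser module. The class name HeadMetadata is made up for illustration, and real search-engine robots certainly use their own, more elaborate parsers; this just shows that TITLE, META, and LINK are ordinary, machine-readable markup.

```python
from html.parser import HTMLParser


class HeadMetadata(HTMLParser):
    """Collect the TITLE text, META name/content pairs, and LINK relations from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}    # lowercased META name -> content
        self.links = []   # (attribute, relation, href), attribute being "rel" or "rev"
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> and
        # <meta name=...> look the same here; attribute values keep their case.
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and a.get("name"):
            self.meta[a["name"].lower()] = a.get("content") or ""
        elif tag == "link":
            kind = "rev" if "rev" in a else "rel"
            self.links.append((kind, a.get(kind) or "", a.get("href") or ""))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


EXAMPLE_HEAD = """<HEAD>
<TITLE>MegaCorp, Inc.</TITLE>
<META NAME="description" CONTENT="A massive corporation that's trying to totally dominate
every conceivable market, including the Internet.">
<META NAME="keywords" CONTENT="corporation, business, big, powerful, dominant, domination, Internet">
<LINK REL="StyleSheet" href="css/style.css" type="text/css">
<LINK REV="Made" HREF="mailto:webmaster@megacorp.example">
</HEAD>"""

parser = HeadMetadata()
parser.feed(EXAMPLE_HEAD)
parser.close()
```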
Title Element
The TITLE line in the HEAD of your page is very important, and not
just because it's required by the HTML specification (a page won't validate if it's missing). It identifies your page in the top of a
user's browser window, in a browser's session history, in users' bookmark
lists, and in search engines that index your site automatically. Every page of your site needs a title,
and it should identify both your site as a whole and the specific page within it even if read out of
context. Thus, the page with the June newsletter for MegaCorp distributors might have <TITLE>MegaCorp
Distributor Newsletter: June, 1997</TITLE>. If you just say "June Newsletter", somebody looking at a
bookmark list or search engine result won't know what newsletter it is (or what year!), while if you just use the
title "MegaCorp, Inc." on all your pages, users can't tell them apart in their history list!
Apparently some search engines weight the TITLE very heavily, so including
as many "good keywords" as possible in it can help your search engine ranking. But don't
go overboard with this; the TITLE ought to also look decent in bookmark
lists and the browser title bar, so don't shove in some godawful mess that crams in every
keyword you can think of; that's unaesthetic and may actually be viewed as "spamdexing"
and be penalized by search engines. But if you can come up with a reasonably concise, logical
title that happens to feature a few good, relevant words, that's nice.
(And please resist the urge to begin your TITLE with "Welcome to..."!
That's so ubiquitous on the Web it's almost a cliché, and it adds nothing useful
for the purposes of bookmarking and site indexing.)
WARNING: Watch out for silly default values of the TITLE element inserted by various authoring tools. Be sure you don't publish pages with
a title like "Untitled Document", "Default Page", "Home Page" (Whose
home page?), a raw filename, the name of the authoring tool or the data format it uses
(e.g., "Shockwave"), etc. If necessary, use a plain text editor to put reasonable
titles in place of the stupid ones your development tools inflict on your site.
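If you have a lot of pages, a small script can catch these defaults before you publish. Here's a rough Python sketch; the set of placeholder titles and the filename pattern are assumptions based on the examples above, not any official list, so extend them with whatever your own tools emit.

```python
import re

# Placeholder titles mentioned above (assumed list; add your own tools' defaults).
PLACEHOLDER_TITLES = {"untitled document", "default page", "home page", "shockwave", ""}

# A title that is nothing but a filename, such as "index.html" or "page2.htm".
FILENAME_RE = re.compile(r"^[\w.-]+\.(?:s?html?|php|asp)$", re.IGNORECASE)


def title_problems(title):
    """Return a list of reasons a TITLE looks like a tool default or is otherwise unhelpful."""
    t = " ".join(title.split())  # collapse runs of whitespace
    problems = []
    if t.lower() in PLACEHOLDER_TITLES:
        problems.append("placeholder or empty title")
    if FILENAME_RE.match(t):
        problems.append("raw filename used as title")
    if t.lower().startswith("welcome to"):
        problems.append('title begins with "Welcome to"')
    return problems
```

A meaningful title such as "MegaCorp Distributor Newsletter: June, 1997" comes back with an empty list.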
One final note on titles: I've run into all sorts of silly and careless titles inflicted on pages
out of ignorance or laziness. I never, however, expected that any site would purposely
change its titles from meaningful ones that are different on every page, into silly, mindlessly repeated
ones that are the same throughout an entire section of the site. But then, when I happened to view the source
of a page within the ETrade site, I found this amazing snippet of code,
complete with a revealing comment:
<title>E*TRADE Financial - Accounts</title>
<!--
Changed 12/07/01 for Serena, etrade no longer wants
page specific titles
-->
<!--
<TITLE>E*TRADE - Tax Records</TITLE>
-->
So, apparently, they had their site full of "page specific titles", until somebody named Serena insisted that they
be dumbed down. Congratulations, Serena, whoever you are; you're hereby inducted into the Dan's Web Tips Hall of
Shame!
META Tags
The META tags let you provide additional information about your page for use by search engines and
indexing "robots". The description field should contain a brief description of the page, while the
keywords field can have a list of words separated by commas to be used in indexing. Unfortunately, due to
extensive abuse by spamdexers, search engines don't make much use of these tags nowadays,
which is a shame because they could be very useful if used honestly and logically. Still, it does no harm to use them,
and some of the indexers do still make use of them. However, Google is the leading
search engine now, and it does not make much use of META elements (though the description
field does on occasion get displayed in the search results), so many people these days say it's
a waste of time to use them at all. On the other hand, as Lillian Hellman famously said, "I cannot and will not
cut my conscience to fit this year's fashions." I have always made, and will always make, my decisions about what to do on my sites in
keeping with my own sense of logical structure, rather than pandering to what's supported or encouraged by the trendy
browser or search engine of the moment. Occasionally, the real world catches up to me!
As with the TITLE, pick META tags for each page that describe its content. Don't just use the same tags on all
pages; that will produce search engine results that consist of a whole bunch of identical entries like:
- MegaCorp, Inc.
- A massive corporation that's trying to totally dominate
every conceivable market, including the Internet.
where they actually go to different pages with specific functions like listing the names
of the board of directors, a list of the subsidiaries, and descriptions
of each new industry MegaCorp is about to take over and dominate. Use specific titles and
descriptions on each page to give them more descriptive entries. It's probably best to put the longest
keywords list on the main home page, because that's the page you most want search engines to index well,
so that users mostly enter there (unless they're searching for
something very narrowly defined that's found on another page). Don't just
repeat the same keywords on every page, or users will get lots of pages
in your site that aren't even relevant to the specific topic they're
looking for, just because you mindlessly replicated a keyword that really
just belonged in its own section and in the main home page.
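A quick way to spot mindlessly replicated titles or descriptions is to group your pages by those values. This Python sketch assumes you've already gathered each page's title and description into a dictionary (the site inventory here is hypothetical); any value shared by two or more pages gets reported.

```python
from collections import defaultdict


def find_duplicates(pages, field):
    """Group page URLs that share the same value of `field` ("title" or "description").

    `pages` maps each URL to a dict like {"title": ..., "description": ...}.
    Returns {shared value: [urls]} for values used on two or more pages.
    """
    groups = defaultdict(list)
    for url, meta in pages.items():
        # Normalize case and whitespace so trivially different copies still match.
        key = " ".join(meta.get(field, "").split()).lower()
        groups[key].append(url)
    return {value: sorted(urls) for value, urls in groups.items() if len(urls) > 1}


# Hypothetical site inventory; in practice you'd fill this from your own pages.
site = {
    "/board.html": {"title": "MegaCorp, Inc.", "description": "Our board of directors."},
    "/subsidiaries.html": {"title": "MegaCorp, Inc.", "description": "Companies we own."},
    "/news/june.html": {"title": "MegaCorp Distributor Newsletter: June, 1997",
                        "description": "Distributor news for June 1997."},
}
```

Here find_duplicates(site, "title") would point out that the board and subsidiaries pages share one generic title.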
TIP: Choose your titles and descriptions carefully for
each page of your site... don't just mindlessly copy something repeatedly!
When you're putting separate META descriptions on each page,
don't just copy the TITLE of each page onto the META DESCRIPTION
line! I've seen people do that, and it's got to be the silliest thing
to do with the description field... even sillier than copying the same
description on all pages in your site. When a user sees the description
in a search engine results page, it will be shown underneath the title,
so there's no sense in repeating it. Come up with a different description,
preferably in complete (but concise) sentences with correct spelling, grammar,
punctuation, and capitalization, to complement your title
rather than echoing it. Keep it brief, though, since many search engines will
truncate it if it's long.
The Wrong Way:
<TITLE>MegaCorp New Product Releases</TITLE>
<META NAME="description" CONTENT="MegaCorp New Product Releases">
See how silly this looks in a search engine:
- MegaCorp New Product Releases
- MegaCorp New Product Releases
The Right Way:
<TITLE>MegaCorp New Product Releases</TITLE>
<META NAME="description" CONTENT="Descriptions of the hot new products MegaCorp has
released recently, such as the MegaNet Astrogator Web browser.">
This comes out much better:
- MegaCorp New Product Releases
- Descriptions of the hot new products MegaCorp has
released recently, such as the MegaNet Astrogator Web browser.
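The wrong-way and right-way cases above can be checked mechanically. Here's a minimal Python sketch that flags an empty description, one that merely echoes the title, and one that's probably too long; the 160-character cutoff is an arbitrary assumption, since search engines don't agree on a single limit.

```python
def _normalize(text):
    """Collapse whitespace and ignore case, so cosmetic differences don't hide a copy."""
    return " ".join(text.split()).lower()


def description_problems(title, description, max_len=160):
    """Flag a META description that is empty, echoes the TITLE, or is likely to be truncated.

    max_len is an assumed cutoff for illustration, not a published search-engine limit.
    """
    problems = []
    if not _normalize(description):
        problems.append("empty description")
    elif _normalize(description) == _normalize(title):
        problems.append("description just repeats the title")
    if len(" ".join(description.split())) > max_len:
        problems.append(f"longer than {max_len} characters; may be truncated")
    return problems
```

The "Wrong Way" pair above gets flagged as a repeat, while the "Right Way" description passes cleanly.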
Don't Wait... Do It Now!
TIP: Put your titles and descriptions in first, when
you start work on a site... don't leave them blank until later! You
never know when you'll wind up indexed!
One more note on TITLE and META tags: They
should be the first things you put into your web page,
even if you haven't developed the rest of the content yet! When you put
anything up on your web server, even if it just says "Under Construction"
as a placeholder for a site that you're planning on putting there
eventually, don't put it up there with missing, empty,
or incorrect titles and descriptions. Some search engine may come along
and index the site, and you'll end up listed in a search engine with
the title "Add Title Here Later" and a description that's an empty string,
because you put that in as a placeholder, intending to replace it with
something more meaningful later. And once the search engine indexes the
site, it may not come back and re-scan it for many months. Don't think
the search engines won't find your site even if you haven't actually
submitted it yourself; if anybody, anywhere puts in a link to your site,
the web-crawling robots at various search engines can follow it and get
to your page. And if you have your own domain name, the robots might
query registries and registrars via the WHOIS database, looking for sites to index. So take
the time at the start of your development project to put in a title and
description that reflect what the site is going to be about (and if
you don't know what the site is going to be about, why
are you developing it in the first place?). You can refine and edit it
later as the development proceeds, but put something meaningful there
right away.
If you really are unable to come up with content for the META Description
field, just leave out the meta lines altogether
rather than putting a blank placeholder META tag as some
developers do; an empty description is worse than none at all, since it
encourages the search-engine description of your site to be the null string,
while an omitted META tag lets the search engine generate
its own description from the text content of the page.
If you have access to the root directory of the domain name your site is
in, you can prevent robots from indexing an under-construction site by
using a robots.txt file. If you do this, you don't need to
worry about the content of the TITLE and META tags until the site is "live". (See this document
on the robots exclusion standard or this
tutorial for information on the syntax of this file.) Don't forget
to remove the robots.txt entry for a site once you're finished
with it and want the search engines to index it!
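To check what a well-behaved robot would make of your robots.txt, Python's standard urllib.robotparser module can read the rules and answer whether a given URL may be fetched. In this sketch the file contents are inline strings rather than a file fetched from your server, and the two examples (blocking everything while under construction, then allowing everything once live) are illustrations, not the only useful configurations. The helper name allowed is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Asks every robot to stay out of the whole site while it's under construction.
UNDER_CONSTRUCTION = """\
User-agent: *
Disallow: /
"""

# Once the site is live, an empty Disallow line means nothing is off-limits.
LIVE = """\
User-agent: *
Disallow:
"""


def allowed(robots_txt, agent, url):
    """Report whether a robot that honors this robots.txt would fetch `url`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(agent, url)
```

With UNDER_CONSTRUCTION in place, allowed(UNDER_CONSTRUCTION, "Googlebot", ...) is False for every page; swapping in LIVE flips it to True, which is the "don't forget to remove it" step.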
Other META and LINK Tags
The title, description, and keywords aren't the only things you can put in your page headers.
There are some more things you can do in META and LINK tags between the <HEAD>
and </HEAD> tags of your pages. Here are a few more:
<link rel="canonical" href="http://some-site-url.example/">
The first truly new and different metadata type to be added to the canon of Web document header tags in
a long time, this was agreed upon by Google, Yahoo, and Microsoft in 2009. Its purpose is to let you
specify what form of the address of a particular page you want to be used, indexed, bookmarked, linked
to, and so on. This can be important when a page can be reached by many different addresses, perhaps due
to the site being dynamically generated by programs that sprinkle
query and session IDs into the URLs, making them different for different users. This tag can specify
which of the many possible addresses you actually want to be the "real" address of the page, and search
engines that support the tag will thus index only that form rather than diluting your search position with
multiple alternative URLs. (This won't help against spamdexers who purposely try to flood search
results with multiple copies of the same thing.) Originally, the "canonical" URL had to be in the same domain as
the current URL the tag was accessed under, but this requirement has since been loosened so that the tag can be used
to point to different domains, which may be handy if you have several domain names pointing at the same content (perhaps
an old address you used to advertise the site under, and a new one you're using now). It can also be used to try to get
search listings consistent about whether or not to include the "www." at the front of your address, even though the site
can be accessed with or without it (assuming you configured the server this way, which is a sensible thing to do).
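To illustrate the idea, here's a Python sketch that reduces several variants of one page's address to a single form and builds the corresponding LINK tag. The session-parameter names (sessionid, sid), the preferred host, and the choice of https are all hypothetical; your own site generator determines which query parameters are just noise. Every variant of the page would then carry the same canonical tag.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that vary per visitor without changing the page content.
SESSION_PARAMS = {"sessionid", "sid"}


def canonical_url(url, preferred_host, scheme="https"):
    """Map any of a page's equivalent addresses to the one form you want indexed.

    Assumes the site answers at both the bare and "www." host, and that
    `preferred_host` is the one you chose to advertise.
    """
    parts = urlsplit(url)
    # Drop per-visitor parameters but keep ones that select different content.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in SESSION_PARAMS]
    return urlunsplit((scheme, preferred_host, parts.path or "/", urlencode(query), ""))


def canonical_link_tag(url, preferred_host):
    """Build the LINK tag to put in the HEAD of every variant of the page."""
    return f'<link rel="canonical" href="{html.escape(canonical_url(url, preferred_host))}">'
```

Both "http://www.megacorp.example/products?id=7&sessionid=abc123" and "https://megacorp.example/products?sid=xyz&id=7" reduce to "https://megacorp.example/products?id=7". The html.escape call turns any "&" in the remaining query into "&amp;" so the attribute stays valid HTML.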
<META name="robots" content="noindex,nofollow">
Another way of telling robots not to index things in your site, though I don't think it's quite
as widely supported as the robots.txt standard. If "noindex" is in the content, it's telling
robots not to index the current page; if "nofollow" is there, it's telling them not to follow links
from it. "noindex,follow" would tell them not to index the current page but to follow links from it;
this is a useful value in pages marking obsolete parts of your site and linking people to relevant
parts that are still maintained. (Its "cousin", the rel="nofollow" attribute for the A tag, is more frequently used to prevent search engines from following particular links, and especially
to keep such links from giving improved page rank to their destinations and thus supposedly
discouraging link-spamming. Some people think this
sucks.)
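Here's a minimal Python sketch of how a robot might read the content of the robots META tag, following only the two directives described above (noindex and nofollow) and treating anything not forbidden as allowed. It's an illustration under those assumptions, not any search engine's actual code, and the function name is hypothetical.

```python
def robots_directives(content):
    """Return (may_index, may_follow) for a robots META content value such as "noindex,follow".

    Directives are comma-separated and case-insensitive; anything not explicitly
    forbidden is treated as allowed, so an empty value permits both.
    """
    tokens = {token.strip().lower() for token in content.split(",")}
    return "noindex" not in tokens, "nofollow" not in tokens
```

So "noindex,follow", the value suggested above for pages marking obsolete parts of a site, comes out as "don't index this page, but do follow its links."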
<META name="resource-type" content="document">
Ummm, I guess this says that this page is a document... instead of what? Actually,
I'm not really sure what this means... a hosting provider I used a long time ago suggested I include
this line for the benefit of its internal search engine, and I've carried it over on my pages ever since,
but it's probably useless. (I've since found
out that this tag originated in site-index.pl, an indexing script that was popular around
1994-95, which also originated the keywords and description META tags. Those two caught
on, but the resource-type one didn't. Other than "document", the other possible value was
"service", to denote non-static pages such as search engines and guestbooks.)
<META name="distribution" content="global">
Another line I've had on my pages since their first hosting provider... it suggests that you want
your pages indexed globally and not just on a local search engine. Probably ignored, though.
(This also is from site-index.pl, and can be set to "local" or "global"; a third value
of "Internal Use" has been used by some people to denote pages that shouldn't be indexed at all, but
was never implemented in any known indexer; you're better off using the robots META tag for that.)
<META name="geo.position" content="26.367559;-80.12172">
<META name="geo.region" content="US-FL">
<META name="geo.placename" content="Boca Raton, FL">
Tags used by the GeoTags site to index pages geographically.
(geo.position also works in A2B).
The position tag contains the decimal latitude and longitude of your location.
There is a proposed Internet standard
for such geographical meta tags.
<META name="ICBM" content="26.367559, -80.12172">
<META name="DC.title" content="THE NAME OF YOUR SITE">
Tags used by the GeoURL site, which is similar to the
above GeoTags site. However, the GeoTags META tags above will also work. (Both geo.position and ICBM work
in A2B.) The DC.title tag is from the Dublin Core, a set of standardized META tags.
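Note that the two position formats differ: geo.position separates latitude and longitude with a semicolon, while ICBM uses a comma. Here's a small Python sketch that accepts either, assuming plain decimal degrees with north and east positive (so Boca Raton's western longitude is negative). The function name is hypothetical.

```python
import re


def parse_position(content):
    """Parse a geo.position ("lat;lon") or ICBM ("lat, lon") value into (lat, lon) floats.

    Latitude must fall in -90..90 and longitude in -180..180 decimal degrees.
    """
    parts = [p for p in re.split(r"[;,]", content) if p.strip()]
    if len(parts) != 2:
        raise ValueError(f"expected two coordinates, got {content!r}")
    lat, lon = (float(p) for p in parts)  # float() ignores surrounding spaces
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: {content!r}")
    return lat, lon
```

Both example tags above yield the same pair, (26.367559, -80.12172).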
<LINK REL="StyleSheet" href="css/style.css" type="text/css">
LINK tags are used to indicate related resources to the page; in this case, you are
telling the browser where to find the stylesheet. The href attribute works the same
as that in the A tag, and can have any absolute or relative URL. Be sure your server
is configured to send the stylesheet with MIME type text/css; while MSIE is notorious
for second-guessing MIME types, other browsers can be pickier.
<LINK rev="Made" href="mailto:webmaster@webtips.dan.info">
<LINK rel="Up" href="./">
<LINK rel="Top" href="./">
<LINK rel="Next" href="logical.html">
<LINK rel="Prev" href="subdir.html">
<LINK rel="First" href="intro.html">
<LINK rel="Last" href="misc.html">
These have been described in the HTML specs for many years as a way to indicate related pages
and provide a contact e-mail address to reach the webmaster, but actual browser support was hard
to find -- the Lynx text browser supported them, as did a few other obscure browsers, but none of the
common ones did, until a version of Mozilla added support for LINK tags, creating a navigation bar for them if you enable that feature. However, the later and more
popular Firefox chose not to carry over this feature; you need to download an add-on to get it.
Links with a rel attribute are suggesting an outbound relation from the current page to the linked one, while
ones with a rev attribute suggest an inbound relation from the linked resource to the current
page, but this concept can be very confusing to understand -- for instance, the "Prev" link is rel
rather than rev even though its effect is to "go back", since it's still a "forward link" to the
"previous" page -- a link with rev="Prev" would be saying that this page is the previous
page of the linked one, and would hence be equivalent to rel="Next". Clear? Generally, only
rel links are used except for the traditional rev="Made" link to the webmaster e-mail
address.
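To make the rel/rev symmetry concrete, here's a small Python sketch that rewrites a rev link as the equivalent rel link when the inverse is obvious. The INVERSE table covers only the Next/Prev pair and is an assumption for illustration; relations like Made have no rel counterpart, so they're left as rev.

```python
# Inverse pairs among the navigational link types listed above (assumed, not exhaustive).
INVERSE = {"next": "prev", "prev": "next"}


def as_rel(attr, value):
    """Rewrite a LINK relation as a rel value where possible.

    rev="Prev" on this page says *this* page is the previous page of the target,
    which is the same as saying the target is this page's next page: rel="Next".
    Returns (attr, value) lowercased; relations without a known inverse are kept as-is.
    """
    if attr.lower() == "rev":
        inverse = INVERSE.get(value.lower())
        if inverse is not None:
            return "rel", inverse
    return attr.lower(), value.lower()
```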
<LINK rel="shortcut icon" href="favicon.ico" type="image/x-icon">
Tells the browser where to find an icon to use in conjunction with the page in bookmark lists,
the URL bar, etc. While MSIE automatically looks for "favicon.ico" in the root directory of your site
(which some webmasters find annoying if they have no desire to use an icon and don't like getting lots
of "404 Not Found" errors in their log), some other browsers, like Mozilla, are more considerate and only
look for an icon if it's explicitly linked. The link needn't be to a file named "favicon.ico"; you can
call it anything you want and put it anywhere. Mozilla even supports icons of different types (e.g.,
JPEG graphics), though MSIE only supports the Windows icon format. Mozilla allows the rel
attribute to be "icon" (or any phrase containing "icon" as a word), while MSIE looks only for
"shortcut icon". Technically, a rel attribute with multiple words is saying that the link is of
multiple types, with each word being a different type identifier, but I think the Microsoft programmers
were just clueless enough to think that spaces could be used within type names, and that "shortcut
icon" was the name of a single type.
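The token interpretation is easy to show in code. In this Python sketch, rel_types splits the attribute on whitespace the way the spec intends, so "shortcut icon" is two types, one of which is "icon"; msie_style_match models the whole-string comparison the text attributes to MSIE. All three function names are hypothetical.

```python
def rel_types(rel):
    """Split a rel value into its individual link types (space-separated, case-insensitive)."""
    return [token.lower() for token in rel.split()]


def is_icon_link(rel):
    """True if any of the link's types is "icon"; covers rel="icon" and rel="shortcut icon"."""
    return "icon" in rel_types(rel)


def msie_style_match(rel):
    """Model of the whole-string comparison described above: only the exact phrase counts."""
    return rel.strip().lower() == "shortcut icon"
```

Writing rel="shortcut icon" satisfies both checks, which is why that spelling became the usual one.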
In addition to the rel and rev attributes, a LINK tag can have a
title attribute to give a description of the link, and a hreflang attribute to indicate,
with standard two-letter language codes, what language the destination page is in ("en" for English
and "es" for Spanish, for instance).
<META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
This is a commonly used (mostly because it's stuck in against the author's will by WYSIWYG
editors) META tag to set the character encoding; I discuss this on the "WYSIWYG" page
linked above, and also on my page on characters,
and prefer not to use this tag myself, as this information is better supplied by real server headers.
However, these days even the "standards gods" at W3C advocate using
it, so that when your pages are accessed without server headers (e.g., when saved to a hard disk) they still
have an indication of what character encoding is used.
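Whether it arrives in a real HTTP header or in the http-equiv META tag, the content value uses the same Content-Type syntax, so the charset can be pulled out the same way. This Python sketch borrows the standard email.message.Message class, which parses MIME header parameters; that's a convenience choice here, not something browsers use, and the function name is hypothetical.

```python
from email.message import Message


def charset_from_content_type(value, default=None):
    """Return the lowercased charset parameter of a Content-Type value, or `default` if absent."""
    header = Message()
    header["Content-Type"] = value  # e.g. the CONTENT of <META http-equiv="Content-Type" ...>
    return header.get_content_charset(default)
```

For the tag above, charset_from_content_type("text/html; charset=iso-8859-1") gives "iso-8859-1"; a bare "text/html" gives the default.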
In general, http-equiv attributes are used to supply the equivalent of server-supplied HTTP
headers in the form of HTML tags. Another commonly-used one is the "META Refresh" tag, which
I discuss elsewhere.
More Search Engine Notes
Not all search engines use META tags (and in fact, the weighting
search engines give to these tags has decreased over the years due to heavy
misuse of the tags in attempts to artificially enhance a site's position), so it's still important
to have some text content in your page that's relevant to the topic of
your site, so that you get indexed with appropriate keywords. This can
be a real problem with highly-graphical sites that have little text.
Use of ALT text helps in this regard.
If you insist on using frames (blecch!), be sure to have a <NOFRAMES>
section with content for non-frame-using browsers; this not only keeps
the site from looking totally blank to browsers that don't do frames,
it will also get you indexed in search engines, most of which ignore
frames too. Be sure to put the TITLE and META tags in the main index
file along with your frameset definition. (You'd be surprised how many people put this stuff
in all their individual frame files, but leave it out of the main index
even though that may be the only file that ever gets scanned by search
engines that don't recognize frames!) And don't make the NOFRAMES content
consist entirely of a sentence like "Your browser doesn't support frames!
Get a better browser, loser!" This is not only rude, but it might just
turn up in a search engine as the description of your site! And then there are the sites
that include a completely empty NOFRAMES element, except for an empty
BODY element within it; these invariably (at least as far as I've seen) set a
background color via an attribute in the BODY tag, showing a peculiar preoccupation
on the part of the site designer over exactly what color the completely blank page should be
in browsers that don't support frames, or more likely showing that some HTML-excreting program out there was
perversely developed to include this little bit.
Some people, unsatisfied with the search engine positions their sites get
through honest indexing of their contents, try to artificially enhance
their standing through manipulative techniques like excessive keyword
repeating in the META tags, extra keywords in an invisible color in the
main body, huge keyword lists in the ALT attribute of every image,
and a page full of keywords with a "META Refresh" that immediately
goes on to the real home page. These sorts of things are collectively
referred to as "spamdexing," due to their similarity to "spamming" of
e-mail and newsgroups in an attempt to push your message on possibly
unwilling recipients. None of these things are a good idea.
They not only reduce the quality of the search engines for all other users,
but they will probably boomerang against the people who use them as the
search engine programmers get smarter and make their search robots screen
out such manipulation. Already, many such techniques are detected by the
popular search engines, and they may end up not indexing your site at all if
they detect that you're trying to artificially enhance your position.
So don't try it!
TITLE elements designed to manipulate search engine position might also
be less useful for the other purposes of that element, such as identifying a page in
bookmarks and the browser history. That seems to be what happened with Epinions,
which used to use the author-supplied title of each specific review as the page title for that review, but later
changed all of them to a more generic "Compare Prices and Read Reviews on [Product Name] at Epinions.com".
Even worse, the Ultimate Band List was at one point redesigned to make all of its pages have the title
"Free Music Download, MP3 Music, Music Chat, Music Video, Music CD, ARTISTdirect Network",
not giving any indication of what particular page it is, or even what specific site -- it named the parent
group of sites (ARTISTdirect) rather than the specific one (UBL) being accessed. This sort of thing was most
likely crafted to enhance their sites' positions under the sorts of searches their advertisers most cared about,
at the expense of having titles that aren't particularly informative in distinguishing the different reviews of
a particular product. I would stay away from that sort of thing.
There's never any end to the procession of underhanded techniques some people are willing to use in order to
get an artificial burst of popularity for their site, probably one that doesn't come close to deserving it.
Some really annoying things I've run into lately include "spamming" of online guestbooks using automated
scripts posting the same generic praise, accompanied by a link to the author's site, to lots of different
guestbooks around the Web; and even "spamming" of Web referrer logs, where webmasters keep track of what sites
are linking to them. This latter technique is advertised by Easy
Web Promotion -- Sleazy Web Promotion is more like it. It consists of having a script "fake" accesses to various
sites with referrer URLs pointing at their client's site; then when the webmasters try the URLs to see who's out there
linking to them, they wind up in the sites being advertised. Even better for the advertiser, some "blogs" (short for
"web log", a trendy new type of site that consists of the site author's day-by-day commentary) have a public page showing
referrer URLs, so this sort of spamming might result in links showing up publicly.
Another thing that's sometimes detected and penalized by search engines is the tactic of serving a different
version of a site to search robots than to regular browsers, by checking the user agent string of visitors.
This can be detected by a robot visiting the site twice, once with a user agent string indicating a robot, and once
with one mimicking a popular browser, and comparing the resulting pages. So if you're tempted to use "browser sniffing"
to modify your page depending on what user agent requests it, here's a reason to avoid it, besides the accessibility
problems you might cause by trying to second-guess your users and their many possible browser configurations.
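The two-visit comparison described above can be sketched in Python. The fetch function is passed in (so the check can run without a network), the user-agent strings are hypothetical, and the 0.9 similarity threshold from difflib is an arbitrary assumption; real pages often vary a little between visits (dates, ads), which is why an exact-match test would be too strict.

```python
import difflib
from typing import Callable

# Hypothetical user-agent strings; real crawlers and browsers send longer, varying ones.
ROBOT_UA = "ExampleBot/1.0"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def looks_cloaked(fetch: Callable[[str, str], str], url: str, threshold: float = 0.9) -> bool:
    """Fetch `url` as a robot and as a browser, and report whether the two pages differ a lot.

    `fetch(url, user_agent)` must return the page text. `threshold` is an assumed
    cutoff on difflib's similarity ratio (1.0 means identical), not a standard value.
    """
    as_robot = fetch(url, ROBOT_UA)
    as_browser = fetch(url, BROWSER_UA)
    similarity = difflib.SequenceMatcher(None, as_robot, as_browser).ratio()
    return similarity < threshold
```

A site that serves everyone the same page passes; one that hands robots a keyword-stuffed page and browsers something else gets flagged.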
Incidentally, without the slightest bit of spamdexing or other sneaky and
underhanded tactics, and without contriving my content to the eccentricities
of particular search engines or paying any submission service big bucks,
I've managed to do very well for myself in the area of search engine indexing.
Many of my personal pages are well-placed in search engines under a number of
relevant keywords. Two particularly good examples are my
fan site about Mexican entertainer Tatiana, which is presently
#2 in Google
under the query "tatiana" (after a site about an Australian athlete of that name), and my fan
site about the singer Tiffany which is #2 in
Google under the query "tiffany" (immediately after the site of the
famous jeweler Tiffany & Co., which I'm sure spent a heck of a lot more
developing and promoting their site than I did mine). (Note: Those were the
positions these pages held at the time I wrote this, but they ebb and flow with time,
sometimes reaching the #1 spot and sometimes dropping out of the top ten altogether;
there's no telling where they might be when you check now!) These are astounding
achievements, if I can toot my own horn a little, given the hundreds of
thousands of pages that include these names. How did I achieve this
placement without spending a cent on it? Actually, darned if I know for sure,
but I think having relevant content and structuring it logically
without lots of gee-whiz graphical gimmickry getting in the way of the
plain text content were what put my sites over the top.
Reference Info:
Search Engine Placement Hints & Tips:
Tools
Commentary
- Meta-Crap -- a critique of the concept
of "metadata" supplied by Web authors to aid in the indexing of their pages. The author says that
this is mostly useless because the Web is full of lying and cheating "marketing types" on one hand,
and lazy and stupid people incapable of inserting correct metadata if their lives depended on it on
the other.
- The Google AdWords Happening -- A "Web Artist" makes
creative use of Google's advertising feature until they kick him out.
- "Search engine optimization" is the 21st-century version of phrenology
This page was first created 13 Jul 1997, and was last modified 27 May 2018.
Copyright © 1997-2018 by Daniel R. Tobias. All rights reserved.