Dan's Web Tips: Directories and Default Index Files

See also Belarusian, Dutch, Dutch (another), French, German, Indonesian, Norwegian, Portuguese, Russian, and Ukrainian translations (done by others in their own sites)

Structuring Your Site

TIP: Make intelligent use of subdirectories to logically structure your site, make maintenance easier, and give parts of your site memorable URLs.

A Web site needn't be all in one directory. You can use subdirectories (what graphical-environment types tend to call "folders" these days, but we old-time computerists prefer the more technical term) within your site. That's a good way to separate your content in a logical, easily-maintainable way. If you just dump everything in one directory, it will get unwieldy very fast. Subdirectories can be used for the following purposes:
TIP: Once you decide on your directory structure and file names, don't change them unless you have a really good reason!

Decide on the directory structure of your site early, when you first start working on it; it's much easier
to develop and maintain a site starting with a sensible structure than to try to change the structure of a
site after it's already evolved haphazardly. And if you change the file and directory names after the
site's already been up for a while, you'll break any bookmarks, links, and search-engine entries that
have been made to parts of your site other than the main home page. So come up with sensible names
from the start, and try to avoid changing them thereafter unless absolutely necessary.
Even a "trivial" change like changing all your

Note: Both .html and .htm are common extensions for HTML documents. .html is generally regarded as the more "proper" extension, standing for the full document format name "HyperText Markup Language", but .htm came into use early in the history of the Web for the sake of developers using operating systems like MS-DOS or Windows 3.1 that were limited to three-letter extensions. Nowadays, with few people using such operating systems on the Internet, and with modern FTP programs supporting an option to add the extra letter to the end of filenames on upload, there are few good reasons to use the shorter extension, and some people think URLs look "cheesy" with the short extension. Some authoring tools, especially those created by Microsoft, still default to this extension, so lots of sites use it even when the developers' system lacks the limitation that led to it. In fact, one of the common superstitions in file naming is that names should be limited to 8 letters plus a 3 letter extension; this is no longer true for the vast majority of systems in current use, and even systems that are still limited in this manner have no problem browsing Web sites with URL names not abiding by this limitation.

Here's as good a place as any to remind you that, on UNIX servers (which is
what a large portion of Web sites use), filenames are case-sensitive.
A name in uppercase like

Default Index Files

TIP: Use the default index file sensibly to simplify the URL of your site. Do the same for subdirectories, to simplify the URLs of your sub-sites.

Almost all Web servers have a default file, usually index.html, but sometimes default.html, welcome.html, or default.htm, that will be loaded automatically when a directory name is used as the URL. You can take advantage of this to make your URL shorter and more elegant-looking. Many users don't know this and use URLs like:
If Mary named her main page index.html, she'd be able to give her URL as:
Some people get this half right, and give their URL as:
They used the right filename, but didn't realize that they didn't have to actually type that name. The directory name alone suffices, is easier to type, and looks nicer. (See the notes below on linking back to your home page.)

Put a default index file in every directory, even directories that don't actually need one (e.g., your graphics directory). If you don't, a user who enters the directory name as a URL will get a raw directory listing, and you may have files you'd prefer random users not see (like pages that are still under construction). A "dummy" index file prevents such snooping.

Final Slash in Pathnames

TIP: Don't leave out the closing slash of directory-name URLs!

Always include the final slash (/) at the end of a URL that ends in a directory name. If you use:
(without the slash), the browser will first try to retrieve a file rather than a directory, and only when the server realizes that ~msmith is a directory name will it tell the browser to add the slash and try again. This takes one extra communication round between browser and server, slowing down the retrieval. Also, the browser doesn't know in advance that the address without the slash goes to the same page as the one with it, so it won't show the link in the "visited-link" color if the user already went there, and won't take advantage of a previously-cached copy of the page that may exist. Even worse, there are a few old browsers (some versions of Mosaic, for instance) that don't handle this sort of redirection correctly. They may pull up the correct Web page without the slash, but they then fail to handle relative links from the page correctly. A link to stuff.html from the URL http://www.someplace.net/~msmith/ should end up going to http://www.someplace.net/~msmith/stuff.html, but if the slash is omitted and the browser software isn't smart enough to add it once it's redirected by the server, it will think it's really one directory level higher in the tree, and parse the relative URL as http://www.someplace.net/stuff.html. This will then cause a 404 Not Found error, and the user won't know why.
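
The relative-link problem described in the paragraph above can be reproduced with a standard URL resolver. The following is a minimal sketch in Python (not part of the original tips), using the standard library's `urljoin` to stand in for a browser that never learns about the redirect; the variable names are illustrative only.

```python
from urllib.parse import urljoin

# Base URL with and without the closing slash (the example used in the text).
with_slash = "http://www.someplace.net/~msmith/"
without_slash = "http://www.someplace.net/~msmith"

# With the slash, the relative link is resolved inside the ~msmith directory.
resolved_ok = urljoin(with_slash, "stuff.html")

# Without it, "~msmith" is treated as a filename and replaced by the link target.
resolved_bad = urljoin(without_slash, "stuff.html")

print(resolved_ok)
print(resolved_bad)
```

The first result stays inside the ~msmith directory, while the second lands one level higher in the tree, which is the address that produces the 404 error.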
If you're using the

One very prominent site whose creators failed to heed my advice on trailing slashes is the official government posting of the Ken Starr Report on President Clinton's relations with intern Monica Lewinsky. Due to news-media hype, this report (posted to several official government sites on September 11, 1998, and shortly thereafter to various private-sector sites as well) got some of the heaviest Internet traffic ever, causing the servers to be so overloaded in the first few hours the report was up that most people couldn't connect. Unfortunately, the government added to this problem by using versions of the URLs of these sites lacking the trailing slash everywhere they publicized or linked to the sites, thus ensuring that each access of the site would have one more server transaction than would be necessary if the slash had been used. With the high level of traffic the site had at the time, this probably added long delays for many people's accesses.

Another reason to use closing slashes...

When URLs get published in print media such as newspapers, magazines, and newsletters, they often get put in sentences with periods at the end. Some readers (especially those who are novices to the Web and unaware of what characters are usually in URLs in what order) will think the period is part of the URL and type it into their browsers. If the URL ends in a slash, adding a period onto it will be treated by most servers as a reference to the "single-dot" directory entry, which points at the current directory. This will bring up the same page as the user would have received without the extra period (though with a slightly inelegant URL). Without the closing slash, adding a period causes it to be appended to the requested filename, usually producing a 404 Not Found error.

A final note on slashes...

Having said all this, I'd better remind you not to "overcorrect" by
adding slashes to URLs that aren't supposed to have them. If the URL
references a file rather than a directory, there
shouldn't be a slash at the end. So don't type
"

Linking Back Home

TIP: The home page is (usually) named index.html, but don't link to that filename!

When linking back to your main home page from other pages in your site, use
Note: As a general rule, you should be consistent and link to each of your pages with
one single "canonical" URL per page, so that the "visited" link color and browser caches work
properly. My notes on linking to the default index and always using closing slashes in directory
links are two instances of this; other cases include sites that are accessible via multiple
domain or host names:

Also, if you use the same graphic in multiple places, be sure you use the same copy of it, at the same URL, so that browsers can use the previously cached copy of it instead of reloading it each time.

You can use index files in each subdirectory if you have multiple directories, so Mary can do sub-sites on her hobbies of stamp collecting and cats as
In such a structure, the main menus of the "stamps" and "cats" subsites will be the index.html files of these respective directories, and there can be an unlimited number of other files in each of the directories. But don't confuse the structure by putting the main menu elsewhere; I've seen sites that use "stamps.html" in the parent directory as the main menu of the "stamps" subsite, with the remainder of the files in the subdirectory "stamps/". This illogical move separates the subsite menu from its related files, so I don't know what the developer was thinking when he or she did it. If you put the main index of the subsite in the proper directory, but don't name it as the default index, you end up with "redundant" URLs like:
I like to call such URLs "Foo-slash-foo" URLs, since they're of
the form
Probably, the developer just wasn't thinking clearly when planning the file and directory names in such a site. You can do better!

NOTE: I thought when I came up with the above "foocorp" example that this was a contrived, exaggerated URL used for effect, and that I wasn't likely to run into one that bad in the real world... but I found that the mlb.com address of Major League Baseball's site redirects to this atrocity:
When linking to the parent directory, use

One thing to note: If you do the links the way I recommend here, they won't work when you browse through your Web pages on your hard disk, since your hard disk does not have any "default" filename as a directory index. You will see the raw directory when you follow such a link. But are you developing your Web site to look good on your hard disk or on the destination Web server? Unless you're creating a site to distribute on floppy or CD-ROM to run in non-networked environments, the aim of your development is to make the site work well on the server, so you should put up with a little awkwardness when you're testing it on your own machine before uploading it. When you follow a link and a raw directory comes up, that's not an error; just click on "index.html" and keep going, with the awareness that this "problem" will go away once you put the site up on the server where it belongs.

If you do need a version of the site that runs correctly on a hard or floppy disk, there are some programs available to export a Web site to a disk in runnable mode, which automatically change all links to valid filenames rather than directory names. Teleport Pro and WebSnake are two such programs, available through TUCOWS.
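
To make the idea of changing directory links into filenames concrete, here is a rough sketch in Python. It is not how Teleport Pro or WebSnake actually work; the function name `localize_link`, its `index_name` parameter, and the sample links are hypothetical, and it assumes the site's default index file is index.html.

```python
from urllib.parse import urlsplit, urlunsplit


def localize_link(href: str, index_name: str = "index.html") -> str:
    """Rewrite a directory-style relative link to name the index file explicitly."""
    parts = urlsplit(href)
    # Leave links that carry a scheme or host name alone; they point off the disk.
    if parts.scheme or parts.netloc:
        return href
    path = parts.path
    # A bare "." or ".." names a directory even without a closing slash.
    if path.split("/")[-1] in (".", ".."):
        path += "/"
    # A path ending in a slash is a directory link: append the index filename.
    if path.endswith("/"):
        path += index_name
    # Reassemble the link, keeping any query string or fragment as it was.
    return urlunsplit(parts._replace(path=path))


# Hypothetical sample links, as they might appear in a page of the site.
for link in ["stamps/", "../", "stuff.html", "cats/#top"]:
    print(link, "->", localize_link(link))
```

Links that already name a file pass through unchanged, so a pass like this can be run over every link in a local copy while the version on the server keeps the shorter directory-style links.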
This page was first created 13 Jul 1997, and was last modified 28 Dec 2008.