Dan's Web Tips | Change

Dan's Web Tips:

Dealing with Changes


See also Swedish translation (done by others in their own site)

TIP: Put some careful thought into any site redesign in order to minimize or mitigate the problems it will cause to anyone who has linked or bookmarked parts of your present site.

Elsewhere I stated that, unless you have a really good reason, you shouldn't change the file or directory names in your Web site, because that would break any links to its pages. Sometimes, however, change is necessary, and here are some pointers on how to deal with it when it happens.

If you rename pages and directories in your site, or delete a section, there will still be people out there trying to go to the now-invalid URLs. These include people following links on other sites, and people who have bookmarked the pages. If you don't want these people to get "404 Not Found" error messages, you should make sure that the old URLs still give the user something meaningful. One way is to put temporary pages at the old addresses telling people that the page they're looking for has moved, and giving a link to the new address. (However, this is not necessarily the best approach... see below for a more sophisticated technique to handle changed URLs.) The temporary pages can ask visitors to update any links and bookmarks that still point to the old address.

To keep these temporary pages from getting indexed in search engines, use a META tag (in the HEAD section): <META NAME="robots" CONTENT="noindex">. Some search engines ignore these META tags, so also (if you have access to the root directory of the domain the site is in) put a robots.txt file there that lists all of these temporary files. (See Robots Exclusion Standard documentation.)
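A minimal sketch, in Python, of generating such a temporary page together with the matching robots.txt entries. The function names, the wording of the notice, and the example.com addresses are illustrative assumptions, not part of any standard tool.

```python
from html import escape


def moved_page(new_url, title="This page has moved"):
    """Return HTML for a temporary notice page that asks robots not to index it."""
    safe_url = escape(new_url, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f"<title>{escape(title)}</title>\n"
        '<meta name="robots" content="noindex">\n'
        "</head>\n<body>\n"
        f'<p>This page has moved to <a href="{safe_url}">{safe_url}</a>.\n'
        "Please update any links or bookmarks.</p>\n"
        "</body>\n</html>\n"
    )


def robots_txt(old_paths):
    """Return robots.txt text that excludes each temporary page for all robots."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in old_paths]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    print(moved_page("http://www.example.com/new/page.html"))
    print(robots_txt(["/old/page.html", "/old/other.html"]))
```

The robots.txt output only helps if the file sits at the root of the domain, which is why the META tag is still worth including in each page.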

If there's a new page in your site serving the same specific function as one of the old deleted pages, you should make the link at the old page go directly to it, instead of just going to the top home page. If the old page has been removed with no direct replacement, say so and link to the home page or a sub-site with contents related in some way to the deleted page, if any.

Now that you've got all these temporary pages cluttering up your directory structure, you'd probably love to be able to get rid of them eventually to clean up your structure. In order to do this, you'll have to try to get people to stop linking to the old addresses. This can be extremely difficult; there are still people linking to URLs in my sites that haven't been current for years. But you can try to track down such links using some search engines.

Go to the Google or AltaVista search engine, and type as your search string "link:http://www.yoursite.com/oldsite", using the URL of the page or directory you're checking for links to after the "link:". If you're checking for all links to a given directory, leave out the closing slash (in contrast to my urging elsewhere always to use closing slashes), so you catch all links to the given directory whether or not the linker used the slash. (The search engine will find all links that begin with the given text you input.)

This query will show you the sites that link to yours (at least the ones that the search engine knows about), and you can go to each of them in turn and inform the webmaster that the URL has changed. Sometimes it's hard to find out how to contact a site's webmaster; they didn't take my advice about providing contact e-mail links, and you may have to do lots of poking around their site until you find an e-mail address or feedback form. Maybe there isn't one at all; try e-mailing webmaster at the domain of the site, or if the URL includes a tilde (~), try the word following the tilde as the userid (www.isp.com/~mary/ might have an e-mail address mary@isp.com).

Even if you are able to e-mail them, there's no guarantee they'll ever get around to changing your link; it can be like pulling teeth to get some Web site maintainers to update their sites. If the page in question has a line "Last updated December 2, 1995", that's a bad sign.

If you get periodic reports on the traffic to pages of your site, you can keep an eye on how many people are going to the obsolete pages, and if few or none are, you can delete them even if there is still a stray obscure link to them somewhere. You'll probably never get rid of every link and bookmark to old pages, but eventually you've got to clean out the old junk to make your whole site more manageable and logical.
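If your provider gives you raw access logs rather than a finished report, a rough count can be made with a few lines of Python. This sketch assumes the logs are in the Common Log Format that Apache writes by default; the function name and the sample paths are made up for illustration.

```python
import re

# Picks the requested path out of the quoted request in a log line,
# e.g. "GET /old/page.html HTTP/1.0"
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*"')


def count_hits(log_lines, obsolete_paths):
    """Return a dict mapping each obsolete path to the number of requests for it."""
    hits = dict.fromkeys(obsolete_paths, 0)
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(1) in hits:
            hits[match.group(1)] += 1
    return hits


if __name__ == "__main__":
    # "access.log" is a placeholder file name; the paths are hypothetical.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for path, n in count_hits(log, ["/old/a.html", "/old/b.html"]).items():
            print(f"{n:6d}  {path}")
```

Paths that come back with a count of zero over a long enough period are the candidates for deletion.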

By the same token, by checking the traffic counts to graphics in your site (if your access report includes them) you can find graphics that are no longer used (such as images and buttons in pages that were removed or redesigned) and get rid of them, saving disk space. If you don't clean this stuff out, you'll be wasting lots of space (and might be charged extra for it by your provider), but if you delete graphics too soon you might cause broken images in pages that are still using a graphic that you incorrectly thought was no longer used. So check your usage logs and carefully prune your obsolete files. Keep a backup of the old files somewhere in case you need them again.
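One possible way to build a list of candidates for pruning is to compare the image files on disk against the paths that appear in the access log. The sketch below assumes Common Log Format logs and a site whose URL paths mirror its directory layout; the suffix list and function names are illustrative assumptions. It only reports files and never deletes anything.

```python
import re
from pathlib import Path

IMAGE_SUFFIXES = {".gif", ".jpg", ".jpeg", ".png"}
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" \d{3}')


def requested_paths(log_lines):
    """Collect every URL path requested in the log, ignoring query strings."""
    paths = set()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            paths.add(match.group(1).split("?", 1)[0])
    return paths


def unused_images(site_root, log_lines):
    """List image files under site_root that no log line ever requested."""
    root = Path(site_root)
    seen = requested_paths(log_lines)
    unused = []
    for file in sorted(root.rglob("*")):
        if file.is_file() and file.suffix.lower() in IMAGE_SUFFIXES:
            url_path = "/" + file.relative_to(root).as_posix()
            if url_path not in seen:
                unused.append(url_path)
    return unused
```

A file that shows up in this list may still be referenced by a rarely visited page, so treat the result as a starting point for checking rather than a deletion list.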

And don't forget to look through your own site for links to your obsolete pages. It would be pretty embarrassing to be e-mailing lots of other people getting them to change links to a page that you're incorrectly linking to yourself! (There are link checker programs, like Xenu's Link Sleuth, which you can use to look through your site for bad links.)
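For a quick check of a single page without a full link checker, something like the following Python sketch could work. It uses only the standard library; the class and function names and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def stale_links(page_url, html_text, obsolete_urls):
    """Return the obsolete URLs that the given page still links to."""
    collector = LinkCollector()
    collector.feed(html_text)
    obsolete = set(obsolete_urls)
    found = []
    for link in collector.links:
        # Resolve relative links against the page's own URL and drop "#fragment".
        absolute = urldefrag(urljoin(page_url, link)).url
        if absolute in obsolete:
            found.append(absolute)
    return found


if __name__ == "__main__":
    html_text = '<a href="../old/page.html#top">old</a> <a href="new.html">new</a>'
    print(stale_links(
        "http://www.example.com/tips/index.html",
        html_text,
        ["http://www.example.com/old/page.html"],
    ))
```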

If your access reports also show the "404 Not Found" hits to your site, you may also notice that you're still getting hits to pages which you've removed, and you may wish to put temporary pages at those addresses notifying people about what happened to the old pages. You may also notice you are getting hits to pages that never existed; this could indicate that there is a bad link either on another page of your site or on somebody else's site.
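If your logs record the referring page (Apache's "combined" log format does), you can see not only which missing URLs are being requested but also where the bad links live. This is a sketch under that assumption; the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches: "METHOD /path PROTOCOL" status bytes "referer"
COMBINED_RE = re.compile(
    r'"(?:[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)


def not_found_report(log_lines):
    """Map each path that returned 404 to the set of pages that referred to it."""
    report = defaultdict(set)
    for line in log_lines:
        match = COMBINED_RE.search(line)
        if match and match.group("status") == "404":
            referer = match.group("referer")
            if referer in ("", "-"):
                referer = "(no referrer)"
            report[match.group("path")].add(referer)
    return dict(report)
```

A referrer on your own domain points to a bad link you can fix yourself; a referrer elsewhere tells you which webmaster to contact.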

Sidebar: There are a few filenames which you might get "404 Not Found" hits for by no fault of your own, since there are automated programs out there which try to load them whether they exist or not. One is favicon.ico, which MSIE 5.0 (and up) tries to find in every site a user bookmarks (it's the location for a "custom favorites icon" for the site), though ironically microsoft.com lacks one of these. Another is robots.txt, which various robot programs (including search engine indexers) look for at the root of a domain to find out if any parts of the site have been designated as excluded in this file. (See my titles page for more info on this.)

Hopefully, the difficulty of getting other people to update their links to a changed page will convince you of the importance of preserving file and directory names if at all possible. Even a massive overhaul of the graphical appearance and navigational controls of a site can be done without changing the URLs of the site's pages. The problem, in the case of corporate or organizational sites, might be convincing the graphical-design or marketing types assigned the task of a site redesign that they need to pay attention to the pre-existing structure, since their mindset tends to favor building a new site from the ground up without paying such attention.

Redirecting with HTACCESS

The previous section discussed all the problems involved in dealing with obsolete URLs in your site that are still getting linked to, and the mess of "temporary" redirect pages you may never be able to get rid of. Fortunately, there's another, cleaner approach available to most webmasters: you can make the server automatically redirect URLs. The method I describe here works on the Apache server, which is one of the most popular Web servers. There may be similar methods for other server software, but I'm not as familiar with them.

As soon as anybody mentions server redirects, many people object, "But my provider doesn't let me change the server configuration!" It's true that you have far more control if you have access to the server's main configuration file, which you probably don't unless you have the machine to yourself. Still, there are some things you can do using .htaccess files in individual directories, unless the server administrator is particularly paranoid and has blocked users from doing so.

Create a plain text file named .htaccess (yes, that's got a file extension with no name before it; there may be some old operating systems that don't like this), and put lines like this in it:

RedirectPermanent /politics/cyber/domain.html http://domains.dan.info/

This line, in the .htaccess file for dan.info, tells the server that the URL "/politics/cyber/domain.html" (don't include the "http://yourdomain" part, as it's understood) should be redirected to "http://domains.dan.info/". I use lines like this to redirect various obsolete URLs to their new locations. The server sends the redirection address to the browser with a "301 Moved Permanently" header, so search engines pick up on it and change their links to the new address the next time they check the URL.
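If you keep your old-to-new mapping in one place, the .htaccess lines can be generated rather than typed by hand. A minimal Python sketch follows, assuming every old entry is a path beginning with "/" and that neither side contains spaces (which would need quoting); the function name is hypothetical.

```python
def redirect_lines(mapping):
    """Build RedirectPermanent lines from a dict of old path -> new URL."""
    lines = []
    for old_path, new_url in sorted(mapping.items()):
        if not old_path.startswith("/"):
            raise ValueError(f"old path must start with '/': {old_path!r}")
        if " " in old_path or " " in new_url:
            raise ValueError("paths and URLs containing spaces are not handled here")
        lines.append(f"RedirectPermanent {old_path} {new_url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(redirect_lines({
        "/politics/cyber/domain.html": "http://domains.dan.info/",
    }), end="")
```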

The advantages of using .htaccess instead of a redirection page are that you can keep the redirect indefinitely without cluttering up your directory structure with "temporary" stuff, and that users go seamlessly to your new pages without going through an extra page. A disadvantage is that users seldom notice they've been redirected, so they don't change their links and bookmarks, and your obsolete URLs keep getting used indefinitely, although search engines are usually smart enough to notice the redirection and change their listings accordingly. Also, if you have a large number of .htaccess redirections, they can get confusing to keep track of, and you might eventually create a different Web page with the same filename as a former page and wonder why that page won't come up when you type its URL in your browser (it's because you redirected the URL somewhere else, stupid!). Thus, even with this redirect technique available to help you keep URLs working after a site rearrangement, it's still a good idea to think out your naming structure well in the first place in order to minimize the need for change.
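One way to catch that last mistake is to check whether any redirected path now has a real file sitting at it. The Python sketch below only understands simple RedirectPermanent lines and only checks exact file matches, so it is a rough aid rather than a full parser of Apache configuration; the function name is made up for illustration.

```python
from pathlib import Path


def shadowed_files(htaccess_text, doc_root):
    """Return redirected paths for which a file exists under doc_root."""
    root = Path(doc_root)
    shadowed = []
    for raw in htaccess_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        # Apache directive names are case-insensitive.
        if len(parts) >= 3 and parts[0].lower() == "redirectpermanent":
            old_path = parts[1]
            if (root / old_path.lstrip("/")).is_file():
                shadowed.append(old_path)
    return shadowed
```

Any path it reports is a page that exists on disk but that visitors can never reach, because the redirect takes effect first.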

Changing Providers

One special sort of change is a change from one provider to another. If your site has its own domain name, this should be easy; just copy the exact same directory structure, with all of its files, to the new provider, and all the URLs of your site will stay the same. There could be some special cases you'll have to deal with like CGI scripts that don't work the same on different servers, or a different default index file (maybe your new server requires default.html instead of index.html and you have to rename such files; in this case, if you followed my advice and linked to the directory name instead of the index file name, your site will still work identically).

If you don't have your own domain name, things are more difficult, as you've got to get everybody who links to you to change their links to your new address. (If you used relative URLs within your site, your own links should continue to function.) Use the same techniques described above to find such links and get the site maintainers to change them. See if you can get your former provider to leave up a page directing people to your new site, or set up an automatic server redirection to send all accesses of your old URL to your new one. (In this latter case, however, when the users are automatically redirected they might never realize they are supposed to change their links and bookmarks.)

More Site Redesign Notes

When you're redesigning an already-existing Web site, I suggest you do the editing of each page by beginning with the version of that page that already exists in the old site design (if that page exists in the old design). Load that into your editor as the starting point, remove anything from the old design that you don't want to keep, and add the new elements. The alternative style, preferred by many designers, is to start with a "clean slate" and build the new pages from the ground up, but that often results in content and structure getting unintentionally lost. Even if 90% of the old HTML code will be removed or replaced in the redesign, you don't want to lose the other 10%, as may happen if you start anew with a thought in the back of your head that "you'll add the old stuff back later, when finishing up the redesign"; this "later" stage has a way of never happening as you get caught up in the crush of a deadline.

If nothing else, you probably want to keep the page titles and META tags if they're well-developed on the existing site. I've seen plenty of sites that had good titles and descriptions in the HEAD sections of all of their pages, then got redesigned and wound up with very poor titles and no META tags because the designer didn't think of saving those parts from the original structure. Don't let that happen to you!
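Before a redesign starts, it may be worth dumping the title and META tags of each existing page so they can be compared against the new versions. A minimal sketch using Python's standard html.parser module; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadInfo(HTMLParser):
    """Record the page title and any META name/content pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def head_info(html_text):
    """Return (title, meta dict) for one page's HTML source."""
    parser = HeadInfo()
    parser.feed(html_text)
    return parser.title.strip(), parser.meta
```

Running this over the old and new versions of a page and comparing the two results makes a lost description or a weakened title easy to spot.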

As a final note, some studies are showing that users actually prefer sites to evolve gradually rather than be massively redesigned every few months. The conventional wisdom on the Web seemed to be that people would get bored with an "old design" and it was necessary to keep throwing everything out and starting over with "new technology", but, at least in the case of sites that users go to for actual useful information (different rules apply to fun & games sites) the users prefer the navigational structure to stay put; it's very jarring to have it keep changing in unpredictable ways every time you revisit the site.

Users who have come to count on a site having some particular information in a particular location will be annoyed if some of it is removed in a big revamping. So you should try not to remove still-useful content just because the pages it's on don't fit your current snazzy graphical layout and you haven't yet gotten the chance to rework them. Leave the old pages up until their new versions are ready, even if that means that some parts of your site will have a different visual appearance from others.





This page was first created 23 Nov 1997, and was last modified 02 Jun 2013.
Copyright © 1997-2018 by Daniel R. Tobias. All rights reserved.