Honey, what you see is what you get, and if you've found the real thing it's the best thing yet.
--The Dramatics
TIP: If you must use a WYSIWYG editor, at least try to clean up after it.
If you insist on using one of those "WYSIWYG" ("What You See Is What You Get"... though what you see might well not be what users of other systems or browsers might get) editors that shield you from true HTML coding, you should know about some of the junk they tend to insert into your documents. This stuff is very familiar to anyone who's tried to clean up the mess left behind by running a web site through one of those site-mangling editors.
In the main menu of this site, I refer to WYSIWYG editors as "not very well housebroken." This implicit comparison of these programs to dogs is actually not particularly fair to the canine population; in fact, dogs are quite fussy about where they poop, while WYSIWYG editors don't appear to show this degree of discretion.
Here are a few things to look for during such a cleanup job (using a regular text editor, since the WYSIWYG editors can block the user from seeing, and fixing, the HTML code itself).
Watch out for inserted tags like:
<META HTTP-EQUIV="Content-type"
CONTENT="text/html; charset=iso-8859-1">
Some of the WYSIWYG editors stick this at the start of each document. What the above line means is "The character set for this document is ISO-8859-1." Actually, this is normally the default character set anyway, and used to be explicitly designated as such by HTML standards. However, the standards have changed recently (perhaps out of a "politically-correct" desire not to favor one language's characters over another), so there actually is some use to this line, even though some of the W3 Consortium's own documents still claim that ISO-8859-1 is the standard default. Unfortunately, the addition of a "META charset" line causes some browsers to clear the screen after loading the document and re-draw it (presumably in the newly-set character set, same as the old one), producing an annoying flicker effect. (The newest Netscape version apparently finally fixes this.)
If you remove this line, you'll eliminate the flicker effect, but to be standards-compliant you still should specify a character set. The proper place to do it is in the server-generated headers. Depending on your system administrator's settings, you may or may not have the ability to control this; ask your administrator for more details. On some systems, you can put a .htaccess file in each directory to control the content type settings within that directory and its subdirectories. The specific details depend on what server software is in use and what platform it's running on. (Note that even if you don't specify a character set, most browsers will do OK with a document in the standard ISO-8859-1 character set anyway. If you don't use characters above #127, staying within the 7-bit range, an even wider range of browsers won't have problems with it, since that part of the character set is the even older standard US-ASCII and is common to pretty much all character sets now in use, barring EBCDIC and Commodore PET-ASCII.)
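If you want to check which character set a server is actually declaring, look at the Content-Type header it sends. Here is a minimal Python sketch that pulls the charset out of such a header value. The function name declared_charset is my own, and fetching the header from a live server is left out.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the name and returns None if absent.
    return msg.get_content_charset()


print(declared_charset("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(declared_charset("text/html"))                      # None
```

If this comes back as None for your pages, the server isn't declaring a character set, and that's the thing to raise with your administrator.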
You may also want to remove the META tags inserted by almost all editors that advertise which editor was used. Do you really want to announce to the world (or, at least, the HTML buffs who view the code of your pages) that you use one of those goofy editors? (Actually, if you're planning on giving your web pages to another web developer eventually to fix up the mess your editor caused, leaving in this line may be desirable so the other developer knows which particular quirks to look out for!)
But do leave in the META tags giving descriptions and keywords for search engines; see my description elsewhere. Despite the importance of those tags, some editors may make it difficult for you to insert them, while they go and insert all sorts of unnecessary or even destructive tags themselves.
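To see which META tags an editor has left behind, a short script can sort them for you. This is only a rough Python sketch: it keeps the description and keywords tags and lists everything else for you to review by hand. I'm assuming the editor advertises itself with the usual NAME="GENERATOR" convention, and the editor name in the sample is made up.

```python
from html.parser import HTMLParser


class MetaLister(HTMLParser):
    """Collect every META tag and sort it into 'keep' or 'review'."""

    KEEP = {"description", "keywords"}

    def __init__(self):
        super().__init__()
        self.keep = []
        self.review = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # A META tag is labeled by either NAME or HTTP-EQUIV.
        label = (attrs.get("name") or attrs.get("http-equiv") or "").lower()
        bucket = self.keep if label in self.KEEP else self.review
        bucket.append((label, attrs.get("content")))


def sort_meta_tags(html):
    """Return (keep, review) lists of (label, content) pairs."""
    parser = MetaLister()
    parser.feed(html)
    parser.close()
    return parser.keep, parser.review


# Hypothetical sample; "Some Editor 1.0" is not a real product.
sample = """<HEAD>
<META NAME="GENERATOR" CONTENT="Some Editor 1.0">
<META HTTP-EQUIV="Content-type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="description" CONTENT="Tips for web authors">
</HEAD>"""
keep, review = sort_meta_tags(sample)
print(keep)
print(review)
```

The script deliberately deletes nothing. Whether a tag in the review list should go is still your call.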
Some editors will stick in all sorts of bizarre things that have no use, and may even mess up your layouts in some browsers. For instance, Netscape Navigator Gold likes to pepper your document with &nbsp;. That's a "non-breaking space," a special kind of space character that doesn't word-wrap (and, though this isn't part of the official standard, also doesn't generally collapse if more than one is used in a row, so it can be used to add extra horizontal whitespace). Netscape's editor seems to think very often that the user wants to have an extra non-breaking, non-collapsing space, for instance at the end of each text line. Maybe it first converts the carriage returns to spaces in its initial parsing of the document, then decides that these need to be "hard spaces" and so inserts &nbsp;. That's my best guess at the "thought process" it goes through, but the result is to stick those silly, unnecessary characters all over the place. They make the document bigger in disk space and transfer time than it needs to be, make it harder to understand and maintain in a regular text editor with all this extra junk in it, and on occasion, these extra spaces will even find their way to a spot that screws up the layout, like at the beginning of a table row, making it fail to line up with the other rows.
You can go through your document and get rid of those funny space characters, and then deal with the possible ruining of your layout if some of them actually were necessary to make things line up.
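For the common case of &nbsp; stuck on the end of a line, you don't have to do the removal by hand. Here is a minimal Python sketch that strips only those trailing ones and leaves any in the middle of a line alone, since those might be doing real layout work. The function name is my own.

```python
import re

# One or more &nbsp; entities (with any plain spaces or tabs around them)
# sitting at the very end of a line.
TRAILING_NBSP = re.compile(r"[ \t]*(?:&nbsp;[ \t]*)+$", re.MULTILINE)


def strip_trailing_nbsp(html):
    """Remove &nbsp; runs at line ends; return (cleaned_text, lines_changed)."""
    return TRAILING_NBSP.subn("", html)


sample = "First line&nbsp;\nSecond&nbsp;line &nbsp;&nbsp;\nThird\n"
cleaned, changed = strip_trailing_nbsp(sample)
print(changed)  # 2
```

Check the page in a browser afterward anyway, in case one of those trailing spaces was holding something in place.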
Other wasteful things editors will insert include formatting tags applied to empty spaces, like:
<CENTER><DIV ALIGN=CENTER><P
ALIGN=CENTER><FONT FACE="Arial,Helvetica"><FONT
SIZE="+1"><FONT
COLOR="red"><B> </B></FONT></FONT></FONT></P></DIV></CENTER>
This whole big mess of code serves only to insert a blank paragraph for vertical spacing, accomplishable via <P></P>. All the other tags are useless. They're added because the editors are so dumb that if you have stuff like font settings enabled they insist on adding them even to blank spaces. The editors are also pretty dumb about failing to collapse redundant tags. Even if the various font changes above were actually needed to make sure that blank space was rendered correctly, you could have done it with:
<P ALIGN=CENTER><FONT
FACE="Arial,Helvetica" SIZE="+1"
COLOR="red"><B> </B></FONT></P>
Note how the three different centering tags were reduced to an attribute of the single paragraph tag, and the three different font settings were made into attributes of one FONT tag. This produces a shorter, cleaner, more logical piece of code, showing the advantages of coding by hand instead of using some silly editor!
WYSIWYG editors will also stick some weirdo oddball attributes into tags, like NATURALSIZEFLAG in the IMG tag. This is not part of any HTML spec that I am aware of, is not used by any browser that I know of, and I haven't the slightest idea what it means, but some of those editors like to stick it in.
Watch the syntax of the HTML tags generated by your editor. Many of them do improper things like failing to put quotes around attribute values that require them (any that include characters other than letters, digits, hyphens, and periods; for instance, "100%" and "+1"), failing to nest blocks properly (you should use closing tags in the reverse order of the opening tags), etc.
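Missing quotes are easy to overlook by eye, so here is a rough Python sketch that lists the unquoted attribute values that ought to be quoted. It treats letters, digits, hyphens, and periods as safe without quotes, which is my reading of the rule. It finds tags with a simple pattern rather than a real parser, so a ">" inside a quoted value will confuse it.

```python
import re

TAG = re.compile(r"<[A-Za-z][^<>]*>")
# name = "double quoted" | 'single quoted' | unquoted
ATTR = re.compile(r"""([A-Za-z][\w-]*)\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""")
SAFE_UNQUOTED = re.compile(r"[A-Za-z0-9.-]+")


def unquoted_values_needing_quotes(html):
    """List (attribute, value) pairs whose unquoted value should be quoted."""
    problems = []
    for tag in TAG.finditer(html):
        for name, value in ATTR.findall(tag.group()):
            if value[0] in "\"'":
                continue  # already quoted
            if not SAFE_UNQUOTED.fullmatch(value):
                problems.append((name, value))
    return problems


sample = '<TABLE WIDTH=100%><FONT SIZE=+1 COLOR="red"><P ALIGN=CENTER>'
print(unquoted_values_needing_quotes(sample))
```

On the sample line this reports WIDTH=100% and SIZE=+1, and leaves ALIGN=CENTER alone.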
Sometimes some of the editors will insert code that works in one browser but doesn't have the same effect on another. Microsoft FrontPage has the tendency to produce code that's fine for MS Internet Explorer but not so nice in Netscape. One example is how FrontPage will change all centering tags to <P ALIGN=CENTER> even if you originally inserted them as <CENTER>. While, in most cases, using an ALIGN attribute is more logical and elegant than the awkward CENTER tag, there are places where the only reliable way to center something is to use <CENTER>, like for centering tables and form submission buttons. These aren't considered part of a paragraph by Netscape (since a paragraph, by the HTML specs, can contain only character-level content such as text or inline images), so they won't be centered by <P ALIGN=CENTER>. Internet Explorer seems to be rather more liberal in what it'll consider to be part of a paragraph, and FrontPage follows this attitude by using only the paragraph centering attribute even when editing a page that was created using a different tag. The result is a page with some elements not properly centered in Netscape, and every time you try to fix it, FrontPage will "helpfully" keep changing it back.
If you don't watch out, your editor might insert links and graphics in the form of hardcoded URLs of files on your hard disk instead of the proper relative URLs that work on the actual web server. Look for such code as:
<IMG SRC="file://C|/www/sitefiles/mypic.GIF"
WIDTH=200 HEIGHT=300 ALT="Picture of me">
All such URLs starting with "file:" are invalid for use on public web sites, since they point at your local hard drive and nobody else has access to that but you. Editors will tend to stick in such URLs quite often (sometimes using operating-system-specific characters like backslashes instead of the proper forward slashes, making the syntax not even valid).
This is a hard error to track down as long as you're using a WYSIWYG editor to hide your actual HTML code from you, and you're only viewing your site from your own system, on which the hard-drive references work fine. You'll need to view your pages in a normal text editor to see which URLs are screwed up and fix them.
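If you have a lot of pages to check, a script can do the hunting for you. This is a minimal Python sketch, not a complete link checker. It flags any URL attribute that starts with "file:", contains a backslash, or begins with a drive letter. The set of attributes it looks at is my own choice.

```python
import re
from html.parser import HTMLParser

# Attributes that commonly hold URLs (an assumed, incomplete list).
URL_ATTRS = {"src", "href", "background", "action"}
# Something like C:\ or C|/ at the start of the value.
LOCAL_DRIVE = re.compile(r"^[A-Za-z][:|][\\/]")


class LocalUrlFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        line, _ = self.getpos()  # line on which this tag starts
        for name, value in attrs:
            if name not in URL_ATTRS or not value:
                continue
            if (value.lower().startswith("file:")
                    or "\\" in value
                    or LOCAL_DRIVE.match(value)):
                self.hits.append((line, tag, name, value))


def find_local_urls(html):
    """Return (line, tag, attribute, value) for each suspicious URL."""
    finder = LocalUrlFinder()
    finder.feed(html)
    finder.close()
    return finder.hits


sample = ('<IMG SRC="file://C|/www/sitefiles/mypic.GIF"\n'
          'WIDTH=200 HEIGHT=300 ALT="Picture of me">\n'
          '<A HREF="pics/me.html">me</A>\n'
          '<IMG SRC="images\\logo.gif">')
for hit in find_local_urls(sample):
    print(hit)
```

The line numbers it reports tell you where to go in your text editor to fix each URL.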
Elsewhere I give reasons why you should link back to your main home page with <A HREF="./"> instead of explicitly naming your index page (e.g., index.html), and similarly, to link to subdirectory indices with the directory name instead of the explicit index filename. No WYSIWYG editor that I've encountered yet does links in this manner; instead, they all seem to force these links to be in the form of explicit filenames, and may even change your properly-formed directory-name links to a hardcoded "index" reference without you knowing it.
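Putting the directory-style links back is a mechanical job, so here is a small Python sketch of the rewrite for a single link target. The list of index filenames it recognizes is an assumption on my part, since servers can be set up to use other names. It leaves alone any link with a "#fragment" or query string on the end.

```python
import re

# An explicit index filename at the end of a path (assumed names only).
INDEX_FILE = re.compile(r"(^|/)index\.(?:html?|shtml)$", re.IGNORECASE)


def directory_style(href):
    """Rewrite 'dir/index.html' as 'dir/' and a bare 'index.html' as './'."""
    if not INDEX_FILE.search(href):
        return href
    # Keep the slash (if any) that came before the filename.
    trimmed = INDEX_FILE.sub(r"\1", href)
    return trimmed or "./"


print(directory_style("index.html"))       # ./
print(directory_style("tips/index.html"))  # tips/
print(directory_style("tips/wysiwyg.html"))  # tips/wysiwyg.html
```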
By default, many WYSIWYG editors put unhelpful content into the sections of code meant for users in "non-standard" browsing situations. I'm referring to things like the ALT text in images, and the NOFRAMES section of framed sites. These are useful to users of older browsers, text-only browsers, text-readers for blind users, and, don't forget, the search-engine indexing robots. The use of meaningful content in these sections is important to the overall accessibility of your site, but too many users let these things be filled in by their editors in a "default" manner. This results in the NOFRAMES section saying nothing but "This site requires frames and your browser does not support them", and the ALT text of images containing something pointless like the filename and byte size of the image. See my sections on frames and images to find out what you should be putting there.
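The pointless ALT text described above follows patterns that a script can recognize. Here is a rough Python sketch that flags images with no ALT attribute at all, ALT text that just repeats the filename, and ALT text that gives a byte count. The three checks and their wording are my own choices, and a script can't tell you whether the remaining ALT text is any good. An empty ALT="" is let through, since that can be deliberate.

```python
import posixpath
import re
from html.parser import HTMLParser

BYTE_COUNT = re.compile(r"\b\d+\s*bytes\b", re.IGNORECASE)


class AltChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        alt = attrs.get("alt")
        filename = posixpath.basename(src)
        if alt is None:
            self.problems.append((src, "no ALT attribute"))
        elif filename and filename.lower() in alt.lower():
            self.problems.append((src, "ALT repeats the filename"))
        elif BYTE_COUNT.search(alt):
            self.problems.append((src, "ALT gives a byte count"))


def check_alt_text(html):
    """Return (src, complaint) for each image with suspect ALT text."""
    checker = AltChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


sample = ('<IMG SRC="pics/me.gif" ALT="me.gif (4532 bytes)">\n'
          '<IMG SRC="pics/logo.gif">\n'
          '<IMG SRC="pics/dot.gif" ALT="">\n'
          '<IMG SRC="pics/dan.jpg" ALT="Picture of me">')
for problem in check_alt_text(sample):
    print(problem)
```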
Since WYSIWYG editors let you add stuff to your site with a point-and-click interface, they encourage you to pay little attention to your filenames and directory structures. If you don't watch out, you'll wind up with a tangled mess with graphics scattered over lots of different directories, filenames with random mixtures of uppercase and lowercase letters, and other chaotic things you'd probably not do if you did it by hand. (See my comments on directory structures.)
All WYSIWYG editors tend to use lots of tables for page layout. Some, most notably NetObjects Fusion, go further and generate tables from hell, with enormous numbers of rows and columns with specific, finely-set pixel widths and heights in order to (attempt to) achieve a highly precise layout. The result, in addition to a site that is probably highly sensitive to the user's browser version and screen width and unlikely to degrade well on any "nonstandard" platform, is incredibly convoluted HTML code that's almost impossible to maintain in any other manner than in the editor that created it in the first place. So if you expect to ever switch editors or fine-tune your pages by hand, you had better not start designing them in an editor that generates this sort of code.
Even in the less-Byzantine tables generated by a more "run-of-the-PageMill" editor, watch out for bad style such as hardcoded table pixel widths that won't work well when users have screen widths different from your own.
Since WYSIWYG editors create HTML code based on the visual appearance of the document rather than its logical structure (even though HTML is a markup language intended to represent logical structure), they're likely to fail to "get the point" of your document's structure and mark it up logically. You may well end up with your main headers being marked up with font sizes instead of H1 and other header tags, while other parts of your document may have non-header content marked up with H tags because the editor decided that was the best way to do a font effect. Your paragraphs may be broken with <BR><BR> instead of <P>. The result may well look the same as a logically-marked-up document (at least in the browser versions the editor is coding for), but text-mode browsers, blind speech readers, search engine robots, and other programs that try to interpret your logical structure in a manner other than the graphical rendering you're used to, are likely to get lost. (For instance, the content of your <H1> header is often weighted heavily in search-engine indexing, while the same header marked up merely with a font size will probably be treated as much-less-important plain text.)
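A quick count of the telltale signs gives a rough idea of how visually (rather than logically) a page has been marked up. This Python sketch just counts doubled BR tags, FONT tags with a SIZE attribute, and real header tags. The function name and the choice of what to count are mine, and the numbers are a hint about where to look, not a verdict.

```python
import re

DOUBLE_BR = re.compile(r"<BR\s*>\s*<BR\s*>", re.IGNORECASE)
FONT_SIZE = re.compile(r"<FONT\b[^>]*\bSIZE\s*=", re.IGNORECASE)
HEADING = re.compile(r"<H[1-6]\b", re.IGNORECASE)


def markup_report(html):
    """Count visual-markup warning signs and real header tags."""
    return {
        "double_br": len(DOUBLE_BR.findall(html)),
        "font_size": len(FONT_SIZE.findall(html)),
        "headings": len(HEADING.findall(html)),
    }


sample = ('<FONT SIZE="+2"><B>My Page</B></FONT><BR><BR>\n'
          'Some text.<BR>\n<BR>More text.<H3>small print</H3>')
print(markup_report(sample))
```

A page with plenty of sized FONT tags and no headers at all is a good candidate for a closer look.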
So, do you really want to use those editors? They may seem to be saving you some time when you're first starting to set up a site, but in the long run, you'll be spending a lot more time fixing up the problems they cause.
Here are some of the specific things I know about that certain individual editors do:
... ALT="" attribute so they're suppressed in text-mode browsers like Lynx, leading to "[INLINE][INLINE][INLINE]" all over the place.
... <H1> as is logical, but a FONT SIZE tag instead. When you choose a block quote, it gets rendered not by the proper BLOCKQUOTE element, but by a misused UL tag. The creators of NOF go way beyond the call of duty in avoiding any semblance of logicality in their tag usage. (See my discussion of physical and logical tags.)
... SIZE="+1".
... IMG SRC references accordingly. Thus, if the site had the graphics neatly placed in a separate directory to begin with, once this editor got hold of it they'll end up strewn all over the place in all the directories you ever edited HTML files in. You can wind up with lots of copies of the same graphic wasting space on your server. Even CGI-generated graphics like counters tend to get changed into static graphics by this editor.
... CENTER tags to P ALIGN=CENTER whether you like it or not, and even in circumstances where the latter won't work in some browsers.
... </TABLE> tag of a table before the opening <TABLE> tag, so that it was closing a table that hadn't opened yet, and then never actually closing the real table once it started. Netscape can't cope with that sort of bogus code, though Microsoft Internet Explorer is somewhat more tolerant. The author swears that the code is exactly as FrontPage generated it and he didn't mess around with it using any other program, though I have trouble believing that even Microsoft would generate HTML this bad.
... <HEAD> section the following line:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
... <A HREF> tag surrounding it is the logical way to do it, but Publisher would rather create an imagemap, a less-elegant method with reduced browser compatibility.