Cleaning Up After WYSIWYG Editors
Honey, what you see is what you get, and if you've found
the real thing it's the best thing yet.
--The Dramatics
TIP: If you must use a WYSIWYG editor, at least try to
clean up after it.
If you insist on using one of those "WYSIWYG" ("What You See Is What You
Get"... though what you see may well not be what users
of other systems or browsers get) editors that shield you from true
HTML coding, you should know about some of the junk they tend to insert
into your documents. This stuff is very familiar to anyone who's tried
to clean up the mess left behind by running a web site through one of those
site-mangling editors.
In the main menu of this site, I refer to WYSIWYG editors as "not very
well housebroken." This implicit comparison of these programs to dogs is
actually not particularly fair to the canine population; in fact, dogs
are quite fussy about where they poop, while WYSIWYG editors don't appear
to show this degree of discretion.
Here are a few things to look for during such a cleanup job (using a
regular text editor, since the WYSIWYG editors can block the user from
seeing, and fixing, the HTML code itself).
Unnecessary Meta Tags
Watch out for inserted tags like:
<META HTTP-EQUIV="Content-type" CONTENT="text/html; charset=iso-8859-1">
Some of the WYSIWYG editors stick this at the start of each document.
What the above line means is "The character set for this document is ISO-8859-1."
But that's already the default character set by
HTML standards. Thus, there's no need for such a line as this unless you're
using a different character set, such as for documents in languages that
ISO-8859-1 doesn't cover. When
the above line is included, some browsers clear the screen after loading
the document and re-draw it (presumably in the newly-set character set,
same as the old one), producing an annoying flicker effect. Get rid of
this line, and your web pages become less of a user annoyance. (If you
really do need to set a different character set, it would be much better
to do it by configuring the server software to send the proper "charset"
identifier in the document MIME type, but you probably only have access
to that if you're the system administrator.)
You may also want to remove the META tags inserted by almost all editors
that advertise which editor was used. Do you really want to announce to
the world (or, at least, the HTML buffs who view the code of your pages)
that you use one of those goofy editors? (Actually, if you're planning
on giving your web pages to another web developer eventually to fix up
the mess your editor caused, leaving in this line may be desirable so the
other developer knows which particular quirks
to look out for!)
But do leave in the META tags giving descriptions and keywords for search
engines; see my description elsewhere. Despite
the importance of those tags, some editors may make it difficult for you
to insert them, while they go and insert all sorts of unnecessary or even
destructive tags themselves.
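If you have a lot of pages to clean, this part of the job can be scripted. Here is a minimal Python sketch, not a definitive tool: it assumes the editor's advertisement is a META tag named "GENERATOR" (a common convention, but check what your own editor writes), and the function names are made up for illustration. It drops the default-charset tag and the advertisement while leaving description and keywords tags alone.

```python
import re

# A whole META tag, plus the line break after it so no blank line is left behind.
META_TAG = re.compile(r"<meta\b[^>]*>[ \t]*\n?", re.IGNORECASE)


def is_redundant_meta(tag: str) -> bool:
    """True for the default-charset declaration and for editor advertisements."""
    lowered = tag.lower()
    # Only the default ISO-8859-1 declaration is redundant; other charsets are kept.
    default_charset = (
        "http-equiv" in lowered
        and re.search(r"charset\s*=\s*iso-8859-1\b", lowered) is not None
    )
    # Assumption: editors advertise themselves with NAME="GENERATOR".
    editor_ad = re.search(r"""\bname\s*=\s*["']?generator\b""", lowered) is not None
    return default_charset or editor_ad


def strip_redundant_meta(html: str) -> str:
    """Remove redundant META tags; description and keywords tags survive."""
    return META_TAG.sub(
        lambda m: "" if is_redundant_meta(m.group(0)) else m.group(0), html
    )
```

Run it on a copy of your page and compare the result with the original before trusting it.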
Useless Inserted Characters
Some editors will stick in all sorts of bizarre things that have no use,
and may even mess up your layouts in some browsers. For instance, Netscape
Navigator Gold likes to pepper your document with &nbsp;.
That's a "non-breaking space," a special kind of space character that doesn't
word-wrap (and, though this isn't part of the official standard, also doesn't
generally collapse if more than one is used in a row, so it can be used
to add extra horizontal whitespace). Netscape's editor seems to think very
often that the user wants to have an extra non-breaking, non-collapsing
space, for instance at the end of each text line. Maybe it first converts
the carriage returns to spaces in its initial parsing of the document,
then decides that these need to be "hard spaces" and so inserts &nbsp;.
That's the best I can figure out the possible "thought process" it goes
through, but the result is to stick those silly, unnecessary characters
all over the place. They make the document bigger in disk space and transfer
time than it needs to be, make it harder to understand and maintain in
a regular text editor with all this extra junk in it, and on occasion,
these extra spaces will even find their way to a spot that screws up the
layout, like at the beginning of a table row making it fail to line up
with the other rows.
You can go through your document and get rid of those funny space characters,
and then deal with the possible ruining of your layout if some of
them actually were necessary to make things line up.
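The end-of-line case is the safest one to automate, since a non-breaking space right before a line break is rarely doing any layout work. A small Python sketch under that assumption (the function names are hypothetical) strips only those, and counts what's left so you can check the rest by hand:

```python
import re

# One or more &nbsp; entities, with any ordinary spaces or tabs mixed in,
# sitting at the very end of a line.
TRAILING_NBSP = re.compile(r"(?:[ \t]*&nbsp;)+[ \t]*$", re.MULTILINE)


def strip_trailing_nbsp(html: str) -> str:
    """Remove non-breaking spaces (and plain whitespace) at the ends of lines."""
    return TRAILING_NBSP.sub("", html)


def count_nbsp(html: str) -> int:
    """Count the &nbsp; entities still in the document, for manual review."""
    return html.count("&nbsp;")
```

The ones in the middle of a line, or alone inside a table cell, are left untouched, because those are the ones that may actually be holding your layout together.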
Useless Inserted Tags
Other wasteful things editors will insert include formatting tags applied
to empty spaces, like:
<CENTER><DIV ALIGN=CENTER><P ALIGN=CENTER><FONT
FACE="Arial,Helvetica"><FONT SIZE="+1"><FONT COLOR="red"><B> </B></FONT></FONT></FONT></P></DIV></CENTER>
This whole big mess of code serves only to insert a blank paragraph
for vertical spacing, accomplishable via <P>&nbsp;</P>.
All the other tags are useless. They're added because the editors are so
dumb that if you have stuff like font settings enabled they insist on adding
them even to blank spaces. The editors are also pretty dumb about failing
to collapse redundant tags. Even if the various font changes above were
actually needed to make sure that blank space was rendered correctly,
you could have done it with:
<P ALIGN=CENTER><FONT FACE="Arial,Helvetica" SIZE="+1"
COLOR="red"><B> </B></FONT></P>
Note how the three different centering tags were reduced to an attribute
of the single paragraph tag, and the three different font settings were
made into attributes of one FONT tag. This produces a shorter, cleaner,
more logical piece of code, showing the advantages of coding by hand instead
of using some silly editor!
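Formatting wrapped around nothing but blank space can also be hunted down mechanically. The following Python sketch is one way to do it, not the only one: it repeatedly unwraps FONT, B, I, U and TT elements whose entire content is whitespace or &nbsp;. The list of tags and the function name are my own choices for illustration.

```python
import re

# An inline formatting element whose whole content is whitespace and/or &nbsp;.
EMPTY_INLINE = re.compile(
    r"<(font|b|i|u|tt)\b[^>]*>((?:\s|&nbsp;)*)</\1\s*>", re.IGNORECASE
)


def unwrap_empty_formatting(html: str) -> str:
    """Peel useless formatting tags off blank content, innermost first."""
    previous = None
    while previous != html:
        previous = html
        # Keep the blank content itself, drop the tags around it.
        html = EMPTY_INLINE.sub(r"\2", html)
    return html
```

It leaves block-level tags such as P, DIV and CENTER alone, so redundant centering still has to be tidied by hand.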
WYSIWYG editors will also stick some weirdo oddball attributes into
tags, like NATURALSIZEFLAG in the IMG tag.
This is not part of any HTML spec that I am aware of, is not used by any
browser that I know of, and I haven't the slightest idea what it means,
but some of those editors like to stick it in.
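Stripping an attribute like that is a one-line regular expression. This Python sketch assumes the attribute only ever appears inside tags (it would also match the same text in the body of a page, which is unlikely but possible), and the function name is hypothetical.

```python
import re

# The attribute with a double-quoted, single-quoted, or unquoted value,
# together with the whitespace in front of it.
NATURALSIZEFLAG = re.compile(
    r"""\s+NATURALSIZEFLAG\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE
)


def strip_naturalsizeflag(html: str) -> str:
    """Remove every NATURALSIZEFLAG attribute, leaving the rest of the tag alone."""
    return NATURALSIZEFLAG.sub("", html)
```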
Invalid Syntax
Watch the syntax of the HTML tags generated by your editor. Many of them
do improper things like failing to put quotes around attribute values that
require them (any that include characters other than letters, digits,
hyphens, and periods; for instance, "100%" and "+1"), failing to properly nest blocks
(you should use closing tags in the reverse order of the opening tags),
etc.
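The missing-quotes problem is easy to search for. Here is a rough Python sketch that lists attribute values left unquoted even though they contain something other than letters, digits, hyphens, and periods. It is a regular-expression scan, not a real HTML parser, so it assumes no ">" characters hide inside quoted values; the names are invented for this example.

```python
import re

TAG = re.compile(r"<[A-Za-z][^>]*>")
# name=value, where the value is double-quoted, single-quoted, or bare.
ATTRIBUTE = re.compile(r"""([A-Za-z][\w-]*)\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""")
# Characters that may appear in a value without quotes.
SAFE_UNQUOTED = re.compile(r"[A-Za-z0-9.-]+")


def unquoted_values_needing_quotes(html: str) -> list:
    """Return (ATTRIBUTE, value) pairs for bare values that should be quoted."""
    problems = []
    for tag in TAG.finditer(html):
        for name, value in ATTRIBUTE.findall(tag.group(0)):
            if value[0] in "\"'":
                continue  # already quoted
            if not SAFE_UNQUOTED.fullmatch(value):
                problems.append((name.upper(), value))
    return problems
```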
Browser-Inconsistent Code
Sometimes some of the editors will insert code that works in one browser
but doesn't have the same effect on another. Microsoft FrontPage has the
tendency to produce code that's fine for MS Internet Explorer but not so
nice in Netscape. One example is how FrontPage will change all centering
commands to <P ALIGN=CENTER> even if you originally
inserted them as <CENTER>. While, in most cases, using
an ALIGN attribute is more logical and elegant than the awkward CENTER
tag, there are places where the only reliable way to center something is
to use <CENTER>, like for centering tables and form
submission buttons. These aren't considered part of a paragraph by Netscape
(since a paragraph, by the HTML specs, can contain only character-level
content such as text or inline images), so they won't be centered by <P
ALIGN=CENTER>. Internet Explorer seems to be rather more liberal
in what it'll consider to be part of a paragraph, and FrontPage follows
this attitude by using only the paragraph centering attribute even when
editing a page that was created using a different tag. The result is a
page with some elements not properly centered in Netscape, and every time
you try to fix it, FrontPage will "helpfully" keep changing it back.
Invalid URLs
If you don't watch out, your editor might insert links and graphics in
the form of hardcoded URLs of files on your hard disk instead of the proper
relative URLs that work on the actual web server. Look for such code as:
<IMG SRC="file://C|/www/sitefiles/mypic.GIF" WIDTH=200
HEIGHT=300 ALT="Picture of me">
All such URLs starting with "file:" are invalid for use on public web
sites, since they point at your local hard drive and nobody else has access
to that but you. Editors will tend to stick in such URLs quite often (sometimes
using operating-system-specific characters like backslashes instead of
the proper forward slashes, making the syntax not even valid).
This is a hard error to track down as long as you're using a WYSIWYG
editor to hide your actual HTML code from you, and you're only viewing
your site from your own system, on which the hard-drive references work
fine. You'll need to view your pages in a normal text editor to see which
URLs are screwed up and fix them.
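If eyeballing every page sounds tedious, a short script can point out the suspects. This Python sketch looks only at HREF and SRC attributes (an assumption; other attributes can carry URLs too) and reports values that start with "file:" or contain a backslash. The function name is hypothetical.

```python
import re

# HREF or SRC with a double-quoted, single-quoted, or bare value.
URL_ATTRIBUTE = re.compile(
    r"""\b(?:href|src)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)


def find_local_urls(html: str) -> list:
    """List URL values that point at a local disk or use backslashes."""
    suspects = []
    for match in URL_ATTRIBUTE.finditer(html):
        # Exactly one of the three groups matched; take that one.
        value = next(g for g in match.groups() if g is not None)
        if value.lower().startswith("file:") or "\\" in value:
            suspects.append(value)
    return suspects
```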
Inelegant Links Back To Home Page
Elsewhere I give reasons why you should
link back to your main home page with <A HREF="./">
instead of explicitly naming your index page (e.g., index.html),
and similarly, to link to subdirectory indices with the directory name
instead of the explicit index filename. No WYSIWYG editor that I've encountered
yet does links in this manner; instead, they all seem to force these links
to be in the form of explicit filenames, and may even change your properly-formed
directory-name links to a hardcoded "index" reference without you knowing
it.
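Turning the explicit filenames back into directory links is simple string work. A Python sketch follows, assuming your server's index files are named index.html or index.htm (that depends on the server configuration, so adjust to taste); the function name is made up.

```python
import re

# "index.html" or "index.htm" as the final path component.
INDEX_NAME = re.compile(r"(^|/)index\.html?$")


def directory_link(href: str) -> str:
    """Rewrite a link to an explicit index file as a link to its directory."""
    path, sep, fragment = href.partition("#")
    if not INDEX_NAME.search(path):
        return href
    # Keep the slash (if any) and drop the filename; an empty result means
    # the current directory, which is written "./".
    path = INDEX_NAME.sub(r"\1", path) or "./"
    return path + sep + fragment
```

For example, "photos/index.html" becomes "photos/" and a bare "index.html" becomes "./".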
Un-Helpful Defaults
By default, many WYSIWYG editors fill in sections of code that are useful
to users in "non-standard" browsing situations with un-helpful content.
I'm referring to things like the ALT text in images, and the NOFRAMES section
of framed sites. These are useful to users of older browsers, text-only
browsers, text-readers for blind users, and, don't forget, the search-engine
indexing robots. The use of meaningful content in these sections is important
to the overall accessibility of your site, but too many users let these
things be filled in by their editors in a "default" manner. This results
in the NOFRAMES section saying nothing but "This site requires frames and
your browser does not support them", and the ALT text of images containing
something pointless like the filename and byte size of the image. See my
sections on frames and images
to find out what you should be putting there.
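The ALT text problem in particular can be audited with Python's standard html.parser module. This sketch flags images with no ALT attribute at all and images whose ALT text just repeats the filename; what counts as "pointless" beyond that is a judgment call the script doesn't try to make. The class and function names are hypothetical.

```python
import posixpath
from html.parser import HTMLParser


class AltAuditor(HTMLParser):
    """Collects IMG tags whose ALT text is missing or merely echoes the filename."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":  # the parser lowercases tag and attribute names
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        alt = attributes.get("alt")
        filename = posixpath.basename(src).lower()
        if alt is None:
            self.problems.append((src, "missing ALT"))
        elif filename and filename in alt.lower():
            self.problems.append((src, "ALT repeats the filename"))


def audit_alt_text(html: str) -> list:
    auditor = AltAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.problems
```

An empty ALT="" is deliberately not reported, since that is the right thing to put on purely decorative images.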
Weird Filenames and Tangled Directory Structures
Since WYSIWYG editors let you add stuff to your site with a point-and-click
interface, they encourage you to pay little attention to your filenames
and directory structures. If you don't watch out, you'll wind up with a
tangled mess with graphics scattered over lots of different directories,
filenames with random mixtures of uppercase and lowercase letters, and
other chaotic things you'd probably not do if you did it by hand. (See
my comments on directory structures.)
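A quick inventory shows how bad things have gotten. The Python sketch below works on a plain list of site-relative paths (however you choose to collect them) and reports names containing uppercase letters, plus which directories hold graphics. The extension list and function names are assumptions for the sake of the example.

```python
import posixpath
from collections import defaultdict


def mixed_case_names(paths):
    """Return the paths that contain any uppercase letters."""
    return [p for p in paths if p != p.lower()]


def graphics_by_directory(paths, extensions=(".gif", ".jpg", ".jpeg")):
    """Group graphic files by directory; many keys means scattered graphics."""
    directories = defaultdict(list)
    for p in paths:
        if p.lower().endswith(extensions):
            directories[posixpath.dirname(p)].append(p)
    return dict(directories)
```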
Tables From Hell
All WYSIWYG editors tend to use lots of tables
for page layout. Some, most notably NetObjects Fusion, go further and generate
tables from hell, with enormous numbers of rows and columns with
specific, finely-set pixel widths and heights in order to (attempt to)
achieve a highly precise layout. The result, in addition to a site that
is probably highly sensitive to the user's browser version and screen width
and unlikely to degrade well on any "nonstandard" platform, is incredibly
convoluted HTML code that's almost impossible to maintain in any other
manner than in the editor that created it in the first place. So if you
expect to ever switch editors or fine-tune your pages by hand, you
had better not start designing them in an editor that generates this sort
of code.
Even in the less-Byzantine tables generated by a more "run-of-the-PageMill"
editor, watch out for bad style such as hardcoded table pixel widths that
won't work well when users have screen widths different from your own.
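Hardcoded pixel widths are easy to list so you can decide which ones to turn into percentages or remove. A minimal Python sketch, looking only at TABLE, TD and TH tags (the function name is hypothetical):

```python
import re

TABLE_TAG = re.compile(r"<(table|td|th)\b[^>]*>", re.IGNORECASE)
WIDTH_ATTR = re.compile(r"""\bwidth\s*=\s*["']?(\d+%?)""", re.IGNORECASE)


def pixel_widths(html: str) -> list:
    """Return (TAG, pixels) for every table-related tag with a fixed pixel width."""
    found = []
    for tag in TABLE_TAG.finditer(html):
        width = WIDTH_ATTR.search(tag.group(0))
        # Percentage widths adapt to the window, so only bare numbers are reported.
        if width and not width.group(1).endswith("%"):
            found.append((tag.group(1).upper(), int(width.group(1))))
    return found
```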
Lapses of Logic
Since WYSIWYG editors create HTML code based on the visual appearance of
the document rather than its logical structure (even though HTML is a markup
language intended to represent logical structure), they're likely to fail
to "get the point" of your document's structure and mark it up logically.
You may well end up with your main headers being marked up with font sizes
instead of H1 and other header tags, while other parts
of your document may have non-header content marked up with H
tags because the editor decided that was the best way to do a font effect.
Your paragraphs may be broken with <BR><BR> instead
of <P>. The result may well look the same as a logically-marked-up
document (at least in the browser versions the editor is coding for), but
text-mode browsers, blind speech readers, search engine robots, and other
programs that try to interpret your logical structure in a manner other
than the graphical rendering you're used to, are likely to get lost. (For
instance, the content of your <H1> header is often weighted
heavily in search-engine indexing, while the same header marked up merely
with a font size will probably be treated as much-less-important plain
text.)
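A few telltale counts give a rough idea of whether an editor has done this to a page. This Python sketch counts doubled BR tags, large FONT sizes, and real header tags; treating SIZE="+2" and up (or 5 and up) as "header-like" is my own arbitrary threshold, and the function name is invented.

```python
import re

DOUBLE_BREAK = re.compile(r"<br\s*/?>\s*<br\s*/?>", re.IGNORECASE)
# Assumption: a FONT tag this large is probably standing in for a header.
BIG_FONT = re.compile(
    r"""<font\b[^>]*\bsize\s*=\s*["']?(\+[2-9]|[5-7])\b""", re.IGNORECASE
)
HEADING = re.compile(r"<h[1-6]\b", re.IGNORECASE)


def structure_report(html: str) -> dict:
    """Count signs of visual markup versus real structural markup."""
    return {
        "double_breaks": len(DOUBLE_BREAK.findall(html)),
        "big_fonts": len(BIG_FONT.findall(html)),
        "headings": len(HEADING.findall(html)),
    }
```

Lots of big fonts and double breaks with few or no headings is a good hint that the page needs its structure redone by hand.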
So, do you really want to use those editors? They may seem to
be saving you some time when you're first starting to set up a site, but
in the long run, you'll be spending a lot more time fixing up the problems
they cause.
Particular Quirks of Editors
Here are some of the specific things I know about that certain individual
editors do:
NetObjects Fusion:
- Generates Tables From Hell to (attempt to)
  achieve pixel-perfect positioning.
- Is, in fact, incapable of generating pages that aren't all contained in
  a big table, with an absolute pixel width. You can't make your pages
  resize gracefully to different screen widths, even if you're aware of this
  issue!
- Uses its own bizarre directory structure full of weird filenames, and offers
  no option to override it with your own desired structure.
- Sticks in lots of blank spacer graphics all over the place and doesn't
  have the decency to put an empty ALT="" attribute so they're
  suppressed in text-mode browsers like Lynx, leading to "[INLINE][INLINE][INLINE]"
  all over the place.
- Does a really lousy job of letting you import or export web pages between
  NOF and other editing programs, even though their marketing hype claims
  you can. Pages imported from another editor will have their HTML code screwed
  around with so it probably doesn't function as originally intended. Pages
  exported to another editor will make that editor go through conniptions
  trying to render the convoluted table structures NOF creates.
- Doesn't use logical, structured HTML elements; when you choose a Level
  1 Header, it doesn't use <H1> as is logical, but a FONT
  SIZE tag instead. When you choose a block quote, it gets rendered
  not by the proper BLOCKQUOTE element, but by a misused
  UL tag. The creators of NOF go way beyond the call of duty
  in avoiding any semblance of logicality in their tag usage. (See my discussion
  of physical and logical tags.)
Netscape Navigator Gold / Communicator:
- Loves to add extra &nbsp; characters, especially at the end of lines,
  in some cases messing up your formatting.
- Doesn't put the required quotes around attributes like SIZE="+1".
- When you save a site, by default it copies all the graphics into the same
  directory where you saved the page, even if they were originally in other
  directories or even on other servers, and changes the IMG SRC references
  accordingly. Thus, if the site had the graphics neatly placed in a separate
  directory to begin with, once this editor got hold of it they'll end up
  strewn all over the place in all the directories you ever edited HTML files
  in. You can wind up with lots of copies of the same graphic wasting space
  on your server. Even CGI-generated graphics like counters tend to get changed
  into static graphics by this editor.
- If you choose to "maintain links" when you start editing a site, it changes
  them all to absolute instead of relative
  URLs, regardless of how you'd like to keep them.
Microsoft FrontPage:
- Will change your CENTER tags to P ALIGN=CENTER
  whether you like it or not, and even in circumstances where the latter
  won't work in some browsers.
- Loves to insert its weird "PageBot" elements all over the place, and add
  in features that only work if your provider's server has FrontPage extensions
  installed.
- Like all Microsoft products, it doesn't mind using nonstandard characters
  like so-called "smart quotes" that aren't part of the official standard
  character set, and might show up as something weird on a non-Windows system.
- Still, it does seem to be making at least a half-hearted attempt to create
  logically structured HTML, which is better than most WYSIWYG editors, and
  the latest version gives you a mode where you can edit the raw HTML (though,
  in some cases, FrontPage will change it back to its own desired syntax).
- ...At least, I thought it attempted to keep your HTML logical...
  recently, somebody asked me to find out the cause of a FrontPage-generated
  web page being completely empty when viewed in Netscape, and when I viewed
  the source, it turned out that it had the closing </TABLE>
  tag of a table before the opening <TABLE> tag,
  so that it was closing a table that hadn't opened yet, and then never actually
  closing the real table once it started. Netscape can't cope with that sort
  of bogus code, though Microsoft Internet Explorer is somewhat more tolerant.
  The author swears that the code is exactly as FrontPage generated
  it and he didn't mess around with it using any other program, though I
  have trouble believing that even Microsoft would generate HTML this bad.
- WARNING: DON'T create FrontPage webs in the root directory
  of your hard disk... if you use FrontPage later to delete them, it could
  wipe out your entire hard disk! This bug has been acknowledged by Microsoft...
  FrontPage treats the entire directory you place web documents in as being
  part of that web site, and so will delete everything there if you delete
  them within FrontPage. This means that if you use your system root as
  the place to put your files, everything will get wiped out (including
  subdirectories, hidden and system files, etc.) if you remove it!
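The backwards TABLE tags described in the list above are the sort of thing a few lines of Python can catch before your visitors do. This sketch uses the standard html.parser module and checks only TABLE tags; the class and function names are hypothetical, and it is a sanity check, not a full validator.

```python
from html.parser import HTMLParser


class TableBalance(HTMLParser):
    """Tracks TABLE tags that close before opening, or open and never close."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # tables currently open
        self.stray_closes = 0   # </TABLE> seen with no table open

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "table":
            if self.depth == 0:
                self.stray_closes += 1
            else:
                self.depth -= 1


def check_tables(html: str) -> tuple:
    """Return (stray closing tags, tables left unclosed)."""
    checker = TableBalance()
    checker.feed(html)
    checker.close()
    return checker.stray_closes, checker.depth
```

Anything other than (0, 0) means the page deserves a look in a text editor.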
Microsoft Publisher:
- Like NetObjects Fusion, this is another program that goes for the "Tables
  From Hell" approach to page layout, producing messy, convoluted code
  that's very difficult to work with outside the program.
- It puts in the <HEAD> section the following line:
  <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
  What this is saying is that the Windows character set is being used
  (specifically code page 1252, the Western European one; other Windows
  code pages, such as 1250 and 1251, cover other languages and alphabets).
  At least this program is honest about that, unlike
  some other Microsoft software which blithely uses the nonstandard Windows
  characters like so-called "smart quotes" and then sticks in an ISO-8859-1
  character set identifier to claim to be using the standard character set.
  However, the use of platform-specific character sets is a bad idea when
  the ISO and Unicode standards exist to represent special characters in
  a manner understandable across the net.
- The sites I've seen generated with MS Publisher are notable for their complete
  absence of ALT text in images and imagemaps. I don't know whether MS Publisher
  fails to provide a means of putting these attributes in, or if the users
  are just not bothering to do it.
- One user reports that this program converts all your JPEGs to GIFs, and
  he couldn't find any way to override that. This is the only web editor
  I know of that messes around with the user's graphic files (other than
  renaming them and moving them to different directories, which several other
  editors do).
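The "smart quotes" mentioned above for the Microsoft editors can be found and flattened with a short script. In the Windows character set they occupy byte values 0x80 to 0x9F, which ISO-8859-1 reserves for control codes, so any such byte in a page claiming ISO-8859-1 is suspect. This Python sketch works on raw bytes; the replacement table covers only the most common offenders and is my own selection, as are the function names.

```python
# Windows-1252 punctuation mapped to plain ASCII stand-ins (a partial table).
CP1252_REPLACEMENTS = {
    0x85: b"...",  # ellipsis
    0x91: b"'",    # left single quote
    0x92: b"'",    # right single quote / apostrophe
    0x93: b'"',    # left double quote
    0x94: b'"',    # right double quote
    0x96: b"-",    # en dash
    0x97: b"--",   # em dash
}


def find_cp1252_bytes(data: bytes) -> list:
    """Return the distinct byte values in the Windows-only 0x80-0x9F range."""
    return sorted({b for b in data if 0x80 <= b <= 0x9F})


def asciify_smart_punctuation(data: bytes) -> bytes:
    """Replace the listed Windows punctuation bytes with ASCII equivalents."""
    return b"".join(CP1252_REPLACEMENTS.get(b, bytes([b])) for b in data)
```

Bytes in that range that aren't in the table are left alone, so run find_cp1252_bytes again afterwards to see whether anything still needs attention.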
This page was first created 29 Nov 1997, and was
last modified 23 Jun 1998.
webmaster@webtips.dan.info