Server vs. Client
TIP: Keep clearly in mind the distinction between the server and the client, and know what actions are best performed at each end of the Web connection process.
One source of confusion for "newbie" Web developers is the distinction between the server and client sides of the Web. Every Web session involves both elements, and each has its important role to play, but many people don't understand exactly what things are done by each. This leads to developers attempting to accomplish things at the browser end that should be done by the server, or vice versa.
Let's examine the two sides of the Web, one at a time:
The server is a rather passive process. It just sits there, waiting for somebody to request something from it. When a request comes in, the server fulfills it, then goes back to waiting for more requests.
The word "server" is often used both to describe the program that handles the fulfillment of requests, and the physical machine this program runs on. This is a bit confusing, and often inaccurate because the same physical machine can actually be operating in the role of either a client or a server, or both simultaneously, depending on what programs are running on it.
The main kind of server used on the Web is an HTTP (Hypertext Transport Protocol) server, used to send Web documents under "http:" URLs. However, other sorts of servers like FTP (File Transfer Protocol) servers can also be used on the Web. The same machine may be running several different servers. The way they're distinguished is by "port number," a numeric identifier that is used by the clients when establishing a connection. Usually HTTP servers are at port 80, for instance. If you see a number after a colon (:) in an URL, that specifies a port number. This is normally omitted when the standard port is used.
The most common server software is Apache, available as freeware for a wide variety of platforms. No matter what operating system you're running, you can probably get an appropriate version of Apache and run a personal Web server yourself, for internal use on your system or LAN even if you don't have a permanent Internet connection. However, a professional hosting service is better for high-traffic, high-reliability public sites, but most of them use Apache too. Microsoft's entry in the server field, the "IIS Server", fortunately has not achieved the sort of dominance they have in the desktop OS and browser markets; you mostly hear about it when yet another virus, trojan-horse, or hacker exploit succeeds in compromising its security.
Some things that are done at the server end:
Associating MIME Types with Data
The way a server tells a client what kind of data it is sending is through the use of MIME
(Multipurpose Internet Mail Extensions) types. These are names that are assigned to different
kinds of data such as
Usually, the server will determine what MIME type to use by the file extensions. If the
extension of a file is
The client is supposed to render the data (or launch an external application to do it) based on the MIME type. Unfortunately, as mentioned below, Microsoft Internet Explorer ignores the standards and makes its own decisions about what to render, frequently causing headaches for developers.
Other HTTP Headers
In addition to the Content-Type header, the server may send other information in HTTP headers prior to sending the document itself. Some of these headers may suggest things to the client such as whether or not to cache the document for later reuse. However, in the end, it is the client's decision whether to follow such suggestions or not; the server can't "force" anything. Some of these HTTP headers can be simulated in HTML using META "http-equiv" tags, but this is an inferior solution to sending the headers via the HTTP protocol in the first place.
Server Side Includes
Some Web servers can pre-process HTML documents and execute some embedded directives, such as ones calling for another file to be inserted at a particular point in the document. These directives are not HTML tags, and are not part of the HTML specifications. The browser never sees them because the server processes them and replaces them with the outcome of the directive, which is what the browser receives. Thus, the "HTML correctness" of the document depends on the resulting data after the directives are processed, not the original file with the directives included. This also means that any user who uses a browser command to save the HTML file will not get your original source document, but only the document resulting from the server processing.
Since processing a document with server-side includes takes more server resources than simply serving a static document, many servers are configured to look for server-side includes only when the file extension is .shtml instead of .html. Due to security considerations, some server administrators disallow server-side includes altogether. You'll have to consult your own server administrator to find out if server-side includes are supported on your site and if so, what syntax rules they follow.
Most interactive Web sites use CGI scripts. These run at the server end, so it isn't meaningful to ask what browsers "support CGI" or not, beyond inquiring into support for particular data-transmission features such as forms (which all browsers other than some really archaic line-mode text browsers support). The CGI scripts generate HTML (and/or other media types) as their output, so any browser compatibility concerns are based solely on the browser compatibility of the output data, not on the fact that the data comes from a script rather than being stored statically on the server.
"CGI" (Common Gateway Interface) is not a programming language; it's simply a set of common variables passed between the browser and the server in accordance with the HTTP protocol rules. The CGI scripts themselves may be written in any language that works on the server's system. PERL is a very popular language for such scripting.
CGI scripts won't work on your local hard drive; people (like corporate bosses) sometimes ask if an interactive, CGI-based Web site that a Web development team just completed can be put on a CD-ROM for use by people without an Internet connection, but that isn't usually feasible since the scripts are not designed to run on the end-user's system. You'd have to run a Web server on the user's system to do CGI scripting, and it would have to be compatible with the script language that was used.
Many server administrators restrict access to CGI scripts for security reasons, since a CGI script is able to read and write files on the server and, depending on the security of the system, could have access to confidential data or the ability to alter data in a harmful manner. Thus, you might not be able to put them up yourself in your Web site. If you are allowed to do so, you might have to name your scripts with a .cgi extension or put them in a cgi-bin directory so the server knows they're scripts; consult your administrator for details for your server. Even if users aren't allowed to put scripts up themselves, there may be some standard scripts up on the server already that you can use for things like counters, reply forms, guestbooks, etc.
Microsoft FrontPage will generate Web pages with references to a particular set of CGI scripts from Microsoft, but these will only work if the server you're placing them on supports FrontPage extensions, which many don't. These extensions are mostly geared to Windows NT servers; the .exe files FrontPage will put in a _vti-bin directory are gibberish to a UNIX server, though a flavor of FrontPage extensions is available for UNIX too. Some server administrators distrust anything from Microsoft and will refuse to install this stuff, so you're out of luck if you want to use it.
The client, or "user agent", is what a user runs to access the Web. A browser is a user agent, but there are other sorts of user agents too, like search engine indexing robots. The client makes requests from servers, and takes the resulting data and renders it in some manner which may vary greatly depending on what sort of user agent it is.
HTTP, the primary protocol used for Web documents, is a stateless, connectionless protocol, meaning that no permanent connection is retained between the client and the server; each document gets a separate request, in a separate session, which is terminated once the request is fulfilled. If an HTML page has 12 graphic files, 1 Java applet, a style sheet, and a background sound, that means 16 different connections to the server must be made to retrieve the HTML document and all of its embedded items.
Some things that are done by the client:
Rendering of HTML and Other Media Types
The server only transmits the HTML documents, graphic and sound files, and other content. It doesn't display or interpret these documents; that's the job of the client. The client receives the headers (such as MIME type) describing what type of data is being received, followed by the data itself, and then must do something with it such as displaying it in graphical or text mode, or indexing it. Sometimes you can configure a browser to use particular plug-ins or helper applications to display data types that are not natively supported; which application is used is determined by the MIME type.
Unfortunately, the popular Microsoft Internet Explorer browser seems to completely ignore MIME types, choosing to handle data based on a variety of other factors of its own choosing, such as the file extension shown in the URL, the contents of the file, and maybe the phase of the moon, rather than the MIME type sent by the server. While this sometimes "fixes" errors caused by servers sending the wrong MIME type, it also introduces errors, especially when servers use unusual URL extensions that don't match what Microsoft thinks the file types should be (especially common for Web pages generated dynamically from CGI scripts); in those cases, even though the server sends the right MIME type, MSIE ignores it and displays the file the way it feels like, which may be wrong. This is a very annoying deviation from the standards. And, worse than that, it's proved to be dangerous: see this warning and this discussion to see how this MIME second-guessing can be used by malicious "hackers" to get viruses and Trojan-horse programs onto your system and launched before you know what hit you.
Still more unfortunately, the Mozilla browser, which prides itself on following standards, is under great pressure to "bend" on this and start second-guessing MIME types too, as various people gripe about how this or that poorly-configured site "doesn't work right" in Mozilla. In fact, for a short time this 'bug fix' caused certain file extensions in URLs to override MIME types, prompted by a desire to get this site to work despite its server sending the wrong type for its Shockwave Flash animation. This caused the ugly "side effect" of making various other sites suddenly fail, most comically the fact that it became impossible to read newsgroup postings from Australia, because the .au domain ending in the message IDs was misinterpreted as the file extension of audio files. This bug report documents it. This report shows some of the other negative consequences of the "fix", which was quickly backed out. However, the Chimera browser for the Macintosh, based on Mozilla, is working on adding its own style of MIME second-guessing in order to deal with the widespread use of .dmg download files served with incorrect MIME types. There's also an open bug (actually a feature request) for the Apache server to get it not to send any MIME header at all for unknown file types, rather than a default type -- in this case, a browser would be allowed by the standards to guess a type.
The W3 Consortium has a clear statement about the incorrectness of overriding server MIME types with a "sniffed" or guessed type.
Test your browser's standards compliance by trying this link; it's a CGI script that generates output of type text/plain. Practically all browsers other than MSIE get it correctly, but MSIE is likely to think it's binary data or something and do something weird with it. On the other hand, on some misconfigured sites, MSIE will seem to "do the right thing" with data that Netscape "screws up" with, but actually it's Netscape that's doing the right thing for the data type the server claims to be sending. This failure is the server administrator's fault, not Netscape's. Some more tests are in somebody else's site.
Anyway, the specific details of how Web documents are displayed are decided by the user agent, and are not controllable by the server or the document author.
Some cookies are set to expire at the end of the current session, so when the user exits the browser the cookies are removed. Others have expiration dates in the future so that they stick around for a while; it is this latter type that raise the most privacy concerns because of their maintaining a long-term record of the user.
For security reasons, the browser will only send the contents of cookies back to the same server that set them in the first place. That means that one site has no way of knowing whether a user has any cookies from a second site, or vice versa. Thus, the Webmaster of your church's site can't see that you've got a cookie from the playboy.com site!
Make your site better by looking at other sites that show, by example, what not to do!
NOTE: The inclusion of a site in my "Hall of Shame" links should not be construed as any sort of personal attack on the site's creator, who may be a really great person, or even an attack on the linked Web site as a whole, which may be a source of really great information and/or entertainment. Rather, it is simply to highlight specific features (intentional or accidental) of the linked sites which cause problems that could have been avoided by better design. If you find one of your sites is linked here, don't get offended; improve your site so that I'll have to take down the link!
This page was first created 21 Jun 1998, and was last modified 08 Oct 2010.