Dan's Web Tips:

Server vs. Client

TIP: Keep clearly in mind the distinction between the server and the client, and know what actions are best performed at each end of the Web connection process.

One source of confusion for "newbie" Web developers is the distinction between the server and client sides of the Web. Every Web session involves both elements, and each has its important role to play, but many people don't understand exactly what things are done by each. This leads to developers attempting to accomplish things at the browser end that should be done by the server, or vice versa.

Let's examine the two sides of the Web, one at a time:

The Server

The server is a rather passive process. It just sits there, waiting for somebody to request something from it. When a request comes in, the server fulfills it, then goes back to waiting for more requests.

The word "server" is often used both to describe the program that handles the fulfillment of requests, and the physical machine this program runs on. This is a bit confusing, and often inaccurate because the same physical machine can actually be operating in the role of either a client or a server, or both simultaneously, depending on what programs are running on it.

The main kind of server used on the Web is an HTTP (Hypertext Transfer Protocol) server, used to send Web documents under "http:" URLs. However, other sorts of servers like FTP (File Transfer Protocol) servers can also be used on the Web. The same machine may be running several different servers. They are distinguished by "port number," a numeric identifier that clients use when establishing a connection. HTTP servers are usually at port 80, for instance. If you see a number after a colon (:) following the host name in a URL, that specifies a port number. It is normally omitted when the standard port is used.
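To make the port rule concrete, here is a minimal Python sketch of how a client might work out which port to connect to. The function name effective_port, the DEFAULT_PORTS table, and the example.com URLs are illustrative assumptions, not part of any particular browser.

```python
from urllib.parse import urlsplit

# Standard ports for the two kinds of server mentioned above.
DEFAULT_PORTS = {"http": 80, "ftp": 21}

def effective_port(url):
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL carries no explicit ":number".
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]

print(effective_port("http://www.example.com/index.html"))       # 80
print(effective_port("http://www.example.com:8080/index.html"))  # 8080
```

The second URL names port 8080 explicitly, so the standard port is not used.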

The most common server software is Apache, available free for a wide variety of platforms. No matter what operating system you're running, you can probably get an appropriate version of Apache and run a personal Web server yourself, for internal use on your system or LAN, even if you don't have a permanent Internet connection. A professional hosting service is better for high-traffic, high-reliability public sites, though most of them use Apache too. Microsoft's entry in the server field, IIS (Internet Information Server), fortunately has not achieved the sort of dominance the company has in the desktop OS and browser markets; you mostly hear about it when yet another virus, Trojan horse, or hacker exploit succeeds in compromising its security.

Some things that are done at the server end:

Associating MIME Types with Data

The way a server tells a client what kind of data it is sending is through the use of MIME (Multipurpose Internet Mail Extensions) types. These are names assigned to different kinds of data, such as text/html for HTML and image/gif for GIF graphic files. The response the server sends to a client begins with a series of headers giving information about the data, including a "Content-Type" header with the MIME type.

Usually, the server will determine what MIME type to use by the file extensions. If the extension of a file is .html or .htm, it is sent as text/html, for instance. However, the exact details can be configured by the server administrator to deal with whatever file types happen to be in use.
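The extension lookup can be sketched in a few lines of Python. The table below is a made-up stand-in for a server's configuration, and the fallback type application/octet-stream is my assumption about a reasonable default; a real server's table is whatever its administrator set up.

```python
import os

# Hypothetical extension-to-type table, standing in for server configuration.
MIME_BY_EXTENSION = {
    ".html": "text/html",
    ".htm": "text/html",
    ".gif": "image/gif",
    ".txt": "text/plain",
}

def content_type_for(path, default="application/octet-stream"):
    """Pick the MIME type to send for a file, based only on its extension."""
    extension = os.path.splitext(path)[1].lower()
    return MIME_BY_EXTENSION.get(extension, default)

print(content_type_for("index.html"))  # text/html
print(content_type_for("logo.gif"))    # image/gif
```

Note that the file's contents are never examined; the type depends only on the name and the table.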

The client is supposed to render the data (or launch an external application to do it) based on the MIME type. Unfortunately, as mentioned below, Microsoft Internet Explorer ignores the standards and makes its own decisions about what to render, frequently causing headaches for developers.

Other HTTP Headers

In addition to the Content-Type header, the server may send other information in HTTP headers prior to sending the document itself. Some of these headers may suggest things to the client such as whether or not to cache the document for later reuse. However, in the end, it is the client's decision whether to follow such suggestions or not; the server can't "force" anything. Some of these HTTP headers can be simulated in HTML using META "http-equiv" tags, but this is an inferior solution to sending the headers via the HTTP protocol in the first place.
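Here is a rough Python sketch of what a response with headers looks like on the wire. The build_response function is a hypothetical helper, not any real server's API, and the Cache-Control value is just one example of a caching suggestion.

```python
def build_response(body, content_type, extra_headers=None):
    """Assemble a minimal HTTP response: status line, headers, blank line, body."""
    headers = {"Content-Type": content_type, "Content-Length": str(len(body))}
    headers.update(extra_headers or {})
    head = "HTTP/1.1 200 OK\r\n"
    head += "".join(f"{name}: {value}\r\n" for name, value in headers.items())
    # The empty line marks the end of the headers.
    return head.encode("ascii") + b"\r\n" + body

response = build_response(
    b"<p>Hello</p>",
    "text/html",
    {"Cache-Control": "max-age=3600"},  # a suggestion the client may ignore
)
print(response.decode("ascii"))
```

Everything before the blank line is header information; the document itself follows it.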

Server Side Includes

Some Web servers can pre-process HTML documents and execute some embedded directives, such as ones calling for another file to be inserted at a particular point in the document. These directives are not HTML tags, and are not part of the HTML specifications. The browser never sees them because the server processes them and replaces them with the outcome of the directive, which is what the browser receives. Thus, the "HTML correctness" of the document depends on the resulting data after the directives are processed, not the original file with the directives included. This also means that any user who uses a browser command to save the HTML file will not get your original source document, but only the document resulting from the server processing.

Since processing a document with server-side includes takes more server resources than simply serving a static document, many servers are configured to look for server-side includes only when the file extension is .shtml instead of .html. Due to security considerations, some server administrators disallow server-side includes altogether. You'll have to consult your own server administrator to find out if server-side includes are supported on your site and if so, what syntax rules they follow.
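As a rough illustration of what the server does, the Python sketch below expands one Apache-style include directive. It is a toy under stated assumptions: the files dictionary stands in for the server's file system, only the file="..." form is handled, and your own server's syntax may differ.

```python
import re

# Matches directives such as <!--#include file="footer.html" -->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')

def expand_includes(source, files):
    """Replace each include directive with the contents of the named file."""
    return INCLUDE.sub(lambda match: files[match.group(1)], source)

# Hypothetical files on the server.
files = {"footer.html": "<p>Copyright notice</p>"}
page = '<body>\n<h1>Home</h1>\n<!--#include file="footer.html" -->\n</body>'

print(expand_includes(page, files))
```

The output contains the footer paragraph and no directive, which is all the browser ever receives.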

CGI Scripts

Most interactive Web sites use CGI scripts. These run at the server end, so it isn't meaningful to ask which browsers "support CGI", beyond inquiring into support for particular data-transmission features such as forms (which all browsers other than some really archaic line-mode text browsers support). The CGI scripts generate HTML (and/or other media types) as their output, so any browser compatibility concerns are based solely on the browser compatibility of the output data, not on the fact that the data comes from a script rather than being stored statically on the server.

"CGI" (Common Gateway Interface) is not a programming language; it's simply a convention for how the Web server hands the details of a request to an external program (through a common set of variables and the program's standard input) and how that program hands its output back to the server to be sent to the browser. The CGI scripts themselves may be written in any language that works on the server's system. Perl is a very popular language for such scripting.

CGI scripts won't work on your local hard drive; people (like corporate bosses) sometimes ask if an interactive, CGI-based Web site that a Web development team just completed can be put on a CD-ROM for use by people without an Internet connection, but that isn't usually feasible since the scripts are not designed to run on the end-user's system. You'd have to run a Web server on the user's system to do CGI scripting, and it would have to be compatible with the script language that was used.
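To show what a CGI script expects from the server, here is a minimal sketch in Python (Perl is the more traditional choice, as noted above). The run_cgi function and the "name" form field are my own inventions for illustration; QUERY_STRING is one of the standard CGI variables.

```python
import html
import os
from urllib.parse import parse_qs

def run_cgi(environ):
    """Build the script's output from the variables the server supplies."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["stranger"])[0]
    body = f"<html><body><p>Hello, {html.escape(name)}!</p></body></html>"
    # A CGI program writes its headers, then a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    # Under a real server, QUERY_STRING arrives in the process environment.
    print(run_cgi(os.environ), end="")
```

Without a server to set those variables and collect the output, the script has nothing to respond to, which is why it can't simply be copied onto a CD-ROM.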

Many server administrators restrict access to CGI scripts for security reasons, since a CGI script is able to read and write files on the server and, depending on the security of the system, could have access to confidential data or the ability to alter data in a harmful manner. Thus, you might not be able to put them up yourself in your Web site. If you are allowed to do so, you might have to name your scripts with a .cgi extension or put them in a cgi-bin directory so the server knows they're scripts; consult your administrator for details for your server. Even if users aren't allowed to put scripts up themselves, there may be some standard scripts up on the server already that you can use for things like counters, reply forms, guestbooks, etc.

Microsoft FrontPage will generate Web pages with references to a particular set of CGI scripts from Microsoft, but these will only work if the server you're placing them on supports FrontPage extensions, which many don't. These extensions are mostly geared to Windows NT servers; the .exe files FrontPage will put in a _vti_bin directory are gibberish to a UNIX server, though a flavor of FrontPage extensions is available for UNIX too. Some server administrators distrust anything from Microsoft and will refuse to install this stuff, in which case you're out of luck if you want to use it.

The Client

The client, or "user agent", is what a user runs to access the Web. A browser is a user agent, but there are other sorts of user agents too, like search engine indexing robots. The client makes requests from servers, and takes the resulting data and renders it in some manner which may vary greatly depending on what sort of user agent it is.

HTTP, the primary protocol used for Web documents, is a stateless, connectionless protocol, meaning that no permanent connection is retained between the client and the server; each document gets a separate request, which is handled independently of any other. If an HTML page has 12 graphic files, 1 Java applet, a style sheet, and a background sound, that means 16 separate requests to the server (traditionally, 16 separate connections) must be made to retrieve the HTML document and all of its embedded items. Later versions of HTTP allow a client to reuse one connection for several requests, but the server still treats each request on its own.

Some things that are done by the client:

Rendering of HTML and Other Media Types

The server only transmits the HTML documents, graphic and sound files, and other content. It doesn't display or interpret these documents; that's the job of the client. The client receives the headers (such as MIME type) describing what type of data is being received, followed by the data itself, and then must do something with it such as displaying it in graphical or text mode, or indexing it. Sometimes you can configure a browser to use particular plug-ins or helper applications to display data types that are not natively supported; which application is used is determined by the MIME type.
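A standards-following client's decision can be sketched like this in Python. The HANDLERS table and its descriptions are hypothetical; the point is only that the choice is keyed on the Content-Type header, not on the URL or the file contents.

```python
from email.message import Message

# Hypothetical table of what this client does with each MIME type.
HANDLERS = {
    "text/html": "render as a Web page",
    "text/plain": "show as unformatted text",
    "image/gif": "display as an image",
    "application/pdf": "pass to a helper application",
}

def choose_action(content_type_header):
    """Decide what to do with incoming data from its Content-Type header."""
    message = Message()
    message["Content-Type"] = content_type_header
    # get_content_type() lowercases the type and drops parameters like charset.
    mime_type = message.get_content_type()
    return HANDLERS.get(mime_type, "offer to save the file")

print(choose_action("text/html; charset=ISO-8859-1"))  # render as a Web page
print(choose_action("application/x-unknown"))          # offer to save the file
```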

Unfortunately, the popular Microsoft Internet Explorer browser seems to completely ignore MIME types, choosing to handle data based on a variety of other factors of its own choosing, such as the file extension shown in the URL, the contents of the file, and maybe the phase of the moon, rather than the MIME type sent by the server. While this sometimes "fixes" errors caused by servers sending the wrong MIME type, it also introduces errors, especially when servers use unusual URL extensions that don't match what Microsoft thinks the file types should be (especially common for Web pages generated dynamically from CGI scripts); in those cases, even though the server sends the right MIME type, MSIE ignores it and displays the file the way it feels like, which may be wrong. This is a very annoying deviation from the standards. And, worse than that, it's proved to be dangerous: see this warning and this discussion to see how this MIME second-guessing can be used by malicious "hackers" to get viruses and Trojan-horse programs onto your system and launched before you know what hit you.

Still more unfortunately, the Mozilla browser, which prides itself on following standards, is under great pressure to "bend" on this and start second-guessing MIME types too, as various people gripe about how this or that poorly-configured site "doesn't work right" in Mozilla. In fact, for a short time this 'bug fix' caused certain file extensions in URLs to override MIME types, prompted by a desire to get this site to work despite its server sending the wrong type for its Shockwave Flash animation. This caused the ugly "side effect" of making various other sites suddenly fail, most comically the fact that it became impossible to read newsgroup postings from Australia, because the .au domain ending in the message IDs was misinterpreted as the file extension of audio files. This bug report documents it. This report shows some of the other negative consequences of the "fix", which was quickly backed out. However, the Chimera browser for the Macintosh, based on Mozilla, is working on adding its own style of MIME second-guessing in order to deal with the widespread use of .dmg download files served with incorrect MIME types. There's also an open bug (actually a feature request) for the Apache server to get it not to send any MIME header at all for unknown file types, rather than a default type -- in this case, a browser would be allowed by the standards to guess a type.

The W3 Consortium has a clear statement about the incorrectness of overriding server MIME types with a "sniffed" or guessed type.

Test your browser's standards compliance by trying this link; it's a CGI script that generates output of type text/plain. Practically all browsers other than MSIE get it right, but MSIE is likely to think it's binary data or something and do something weird with it. On the other hand, on some misconfigured sites, MSIE will seem to "do the right thing" with data that Netscape "screws up", but actually Netscape is doing the right thing for the data type the server claims to be sending. That failure is the server administrator's fault, not Netscape's. Some more tests are on somebody else's site.

Anyway, the specific details of how Web documents are displayed are decided by the user agent, and are not controllable by the server or the document author.

Java and JavaScript

Java and JavaScript are not the same thing; they are two different languages. When somebody refers to a "Java script", it leaves me wondering whether they mean a script written in Java, or in JavaScript (properly written as one word). Java is a programming language that compiles into platform-independent code that can be run from Web pages (but which is saved in separate files rather than being embedded directly in the HTML code). JavaScript is an interpreted language which appears directly in source-code form and can be placed right in an HTML document.

However, both of these languages have in common the fact that they run at the client end rather than the server end. It is the browser, not the server, that needs to be "Java-compatible" or "JavaScript-compatible". For that matter, it is the browser, not the server, that can be configured to disable Java or JavaScript so that these scripts are ignored.

This also means that any security risks that these scripts may impose are at the client end (in contrast to CGI scripts, which impose security risks at the server end). While Java and JavaScript were created with security in mind (and hence are limited in their ability to access files on the user's hard disk outside authorized directories), there have been occasional security bugs which have surfaced with these languages, and thus a user is at his or her own risk running such scripts. But no security risk is imposed on the server itself, as the scripts are running at the user end and have no way of accessing anything on the server except by making standard-protocol requests just as any user agent can do.

Complicating the issue somewhat is the fact that there are now also server-side versions of Java and JavaScript. These operate like any other server-side scripting language. The use of Java at the server end doesn't require browsers to also use it, nor does the use of Java at the client end require a Java-based server. Communication between the server and the client is only done through the use of Internet protocols, so any data passed between them doesn't depend directly on what language is used at each end to send and receive it, only on what protocols are followed in its transmission. And any security risks imposed on the other side of this process are only a function of what actions the protocols allow, not of what actions are supported by the scripting language used at the opposite end of the transmissions.

Mozilla offers a nice set of built-in tools for developers using these languages, such as the Java and JavaScript consoles and the DOM (Document Object Model) Inspector. The DOM is how scripts gain access to parts of Web pages; it is the standardized structure of the representation of page elements. There is a W3C DOM that is the official Web standard, and it is supported well by Mozilla and passably by recent versions of other browsers. There are also nonstandard proprietary DOMs supported by various browser versions, which you should refrain from using, as they will produce incompatible code.

Cookies

Cookies, or "persistent client-side data", can originate either at the client or the server end (servers can send them as part of the HTTP headers of a document, and client-side scripting languages like JavaScript can request that the browser set them), but they are stored exclusively on the client end. If you configure your browser to refuse cookies, they won't be set no matter what the server tries to send you.

A cookie is a little piece of information that is stored by a Web site to keep track of some aspect of your session. Cookies are a way of getting around the "stateless, connectionless" status of HTTP. People have raised privacy concerns about cookies because they can keep track of what you do at a Web site and transmit this information back to that site the next time you visit it, letting them market to you in a more targeted (and, in some people's view, more intrusive) way. However, cookies can also be used in ways that increase the user's convenience, like storing a user's login name and password for a site that requires them, saving the user from having to remember them (but creating security concerns if you'd rather not let others who might have access to your machine get into those sites). Additionally, online stores with "shopping carts" often use cookies to keep track of what items a user has placed in his or her cart, and also to track "referral codes" indicating what site the user entered the store from in order to give proper commissions for affiliate programs.

Some cookies are set to expire at the end of the current session, so when the user exits the browser the cookies are removed. Others have expiration dates in the future so that they stick around for a while; it is this latter type that raises the most privacy concerns, because such cookies maintain a long-term record of the user.
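The two kinds of cookie differ only in the header the server sends. Here is a small Python sketch using the standard library's SimpleCookie; the cookie names and values are made up, and I've used a Max-Age lifetime in seconds rather than an explicit expiration date because it keeps the example short.

```python
from http.cookies import SimpleCookie

def set_cookie_header(name, value, max_age=None):
    """Build the Set-Cookie header line a server would send."""
    cookie = SimpleCookie()
    cookie[name] = value
    if max_age is not None:
        # With a lifetime, the browser keeps the cookie after it exits.
        cookie[name]["max-age"] = max_age
    return cookie.output()

# Session cookie: no lifetime given, so it goes away with the browser session.
print(set_cookie_header("cart", "item42"))

# Persistent cookie: asks the browser to keep it for about 30 days.
print(set_cookie_header("login", "dan", max_age=30 * 24 * 3600))
```

Either way, the browser is free to refuse the cookie altogether.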

For security reasons, the browser will only send the contents of cookies back to the same server that set them in the first place. That means that one site has no way of knowing whether a user has any cookies from a second site, or vice versa. Thus, the Webmaster of your church's site can't see that you've got a cookie from the playboy.com site!
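That rule can be modeled with a toy cookie store in Python. This is a deliberate simplification: the ToyCookieJar class is hypothetical, the host names are made up, and it matches on the exact host name only, whereas real browsers apply more detailed domain and path rules.

```python
class ToyCookieJar:
    """Toy cookie store keyed by the host that set each cookie."""

    def __init__(self):
        self._by_host = {}

    def store(self, host, name, value):
        """Remember a cookie under the host that sent it."""
        self._by_host.setdefault(host.lower(), {})[name] = value

    def cookie_header_for(self, host):
        """Return the Cookie header value to send to this host, if any."""
        cookies = self._by_host.get(host.lower(), {})
        return "; ".join(f"{name}={value}" for name, value in cookies.items())

jar = ToyCookieJar()
jar.store("shop.example", "cart", "item42")

print(jar.cookie_header_for("shop.example"))   # cart=item42
print(jar.cookie_header_for("other.example"))  # (empty: nothing is sent)
```

The second site gets an empty string, so it has no way to learn that the first site's cookie exists.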