Build Your Own WWW Server


April 1995 / Features / Build Your Own WWW Server

Many organizations want to set up their own World Wide Web server. Here's why--and how to do it.

Bob Friesenhahn

Even if your company isn't hooked up to the Internet and has no interest in becoming part of the WWW (World Wide Web--also known as W3, or just "the Web"--a global network based on the Internet's TCP/IP protocols), there are significant reasons why you should consider using Web-based software and technology on your LAN. This article describes how to implement a Web server using readily available free software and equipment you might already own. It also points out pitfalls and obstacles to think about before beginning this type of project.

But you definitely should consider the benefits of weaving yourself into the Web. With HTTP (Hypertext Transfer Protocol) servers and graphical client software, such as Mosaic and Netscape, you can achieve much of the information-sharing that's possible with Lotus Notes--but at a lower cost and without incurring the overhead of distributed databases.

Instead of replicating information to a local repository, the way Notes does, HTTP and Mosaic give you an efficient way of distributing up-to-date information and providing remote access to programs within your organization. With an attractive graphical interface, such as Mosaic, this results in a cost/benefit ratio that's hard to beat.

How the Web Works

A major thrust of the WWW effort has been to unite existing protocols and formats into one common interface. To accomplish this, the URL (uniform resource locator) addressing format allows a user to specify any object on the Internet, along with sufficient information to retrieve it. Consider this example: ftp://ftp.uu.net:21/README. This URL says to retrieve the file named README from the site ftp.uu.net using FTP (File Transfer Protocol) on port 21. Commonly supported protocols and programs include FTP, Finger, Gopher, HTTP, NNTP (Network News Transfer Protocol), Rlogin, Telnet, and WAIS (Wide Area Information Servers).
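As a rough illustration of how a client might take the example URL apart, here is a minimal Python sketch using the standard library's urllib.parse module. The helper name describe_url is hypothetical and is not part of any browser or server discussed in this article.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the pieces a client needs to retrieve the object."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,  # protocol to use, e.g. "ftp"
        "host": parts.hostname,  # site to contact
        "port": parts.port,      # None when the URL does not name a port
        "path": parts.path,      # object to retrieve from that site
    }


info = describe_url("ftp://ftp.uu.net:21/README")
print(info)
```

The four fields correspond to the protocol, site, port, and file name described above.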

WWW servers are designed to handle documents created using the HTML (Hypertext Markup Language) format. What makes HTML documents unique is their ability to include hypertext links that facilitate rapid access to other locations within the same document, to other documents on the same site, and to documents at another site entirely--the capability that makes the Web so powerful. HTML also supports internal references to external objects, such as image files.

The Web is based on HTTP, which supports a client/server model. The protocol covers operations ranging from simple get commands to complex authentication mechanisms. HTTP is currently specified to run on top of TCP, the Internet's basic transport mechanism, but this is not an inherent limitation of the protocol.

To maximize the availability of the data on the Internet, many software packages--called Web browsers--have been developed. These packages use URLs to specify data location and can retrieve and display many forms of information. The most popular of these clients are Mosaic (originally developed at NCSA, the National Center for Supercomputing Applications, Champaign, IL) and Lynx.

Many variants of Mosaic are now available from such companies as Netscape Communications (formerly Mosaic Communications, Mountain View, CA), NCSA, Spyglass (Savoy, IL), and Spry (Seattle, WA) for popular platforms. All share a user-friendly GUI that can display text in different fonts and in-line graphics. Lynx is a character-based program for those who don't have the high-speed SLIP or PPP Internet connection required for effective use of GUI-based software. Usually, a Web client is thought of as a user-interface tool, but a WWW server can also be a Web client.

How Can I Serve You?

A WWW server delivers data via HTTP. A client opens a connection to the server, submits a single request, and receives the response; the connection is then closed. Only a small part of the URL--the path portion--is passed from the client to the server; the server already knows the rest.

When a full URL is passed from the client to the server, this indicates a proxy request that the server is expected to place on behalf of the client. Servers that support proxy capability act as clients on the requester's behalf. For example, an HTTP-only client can perform an FTP transfer by using HTTP to request an FTP URL from the proxy server. The proxy server uses FTP to retrieve the data and passes it back to the client via HTTP.
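The difference between an ordinary request and a proxy request can be sketched in a few lines of Python. This is only an illustration and assumes HTTP/1.0-style request syntax. The function name build_request and the host www.example.com are hypothetical.

```python
from urllib.parse import urlsplit


def build_request(url, via_proxy=False):
    """Return the text of a simple GET request for the given URL.

    A direct request carries only the path portion of the URL;
    a proxy request carries the full URL so the proxy knows where to go.
    """
    if via_proxy:
        target = url
    else:
        parts = urlsplit(url)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    # A blank line ends the request.
    return "GET " + target + " HTTP/1.0\r\n\r\n"


direct = build_request("http://www.example.com/docs/index.html")
proxied = build_request("ftp://ftp.uu.net:21/README", via_proxy=True)
print(repr(direct))
print(repr(proxied))
```

In the second case, the proxy server would see the ftp scheme in the request and fetch the file with FTP before returning it over HTTP.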

Proxy servers can also provide the benefit of a local disk cache, which speeds up the retrieval of frequently used URLs. Sites that use an Internet firewall can use a proxy server to allow ordinary clients to access the Internet through the firewall.

The WWW server is responsible for mapping a supplied URL into an object or responding with an error message. An object can take many forms. The simplest, and most common, object is an HTML text file. But it's possible for an object to be a program or to be built into a database request.

WWW servers are designed to mask implementation details from the user. Considerable flexibility in the configuration files allows the remapping of directory references into other directories, individual files, or programs.

Regardless of the type of object that the WWW server finally resolves to, the server is responsible for indicating the data format used for the reply. Response formats use MIME (Multipurpose Internet Mail Extensions) conventions. For instance, an HTML file would be indicated as type text/html in the response header.
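To show what choosing a MIME type might look like, the following Python sketch maps a file name to a content type with the standard mimetypes module. The helper content_type_for and its fallback type are assumptions made for illustration. A real server would typically consult its own suffix table.

```python
import mimetypes


def content_type_for(filename, default="application/octet-stream"):
    """Guess the MIME type to report in the response header for a file."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the suffix is unknown.
    return guessed or default


for name in ("index.html", "aliaserv.gif"):
    print("Content-Type:", content_type_for(name))
```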

Selecting the Server Platform

Deciding how--and whether--to implement a Web server, as well as what computer platform to use, depends on a number of factors. These include your finances, expected usage, data types to be stored, available hardware, and your experience with specific operating systems.

Finances. You must first decide whether you have sufficient resources and dedication to operate your own server. For a normal WWW server, you need a full-time Internet connection (56 Kbps or higher), a 486-class or better computer, software, physical space, and technically knowledgeable people. Or you can hire a commercial service to put your data on the Web.

Expected usage. If you expect a large number of accesses per unit of time--as a large organization or popular public-access site might require--then you should design your server accordingly. But if you are doing this only as a hobby, then a desktop PC with a good 28.8-Kbps modem and SLIP or PPP support might do.

Data types and quantity stored. Some forms of data, especially graphical images, occupy considerable storage space. Be sure your server has sufficient disk space.

CPU resources. Serving up text files on request takes up few CPU resources. However, you may choose not to limit your server to such simple tasks. For example, if your system or application stores data in a nonstandard format, it may need to run a program on the fly to generate a desired response in an accepted format. Similarly, if you're providing access to a database, your server will use significant CPU resources during database accesses.

Experience with operating systems. Most people prefer to use operating systems that they're already familiar with. Luckily, WWW servers are available for many operating systems. The ideal operating system has excellent TCP/IP networking and efficient multitasking. This rules out MS-DOS and Windows. Unix, Windows NT, and Digital Equipment's VMS (with multithreading extensions) are all suitable. Unix is currently the best-supported operating system for WWW use and involves the least amount of risk.

Available hardware. Most sites connected directly to the Internet already use systems that a WWW server can run on. The main concern is whether your existing system (which may already support E-mail and Usenet news groups) has sufficient capacity to support a server. There are no clear-cut guidelines for choosing a server platform. A Pentium-based PC or Unix workstation (e.g., Sun's SparcStation 10) should provide good service in most cases.

There's more performance variance due to your operating system than to your hardware. A solid Unix implementation is an excellent choice. Failing that, VMS (for a Digital VAX minicomputer or an Alpha-based workstation) or Windows NT should be a workable solution.

Installing and Configuring Server Software

Once you decide to use a WWW server, you need software. (See "Setting Up a Server," January BYTE, for on-line sources for WWW server software.) The example in this article is geared toward the CERN (European Laboratory for Particle Physics) httpd software, based on my experience.

Building a server from source code is often a good idea because it increases your familiarity with the software and ensures that the binary is 100 percent compatible with your system. But if you don't have an appropriate compiler or don't want to learn how to build a server, then a precompiled binary may be a better choice--and, with luck, everything will work fine.

There are few build-time choices because most options are configured at run time. With the CERN server, the only build-time option is to include SOCKS (a generic proxy implementation library) support. This allows the server to reside on a LAN that's inside a firewall and to access the Internet via a SOCKS proxy daemon that executes on the firewall machine.

If you're building a server to provide HTTP proxy service to clients on a LAN, including this feature ensures the highest possible level of protection against unauthorized hackers on the network. Follow whatever building procedures are specified for the server software you select.

Before you install your server binaries, however, you should map out the server's directory structure. You must decide whether to install the binaries, configuration files, and log files where your system normally keeps such things or to put everything in a single directory tree. That's the approach I took, since having everything available in one place makes it easier to find related information and to move or copy the server to a new machine (see the figure "Server Directory Tree").

Add an entry to your system's start-up script that automatically starts the server software when the system is booted--at least a command line that specifies the path to the primary executable as well as the configuration file to use. It's best to minimize the information specified on the command line; the configuration file is a more appropriate place for such data.

Most HTTP servers come with reasonable defaults and sample configuration files. It's best to start off small and add in features as your server grows. At a bare minimum, the configuration should specify the following information:

-- The root (ServerRoot) of the directory tree (/httpd/server_root 
     in the sample directory structure).
-- The user identity under which the server will run. For security 
     reasons (more on this in the next section), it's best to give 
     the server as few privileges as possible.
-- The path to the log files.
-- A mapping to executables on the server. For example, in the 
     CERN HTTP configuration file, the following entry will map 
     any URL that starts with /cgi-bin/ (a common prefix) into 
     the server-side bin directory and treat it as an executable.

         Exec /cgi-bin/* /httpd/server_root/bin/*

-- A mapping to HTML source files. If HTML source files are placed 
     under a directory called /Web, the configuration entry to map 
     URLs into that directory is Pass /* /Web/*. In the CERN server 
     (and likely in others as well), the Exec and Pass mapping lines 
     are matched in the order specified in the configuration file. 
     So, to provide special treatment for a particular URL, position 
     it before the default entries.
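The first-match behavior described in the last item can be sketched in Python. This is a simplified model written for illustration, not the CERN server's actual matching code. It assumes each template ends in a single trailing wildcard, and the names RULES and resolve are hypothetical.

```python
# Rules in configuration-file order: (action, URL template, result template).
RULES = [
    ("Exec", "/cgi-bin/*", "/httpd/server_root/bin/*"),
    ("Pass", "/*", "/Web/*"),
]


def resolve(url_path, rules=RULES):
    """Return (action, mapped_path) for the first matching rule, else None."""
    for action, template, result in rules:
        if template.endswith("*"):
            prefix = template[:-1]
            if url_path.startswith(prefix):
                # Whatever the wildcard matched is substituted into the result.
                rest = url_path[len(prefix):]
                return action, result.replace("*", rest, 1)
        elif url_path == template:
            return action, result
    return None


print(resolve("/cgi-bin/aliaserv"))
print(resolve("/index.html"))
# With the order reversed, the catch-all Pass rule wins and the
# program is never treated as an executable.
print(resolve("/cgi-bin/aliaserv", rules=list(reversed(RULES))))
```

The last call shows why ordering matters: once the general Pass entry comes first, the more specific Exec entry is never reached.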

Security Issues

Current WWW servers provide simple authentication mechanisms similar to those used in FTP and Telnet applications (i.e., cleartext passwords matched against DES-encoded equivalents in a password file). This is poor security, but it's the current least common denominator between clients and servers (i.e., you can always use it).

More secure schemes based on Kerberos and RSA (Rivest-Shamir-Adleman) public-key encryption are currently under development. Netscape Communications has announced a secure client/server combination that uses RSA technology to perform authentication and encryption.

However, there's deep concern in the WWW community that marketing this technology without agreement by other WWW software suppliers may actually destabilize Web security standards. Recognizing this concern, Netscape Communications has joined the W3 Consortium to work out new security and HTML standards.

In addition to password-style authentication, access can be restricted based on network address--a domain address or an IP address mask. The CERN server supports groups, which can be individual user names, users at specified hosts, or other combinations. This capability, along with passwords, allows powerful access control.
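As an illustration of restricting access by IP address mask, here is a minimal Python sketch using the standard ipaddress module. It does not reproduce the CERN server's configuration syntax. The function name address_allowed and the sample networks, taken from address ranges reserved for documentation, are assumptions.

```python
import ipaddress


def address_allowed(client_ip, allowed_networks):
    """Return True if client_ip falls inside any of the allowed networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)


ALLOWED = ["192.0.2.0/24"]  # hypothetical in-house network

print(address_allowed("192.0.2.45", ALLOWED))
print(address_allowed("198.51.100.9", ALLOWED))
```

A check like this would normally be combined with password authentication rather than used on its own, since a network address identifies a machine, not a person.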

Even if you don't plan to provide documents that require security, you should learn how to implement user-level security. If you offer access to a program and need to know who a user is, you must use authentication to acquire this information.

Currently, password-level authentication is the only available mechanism that can reliably identify a user on a TCP link. When authentication is required, the client displays a user-name/password challenge. The client caches the entry and passes it, in scrambled form, with each URL request sent to the same server (this ensures that the challenge is not repeated for each request). You can then use the authenticated user ID in executing server-side programs.
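To make the "scrambled form" concrete, the Python sketch below assumes the scheme in use is HTTP Basic authentication, in which the user name and password are joined with a colon and base64-encoded. That assumption and the helper names are mine. The point of the sketch is that this encoding is trivially reversible, which is why it offers little real protection.

```python
import base64


def encode_credentials(user, password):
    """Build the credential value a client would send with each request."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_credentials(value):
    """Recover (user, password) on the server side; no secret key is needed."""
    scheme, _, token = value.partition(" ")
    if scheme != "Basic":
        raise ValueError("unsupported authentication scheme")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header_value = encode_credentials("Aladdin", "open sesame")
print("Authorization:", header_value)
print(decode_credentials(header_value))
```

Anyone who can observe the request can run the same decoding step, so treat credentials sent this way as cleartext.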

Design for Sharing and Growth

If anything remains constant, it's the need for change. It's far better to overdesign the directory structure of your server than to underdesign it. Use subdirectories and categorizations freely. Do this job well, and few changes will be required as your server grows. This is especially important for hypertext documents; fixing broken cross-references can be time-consuming.

If your server will be developed and maintained by a number of people in your organization, take this into account. It's unlikely that you'll want to provide update privileges for all files by all users. So, split up your directory trees so they reflect your organization. This should result in maximum productivity while stepping on the fewest number of toes.

Some new conventions support WWW server development, including the concept of using a Webmaster and Docmasters. A Webmaster is responsible for the server's technical administration and overall structure. Docmasters are responsible for formulating and maintaining documents that reside on the server. In a shared environment, establish conventions that match your organization and your server goals.

Once you've built your server, you'll need to populate it with HTML documents and pages. As the listing shows, formatting can be complicated. But a variety of editing and creation tools are available, including SoftQuad's HotMetal (available via anonymous ftp from ftp.ncsa.uiuc.edu:/Mosaic/contrib/SoftQuad), Cyberleaf from Interleaf (Waltham, MA), and AnchorPage from Iconovex (Bloomington, MN). Also, Microsoft recently announced an add-on to Word 6.0 that facilitates the creation of HTML documents.

Whither the Web?

The Web is a rapidly growing virtual document that comprises thousands of hypertext links to documents on sites around the globe. The HTTP server--the Web's underpinning--provides distributed access to documents, data, and programs. HTTP allows efficient sharing of information without the hassles and overhead of distributed databases. It also provides for platform-independent interfaces supported under Windows, the Mac, and Unix.

Even if your organization chooses not to participate in the WWW, you can use Web technologies on your LAN to provide common access to information.


Aliaserv: A Sample HTML Forms Application

Aliaserv.htm 
<html> 
<head> 
<title>Aliaserv</title> 
</head>
<body>
<img src="aliaserv.gif" width="300" height="100" alt=" ">
<h2>Mail Alias Query Facility</h2>

<form action="/cgi-bin/aliaserv" method="POST">

Arguments <input size=30 maxlength=80 name="arguments">
<a href="aliaserv-help.html">
<img src="/Private/Images/Icons/action_help.gif" 
  width="70" height="46" align="middle"
  alt="Help"></a><p>

<dl>
 <dd> <input type="radio" name="query_type" value="expa" checked>
   Expand Alias(es) Matching Argument(s)
 <dl>
  <dd> Select any or all of the following options for Expand Alias
  <dd> <input type="checkbox" name="query_options" value="-F">Follow
    .forward files
  <dd> <input type="checkbox" name="query_options" value="-v">Verbose
    alias expansion tracing
  <dd> <input type="checkbox" name="query_options" value="-V">Verbose
    plus preserve name@localhost aliases
  <dd> <input type="checkbox" name="query_options" value="-l">Long
    form. Repeat alias name and use commas as delimiters.
 </dl>
 <dd> <input type="radio" name="query_type" value="list"> List
    Aliases Matching Argument(s) 
 <dd> <input type="radio" name="query_type" value="user"> List
    Aliases Containing Argument(s)
</dl>

<INPUT TYPE="submit" VALUE="Submit Query">  <INPUT TYPE="reset"
  VALUE="Start Over">
</form>
<a href="../main.html"><img src="/Private/Images/Icons/action_back.gif"
  width="69" height="40" alt="MIS Page"></a> |
<a href="/"><img src="/Private/Images/Icons/action_home.gif"
  width="69" height="40" alt="Home Page"></a>

<h5>Copyright Bob Friesenhahn (<a href="mailto:bfriesen@iphase.com">
  bfriesen@iphase.com</a>),1994</h5><br>
</body>
</html>
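
When the form above is submitted, the browser sends the field values to /cgi-bin/aliaserv as a URL-encoded request body. The server-side program is not shown in this article, so the following Python sketch is only a guess at how such a program might decode that body. The function name decode_form and the sample input are hypothetical. The field names come from the listing.

```python
from urllib.parse import parse_qs


def decode_form(body):
    """Decode a URL-encoded form body into the Aliaserv form's fields."""
    fields = parse_qs(body)
    return {
        # Text field: take the single value, or an empty string if blank.
        "arguments": fields.get("arguments", [""])[0],
        # Radio group: the form marks "expa" as the checked default.
        "query_type": fields.get("query_type", ["expa"])[0],
        # Checkboxes share one name, so several values may arrive.
        "query_options": fields.get("query_options", []),
    }


sample = "arguments=postmaster&query_type=expa&query_options=-F&query_options=-v"
query = decode_form(sample)
print(query)
```

Because the check boxes all use the name query_options, the program must be prepared to receive that field zero, one, or several times.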




Server Directory Tree


This directory configuration is how I chose to structure the server files. Keeping everything together simplifies administration and future expansion.


Aliaserv: A Sample HTML Forms Application: Screen Output


This sample Mosaic forms application supports various queries of the Unix mail alias system.


Bob Friesenhahn, a Dallas-based software engineer, moderates several BIX conferences. He can be reached on the Internet at bfriesen@simple.dallas.tx.us or on BIX as "thefuzz."
