From MAP@nic.dsi.net Sun Aug 28 07:52:00 PDT 1994
Article: 2888 of vmsnet.pdp-11
Path: nntp-server.caltech.edu!news.cerf.net!mvb.saic.com!info-pdp11
From: "Michael A. Patton"
Newsgroups: vmsnet.pdp-11
Subject: Re: fuzzball
Message-ID:
Date: Sat, 27 Aug 1994 23:48:44 EDT
Organization: Info-Pdp11<==>Vmsnet.Pdp-11 Gateway
X-Gateway-Source-Info: Mailing List
Lines: 995

Here's an old message I had around... A PostScript version is available by FTP from louie.udel.edu in the file pub/ntp/doc/fuzz.ps.Z. The main text of the file is attached.

	-MAP

----------------------------------------------------------------

The Fuzzball

David L. Mills
Electrical Engineering Department
University of Delaware

Reprinted from: Mills, D.L. The Fuzzball. Proc. ACM SIGCOMM 88 Symposium (Palo Alto, CA, August 1988), 115-122.

Sponsored by: Defense Advanced Research Projects Agency contract number N00140-87-C-8901 and by National Science Foundation grant number NCR-86-12015.

Abstract

The Fuzzball is an operating system and applications library designed for the PDP11 family of computers. It was intended as a development platform and research pipewrench for the DARPA/NSF Internet, but has occasionally escaped to earn revenue in commercial service. It was designed, implemented and evolved over a seventeen-year era spanning the development of the ARPANET and TCP/IP protocol suites and can today be found at Internet outposts from Hawaii to Italy standing watch for adventurous applications and enduring experiments. This paper describes the Fuzzball and its applications, including a description of its novel congestion avoidance/control and timekeeping mechanisms.

Keywords: protocol testing, network testing, performance evaluation, Internet architecture, TCP/IP protocols, congestion control, internetwork time synchronization.

1. Introduction

The Fuzzball is a software package consisting of a fast, compact operating system, support for the DARPA/NSF Internet architecture [8] and an array of application programs for network protocol development, testing and evaluation. It usually runs on an LSI-11 personal workstation, to which it also lends its name, and functions as a multi-purpose packet switch, gateway and service host. It supports the complete TCP/IP protocol suite, including network, transport and applications-level protocols for virtual-terminal, file-transfer and mail systems with text, voice and image capabilities. It includes a comprehensive user interface based on a multiple-user, virtual-machine emulator for the DEC RT-11 operating system, so that RT-11 program-development utilities and user programs can be run along with network-application programs in an internetworking environment.

It is not clear how or exactly when the Fuzzball came to be called that. Its ancestor was born at the University of Edinburgh in 1971 and evolved as the DCN operating system for a distributed network of PDP11 virtual machines at the University of Maryland in 1975 [MIL75]. Among the lessons learned is the fact that context switching in a virtualized PDP11 architecture is painfully slow, so the DCN system remained a demonstration curio. By 1977 the DCN system was dismantled and rebuilt in a leaner, zippier design and software was developed for the TCP/IP protocol suite. The DCN routing protocol, now called Hellospeak [12], was completely redesigned to link DCN hosts and gateways together and to other networks, including ARPANET, SATNET and various LANs.
At the time there were no full-featured Internet hosts other than DCN based on a microprocessor; so, presumably because nobody knew what DCN stood for, they became affectionately and widely known as Fuzzballs. The Fuzzball has since been in a state of continuous, reckless evolution to support more processor features and peripherals and to support progressively more complex protocol suites and more sophisticated resource-management techniques.

Over the years many network experiments, demonstrations and more durable applications have been mounted on the Fuzzball, a selection of which is described in subsequent sections. Several experiments with advanced-technology networks such as packet radio, packet satellite and various gateway systems have used Fuzzballs as debugging, measurement and evaluation platforms. Various organizations have used Fuzzballs as terminal concentrators, mail and word-processing hosts, network monitoring and control devices and general-purpose packet switches and gateways. Fuzzballs have seen regular service on the INTELPOST and NSFNET Backbone networks and several campus networks at university, government and industrial research laboratories.

Perhaps the most useful and interesting Fuzzball application is as an experimental tool for the development and evaluation of novel network algorithms and protocols. Many Fuzzballs have been deployed for that purpose at various locations in the US and Europe. In the typical case several Fuzzballs are lashed together with Ethernets and makeshift wires, then connected to the Internet via an existing LAN or gateway. Fuzzballs have even been connected via dial-up telephone circuits, amateur packet-radio channels, international undersea cables, satellite terminals and strange statistical multiplexors.

This paper begins with an overfly of the Fuzzball system architecture, including process and memory structure, interprocess communication, and network support. Included are brief descriptions of application software found useful for network testing, debugging and evaluation. The paper then summarizes selected experiments and demonstrations in which Fuzzballs have played a pivotal part and highlights the useful features and lessons learned. It describes how Fuzzballs have been used in operational systems such as the INTELPOST and NSFNET Backbone networks and concludes with a detailed description of two interesting applications, one involving congestion avoidance/control and the other the synchronization of network clocks.

2. System Architecture

The Fuzzball operating system consists of the supervisor and two kinds of processes, supervisor and user. The supervisor consists of the process scheduler and a collection of supervisor primitives (EMT instructions) used by all processes. Supervisor processes are used to support direct-access and terminal devices, as well as network interfaces and protocol multiplexors. User processes provide an emulated environment in which most system and user programs developed for the RT-11 operating system can run unchanged. Some user processes support dedicated functions such as routing daemons, input/output spoolers and generic network servers supporting connection-based (TCP) and connectionless (UDP) services.

A process consists of four virtual-memory segments: the instruction segment, data segment, descriptor segment and parameter segment.
The instruction and data segments for the supervisor and supervisor processes are allocated in separate kernel spaces to provide the maximum memory possible for buffers, control blocks and tables. The instruction and data segments for each user process are allocated in private user spaces, except for the emulator code, which resides in shared segments mapped into user space at addresses usually reserved for the RT-11 resident monitor.

Descriptor segments contain the process stack and memory-management descriptors, as well as data used for process scheduling, interprocess communication and related functions. These segments are allocated in kernel space and not normally accessible to application programs.

Parameter segments contain data used to establish various operational parameters, such as the interrupt vector, disk volume size, device timeout and so forth, as well as various throughput and error counters, depending on process type. These segments are also allocated in kernel space, but mapped into each user process adjacent to the emulator code. This is done in order to facilitate operator monitoring and control of the various processes and their operational parameters.

Virtual windows are used to access device registers and to share buffers and control blocks between processes to avoid time-consuming copy operations. Mutually exclusive access to shared data areas is provided using a set of semaphore queues supported by supervisor primitives. Processes encountering an occupied critical section are blocked and eventually serviced in order of arrival. A fixed number of semaphores is available for specific use, such as update access to the routing data base, volume directories and so forth.

Processes communicate with one another using small (16-byte) interprocess messages in one of four formats. Message passing is supported by supervisor primitives that send a message or wait for one to arrive. Sending never blocks the process, while receiving does, unless a message is already waiting or until a specified timeout expires. Processes can send special messages called asynchronous interrupts, which result from exceptional events and are especially useful for system logging and operator alarms.

The scheduler determines the next process to be run, restores the processor state from the descriptor segment and activates the process. It uses a multi-level priority service discipline with preemptive round-robin (timesliced) service at each priority level. Preemptions occur when a higher-priority process is scheduled due to a message arrival, semaphore unblock, timeslice end or asynchronous event, or when the running process yields voluntarily. Voluntary preemptions result in the process waiting for a message, semaphore unblock or asynchronous event such as a timer interrupt.

The emulator intercepts synchronous interrupts (EMT instructions, etc.) and asynchronous interrupts and presents them within the virtual environment of the process. The user process interface presented by the emulator is a subset of that presented by the resident monitor of the RT-11 system and includes all of the programmed requests necessary to run standard RT-11 system components and user programs.
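
To make the message-passing primitives described above concrete, the following C fragment sketches a 16-byte message queue with a non-blocking send and a blocking receive with timeout. The structure, the names (msg_t, ipc_send, ipc_recv) and the use of POSIX threads are illustrative assumptions for a modern host, not the Fuzzball's kernel-space implementation; the four message formats and asynchronous-interrupt messages are omitted.

    /* Sketch of Fuzzball-style interprocess messages: fixed 16-byte
     * messages, non-blocking send, blocking receive with timeout.
     * Illustrative assumptions only; compile with -lpthread. */
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define MSG_SIZE  16            /* every message is exactly 16 bytes  */
    #define QUEUE_LEN 32            /* per-process message queue capacity */

    typedef struct { unsigned char data[MSG_SIZE]; } msg_t;

    typedef struct {
        msg_t           slots[QUEUE_LEN];
        int             head, count;
        pthread_mutex_t lock;
        pthread_cond_t  nonempty;
    } msg_queue_t;

    /* Send never blocks: the message is queued, or dropped if full. */
    int ipc_send(msg_queue_t *q, const msg_t *m)
    {
        int ok = 0;
        pthread_mutex_lock(&q->lock);
        if (q->count < QUEUE_LEN) {
            q->slots[(q->head + q->count) % QUEUE_LEN] = *m;
            q->count++;
            ok = 1;
            pthread_cond_signal(&q->nonempty);
        }
        pthread_mutex_unlock(&q->lock);
        return ok;
    }

    /* Receive blocks until a message arrives or the timeout (ms) expires. */
    int ipc_recv(msg_queue_t *q, msg_t *out, int timeout_ms)
    {
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec  += timeout_ms / 1000;
        deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec++;
            deadline.tv_nsec -= 1000000000L;
        }

        pthread_mutex_lock(&q->lock);
        while (q->count == 0) {
            if (pthread_cond_timedwait(&q->nonempty, &q->lock, &deadline) != 0)
                break;                      /* timed out */
        }
        int got = (q->count > 0);
        if (got) {
            *out = q->slots[q->head];
            q->head = (q->head + 1) % QUEUE_LEN;
            q->count--;
        }
        pthread_mutex_unlock(&q->lock);
        return got;
    }

    int main(void)
    {
        msg_queue_t q = { .head = 0, .count = 0 };
        pthread_mutex_init(&q.lock, NULL);
        pthread_cond_init(&q.nonempty, NULL);

        msg_t m, r;
        memcpy(m.data, "hello, fuzzball", MSG_SIZE);
        ipc_send(&q, &m);
        if (ipc_recv(&q, &r, 100))
            printf("received: %.15s\n", (char *)r.data);
        return 0;
    }
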
A highly evolved logical clock is an intrinsic feature of the Fuzzball [16]. The logical clock increments at 1000 Hz and is equipped with frequency and phase control mechanisms useful for synchronizing time to other systems with either the Hellospeak routing protocol or the Network Time Protocol mentioned later in this paper. Besides the logical clock, the system provides per-process timers and a full complement of timer and clock primitives.

2.1. Network Software

A pair of supervisor device-driver processes is dedicated to every network interface device, one for input and one for output. A device driver has three levels: a common one responsible for Internet processing, including routing, fragmentation and various error processing; another below it responsible for the particular local-net architecture, such as Ethernet, serial line and so forth; and a third, which amounts to a set of interrupt routines and is responsible for the operation of the particular device. Support for the Hellospeak routing protocol is provided as an optional module that operates at the Internet level. The design is such that packets are transferred from the input device directly to a buffer and then out from the same buffer directly to the output device, without copying. This modular structure allows considerable flexibility, since the modules at the various levels can be swapped as required for the particular local net and device.

The TCP and UDP transport modules are implemented as a dedicated supervisor process, which is identified by an assigned Internet address. All datagrams directed to this address are delivered to this process, which is then responsible for matching to the correct protocol and port, assembling/disassembling the data and delivering it to a user process. A host can have multiple processes of this type in order to support configurations where the host is connected to more than one network interface. Transfers to and from the user processes use a shared virtual window in which data are copied directly between kernel-space buffers and user-space buffers.

Some protocols are supported at the user-process level, including a UDP daemon for file-transfer, name-resolution and time protocols, as well as an EGP daemon. TCP daemons are used for remote spooling, both send and receive, to support standard and multi-media mail systems, as well as remote print, voice and image functions for special devices. TCP services such as the virtual-terminal (TELNET) server, file-transfer (FTP) server and several miscellaneous servers are handled by assigning a user process to the service upon arrival of a connection request and then passing an appropriate command to the application program which controls that process.

2.2. Application Software

Each user has access to one or more user processes and can switch a single terminal from one to the other in a few keystrokes. A conventional command language interpreter (shell) running in each user process is used to initiate application programs, manage debug sessions, get system help and in general run the system. A very useful feature is that one process can be used to debug another while both are running. In addition, input and output streams can be redirected and virtual environments can be altered. A virtual volume feature allows volumes, complete with directory, to be encapsulated as an ordinary file.

Among the often-used application programs supported by connection-oriented transport service (TCP) are the following:

TELNET virtual-terminal protocol client and server programs, which provide user access to all Fuzzball functions, including command, control, monitoring and process/operator intercommunication.
The client program includes features to test the negotiation mechanisms peculiar to that protocol, and to record and play back session text and emulate graphics display devices.

FTP file-transfer protocol client and server programs, which provide the basic means to exchange program and data files between Fuzzballs in both ASCII and binary modes.

Server and utility programs for exchanging mail with other Internet hosts. One of these is based on the ubiquitous (text-only) SMTP mail protocol used throughout the Internet, while the other uses the experimental MPM multi-media mail protocol and supports text, image, graphics and real-time speech modes.

Utility servers for various "little" services, including TCP echo, discard, text generator and various user and system information utilities. In addition, a comprehensive Unix-compatible spooling facility can be used with text, image and speech devices.

Among the often-used application programs supported by connectionless transport service (UDP) are the following:

PING network-level testing utility, which uses ICMP Echo and ICMP Timestamp messages to interrogate remote machines, collect statistics and produce reports (PING stands for Packet Inter-Net Groper).

XNET cross-net debugger, which is used for remote interactive loading, dumping and debugging, primarily for diskless machines.

Generic UDP server, including two name-address translation protocols, two time-service protocols, a file-transfer protocol and a remote-host monitoring protocol.

Utility programs for testing and debugging standard UDP services, including the above, as well as various surveying programs.

3. Fuzzball Applications

In the following sections selected Fuzzball applications are described, ranging from Internet gateway development through transport-level performance improvement to the development of efficient name-address translation mechanisms.

3.1. Internet Gateway Development

One of the earliest applications of Fuzzballs was as an Internet gateway, usually used between an experimental local net and the ARPANET. The first Internet gateways were implemented by Bolt Beranek and Newman (BBN) using PDP11 equipment, the BCPL language and the ELF operating system. Like the DCN system before it, ELF suffered from debilitating context-switching overhead and was limited to about 30 packets per second. The BBN implementation was eventually rehosted to the MOS operating system, which like the Fuzzball was built as a subset of its full-featured ancestor, and recoded in assembly code. This resulted in improved switching rates in the order of 200 packets/second.

Meanwhile, the Fuzzballs were being used as experimental development platforms for the Gateway-Gateway Protocol (GGP) used in the BBN system, and later for the Exterior Gateway Protocol (EGP) [13]. The goal of the experimental effort was to explore data-management techniques, protocol interoperability, routing algorithms and congestion control mechanisms. As the result of this exploration, the maximum switching rate of the Fuzzball (with LSI-11/73 processor) was improved to over 400 packets per second.

The participation of the Fuzzballs in the development of EGP was particularly important. EGP was developed in an atmosphere of controversy and delicate compromise between functionality and robustness. It is intended as both a primitive reachability protocol for use between administrative domains and a firewall to prevent instabilities within one domain from affecting another.
In order to get the compromises just right, extensive analysis, prototyping and refinement cycles were necessary. Most of these involved Fuzzball development networks built on ARPANET paths, which required careful control of the routing environment and interoperation with existing GGP gateways. What made this practical was the easily butchered routing and forwarding mechanisms in the Fuzzball, in which firewall principles could be invented, explored and evaluated quickly and with low effort.

3.2. Internet Performance Issues

Of considerable interest in the early development and prototyping of TCP/IP were the issues of interoperability and performance. As various implementations of the protocol suite matured, in particular those for Multics, TOPS-20, Unix and Fuzzball, distributed testing sessions called bakeoffs were held. With the exception of the Fuzzball, these implementations were intended for long-term operational use, with operational parameters optimized for the traffic flows and rates typical for a multi-user timesharing system. On the other hand, the Fuzzball implementation was specifically designed to explore the parameter space, sometimes in regimes ordinarily considered bizarre. For instance, Fuzzball utility hosts were configured with 1200-bps dial-up lines and used with TCP/IP and TOPS-20 hosts via Fuzzball gateways and the ARPANET for ordinary virtual-terminal, file-transfer and electronic-mail applications. The purpose was not to establish an operational user environment, but to explore the practicality of the TCP/IP protocol suite and the various implementations at the extremes of the operational envelope.

The bakeoffs and other experimental programs demonstrated that TCP/IP could indeed work well with grossly mismatched systems and transmission media; however, the experiments also revealed that catastrophic performance degradation could occur unless careful attention was paid to certain aspects of the protocol implementation. Not surprisingly, the most serious problem was overrunning the gateway between the 56-Kbps ARPANET and the 1200-bps dial-up lines. The most sensitive areas in the implementation included the mechanisms for packetization, retransmission and acknowledgement generation.

In 1981 several experiments were carried out with the goal of exploring techniques to avoid unnecessary packet congestion in the gateways. These fell into three areas: reducing the incidence of small segments, especially in interactive virtual-terminal service, improving the efficiency of remote-echo and acknowledgement strategies and avoiding large pulses of data when large windows are first opened.

Techniques used by the Fuzzballs to reduce the incidence of small segments include send delays, in which data are held and aggregated until a sufficiently large segment has accumulated, aggregated retransmissions, in which data arriving since the first transmission are included in subsequent retransmissions, and adaptive thresholds, which prevent transmission until the TCP window has opened to a respectable size. Techniques used to improve the piggybacking efficiency of acknowledgements include acknowledgement delays, which delay the acknowledgement until the received data have been copied to user buffers and the application response (typically remote-echo data) is available. Many of these techniques were employed in some (but not all) of the implementations at the time, including the Fuzzball, and have been further refined since then.
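
The spirit of the send-delay and adaptive-threshold techniques can be sketched in a few lines of C. The structure, field names and thresholds below are illustrative assumptions rather than the Fuzzball code; the aggregated-retransmission and acknowledgement-delay techniques would hook into the same per-connection state.

    /* Sketch of a send-delay (small-segment avoidance) decision in the
     * spirit of the techniques described above.  Names and thresholds
     * are illustrative assumptions, not the Fuzzball implementation. */
    #include <stdio.h>

    struct send_state {
        unsigned queued;      /* bytes waiting to be sent               */
        unsigned unacked;     /* bytes sent but not yet acknowledged    */
        unsigned window;      /* current usable TCP send window (bytes) */
        unsigned mss;         /* maximum segment size for this path     */
        unsigned delay_ms;    /* how long the oldest queued byte waited */
    };

    /* Return nonzero if a segment should be sent now, zero to keep waiting. */
    static int should_send(const struct send_state *s, unsigned max_delay_ms)
    {
        if (s->queued == 0)
            return 0;                        /* nothing to send              */
        if (s->queued >= s->mss)
            return 1;                        /* a full-sized segment is ready */
        if (s->unacked == 0)
            return 1;                        /* idle connection: no extra delay */
        if (s->window < s->mss / 2)          /* adaptive threshold: window   */
            return 0;                        /* too small for tinygrams      */
        return s->delay_ms >= max_delay_ms;  /* hold briefly, then give up   */
    }

    int main(void)
    {
        struct send_state s = { .queued = 40, .unacked = 512,
                                .window = 4096, .mss = 536, .delay_ms = 10 };
        printf("send now? %s\n", should_send(&s, 200) ? "yes" : "no");
        return 0;
    }
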
However, since the Fuzzball was operating at the extremes of the credible envelope, precise tuning of the dynamics was exceptionally critical, to the point where the parameters had to be adjusted on an adaptive basis, including the retransmission timeouts, packetization delays and acknowledgement delays. A particularly troublesome issue was the determination of retransmission timeout under conditions of moderate segment loss and path delays varying over several orders of magnitude. An extensive survey [12], conducted with the aid of Fuzzballs used as network sounders, revealed that the Internet had path delays and delay dispersions much larger than imagined. As the result of these experiments, several modifications to the suggested TCP retransmission-timeout estimation algorithm were made, including a nonlinear adjustment intended to improve performance under conditions of very high delay dispersion and provision for multiple per-segment range measurements to increase the sample density. In addition, an adaptive backoff algorithm was designed, along with explicit counting of outstanding segments with a rate-sensitive limit. These experiments used digital-analog converters and panel meters to watch critical time-varying variables, which resulted in some memorable demonstrations.

3.3. Applications Development

Several experimental Internet mail systems have come and gone during the lifetime of the Fuzzballs. As each was designed, implemented and evaluated, Fuzzball prototypes were used for test and debug of the protocol and various implementations. Perhaps the most ambitious of these projects involved the DARPA Multi-media Mail Project, in which research groups at SRI, BBN, ISI and Linkabit developed an experimental multi-media mail architecture and presentation protocol with real-time voice, image and text capabilities. The Fuzzball implementation includes special-purpose spooler daemons for digitized speech, facsimile and bitmap graphics, as well as rule-based mail sending, receiving and reading programs. It was used mostly to explore the features of the architecture, develop fast encoding and storage algorithms and verify interoperability of all implementations.

One of the most interesting experiments using the Fuzzballs involved the Domain Name System [17], which is now in widespread use as a host name/address directory service. The system is based on a distributed, hierarchical data base and a set of structured lookup procedures that can be used in connection-oriented or connectionless modes. The Fuzzball implementation includes both a domain-name server, which holds the data base, and a resolver subroutine, which is linked with every application program needing this service. A distinguishing feature of the resolver subroutine, relative to others implemented for Unix, is an amazingly persistent, adaptive search algorithm that painstakingly combs the data base of possibly many other servers in case of incomplete or inconsistent data.

4. Enduring Systems

Over the years Fuzzballs have been involved in enduring demonstrations and networks requiring commercial-grade operation, monitoring and control. Several of these applications, including two out of the three described in the following sections, have provided service to large numbers of users under punishing conditions and have served to evaluate the performance of the Fuzzball implementation itself.

4.1. SATNET Demonstrations

The DARPA Packet Satellite Project was started in 1975 to adapt packet-switching technology to international satellite communications, resulting by 1979 in the Atlantic SATNET system, which spanned the US, United Kingdom and Norway [6]. As part of an intensive measurement and evaluation effort, Fuzzballs were used as traffic generators, statistics collection platforms and general-purpose data reducers [3]. At the 1979 National Computer Conference, Fuzzballs were used in a demonstration of SATNET with real-time packet-voice conferencing, image transmission, and an experimental small-aperture Earth terminal [MIL80].

As additional countries such as Italy and the Federal Republic of Germany joined the SATNET community, the Fuzzball was often used as a vehicle to gain familiarity with the SATNET technology and as a tool for experiments and measurements. At one time or another, the research and defense establishments of every SATNET participant had nests of Fuzzballs spliced to their local-net infrastructure and functioning as gateways, lightweight electronic-mail hosts and experimental platforms.

In order to support these activities, a good deal of ad-hoc experimental software was developed, including data-reduction, graphics-display and format-conversion applications. The multiple virtual-process architecture and RT-11 emulator features of the Fuzzball proved highly useful and suited to this effort. Since the emulated RT-11 environment supports the FORTRAN, BASIC and C programming languages, as well as most RT-11 utility programs and text editors, real-time traffic generators, statistics processors and display generators can be quickly constructed or adapted to particular experiments.

4.2. INTELPOST Network

The INTELPOST system was an electronic-mail network built in 1981 by BBN and COMSAT and operated by the US Postal Service and overseas affiliates. It linked sites in the US and Canada with several cities in Europe and South America and consisted of about a dozen PDP11-class processors with high-speed facsimile scanners and printers interconnected by serial lines with speeds to 56 Kbps. Each processor used the DEC RSX-11 operating system together with the Fuzzball TCP/IP networking software and Hellospeak routing algorithm. The software was adapted from the native Fuzzball code by simply replacing the macro library with another specifically tailored for the RSX-11 environment.

This system is believed to be the first commercial deployment of the TCP/IP protocol suite outside the research community. It operated for several years carrying revenue traffic, but remained largely a pilot project and unknown to most of the public. It was in this environment that early experiments in congestion control were carried out, including those based on flow modulation, which is described in another section. While the system used the standard TCP/IP protocol suite and addressing conventions and an experimental gateway was set up at COMSAT Laboratories, the only traffic ever exchanged between INTELPOST and the DARPA research community was monitoring and debugging packets and an occasional misrouted file.

4.3. NSFNET Backbone

The NSFNET is a loosely organized community of networks funded by the National Science Foundation to support the sharing of national scientific computing resources, data and information [7]. NSFNET consists of a large number of industry and academic campus and experimental networks, many of which are interconnected by a smaller number of regional and consortium networks.

The NSFNET Backbone Network [15], which is the primary means of interconnection between the regional networks, includes switching nodes located at six supercomputer sites and three regional interties. Additional nodes are used for program development and testing, bringing the total to about thirteen. A Backbone node consists of a Digital Equipment Corporation LSI-11/73 system with 512K bytes of memory, dual-diskette drive, Ethernet interface and serial interfaces. One or two low-speed serial-asynchronous interfaces are provided, as well as one to three high-speed serial-synchronous interfaces. All Backbone nodes include crystal-stabilized clock interfaces, while one node (NCAR) is equipped with a WWVB radio time-code receiver providing a network time reference accurate to the order of a millisecond.

The Backbone nodes function as Internet gateways [2] and include all the switching, monitoring, control and error-recovery functions necessary to forward packets from one regional Ethernet to another. Other gateways connect the Ethernets to regional and consortium networks, which in some cases span large regions of the country. Some sites are connected to other backbone networks such as ARPANET and public data networks as well.

In mid-June 1988 there were 5856 hosts on 918 networks interconnected by 236 gateways listed at the Department of Defense Network Information Center alone, which in itself is only a small fraction of the overall Internet. Of the approximately 460 nets now in regular operation, about 130 are connected to the Backbone either directly or indirectly by means of gateways and other regional and consortium networks. The aggregate backbone traffic has doubled over the last nine months to about 71 packets per second, or about 4.4 packets per second per 56-Kbps line.

5. Experiments

Among the many experiments mounted on the Fuzzball, three have been selected as representative of the scope and application of the Fuzzball implementation. The first examines issues in type-of-service routing and queueing, the second involves preemption and congestion management and the third describes an internetwork time service.

5.1. Type-of-Service and Precedence Queueing

The Internet protocol includes provisions for specifying type of service (TOS), precedence and other information useful to improve performance and system efficiency for various classes of traffic. The TOS specification determines whether the service metric is to be optimized on the basis of delay, throughput or reliability and is ordinarily interpreted as affecting the route selection and queueing discipline. The precedence specification determines the priority level used by the queueing discipline.

The Fuzzball routing mechanism includes provisions to determine the route based on TOS. The mechanism simply decodes the eight combinations of three TOS bits and appends the resulting octet to the four-octet Internet address. The routing tables use a mask-and-match scheme where the resulting 40 bits are first filtered by a mask specific to each entry, then matched against an address specific to that entry. The tables are ordered, with possibly several entries for the same address range, but different TOS masks, and with the first match found terminating the search.
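
The mask-and-match lookup lends itself to a compact illustration in C: the 40-bit key (32-bit destination address plus the decoded TOS octet) is held in a 64-bit integer and compared against ordered table entries. The table layout, names and example entries are illustrative assumptions, not the Fuzzball data structures.

    /* Sketch of the mask-and-match route lookup described above: the
     * destination address and an 8-bit TOS octet form a 40-bit key that is
     * masked and compared against ordered entries; the first match wins. */
    #include <stdint.h>
    #include <stdio.h>

    struct route {
        uint64_t mask;      /* which of the 40 key bits this entry examines */
        uint64_t match;     /* value the masked key must equal              */
        int      next_hop;  /* index of the outgoing interface/gateway      */
    };

    /* Build the 40-bit key: address in the high 32 bits, TOS octet below. */
    static uint64_t make_key(uint32_t dst_addr, uint8_t tos)
    {
        return ((uint64_t)dst_addr << 8) | tos;
    }

    /* Ordered search: the first matching entry terminates the search. */
    static int lookup(const struct route *tab, int n, uint64_t key)
    {
        for (int i = 0; i < n; i++)
            if ((key & tab[i].mask) == tab[i].match)
                return tab[i].next_hop;
        return -1;                             /* no route */
    }

    int main(void)
    {
        /* Two entries for a hypothetical class-B net 128.4.0.0: traffic
         * with a (hypothetical) decoded low-delay TOS value of 1 goes out
         * interface 1, everything else for that net goes out interface 2. */
        struct route tab[] = {
            { 0xFFFF0000FFull, ((uint64_t)0x80040000 << 8) | 0x01, 1 },
            { 0xFFFF000000ull, ((uint64_t)0x80040000 << 8) | 0x00, 2 },
        };
        uint64_t key = make_key(0x80040203, 0x01);   /* 128.4.2.3, low delay */
        printf("next hop: %d\n", lookup(tab, 2, key));
        return 0;
    }
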
This scheme allows TOS interpretation to be integrated directly into the routing mechanism and protocols; however, existing routing protocols, including Hellospeak, do not provide means to propagate TOS masks, so entries using this feature have to be manually configured.

As the NSFNET Backbone has reached its capacity, various means have been incorporated to improve interactive service at the possible expense of deferrable (file-transfer and mail) service. An experimental priority-queueing discipline has been established based on the precedence specification. Queues are serviced in order of priority, with FIFO service within each priority level. However, many implementations lack the ability to provide meaningful values and insert them in this field. Accordingly, the Fuzzball cheats gloriously by imputing a precedence value of one in case the field is zero and the datagram belongs to a TCP session involving the virtual-terminal TELNET protocol.

The results of this scheme are mixed, as could be expected. Customers of the NSFNET Backbone were thrilled when TELNET response dramatically improved after the new scheme was installed. However, it sometimes happens that low-priority traffic is continually pushed back in the queue as the result of high-priority arrivals, only to be preempted, to cause a quench message (see below) or simply to time out. On the other hand, if large quantities of deferrable data, such as file-transfers and mail, get pushed back in this way, quench messages tend to concentrate on the originating hosts and help dissipate the load.

5.2. Congestion and Preemption Strategies

In the TCP/IP protocol suite, which is based on end-to-end connectionless service, it is usually assumed that gateways have little state except that induced by the routing algorithm. In particular, the state of the various resources, including processor time, buffer space and queue contents, is generally invisible outside the gateway. This makes congestion avoidance and control very difficult, since the only access a congested gateway has to the state of any virtual flow is the composition of the queues in the gateway itself.

In order to deal with traffic surges, the Internet architecture specifies the ICMP Source Quench message, which is in effect a choke packet sent to the originating host by a downstream gateway when it experiences an overload. Some gateway implementations make an attempt to cope with congestion by emitting a quench when a datagram arrives for a queue whose size exceeds a threshold or when the number of datagrams dropped due to congestion-related reasons exceeds a threshold. Host implementations respond to quenches in various ways: some ignore them; some, including (modified) Berkeley 4.3 Unix and the Fuzzball, reduce the TCP window size by a multiplicative factor, after which each acknowledgment received causes an additive increase; and some, including Cray Research Unix, operate on rate-based principles such as those described in [PRU86].

Experience suggests that such simple mechanisms have only marginal utility and then only for traffic surges where the delay on the path to the originating host is short compared to the period of the traffic surge. In the case of the NSFNET Backbone gateways, which is typical of other Internet gateway systems, congestion is characterized by frequent surges lasting up to a few minutes along with occasional intense bursts lasting only a few seconds.
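
The host-side response described above, a multiplicative window reduction on each Source Quench followed by an additive increase per acknowledgment, can be sketched as follows. The factor, increment and bounds are illustrative assumptions, not the Berkeley, Fuzzball or Cray parameters.

    /* Sketch of a host's Source Quench response as described above: cut
     * the send window multiplicatively on a quench, then grow it
     * additively with each acknowledgment.  Illustrative constants only. */
    #include <stdio.h>

    struct tcb {
        unsigned cwnd;      /* current send window, in bytes               */
        unsigned max_wnd;   /* never grow beyond the peer's offered window */
        unsigned mss;       /* maximum segment size                        */
    };

    static void on_source_quench(struct tcb *t)
    {
        t->cwnd /= 2;                       /* multiplicative decrease   */
        if (t->cwnd < t->mss)
            t->cwnd = t->mss;               /* keep at least one segment */
    }

    static void on_ack(struct tcb *t)
    {
        t->cwnd += t->mss / 8;              /* gentle additive increase */
        if (t->cwnd > t->max_wnd)
            t->cwnd = t->max_wnd;
    }

    int main(void)
    {
        struct tcb t = { .cwnd = 4096, .max_wnd = 8192, .mss = 536 };
        on_source_quench(&t);
        printf("after quench:  %u bytes\n", t.cwnd);
        for (int i = 0; i < 10; i++)
            on_ack(&t);
        printf("after 10 acks: %u bytes\n", t.cwnd);
        return 0;
    }
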
Bursts, often caused by misengineered TCP implementations, can be extremely damaging, since they tend to fill up queues quickly and are resistant to quenches. Upon review of the extensive log information collected by the NSFNET Backbone Fuzzballs, a key fact emerges: most congestion surges and bursts are caused by a relatively small number of originating hosts. This suggests an effective congestion policy should fairly distribute resources such as buffer space on the basis of originating host address.

Congestion causes large queues, large delays, large delay dispersions and invites further abuse by undisciplined host retransmissions, even if heroic buffer space is available [18]. These observations suggest an effective congestion avoidance policy should be based on mean queue size and attempt to fairly equalize the sizes if more than one queue is present. However, when a burst exceeds the buffer space available, the only defense possible is to either drop arriving packets or selectively preempt ones already queued for transmission. This naturally suggests the preemption policy should be an extension of the quench policy and result in preemption of those hosts which do not respond to quench. Furthermore, fairness in the form of equal access to system resources should be observed in order to avoid capture of excessive network resources by reckless hosts [15].

In an experiment designed to evaluate various realizations based on the above policies, the Fuzzball implementation was modified to incorporate comprehensive quench, preemption and service disciplines based on the specifications explicit and implicit in every IP datagram header. The result is a service model based on IP precedence and type-of-service, a fairness model based on IP source address and a congestion-control model based on mean queue size. This architecture can be summarized in the following rules:

1. Customers are identified on the basis of IP source address. Each distinct IP address is associated with a different customer for quench and preemption purposes.

2. Customer service classes are determined on the basis of IP precedence and type-of-service. Queues are serviced in order of class, with all customers of a higher class serviced in FIFO order before any customer of a lower class.

3. Every customer has equal claim to critical system resources, most importantly buffer space. In case of insufficient resources, the quench and preemption mechanisms operate to reduce the allocations of those customers claiming the most resources, so that the available resources will tend to be equally allocated among all customers.

4. The congestion state for each queue is determined by its mean size. Below a selected threshold no quench messages are sent. Above the threshold, quench messages are sent to customers determined by (3) above and at a rate proportional to mean queue size.

5. If upon a new arrival insufficient system resources (e.g. input buffers) can be found for subsequent arrivals, customers determined by (3) above (which may include the new arrival) are preempted from the queues until these resources can be found.

6. In case of ties when the above mechanisms would preempt any one of two or more queues, the queue selected is the one that has gone longest since the last preemption.

In order to implement these mechanisms without significant performance penalty, it is necessary to minimize per-packet processing and minimize queue scanning.

The technique implemented in the Fuzzball was adapted using timestamp mechanisms already in place for other functions. An arriving packet buffer is timestamped and inserted on an output queue determined by the routing algorithm in FIFO order by service class. When the buffer is removed from the queue for output, the difference between the removal time and the timestamp represents the queueing time for the buffer and is used in conjunction with the time-to-live field in the Internet header to determine whether the packet has lived too long in the system and should be destroyed.

The queueing time can also be used to estimate the mean queue size as follows. The queueing times for successive packets are summed over an interval depending on the output line speed or service rate, currently about four times the maximum packet transmission time. At the end of each interval the sum is exponentially averaged into an accumulator with a selected weight, currently one-half, and the accumulator compared against a threshold. If the threshold is exceeded a quench is sent to the originating host with the largest total buffer space on the queue and the accumulator is forced to zero. This scheme is designed to avoid quenching hosts if the mean queue size is small, yet protect against excessive quench rates in case of overload. The averaging interval is chosen to match the expected mean path delay in the Internet, about a couple of seconds. The weight and threshold are chosen to begin triggering quenches when the mean queue size exceeds about one-half and reach maximum frequency (one quench per interval) when the mean queue size approaches two.

As reported previously [15], there is no doubt the selective preemption scheme is highly effective in reducing the impact of bursts due to misengineered host implementations or profound speed mismatch at the entry points to the NSFNET Backbone. Some idea of the effectiveness of the quench scheme is evident from the following: In July 1987, before quench was implemented, the NSFNET Backbone aggregate traffic load per queue ranged from 0.3 to 4.0 packets per second, with a mean of 2.0. At that time the preemption rate was 0.06 percent and the timeout rate 0.03 percent. In March 1988, several months after quench was installed, the load ranged from 2.6 to 9.2 packets per second, with a mean of 4.7. At that time the preemption rate was 0.37, the timeout rate 0.11 and the quench rate 0.27 percent. The traffic on the sixteen 56-Kbps internodal trunks had doubled from 31.5 to over 71 packets per second, but the aggregate loss rate was still well below the system objective of one percent.

Tests with the Cray TCP/IP implementation show that congestion on NSFNET Backbone paths where the flow enters the system via an Ethernet is effectively throttled by the quench mechanism. Tests with a recently improved Unix 4.3bsd implementation show mean queue size reductions of about one-third over the uncontrolled size when quench is used. These data alone are insufficient to estimate the effect of the quench policy.

At this time few definitive conclusions can be reached on the effectiveness of the quench and preemption schemes with respect to the global environment, since they are implemented only in the Fuzzball (although some Fuzzballs are in busy places) and responsive host implementations are far from ubiquitous. While the experiment demonstrated that the schemes can be effective, additional work is necessary to determine the parameters and their effect on the overall system performance.
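
A sketch of the estimator and quench trigger just described is given below: per-packet queueing times are summed over a fixed interval, folded into an exponential average with weight one-half, and compared against a threshold. The names, units and threshold value are illustrative assumptions, not the Fuzzball's kernel timestamp and buffer accounting.

    /* Sketch of the mean-queue-size estimator and quench trigger described
     * above.  Illustrative names, units and constants only. */
    #include <stdio.h>

    struct queue_stats {
        double sum_qtime;   /* queueing time accumulated this interval (s)  */
        double avg;         /* exponentially averaged sum                   */
        double interval;    /* averaging interval (s), ~4 max packet times  */
        double threshold;   /* send a quench when avg exceeds this          */
    };

    /* Called when a packet leaves the queue; qtime = departure - arrival. */
    static void on_departure(struct queue_stats *q, double qtime)
    {
        q->sum_qtime += qtime;
    }

    /* Called at the end of each interval; returns 1 if a quench should be
     * sent to the source holding the most buffer space on this queue. */
    static int on_interval_end(struct queue_stats *q)
    {
        q->avg = 0.5 * q->avg + 0.5 * q->sum_qtime;   /* weight one-half */
        q->sum_qtime = 0.0;
        if (q->avg > q->threshold) {
            q->avg = 0.0;             /* force the accumulator to zero */
            return 1;                 /* caller emits the quench       */
        }
        return 0;
    }

    int main(void)
    {
        /* Mean queue size is roughly the averaged sum divided by the
         * interval; with a 0.1 s interval a threshold of 0.05 corresponds
         * to a mean queue size of about one-half. */
        struct queue_stats q = { 0.0, 0.0, 0.1, 0.05 };
        for (int i = 0; i < 8; i++)
            on_departure(&q, 0.02);   /* eight packets, 20 ms queueing each */
        printf("quench? %s\n", on_interval_end(&q) ? "yes" : "no");
        return 0;
    }
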
Finally, the fact that quenches are generated at intervals depending on mean queue size, not at regular intervals or as the result of new arrivals, suggests that host response mechanisms based on flow estimators may work much better than those based on window-size estimators with these schemes.

5.3. Time Synchronization

Experiments conducted since 1981 have demonstrated that the Fuzzball implementation is ideally suited as an internet time server. The Fuzzball local clock is constructed as a first-order phase-lock loop, in which measured offsets between the local clock and an external reference source are used to adjust the local-clock phase and frequency. The implementation is carefully constructed to minimize sources of jitter, such as interrupt latencies and resource conflicts. The resulting accuracy at the application interface is usually less than a millisecond in phase and less than a part per million in frequency.

A time-synchronization function is built into the DCN routing protocol [12], which uses a variant of the distributed Bellman-Ford algorithm [1]. One or more DCN hosts synchronize to an external reference source, such as a radio clock or time daemon, and the routing protocol constructs a minimum-delay spanning tree rooted on these hosts. The clock offsets along the tree are then computed using timestamps included in routing-update messages. Typical local-clock accuracies using Fuzzballs and the DCN routing protocol connected over LAN paths with serial lines and Ethernets are in the order of a few milliseconds.

In 1985, after an extensive set of experiments and prototype refinement using Fuzzballs at several locations in the US and Europe, an architecture model and protocol based on the DCN design, but suitable for use as a ubiquitous Internet time service, were proposed. The purpose of the protocol, called the Network Time Protocol (NTP), is to connect a number of master clocks, synchronized to national standards by wire or radio, to widely accessible resources such as backbone gateways. These gateways, acting as primary time servers, use NTP between them to cross-check the clocks and mitigate errors due to equipment or propagation failures. While multiple primary servers may exist, there is no requirement for an election protocol, such as is used in some Unix systems [5].

In order to reduce the protocol overhead, some number of local-net hosts or gateways, acting as secondary time servers, can run NTP with one or more primary servers, then redistribute time to the remaining local-net hosts using NTP, DCN or some other protocol such as those described in [9] and [5]. In the interest of reliability, selected local-net hosts can be equipped with less accurate but less expensive backup clocks and used in case of failure of the primary and/or secondary servers or communication paths between them.

The Fuzzball implementation includes algorithms for deglitching and smoothing clock-offset samples collected on a continuous basis, as well as algorithms for selecting good clocks from a population possibly including broken ones. These algorithms were evolved under typical operating conditions over the last two years. A comprehensive description of the NTP architecture, protocol and algorithms is given in [16]. Timekeeping accuracies achieved with the Fuzzball implementation on typical Internet paths are in the range of a few tens of milliseconds.
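
The discipline can be illustrated with a toy model of a 1000-Hz logical clock steered by a phase-lock loop: each measured offset against a reference nudges both the clock phase and its frequency correction. The loop gains, names and simulation parameters are illustrative assumptions, not the Fuzzball or NTP constants.

    /* Toy model of a disciplined 1000-Hz logical clock in the spirit of the
     * description above.  Illustrative gains and names only. */
    #include <stdio.h>

    #define TICK 0.001                  /* 1000-Hz logical-clock tick (s) */

    struct logical_clock {
        double time;                    /* current clock reading (s)            */
        double freq_corr;               /* frequency correction (dimensionless) */
    };

    /* Advance the clock one tick; osc_error models the raw oscillator error. */
    static void clock_tick(struct logical_clock *c, double osc_error)
    {
        c->time += TICK * (1.0 + osc_error + c->freq_corr);
    }

    /* Fold a measured offset (reference minus local) into phase and frequency. */
    static void clock_update(struct logical_clock *c, double offset)
    {
        c->time      += offset / 2.0;   /* correct half the phase error     */
        c->freq_corr += offset / 8.0;   /* and nudge the frequency estimate */
    }

    int main(void)
    {
        struct logical_clock c = { 0.0, 0.0 };
        double truth = 0.0, drift = 100e-6;   /* oscillator runs 100 ppm fast */

        for (int sec = 1; sec <= 30; sec++) {
            for (int t = 0; t < 1000; t++) {  /* one second of 1000-Hz ticks */
                truth += TICK;
                clock_tick(&c, drift);
            }
            clock_update(&c, truth - c.time); /* once-a-second measurement */
            if (sec % 5 == 0)
                printf("t=%2d s  offset=%+9.6f s  freq_corr=%+7.1f ppm\n",
                       sec, truth - c.time, c.freq_corr * 1e6);
        }
        return 0;
    }

Run over the simulated half minute, the residual offset shrinks toward zero while the frequency correction converges toward the (assumed) -100 ppm oscillator error, which is the qualitative behavior described above.
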
Figure 1 shows the NTP clock offsets (absolute) measured between Fuzzball primary servers at the University of Delaware and University of Maryland over a period of about a week. Performance data collected with the primary servers show that this accuracy can be reliably maintained throughout most portions of the Internet, even in cases of failure or disruption of clocks, servers or communication paths. There are presently six Fuzzball primary time servers located on the east coast, west coast and midcontinent, each of which serves 20-40 Fuzzball and Unix secondary time servers in the US and Europe. A survey reported in [16] suggests there may be well over 2000 potential clients in the ARPANET/MILNET community, plus probably several thousand more in the NSFNET community.

6. Parting Shots

During the past decade when the Fuzzball has been most useful, it represented a relatively inexpensive way to pry into network technology, prototype new architectures and protocols and in general fan the fuzes of the exploding Internet. As the cost of sophisticated workstations has plummeted in recent years, the LSI-11 technology itself is no longer economically attractive. The software, having been lashed together by several generations of programmers and in support of novel applications and ad-hoc experiments lasting over a decade, is not readily portable. For a while, at least, Fuzzballs may lurk in dark corners of various laboratories in the hulks of otherwise outmoded PDP11 and LSI-11 systems and occasionally barge out to resolve routing problems and rescue wandering packets. In the long term Fuzzballs may endure as time servers and ad-hoc intelligence platforms, for which they are well suited.

A new generation of Fuzzballs may in fact already have arrived in the form of recent Unix-based workstations, which are fast becoming as ubiquitous as the common video terminal. However, the casual abuse heaped on the Fuzzball operating system is much harder to practice on these workstations, since much of the code is licensed and sources are not commonly available. In addition, the Unix kernel is considerably less tractable than the Fuzzball and much of the networking code is not easily modifiable. Nevertheless, the Unix programming support is far superior to that of the Fuzzball and much more portable. The ideal next-generation Fuzzball would be programmed in C, support the Unix run-time environment, TCP/IP and ISO protocol suites and contain no licensed code.

7. References

1. Bertsekas, D., and R. Gallager. Data Networks. Prentice-Hall, Englewood Cliffs, NJ, 1987.
2. Braden, R. Requirements for Internet gateways. DARPA Network Working Group Report RFC-1009, USC Information Sciences Institute, June 1987.
3. Chu, W.W., D.L. Mills, et al. Experimental results on the packet satellite network. Proc. National Telecommunications Conference (November 1979), 45.4.1-45.4.12.
4. Defense Communications Agency. DDN Protocol Handbook. NIC-50004, NIC-50005, NIC-50006 (three volumes), SRI International, December 1985.
5. Gusella, R., and S. Zatti. The Berkeley UNIX 4.3BSD time synchronization protocol: protocol specification. Technical Report UCB/CSD 85/250, University of California, Berkeley, June 1985.
6. Jacobs, I.M., R. Binder, and E.V. Hoversten. General purpose packet satellite networks. Proc. IEEE 66, 11 (November 1978), 1448-1467.
7. Jennings, D.M., L.H. Landweber, I.H. Fuchs, D.J. Farber and W.R. Adrion. Computer networks for scientists. Science 231 (28 February 1986), 943-950.
8. Leiner, B., J. Postel, R. Cole and D. Mills.
The DARPA Internet protocol suite. Proc. INFOCOM 85 (Washington, DC, March 1985). Also in: IEEE Communications Magazine (March 1985).
9. Marzullo, K., and S. Owicki. Maintaining the time in a distributed system. ACM Operating Systems Review 19, 3 (July 1985), 44-54.
10. Mills, D.L. An overview of the Distributed Computer Network. Proc. AFIPS 1976 National Computer Conference (New York, NY, June 1976).
11. Mills, D.L. Internetworking and the Atlantic SATNET. Proc. National Electronics Conference (October 1981), 378-383.
12. Mills, D.L. DCN local-network protocols. DARPA Network Working Group Report RFC-891, M/A-COM Linkabit, December 1983.
13. Mills, D.L. Exterior Gateway Protocol formal specification. DARPA Network Working Group Report RFC-904, M/A-COM Linkabit, April 1984.
14. Mills, D.L. Internet Delay Experiments. DARPA Network Working Group Report RFC-889, M/A-COM Linkabit, December 1983.
15. Mills, D.L., and H. Braun. The NSFNET Backbone Network. Proc. ACM SIGCOMM 87 Symposium (Stoweflake, VT, August 1987), 191-196.
16. Mills, D.L. Network Time Protocol (Version 1) specification and implementation. Electrical Engineering Department Report 88-04-01, University of Delaware, April 1988.
17. Mockapetris, P. Domain names - implementation and specification. DARPA Network Working Group Report RFC-1035, USC Information Sciences Institute, November 1987.
18. Nagle, J. On packet switches with infinite storage. DARPA Network Working Group Report RFC-970, Ford Aerospace, December 1985.
19. Prue, W., and J. Postel. Something a host could do with source quench: the source quench introduced delay (SQuID). DARPA Network Working Group Report RFC-1016, USC Information Sciences Institute, July 1987.