The following paper was originally presented at the Ninth System Administration Conference (LISA '95), Monterey, California, September 18-22, 1995, and was published by the USENIX Association in the proceedings of that conference.

How to Upgrade 1500 Workstations on Saturday, and Still Have Time to Mow the Yard on Sunday

Michael E. Shaddock, Michael C. Mitchell, and Helen E. Harrison - SAS Institute Inc.

ABSTRACT

Imagine: It's Saturday afternoon. You run a script, watch it for a while, then go home. When you come back the next day, 1500 workstations and fileservers have new operating systems installed, complete with all your local customizations, with the user data on each one undisturbed and without leaving your office. On December 17, 1994, we did just that.

This paper will describe the infrastructure that allows us to perform completely automated updates of a large distributed network of HP UNIX computers. First, we will describe the policies we designed for distributed systems administration. Next, we will describe the tools we developed or collected to implement these policies, and how we put them together to do an upgrade. Throughout, we will explain the philosophy behind it all and how our particular implementation could be generalized to other sites. Finally, we will describe some of the lessons learned along the way.

Support Philosophy and Design Goals

In order to support a large number of workstations and fileservers with a small number of system administrators, we decided very early in the design phase of our network to do everything possible to make all of the machines look the same, while still allowing for per-host tailoring. This goal was helped considerably by the fact that our network consists entirely of Hewlett-Packard 9000/700 series workstations and fileservers.

Our second design goal was to modify no more system files than necessary. This allows us to move from one operating system version to another without having to track down a large number of locally changed system files in each version.

Our basic philosophy is that network services should be centrally administered, and should be replicated and distributed. AFS replicated volumes and BIND, the Berkeley Internet Name Daemon, are excellent examples of how we wanted to do things. Each uses a master copy, which is then replicated to multiple distributed service providers. If any one service provider becomes unavailable, the service requesters automatically shift to another. We wanted both our day-to-day systems and our support infrastructure to follow this paradigm.

In addition to using replicated, distributed services, we also try to have each workstation use the service provider that is ``closest'' to it on the network. Our network is heavily subnetted, and we want to reduce inter-subnet traffic and network latency.

System Design

Our standard system configuration is an HP700 with two internal disks.
We named the two internal disks / and /local. The root disk, /, is virtually identical on every workstation and fileserver (except for licensed software differences, which are managed by software distribution tools), and is where the operating system binaries, such as /bin and /usr/bin, reside. The /local disk is used for data that is machine specific. This machine-specific data includes user home directories, backup tables, the workstation's AFS cache, and other local configuration files.

As a result of this design we were able to set up a disk ``cloning'' system. If a workstation loses its system disk, we are able to replace it quickly and easily. If a /local disk breaks, we have everything we need on the system disk to bring the system up to the point of being able to restore the necessary data on /local.

Since almost all of the machine-specific data is located in /local, and since all of the system disks are virtually identical, the amount of data that we actually need to back up is greatly reduced. There are still a few files in / that need to be backed up, but this number is very small. If a system has more than the standard two internal disk drives, the additional drives are typically mounted as subdirectories of /local. Since our backup software traverses directory trees in a manner similar to tar(1), mounting a new disk under /local automatically adds it to our list of things to be backed up.

HP-UX for HP 700 series machines does not support disk partitioning, so we were limited to using multiple disks in order to segregate system data from the data specific to that machine. One could, however, achieve similar results by using physical disk partitioning on other systems which do support it.

Tools

Hostclasses

Hostclasses provide a way to use a symbolic name to define a set of machines, and to perform set operations upon those sets. They were initially designed and implemented at MCNC[1]. The initial implementation read the hostclass information directly from files in a known location. We have modified hostclasses to use a client/server approach, which includes multiple replicated servers, in keeping with our overall philosophy. Hostclasses can be incorporated into applications either through a user-level program or a set of C library functions.

Hostclasses can be used for myriad applications. For example, we have one hostclass called loc.DC, which defines all of our machines in our main Data Center. We also have a hostclass called appl.AFS, which lists all of our machines which are AFS fileservers. The intersection of these two hostclasses,

    =loc.DC % =appl.AFS

lists all of the machines in our main Data Center which are AFS fileservers. One of our most common uses for hostclasses is to list which machines have which extra products installed, such as the ANSI C compiler or Japanese NLIO support. Hostclasses allow for centralized list management independent of any individual application. A hostclass is similar to a netgroup(4), except that hostclasses are designed to be used in a more general way.

Sasify

Our primary software configuration management tool is called sasify (formerly called doit). It uses a central database called an action file, which contains a list of actions to apply to a host, and applies them. These actions can include downloading and installing a new kernel, deleting files, installing patches, etc. There is a file stored on each machine that defines the current sasify level.
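The exact format of this per-machine level file is not important to the rest of the paper, but a small, hedged illustration may help. In the sketch below, the path /etc/sasify.level, the single-integer format, and the target level are all assumptions made for the example, not the real implementation; the idea is simply that a boot-time script can compare the host's recorded level with the level called for in the central action file:

    #!/bin/ksh
    # Illustrative sketch only: the file name, format, and target level are
    # assumptions, not the actual sasify implementation.
    current=$(cat /etc/sasify.level 2>/dev/null || echo 0)   # level this host is at
    target=42                                                # level the action file calls for
    if [ "$current" -lt "$target" ]; then
        echo "$(hostname): at level $current, need actions for levels $((current + 1)) through $target"
    else
        echo "$(hostname): already at level $current, nothing to do"
    fi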
When a system reboots, it performs its normal startup procedures, then downloads a copy of sasify from one of a set of known servers, and runs the downloaded sasify. Sasify checks to see what the current sasify level is, downloads a copy of the action file that pertains to the local host, and performs any actions necessary to update the system to a new level. There are also ways to specify actions that should always be run before and/or after any level-specific actions.

Sasify uses hostclasses to determine which actions at a specific sasify level are to be run on which hosts. For example, your first level-specific action might be ``for all machines that do not run X.25, download version 9.77 of /hp-ux.'' Your second action would probably be ``for all machines that run X.25, download version 9.77_x25 of /hp-ux.'' In the first instance, we would use the hostclass expression

    =sasify.HP700 - =appl.X25

whereas in the second instance we would only need to specify

    =appl.X25

Following our support paradigm, sasify keeps a single centralized ``database'' of all of the actions necessary for all of our HP 700 series machines. After a new version of the action file is installed, it is replicated to our sasify database servers. When sasify downloads a new copy of the action file, it actually gets it from one of the five database servers. In addition to the action file being replicated from a central location, all of the data that sasify downloads is also replicated from a central location. Sasify then picks the ``closest'' of the data replicas. If it cannot reach the data replica that it prefers, it will randomly pick another of the replicas.

Sasify can be used to maintain multiple configurations. It uses hostclasses to determine which configuration a machine should use, so it is not limited to supporting just one type of machine architecture. When we were designing sasify, we knew that we would eventually want to use it to maintain systems other than our main HP network, and designed it accordingly. We currently support four distinct HP configuration models and are working on extending the sasify database to include the Suns that we support as well.

Since hostclasses and sasify were presented at LISA VI in 1992[2], we will not go into more detail about their internal workings. A number of additional software maintenance and distribution solutions have been developed at other sites. These include package[3], depot[4, 5], and config[6], each designed to solve similar problems with a different feature set. Most of these other packages specialize in tracking local software updates. We designed our package so that it would not only enable us to download new software versions to our workstations, but would also provide methods for adding and deleting files, and for running arbitrary programs. If you have not already adopted a formal software distribution system, there is a good chance that you will find one already written which will meet your needs.

Getticket

Experience has shown that even though we have replicated most of our services, there are still times when we want to control exactly how many systems are accessing a service simultaneously. In addition, there are occasionally some services which are inherently difficult to replicate, perhaps because of licensing restrictions, or which generally receive only incidental use but may be accessed more frequently during an upgrade.
For example, all of our extra HP products, such as the ANSI C compiler, are loaded from a single location and are accessed only when a system disk is replaced or a product is installed on a new machine. During a full upgrade, however, we reload this software on each machine which is licensed for it.

To provide controlled access to these services, we wrote getticket and ticketd. The client program, getticket, queries ticketd for a ticket to a particular service. Ticketd knows which hosts provide which service, and how many tickets are available for that service on each host. It then hands out tickets to this service in a round-robin fashion to distribute the load between the service providers. The ticket that is given out has the name of the service provider embedded in it, so any scripts that we write do not have to know anything about who the service providers are, or how many of them there are. Getticket is also used to return tickets to ticketd once they are no longer needed.

The ticketd program handles the tickets for multiple services simultaneously. Each service is defined in a tickettab file. The tickettab file lists the service name, the ticket lifetime, and one or more service provider names, each followed by a ticket count for that provider. Each service provider can have a different number of tickets which it contributes to the pool of tickets for that service. The tickettab file for the hp-ux_patches service might look like:

    service hp-ux_patches 14400 milton 8 hasbro 4 tonka 16

This specifies that for the service hp-ux_patches, all tickets will time out after 14400 seconds (four hours). There are a total of 28 tickets in the service pool: 8 from milton, 4 from hasbro, and 16 from tonka.

The ticketd program builds a ticket out of the service provider name, an underscore, and an internally generated number (the seek offset into a file). A ticket from the hp-ux_patches service might be "milton_080". The entire string is returned by the getticket program, so the service provider section can be pulled out of the ticket with the Korn shell syntax "${ticket%_*}", assuming the ticket is in the variable "$ticket" (and the hostnames do not contain underscores). The command

    echo $ticket | awk -F_ '{ print $1 }'

is another way to get the service provider part of the ticket.

While getticket was designed originally to manage access to services on specific hosts, it can be used more generally. The service provider field can be any string; it does not have to be a hostname. This feature allows the getticket program to be used as a simple licensing agent. For instance, suppose you have a 20-user license for the image viewer xv. The tickettab file might look like this:

    service xv 1209600 xv 20

You could then use a simple wrapper program that uses the getticket library routines to get an xv ticket, runs the real xv program (which has been hidden away), then returns the ticket. This is a handy way of complying with licensing restrictions on programs which do not support license management.

Netdistd, Update, and Filesetload

HP provides two programs with HP-UX that are used for distributing software across a network. The first of these, netdistd, is the distribution server. A netdist area includes software subsets, which HP refers to as filesets, and patches. The other program is update, which communicates with a netdistd to download filesets to the local machine. Update connects either to a single default netdist server or to an alternate server specified on the command line.
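The ticket-handling pattern that our wrapper scripts follow is straightforward. The following is a hedged Korn shell sketch of a client that obtains a ticket, extracts the service provider, uses it, and then returns the ticket; the getticket command-line syntax shown here is an assumption for illustration, and only the "${ticket%_*}" idiom comes from the description above:

    #!/bin/ksh
    # Hedged sketch: request a ticket, use the provider it names, return it.
    # The getticket options are assumptions; only ${ticket%_*} is from the paper.
    service=hp-ux_patches
    ticket=$(getticket "$service") || exit 1   # e.g., "milton_080"
    provider=${ticket%_*}                      # strip "_080", leaving "milton"

    # ... pull the needed data from $provider here ...

    getticket -r "$service" "$ticket"          # give the ticket back when done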
In order to make HP's update system fit our support paradigm, we wrote a wrapper program called filesetload. Filesetload is told which service to use and which filesets to load. Filesetload then checks to see if any of the specified filesets need to be installed; if so, it uses getticket to get a ticket to the specified service, runs update to download and install the filesets, uses getticket to return the ticket, and then sends the log from the update to a mailing list of system administrators. Since filesetload is run every time a machine reboots, an unexpected benefit is that whenever a system disk is replaced, any additional licensed products are automatically reinstalled.

Sortaddrs

During an operating system upgrade, there are many machines rebooting simultaneously, each downloading a large amount of data. To prevent network bottlenecks it is helpful to balance the network load. Sortaddrs was written to address this problem. Sortaddrs takes a list of hostnames, sorts them by subnet address, and prints out the sorted list. The list is ordered so that the first entry is from subnet A, the second is from subnet B, and so on, until we cycle back to subnet A.

Putting It All Together

HP-UX Recovery System

    Size (KB)  Program                 Description
        1      /etc/disktab            table of disk-drive geometries, used by 'newfs'
      131      /etc/fsck               make sure the filesystems are OK
      164      /etc/init               runs the /etc/rc actions
        1      /etc/inittab            tells /etc/init what to do
       16      /etc/newfs              initializes the filesystem
      119      /etc/mkboot             installs the bootstrap program
       57      /etc/mkfs               makes a new filesystem, called by 'newfs'
       20      /etc/mknod              makes a 'special' device
       12      /etc/mount              mounts a file system
        1      /etc/rc                 startup script
       12      /etc/reboot             reboots the system
       65      /etc/restore            restores a dump image
       20      /etc/umount             unmounts a file system
       25      /bin/chmod              change the protection modes of a file
       20      /bin/chown              change the owner of a file
       49      /bin/cpio               file archiver
       16      /bin/date               set/display the time
       90      /bin/dd                 changes blocking factor of data
       98      /bin/gzip               compress/decompress program
       16      /bin/ln                 links two files together
      172      /bin/ls                 get a directory listing
       86      /bin/mkdir              make a directory
       94      /bin/rm                 removes a file
       32      /bin/sed                stream editor
      262      /bin/sh                 command interpreter
       29      /bin/stty               set terminal characteristics
       12      /bin/sync               updates super-block
       70      /lib/dld.sl             dynamic loader
      856      /lib/libc.sl            shared C library
      317      /usr/lib/uxboot.700.gz  compressed bootstrap program, used by 'mkboot'

    Figure 1: Included Programs

HP, like most UNIX vendors, provides tools to build a memory-resident operating system and filesystem. They refer to this as a recovery system. The typical use of a recovery system is to create a tape to be booted when a system suffers a catastrophic failure of its system drives. We have used these tools to build a custom version of the recovery system with our support tools installed, which is used only during an operating system upgrade.

During an upgrade of this type, sasify installs some support files in /local, downloads the recovery system as /hp-ux, and then reboots. When the system boots, it is running our recovery system, which does not access the internal disks. We then mount these disks under temporary names, copy any ``precious'' data from / to /local, unmount /, newfs the / disk, download a dump(8) image of the standard system disk, and restore it as /. Next we copy the previously saved precious data from /local back to /, and reboot again. At this point, sasify picks up where it left off and continues with any remaining updates.
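As a hedged outline of what the recovery-time script does at this point, the sketch below walks through the same steps in shell form. The device file names, temporary mount points, disktab entry, and the fetch_image command used to pull the dump image are all placeholders and assumptions for illustration; the real script also verifies the image with sum and reports unrecoverable failures as described below.

    #!/bin/sh
    # Hedged sketch of the rebuild performed under the recovery system.
    # Device names, mount points, the disktab entry, and fetch_image are
    # illustrative placeholders, not the real implementation.
    mkdir /sys /loc
    mount /dev/dsk/c201d6s0 /sys                # the old system disk, temporary name
    mount /dev/dsk/c201d5s0 /loc                # the /local disk

    # save the ``precious'' per-host files from / onto /local
    # (precious.list holds relative pathnames generated earlier with find)
    (cd /sys && cpio -pdm /loc/precious < /loc/recover/precious.list)

    umount /sys
    newfs /dev/rdsk/c201d6s0 hp2213             # rebuild /; disktab entry is assumed
    mount /dev/dsk/c201d6s0 /sys

    # pull the compressed dump image of the standard system disk from the
    # closest replica and restore it as the new /
    (cd /sys && fetch_image | gzip -d | restore rf -)

    # put the precious files back, then reboot into the new system
    (cd /loc/precious && cpio -pdm /sys < /loc/recover/precious.list)
    reboot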
Since the recovery system has a limited size, we had to write smaller versions of some of the standard system utilities. For example, mount(8) takes up 180 KB of disk space. Our ``expert friendly'' version of mount(8), which does no significant error checking but which is sufficient for our use while running the recovery system, uses only 12 KB. We were able to realize similar savings with several other programs that we needed on our recovery system.

Since the HP recovery system uses a memory-resident file system, all of the programs necessary for the recovery operation (shell, mount, restore, cpio, ...) are contained in the data segment of the kernel image. We had to choose very carefully what would go into the recovery image, because the HP boot ROM would not load a kernel bigger than about 6 MB. We also wanted the recovery image to be as useful as possible, so we included a few things we could have copied to the /local disk instead of leaving memory resident. We pared down the size of the executables wherever possible, by writing our own simplified versions of reboot and mount, by using gzip to compress the bootstrap loader (uxbootlf.700) installed on the system disk by mkboot, and by stripping the symbol table off anything that had one. Figure 1 shows a list of all the programs we included and their sizes, in KB. We could save some space by including the /etc/unlink program instead of rm, but we would also have to write our own rmdir program. The ls program is an extravagance; we would like to find something smaller, but ``echo *'' is not as easy to use. We would also like to include tar, but it weighed in at 200 KB. For our purposes, cpio at 49 KB does just as well.

The /etc/rc script used by the recovery system first tries to mount the /local disk. If that succeeds, it executes a shell script placed on the /local disk (/local/recover/recover) by sasify. If the mount fails or the shell script is not found, it prints a message on the console and starts a shell. This lets us use the same recovery system for system updates and for emergencies.

We use sasify to load into the /local/recover directory any programs needed to finish the update. Currently that includes find, hostname, ifconfig, sum, telnet, and a few others. Find is used to generate a list of filenames to save before the update. Ifconfig and hostname are used to set up the networking so that the dump image and checksum file can be pulled across the network. Sum is used to verify that no errors occurred in the transfer of the dump image. Telnet is used to send a last-gasp error message when a failure occurs from which we cannot recover: we use telnet to connect to the SMTP port of our mail gateway and send it a hand-crafted SMTP message.

How Many, How Quickly

As previously mentioned, during our testing phase we can determine how long the update process takes. We also time how long certain phases of the update process take. We can use these timings, along with our knowledge of how many service providers we have and how many simultaneous connections the service providers can support, to calculate how quickly we can reboot the machines. We also know the total time we want the update to take. If our reboot rate cannot get all of the machines updated in the time frame that we want, then we know we will need to adjust the number of service providers where possible.
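The arithmetic behind this calculation is simple enough to keep in a small script next to the reboot driver. The sketch below is illustrative only; the variables hold the figures used in the worked example that follows, but in practice they would come from the timings measured during testing.

    #!/bin/ksh
    # Illustrative sketch: derive the minimum reboot interval from measured
    # update times.  The values are the figures used in the example below.
    machines=1500      # total machines to update
    minutes=30         # measured update time per machine
    conns=5            # simultaneous connections each update server can handle
    servers=10         # number of update servers

    total=$(( (machines * minutes) / (conns * servers) ))   # total update time, minutes
    interval=$(( (total * 60) / machines ))                 # seconds between reboots
    echo "update takes about $total minutes; reboot one machine every $interval seconds"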
For example, let's say that a workstation takes 30 minutes for a complete update, we do not want any more than 5 machines talking to a single update server at the same time, and we have 10 update servers. Since we have 1500 workstations, it will take at least (1500 * 30) / (5 * 10) minutes, or 15 hours, to complete the update. The workstations should be rebooted (15 * 60 * 60) / 1500, or 36 seconds, apart. This gives an upper bound for the reboot interval.

In most circumstances we can reboot the machines much faster. Only a portion of the 30 minutes it takes to update a machine needs to be rate-limited to 50 machines at once. A fair amount of time is taken by checking disk consistency, mounting disks, enabling the network, and other local processing. The only part that has to be rate-limited is the section that downloads data from the sasify and netdist servers. If the average machine spends only 15 of the 30 minutes downloading data, we can reboot a machine every 18 seconds instead of every 36 seconds. The entire update would then take about 8 hours instead of 15.

There is a danger in updating machines this quickly. Since it takes 30 minutes for one machine to finish, and machines are updating every 18 seconds, a mistake in the update procedure could affect (30 * 60) / 18, or 100, machines before anyone notices the failure of the first workstation.

Doing the Upgrade

Since all our workstations run sasify as part of their normal boot-up processing, and since sasify is doing the updating, all we have to do to trigger an update is reboot a machine. To facilitate rebooting 1500 machines we wrote a Korn shell script that reads a list of hostnames and a time delay value. It reboots each machine in the list and waits the specified number of seconds before rebooting the next machine in the list.

The first version of the script used a passed-in value as the time delay parameter. We soon realized that we needed a way to change the delay parameter while the updates were in progress. If you miscalculate how quickly to reboot machines, the servers can become overloaded with update requests. In the calculations above we guessed that a server could handle five connections simultaneously. What happens if the servers can only handle four? We wanted a knob we could turn to speed up or slow down the reboots. We added this knob by having the reboot script read a file containing the delay time every time it was about to delay; we now pass in a file name instead of a delay time.

When we start upgrading the workstations we try to use a larger delay than is really necessary. We then watch the server load as the workstations start updating. As the workstations complete their upgrades successfully we decrease the time delay in steps, thus rebooting machines more quickly, until the calculated frequency is reached.

Upgrading Servers

One tricky problem in an automated procedure of this type is upgrading your upgrade servers. Special care is needed here, since some of these machines also run the various servers needed for doing the upgrades. We have managed to segregate the various servers well enough that we now do our global reboots in waves. First, we reboot our main netdist server. Then, we reboot the ``database'' servers one at a time. The database servers run named, the various AFS database processes, the hostclass servers, and the librarian services for sasify. Then we reboot half of the replica servers.
These servers act as AFS replica servers, as netdist servers for HP patches, and as the download point for sasify data. Next we reboot the other half of the replica servers. At this point, all remaining workstations and fileservers can be rebooted.

Lessons Learned

Replicate, Replicate, Replicate

We discovered that it is important to replicate as many of your services as possible. This improves reliability, provides for load balancing, and allows for improved throughput for large-scale operations such as software updates.

Cleanly Segregate Functions

Try to segregate different functions onto appropriate servers as much as possible. At one point, we were running AFS database servers on one set of machines, named on another set, and sasify librarian and data servers on a third set. In that configuration, trying to determine the reboot order was a nightmare. We eventually realized that named, the AFS database servers, and the librarian services all depended on each other. We relocated them onto the same set of machines, reducing the complexity of the problem significantly. Additionally, once we instituted the AFS replica servers, the reboot sequence became obvious.

Updating an Update Server

After a reboot, an average system will start its standard processes, run sasify, and then start its individual local processes. This means that on the netdist server machines we need to start any netdist servers before we start sasify. In addition, we modified ticketd and sasify to be smart about picking servers. If machine A asks for a ticket to a service that machine A provides, ticketd returns a ticket for machine A instead of whatever ticket was next available in the round-robin queue. We made similar changes to sasify.

Centrally Administer Replicated Services

Central administration of our replicated services has made it much easier for us to maintain all of the necessary configurations. While some sites, particularly universities, do not have this luxury, its benefits are still worth acknowledging.

Do Less, More Often

We discovered that it is easier and less risky to apply a few changes once a month than to apply a large number of changes a few times a year. This should be self-evident, but it did take us some time to realize it. It also allows us to track patches from HP more closely than we previously could.

Since we run an operation that is expected to be available 24 hours a day, 7 days a week, our upper-level management had to be convinced each time we needed to do an upgrade that, overall, it was worth the downtime. Scheduled downtime only occurred 2-3 times per year, with many changes occurring at each of these outages. We were able to convince our management that the chance of a major error being made while making only a few changes was much smaller than if many changes were being made. We now schedule our downtime on a highly predictable monthly schedule, with specific dates announced months in advance. This allows our product developers to schedule releases, regression tests, etc., without being surprised by scheduled downtime. In addition, we consolidate hardware changes to coincide with these scheduled maintenance times, which has reduced the need for incidental downtime at other times. We are currently updating 1800 workstations each month.

Testing

When upgrading over 1500 workstations, a widespread failure can be particularly catastrophic and take a long time to fix. In this environment, carefully controlled testing is extremely important before a major upgrade.
We have a set of test machines where we do our initial testing. Once we are satisfied with these tests, we install the changes on the workstations of the UNIX support group. This gives us a chance to ``test drive'' the changes. During these tests we also time how long an update typically takes, and use those numbers to help us determine how quickly machines can be rebooted.

The Time It Almost Didn't Work

When we first started updating the servers, we had some network hardware problems. The symptom was that the sasify program would hang without completing the transfer of the system disk image. We were worried that more network hardware failures could cause many workstation updates to fail in such a way that we would have to walk to the machines and restart them. We quickly wrote a program that would fork() and exec() its argument list and wait a specified number of seconds before killing its child process. This program was used to limit the time sasify could be hung. If sasify failed, our script would restart it from the beginning (up to ten times).

This change to fix the hung-session problem was put into place after some testing. We started the workstations while we analyzed the failures from the hung machines, still trying to find the original problem even though we had worked around it. Analysis of the hung machines suggested some additional changes that could help prevent some of the failures. The new changes were added while the workstation update was in progress. There was little or no testing done on the new changes. (Big Mistake!)

Half an hour later we started noticing that workstations were not finishing their updates. We started to look for a reason and found a syntax error in the new changes. About 100 machines were trying to load the wrong things. We stopped the update process for the rest of the machines, then tried to figure out how to fix the 100 ``broken'' machines. We came up with a solution that required us to simply restart sasify, tested it on a few machines, then started to walk to the 100 ``broken'' machines. Our use of sortaddrs ensured that they would be spread out all over our campus. When we got to the first machine, we found it had already restarted sasify. We watched it until we knew it was running correctly, then moved on to the next machine. It, too, was re-running sasify. It was then that we realized that the code to limit the execution time of sasify had kicked in, causing each machine to start over, this time executing the corrected code. It saved us from having to walk to 100 machines!

Conclusions

While the implementation that we describe was done on an HP platform, we believe that many of the concepts are generalizable to any UNIX platform. The specific tools that we use to manage this network are only part of the whole picture. You also need an overall policy which will guide your support structure development as your network grows. We have found that a comprehensive strategy, consistently applied across all support solutions, not just those designed specifically for software updates, makes performing major software updates highly efficient, even in a large network.

Availability

For information on the availability of any of the tools mentioned in this paper, please send email to heh@unx.sas.com.

Author Information

Helen E. Harrison is the UNIX Support Manager at SAS Institute Inc., where her group provides hardware and software support for a network of over 1800 UNIX workstations and servers.
She has been involved in UNIX systems administration for over 12 years and holds a B.S. in Computer Science from Duke University. Reach Helen at SAS Institute Inc., SAS Campus Drive, Cary, NC 27513; or by e-mail at heh@unx.sas.com.

Michael Mitchell is a Systems Programmer in the UNIX Support Group at SAS Institute Inc. He has been involved in distributed computing for over 8 years and UNIX systems for 15 years. He holds a B.S. in Computer Science and a B.S. in Electrical Engineering, both from North Carolina State University. Reach Mike at SAS Institute Inc., SAS Campus Drive, Cary, NC 27513; or by e-mail at mcm@unx.sas.com.

Michael Shaddock is a Systems Programmer in the UNIX Support Group at SAS Institute Inc. He has been involved in UNIX systems administration for over 8 years and holds an M.S. in Computer Science from the University of North Carolina at Chapel Hill. Reach Mike at SAS Institute Inc., SAS Campus Drive, Cary, NC 27513; or by e-mail at shaddock@unx.sas.com.

References

1. Helen E. Harrison, Stephen P. Schaefer, and Terry S. Yoo, ``Rtools: Tools for Software Management in a Distributed Computing Environment,'' Proceedings of the Summer USENIX Conference, pp. 85-93, San Francisco, CA, June 1988.

2. Mark Fletcher, ``doit: A Network Software Management Tool,'' Proceedings of the USENIX Systems Administration (LISA VI) Conference, pp. 189-196, Long Beach, CA, October 19-23, 1992.

3. Transarc Corporation, ``AFS System Administrator's Guide,'' FS-D200-00.10.4, pp. 14-1 to 14-26, Pittsburgh, PA.

4. Walter C. Wong, ``Local Disk Depot - Customizing the Software Environment,'' Proceedings of the USENIX Systems Administration (LISA VII) Conference, pp. 51-55, Monterey, CA, November 1-5, 1993.

5. Wallace Colyer and Walter Wong, ``Depot: A Tool for Managing Software Environments,'' Proceedings of the USENIX Systems Administration (LISA VI) Conference, pp. 151-159, Long Beach, CA, October 19-23, 1992.

6. John P. Rouillard and Richard B. Martin, ``Config: A Mechanism for Installing and Tracking System Configurations,'' Proceedings of the USENIX Systems Administration (LISA VIII) Conference, pp. 9-17, San Diego, CA, September 19-23, 1994.