Towards a High-Level Machine Configuration System

Paul Anderson - University of Edinburgh

ABSTRACT

This paper presents a machine configuration system which stores all configuration parameters in a central ``database''. The system is dynamic in the sense that machines reconfigure themselves to reflect any changes in the database whenever they are rebooted. The use of a central database allows configurations to be validated, and correct configurations to be automatically generated from policy rules and high-level descriptions of the network. A permanent record of every machine configuration is always available, and the system is extensible to handle configuration of new subsystems in a modular way. The paper includes a review of previously published work and common techniques for cloning and configuring workstations.

Introduction

When a new machine is installed, it will rarely be used with the default configuration supplied by the vendor of the operating system. The partitioning and allocation of space on the disks, the software packages to be carried, and the network name and address are typical configuration parameters that will be set differently by different sites and for different machines at the same site. In addition to these basic parameters, most large sites will require a more extensive customisation of the basic system, for example running additional or replacement daemon processes such as time synchronisation.

Most vendors provide some kind of installation procedure which allows the basic configuration parameters to be set. However, in a typical large site, these procedures are nearly always inadequate for one or more of the following reasons:

o The procedures cover only the vendor-supplied software and are not extensible to cover local and third-party software.

o The interface to the procedures is often a GUI and cannot easily be automated for handling large numbers of systems.

o The procedures are not complete, and further manual operations (for example, crontab), or additional hand-editing of configuration files (for example, inetd.conf), are required to completely configure the machine.

o The configuration information is stored on the machine itself, so that it must be re-entered whenever the machine is re-installed, and it is unavailable for inspection when the machine is down.

o The procedures are highly vendor-specific and are not appropriate for use in a heterogeneous environment.

Sites with a small number of machines, or simple configuration requirements, sometimes use only the vendor-supplied procedures, but this means that machine upgrades or installations require considerable manual intervention. Large sites will usually have developed their own procedures to help overcome some of these problems, and the following section surveys some of the techniques that have been used. The remainder of the paper describes a configuration procedure that has been developed for use in the Computer Science Department at Edinburgh University. This stores complete machine configuration information in a central ``database'', allowing configurations to be validated and automatically generated. The system is also modular, so that new subsystems can be added independently to the configuration procedure.

Background

Most vendor-supplied installation and configuration tools suffer from all of the problems listed in the previous section. In many cases, attempts to simplify installation for small sites (for example, graphical user interfaces) have caused further difficulties for large sites.
Even where some provision has been made for large-scale automation (such as Sun auto-install[1]), the configuration process is still inadequate for the other reasons given above.

The most common technique for dealing with a large number of machines is cloning. Cloning procedures are not normally supplied by the vendor, but different systems have evolved at many large sites (for example, Ohio State University[2]), all sharing similar characteristics. A single template file-system is hand-crafted with the site-specific configuration information and replicated directly to create a new machine. Clearly, such a pure cloning process is sufficient only if there are no machine-specific configuration parameters and every machine on the site has an identical basic file-system (or there are a small number of categories). This approach has been taken in some cases, such as the Athena[3] system, but it usually requires unacceptable modifications to the vendor's base operating system.

Various schemes have been used for applying machine-specific changes to the template after (or during) the cloning operation; for example, the above Ohio scheme, typecast[4] and mkserv[5]. These are adequate for environments where the configurations are largely static and similar. However, they can become unwieldy when there is a wide variation in the required configurations and/or frequent changes. It can often be difficult to determine the configuration that is actually being applied to an individual machine; in some cases, this information might not exist explicitly (for example, a particular configuration parameter might be generated ``on the fly'' at installation time by a script which implements some kind of policy rule); in other cases, it might exist in a wide range of different files and formats. The lack of modularity in the configuration process also makes it difficult for different people to maintain the configuration of separate subsystems, and changing the configuration of an existing machine is usually difficult.

Storing the machine-specific configuration information explicitly in some external database (for example, sad[6]) is a major improvement, since the configuration of a particular machine is always clear and the information is always accessible, even when the machine is down. There is still the option of using procedural rules to generate certain configuration parameters (for example, there may be a rule of the form ``machines belonging to the research group always carry GNU Emacs''), but the rules are evaluated before the machine is actually configured and the results of the evaluation are visible explicitly in the database.

The information from such a database can be used during the cloning process to control the creation of the file-systems when the machine is being built. In this case, the machine-specific characteristics are hard-wired into the file-system and the database information is no longer required for the running of the machine (a static configuration). Alternatively, all machines can be created as pure clones and the configuration information can be read dynamically from the database while the machines are running (usually at boot time). If the configuration information is used in a static way, it is difficult to change without completely re-cloning the system, but the machine is not dependent on the availability of the central database, and no configuration procedures need to be run at boot time.
Dynamic configuration requires special configuration procedures (usually run at boot time) and the machine is dependent on the existence of the central database, but it does allow changes in the database to be reflected immediately in the actual machine configuration. A purely dynamic system is normally impractical for several reasons:

o Configuration of hardware-related parameters such as disk partitioning is not possible on a running system where the disks contain live data.

o Configuration of very low-level system software (such as basic networking) is difficult because the machine normally needs the network to be available before it can access the configuration database.

However, the ``rotting'' of static systems and the difficulty of identifying the configuration state of a particular machine can lead to many problems which make a dynamic system attractive.

Many vendors are now moving towards dynamic configuration systems based on object-oriented technology. The Tivoli ``Management Environment'', for example, is an object-oriented product which is available on several platforms. This provides a central configuration database and a ``framework'' into which objects can be slotted to control the various subsystems in a uniform way. Hopefully, standards will develop and become adopted, so that multiple vendors (and users) can construct objects which inter-operate across heterogeneous systems. Although this provides the most promising future direction for system configuration, most vendors do not currently supply such software as part of their standard operating system package, and current implementations may be too expensive and/or inflexible for many sites.

A Simple Dynamic Implementation

The Computer Science Department at Edinburgh University runs a network of 300-400 workstations with about 2000 users. System administration tools from the department are often adopted on a wider scale throughout the University. At present, these machines are mostly Suns (currently being upgraded to Solaris 2) and X terminals, but the ability to integrate systems from different vendors is considered very important, and DEC, HP and SGI systems have all previously been integrated into the network. Particularly within research groups, such as the LFCS (the Laboratory for Foundations of Computer Science), systems change rapidly and machine configurations are very diverse, so it is important to have a sufficiently flexible infrastructure to support this type of environment.

The lcfg (``local configuration'') system[7] now being used in the Computer Science Department is a mainly dynamic system with a small amount of static configuration for the hardware and low-level parameters. All information that is necessary to distinguish one machine from another is contained in the central database, and every machine can be rebuilt or duplicated using just the information from the database together with the generic system software (and, obviously, backups of any user data). Only Suns are currently being configured with lcfg, but it is intended that the system be portable, presenting a uniform interface to the configuration process across different platforms. The static part of the configuration, which interfaces with Sun auto-install, is the only part of the system which is expected to be significantly different on different platforms.

The static part of the configuration occurs when a machine is installed. Information is read from the database and used to construct auto-install configuration files determining the type of machine, the layout of the disks, the base software configuration, and other static parameters.
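The paper does not reproduce one of these generated files. As a rough illustration only, a Solaris auto-install (JumpStart) profile derived from the install.* resources of the server in Appendix 1 might look something like the fragment below; the exact keywords and the software cluster shown are assumptions and vary between releases:

# Illustrative sketch only: a JumpStart-style profile that the static
# configuration step might generate from the install.* resources.
install_type    initial_install
system_type     server
partitioning    explicit
filesys         c0t3d0s0  32    /
filesys         c0t3d0s1  64    swap
filesys         c0t3d0s3  64    /var
filesys         c0t3d0s4  auto  /usr
filesys         c0t3d0s5  free  /export
cluster         SUNWCall        # assumed software cluster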
When the machine reboots for the first time after an installation, a further script performs any remaining static configuration. This might include the addition of clients or the loading of additional software across the network. All machines can be installed entirely automatically, complete with all the necessary local customisation, simply by creating the database entries and booting the system from an install server.

Every time the machine boots, a script reads the configuration database to determine the subsystems that should be configured on that machine. This executes a script for each subsystem (for example, DNS or xntp) which consults the database for relevant parameters and dynamically configures the subsystem accordingly. New subsystems can therefore be incorporated into the configuration process simply by adding their names to the database entry for a specified machine. The dynamic configuration allows machines to be reconfigured very quickly to adapt to changing requirements, or to work around failed hardware.

The Configuration Database

The configuration scripts use common routines to consult the database for resources of the form

	host.subsystem.attribute = value

In theory, any database could be used to hold these resources and any mechanism could be used to distribute them to the client machines. A large relational database might be a useful tool for extracting information about machine configurations, and for making complicated changes to groups of machines, but it is not strictly necessary and, at present, a simple flat file is used for each machine. The resources are distributed and supplied to the client machines using NIS[8]. NIS is not ideal for this purpose, since it involves propagation of the entire database every time a single change is made, and all system software below the level of NIS must be statically configured. We hope eventually to develop a special protocol that operates at a lower level, but NIS is currently proving adequate as a resilient method of supplying machines with the necessary resources.

The information in the source files is deliberately of a very low level. As described later, the eventual aim is to generate this information automatically from a higher-level description of the machine and its relationship to other machines in the network. At present, the files are edited by hand and passed through the C preprocessor, which allows some degree of structure to be introduced and lets machines with similar configurations share common blocks of resources. A total of about 400 different resources are available for the configuration of the various subsystems, but many of these will nearly always be left at their default values, and a typical large server requires about 70-100 resources to fully describe its configuration. Clients usually require about half this number, and the use of the C preprocessor reduces the configuration description even further (some examples are given in the appendix).

Independent processes can very easily extract information from the database, and one important application of this is to validate the consistency of the resources. A simple Perl script scans the resources for a specified machine and performs various consistency checks; the script is continually being extended to identify the most common configuration errors, and this allows many problems to be detected before the machine installation has started.
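The paper does not list the individual checks that this script performs. The following is only a minimal sketch of the general idea, written in Perl as the paper suggests; the assumed file layout (one ``subsystem.attribute value'' resource per line after preprocessing) and both example checks are illustrative assumptions, not the department's actual rules:

#!/usr/bin/perl
# Sketch only: the file layout and both checks below are assumptions
# made for illustration, not the checks used by the real script.
# Input: one machine's resource file after preprocessing, with lines
# of the form "subsystem.attribute value ...".

use strict;
use warnings;

my $file = shift or die "usage: checkcfg resource-file\n";

my %res;
open my $fh, '<', $file or die "cannot open $file: $!\n";
while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^\s*(?:#|$)/;        # skip comments and blank lines
    my ($key, $value) = split ' ', $line, 2;
    $res{$key} = defined $value ? $value : '';
}
close $fh;

# Check 1 (assumed): every subsystem named in boot.services should have
# at least one resource of its own defined for this machine.
my @services = split ' ', ($res{'boot.services'} || '');
for my $s (@services) {
    warn "no resources defined for boot service '$s'\n"
        unless grep { /^\Q$s\E\./ } keys %res;
}

# Check 2 (assumed): a machine configured as a DNS server should also
# start the dns subsystem at boot time.
if (($res{'dns.type'} || '') eq 'server') {
    warn "dns.type is 'server' but dns is not listed in boot.services\n"
        unless grep { $_ eq 'dns' } @services;
}

Because the resources are held in ordinary text, checks of this kind can be added incrementally as new classes of configuration error are discovered.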
Since information is available on all machines, inter-machine problems can be located that might not normally be detected until a much later stage. In particular, it is possible to check, before removing a machine from the network, that all dependencies on that machine have been removed. Not all of these dependencies are immediately obvious; for example, every ethernet segment must include a host supplying bootparam service, and removing the last bootparam server from an ethernet segment should cause a warning to be generated. Such checks can be used to identify weak points in the network by answering questions such as ``what happens if a particular server fails?''.

Some of the resources are purely informational and are used for administrative purposes (for example, the owner and location of the machine). One interesting application is an experimental World Wide Web service which makes information on all machines available over the World Wide Web by automatically querying the database when the page for a particular machine is accessed (http://www.dcs.ed.ac.uk/cgi-bin/hosts/INDEX). The information in the database allows hyper-text links to be generated between clients and their servers, and between personal workstations and the home pages of their owners.

The Configurable Subsystems

Each configurable subsystem on a machine (for example, a printer) is a member of a particular class, and the configuration for all subsystems in a class is performed by the same class script. All the class scripts share a number of common routines and are written in a stylised manner; this allows new classes with simple configuration requirements to be added very easily. A single subsystem called boot starts when the system boots. The resource boot.services is consulted to determine all the other subsystems that should be configured at boot time, and the appropriate class scripts are executed. Provision is also made to execute these scripts manually, or at regular intervals (from cron). There are currently about 30 different classes implemented, of which the following is a selection:

auth  configures all the authorisation of access to the machine. This controls, for example, the groups of users that are permitted to log in, and the machines to be included in the hosts.equiv file.

amd   controls the amd automounter, specifying the cluster that is to be used and hence determining the servers from which the various file-systems will be mounted.

dns   controls the type of DNS service to be provided and (where appropriate) specifies the servers to be used.

www   controls the World Wide Web server.

xdm   controls the xdm subsystem, specifying which X terminals are to be managed and configuring some of the parameters of the login session. A separate subsystem controls the font server.

inet  controls the services that are managed by inetd, including the access control which is managed by the tcpd wrapper program.

The above subsystems run only when the machine boots, and any change in the database resources is not reflected in the corresponding subsystem until the machine is rebooted (or the subsystem is manually restarted). These are mostly one-off configurations (such as auth) or daemons which start once and run continuously (such as www or xdm).
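To tie this list back to the database, the boot-time resources for a machine in this scheme might expand to something like the fragment below; the service list and values are purely illustrative, in the style of the appendix, where macros such as SERVER_SERVICES presumably expand to lists of this kind:

/* Illustrative only: possible boot-time resources for a client */
boot.services   auth amd dns inet xdm
amd.cluster     staffa.dcs.ed.ac.uk
dns.servers     staffa

Adding a further class to such a machine is then just a matter of appending its name to boot.services and supplying whatever subsystem.attribute resources its class script consults.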
Some subsystems need to be run at regular intervals (for example, backups), and the boot subsystem can arrange to schedule these to run from cron. In particular, a group of processes runs every night to perform any necessary updates to the local file-systems:

updatelf  uses lfu[9] to update the local file-systems with any changes that have been made to the master copies of locally maintained software. The configuration of this subsystem determines the software packages that are to be carried by the machine.

patch     applies any newly installed system patches that are relevant to the machine.

update    makes any necessary modifications to files in the root file-system to track the latest static configuration.

Most class scripts also accept additional arguments to stop and restart the subsystem, and to display logging and status information. A client program called om, and its associated daemon omd, provide a way to execute these additional methods remotely, including an authorisation scheme with access control based on the user, the host, the subsystem, and the method. This allows users to be given permission, for example, to stop and restart certain daemons running on their personal workstations. One possibility is that om will be extended to understand netgroups of machines, allowing subsystems to be easily restarted on a whole cluster of machines with a single command.

High Level Configuration

One of the most important aspects of machine configuration is to specify the role of a machine within the network. This includes the relationship between a client and the servers which supply various different services. Typically, these will include file services of various types (home directories, program binaries), name service (DNS), time synchronisation (xntp), font service and others. If a client and server are configured independently, then there is no guarantee that the configurations are compatible; for example, a client can quite easily be configured to expect file service from a machine which is not exporting the required files, or even from a machine that does not exist! Even within a single machine, there are similar dependencies offering scope for errors when different subsystems are configured using different methods; for example, if a particular machine is to run a World Wide Web server, then the appropriate software must be available on the machine.

Using a common source of configuration information allows most of these dependencies to be checked automatically. However, the low-level nature of the raw configuration resources means that the production of configuration files is awkward and error-prone. Ideally, we would like to describe the relationship between machines at a much higher level and have the low-level configuration information generated automatically. For example:

o Machine A is the name server for the research group.

o Machine B is a member of the research group.

o Machine C is a member of the research group.

From the above specification, it is possible to generate all the necessary low-level configuration information to load the name-server software, and start the name-server subsystem, on machine A, and to configure the other machines to act as clients of this machine. An error (or at least a warning) would be expected for any machine which did not have a name-server. The simple example given above can be accomplished quite easily, using features of the C preprocessor, with the existing implementation.
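The paper does not show the particular encoding used for this example. One plausible sketch, using entirely hypothetical file and macro names, is a shared header which names the group's server and is #included by every member's source file:

/* research.h -- hypothetical shared header for the research group.
   Changing this one definition repoints every member of the group. */
#define RESEARCH_NAMESERVER   staffa

/* In the name server's own source file: */
#include "research.h"
dns.type        server

/* In every other member's source file: */
#include "research.h"
dns.servers     RESEARCH_NAMESERVER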
Changing machine A to some other machine should cause the software and the daemon to be transferred to the other machine, and the clients to change their resolv.conf files to point to the new server.

In addition to essential rules, like the name-server example above, it is also very useful to be able to specify policy rules in a similarly explicit manner. For example:

o Students are not allowed to log in to the personal workstations of staff members.

o File-servers which are updating local file-systems during the night should do so at different times, to avoid network congestion.

Such policy rules are frequently contravened in practice because they are not critical to the operation of the system and mistakes can easily go unnoticed. Using the rules to actually generate the machine configuration guarantees that they will be enforced.

As the rules and their interactions become more complex, the need for a special-purpose configuration language to replace the C preprocessor quickly becomes apparent. Designing such a language[10] is not easy for several reasons; it must be able to express high-level rules in a clear, explicit way, yet be capable of generating low-level configuration information from these rules. Since the configuration subsystems must be extensible, the language itself must be extensible, so that new rules can be added to control new subsystems, or new features of existing subsystems. Possible designs for such a language are currently under investigation.

Conclusions & Further Work

The use of a dynamic configuration system storing parameters in a central database has been a big improvement over the previous static system. In particular:

o The ease with which configurations can be changed, and machines can be completely rebuilt, means that machine configurations do not ``rot'' and are always up to date.

o New subsystems can easily be introduced and configured onto existing machines without interfering with other subsystems on the machine.

o The ability to validate and examine explicit machine configurations from the database has reduced the number of errors that are caused, for example, by forgetting some dependency when removing a server.

o Since the machines automatically reflect the configuration in the database, it is possible to have some confidence that policies specified in configuration rules are actually being enforced on the machines. This provides an improvement, for example, in security.

Disadvantages include the longer time required to boot a machine and the difficulty of manually creating correct low-level configuration information. The ability to specify configurations and policies at a much higher level would be a very useful facility; the best way to implement and exploit this possibility is an area for further investigation. In the short term, the incorporation of further subsystems, porting to other platforms, and improvements to the mechanism for storing and distributing the resources are the likely areas of future work.

Availability

Copies of this paper and associated technical reports are available via WWW from http://www.dcs.ed.ac.uk/staff/paul, or by anonymous ftp from ftp.dcs.ed.ac.uk in pub/paul/papers.

Acknowledgements

Thanks to all the systems staff of the Computer Science Department for long discussions on the design of the configuration system, and for suffering all the machines with broken configurations during development and testing.
Author Information

Paul Anderson is a graduate in pure mathematics. He taught computer science and managed software development before becoming involved in systems administration. He is currently employed as Systems Development Manager with the Laboratory for Foundations of Computer Science, where he is responsible for the research laboratory's network. He is also working with other system managers to develop the computing facilities within the department and the University. Paul can be reached by mail at:

	The Laboratory for Foundations of Computer Science
	Department of Computer Science
	University of Edinburgh
	King's Buildings
	Edinburgh EH8 3JZ
	U.K.

His email address is paul@dcs.ed.ac.uk.

References

1. Sun Microsystems, ``Automatic installation,'' in Solaris 2.3 system configuration and installation guide, 1993.

2. George M Jones and Steven M Romig, ``Cloning Customized Hosts (or Customizing Cloned Hosts),'' Proceedings of the LISA V Conference, pp. 233-237, Usenix, 1991.

3. Jennifer G Steiner and Daniel E Geer, Network services in the Athena environment, Project Athena, Massachusetts Institute of Technology, Cambridge, MA 02139.

4. Elizabeth Zwicky, ``Typecast: beyond cloned hosts,'' Proceedings of the LISA VI Conference, pp. 73-78, Usenix, 1992.

5. Mark Rosenstein and Ezra Peisach, ``Mkserv - Workstation customization and privatization,'' Proceedings of the LISA VI Conference, pp. 89-95, Usenix, 1992.

6. Rick Dipper, ``Management information and decision support tools for Unix systems administration,'' Proceedings of the UKUUG/SUG Conference, pp. 143-153, UKUUG, 1993.

7. Paul Anderson, ``Local system configuration for syssies,'' CS-TN-38, Department of Computer Science, University of Edinburgh, Edinburgh, August 1991. Available by anonymous ftp as pub/paul/papers/tn38.ps from ftp.dcs.ed.ac.uk.

8. Sun Microsystems, ``The Network Information Service,'' in System and network administration, pp. 469-511, Sun Microsystems, 1990.

9. Paul Anderson, ``Managing program binaries in a heterogeneous UNIX network,'' Proceedings of the LISA V Conference, pp. 1-9, Usenix, 1991. Available by anonymous ftp as pub/paul/papers/lisa5.ps from ftp.dcs.ed.ac.uk.

10. Bent Hagemark and Kenneth Zadeck, ``Site - a Language and System for Configuring Many Computers as One Computing Site,'' Proceedings of the LISA III Conference, pp. 1-13, Usenix, 1989.
Appendix 1: Configuration for a Simple Server

/**********************************************************************
 * Staffa
 **********************************************************************/

#include

/* Resources for information only */

info.type               server
info.location           the machine halls
info.make               Sun
info.model              10/40
info.owner              LFCS
info.memory             16 16 16
info.sno                411m1238
info.hostid             727099f2
info.disks              internal wren
info.disktype_internal  SUN1.05 cyl 2036 alt 2 hd 14 sec 72
info.disksize_internal  1Gb
info.diskdev_internal   c0t3d0
info.disktype_wren      CDC Wren VII 94601-12G cyl 1929 alt 2 hd 15 sec 68
info.disksize_wren      1Gb
info.diskdev_wren       c1t1d0

/* Statically configured resources */

install.system_type     server
install.arch            sun4m
install.client_arch     sun4c sun4m
install.local           B_INSTALL_CONFIG
install.interfaces      le0 qe0
install.hostname_le0    HOSTNAME
install.hostname_qe0    HOSTNAME-j
install.updatelf        true
install.install_server  true
install.filesystems     root swap var usr export local
install.fs_root         c0t3d0s0 32 /
install.fs_swap         c0t3d0s1 64 swap
install.fs_var          c0t3d0s3 64 /var
install.fs_usr          c0t3d0s4 auto /usr
install.fs_install      c0t3d0s7 350 /export/install
install.fs_export       c0t3d0s5 free /export
install.fs_local        c1t2d0s2 all /disk/local

/* Dynamically configured resources */

auth.rootpwd            LFCS_SERVER_PASSWD
auth.users              LFCS_SERVER_USERS
auth.equiv              LFCS_EQUIV
auth.rhosts             LFCS_RHOSTS
amd.cluster             HOSTNAME.dcs.ed.ac.uk
dns.type                server
yp.type                 slave
yp.servers              HOSTNAME
boot.services           SERVER_SERVICES
boot.run                SERVER_RUN
cron.objects            boot
cron.run_boot           0 0 * * *
updatelf.fs             local
updatelf.fs_local       sun4-51 share
updatelf.netgroups      delete copy
updatelf.action_copy    copy
updatelf.action_delete  delete
nfs.exports             local
nfs.fs_local            /disk/local
nfs.options_local       -o ro=machines

Appendix 2: Configuration for a Simple Diskless Client

/**********************************************************************
 * Gasker
 **********************************************************************/

#include

/* Resources for information only */

info.type               private
info.owner              paul
info.location           1612
info.make               Sun
info.model              Classic
info.sno                302U4308
info.hostid             8001d534

/* Statically configured resources */

install.system_type     client
install.arch            sun4c
install.interfaces      le0
install.hostname_le0    HOSTNAME
install.root            B_SERVER:/export/root/HOSTNAME
install.swap            B_SERVER:/export/swap/HOSTNAME

/* Dynamically configured resources */

mail.root               paul
auth.rootpwd            LFCS_CLIENT_PASSWD
auth.users              LFCS_CLIENT_USERS
auth.equiv              LFCS_EQUIV
auth.rhosts             LFCS_RHOSTS
amd.cluster             B_SERVER.dcs.ed.ac.uk
dns.servers             B_SERVER
cron.objects            boot
cron.run_boot           0 4 * * *