As with any job, tools are required to simplify tasks, increase efficiency, and often just make things possible. The job of system administration is almost cursed with too many tools: it seems like every month there is yet another scripting language to do everything you used to do with the old one. In the computer field it is extremely easy to combine tools into something more than the sum of its parts. After many years, most system administrators have their own toolbox of favorite tools that they have honed over time to work just the way they need.
This part of the course will be very UNIX-specific, since it isn't really possible to teach tools and scripts without actual examples. However, many programs listed in this section are available on many different platforms and operating systems, and most of those that are not have equivalents. I'm afraid it is left up to the reader to discover these.
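Most toolboxes start with simple UNIX utilities such as du, telnet, and ps. Sysadmins use these all the time to check the status of running machines, see what users are doing, and test that certain services are working. For example, telnet can connect by hand to a service's port, such as a mail server on port 25, to check that it answers.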
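When a disk fills up, du is usually the first tool to reach for. The following is a minimal sketch that lists the largest files and directories directly under a given directory. The function name biggest_entries is hypothetical. The -s and -k options are standard du options.

```sh
#!/bin/sh
# biggest_entries DIR [COUNT]
# Print the COUNT (default 5) largest files or directories directly under
# DIR, with sizes in kilobytes, largest first. Dot-files are not matched
# by the * glob, so they are left out.
biggest_entries() {
    dir=$1
    count=${2:-5}
    # -s: one total per argument; -k: sizes in 1024-byte units.
    # Errors such as unreadable directories are discarded.
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, biggest_entries /home 10 shows the ten biggest home directories.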
ps and top
Since just about everything on a UNIX machine apart from the kernel itself happens inside a process, tools like ps and top are used constantly. If a machine is running slowly, a good bet is that there are too many processes or that one process is taking up too much CPU time.
Example output from top:

11:31pm  up  5:44,  5 users,  load average: 0.06, 0.07, 0.04
48 processes: 46 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  5.1% user,  1.5% system,  0.0% nice, 93.2% idle
Mem:  192936K av, 189784K used,   3152K free,  49044K shrd,   3928K buff
Swap: 216868K av,   6628K used, 210240K free                 159316K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
 1128 kscott    18   0  8164 8164  8048 S       0  4.3  4.2   3:43 mpg123
  181 kscott     1   0  3772 3544  1404 S       0  0.5  1.8   0:29 emacs
 1438 root       3   0  1132 1132   940 R       0  0.5  0.5   0:00 top
Example output from ps:

USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
kscott     116  0.0  0.5  1788 1084 tty1     S    17:47   0:00 -bash
root       117  0.0  0.2  1048  436 tty2     S    17:47   0:00 /sbin/agetty 38400 tty2 linux
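ps can also give a top-like snapshot without the interactive screen, which is handy inside scripts and cron jobs. The following is a minimal sketch. It assumes a ps that accepts the POSIX -e and -o options, which most current systems do and old BSD-style ps does not. The function name top_cpu is hypothetical.

```sh
#!/bin/sh
# top_cpu [COUNT]
# Print the COUNT (default 5) processes using the most CPU, highest first.
top_cpu() {
    count=${1:-5}
    # -e: every process; -o: choose the columns. An empty header ("=")
    # on every column suppresses the header line, so it is not sorted in.
    # LC_ALL=C makes sort read "4.3" as a decimal number in any locale.
    ps -e -o pcpu= -o pid= -o user= -o comm= | LC_ALL=C sort -rn | head -n "$count"
}
```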
find is a wonderful program with some of the worst syntax ever found in a UNIX utility. However, once you get a handle on it, you can write very powerful one-line scripts to do such things as recursively remove all files that have not been accessed in the past 10 days, and print out the names as you do it.
find /tmp -xdev -type f -atime +10 -exec rm {} \; -a -print
Something like this could be put in a cron job to clean up disk space.
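A find ... -exec rm command deletes files without asking, so it is worth previewing what it would remove before putting it in cron. The following is a minimal sketch. The function name clean_old and the DRY_RUN environment variable are hypothetical. -xdev, -atime, and -exec are standard find options.

```sh
#!/bin/sh
# clean_old DIR DAYS
# Remove regular files under DIR (same filesystem only) that have not been
# accessed for more than DAYS days, printing each name as it goes.
# With DRY_RUN=1 in the environment, only print what would be removed.
clean_old() {
    dir=$1
    days=$2
    if [ "${DRY_RUN:-0}" = 1 ] ; then
        find "$dir" -xdev -type f -atime +"$days" -print
    else
        # -print is reached only if rm succeeded, so only removed
        # files are listed.
        find "$dir" -xdev -type f -atime +"$days" -exec rm -f {} \; -print
    fi
}
```

For example, DRY_RUN=1 clean_old /tmp 10 lists the candidates. Running clean_old /tmp 10 then removes them.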
strace
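strace shows the system calls a process makes and the signals it receives. With -f it also follows any child processes the process spawns. It's most often used to see what files a process is accessing. It can attach to an already running process (with -p and the process ID) or trace a command from beginning to end, as this example does. Here -o writes the trace to /tmp/man.out instead of the terminal.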
strace -f -o /tmp/man.out ls
This is just a sample from the output:

1015 open(".", O_RDONLY|O_NONBLOCK|0x10000) = 4
1015 close(4) = 0
1015 getdents(4, /* 58 entries */, 3933) = 1156
1015 write(1, "dir\t\t nsmail\n", 16) = 16
1015 close(1) = 0
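A long trace is easier to read once it has been boiled down. The following sketch reads strace output such as /tmp/man.out and lists each file that was opened successfully. The function name opened_files is hypothetical. It assumes the open()/openat() line format shown above, where a PID prefix appears when -f and -o are used. Newer Linux systems mostly report openat rather than open.

```sh
#!/bin/sh
# opened_files: read strace output on stdin and print, once each, the
# names of files that were opened successfully.
#   grep: keep open()/openat() calls, optionally prefixed by a PID, whose
#         return value is a file descriptor (failures end in "= -1 ...").
#   sed:  the path is the first double-quoted argument on the line.
opened_files() {
    grep -E '^([0-9]+ +)?(open|openat)\(.*\) += [0-9]+$' |
        sed -e 's/^[^"]*"\([^"]*\)".*/\1/' |
        LC_ALL=C sort -u
}
```

For example, run opened_files < /tmp/man.out. strace can also narrow the trace at the source with its -e trace=open,openat option.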
I used strace to diagnose a very slow-running Netscape. It turned out that it was running a real-time Java applet and was constantly checking an NFS-mounted cache directory for its files. The solution was to turn off file caching.
Services
syslog
syslog is the system logger. Its configuration file, syslog.conf, defines which files and which remote machines each kind of log message is sent to. These logs can be useful for debugging machines, tracking malicious users, or just watching for anything out of the ordinary.
Here are some examples of common syslog messages:

Jan 31 09:22:35 jupiter kernel: nfs: server saturn not responding, timed out
Jan 31 09:23:09 mail sendmail[17310]: JAA17309: forward /u/kscott/.forward.mail+: World writable directory
Jan 31 00:12:58 mars sshd2[6450]: connection from "192.168.11.1"
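On a busy loghost it helps to see at a glance which machines and programs are producing messages. The following is a minimal sketch. It assumes the traditional line format shown above: date, time, host, then program[pid]:. The function name syslog_summary is hypothetical.

```sh
#!/bin/sh
# syslog_summary: read syslog lines on stdin and print
# "host program count" for each host/program pair, sorted by host.
syslog_summary() {
    awk '{
        prog = $5
        sub(/\[.*$/, "", prog)   # drop "[pid]:" as in sendmail[17310]:
        sub(/:$/, "", prog)      # drop a bare trailing colon as in kernel:
        count[$4 " " prog]++     # field 4 is the host name
    }
    END {
        for (key in count)
            print key, count[key]
    }' | LC_ALL=C sort
}
```

For example, run syslog_summary < /var/log/messages. The actual log file names depend on the local syslog.conf.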
email may be the most commonly used tool in system administration. It's used to send warnings to people when they are running out of disk space, keep notes of things that need to be fixed, inform users of impending downtime, and just about anything else you can think of. Many automated scripts that run in the night send out reports via email.
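The following sketch illustrates the disk-space warnings mentioned above by picking nearly full filesystems out of df output. It assumes df -P (POSIX output: one line per filesystem, capacity in the fifth column, mount point last) and mount points without spaces. The function name full_filesystems and the 90% threshold are illustrative choices, not a standard.

```sh
#!/bin/sh
# full_filesystems PERCENT
# Read "df -P" output on stdin and print the mount point and capacity of
# every filesystem that is at least PERCENT full.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {         # skip the header line
        pct = $5
        sub(/%$/, "", pct)               # "93%" -> "93"
        if (pct + 0 >= limit + 0)
            print $6, $5
    }'
}
```

A nightly job could then mail root only when there is something to report:

    report=`df -P | full_filesystems 90`
    [ -n "$report" ] && echo "$report" | mail -s "disk space warning" root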
And of course, once something is as widely used as email, there needs to be another tool to separate the useful from the useless. procmail is just this tool. This is an entry from my .procmailrc that discards any mail from root whose Subject line begins with cron.
# Use this when we are going to be down for a while
:0
* ^From: root@mailhost.nmt.edu
* ^Subject: cron:*
/dev/null
cron is the clock daemon in UNIX and is used to run programs at specific times. A cron job can run as often as once a minute, on a particular day of the week, or as rarely as once a year. Many of the above tools can be combined into a monitoring utility run from cron. This example looks for files larger than 10 MB every night at 2 minutes to midnight and mails their names to root.
58 23 * * * find /home -xdev -type f -size +10000k -print | mail -s "large files" root
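The crontab entry above reports only file names. The following sketch extends the same find to report each file's size as well, largest first. The function name large_files is hypothetical. The k suffix on -size is accepted by GNU and BSD find, as in the original entry.

```sh
#!/bin/sh
# large_files DIR KB
# List regular files under DIR (same filesystem only) larger than KB
# kilobytes, as "size<TAB>name" lines with sizes in kilobytes, largest first.
large_files() {
    # "-exec du -k {} +" runs du on many files at once.
    find "$1" -xdev -type f -size +"$2"k -exec du -k {} + | LC_ALL=C sort -rn
}
```

The function could be saved in a script that ends with large_files "$@", for example a hypothetical /usr/local/sbin/large_files, and called from cron in place of the bare find. The five leading crontab fields are minute, hour, day of month, month, and day of week, so 58 23 * * * means 23:58 every day:

    58 23 * * * /usr/local/sbin/large_files /home 10000 | mail -s "large files" root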
Scripts
Scripts provide a way of combining many tools into one comprehensive tool that can be run easily or, even better, run automatically when needed.
Shell Scripts
The shell is probably the language sysadmins use most. A shell script is simply a list of commands that one could just as easily type on the command line, stored in a file and executed in order. Putting the commands in a file lets the list be treated like a program.
A common place to find shell scripts is in the rc startup files. Since UNIX variants organize their startup scripts differently, you can usually find them by reading the man page for init, the first process started after boot.
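The following is a minimal sketch of an rc-style script, to give a feel for what these startup files look like. The daemon name exampled and its path are hypothetical. Real systems differ in where such scripts live and in how init calls them, although start and stop arguments are a common convention.

```sh
#!/bin/sh
# Sketch of an rc-style startup script for a hypothetical daemon
# "exampled". init, or an administrator, runs it with one argument:
# start, stop, or restart.
DAEMON=/usr/local/sbin/exampled     # hypothetical path to the daemon

rc_main() {
    case "$1" in
    start)
        # Only try to start the daemon if it is actually installed.
        if [ -x "$DAEMON" ] ; then
            echo "Starting exampled"
            "$DAEMON"
        fi
        ;;
    stop)
        echo "Stopping exampled"
        # pkill is common but not universal; many scripts instead kill
        # the PID recorded in a pid file.
        pkill -x exampled
        ;;
    restart)
        rc_main stop
        rc_main start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}" >&2
        return 1
        ;;
    esac
}
```

A real script would end with a line such as rc_main "$@" so that the argument given by init is passed through.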
A simple example of a shell script is one that rotates syslog files. If left alone, the files that syslog writes to will keep growing until the disk runs out of space or they hit a file size limit in the kernel (2 GB on many versions of UNIX). The script takes two arguments: the name of the log file and the number of old copies to keep.
#!/bin/sh
# Usage: rotate logfile count
PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin ; export PATH

if [ $# -ne 2 ] ; then
    echo "Usage: rotate logfile count" >&2
    exit 1
fi

logfile=$1      # name of the logfile without any extensions
to_num=$2       # extension that the rotated file will have

while [ "$to_num" -gt 1 ] ; do
    from_num=`expr $to_num - 1`                     # extension of the file to be rotated
    if [ -f "$logfile.$from_num" ] ; then           # if the logfile already exists
        mv "$logfile.$from_num" "$logfile.$to_num"
    elif [ -f "$logfile.$from_num.gz" ] ; then      # in case they are compressed
        mv "$logfile.$from_num.gz" "$logfile.$to_num.gz"
    fi
    to_num=$from_num
done

if [ -f "$logfile" ] ; then     # sanity check
    cp "$logfile" "$logfile.1"
    cp /dev/null "$logfile"     # truncates the file
    gzip -f -q "$logfile.1"     # compress it to save space; -f overwrites a stale .1.gz
fi
perl is a programming language that has found a home with system administrators. It has much of the power of C combined with the scripting and shell interface of sh.
Here is the same logfile rotate script from above, only this time written in perl.
#!/usr/local/bin/perl
# Usage: rotate logfile count
use strict;
use warnings;
use File::Copy qw(copy);

die "Usage: rotate logfile count\n" unless @ARGV == 2;

my $logfile = $ARGV[0];     # name of the logfile without any extensions
my $to_num  = $ARGV[1];     # extension that the rotated file will have

while ($to_num > 1)
{
    my $from_num = $to_num - 1;             # extension of the file to be rotated
    if (-f "$logfile.$from_num")            # if the logfile already exists
    {
        rename("$logfile.$from_num", "$logfile.$to_num")
            or die "rename $logfile.$from_num: $!\n";
    }
    elsif (-f "$logfile.$from_num.gz")      # in case they are compressed
    {
        rename("$logfile.$from_num.gz", "$logfile.$to_num.gz")
            or die "rename $logfile.$from_num.gz: $!\n";
    }
    $to_num--;
}

if (-f $logfile)                            # sanity check
{
    copy($logfile, "$logfile.1") or die "copy $logfile: $!\n";
    truncate($logfile, 0) or die "truncate $logfile: $!\n";    # truncates the file
    system("gzip", "-f", "-q", "$logfile.1") == 0               # compress it to save space
        or die "gzip failed\n";
}
debugging: tcpdump, strace/truss, syslog, ps/top, du/df, telnet
automation: cron, scripts
other: find, mail/procmail, tar cf - . | (cd /dir && tar xfvp -) or (GNU) cp -pR . /dir, lsof, fuser
scripts: grep/awk/sed, sh/perl/C
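The tar pipeline listed above copies a directory tree while keeping permissions and modification times. The following sketch wraps it in a function. The name copy_tree is hypothetical. Using && rather than ; after cd means tar never extracts into the wrong directory if a cd fails.

```sh
#!/bin/sh
# copy_tree SRC DEST
# Copy everything under SRC into the existing directory DEST via a tar
# pipeline. "-" as the archive name means stdout when creating (c) and
# stdin when extracting (x); p asks tar to restore permissions.
copy_tree() {
    (cd "$1" && tar cf - .) | (cd "$2" && tar xpf -)
}
```

For example, copy_tree /home/old /home/new copies the contents of one directory into another.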