Genbank Mirror Site Administration

This is an excerpt from Administration Notes I wrote for the Genbank mirror site at the University of Texas Center for High-Performance Computing.  It gives an idea of the kind of work I did with Genbank and Sybase, and of my documentation style. I also edited shell scripts and SQL needed for maintenance.  Sarah Barron, the biologist in charge, can provide a reference for this work.

 next      up
 

CHPC SYBASE Administration Notes   5/5/92  dlk

These instructions are intended to aid the administration of the Sybase database server at CHPC, and especially its 'genbank' database, which is updated remotely from Los Alamos.  It largely supercedes Ike Brenner's similar report.

The most important Sybase documentation for the administrator is the SYBASE System Administration Guide, especially chapters 2 (User Permissions) 3 (Physical Resources) and 4 (Backup and Recovery) .  These chapters are essential for administering Sybase, and give its overall structure. You are likely to understand these notes better if you read those chapters first (?) .  For new installations, the Sybase Installation manual is also required reading.  The Commands Reference Manual and Transact-SQL User's Guide are also very useful references.

The utility  syman  is the equivalent of man for Sybase commands and Sql. It lists online versions of the commands manual.  Its directory is $SYBASE/bin.

On 4/29/92, version 4.8 of Sybase was installed.  This version is capable of using the multi-processor power of almach.
 
 

Getting Started

The Unix user 'sybase' has been created for doing various administration tasks.  This user actually owns the disk partitions that comprise the database, and has permission to start the server, run and examine backups, etc.  A number of aliases and variables have been set in this account to make administration more convenient, and of course the administrator is welcome to make additional changes to this account.  The only essential environment variable is the following:

setenv SYBASE /local/pkg/sybase

which gives the location of the Sybase software (NOT the databases). ALL Sybase users must set this variable before accessing Sybase.

$SYBASE/bin

should also be added to your path (as it is for user sybase) .  This is the location of (most) Sybase executables, especially isql, the Sybase sql command interpreter.  Some are also located in $SYBASE/install.

The following variables are set in sybase's .cshrc; you might find it useful to set them in your own account as well (I have)

set sa = /chpc/cmbg/sybase.adm 
set sd = $sa/dumps
set sb = $sa/bin
set path = ($path $SYBASE/bin $SYBASE/install $sb $sd

set s=$SYBASE 
set si=$SYBASE/install 
set b=/stage/backups/SYBASE 
set g=/chpc/cmbg/local_gb

alias sql isql -Usa 
alias ldu "less -M $5d/dumplog*" 
alias ler "less -M $si/errorlog"

These give a short way to refer to often-used directories, whose use will be explained later.

The Sybase system administrator ('sa') should use the Sybase login 'sa'
Within isql.  The command is:

anysun% isql -Usa
password:

(or just 'sql' if the above alias is set) .  Note that 'sa' is not a unix login, it has meaning only within Sybase.  The password entered next is the same one used for user sybase (see Sarah Barron if you need to know it) You can run this command from any account, from any sun.

The database disk partitions are located on almach, and the server must be run on this machine only.  The backup script, as well as any Sybase installation programs, must also be run from almach.  However, the $SYBASE software is NFS-mounted on all the suns (it is physically located on spica, although this is of little importance), and the Sybase clients (isql, and the various tools) can be run from any sun, even for administration purposes.  The dumplogs and errorlog can also be examined from any sun.

The main administration tasks are monitoring backups, managing disk space, installing new releases, maintaining user permissions within Sybase, and monitoring the automatic updates of the genbank database, which sometimes involves communication with Los Alamos (where the master copy of genbank is maintained)
 
 

Current Disk Layout

Sybase database 'devices' can be unix files or raw disk partitions. We use partitions exclusively, for reasons of efficiency and data integrity explained in the Sybase manuals.  These are the partitions currently used for our Sybase databases:
Unix partition  Sybase device name     size    used for these databases
--------------  ------------------     ----    ------------------------
(on IPI disk 001)
/dev/rid00ld    gb_data_device         442mb   genbank
/dev/rid00le    gb_log_device          92mb    genbank
/dev/rid00lf    the_master_device      110mb   master,
                                               model, pubs, pubs2,
                                               tempdb, bibliography,
                                               igref, software,
                                               survival
(on IPI disk 002)
/dev/rid00le   gb_index_device         221mb   genbank
/dev/rid00lf   gb_text_device          961mb   genbank
                                      ------
                              total:  1826mb

(Here 'mb', as reported by Sybase, is 2^20 = 1,048,576 bytes, not 1,000,000).

As you can see, genbank, by far the largest database, occupies 4 of the 5 partitions, while all other databases are placed on the_master_device.

The master database is maintained by Sybase itself, and contains all essential system information for this server.  Its location (/dev/rid00lf) is set at install time, and is given as a parameter to the server when it starts. All the other database devices were added by hand within isql.  Sybase encodes such information in the master database.

The model, pubs, pubs2, and tempdb databases were also created by Sybase. Setting certain parameters in model causes the same value to be the default in any new databases created.  The pubs databases are practice databases, which all users can read and write into.  They can be easily reinitialized to their standard contents.  tempdb is used as a temporary area in processing queries, and is recreated automatically each time the server is started.
 The bibliography, igref, software, and survival databases are small (<=4mb) user databases created by Gongxian Liu (xxae020) for miscellaneous purposes related to computational biology.  For now, all new databases have been put into the same device as the master, where there is enough room for them all. This is the default Sybase behavior, but not really recommended.  If more space becomes available in the future, it may be a good idea to create a new device to hold any new user databases.
 
 

Configuring and Manipulating Disk Space

In my opinion, Sybase has done a rather muddled job of providing tools for this purpose.  Implementation details are not properly hidden, and are presented at too low a level.  Irrelevant or poorly explained detail is presented on reports that overrun the line, while important information is missing.  Information is presented in a variety of units; often these units are not shown on the report.  In addition, it is rather difficult to move or reduce space once it has been allocated for a particular purpose.  Fortunately, however, some of the most often needed functions CAN be accomplished in a fairly simple manner.  I will give an overview of the most common commands; for more detail see Ch. 3 of the Administration Manual.  These commands are run within isql.

Disk Init

This creates a 'device' from a unix partition.  The partition must first be allocated and sybase given its ownership.  This is probably the only place where the systems group (unix 'superuser') needs to be involved. Here is an example of proper partition permissions:
crw-------  1 sybase   operator  72,  12 May  6 09:46 /dev/rid00ld

Here is the sql command that created the gb_data_device on ridOOld:

1> disk init name = "gb_data_device", physname = "/dev/rid001d",
2> vdevno = 1, size = 226083  (2kb units)
3> go

Simple as that.

Create Database <name> on <device> = <size(mb)> [, <device> = size ...] 
Alter Database   "  "  "  "

The first command creates a database, specifying how much space on which device[s] it is to use.  The second expands space used by a database to include additional chunks on the same or different devices.  It has similar syntax.  Each chunk added in this manner is called a 'device fragment', but the whole set of fragments on a particular device for a particular database is used as a unit by the system.  I'll call this set the database's 'allocation' on the device (and I'd prefer to forget about the useless 'fragment' concept) .  Sybase also calls the allocation the 'device', but note that an entire device (e.g., the_master_device) is not necessarily allocated for a particular database's exclusive use, but its allocation is.  In our case, however, all of *genbank's* devices ARE entirely allocated to it, so enough confusing talk.  The command used to allocate genbank's devices was:

1> create database genbank on
2> gb_data_device = 442, gb_text_device = 962, gb_index_device = 221
3> log on gb_log_device = 92  (why can't we just say 'use it all'?)
4> go

.
.
.
 

Excerpt below is from the shell script for backing up the Genbank databases, based loosely on several scripts by Ike Brenner.
 

 next      up