From: Geoff Burling
Date: Sun, 10 Sep 2000 11:55:05 -0700
To: NETWORKER@LISTSERV.TEMPLE.EDU
Subject: [Networker] Networker FAQ -- Part 2 of 7

I've been lurking here for the last couple of months, & have noticed
quite a few questions that should be answered in the FAQ. I checked
with the folks who said they'd carry this chore on, & since they had
no objection, & although I'm no longer responsible for it, I'm
reposting the FAQ one more time. I hope this answers a few questions
& save a little bandwidth.
Geoff Burling

===============================================================

all or new tapes, or set up a 32kb block device and mark it as read
only. You can then use this device for recovering the old tapes
(32kb). For the old tape, you have to manually scratch the tape
(override the label) and relabel it on a 64kb block device.

PS. upgrading to sp4 may resolve the hanging issue too, because there
is a new tcpip and thread shipped.

4.3 Lockfs errors.

Q. A backup of one of my filesystems errored out with the following
   message:

     mhost:/archivelogs unexpected lockf problem: Bad file number

A. Both the original poster and several other people identified that
   this was caused by a stale lock file (.lck). The original poster
   added

     Another symptom is that savepnpc seems to be ignoring the
     precmd and pstcmd.

   Rodney Wines posted (2 May 2000):

     Yep. It's a common problem, and it's a lock file problem.
     However, the file ain't named ".lck", it's named ".tmp", and is
     in ".../nsr/tmp". And "" is the name of the group that your
     client is a part of.

     > This all started after we used the GUI to stop a running
     > backup.

     Yes indeed. The problem can occur if the backup stops
     prematurely for any reason.

   There is also a known bug that causes this error message,
   according to Bill Benford (10 May 2000):

     Hi Rick. Here is the explanation from the Legato TechDialog
     site specific for your error message.

     Title: Unexpected lockf or unexpected LockFileEx; Apply fix
     from FTP site

     Description: Savepnpc fails with:
       unexpected lockf, bad file descriptor - (UNIX)
       unexpected LockFileEx, bad file handle - (NT)

     ANALYSIS:
       NetWorker for UNIX 5.5.1
       NetWorker for Windows NT 5.5.1

     This issue has been reported to Legato engineering under
     LGTpa18060 and LGTpa20931.

     SOLUTION: Apply fix from Legato's FTP site.
     Please read README file for complete details on installing
     this fix:

       ftp://ftp.legato.com:/outgoing/savepnpc/README

     For Networker for UNIX 5.5.1, 5.5.1-001, or 5.5.2 download one
     of the following files, depending upon your system type:

       ftp.legato.com:/outgoing/savepnpc/unix/preclntsave.aix41
       ftp.legato.com:/outgoing/savepnpc/unix/preclntsave.decaxp
       ftp.legato.com:/outgoing/savepnpc/unix/preclntsave.hp10
       ftp.legato.com:/outgoing/savepnpc/unix/preclntsave.solaris

     For NetWorker for NT 5.5.x download:

       ftp.legato.com:/outgoing/savepnpc/nt/install_me_nt86.zip
       -OR-
       ftp.legato.com:/outgoing/savepnpc/nt/install_me_ntaxp.zip

     Created: 4/28/2000 BE/nka
     Last Update:

     TechDialog Legato Technical Support Information Server
     Copyright(c) 1996 Legato Systems, Inc.
     CasePoint(r) WebServer Copyright 1995-97 Inference
     Corporation, Novato, CA

     Hope this helps you out.

4.4 RPC errors.

Q. I am getting the message ``RPC error: failed to send chunk to
   MMD" when I attempt to back up.

A. Daniel Lim posted (8 Dec 1999):

     As a summary, changing NIC and switch port settings from Auto
     Sense to 100/Full resolved 90% of my problem on RPC error - and
     that's definitely something worth trying first for any similar
     problem.

4.5 WISS errors.

Q. During a backup, I get an error message involving WISS errors.

A. ``WISS" refers to the media index. Greg Feczko's email (17 Sep
   1999) explains what the acronym means.

     WISS stands for Wisconsin Storage System, a program Legato
     purchased from (if I remember correctly) a college in
     Wisconsin, that was modified for Legato's use. The reason they
     chose it was because it's highly tuned for insertions.

Q. Similar error messages:

     unable to start nsrd
     ``exited on signal 11"
     daemons dying
     various save errors
     Dr Watson error

A. Legato Tech Bulletin #352 also sets out a number of steps to
   follow to fix corruption with media indices.
   K.
   Scott Rowe's answer (24 Mar 2000) also has some tips:

     You need to run nsrck -F to 'attempt' to fix the error, which
     generally occurs only when the database grows over 2GB. If
     this does not work, you then have two options:

     1. Delete the index.db and run nsrck -c, this generates a
        blank index.db for you, and you can then run your backups.

     2. Run and mmrecov to restore the last good database. Please
        run nsrck -F on this database as well, just to make sure
        everything is clean for Networker.

   Also see section 5.1, ``Index Corruption" for further details
   about solving this problem. For a Solaris-specific cause, see
   7.4.4, ``Why does Networker seem to hang?"

-----------------------------------------------------------------------

5. Care & maintenance of indices.

5.1 Index Corruption.

Index corruption can be a fatal problem with Networker. Nsrd will
abort prematurely if there are problems with the indices. Stopping
networker, then running nsrck -F is the best first step. From
reading the man pages on nsrck, mmrecov, & nsrim -- as well as
previous email in the Networker archive -- other commands to try in
fixing corrupt indices are, in the order of their power:

  nsrck -m

  mmrecov
    [From the bugs section of the man page on mmrecov: Mmrecov is
    mis-named, causing unsuspecting users to use it (and its brute
    force features) when it is not needed. A name like
    "recover_server_index_or_media_index_when_either_is_missing"
    is more descriptive. Note that any part of the bootstrap save
    set contents are recoverable using normal recover procedures
    provided that the server's on-line index and media index are
    in good shape.]

  nsrim -X

On a related issue, Stan Horowitz posted (20 March 2000):

  > At the end of this process, the /legato file system was at
  > 100% and the nsrd daemon stopped. After shutting down the
  > remaining daemons and cleaning up the indexes, we attempted
  > to restart Legato.
  > The result was the following:
  >
  > root 10708 57170 1 09:35:06 pts/8 0:00 grep nsr
  > root 55320 1 0 08:46:41 - 0:00 legato/bin/nsrexecd -s localhost -s
  >   r0011isp.st11.meijer.com
  >
  > As one can see the nsrd daemon did not start. Attempts were
  > made to start it manually, but also failed.
  >
  > Any ideas where I can look to find a solution and/or what
  > needs to be done to get Legato going again. Thanks in advance
  > for any suggestions.

  Your NSR server's media database is corrupt, probably because
  there is no disk space left for NSR to operate. Add more disk
  space for NSR. Note that NSR needs at least twice as much
  available disk space as the largest client index you have. Then
  running "nsrck -X" should fix the problem.

Lynn Glessner, writing on 16 March 2000, shared that she is one of
several who use nsrck -F as part of a regular maintenance plan on
her Legato indices. ``I tend towards over-maintenance ;)"

Also see section 4.5, ``WISS errors."

5.2 Moving indices to a new server.

Mike Allmen posted (16 May 2000) the following procedure from
Legato:

  We recommend, as the only safe method for moving the indexes, that
  you back up the /nsr part of the directory tree on the old machine
  and then recover it on the new machine. This will be a "disaster
  recovery" using mmrecov. The reason for this is that many copy
  programs do not handle the database (index) files correctly.
  These files have empty places, where old records have been
  deleted and many copy programs will insert blanks or nulls or
  compress the files, etc. resulting in index corruption.

  Keep in mind that the name of the new machine must be the same as
  the old machine, at least at first. After the transfer is done,
  you can go through the process of changing the name.

  The best way I've found to move from UNIX==>UNIX is to do this:

  1) savegrp -O on the server. This will back up all the indexes
     and bootstrap to one tape.
  2) shut down the current nsr server
  3) Move the tape devices to the new server
  4) Give the new server the same name as the old server. This may
     require them to disconnect the new server from the network
  5) do a mmrecov on the new server using the tape from step 1
  6) recover all indexes for all the clients

  If the new server is going to have the same name as the old
  server, skip to step 17.

  7) shutdown networker
  8) Rename the server to its new name
  9) Reboot the server -- Networker should start
  10) Update any client's servers files or nsrexecd -s options to
      reflect the new server's name.

  If the new server will not be having the indexes as the old
  server, skip to step 17. If the old server will be a client of
  the new server, skip to step 17.

  11) Delete the old server from the new server's client list.
  12) Make the old server an alias of the new server.
  13) Stop networker.
  14) Rename the /nsr/index/ directory to /nsr/index/
      NOTE: You will need to delete the /nsr/index/ directory
      before renaming.
      NOTE: DO NOT copy the index from one directory to the other;
      use move or rename.
  15) Restart Networker.
  16) When the indexes are finished cross-checking you can delete
      the aliases for the old server.
  17) Complete the host transfer procedure from Customer Service to
      get a new authorization code if the IP address of the new
      server is different from the old server. You have 15 days
      from the time of the move to do this or Networker will
      disable itself.

  Registering New Server:

  Shut down NetWorker on the source server. Start the NetWorker
  daemons on the target server. The following messages appear on
  the destination server:

    new_server syslog: NetWorker Server: (notice) started
    new_server syslog: NetWorker Registration: (notice) invalid auth codes detected.
    new_server syslog: The auth codes for the following licenses enablers are now invalid.
    new_server syslog: The cause may be that you moved the NetWorker server to a new computer.
    new_server syslog: You must re-register these enablers within 15 days to obtain new codes.
    new_server syslog: License enabler # xxxxxx-xxxxxx-xxxxxx (NetWorker Advanced/10)

  Register your new NetWorker server. After moving NetWorker from
  one computer to another, you have 15 days to register the new
  server with Legato. To register the new NetWorker server, follow
  these steps:

  Start the GUI version of the NetWorker Administrator program
  using the following command:

    # nwadmin

  Open the Registration window by selecting registration from the
  Administration pull-down menu of the main Administration window.

  Select Tabular from the View pull-down menu to display the
  tabular view of the window.

  Select print from the File pull-down menu to send a copy of the
  Registration window to a printer, or select Save from the File
  pull-down menu.

  Fax this printout along with your name, company and telephone
  number to Legato at (650) 812-6220.

  Legato will send you a NetWorker Host Transfer Affidavit which
  you must complete and return to Legato.

  When Legato receives the completed affidavit, you will receive a
  new authorization code. You must enter the new authorization
  code into the Authorization field of the Registration window.

5.3 Other index issues.

Q. Legato insists on writing the indices for backups to the default
   pool.

A. Elmar Kolkman posted (30 Aug 1999):

     To get the indexes on the backup tapes, you need to specify
     level 9 backups to be allowed on your backup tapes, since the
     incremental backups of your indexes are level 9 backups. And
     if you have specified the savesets allowed on the pools, add
     the index saveset.

   Brian Dockter adds (27 Mar 2000):

     The most common reason for Legato to ask for a tape from the
     default pool is to back up the indices. For levels 1 thru 9
     and full, the indices are backed up at the same level as the
     save being performed. For an incremental backup, the indices
     are backed up at level 9.
     If the pool you are doing your incrementals to does not have
     level 9 backups enabled, Networker asks for a tape from the
     Default pool. In order to keep Networker from asking for a
     tape from the Default pool, all you need to do is enable level
     9 backups on all pools that are used for incrementals.

-----------------------------------------------------------------------

6. Directives and Savepnpc.

6.1 Directives.

Q. I can't make my directive skip/include this file.

A. Directives are not well documented by Legato, but what
   information exists can be found in the man pages for nsr(5),
   nsr_directive(5), & uasm(1m) for UNIX.

   Use quotes to handle white spaces & other nonstandard
   characters. [Has anyone had any experience with using ``%20" in
   substitution for white space?]

   Filenames (and extensions of type *.tmp) are case sensitive, but
   paths are not.

   Example directives can be found in the Administrator's Guide for
   UNIX on pages 154-155, and the man page for nsr(5).

   Here are some OS-specific tips:

   For NT -- Make sure the backslash indicating the root directory
   is included -- and paths that include a colon need to be quoted.
   Example:

     <> [instructions]

   should instead read

     <<"F:\">> [instructions]

   Stein Bjorndal adds (29 Mar 2000) some more points with NT:

     - If explorer is set to not show file name extensions,
       winworkr.exe won't either.
     - For savesets on NT servers Networker prefers to have
       savesets in all uppercase. Mixed or lowercase will work, but
       will result in messages like

         clientname:c:\temp save: using `C:\TEMP' for `c:\temp'

       [Is this also the case with Windows 2000?]

   For Novell -- see Legato Tech Note 062. S. Bjorndal's points
   about NT also apply to Netware.

   For UNIX -- note file pattern matching or wildcards follows
   sh(1) practice, not regex, or C or Korn shell practices. As
   Ulrich Oldendorf reminded me in private email, wild cards only
   work for files, not directories.

6.2 Savepnpc

Q. What are some pointers for setting up a savepnpc script?

A.
   Rodney Rutherford wrote (16 May 2000):

     (Solaris server/client example with the groupname TEST)

     Client has savepnpc specified in the "Backup command:" field
     on server

     /nsr/res/TEST.res file on client contains:

       type: savepnpc;
       precmd: /nsr/bin/precmd.sh;
       pstcmd: /nsr/bin/pstcmd.sh;
       timeout: "11:00pm";

     When a savegroup starts the following happens (example using
     the group TEST):

     - The server starts the group.
     - The server sees the client has savepnpc specified as the
       backup command.
     - The server contacts the client and checks for a
       /nsr/res/TEST.res file. (If there is not one, a dummy res
       file will be created.)
     - The client executes the pre command/script specified in the
       file (precmd: /nsr/bin/precmd.sh;). It creates a tmp file
       indicating that savepnpc is running, /nsr/tmp/TEST.tmp. It
       also executes a process that monitors the status of the
       saves waiting for completion and/or the timeout value
       specified in /nsr/res/TEST.res. (timeout: "11:00pm";)
     - After the pre script successfully completes, the server then
       initiates the filesystem saves.
     - During this time the client is constantly checking the
       server to see if the saves are completed. As soon as they
       have completed (or the timeout value has been reached), the
       client executes the post script specified in the
       /nsr/res/TEST.res file. (pstcmd: /nsr/bin/pstcmd.sh) It then
       removes the /nsr/tmp/TEST.tmp file.

     NOTES:

     - The pre/post scripts need to have the necessary environment
       setup in them, as savepnpc does not pass any kind of
       environment to the scripts.
     - The start/stop of the pre and post commands are recorded in
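
As a rough illustration of the stale lock file problem from section
4.3: the posts in this part say that savepnpc keeps a .tmp file for
the group in /nsr/tmp while it runs and removes it after the post
script. The Python sketch below only lists .tmp files older than a
cutoff so that they can be inspected by hand; it deletes nothing. The
function name find_stale_savepnpc_locks, the 24-hour cutoff, and the
use of file age as the test for staleness are all assumptions made
for this example; none of it comes from Legato.

```python
import time
from pathlib import Path


def find_stale_savepnpc_locks(tmp_dir, max_age_hours=24.0, now=None):
    """Return (path, age_in_hours) for each *.tmp file older than the cutoff.

    Hypothetical helper: judging staleness by file age, and the 24-hour
    default, are assumptions for illustration, not NetWorker behaviour.
    """
    now = time.time() if now is None else now
    stale = []
    tmp_path = Path(tmp_dir)
    if not tmp_path.is_dir():
        # Nothing to report if the directory does not exist on this host.
        return stale
    for entry in sorted(tmp_path.glob("*.tmp")):
        if not entry.is_file():
            continue
        age_hours = (now - entry.stat().st_mtime) / 3600.0
        if age_hours > max_age_hours:
            stale.append((entry, age_hours))
    return stale


if __name__ == "__main__":
    # /nsr/tmp is the directory named in the posts above; adjust the path
    # if NetWorker is installed elsewhere on your client.
    for path, age in find_stale_savepnpc_locks("/nsr/tmp"):
        print(f"{path}: {age:.1f} hours old; "
              "confirm no savegrp is running before removing it")
```

Reporting rather than deleting is deliberate: a .tmp file for a group
that is still being saved is expected to exist, so whether a given
file is really stale is left for the administrator to decide.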