StorageWorks(TM) Array Controllers
------------------------------------------------------------
HS Family of Array Controllers
Service Manual

Order Number: EK-HSFAM-SV. B01

This manual contains necessary servicing information for the HS family of array controllers. Information included pertains to configuration, normal operating procedures, troubleshooting and error analysis, field replaceable units, and removal and replacement procedures.

Digital Equipment Corporation
Maynard, Massachusetts
------------------------------------------------------------
April 1994

While Digital believes the information included in this manual is correct as of the date of publication, it is subject to change without notice.

Digital Equipment Corporation makes no representations that the interconnection of its products in the manner described in this document will not infringe existing or future patent rights, nor do the descriptions contained in this document imply the granting of licenses to make, use, or sell equipment or software in accordance with the description.

Possession, use, or copying of the software or firmware described in this documentation is authorized only pursuant to a valid written license from Digital, an authorized sublicensor, or the identified licensor.

No responsibility is assumed for the use or reliability of firmware on equipment not supplied by Digital Equipment Corporation or its affiliated companies.

Restricted Rights: Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013.

NOTE: This equipment generates, uses, and may emit radio frequency energy. The equipment has been type tested and found to comply with the limits for a Class A digital device pursuant to Part 15 of the FCC rules. These limits are designed to provide reasonable protection against harmful interference in a residential installation.
Any changes or modifications made to this equipment may void the user 's authority to operate the equipment. Operation of this equipment in a residential area may cause interference, in which case the user at his own expense will be required to take whatever measures may be needed to correct the interference. Copyright © Digital Equipment Corporation 1993, 1994 Printed in U.S.A. All rights reserved. AXP, CI, DCL, DEC, DECconnect, DECserver, Digital, HSC, HSC95, HSJ, HSD30, HSD05, HSZ, MSCP, OpenVMS, StorageWorks, TMSCP, VAX, VAXcluster, VAX 7000, VAX 10000, VMS, VMScluster, VT, and the DIGITAL logo are trademarks of Digital Equipment Corporation. Intel is a trademark of Intel Corporation. OSF and OSF/1 are trademarks of Open Software Foundation Inc. All other trademarks and registered trademarks are the property of their respective holders. The postpaid READER'S COMMENTS card requests the user 's critical evaluation to assist in preparing future documentation. This document was prepared using VAX DOCUMENT Version 2.1. ------------------------------------------------------------ Contents Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii Manufacturer 's Declarations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi 1 General Information and Subsystem Overview 1.1 Technical Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1 1.2 Maintenance Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-4 1.3 Maintenance Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5 1.4 Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6 1.4.1 Electrostatic Discharge Protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6 1.4.2 Module Handling Guidelines . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . 1-6 1.4.3 Program Card Handling Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-7 1.4.4 Cable Handling Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8 1.4.4.1 CI Cable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-9 1.4.4.2 DSSI Cable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-9 1.4.4.3 SCSI Cable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-9 1.5 Controller Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-9 1.6 Controller Environmental Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . 1-10 2 Functional Description 2.1 HS Controller Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1 2.1.1 Policy Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1 2.1.1.1 Intel 80960CA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1 2.1.1.2 Instruction/Data Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2 2.1.2 Program Card . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2 2.1.3 Diagnostic Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2 2.1.4 Operator Control Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2 2.1.5 Maintenance Terminal Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3 2.1.6 Dual Controller Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4 2.1.7 Nonvolatile Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4 2.1.8 Bus Exchangers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
2-4 2.1.9 Shared Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4 2.1.10 Device Ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5 2.1.11 Cache Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5 2.1.11.1 Common Cache Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5 2.1.11.2 Read Cache Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5 2.1.12 Host Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5 2.1.12.1 HSJ-Series (CI Interface) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5 2.1.12.2 HSD-Series (DSSI Interface) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6 2.1.12.3 HSZ-Series (SCSI-2 Interface) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7 2.2 HS Controller Firmware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8 iii 2.2.1 Core Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9 2.2.1.1 Tests and Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9 2.2.1.2 Executive Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9 2.2.2 Host Interconnect Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9 2.2.3 Operator Interface and Subsystem Management Functions . . . . . . . . . 2-9 2.2.3.1 Command Line Interpreter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10 2.2.3.2 Diagnostic Utility Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10 2.2.3.3 HSZ-Series Virtual Terminal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10 2.2.3.4 Local Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
2-10 2.2.3.5 Error Logging and Fault Management . . . . . . . . . . . . . . . . . . . . . . 2-10 2.2.4 Device Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-11 2.2.5 Value-Added Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12 2.2.5.1 RAID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12 2.2.5.2 Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12 2.2.5.3 Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12 2.3 Addressing Storage Within the Subsystem . . . . . . . . . . . . . . . . . . . . . . . . 2-13 2.3.1 Controller Storage Addressing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13 2.3.2 Host Storage Addressing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13 2.3.3 Host Storage Addressing (HSZ-series) . . . . . . . . . . . . . . . . . . . . . . . . . 2-15 3 Configuration Rules and Restrictions 3.1 Ordering Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1 3.2 Cabinets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1 3.2.1 SW800-Series Data Center Cabinet . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2 3.2.2 SW500-series Cabinets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6 3.3 Shelves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8 3.4 Device Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9 3.4.1 3½-inch SBB Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9 3.4.2 5¼-inch SBB Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9 3.4.2.1 Table Conventions . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-10 3.4.3 3½-inch SBBs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-10 3.4.4 5¼-inch SBBs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-13 3.4.5 Intermixing 5¼-inch and 3½-inch SBBs . . . . . . . . . . . . . . . . . . . . . . . . 3-14 3.4.6 Atypical Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14 3.5 Controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-15 3.5.1 Nonredundant Controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-15 3.5.2 Dual-Redundant Controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-16 3.5.3 Optimal Performance Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-16 3.5.4 Optimal Availability Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-18 3.6 Host Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-19 3.6.1 Host Cables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-19 3.6.2 Host Adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-19 4 Normal Operation 4.1 Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1 4.1.1 Controller Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1 4.1.2 Dual-Redundant Configuration Initialization . . . . . . . . . . . . . . . . . . . . 4-1 4.1.3 Subsystem Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2 4.2 Operator Control Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2 4.3 Command Line Interpreter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . 4-2 4.3.1 Accessing the CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2 4.3.2 Exiting the CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3 iv 4.3.3 Command Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3 4.3.4 Initial Configuration (Nonredundant Controller) . . . . . . . . . . . . . . . . . 4-4 4.3.5 Initial Configuration (Dual-redundant Controllers) . . . . . . . . . . . . . . . 4-6 4.3.6 Configuring Storage Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8 4.4 Acceptance Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10 4.5 Maintenance Terminal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10 4.6 Virtual Terminal (HSJ- and HSD-Series Controllers) . . . . . . . . . . . . . . . . . 4-10 4.7 Virtual Terminal (HSZ-series Controllers) . . . . . . . . . . . . . . . . . . . . . . . . . 4-11 4.8 VAXcluster Console System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11 4.9 Operating Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11 4.9.1 Controller Disks as System Initialization Disks . . . . . . . . . . . . . . . . . . 4-12 4.9.2 Operating System Nodes (OpenVMS) . . . . . . . . . . . . . . . . . . . . . . . . . . 4-12 4.9.3 AUTOGEN.COM (OpenVMS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-13 4.9.4 Other Conditions (OpenVMS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-14 4.10 Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-15 4.10.1 Setting Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16 4.10.2 Exiting Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
4-16 4.10.3 Failing Over . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16 4.10.4 Failover Setup Mismatch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-17 4.11 Moving Devices Between Controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-17 5 Error Analysis and Fault Isolation 5.1 Special Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1 5.1.1 Nonredundant Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1 5.1.2 Dual-redundant Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1 5.1.3 Cache Module Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1 5.2 Types of Error Reporting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2 5.3 Troubleshooting Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2 5.4 Operator Control Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2 5.4.1 Normal Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3 5.4.2 Fault Notification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4 5.5 Device LEDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8 5.5.1 Storage SBB Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8 5.5.2 Device Shelf Status and Power Supply Status . . . . . . . . . . . . . . . . . . . 5-9 5.6 Error Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-11 5.6.1 Diagnostic Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-12 5.6.2 NVPM Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-12 5.6.3 CLI Automatic Messages . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-14 5.6.4 Shelf Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15 5.6.5 Failover Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15 5.6.6 Other CLI Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-16 5.7 Host Error Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-16 5.7.1 Translation Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-16 5.7.2 Host Error Log Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-16 6 Diagnostics, Exercisers, and Utilities 6.1 Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1 6.1.1 Built-In Self-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2 6.1.2 Core Module Integrity Self-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2 6.1.3 Module Integrity Self-Test DAEMON . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3 6.1.3.1 Self-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-4 6.2 Disk Inline Exerciser (HSJ- and HSD-Series Controllers) . . . . . . . . . . . . . 6-5 v 6.2.1 Invoking DILX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6 6.2.2 Interrupting DILX Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6 6.2.3 DILX Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7 6.2.3.1 Basic Function Test--DILX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7 6.2.3.2 User-Defined Test--DILX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8 6.2.4 DILX Test Definition Questions . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . 6-8 6.2.5 DILX Output Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-14 6.2.6 DILX End Message Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-18 6.2.7 DILX Event Information Packet Displays . . . . . . . . . . . . . . . . . . . . . . 6-18 6.2.8 DILX Data Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-21 6.2.9 DILX Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-22 6.2.9.1 DILX Example--Using All Defaults . . . . . . . . . . . . . . . . . . . . . . . . 6-22 6.2.9.2 DILX Example--Using All Functions . . . . . . . . . . . . . . . . . . . . . . . 6-23 6.2.9.3 DILX Examples--Auto-Configure with All Units . . . . . . . . . . . . . . 6-25 6.2.10 Interpreting the DILX Performance Summaries . . . . . . . . . . . . . . . . . 6-27 6.2.11 DILX Abort Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-29 6.2.12 DILX Error Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-30 6.3 Tape Inline Exerciser (HSJ- and HSD-Series Controllers) . . . . . . . . . . . . . 6-30 6.3.1 Invoking TILX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-31 6.3.2 Interrupting TILX Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-31 6.3.3 TILX Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-32 6.3.3.1 Basic Function Test--TILX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-32 6.3.3.2 User-Defined Test--TILX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-32 6.3.3.3 Read Only Test--TILX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-33 6.3.4 TILX Test Definition Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-33 6.3.5 TILX Output Messages . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . 6-37 6.3.6 TILX End Message Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-42 6.3.7 TILX Error Information Packet Displays . . . . . . . . . . . . . . . . . . . . . . . 6-42 6.3.8 TILX Data Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-44 6.3.9 TILX Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-45 6.3.9.1 TILX Example--Using All Defaults . . . . . . . . . . . . . . . . . . . . . . . . 6-45 6.3.9.2 TILX Example--Using All Functions . . . . . . . . . . . . . . . . . . . . . . . 6-46 6.3.10 Interpreting the TILX Performance Summaries . . . . . . . . . . . . . . . . . . 6-48 6.3.11 TILX Abort Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-49 6.3.12 TILX Error Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-50 6.4 Disk Inline Exerciser (HSZ-Series Controllers) . . . . . . . . . . . . . . . . . . . . . 6-50 6.4.1 Invoking DILX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-51 6.4.2 Interrupting DILX Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-51 6.4.3 DILX Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-51 6.4.3.1 Basic Function Test--DILX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-51 6.4.3.2 User-Defined Test--DILX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-52 6.4.4 DILX Test Definition Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-53 6.4.5 DILX Output Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-58 6.4.6 DILX Sense Data Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-61 6.4.7 DILX Deferred Error Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
6-62 6.4.8 DILX Data Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-62 6.4.9 Interpreting the DILX Performance Summaries . . . . . . . . . . . . . . . . . 6-63 6.4.10 DILX Abort Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-65 6.4.11 DILX Error Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-65 6.5 VTDPY Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-65 6.5.1 How to Run VTDPY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-66 6.5.1.1 Using the VTDPY Control Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-66 6.5.1.2 Using the VTDPY Command Line . . . . . . . . . . . . . . . . . . . . . . . . . 6-67 6.5.1.3 How to Interpret the VTDPY Display Fields . . . . . . . . . . . . . . . . . 6-67 vi 6.6 The CONFIG Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-98 6.6.1 Running the CONFIG Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-98 6.7 HSZUTIL Virtual Maintenance Terminal Application . . . . . . . . . . . . . . . . 6-100 6.7.1 General Implementation Considerations . . . . . . . . . . . . . . . . . . . . . . . 6-100 6.7.2 Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-100 6.7.3 DEC OSF/1 for Alpha AXP Implementations . . . . . . . . . . . . . . . . . . . . 6-100 6.7.3.1 Running HSZUTIL Under DEC OSF/1 AXP . . . . . . . . . . . . . . . . . 6-100 6.7.4 Description of HSZ-series Controller Virtual Terminal Protocol Diagnostic Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-101 6.7.5 Virtual Maintenance Terminal Communications Protocol . . . . . . . . . . 6-102 6.7.5.1 Protocol Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
6-102 6.7.5.2 Host Virtual Terminal I/O Algorithm . . . . . . . . . . . . . . . . . . . . . . . 6-102 7 Removing and Replacing Field Replaceable Units 7.1 Controller Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1 7.1.1 Diagnosing the Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2 7.1.2 Shutting Down a Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2 7.1.3 Nonredundant Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-3 7.1.3.1 Tools Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-3 7.1.3.2 Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-3 7.1.3.3 Module Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4 7.1.3.4 Module Replacement/Installation . . . . . . . . . . . . . . . . . . . . . . . . . . 7-7 7.1.3.5 Restoring Initial Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9 7.1.4 One Dual-Redundant Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-13 7.1.4.1 Tools Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-13 7.1.4.2 Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-13 7.1.4.3 Module Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-13 7.1.4.4 Module Replacement/Installation . . . . . . . . . . . . . . . . . . . . . . . . . . 7-15 7.1.4.5 Restoring Initial Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-16 7.1.5 Both Dual-Redundant Controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-18 7.2 Cache Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-19 7.2.1 Tools Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . 7-19 7.2.2 Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-19 7.2.3 Module Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-19 7.2.4 Module Replacement/Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-19 7.2.5 Upgrading Cache Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-20 7.3 Program Card . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-21 7.3.1 Tools Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-21 7.3.2 Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-21 7.3.3 Card Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-22 7.3.4 Card Replacement/Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-22 7.4 External CI Cables (HSJ-Series) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-23 7.4.1 Tools Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-23 7.4.2 Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-23 7.4.3 Cable Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-23 7.4.4 Cable Replacement/Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-25 7.5 Internal CI Cables (HSJ-series) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-25 7.5.1 Tools Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-25 7.5.2 Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-25 7.5.3 Cable Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
7-26 7.5.4 Cable Replacement/Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-26 7.6 DSSI Host Cables (HSD-series) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-27 7.6.1 Tools Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-27 vii 7.6.2 Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-27 7.6.3 Cable Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-28 7.6.4 Cable Replacement/Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-29 7.7 SCSI Host Cables (HSZ-Series) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-29 7.7.1 Tools Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-29 7.7.2 Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-30 7.7.3 Cable Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-30 7.7.4 Cable Replacement/Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-31 7.8 SCSI Device Port Cables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-31 7.8.1 Tools Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-31 7.8.2 Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-31 7.8.3 Cable Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-32 7.8.4 Cable Replacement/Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-33 7.9 Blowers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-34 7.9.1 Tools Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-34 7.9.2 Precautions . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-34 7.9.3 Blower Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-35 7.9.4 Blower Replacement/Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-36 7.10 Power Supplies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-36 7.10.1 Tools Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-37 7.10.2 Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-37 7.10.3 Power Supply Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-37 7.10.4 Power Supply Replacement/Installation . . . . . . . . . . . . . . . . . . . . . . . . 7-38 7.11 Warm Swap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-38 7.11.1 SBB Warm Swap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-38 7.11.1.1 Tools Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-38 7.11.1.2 Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-39 7.11.1.3 Device Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-39 7.11.1.4 Device Replacement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-40 7.11.1.5 Restoring the Device to the Configuration . . . . . . . . . . . . . . . . . . . 7-41 7.11.2 Controller Warm Swap (HSJ-Series Controllers) . . . . . . . . . . . . . . . . . 7-42 7.11.2.1 Tools Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-42 7.11.2.2 Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-42 7.11.2.3 Controller Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . 7-42 7.11.2.4 Controller Replacement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-44 7.11.2.5 Restoring Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-45 A Field Replaceable Units A.1 Controller Field Replaceable Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1 A.2 Required Tools and Equipment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2 A.3 Related Field Replaceable Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-3 B Command Line Interpreter B.1 CLI Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1 ADD CDROM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2 ADD DISK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3 ADD STRIPESET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-5 ADD TAPE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-6 ADD UNIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-7 CLEAR_ERRORS CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-11 DELETE container-name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-12 viii DELETE unit-number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-13 DIRECTORY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-14 EXIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-15 HELP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-16 INITIALIZE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
LOCATE
RENAME
RESTART OTHER_CONTROLLER
RESTART THIS_CONTROLLER
RUN
SELFTEST OTHER_CONTROLLER
SELFTEST THIS_CONTROLLER
SET disk-container-name
SET FAILOVER
SET NOFAILOVER
SET OTHER_CONTROLLER
SET stripeset-container-name
SET THIS_CONTROLLER
SET unit-number
SHOW CDROMS
SHOW cdrom-container-name
SHOW DEVICES
SHOW DISKS
SHOW disk-container-name
SHOW OTHER_CONTROLLER
SHOW STORAGESETS
SHOW STRIPESETS
SHOW stripeset-container-name
SHOW TAPES
SHOW tape-container-name
SHOW THIS_CONTROLLER
SHOW UNITS
SHOW unit-number
SHUTDOWN OTHER_CONTROLLER
SHUTDOWN THIS_CONTROLLER
B.2 CLI Messages
B.2.1 Error Conventions
B.2.2 CLI Error Messages
B.2.3 Warning Conventions
B.2.4 CLI Warning Messages
B.3 Examples
B.3.1 Setting HSD-Series Parameters, Nonredundant
B.3.2 Setting HSJ-Series Parameters, Dual-Redundant
B.3.3 Setting HSZ-Series Parameters
B.3.4 Setting Terminal Speed and Parity
B.3.5 Adding Devices
B.3.6 Adding Storage Sets
B.3.7 Initializing Containers
B.3.8 Adding Logical Units
B.3.9 Device Configuration Examples

C HSJ-Series Error Logging
C.1 Reading an HSJ-Series Error Log
C.2 Event Log Formats
C.2.1 Implementation Dependent Information Area
C.2.2 Common Event Log Fields
C.2.2.1 CI Host Interconnect Services Common Event Log Fields
C.2.2.2 Host/Server Connection Common Fields
C.2.2.3 Byte Count/Logical Block Number Common Fields
C.2.2.4 Device Location/Identification Common Fields
C.2.2.5 SCSI Device Sense Data Common Fields
C.2.3 Specific Event Log Formats
C.2.3.1 Last Failure Event Log (Template 01)
C.2.3.2 Failover Event Log (Template 05)
C.2.3.3 Nonvolatile Parameter Memory Component Event Log (Template 11)
C.2.3.4 Backup Battery Failure Event Log (Template 12)
C.2.3.5 Subsystem Built-In Self-Test Failure Event Log (Template 13)
C.2.3.6 Memory System Failure Event Log (Template 14)
C.2.3.7 CI Port Event Log (Template 31)
C.2.3.8 CI Port/Port Driver Event Log (Template 32)
C.2.3.9 CI System Communication Services Event Log (Template 33)
C.2.3.10 Device Services Nontransfer Error Event Log (Template 41)
C.2.3.11 Disk Transfer Error Event Log (Template 51)
C.2.3.12 Disk Bad Block Replacement Attempt Event Log (Template 57)
C.2.3.13 Tape Transfer Error Event Log (Template 61)
C.2.3.14 Media Loader Error Event Log (Template 71)
C.2.3.15 Disk Copy Data Correlation Event Log
C.3 Event Log Codes
C.4 Event Notification/Recovery Threshold
C.5 Recommended Repair Action
C.6 Deskew Command Procedure

D HSD-Series Error Logging
D.1 Reading an HSD-series Error Log
D.2 Event Log Formats
D.3 Event Log Codes
D.4 Recommended Repair Action

E HSZ-Series Error Logging
E.1 Reading an HSZ-Series Error Log

Glossary

Index

Examples

6-1 DILX End Message Display
6-2 Controller Error
6-3 Memory Error
6-4 Disk Transfer Error
6-5 Bad Block Replacement Attempt Error
6-6 Using All Defaults--DILX
6-7 All Functions--DILX
6-8 Auto-Configuration with All Units
6-9 Auto-Configuration with Half of All Units
6-10 TILX End Message Display
6-11 Controller Error
6-12 Memory Error
6-13 Tape Error
6-14 Using All Defaults--TILX
6-15 Using All Functions--TILX
6-16 DILX Sense Data Display
6-17 DILX Deferred Error Display
C-1 Disk Transfer Error Event Log
C-2 Deskew Command Procedure Example
C-3 ERF Error Log Before Command Procedure
C-4 ERF Error Log After Command Procedure
E-1 The uerf utility Error Event Log

Figures

1-1 SW800-Series Data Center Cabinet
1-2 SW500-Series Cabinet
1-3 Shelf Grounding Stud
1-4 Program Card Eject Button
2-1 HS Controller Common Hardware Block Diagram
2-2 HS Controller Operator Control Panel
2-3 HSJ-Series CI Host Interface Hardware Block Diagram
2-4 HSD-Series DSSI Host Interface Hardware Block Diagram
2-5 HSZ-Series SCSI-2 Host Interface Hardware Block Diagram
2-6 Controller Storage Addressing
2-7 Host Storage Addressing (HSZ-series)
3-1 SW800-Series Data Center Cabinet Loading
3-2 SW800-Series Data Center Cabinet Controller/Storage/(1-2) Tape Drive Locations
3-3 SW800-Series Data Center Cabinet Controller/Storage/(3-4) Tape Drive Locations
3-4 SW500-Series Cabinet Loading
3-5 SW500-Series Cabinet Controller/Storage/Tape Drive Locations
3-6 Single Extension from Device Shelf to Device Shelf
3-7 Adjacent Devices on a Single Port
3-8 Balanced Devices Within Device Shelves
3-9 Optimal Availability Configurations
5-1 HS Controller Operator Control Panel
5-2 Solid OCP Codes
5-3 Flashing OCP Codes
5-4 Storage SBB LEDs
5-5 Power Supply LEDs
6-1 Controller Initialization
6-2 VTDPY Default Display for CI Controllers
6-3 VTDPY Default Display for DSSI Controllers
6-4 VTDPY Default Display for SCSI Controllers
6-5 VTDPY Device Performance Display
6-6 VTDPY Unit Cache Performance Display
6-7 VTDPY Brief CI Status Display
6-8 VTDPY Brief DSSI Status Display
6-9 VTDPY Brief SCSI Status Display
6-10 HSZ-series Controller CLI Send Diagnostic Page Format
6-11 HSZ-series Controller CLI Receive Diagnostic Page Format
7-1 Cabinet Grounding Stud
7-2 Reset LED, HSJ40 Controller
7-3 Eject Button, HSJ40 Controller
7-4 Trilink Connector
7-5 OCP Cable, HSJ-Series Controller
7-6 Controller Shelf Rails
7-7 External and Internal CI Cables (HSJ-series)
7-8 DSSI Host Cables
7-9 SCSI Host Cable
7-10 Volume Shield
7-11 SCSI Device Cables
7-12 Replacing a Blower
7-13 Power Supply Removal
7-14 SBB Warm Swap
C-1 Implementation Dependent Information Format
C-2 Instance Code Format
C-3 CI Host Interconnect Services Common Event Log Fields
C-4 Host/Server Connection Common Fields
C-5 Byte Count/Logical Block Number Common Fields
C-6 Device Location/Identification Common Fields
C-7 Device Locator Field Format
C-8 SCSI Device Sense Data Common Fields
C-9 Sense Data Qualifier Field Format
C-10 SCSI Sense Data Byte Zero ("ercdval") Field Format
C-11 SCSI Sense Data Byte Two ("snsflgs") Field Format
C-12 SCSI Sense Data Byte 0F through 11 ("keyspec") Field--Field Pointer Bytes Format
C-13 SCSI Sense Data Byte 0F through 11 ("keyspec") Field--Actual Retry Count Bytes Format
C-14 SCSI Sense Data Byte 0F through 11 ("keyspec") Field--Progress Indication Bytes Format
C-15 Last Failure Event Log (Template 01) Format
C-16 Last Failure Code Format
C-17 Failover Event Log (Template 05) Format
C-18 Nonvolatile Parameter Memory Component Event Log (Template 11) Format
C-19 Backup Battery Failure Event Log (Template 12) Format
C-20 Subsystem Built-In Self-Test Failure Event Log (Template 13) Format
C-21 Memory System Failure Event Log (Template 14) Format
C-22 CI Port Event Log (Template 31) Format
C-23 CI Port/Port Driver Event Log (Template 32) Format
C-24 CI System Communication Services Event Log (Template 33) Format
C-25 Device Services Nontransfer Error Event Log (Template 41) Format
C-26 Disk Transfer Error Event Log (Template 51) Format
C-27 Disk Bad Block Replacement Attempt Event Log (Template 57) Format
C-28 Tape Transfer Error Event Log (Template 61) Format
C-29 Media Loader Error Event Log (Template 71) Format

Tables

1 Related Documentation
1-1 HS Controller Models
1-2 Summary of HS Controller Product Features
1-3 HS Controller Specifications
1-4 Environmental Specifications
3-1 3½-Inch SBB Configurations, 6-Port Controller
3-2 3½-Inch SBB Configurations, 3-Port Controller
3-3 5¼-Inch SBB Configurations, 6-Port Controller
3-4 5¼-Inch SBB Configurations, 3-Port Controller
3-5 Small Shelf Count Configurations, 6-Port Controller
3-6 Small Shelf Count Configurations, 3-Port Controller
3-7 High-performance Devices per Port
3-8 SCSI Bus Maximum Lengths
4-1 Operating System Support
4-2 Transportable and Nontransportable Devices
5-1 Storage SBB Status LEDs
5-2 Shelf and Single Power Supply Status LEDs
5-3 Shelf and Dual Power Supply Status LEDs
6-1 Cache Module Testing
6-2 DILX Data Patterns
6-3 DILX Abort Codes and Definitions
6-4 DILX Error Codes and Definitions
6-5 TILX Data Pattern Definitions
6-6 TILX Abort Codes and Definitions
6-7 TILX Abort Codes and Definitions
6-8 DILX Data Patterns
6-9 DILX Abort Codes and Definitions
6-10 DILX Error Codes and Definitions
6-11 VTDPY Control Keys
6-12 VTDPY Commands
6-13 Thread Description
7-1 Cache Upgrade, HSJ40 Controller
7-2 Cache Upgrade, HSJ30 Controller
7-3 Cache Upgrade, HSD30 Controller
7-4 Cache Upgrade, HSZ40 Controller
7-5 Module Removal
7-6 Module Replacement
A-1 HSJ40 FRUs
A-2 HSJ30 FRUs
A-3 HSD30 FRUs
A-4 HSZ40 FRUs
A-5 Controller Related FRUs
C-1 Template Types
C-2 Firmware Component Identifier Codes
C-3 Host Interconnect Services Status Codes
C-4 CI Message Operation Codes
C-5 CI Virtual Circuit State Codes
C-6 Port/Port Driver Message Operation Codes
C-7 System Communication Services Message Operation Codes
C-8 CI Connection State Codes
C-9 Supported SCSI Device Type Codes
C-10 SCSI Command Operation Codes
C-11 SCSI Buffered Modes Codes
C-12 SCSI Sense Key Codes
C-13 SCSI ASC/ASCQ Codes For Direct-Access Devices (such as magnetic disk)
C-14 SCSI ASC/ASCQ Codes For Sequential-Access Devices (such as magnetic tape)
C-15 SCSI ASC/ASCQ Codes For CDROM Devices
C-16 SCSI ASC/ASCQ Codes For Medium Changer Devices (such as jukeboxes)
C-17 HSJ30/40 Controller Vendor Specific SCSI ASC/ASCQ Codes
C-18 Last Failure Event Log (Template 01) Instance/MSCP Event Codes
C-19 Failover Event Log (Template 05) Instance/MSCP Event Codes
C-20 Nonvolatile Parameter Memory Component Event Log (Template 11) Instance/MSCP Event Codes
C-21 Backup Battery Failure Event Log (Template 12) Instance/MSCP Event Codes
C-22 Subsystem Built-In Self-Test Failure Event Log (Template 13) Instance/MSCP Event Codes
C-23 Memory System Failure Event Log (Template 14) Instance/MSCP Event Codes
C-24 CI Port Event Log (Template 31) Instance/MSCP Event Codes
C-25 CI Port/Port Driver Event Log (Template 32) Instance/MSCP Event Codes
C-26 CI System Communication Services Event Log (Template 33) Instance/MSCP Event Codes
C-27 Device Services Nontransfer Error Event Log (Template 41) Instance/MSCP Event Codes
C-28 Disk Transfer Error Event Log (Template 51) Instance/MSCP Event Codes
C-29 Disk Bad Block Replacement Attempt Event Log (Template 57) Instance/MSCP Event Codes
C-30 Tape Transfer Error Event Log (Template 61) Instance/MSCP Event Codes
C-31 Media Loader Error Event Log (Template 71) Instance/MSCP Event Codes
C-32 Disk Copy Data Correlation Event Log "event dependent information" Values
C-33 Executive Services Last Failure Codes
C-34 Value Added Services Last Failure Codes
C-35 Device Services Last Failure Codes
C-36 Fault Manager Last Failure Codes
C-37 Dual Universal Asynchronous Receiver/Transmitter Services Last Failure Codes
C-38 Failover Control Last Failure Codes
C-39 Nonvolatile Parameter Memory Failover Control Last Failure Codes
C-40 Command Line Interpreter Last Failure Codes
C-41 Host Interconnect Services Last Failure Codes
C-42 Host Interconnect Port Services Last Failure Codes
C-43 Disk and Tape MSCP Server Last Failure Codes
C-44 Diagnostics and Utilities Protocol Server Last Failure Codes
C-45 System Communication Services Directory Service Last Failure Codes
C-46 Disk Inline Exerciser (DILX) Last Failure Codes
C-47 Tape Inline Exerciser (TILX) Last Failure Codes
C-48 Automatic Device Configuration Program (CONFIG) Last Failure Codes
C-49 Controller Restart Codes
C-50 Event Notification/Recovery Threshold Classifications
C-51 Recommended Repair Action Codes
D-1 Template Types
D-2 Host Interconnect Services Status Codes
D-3 DSSI Port/Port Driver Event Log (Template 32) Instance/MSCP Event Codes
D-4 Host Interconnect Services Last Failure Codes
D-5 Host Interconnect Port Services Last Failure Codes
D-6 Recommended Repair Action Codes

------------------------------------------------------------
Preface

This manual describes how to maintain and service the HS family of array controllers. The manual details configuration, controls and indicators, normal operating procedures, error reporting, troubleshooting and fault analysis, field replaceable units (FRUs), and removal and replacement procedures.

Intended Audience

This manual is intended for Digital(TM) Multivendor Services personnel and customers who need assistance in operating and maintaining the HS array controllers. Familiarity with the StorageWorks Array Controllers HS Family of Array Controllers User's Guide is assumed.

Structure

This manual contains the following chapters and appendixes:

Chapter 1    Provides an overview of the HS controllers.
Chapter 2    Provides a technical explanation of HS controller hardware and firmware.
Chapter 3    Defines physical configuration rules for the HS controller subsystem.
Chapter 4    Provides operation and configuration instructions.
Chapter 5    Discusses how to translate error information and perform initial fault analysis.
Chapter 6    Details the diagnostics, inline exercisers, and utilities for the HS controllers.
Chapter 7    Provides procedures for the removal and replacement of FRUs.
Appendix A   Lists the HS controller FRUs, including part numbers and related FRUs.
Appendix B   Provides complete details for CLI commands and their usage.
Appendix C   Describes HSJ-series controller error logging.
Appendix D   Describes HSD-series controller error logging.
Appendix E   Describes HSZ-series controller error logging.
Glossary     Lists acronyms and terms specific to the HS controllers.

Related Documentation

Table 1 lists documents containing information related to this product.

Table 1 Related Documentation
------------------------------------------------------------
Document Title                                  Order Number
------------------------------------------------------------
HSJxx Array Controller Software Product         AE-PYTGA-TE
  Description (SPD47.26.04)
HSD30 Array Controller Software Product         AE-Q6HKA-TE
  Description (SPD53.53.00)
HSZ40 Array Controller Software Product         AE-Q6HMA-TE
  Description (SPD53.54.00)
StorageWorks Array Controllers HS Family of     EK-HSFAM-PS
  Array Controllers Pocket Service Guide
StorageWorks Array Controllers HS Family of     EK-HSFAM-UG
  Array Controllers User's Guide
StorageWorks Array Controllers HSJ40 and HSJ30  EK-HSFAM-RN
  Array Controller Operating Firmware Release
  Notes
StorageWorks Array Controllers HSD30 Array      EK-HSD30-RN
  Controller Operating Firmware Release Notes
StorageWorks Array Controllers HSZ40 Array      EK-HSZ40-RN
  Controller Operating Firmware Release Notes
StorageWorks Solutions Building Block           EK-SBB35-UG
  User's Guide
StorageWorks Solutions Controller Shelf         EK-350MA-UG
  User's Guide
StorageWorks Solutions Configuration Guide      EK-BA350-CG
StorageWorks Solutions Shelf and SBB            EK-BA350-UG
  User's Guide
StorageWorks Solutions Shelf Metric Mounting    EK-35XRD-IG
  Kit User's Guide
StorageWorks Solutions SW800-Series Data        EK-SW800-IG
  Center Cabinet Installation and User's Guide
StorageWorks Solutions SW800-Series Data        EK-SWCDU-IS
  Center Cabinet Cable Distribution Unit
  Installation Sheet
StorageWorks Solutions SW500-Series Cabinet     EK-SW500-IG
  Installation and User's Guide
StorageWorks Solutions SW500-Series Cabinet     EK-SW5CU-IS
  Cable Distribution Unit Installation Sheet
The Digital Guide to RAID Storage Technology    EC-B1960-45
VAXcluster Console System User's Guide          AA-GV45D-TE
VAXcluster Systems Guidelines for VAXcluster    EK-VAXCS-CG
  System Configurations
------------------------------------------------------------

Documentation Conventions

The following conventions are used in this manual:

boldface type   Boldface type in examples
indicates user input. Boldface type in text indicates the first instance of terms defined in either the text, the glossary, or both. italic type Italic type indicates emphasis, variables in command strings, and complete manual titles. UPPERCASE Words in uppercase text indicate a command, the name of a file, or an abbreviation for a system privilege. Ctrl/x CTRL/x indicates that you hold down the Ctrl key while you press another key, indicated by x. For DILX and TILX, the caret symbol (^) is equivalent to the Ctrl key and these same instructions apply. CDROM This refers to both a command and a hardware device. The proper usage of CD-ROM with a hyphen is not used to avoid reader confusion. HSJ-series This refers to all CI-based controllers covered in this manual, as listed in Table 1-1. HSD-series This refers to all DSSI-based controllers covered in this manual, as listed in Table 1-1. HSZ-series This refers to all SCSI-based controllers covered in this manual, as listed in Table 1-1. xix ------------------------------------------------------------ Manufacturer 's Declarations ------------------------------------------------------------ CAUTION ------------------------------------------------------------ This is a class A product. In a domestic environment, this product may cause radio interference, in which case the user may be required to take adequate measures. ------------------------------------------------------------ ------------------------------------------------------------ ACHTUNG ! ------------------------------------------------------------ Dieses ist ein Gerät der Funkstörgrenzwertklasse A. In Wohnbereichen können bei Betrieb dieses Gerätes Rundfunkstörungen auftreten, in welchen Fällen die Benutzer für entsprechende Gegenmaßnahmen verantwortlich sind. ------------------------------------------------------------ ------------------------------------------------------------ ATTENTION ! 
------------------------------------------------------------
Ceci est un produit de Classe A. Dans un environnement domestique, ce produit risque de créer des interférences radioélectriques, il appartiendra alors à l’utilisateur de prendre les mesures spécifiques appropriées.
------------------------------------------------------------

Für Bundesrepublik Deutschland
For Federal Republic of Germany
Pour la République fédérale d'Allemagne

Hochfrequenzgerätezulassung und Betriebsgenehmigung

Bescheinigung des Herstellers/Importeurs: Hiermit wird bescheinigt, daß die Einrichtung in Übereinstimmung mit den Bestimmungen der DBP-Verfügung 523/1969, Amtsblatt 113/1969, und Grenzwertklasse ``A'' der VDE0871, funkentstört ist. Das Bundesamt für Zulassungen in der Telekommunikation der Deutschen Bundespost (DBP), hat diesem Gerät eine FTZ-Serienprüfnummer zugeteilt.

Betriebsgenehmigung: Hochfrequenzgeräte dürfen erst in Betrieb genommen werden, nachdem hierfür von dem für den vorgesehenen Aufstellungsort zuständigen Fernmeldeamt mit Funkstörungsmeßstelle die Genehmigung erteilt ist. Als Antrag auf Erteilung einer Genehmigung dient eine Anmeldepostkarte (Anhang des Handbuches) mit Angabe der FTZ-Serienprüfnummer. Der untere Teil der Postkarte ist vom Betreiber zu vervollständigen und an das örtliche Fernmeldeamt zu schicken. Der obere Teil bleibt beim Gerät.

Betreiberhinweis: Das Gerät wurde funktechnisch sorgfältig entstört und geprüft. Die Kennzeichnung mit der Zulassungsnummer bietet Ihnen die Gewähr, daß dieses Gerät keine anderen Fernmeldeanlagen einschließlich Funkanlagen stört. Sollten bei diesen Geräten ausnahmsweise trotzdem, z.B. im ungünstigsten Fall beim Zusammenschalten mit anderen EVA-Geräten, Funkstörungen auftreten, kann das im Einzelnen zusätzliche Funkentstörungsmaßnahmen durch den Benutzer erfordern. Bei Fragen hierzu wenden Sie sich bitte an die örtlich zuständige Funkstörungsmeßstelle Ihres Fernmeldeamtes.
Externe Datenkabel: Sollte ein Austausch der von Digital spezifizierten Datenkabel nötig werden, muß der Betreiber für eine einwandfreie Funkentstörung sicherstellen, daß Austauschkabel im Aufbau und Abschirmqualität dem Digital Originalkabel entsprechen.

Kennzeichnung: Die Geräte werden bereits in der Fertigung mit der Zulassungsnummer gekennzeichnet und mit einer Anmeldepostkarte versehen. Sollte Kennzeichnung und Anmeldepostkarte übergangsweise nicht mit ausgeliefert werden kontaktieren Sie bitte das nächstgelegene Digital Equipment Kundendienstbüro.

1
------------------------------------------------------------
General Information and Subsystem Overview

This chapter contains general information and technical overview information on the hierarchical storage (HS) controller. For purposes of this manual, "HS controller" refers to several models, as shown in Table 1-1:

Table 1-1 HS Controller Models
------------------------------------------------------------
Type              Model
------------------------------------------------------------
HSJ(TM)-series    HSJ40
                  HSJ30
HSD-series        HSD30(TM)
HSZ(TM)-series    HSZ40
------------------------------------------------------------
Controllers not covered in this manual
------------------------------------------------------------
Any HSC(TM) controller
HSD05(TM)
HSZ1x
------------------------------------------------------------

1.1 Technical Overview

The HS controllers are an integral part of Digital's family of array controllers. The controllers connect Small Computer System Interface generation 2 (SCSI-2) storage devices to a variety of host interfaces, including CI(TM), DSSI(TM), and SCSI. Each HS controller consists of the following:

· A controller module
· A read cache module (optional)

The two modules are housed together in a BA350-MA controller shelf. The controller shelf can be inserted in different StorageWorks(TM) cabinets. The cabinets are shown in Figures 1-1 and 1-2.
Firmware that controls the HS controllers (hierarchical storage operating firmware) resides on a Personal Computer Memory Card International Association (PCMCIA) program card. The card plugs into the controller module. To receive the most current controller and device support, Digital recommends replacing this card with the latest firmware as each new version is released.

Figure 1-1 SW800-Series Data Center Cabinet

Figure 1-2 SW500-Series Cabinet

Table 1-2 Summary of HS Controller Product Features
------------------------------------------------------------
Feature                       HSJ40           HSJ30           HSD30           HSZ40
------------------------------------------------------------
Host system bus               CI(TM)          CI              DSSI            SCSI-2
Host protocol                 SCS, MSCP(TM),  SCS, MSCP,      SCS, MSCP,      SCSI-2
                              TMSCP(TM)       TMSCP           TMSCP
Storage device protocol       SCSI-2          SCSI-2          SCSI-2          SCSI-2
Number of SCSI-2 ports        6               3               3               6
Number of SCSI-2 devices      6 (or 7)+       6 (or 7)+       6 (or 7)+       6 (or 7)
  per port
Maximum number of SCSI-2      36 (or 42)+     18 (or 21)+     18 (or 21)+     36 (or 42)
  devices
Shared memory (nonvolatile    32 KB           32 KB           32 KB           32 KB
  memory)
Read cache module             16- or 32-MB    16- or 32-MB    16- or 32-MB    16- or 32-MB
RAID levels supported         RAID 0/1a       RAID 0/1a       RAID 0/1a       RAID 0/1a
Mixed disk and tape           Yes             Yes             Yes             No tapes
  support++
Tape drive media loader       Sequential      Sequential      Sequential      N/A
  support                     access device   access device   access device
Dual-redundant                Yes             Yes             Yes             No
  configurations
Program card firmware update  Yes             Yes             Yes             Yes
Error detection code (EDC)    Validation of   Validation of   Validation of   Validation of
                              program card    program card    program card    program card
                              firmware        firmware        firmware        firmware
Error correction code (ECC)   Yes             Yes             Yes             Yes
  on cache and shared memory
Power fail write nonvolatile  Yes             Yes             Yes             Yes
  journal
Data integrity and byte       Yes             Yes             Yes             Yes
  parity (all buses/memory)
------------------------------------------------------------
+  The dual-redundant controller configuration supports up to six devices per port. Nonredundant configurations support up to seven devices per port, but this sacrifices a convenient upgrade to high availability and redundant/backup power options.
++ On the same or different ports
------------------------------------------------------------

1.2 Maintenance Strategy

Maintain the HS controller subsystem by removing and replacing field replaceable units (FRUs) as necessary. Chapter 7 contains FRU removal and replacement procedures. See Appendix A for a list of FRUs and FRU part numbers.

------------------------------------------------------------
Note
------------------------------------------------------------
Do not attempt to replace or repair components within field replaceable units (FRUs). Use the controller internal diagnostics and error logs to isolate FRU-level failures.
------------------------------------------------------------

1.3 Maintenance Features

The HS controllers have the following features to aid in troubleshooting and maintenance:

· Initialization diagnostics
  Various levels of initialization diagnostics execute on the controller. These tests ensure that the subsystem is ready to come on line after it has been reset, powered on, and so forth. You can elect to rerun many of the diagnostics even after initialization completes, in order to test the controller operation. See Chapter 6 for more information about the controller initialization diagnostics.

· Utilities
  You can run the VTDPY utility to display current controller state and performance data, including processor utilization, host port activity and status, device state, logical unit state, and cache and I/O performance. See Chapter 6 for detailed information on this utility.
  The configuration utility checks the SCSI device ports for any device not previously added.
  This utility will add and name these devices. See Chapter 6 for more information on the configuration utility.

· Exercisers
  The controller can run both the disk exerciser (DILX) and the tape exerciser (TILX). These exercisers simulate high levels of user activity, so running them provides performance information you may use to determine the health of the controller and the devices attached to it. See Chapter 6 for more information about the exercisers.

· Terminal access
  You can use a virtual (host) terminal or a maintenance terminal to check status and set operating parameters. The terminal connection provides access to the following:
  - Command Line Interpreter (CLI) (See Chapter 4, Appendix B)
  - Error messages (See Chapter 5)
  - Error logs (See Chapter 5, Appendices C through E)

· Controller warm swap (HSJ-series controller)
  You can efficiently remove and replace, or warm swap, one controller in a dual-redundant configuration. When you warm swap a controller, you are changing out a controller in the most transparent method available to the HS controller subsystem. Warm swapping a controller has minimal system and device impact. For more information on warm swapping, see Chapter 7.

· Operator control panel
  The operator control panel (OCP) on the front of the controller has seven buttons and LEDs. The buttons and LEDs serve different functions with respect to controlling the SCSI ports and/or reporting fault and normal conditions. See Chapter 5 for a complete description of the OCP.

1.4 Precautions

This section describes necessary precautions and procedures for properly maintaining and servicing HS controllers.

1.4.1 Electrostatic Discharge Protection

Electrostatic discharge (ESD) is a common problem for any electronic device and may cause data loss, system down time, and other problems. The most common source of static electricity is the movement of people in contact with carpets and clothing.
Low humidity also increases the amount of static electricity. You must discharge all static electricity prior to touching electronic equipment.

In general, you should follow routine ESD protection procedures when handling controller modules and cache modules and when working around the cabinet and shelf that houses the modules. Follow these guidelines to further minimize ESD problems:

· Maintain more than 40-percent humidity in the room where the equipment is installed.
· Place the subsystem cabinet away from heavy traffic paths.
· Do not place the subsystem on carpet, if possible. If carpet is necessary, choose antistatic carpet. If the carpet is already in place, place antistatic mats around the subsystem.
· Use ESD wrist straps, antistatic bags, and grounded ESD mats when handling FRUs. [2]
· Obey the module handling guidelines listed in Section 1.4.2.

[2] Not required for handling the program card.

1.4.2 Module Handling Guidelines

Prior to handling the controller module or cache module, follow these grounding guidelines:

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Refer to ESD guidelines in Section 1.4.1 prior to handling the controller module or cache module. Damage to the modules can result if the guidelines are not followed.
------------------------------------------------------------

· Wear an ESD wrist strap. Make sure the strap fits snugly.
· Plug the ESD strap into the grounding stud located on the vertical rail between the BA350-MA controller shelves and the device shelves. You can find the stud approximately halfway down from the top of the rail (Figure 1-3). [3]
· After removing a module from the shelf, place the module into an approved antistatic bag or onto a grounded antistatic mat.
· Remain grounded while installing a replacement module.

[3] The grounding stud is moveable, and can be relocated to another part of the cabinet.

Figure 1-3 Shelf Grounding Stud

1.4.3 Program Card Handling Guidelines

Follow these guidelines when handling the program card:

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Follow program card guidelines or damage to the program card and firmware may result.
------------------------------------------------------------

· Keep the program card in its original carrying case when not in use.
· Do not twist or bend the program card.
· Do not touch the card contacts.
· Keep the card out of direct sunlight.
· Do not immerse the card in water or chemicals.
· Always push the program card eject button, shown in Figure 1-4, to remove the card; do not pull on the card.

Figure 1-4 Program Card Eject Button

1.4.4 Cable Handling Guidelines

Use the guidelines presented in the following sections when handling the host interface cables to the controller. See Chapter 7 for host cable removal and replacement instructions.

------------------------------------------------------------
Note
------------------------------------------------------------
Always halt activity on the host path to the target controller before servicing its host cables (see Chapter 7).
------------------------------------------------------------

1.4.4.1 CI Cable

------------------------------------------------------------
CAUTION
------------------------------------------------------------
If the internal CI cable connectors should become grounded, damage to the equipment can result.
Never leave external CI cables, terminated or not, attached at the star coupler and disconnected at the internal CI cable connector. This minimizes adverse effects on the cluster and prevents a short circuit between the two ground references. Always terminate the connections of the star coupler when removing external CI cables.
------------------------------------------------------------

When handling or moving the internal CI cables, it is very important that the connectors do not become grounded. No metal may contact the metal connectors on these cables, other than an external CI host cable connector.

1.4.4.2 DSSI Cable

Turn off all power to HSD-series controllers and all other devices, including the host CPU, on a DSSI bus before removing a DSSI host cable. If you accidentally short DSSI connector pins while aligning, inserting, or removing a DSSI connector, you risk blowing the fuses of all members on the DSSI bus.

1.4.4.3 SCSI Cable

Always terminate open active SCSI connections to the host CPU when SCSI cables are removed.

1.5 Controller Specifications

Table 1-3 lists the physical and electrical specifications for the HS controllers and their cache modules.

------------------------------------------------------------
Note
------------------------------------------------------------
Measurements in Table 1-3 are nominal measurements; tolerances are not listed.
------------------------------------------------------------

Table 1-3 HS Controller Specifications
------------------------------------------------------------
Hardware                  Length       Width        Power    Current   Current
                                                             at +5 V   at +12 V
------------------------------------------------------------
HSJ40 controller module   12.5 inches  9.50 inches  40.5 W   6.2 A     670 mA
HSJ30 controller module   12.5 inches  9.50 inches  40.5 W   6.2 A     670 mA
HSD30 controller module   12.5 inches  8.75 inches  20.9 W   3.2 A     10 mA
HSZ40 controller module   12.5 inches  8.75 inches  24.8 W   4.6 A     10 mA
Read cache, 16 MB         12.5 inches  7.75 inches  1.5 W    300 mA    2 mA
Read cache, 32 MB         12.5 inches  7.75 inches  2.0 W    300 mA    2 mA
------------------------------------------------------------

Refer to the StorageWorks Solutions Controller Shelf User's Guide for power requirements for the BA350-MA controller shelf.

1.6 Controller Environmental Specifications

The HS controllers are intended for installation in a Class A computer room environment. The StorageWorks product line environmental specifications listed in Table 1-4 are the same as for other Digital storage devices.
Table 1-4 Environmental Specifications
------------------------------------------------------------
Condition                 Specification
------------------------------------------------------------
Optimum Operating Environment
------------------------------------------------------------
Temperature               +18° to +24°C (+65° to +75°F)
  Rate of change          3°C (5.4°F)
  Step change             3°C (5.4°F)
Relative humidity         40% to 60% (noncondensing) with a step change of
                          10% or less (noncondensing)
Altitude                  From sea level to 2400 m (8000 ft)
Air quality               Maximum particle count .5 micron or larger, not to
                          exceed 500,000 particles per cubic ft of air
Inlet air volume          .026 cubic m per second (50 cubic ft per minute)
------------------------------------------------------------
Maximum Operating Environment (Range)
------------------------------------------------------------
Temperature               +10° to +40°C (+50° to +104°F)
                          Derate 1.8°C for each 1000 m (1.0°F for each
                          1000 ft) of altitude
                          Maximum temperature gradient 11°C/hr (20°F/hr)
                          ±2°C/hr (4°F/hr)
Relative humidity         10% to 90% (noncondensing)
                          Maximum wet bulb temperature: 28°C (82°F)
                          Minimum dew point: 2°C (36°F)
------------------------------------------------------------
Maximum Nonoperating Environment (Range)
------------------------------------------------------------
Temperature               -40° to +66°C (-40° to +151°F)
                          (During transportation and associated short-term
                          storage)
Relative humidity         Nonoperating 8% to 95% in original shipping
                          container (noncondensing); otherwise, 50%
                          (noncondensing)
Altitude                  From -300 m (-1000 ft) to +3600 m (+12,000 ft) MSL+
------------------------------------------------------------
+ Mean sea level
------------------------------------------------------------

2
------------------------------------------------------------
Functional Description

This chapter provides a detailed functional description of the HS controller hardware and firmware.
2.1 HS Controller Hardware

The HS controller provides a connection between a host computer and an array of SCSI-2 compatible storage devices. The controller hardware consists of core circuitry common to all models of HS controllers, as follows:

· Policy processor
· Program card
· Diagnostic registers
· Operator control panel
· Maintenance terminal port
· Dual controller port
· Nonvolatile memory (NVMEM)
· Bus exchangers
· Shared memory
· Device ports
· Cache module

Each controller model also has a unique interface tailored to the appropriate host system. Figure 2-1 shows a block diagram of the HS controller hardware.

2.1.1 Policy Processor

The policy processor consists of the microprocessor hardware necessary for running the HS controller.

2.1.1.1 Intel 80960CA

The heart of the policy processor is an Intel® 80960CA processor chip. This processor chip runs the firmware from the program card and provides a consistent 25 MIPS. The processor chip controls all but low-level device and host port operations.

Figure 2-1 HS Controller Common Hardware Block Diagram

2.1.1.2 Instruction/Data Cache

Although the Intel 80960CA processor chip has an internal cache, the internal cache is not large enough to offset performance degradation caused by shared memory. To compensate for this, a separate Instruction/Data (I/D) cache is part of the policy processor. This 32-KB static RAM (SRAM) cache helps the Intel 80960CA processor chip achieve faster access to instructions and variables. A write-through cache design maintains data coherency in the I/D cache.

2.1.2 Program Card

The program card is a PCMCIA standard program card device containing the firmware for operating the controller. The firmware is validated and then loaded from the program card into shared memory each time the controller is initialized.

2.1.3 Diagnostic Registers

The HS controller has two write and two read diagnostic registers.
Diagnostic and functional firmware use the write diagnostic registers to control HS controller and StorageWorks operations. Certain bits in the registers activate test modes for forcing errors in the HS controller. Other bits control the operator control panel (OCP) LEDs. When an interrupt occurs, the policy processor reads the read diagnostic registers to determine its cause.

2.1.4 Operator Control Panel

The OCP includes the following:

· One reset button with embedded green LED
· One button per SCSI port
· Six amber LEDs [1]

[1] The HSJ-series has the amber LEDs embedded in the port buttons.

Figure 2-2 shows an example of the OCP from the HSZ40 controller. The buttons and LEDs serve different functions with respect to controlling the SCSI ports and/or reporting fault and normal conditions. See Chapter 5 for further information on using the OCP.

Figure 2-2 HS Controller Operator Control Panel

2.1.5 Maintenance Terminal Port

Each HS controller has a modified modular jack (MMJ) on its front bezel that can support an EIA-423 compatible maintenance terminal. You must connect the maintenance terminal during controller installation to set initial controller parameters. During normal operation, you may use either the maintenance terminal or a virtual (host) terminal to add devices and storage sets, or to perform other storage configuration tasks. However, a maintenance terminal is required when a host connection is not available.

------------------------------------------------------------
Note
------------------------------------------------------------
If you connect a maintenance terminal to one controller in a dual-redundant configuration, and both controllers are functioning, you can communicate with both controllers.
------------------------------------------------------------

A VAXcluster(TM) console system (VCS) or serial interface can also be connected to the EIA-423 terminal port for maintenance.

2.1.6 Dual Controller Port

The HSJ-series and HSD-series controllers have an internal serial port for communication with a second controller of the same model. The second controller needs to be mounted in the same controller shelf, with communication passing through the ports and shelf backplane.

A dual-redundant configuration allows one controller to take over for another (failed) controller. The takeover process is called failover. During failover, the surviving controller supports the SCSI-2 devices linked to the failed controller.

------------------------------------------------------------
Note
------------------------------------------------------------
The HSZ-series controller does not support dual-redundant configurations; thus, failover cannot occur.
------------------------------------------------------------

2.1.7 Nonvolatile Memory

The HS controller has 32 KB of nonvolatile memory (NVMEM). NVMEM is implemented using battery-backed SRAM. This memory stores parameter and configuration information, such as device and unit number assignments, entered by you and by the HS controller firmware.

2.1.8 Bus Exchangers

Bus exchange devices allow high-speed communication between bus devices and shared memory. One bus exchanger handles address lines while the other exchanger handles data lines. The bus exchangers are classified as four-way cross-point switches, which means the bus exchangers allow connections between one port and any other port on the switch.

2.1.9 Shared Memory

Shared memory consists of a dynamic RAM controller and arbitration engine (DRAB) gate array controller and 8 MB of associated dynamic RAM (DRAM). Shared memory uses parity-protected 9-bit error correction code (ECC) and error detection code (EDC) for improved data integrity.
The shared memory stores the HS controller firmware and is shared between bus devices for data structures as well as data buffers. One portion of shared memory contains instructions for the Intel 80960CA processor chip, firmware variables, and data structures, including the look-up table for the Intel 80960CA processor chip. In the absence of the HS controller cache module, another portion of shared memory acts as a cache. Otherwise, this portion contains cache module context for cache look-ups when a cache module is in place.

2.1.10 Device Ports

The HS controller SCSI-2 device ports are a combination of NCR® 53C710 SCSI port processors and SCSI transceivers. The 53C710 processors perform operations in 8-bit, single-ended normal or fast mode. The 53C710 processors execute scripts read from shared memory, under control of the policy processor.

Each SCSI-2 port can have up to six or seven attached devices depending on controller configuration (dual-redundant and nonredundant, respectively). In a dual-redundant configuration, subsystem availability improves because each controller has access to the other controller's devices.

2.1.11 Cache Module

The HS controllers can run with a companion read cache module, available in 16 or 32 MB.

2.1.11.1 Common Cache Functions

The HS controller cache module increases the controller I/O performance. During normal operation, a host read operation accesses data either from the fast memory of the cache module or from an I/O device. If a host read is a cache ``hit'' (data already in the cache), the data are supplied to the host immediately, improving I/O performance by reducing latency. If the host read is a cache ``miss'' (data not in the cache), the HS controller accesses the appropriate disk to satisfy the request. Then the controller reads the data, returns it to the host, and writes it to the cache. Cache entry sizes are fixed at 64 KB (128 logical blocks) for each logical unit.
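
The hit and miss paths described above can be pictured with a short sketch. The Python below is illustrative only and is not the controller firmware: the class name, the disk callback, and the choice to fill a whole 64 KB entry on a miss are assumptions made for the example. The 512-byte block size follows from the stated 64 KB / 128 logical blocks. Replacement of old entries is deliberately left out.

```python
BLOCKS_PER_ENTRY = 128                       # fixed entry size given in the text
BLOCK_SIZE = 512                             # bytes; 64 KB / 128 logical blocks
ENTRY_SIZE = BLOCKS_PER_ENTRY * BLOCK_SIZE   # 65536 bytes = 64 KB


class ReadCacheSketch:
    """Toy model of the read path: a hit is served from cache, a miss goes to disk."""

    def __init__(self, read_entry_from_disk):
        # Hypothetical callback: (unit, entry_index) -> ENTRY_SIZE bytes from the device.
        self._read_entry_from_disk = read_entry_from_disk
        self._entries = {}                   # (unit, entry_index) -> cached bytes
        self.hits = 0
        self.misses = 0

    def read_block(self, unit, lbn):
        """Return one logical block for the given logical unit and block number."""
        entry_index, offset = divmod(lbn, BLOCKS_PER_ENTRY)
        key = (unit, entry_index)
        data = self._entries.get(key)
        if data is None:
            # Miss: read from the device, then keep a copy in the cache.
            self.misses += 1
            data = self._read_entry_from_disk(unit, entry_index)
            self._entries[key] = data
        else:
            # Hit: no device access is needed.
            self.hits += 1
        start = offset * BLOCK_SIZE
        return data[start:start + BLOCK_SIZE]
```

In this model, two reads that fall inside the same 128-block range of the same logical unit cost one device access; the second read is counted as a hit. Entries are keyed per logical unit, so the same block number on a different unit is a separate miss.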
Read caching is enabled by default but can be optionally disabled using the CLI Logical Unit SET command on a per unit basis (see Appendix B).

The data replacement algorithm is a least recently used (LRU) replacement algorithm. When the cache is full and new data must be written, the LRU algorithm removes the oldest resident cached data with the least number of references and replaces it with the new data.

2.1.11.2 Read Cache Module

During a host write operation using the read cache, data are written to the disk and the cache. This is known as write-through caching, and it improves the performance of subsequent reads, because often the requested data were previously written to the cache.

The read cache consists of DRAM storage. However, the read cache is volatile. Subsystem power failures will cause the loss of all data in the read cache.

2.1.12 Host Interface

The following sections provide descriptions of the host interface hardware for each series of HS controller.

2.1.12.1 HSJ-Series (CI Interface)

Figure 2-3 shows a block diagram of the HSJ-series to CI host interface hardware.

Figure 2-3 HSJ-Series CI Host Interface Hardware Block Diagram

The CI interface for the HSJ-series controllers consists of a YACI CI gate array and CI receiver/transmit (CIRT) chips for the individual CI ports. The YACI allows direct memory access of data between the host CI port and controller shared memory. Specialized host port firmware running on the policy processor sets up and maintains the CI port.

The HSJ-series controller supports dual data link (DDL) operations on the CI bus. With DDL, the controller can have operations in progress simultaneously on both CI paths (Path A and Path B). Receive/receive, receive/transmit, or transmit/transmit operations can be active at the same time. The only restriction is that simultaneous transmits and simultaneous receives may not be active on the same virtual circuit.
The packets that are simultaneously active can be to any two separate CI nodes, or a transmit/receive operation may be active to the same node if it also supports DDL operation (such as to a CIXCD adapter). Each CI path (Path A and Path B) runs in half duplex. This means the path can either be transmitting or receiving, but not both at the same time.

2.1.12.2 HSD-Series (DSSI Interface)

Figure 2-4 shows a block diagram of the HSD-series to DSSI host interface hardware. The SCSI-to-DSSI interface is implemented with the NCR 53C720 chip plus specific DSSI logic and transceivers. The NCR 53C720 chip reads and runs scripts from controller shared memory to perform command and DMA operations on the DSSI interface. The policy processor sets up and maintains the operation of the NCR 53C720 chip.

Figure 2-4 HSD-Series DSSI Host Interface Hardware Block Diagram

2.1.12.3 HSZ-Series (SCSI-2 Interface)

Figure 2-5 shows a block diagram of the HSZ-series to SCSI-2 host interface hardware.

Figure 2-5 HSZ-Series SCSI-2 Host Interface Hardware Block Diagram

The HSZ-series interfaces with a SCSI-2 Fast-Wide-Differential (FWD) 16-bit host bus or a SCSI-2 8-bit differential bus. The hardware consists of the NCR 53C720 chip and transceivers, and functions in much the same way as the DSSI interface (refer to Section 2.1.12.2).

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Although the HSD-series and HSZ-series interfaces are similar, care should be taken not to accidentally install an HSD-series controller in an HSZ-series system, or vice versa. Equipment damage would result.
------------------------------------------------------------

2.2 HS Controller Firmware

The HS controller firmware, or hierarchical storage operating firmware, consists of functional code, diagnostics, utilities, and exercisers.
HS operating firmware is stored in a PCMCIA program card. Digital ships the card along with your HS controller. Thereafter, each time HS operating firmware is updated, new cards are manufactured. You can purchase the update cards on a per release basis or through an update service contract.

Once the card is installed in the HS controller, the contents are validated and loaded into shared memory. Any time you reset the controller, this validating and loading process is repeated. Because of this scheme, when the firmware executes, only part of the controller initialization diagnostics run directly from the program card. The remaining diagnostics, all functional code, and all utilities run from controller shared memory.

Refer to the StorageWorks Array Controllers HS Family of Array Controllers User's Guide for information on controller I/O performance using HS operating firmware.

The HS operating firmware consists of five function areas:

· Core functions
· Host interconnect functions
· Operator interface and subsystem management
· Device services
· Value-added functions

These functions are discussed in the following sections.

2.2.1 Core Functions

HS operating firmware provides the following core functions, in the order they are executed after the controller is turned on:

1. Tests and diagnostics
2. Executive functions

2.2.1.1 Tests and Diagnostics

HS controller tests and diagnostics are integrated in a controller self-test procedure performed when the controller is switched on. The output of self-test is a simple go/nogo status of the controller subsystem. Self-test includes a test of the cache module. See Chapter 6 for additional self-test information.

2.2.1.2 Executive Functions

Executive functions act as the operating system kernel for the HS controller. The executive functions are common among the different controller models described in this manual.
Executive functions control firmware execution with respect to interrupts, thread control, queuing support, timers, and so forth. The executive functions establish the HS controller environment as a non-preemptive interrupt-driven process. 2.2.2 Host Interconnect Functions The three different host interconnections HS operating firmware supports are CI, DSSI, and SCSI. The following list briefly describes the protocols used for host access of controller storage: · CI--SCS/MSCP(TM) (and/or TMSCP(TM)) protocol and DUP · DSSI--SCS/MSCP(TM) (and/or TMSCP(TM)) protocol and DUP · SCSI--SCSI-2 protocol with SCSI pass-through software to the Command Line Interpreter (CLI), tagged command queuing on the host and device side, and mode select/sense support for SCSI 2.2.3 Operator Interface and Subsystem Management Functions The operator interface and subsystem management functions support the user interface, subsystem management, subsystem verification, and error logging/fault management. These functions are presented in the following sections. Functional Description 2-9 2.2.3.1 Command Line Interpreter The Command Line Interpreter (CLI) is the primary user interface for HS controllers. The CLI contains firmware for responding to most management functions plus local program execution. Appendix B contains a full description of CLI operation. Briefly, the CLI provides the following two types of commands: · SET/SHOW commands for the controller itself. This includes setting and showing of controller ID, name, path controls, and other vital information. · Configuration commands to add/delete devices, storage sets, and logical units. 2.2.3.2 Diagnostic Utility Protocol Diagnostic Utility Protocol (DUP) from the host is supported over CI and DSSI (HSJ- and HSD-series controllers). DUP allows you to access the CLI and local programs through a host virtual terminal in much the same way as using a maintenance terminal. See Chapter 4 for more information. 
2.2.3.3 HSZ-Series Virtual Terminal

A virtual terminal port can be created using a host-based application called HSZUTIL (HSZ-series controller). The HSZUTIL application uses SCSI diagnostic send/receive commands to deliver and receive characters to/from the HSZ-series CLI and local programs. See Chapter 6 for more information on the HSZUTIL application.

2.2.3.4 Local Programs

There are several local utilities available for HS controller subsystem management/verification, as follows:

· DILX and TILX allow you to test and verify operation of the controller with attached SCSI-2 storage under a high or low I/O load. These utilities place the load on the controller, bypassing the host port. Chapter 6 provides a full description of DILX and TILX.
· VTDPY allows the user to display current controller state and performance data, including processor utilization, host port activity and status, device state, logical unit state, and cache and I/O performance. It is similar to the VTDPY for an HSC50 controller. See Chapter 6 for detailed information on this utility.
· Controller warm swap (C_SWAP) for HSJ-series controllers efficiently removes and replaces one controller in a dual-redundant configuration. When you warm swap a controller, you replace it by the most transparent method available to the HS controller subsystem. Warm swapping a controller has minimal system and device impact, as explained in Chapter 7.
· Configure (CONFIG) checks the SCSI device ports for any device not previously added. This utility adds and names these devices. See Chapter 6 for more information on the configuration utility.

2.2.3.5 Error Logging and Fault Management

Error Logging and Fault Management is an integrated function that collects system errors in a central firmware location and sends the error information to the host. See Chapter 5 and Appendices C through E for more information on error logging.
2.2.4 Device Services

SCSI-2 device service firmware includes device port drivers, mixed disk and tape support on one controller, and physical device addressing and access. Device service consists of normal functions such as read and write, plus error recovery code. It also contains firmware for controlling and observing the BA350-SB shelf and StorageWorks building blocks (SBBs), such as LED, power, and blower monitoring. Specific features include the following:

· Normal SCSI-2, 8-bit, single-ended support.
· FAST, synchronous, 8-bit, single-ended device support.
· Tagged queuing for SCSI-2 devices.
· Read and write physical device addressing and access. This is the read and write path to and from devices, and from and to the value-added portion of HS operating firmware.
· Specified device support per HS operating firmware release. Refer to your HS operating firmware release notes to identify specifically supported devices.
· Mixed disk and tape support. You can mix disk and tape storage on one controller. Furthermore, disks and tapes may be placed together on one of the controller's six SCSI-2 ports.

------------------------------------------------------------
Note
------------------------------------------------------------
Tapes are not currently supported for the HSZ-series controller. Refer to your StorageWorks Array Controller Operating Firmware Release Notes for specific information and restrictions for tape drives.
------------------------------------------------------------

· Device warm swap. You can remove and replace devices without taking the subsystem off line (see Chapter 7).
· Device shelf and SBB observation and control. This service monitors SHELF_OK signals and alerts you to fan and power supply failures. This firmware also controls the fault LEDs on the SBBs for use in warm swap and identifying device failures or configuration mismatches.
· Device error recovery.
This service performs error recovery and read and write retries directly, making every attempt to serve data to and from the host before declaring an unrecoverable error or marking a device as failed. · Controller warm swap (HSJ-series controllers). This service supports this feature under control from a local program running at CLI. Device services must quiesce all the SCSI buses in order to safely allow you to remove and replace a controller (see Chapter 7). Functional Description 2-11 2.2.5 Value-Added Functions HS operating firmware contains value-added functions to enhance availability, performance, subsystem management and maintenance, and connectivity features of the HS controller subsystem. These value-added functions are presented in the following sections. 2.2.5.1 RAID HS operating firmware supports levels of Redundant Array of Independent Disks (RAID) storage methods. · HS operating firmware supports host-based volume shadowing (HBVS) assistance, also referred to as RAID level 1a. With HBVS assistance, shadow copy operations requested by the host between two units under one controller run under direction from the controller. This leaves the host CPU free for other operations. · HS operating firmware supports RAID level 0 (striping). Striping allows for parallel transfers to all stripeset members. This feature enhances performance in the areas of latency and throughput. Stripesets can be from 2 to 14 members. Striping firmware is tuned to balance the load across devices and not for maximum data transfer bandwidth. Refer to The Digital Guide to RAID Storage Technology for a description of RAID and how the various levels of RAID improve data integrity. 2.2.5.2 Failover HSJ- and HSD-series controllers: A failover component (FOC) in HS operating firmware links two controllers in a dual-redundant configuration. The controllers exchange status signals and configuration information. 
When one controller fails, the surviving controller takes over service to the failed controller's units. An HSJ40 controller can execute failover within 15 seconds. Failover also allows for easier system management, because only one terminal connection is required to access both controllers. See Chapter 4 for more information on failover.

2.2.5.3 Caching

Cache firmware within the value-added section of HS operating firmware addresses the following areas:

· Read caching
· Write-through caching
· Handling of up to 32 MB of cache
· Logical Block Number (LBN) extent locking
· Least Recently Used (LRU) replacement policy (Refer to Section 2.1.11.1 for a description of the LRU algorithm.)
· Read and write-through caching enabled on a per logical unit basis

The cache policies for the product are as follows:

· Transfer defined extent (TDE) based cache.
· Data caching based on transfer size; maximum read and write size is changed on a per logical unit basis.
· All I/O subject to locking.

2.3 Addressing Storage Within the Subsystem

This section provides an overview of how storage is addressed in a controller subsystem. Storage is seen in two different ways, depending on your perspective and controller model:

· From the controller SCSI device interface--At the physical device level
· From the host interface--At the virtual device level

Following are descriptions of both levels of storage addressing.

2.3.1 Controller Storage Addressing

------------------------------------------------------------
Note
------------------------------------------------------------
This section on controller storage applies to all controller models.
------------------------------------------------------------

Figure 2-6 shows a typical physical storage device interface for a controller. Each of the controller's six device ports supports a SCSI bus connected with up to six devices. The devices typically reside in a StorageWorks BA350-SB storage shelf.
The current implementation of all controllers supports only one controller LUN per physical device. LUN 0 is the default controller LUN address for each device.

Controller Port Target LUN Addressing

Controller Port Target LUN (PTL) addressing is the process by which the controller selects storage space within a specific, physical, storage device. The process takes place in three steps:

1. The port selection--The controller selects the SCSI bus port connected to a particular device.
2. The target selection--The controller selects the device's SCSI ID (that is, the target) on that port.
3. The LUN selection--The controller selects the desired LUN within that physical device. (In the current implementation, there is only one LUN on each device, and its LUN address is always 0.)

Note that controller PTL addressing is always tied to a physical storage device.

2.3.2 Host Storage Addressing

------------------------------------------------------------
Note
------------------------------------------------------------
The information in this section applies to all controllers. However, see Section 2.3.3 for additional, specialized information on how a SCSI host addresses storage.
------------------------------------------------------------

A typical host device interface consists of a number of host ports, each connected to a bus containing devices. From the host perspective, the controller is one of these devices.

Figure 2-6 Controller Storage Addressing

To support certain high-level storage subsystem functions such as RAID, the controller presents the entire physical device configuration (from Figure 2-6) to the host as a group of host logical units. A host logical unit often consists of storage space (a storage set) distributed throughout more than one physical device. The controller presents these logical units to the host as individually addressable, virtual devices. You configure host logical units using the CLI.
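
The relationship between controller PTL addresses and host logical units described above can be modeled in a few lines of Python. This is an illustrative sketch only, not controller firmware: the class names, the unit names, and the 1-6 port numbering are assumptions made for the example. The manual itself states only that there are six device ports, that device SCSI IDs run from 0 upward (0 through 6 in the densest configuration), and that the controller LUN is always 0.

```python
from dataclasses import dataclass

NUM_PORTS = 6        # six device ports per controller (from the text)
MAX_TARGET = 6       # device SCSI IDs 0-6 (densest configuration)
CONTROLLER_LUN = 0   # current implementation: one LUN per device, always 0


@dataclass(frozen=True)
class PTL:
    """Controller Port-Target-LUN address of one physical device."""
    port: int                  # step 1: which SCSI bus port (1-6 assumed)
    target: int                # step 2: the device's SCSI ID on that port
    lun: int = CONTROLLER_LUN  # step 3: always 0 in the current implementation

    def __post_init__(self):
        if not 1 <= self.port <= NUM_PORTS:
            raise ValueError(f"port {self.port} is outside 1-{NUM_PORTS}")
        if not 0 <= self.target <= MAX_TARGET:
            raise ValueError(f"target {self.target} is outside 0-{MAX_TARGET}")
        if self.lun != CONTROLLER_LUN:
            raise ValueError("only LUN 0 exists on each device")


@dataclass(frozen=True)
class HostLogicalUnit:
    """Virtual device seen by the host; may span several physical devices."""
    name: str        # hypothetical unit name, for illustration only
    members: tuple   # the PTLs of the physical devices behind this unit

    def is_one_to_one(self):
        # True only when the unit maps to exactly one physical device.
        return len(self.members) == 1


# A storage set spread over three ports, and a single-device unit.
stripeset = HostLogicalUnit("UNIT_A", (PTL(1, 0), PTL(2, 0), PTL(3, 0)))
single = HostLogicalUnit("UNIT_B", (PTL(4, 1),))
```

In this model a PTL always names one physical device, while a host logical unit names whatever set of PTLs the operator has grouped through the CLI.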
------------------------------------------------------------
Note
------------------------------------------------------------
Controller LUNs (devices) and host logical units may represent the same structure, but only if you configure the controller devices in a one-to-one unit relationship with the host. This situation may or may not occur under normal operation. For this reason, host addressing is often tied to a virtual storage device (a storage set).
------------------------------------------------------------

2.3.3 Host Storage Addressing (HSZ-series)

Figure 2-7 shows a typical connection between an HSZ-series controller and its host. In this case, the SCSI host device interface consists of device ports, each connected to a SCSI bus containing up to eight devices. The HSZ-series controller resides on one of the SCSI buses. The HSZ-series controller can be assigned one or two SCSI IDs on the bus.

Figure 2-7 Host Storage Addressing (HSZ-series)

A SCSI host also sees host logical units through the controller. (However, in SCSI systems there can only be up to eight units per ID. For the HSZ-series controller, this translates to up to 16 units, or eight per ID.) Furthermore, the host addresses each unit by a SCSI logical unit number, also called a LUN.

------------------------------------------------------------
Note
------------------------------------------------------------
Although they share the same name, controller LUNs and SCSI host LUNs are logical addresses for two different storage structures. Controller LUNs exist on the controller's device interface, and SCSI host LUNs exist on a SCSI host's device interface. Controller LUNs and SCSI host LUNs may represent the same structure, but only if the user configures (up to) eight controller devices in a one-to-one unit relationship with the host. This situation may or may not occur under normal operation.
------------------------------------------------------------ Host Port Target LUN Addressing (HSZ-series) ------------------------------------------------------------ Note ------------------------------------------------------------ Non-SCSI hosts (CI, DSSI), though they access virtual devices, do not use a PTL addressing scheme. Any unit seen by these hosts is simply called a host logical unit (not a LUN). ------------------------------------------------------------ Host PTL addressing is the process by which a SCSI host selects a logical unit made up of physical devices connected to an HSZ-series controller. The process takes place in three steps: 1. The port selection--The host selects the SCSI bus that has the HSZ-series controller connected to it. 2. The target selection--The host selects the controller 's SCSI ID (that is, the target) on that port/bus. The HSZ-series controller may act as one or two target IDs. 3. The LUN selection--The host presents the controller with the LUN of the desired host logical unit. The controller translates the LUN into the physical device addresses required to allow the host access to the virtual device. 2-16 Functional Description 3 ------------------------------------------------------------ Configuration Rules and Restrictions This chapter describes rules and restrictions as they apply to the physical configuration and connection of the following HS controller subsystem hardware: · Cabinets · Shelves · Devices · Controllers · Hosts The information in this chapter describes physical configurations with respect to both standard and nonstandard (customized) subsystems. Further information can be found in the specific StorageWorks cabinet, shelf, and configuration documentation. ------------------------------------------------------------ Note ------------------------------------------------------------ Configuration rules and restrictions apply to all controllers unless stated otherwise. 
------------------------------------------------------------ 3.1 Ordering Considerations Digital provides the following configuration approaches for ordering controller subsystems: · Preconfigured, packaged starter subsystems 1 · Configured-to-order (CTO) subsystems (custom configurations) · A combination of preconfigured and CTO subsystems Refer to the StorageWorks Array Controllers HS Family of Array Controllers User 's Guide for a list of preconfigured controller subsystem option numbers. Not all controller models have preconfigured subsystem option numbers. 3.2 Cabinets The following sections present information to keep in mind when loading controller and storage shelves in SW800-series data center cabinets and SW500-series cabinets. ------------------------------------------------------------ 1 Preconfigured subsystems include a range of solutions for various capacities, performance levels, and availability. Configuration Rules and Restrictions 3-1 3.2.1 SW800-Series Data Center Cabinet This section presents the rules to apply to subsystem configurations in SW800- series data center cabinets. Refer to the StorageWorks Solutions SW800-Series Data Center Cabinet Installation and User 's Guide for more details. ------------------------------------------------------------ Note ------------------------------------------------------------ In Figures 3-1 through 3-5 ``S'' indicates a BA350-SB storage shelf, and ``C'' indicates a BA350-MA controller shelf. ------------------------------------------------------------ Figure 3-1 shows the loading sequence for storage and controller shelves in an SW800-series data center cabinet. Figure 3-2 shows the loading sequence for storage and controller shelves when one or two TZ8xx-series tape devices are installed. Figure 3-3 shows the loading sequence for storage and controller shelves when three or four TZ8xx-series tape devices are installed. 
· Standard shelf configuration A standard of three (or four) BA350-MA shelves connected to 18 BA350-SB shelves in a single SW800-series data center cabinet is suggested. · Two device shelves per port (jumpered pairs) Two BA350-SB shelves can be joined on the same controller port with the following restrictions: - The SCSI-2 cable to the first BA350-SB storage shelf is 1.0 meter or less. 2 - The SCSI-2 cable from the first BA350-SB shelf to the second shelf is 0.5 meters or less. This requires two shelves to be immediately adjacent to each other. - The first BA350-SB storage shelf is configured for an unterminated single SCSI cable. · TZ8x7 half-rack tape loader Any TZ8x7 half-rack tape loader device must be located at the top front positions filling two or four top BA350-SB shelf positions (front and back). Note that each tape loader occupies the full cabinet depth. Up to four tape drive loader devices can be loaded in an SW800-series data center cabinet, displacing shelves S6 and S12-S18 (leaving 10 BA350-SB shelves remaining). ------------------------------------------------------------ 2 The associated BA350-MA controller shelf must be located near enough to satisfy this restriction. 3-2 Configuration Rules and Restrictions Figure 3-1 SW800-Series Data Center Cabinet Loading Single (or paired) TZ8x7 devices must be connected with a 0.2 meter (8-inch) SCSI-1-to-StorageWorks transition cable (order number 17-03831-01), then to a 2 meter SCSI-2 cable (order number BN21H-02) that connects to one of the controller SCSI-2 ports. · Use of an upper controller shelf By convention, controller shelf C3 would use (only) the top three (or four) storage shelves in the front of the cabinet; the fourth controller shelf (C4) would use the top three (or four) storage shelves in the back of the cabinet. 
Configuration Rules and Restrictions 3-3 Figure 3-2 SW800-Series Data Center Cabinet Controller/Storage/(1-2) Tape Drive Locations · Number of devices Up to 42 devices can be attached using 7 3½-inch SBBs in each of 6 BA350- SB shelves attached to controllers with 6 controller ports. 3 · Maximum number of device shelves Up to 18 horizontal BA350-SB device shelves are allowed (16 if one or two TZ8x7 tape loaders are present). An earlier cabinet configuration had a provision for 19 horizontal device shelves, however Digital no longer recommends that configuration. ------------------------------------------------------------ 3 Redundant power and dual-redundant controllers are not supported when using 42 devices. This is not a recommended configuration. 3-4 Configuration Rules and Restrictions Figure 3-3 SW800-Series Data Center Cabinet Controller/Storage/(3-4) Tape Drive Locations · Vertical device shelves Vertical shelves are not used for device shelves because some devices require horizontal alignment. If desired, vertical shelf locations can be used for most disk drives. Refer to the device-specific documentation for requirements. (Any of the vertical shelves can be used. However, Digital recommends surrendering controller positions C4, then C3, first for storage shelves. Refer to Figure 3-1.) Configuration Rules and Restrictions 3-5 3.2.2 SW500-series Cabinets The rules presented in this section apply to subsystem configurations in SW500- series cabinets. Refer to the StorageWorks Solutions SW500-Series Cabinet Installation and User 's Guide for more details. Figure 3-4 shows the loading sequence for storage and controller shelves in an SW500-series cabinet. Figure 3-4 SW500-Series Cabinet Loading Figure 3-5 shows the loading sequence for storage and controller shelves when TZ8xx-series tape devices are installed. 
· Standard shelf configuration A standard of one BA350-MA controller shelf connected to six BA350-SB storage shelves in a single SW500-series cabinet is suggested. · Two BA350-MA shelves can be housed with a maximum of four BA350-SB shelves as two subsystems. 3-6 Configuration Rules and Restrictions Figure 3-5 SW500-Series Cabinet Controller/Storage/Tape Drive Locations · Two device shelves per port (jumpered pairs) Two BA350-SB shelves can be joined on the same controller port with the following restrictions: - The SCSI-2 cable to the first BA350-SB storage shelf is 1.0 meter or less. 4 - The SCSI-2 cable from the first BA350-SB shelf to the second shelf is 0.5 meters or less. This requires two shelves to be immediately adjacent to each other. - The first BA350-SB storage shelf is configured for unterminated single SCSI. - Controller shelf position C1 can be used with the pairs S1-S2 and S3-S4, and controller shelf position C2 can be used with the pair S7-S8, to satisfy these restrictions. A single subsystem (C1) can thus accommodate up to 16 5¼-inch SBBs. · TZ8x7 half-rack tape loader (Figure 3-5): Any TZ8x7 half-rack tape loader must be located at the top front positions filling the two top BA350-SB shelf positions (front and rear). Note that each tape loader occupies the full cabinet depth. Up to two tape drive loader devices can be loaded in an SW500-series cabinet, displacing shelves S4, S9, and S7-S8 (moving the CDUs to shelf location S7). Single (or paired) TZ8x7 ------------------------------------------------------------ 4 The associated BA350-MA controller shelf must be located near enough to satisfy this restriction. Configuration Rules and Restrictions 3-7 devices must be connected to a controller port, as in the SW800-series data center cabinet. · Use of a second controller shelf By convention, the first controller shelf (C1) would use positions S1-S4 and S9; the second controller shelf (C2) would use positions S5, S7, and S8. 
This permits two subsystems, one with up to 24-28 3½-inch SBB devices (in the front), and the other with 18-21 3½-inch SBB devices (in the rear). 3.3 Shelves Device shelves can be arranged in any SCSI-2 legal configuration, subject to the following: · No more than a single extension joining two BA350-SB device shelves is permitted. The two BA350-SB shelves must be physically adjacent to each other. Figure 3-6 shows an example of device shelves in a single extension configuration. Figure 3-6 Single Extension from Device Shelf to Device Shelf · Half-rack/full-depth devices, for example all TZ867 tapes, must be on their own port and cannot be connected as an extension from a BA350-SB shelf. Only two such devices (maximum) may be configured per controller port, and those devices must be physically adjacent to each other at the top of a cabinet. Figure 3-7 shows two adjacent tape drives attached to a single port of the controller shelf. 3-8 Configuration Rules and Restrictions Figure 3-7 Adjacent Devices on a Single Port · Connecting a 1.0 meter cable from a controller shelf to a device shelf allows for device shelf jumpering. Connecting a 2.0 meter cable does not permit shelf jumpering. (Required cable length will vary depending on cabinet type, device shelf position, and controller shelf position.) 3.4 Device Placement The following sections describe recommended device configurations for 3½-inch and 5¼-inch SBBs. ------------------------------------------------------------ Note ------------------------------------------------------------ Intermixing disk SBBs and tape SBBs on the same controller port is permitted, provided all other configuration rules in this chapter are also obeyed. ------------------------------------------------------------ 3.4.1 3½-inch SBB Restrictions There are no restrictions for adding 3½-inch SBBs to a configuration. Refer to your SPD and release notes for a list of specific supported device types. 
3.4.2 5¼-inch SBB Restrictions The following restrictions apply when adding 5¼-inch SBBs to a configuration. Refer to your SPD and release notes for a list of specific supported device types. · A maximum of two 5¼-inch SBBs are allowed per port (in a single shelf), or four 5¼-inch SBBs per port (in adjacent jumpered shelves). No more than four 5¼-inch SBBs are allowed on a single port (that would take three shelves, which cannot be configured within SCSI-2 cable limits). · Intermixing 5¼-inch and 3½-inch SBBs is permitted using up to six devices per port (maximum of two shelves), with no more than three 5¼-inch SBBs. You can use two 5¼-inch SBBs and four 3½-inch SBBs in two BA350-SB shelves, or one 5¼-inch SBB and four 3½-inch SBBs in one BA350-SB shelf. Configuration Rules and Restrictions 3-9 · When using jumpered shelves, only five jumpered-pair shelves (for a total of ten shelves) can be used within each SW800-series data center cabinet. This leaves the sixth controller port unused. Alternately, four jumpered ports permit two single-shelf connections on the remaining two controller ports, which is preferable. This setup is only permitted in the lower front of the cabinet from the C1 controller position. Five such ports can take up to a maximum of ten front shelf locations, with no allowance for cable access to shelves or devices in the rear of the SW800-series cabinet. (Refer to Figure 3-1.) A more balanced configuration consists of four 5¼-inch SBBs on each of four ports, and two ports each with two 5¼-inch SBBs. · When using jumpered shelves, only two jumpered-pair shelves (for a total of four shelves) can be used with an SW500-series cabinet. · When five ports (SW800) or two ports (SW500) have doubled shelves for 5¼-inch SBBs (4+2), TZ8x7 tapes cannot be connected or even mounted in the cabinet because all or most (front) shelf locations are needed for the 5¼-inch SBBs. 
3.4.2.1 Table Conventions

The following describes the designations used in Tables 3-1 through 3-6. The designation shows the possible devices in each shelf and the possible number of devices in similarly configured shelves.

(n)mxoT
(n)mxoJ

where:

n is the number of device shelves.
m is the number of SCSI-2 connections to a device shelf.
o is the number of devices on each SCSI-2 connection.
T indicates the device shelf is terminated.
J indicates the device shelf is jumpered.

According to the formula:

m × o = possible devices in each shelf.
n × m × o = possible number of devices in similarly configured shelves.

3.4.3 3½-inch SBBs

Tables 3-1 and 3-2 list some recommended configurations for 3½-inch SBBs.

Table 3-1 3½-Inch SBB Configurations, 6-Port Controller
------------------------------------------------------------
Number of    Number of BA350-SB   Configure   Available for       Ports
Devices      Shelves*             as**        3½-inch SBBs***     Used
------------------------------------------------------------
1-2          1                    (1)2x3T     5-4                 1-2
3-4          2                    (2)2x3T     9-8                 3-4
5-18         3                    (3)2x3T     13-0                5-6
19-24        4                    (2)2x3T     5-0                 6
                                  (2)1x6T
25-30        5                    (1)2x3T     5-0                 6
                                  (4)1x6T
31-36        6                    (6)1x6T     5-0                 6
37-42****    6                    (6)1x7T     5-0                 6
------------------------------------------------------------
Notes

2x3T: Two (split) SCSI-2 connections, separately terminated in the shelf. The devices appear as IDs 0, 2, 4, and 1, 3, 5.
1x6T: Single path SCSI-2 connection terminated in the shelf. The devices appear as IDs 0 through 5.
1x7T: Single path SCSI-2 connection terminated in the shelf. The devices appear as IDs 0 through 6.

* Consult the StorageWorks Solutions Shelf User's Guide for BA350-SB shelf information.
** Each BA350-SB shelf's upper SCSI-2 port connector is cabled to a controller port. The lower SCSI-2 port connector is attached to a controller port for 2x3T configurations and is unused for a 1x6T or 1x7T.
*** Available for future expansion.
**** Nonredundant controller and power (not recommended).
------------------------------------------------------------

Table 3-2 3½-Inch SBB Configurations, 3-Port Controller
------------------------------------------------------------
Number of    Number of BA350-SB   Configure   Available for       Ports
Devices      Shelves*             as**        3½-inch SBBs***     Used
------------------------------------------------------------
1-2          1                    (1)2x3T     5-4                 1-2
3-12         2                    (1)2x3T     9-0                 3
                                  (1)1x6T
13-18        3                    (3)1x6T     5-0                 3
19-21****    3                    (3)1x7T     2-0                 3
------------------------------------------------------------
Notes

2x3T: Two (split) SCSI-2 connections, separately terminated in the shelf. The devices appear as IDs 0, 2, 4, and 1, 3, 5.
1x6T: Single path SCSI-2 connection terminated in the shelf. The devices appear as IDs 0 through 5.
1x7T: Single path SCSI-2 connection terminated in the shelf. The devices appear as IDs 0 through 6.

* Consult the StorageWorks Solutions Shelf User's Guide for BA350-SB shelf information.
** Each BA350-SB shelf's upper SCSI-2 port connector is cabled to a controller port. The lower SCSI-2 port connector is attached to a controller port for 2x3T configurations and is unused for a 1x6T or 1x7T.
*** Available for future expansion.
**** Nonredundant controller and power (not recommended).
------------------------------------------------------------

3.4.4 5¼-inch SBBs

Tables 3-3 and 3-4 list some recommended configurations for 5¼-inch SBBs.
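
The (n)mxoT and (n)mxoJ designations used throughout these tables can be decoded mechanically. The following Python sketch parses one designation and applies the m × o and n × m × o formulas from Section 3.4.2.1. The function and class names are invented for this example, and treating a designation without a leading (n), as written in Tables 3-5 and 3-6, as a single shelf is an assumption.

```python
import re
from typing import NamedTuple


class ShelfDesignation(NamedTuple):
    shelves: int                 # n: number of device shelves
    connections: int             # m: SCSI-2 connections to each shelf
    devices_per_connection: int  # o: devices on each connection
    jumpered: bool               # True for J (jumpered), False for T (terminated)

    @property
    def devices_per_shelf(self):
        # m × o
        return self.connections * self.devices_per_connection

    @property
    def total_devices(self):
        # n × m × o
        return self.shelves * self.devices_per_shelf


# "(n)" is optional so that bare forms such as "1x6T" also parse
# (assumed here to mean one shelf).
_DESIGNATION = re.compile(r"^(?:\((\d+)\))?(\d+)x(\d+)([TJ])$")


def parse_designation(text):
    """Parse a string such as '(2)2x3T' into a ShelfDesignation."""
    match = _DESIGNATION.match(text.strip())
    if match is None:
        raise ValueError(f"not a shelf designation: {text!r}")
    n, m, o, suffix = match.groups()
    return ShelfDesignation(int(n) if n else 1, int(m), int(o), suffix == "J")


two_split_shelves = parse_designation("(2)2x3T")
```

For example, "(2)2x3T" describes two terminated shelves, each with two connections of three devices, so each shelf can hold six devices and the pair can hold twelve.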
Table 3-3 5¼-Inch SBB Configurations, 6-Port Controller
------------------------------------------------------------
Number of    Number of BA350-SB   Configure   Available for       Ports
Devices      Shelves*             as          5¼-inch SBBs**      Used
------------------------------------------------------------
1-2          1                    (1)2x3T     1-0                 1-2
3-4          2                    (2)2x3T     1-0                 3-4
5-6          3                    (3)2x3T     1-0                 5-6
7-8          4                    (2)1x6T     1-0                 6
                                  (2)2x3T
9-10         5                    (4)1x6T     1-0                 6
                                  (1)2x3T
11-12        6                    (6)1x6T     1-0                 6
13-14***     7                    (6)1x6T     1-0                 6
                                  (1)1x6J
15-16***     8                    (6)1x6T     1-0                 6
                                  (2)1x6J
17-18***     9+                   (6)1x6T     1-0                 6
                                  (3)1x6J
19-20***     10+                  (6)1x6T     1-0                 6
                                  (4)1x6J
------------------------------------------------------------
Notes

Each BA350-SB shelf has its upper connector cable attached to either the adjacent BA350-SB shelf's lower connector (1x6J), or a controller port connector (2x3T or 1x6T). The lower connector cable is attached to either an adjacent BA350-SB shelf's upper connector (1x6J, as in the first list item), controller port connector (2x3T), or is unused (1x6T).

* Consult the StorageWorks Solutions Shelf User's Guide for BA350-SB shelf information.
** Available for additional 5¼-inch device.
*** When used with the controller in the C1 position in an SW800-series or SW500-series cabinet. (Refer to Figures 3-1 and 3-5.)
+ Cannot be configured in SW500-series cabinets.
------------------------------------------------------------

Table 3-4 5¼-Inch SBB Configurations, 3-Port Controller
------------------------------------------------------------
Number of    Number of BA350-SB   Configure   Available for       Ports
Devices      Shelves*             as          5¼-inch SBBs**      Used
------------------------------------------------------------
1-2          1                    (1)2x3T     1-0                 1-2
3-4          2                    (1)2x3T     1-0                 3
                                  (1)1x6T
5-6          3                    (3)1x6T     1-0                 3
7-8          4                    (2)1x6T     1-0                 3
                                  (1)1x6J
9-10         5                    (1)1x6T     1-0                 3
                                  (2)1x6J
11-12        6+                   (3)1x6J     1-0                 3
------------------------------------------------------------
Notes

Each BA350-SB shelf has its upper connector cable attached to either the adjacent BA350-SB shelf's lower connector (1x6J), or a controller port connector (2x3T or 1x6T). The lower connector cable is attached to either an adjacent BA350-SB shelf's upper connector (1x6J, as in the first list item), controller port connector (2x3T), or is unused (1x6T).

* Consult the StorageWorks Solutions Shelf User's Guide for BA350-SB shelf information.
** Available for additional 5¼-inch device.
+ Cannot be configured in SW500-series cabinets.
------------------------------------------------------------

3.4.5 Intermixing 5¼-inch and 3½-inch SBBs

Use these guidelines for intermixing 5¼-inch and 3½-inch SBBs:

· Treat each 5¼-inch SBB as three 3½-inch SBBs.
· Each 5¼-inch SBB must have its SCSI-2 ID set manually using the address switch on the rear of the SBB, or by setting the switch to automatic and letting the slot connector dictate the device address. (Refer to the StorageWorks Solutions Shelf and SBB User's Guide.)
· A 5¼-inch SBB may be located in the same shelf with three or four 3½-inch SBBs.

3.4.6 Atypical Configurations

By unbalancing the number of devices per controller port, configurations can be devised with a smaller shelf count. This results in lower performance and/or availability. The minimum shelf count for various numbers of 3½-inch SBBs is listed in Tables 3-5 and 3-6.
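
The rows of Tables 3-5 and 3-6 follow a simple pattern: shelves are filled six devices at a time (1x6T) until every controller port is in use, and only the largest, nonredundant configurations place seven devices in each shelf (1x7T). The Python sketch below reproduces that pattern. It is inferred from the table rows rather than taken from any Digital tool, and the function name is hypothetical.

```python
import math


def min_shelf_config(devices, ports):
    """Return (shelf_count, designation) for a given number of 3½-inch SBBs.

    Mirrors the rows of Tables 3-5 (ports=6) and 3-6 (ports=3).
    """
    if devices < 1:
        raise ValueError("at least one device is required")
    if devices > 7 * ports:
        raise ValueError(f"{ports} ports support at most {7 * ports} devices")
    if devices <= 6 * ports:
        # Up to six devices per shelf, one shelf per port.
        return math.ceil(devices / 6), "1x6T"
    # Seven devices per shelf: nonredundant controller and power only
    # (not recommended).
    return ports, "1x7T"
```

Under this pattern, 20 devices on a 6-port controller need four shelves configured as 1x6T, while 20 devices on a 3-port controller need three shelves configured as 1x7T.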
Table 3-5 Small Shelf Count Configurations, 6-Port Controller
------------------------------------------------------------
Number of   Number of BA350-SB   Configure
Devices     Shelves*             as
------------------------------------------------------------
1-6         1                    1x6T
7-12        2                    1x6T
13-18       3                    1x6T
19-24       4                    1x6T
25-30       5                    1x6T
31-36       6                    1x6T
37-42**     6                    1x7T
------------------------------------------------------------
Notes
*  Consult the StorageWorks Solutions Shelf User's Guide for BA350-SB shelf information.
** Nonredundant controller and power configurations (not recommended).
------------------------------------------------------------

Table 3-6 Small Shelf Count Configurations, 3-Port Controller
------------------------------------------------------------
Number of   Number of BA350-SB   Configure
Devices     Shelves*             as
------------------------------------------------------------
1-6         1                    1x6T
7-12        2                    1x6T
13-18       3                    1x6T
19-21**     3                    1x7T
------------------------------------------------------------
Notes
*  Consult the StorageWorks Solutions Shelf User's Guide for BA350-SB shelf information.
** Nonredundant controller and power configurations (not recommended).
------------------------------------------------------------

3.5 Controllers

This section describes specifics of configuring the controllers.

3.5.1 Nonredundant Controllers

The following guidelines apply to nonredundant controllers:
· A single controller must be installed in the slot furthest from the BA350-MA shelf's SCSI connectors. This slot is SCSI ID 7. By using SCSI ID 7, SCSI ID 6 (the other controller slot) is available as an additional ID on the device shelf.
· The maximum recommended controller subsystem configuration is six devices per controller port. This allows for the addition of another controller, and additional power supplies in the storage shelves. A nonredundant controller configuration can support seven devices per port.
  However, Digital still recommends six devices per port to ease future upgrades.
· (HSZ-series controller) The HSZ-series controller may currently be configured only as nonredundant. Two nonredundant HSZ-series controllers may not be placed in the same BA350-MA controller shelf.

3.5.2 Dual-Redundant Controllers

The following guidelines apply to dual-redundant controllers:
· Only HSJ- and HSD-series controllers may be configured as dual-redundant.
· Dual-redundant controllers are located in the same BA350-MA shelf, and are connected to each other through the shelf backplane. Both controllers have access to all the devices on each other's ports. This setup increases availability and provides for failover when one controller in the pair fails. (The surviving controller takes over service to all devices.)
· Dual-redundant configurations follow the same guidelines as nonredundant configurations, except there is no option to increase to seven devices per port.
· Both controllers' cache modules must have the same number of megabytes, and both firmware versions must be identical. If there is a mismatch, neither controller will access any devices.
· Dual-redundant HSJ-series controllers must be on the same star coupler.
· Dual-redundant HSD-series controllers must be on the same DSSI bus.

3.5.3 Optimal Performance Configuration

For optimal performance, configure to the following guidelines:
· Balance the number of devices on each port of a controller. For example, for 18 3½-inch SBBs, place 3 devices on each of 6 ports. This permits parallel activity on the controller's available ports to the attached devices. Figure 3-8 is an example of how to balance devices across ports.
· Evenly distribute higher performance devices across separate ports so that higher and lower performance devices are intermixed on the same port. (For example, put multiple solid state disks on separate ports.)
  This intermixing of higher and lower performance devices on the same port benefits overall performance. Use the guidelines in Table 3-7.

Table 3-7 High-performance Devices per Port
------------------------------------------------------------
Number of high-performance   Number of high-performance
devices                      devices per port
------------------------------------------------------------
1-6                          1
7-12                         2
13-18                        3
------------------------------------------------------------

· Limit the number of devices per controller port to three in dual-redundant configurations. In doing so, both controllers access three devices per each other's port, maintaining six SCSI-2 devices total.
· Maximize the amount of cache memory per controller with the 16- or 32-MB cache module option.

Figure 3-8 Balanced Devices Within Device Shelves

Highest Performance

To obtain the highest performance possible, use a dual-redundant configuration and balance the number of devices across the two controllers. Do this through your operating system by ordering how devices are mounted or sequenced and by setting preferred path definitions. Following this guideline results in approximately half of the devices normally being accessed through each controller. Should one controller fail, the surviving controller automatically assumes service to the failed controller's devices.

3.5.4 Optimal Availability Configuration

For optimal availability, configure to the following guidelines:
· Use dual-redundant controllers and redundant power supplies in all shelves.
· Place storage set members on different controller ports and different device shelves.
· Use predesignated spares on separate controller ports and device shelves.
· Place storage set members on separate controllers when implementing host-based RAID (for example, HBVS).
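The placement guideline for storage set members (different controller ports and different device shelves) can be checked mechanically before devices are installed. The following Python sketch is a planning aid under stated assumptions: the function name, the device names, and the shelf numbering are hypothetical, and nothing here queries a controller.

```python
from typing import Dict, List, Tuple


def placement_problems(members: Dict[str, Tuple[int, int]]) -> List[str]:
    """Report storage set members that share a controller port or a shelf.

    members maps a device name to (controller port, shelf number).
    Shelf numbers are whatever labels the site uses; they are not
    defined by the manual.
    """
    problems: List[str] = []
    port_owner: Dict[int, str] = {}
    shelf_owner: Dict[int, str] = {}
    for name, (port, shelf) in members.items():
        if port in port_owner:
            problems.append(f"{name} shares port {port} with {port_owner[port]}")
        else:
            port_owner[port] = name
        if shelf in shelf_owner:
            problems.append(f"{name} shares shelf {shelf} with {shelf_owner[shelf]}")
        else:
            shelf_owner[shelf] = name
    return problems


if __name__ == "__main__":
    stripeset = {"DISK100": (1, 1), "DISK200": (2, 2), "DISK300": (3, 3)}
    print(placement_problems(stripeset) or "placement follows the guideline")
```

The same check can be run with predesignated spares added to the dictionary, since the guideline places them on separate ports and shelves as well.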
Figure 3-9 shows examples of optimal configurations for raidset members and designated spares on separate controller ports.

Figure 3-9 Optimal Availability Configurations

Highest Availability

For highest availability, especially with RAID implementations, follow these guidelines:
· For host-based RAID implementations, split the normal access path between controllers.
· Use redundant power supplies in all shelves.

3.6 Host Considerations

The following sections explain important considerations when configuring the HS controller and subsystem to the host CPU.

3.6.1 Host Cables

Following are special guidelines for configuring host cables/buses to and from the HS controller.

HSD-series controllers
· DSSI cable length between nodes/members on the DSSI bus must be no greater than 16 feet (4.9 meters).
· Total DSSI cable length (end-to-end) on one DSSI bus must be no greater than 60 feet (18.3 meters).

HSZ-series controllers
The maximum length (end-to-end) of fast and slow buses is summarized in Table 3-8:

Table 3-8 SCSI Bus Maximum Lengths
------------------------------------------------------------
Bus Type               Transfer Rate   Meters   Feet
------------------------------------------------------------
8-bit, single-ended    5 MB/s          6        19.7
8-bit, single-ended    10 MB/s         3        9.8
16-bit, differential   20 MB/s         25       82.0
------------------------------------------------------------

3.6.2 Host Adapters

The HSJ-series controllers follow the same CI configuration rules as the HSC controller product family, which supports from 1 to 31 host nodes. Consult your HSJ-series controller software product description (SPD) and firmware release notes for specific restrictions and a current list of supported host adapters.

Also for the HSJ-series controllers, all host adapter CI ports in a CI configuration must have the quiet slot time set to 10. Some older systems may have the quiet slot time set to 7, which will cause incorrect operation of the CI.
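The cable limits in Section 3.6.1 (DSSI segment and end-to-end lengths, and the SCSI bus maximums of Table 3-8) lend themselves to a quick planning check. The Python sketch below encodes only the figures given in that section; the function names and the idea of listing segment lengths are illustrative assumptions, not a Digital utility.

```python
from typing import List

DSSI_MAX_SEGMENT_M = 4.9   # 16 feet between nodes/members
DSSI_MAX_TOTAL_M = 18.3    # 60 feet end-to-end on one DSSI bus

# (bus type, transfer rate in MB/s) -> maximum end-to-end length in meters
SCSI_MAX_LENGTH_M = {
    ("8-bit, single-ended", 5): 6.0,
    ("8-bit, single-ended", 10): 3.0,
    ("16-bit, differential", 20): 25.0,
}


def dssi_cable_problems(segments_m: List[float]) -> List[str]:
    """Check node-to-node segment lengths (meters) against the DSSI limits."""
    problems: List[str] = []
    for index, length in enumerate(segments_m, start=1):
        if length > DSSI_MAX_SEGMENT_M:
            problems.append(
                f"segment {index} is {length} m (limit {DSSI_MAX_SEGMENT_M} m)"
            )
    total = sum(segments_m)
    if total > DSSI_MAX_TOTAL_M:
        problems.append(f"total length is {total:.1f} m (limit {DSSI_MAX_TOTAL_M} m)")
    return problems


def scsi_bus_length_ok(bus_type: str, rate_mb_s: int, length_m: float) -> bool:
    """Return True if the bus length is within the Table 3-8 maximum."""
    return length_m <= SCSI_MAX_LENGTH_M[(bus_type, rate_mb_s)]


if __name__ == "__main__":
    print(dssi_cable_problems([4.0, 4.9, 3.0]))
    print(scsi_bus_length_ok("8-bit, single-ended", 10, 3.5))
```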
The following host adapters currently are supported:
· HSJ-series controllers
  - CIXCD (for XMI-based systems)
  - CIBCA-B (for BI-based systems) [5]
  - CI780 (for SBI-based systems)
· HSD-series controllers
  - SHAC (for various DEC and VAX systems)
  - D4000 (for DEC 4000 systems)
  - KFMSA (for XMI-based systems)
· HSZ-series controllers
  - KZTSA [6] (for DEC 3000 systems)
  - KZMSA (for DEC 7000/10000 systems via DWZZA)

Consult your controller SPD and firmware release notes for current lists of supported host adapters.
------------------------------------------------------------
[5] Supersedes CIBCA-A; CIBCA-A is no longer supported.
[6] See the HSZ-series firmware release notes for restrictions.

4
------------------------------------------------------------
Normal Operation

This chapter describes operating conditions and procedures for the HS controllers. Included is information about both storage and controller configurations. The "configurations" discussed in this chapter are those set by the operator, employing user interfaces such as the HS operating firmware and/or operating system commands. Refer to Chapter 3 for physical configuration of the subsystem hardware. Also given are cross references to other sections of this manual where more information about controller operation is provided.

4.1 Initialization

The following sections discuss the operating conditions surrounding initialization of the controller and subsystem.

4.1.1 Controller Initialization

The controller will initialize after any of the following conditions:
· Power is turned on.
· The firmware resets the controller.
· The operator presses the green reset (//) button.
· The host clears the controller.
------------------------------------------------------------
Note
------------------------------------------------------------
Keep the program card in its slot during controller subsystem operation.
If the program card is removed, the controller will reset.
------------------------------------------------------------

See Chapter 6 for a description of the initialization of both the controller and its cache module. (The process is described in Chapter 6 because some of the initialization diagnostics are available as a controller self-test function for the operator.)

4.1.2 Dual-Redundant Configuration Initialization

The controllers in a dual-redundant configuration run the same initialization sequence that is described in Chapter 6, except they exchange signals during their individual initialization sequences. The first signal occurs after one controller starts initializing. The signal informs the other controller that an initialization is occurring. This way the other controller will not assume that the initializing controller has failed and will not attempt to disable it.

4.1.3 Subsystem Initialization

Full StorageWorks subsystem initialization takes place when the subsystem is switched on for the first time. In the event of a reset due to one of the following conditions, a subset of the initialization sequence is run:
· A partial or complete power failure
· Equipment failure
· An error condition

A complete StorageWorks subsystem initialization includes the following:
1. When the subsystem is turned on, all shelves in the subsystem are reset. Then, entities in the shelves (including storage devices, controllers, and cache modules) run their initialization and self-test sequences.
2. During initialization, the controller interrogates the entities with which it has connections, including other controllers in the subsystem.
3. When the initialization sequence on all entities is completed, the controller begins data transfer and other operations with the host.
4.2 Operator Control Panel

The operator can use the operator control panel (OCP) to reset the controller, control the SCSI-2 buses attached to the controller, and interpret error conditions that result in LED error codes. The OCP and its use are described in Chapter 5.

4.3 Command Line Interpreter

The Command Line Interpreter (CLI) is the user interface to the controller. The CLI allows you to control storage and controller configurations through commands. The following sections explain how to use the CLI, and how it defines and modifies configurations. A detailed description of CLI commands is provided in Appendix B.

4.3.1 Accessing the CLI

You can access the CLI through a maintenance terminal (see Section 4.5) or through a virtual terminal. To access the CLI through a maintenance terminal (all controllers), connect the terminal and press the Return key.

You must use a maintenance terminal to set the controller initial configuration. This is because the controller arrives with an invalid ID, and its host ports (HSJ-, HSD-series controllers) are initially off. Thereafter, you may use a virtual (host) terminal to modify the configuration. The method of establishing the virtual terminal connection varies depending on your operating system and interface.

For example, for HSJ- and HSD-series controllers under the OpenVMS operating system for VAX hardware, the following command connects a host terminal to the CLI (the command requires the DIAGNOSE privilege):
------------------------------------------------------------
Note
------------------------------------------------------------
The controller SCS node name must be specified.
------------------------------------------------------------
    $ SET HOST/LOG=CONFIGURATION.INFO/DUP/SERVER=MSCP$DUP/TASK=CLI SCS_nodename

Establishing a virtual terminal for HSZ-series controllers requires using the HSZUTIL application, which is described in Chapter 6.
------------------------------------------------------------
Note
------------------------------------------------------------
Your CLI> prompt may be factory-set to reflect your controller model, such as HSJ>, HSD>, or HSZ>. Appendix B provides details on how to change the CLI> prompt.
------------------------------------------------------------

4.3.2 Exiting the CLI

When exiting the CLI, keep the following guidelines in mind:
· If you are using a maintenance terminal, you cannot exit the CLI. Entering the EXIT command merely restarts the CLI and redisplays the copyright notice, controller type, and any last fail error information.
· If you are using the DUP connection/virtual terminal, enter the following command to exit the CLI and return the terminal to the host:
    CLI> EXIT
· If you connect a virtual terminal via the OpenVMS VAX operating system, you can specify the qualifier /LOG=CONFIGURATION.INFO on the DCL command line. This qualifier creates a log file of your CLI session. Then, when you exit the CLI, you can open the log file to review how you configured your subsystem.

4.3.3 Command Sets

The CLI consists of the following six command sets:
· Failover commands
  - Failover commands support dual-redundant controller configurations.
· Controller commands
  - Set and show the basic controller parameters.
  - Set the controller ID (CI or DSSI node number or SCSI target ID).
  - Set the resident terminal characteristics.
  - Restart the controller.
  - Run resident diagnostics and utilities (see Chapter 6).
· Device commands
  - Device commands specify and show the location of physical SCSI-2 devices attached to the controller.
    Locations of devices are specified using the SCSI Port-Target-LUN (PTL) designation.
  - Only devices that have been defined by the ADD command are seen or used by the controller. Devices that have been placed in a shelf, but have not been added, will not be automatically used by the controller. Use the CONFIG utility to quickly add such devices (see Chapter 6).
· Storage set commands
  - Storage set commands add, modify, rename, and show storage sets (such as stripesets).
· Logical unit commands
  - Logical unit commands add, modify, and show logical units built from devices and storage sets.
· Exerciser commands
  - The exerciser commands invoke disk and tape exercisers that test device data transfer capabilities. The exercisers (DILX and TILX) are fully described in Chapter 6.
------------------------------------------------------------
Note
------------------------------------------------------------
Remember these two guidelines when using the CLI:
· Not all configuration parameters need to be specified on one line. They can be entered by using multiple SET commands.
· Only enough of each command need be entered to make the command unique (usually three characters). For example, SHO is equivalent to SHOW.
------------------------------------------------------------

4.3.4 Initial Configuration (Nonredundant Controller)

After installation of a nonredundant controller, use the CLI to define its parameters in the following order (from a maintenance terminal).
------------------------------------------------------------
CAUTION
------------------------------------------------------------
Do not install HSJ-series CI host port cables until after setting all parameters listed here. Failure to follow this procedure may result in adverse effects on the host/cluster.
------------------------------------------------------------
------------------------------------------------------------
Note
------------------------------------------------------------
Not all steps are applicable to all controller models. Steps applicable to certain models are designated as such.
------------------------------------------------------------

1. Enter the following command to set the MAX_NODES (HSJ-series controllers):
    CLI> SET THIS_CONTROLLER MAX_NODES=n
   where n is 8, 16, or 32.
2. Enter the following command to set a valid controller ID:
    CLI> SET THIS_CONTROLLER ID=n
   where n is the (HSJ-series controller) CI node number (0 through (MAX_NODES - 1)).
   or
   n is the (HSD-series controller) one-digit DSSI node number (0 through 7). Each controller DSSI node number must be unique on its DSSI interconnect.
   or
   n is the (HSZ-series controller) SCSI target ID(s) (0 through 7).
3. Enter the following command to set the SCS node (HSJ- and HSD-series controllers):
    CLI> SET THIS_CONTROLLER SCS_NODENAME="xxxxxx"
   where xxxxxx is a one- to six-character alphanumeric name for this node. The node name must be enclosed in quotes with an alphabetic character first. Each SCS node name must be unique within its VMScluster. [1]
4. Enter the following command to set the MSCP allocation class (HSJ- and HSD-series controllers):
    CLI> SET THIS_CONTROLLER MSCP_ALLOCATION_CLASS=n
   where n is 0 through 255.
5. Enter the following command to set the TMSCP allocation class (HSJ- and HSD-series controllers):
    CLI> SET THIS_CONTROLLER TMSCP_ALLOCATION_CLASS=n
   where n is 0 through 255.
------------------------------------------------------------
Note
------------------------------------------------------------
Always restart the controller after setting the ID, SCS node name, or allocation classes.
------------------------------------------------------------
6.
   Restart the controller either by pressing the green reset (//) button, or by entering the following command:
    CLI> RESTART THIS_CONTROLLER
7. Enter the following command to verify the preceding parameters were set:
    CLI> SHOW THIS_CONTROLLER
------------------------------------------------------------
[1] See Section 4.9.2 for important information about VMS node names.
------------------------------------------------------------
CAUTION
------------------------------------------------------------
Do not plug the host port cable into an HSD-series controller while the power is on to any devices on the DSSI bus. Doing so risks short circuits that may blow fuses on all the devices.
------------------------------------------------------------
8. Connect the host port cable to the front of the controller (see Chapter 7).
9. Enter the following commands to enable CI paths A and B to the host (HSJ-series controllers):
    CLI> SET THIS_CONTROLLER PATH_A
    CLI> SET THIS_CONTROLLER PATH_B
   Enter the following command to enable the host port path (HSD-series controllers):
    CLI> SET THIS_CONTROLLER PATH
   The host port path for HSZ-series controllers is always on, so no command is needed.

4.3.5 Initial Configuration (Dual-redundant Controllers)

In a dual-redundant configuration, one terminal can set both controller configurations. After installation of both controllers, use the CLI to define their parameters in the following order (from a maintenance terminal connected to one controller):
------------------------------------------------------------
CAUTION
------------------------------------------------------------
Do not install HSJ-series CI host port cables until after setting all parameters listed here. Failure to follow this procedure may result in adverse effects on the host/cluster.
------------------------------------------------------------
------------------------------------------------------------
Note
------------------------------------------------------------
Not all steps are applicable to all controller models. Steps applicable to certain models are designated as such.
------------------------------------------------------------

1. Enter the following command to set the MAX_NODES (HSJ-series controllers):
    CLI> SET THIS_CONTROLLER MAX_NODES=n
   where n is 8, 16, or 32.
2. Enter the following command to set a valid controller ID:
    CLI> SET THIS_CONTROLLER ID=n
   where n is the (HSJ-series controller) CI node number (0 through (MAX_NODES - 1)).
   or
   n is the (HSD-series controller) one-digit DSSI node number (0 through 7). Each controller DSSI node number must be unique on its DSSI interconnect.
3. Enter the following command to set the SCS node:
    CLI> SET THIS_CONTROLLER SCS_NODENAME="xxxxxx"
   where xxxxxx is a one- to six-character alphanumeric name for this node. The node name must be enclosed in quotes with an alphabetic character first. Each SCS node name must be unique within its VMScluster. [2]
4. Enter the following command to set the MSCP allocation class:
    CLI> SET THIS_CONTROLLER MSCP_ALLOCATION_CLASS=n
   where n is 1 through 255. Digital recommends providing a unique allocation class value for every pair of dual-redundant controllers in the same cluster.
5. Enter the following command to set the TMSCP allocation class:
    CLI> SET THIS_CONTROLLER TMSCP_ALLOCATION_CLASS=n
   where n is 1 through 255.
------------------------------------------------------------
CAUTION
------------------------------------------------------------
The SET FAILOVER command establishes controller-to-controller communication and copies configuration information. Always enter this command on one controller only. COPY=configuration-source specifies where the good configuration data are located. Never blindly specify SET FAILOVER.
Know where your good configuration information resides before entering the command.
------------------------------------------------------------
6. Enter the following command to copy parameters to the other controller (the one to which the terminal is not connected):
    CLI> SET FAILOVER COPY=THIS_CONTROLLER
------------------------------------------------------------
Note
------------------------------------------------------------
Always restart the controllers after setting the ID, SCS node name, or allocation classes.
------------------------------------------------------------
7. Restart both controllers either by pressing the green reset (//) buttons, or by entering the following commands:
    CLI> RESTART OTHER_CONTROLLER
    CLI> RESTART THIS_CONTROLLER
------------------------------------------------------------
[2] See Section 4.9.2 for important information about VMS node names.
------------------------------------------------------------
8. Enter the following commands to verify the preceding parameters were set:
    CLI> SHOW THIS_CONTROLLER
    CLI> SHOW OTHER_CONTROLLER
------------------------------------------------------------
CAUTION
------------------------------------------------------------
Do not plug host port cables into an HSD-series controller while the power is on to any members on the DSSI bus, including the controller and host. Doing so risks short circuits that may blow fuses on all the members.
------------------------------------------------------------
9. Connect the host port cables to the front of the controllers (see Chapter 7). Do not connect the two controllers in a dual-redundant pair to separate or different star couplers (HSJ-series) or DSSI buses (HSD-series).
10.
   Enter the following commands to enable CI paths A and B to the host (HSJ-series controllers):
    CLI> SET THIS_CONTROLLER PATH_A
    CLI> SET THIS_CONTROLLER PATH_B
    CLI> SET OTHER_CONTROLLER PATH_A
    CLI> SET OTHER_CONTROLLER PATH_B
   Enter the following commands to enable the host port path (HSD-series controllers):
    CLI> SET THIS_CONTROLLER PATH
    CLI> SET OTHER_CONTROLLER PATH

4.3.6 Configuring Storage Devices

To automatically configure devices on the controller, use the CONFIG utility described in Chapter 6.
------------------------------------------------------------
Note
------------------------------------------------------------
If you use the ADD command to add a removable media device (such as a tape or CDROM) to an HSJ- or HSD-series controller, the host will not be able to access the device until one of the following occurs:
· The media is loaded into the device.
· The controller is reinitialized.
· The host is reinitialized.
· The virtual circuit is broken and reestablished.
------------------------------------------------------------

For manual configuration, the following steps add devices, storage sets, and logical units. Use the CLI to complete these steps so that the host will recognize the storage device. (These steps can be run from a virtual terminal.)

1. Add the physical devices by using the following command:
    CLI> ADD device-type device-name scsi-location
   For example:
    CLI> ADD DISK DISK100 1 0 0
    CLI> ADD TAPE TAPE510 5 1 0
    CLI> ADD CDROM CDROM0 6 0 0
   where:
   device-type is the type of device to be added. This can be DISK, TAPE, or CDROM.
   device-name is the name to refer to that device. The name is referenced when creating units or storage sets.
   scsi-location is the port, target, and LUN (PTL) for the device. When entering the PTL, at least one space must separate the port, target, and LUN.
2. Add the storage sets for the devices. See Appendix B for examples of adding storage sets.
   (If you do not desire storage sets in your configuration, skip this step.)
------------------------------------------------------------
CAUTION
------------------------------------------------------------
The INITIALIZE command destroys all data on a container. See Appendix B for specific information on this command.
------------------------------------------------------------
3. Enter the following command to initialize the containers (devices, or storage sets, or both) prior to adding logical units to the configuration:
    CLI> INITIALIZE container-name
   where container-name is a device or storage set that will become part of a unit.
   When initializing a single-device container:
   · If NOTRANSPORTABLE (the default) was specified when the device was added, a small amount of disk space was made inaccessible to the host and used for metadata. The metadata will now be initialized.
   · If TRANSPORTABLE was specified, any metadata on the device will now be destroyed.
   See Appendix B for details on metadata and when INITIALIZE is required.
4. Add the units that use either the devices or the storage sets built from the devices by entering the following command:
    CLI> ADD UNIT logical-unit-number container-name
   where:
   logical-unit-number is the unit number the host uses to access the device.
   container-name identifies the device or the storage set.

4.4 Acceptance Test

After you install, set parameters for, and configure your controller, follow the guidelines in this section to acceptance test your subsystem.
1. Turn your system on. This resets all shelves and starts the spin-up cycle on devices within the shelves. This includes the initialization (diagnostics) on the controller(s) and device self-tests.
2. Run DILX using the default answers to the test questions (see Chapter 6). This tests all disk devices in your subsystem.
3. Run TILX using the default answers to the test questions (see Chapter 6). This tests all tape devices in your subsystem.
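When many devices are configured manually, it can help to generate the ADD, INITIALIZE, and ADD UNIT command lines of Section 4.3.6 from a list rather than typing each one. The Python sketch below only formats command text in the syntax shown in that section; it does not talk to a controller. The function names are hypothetical, and the unit-number argument is passed through as an opaque string because its format is defined in Appendix B, not here.

```python
from typing import List

DEVICE_TYPES = ("DISK", "TAPE", "CDROM")


def add_device_command(device_type: str, name: str,
                       port: int, target: int, lun: int) -> str:
    """Format an ADD command; the PTL fields are separated by single spaces."""
    device_type = device_type.upper()
    if device_type not in DEVICE_TYPES:
        raise ValueError(f"device type must be one of {DEVICE_TYPES}")
    return f"ADD {device_type} {name} {port} {target} {lun}"


def single_device_commands(device_type: str, name: str,
                           port: int, target: int, lun: int,
                           unit: str) -> List[str]:
    """Commands for one device used directly as a unit (no storage set)."""
    return [
        add_device_command(device_type, name, port, target, lun),
        # CAUTION: INITIALIZE destroys all data on the container.
        f"INITIALIZE {name}",
        f"ADD UNIT {unit} {name}",
    ]


if __name__ == "__main__":
    # "unit-number" is a placeholder, not a real unit designation.
    for line in single_device_commands("DISK", "DISK100", 1, 0, 0, "unit-number"):
        print("CLI>", line)
```

Review the generated INITIALIZE lines before entering them, since that command is destructive.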
4.5 Maintenance Terminal

A maintenance terminal is a locally connected EIA-423 compatible terminal (a terminal connected directly to the controller MMJ). You do not need a maintenance terminal for normal operation. However, you must connect a maintenance terminal for initial controller configuration. Thereafter, use either a maintenance terminal or a host (virtual) terminal to communicate with the controller.

Follow this procedure to connect a maintenance terminal:
1. Make sure the power switch on the back of the terminal is off (O).
2. Connect one end of the terminal cable to the back of the terminal.
3. Connect the other end of the terminal cable to the MMJ on the controller.
4. Set your terminal at 9600 baud, 8 data bits, 1 stop bit, and no parity. Refer to your terminal documentation for terminal setup instructions.

4.6 Virtual Terminal (HSJ- and HSD-Series Controllers)

After installation and setting of initial controller parameters through a maintenance terminal, controller functions may be executed from a virtual host terminal through a DUP connection. Refer to Section 4.3.1 for information on making the virtual connection.

Establishing a virtual terminal session under the OpenVMS VAX and OpenVMS AXP operating systems (SET HOST/DUP) requires the FYDRIVER.
The following error indicates that the FYDRIVER has not been loaded:

    %HSCPAD-F-DRVNOTLOAD, FYDRIVER not loaded
    -SYSTEM-W-NOSUCHDEV, no such device available

If you receive this message, load the FYDRIVER as follows:
· For OpenVMS VAX
    $ MCR SYSGEN
    SYSGEN> LOAD SYS$LOADABLE_IMAGES:FYDRIVER
    SYSGEN> CONNECT FYA0 /NOADAPTER
    SYSGEN> EXIT
    $
· For OpenVMS AXP
    $ MCR SYSMAN
    SYSMAN> IO CONNECT FYA0 /NOADAPTER/DRIVER=SYS$FYDRIVER
    SYSMAN> EXIT
    $

Once FYDRIVER is loaded, you may make the virtual terminal connection as follows:
    $ SET HOST/LOG=CONFIGURATION.INFO/DUP/SERVER=MSCP$DUP/TASK=CLI SCS_nodename

4.7 Virtual Terminal (HSZ-series Controllers)

A virtual terminal port can be created through a host-based application called HSZUTIL (HSZ-series controller). This program uses SCSI diagnostic send and receive commands to deliver and receive characters to and from the HSZ-series CLI and local programs. See Chapter 6 for more information on the HSZUTIL application.

4.8 VAXcluster Console System

You can run VAXcluster Console System (VCS) with any HS controller. If you are unfamiliar with VCS, refer to the VCS Software Manual for instructions.
------------------------------------------------------------
Note
------------------------------------------------------------
VCS can only be used from a terminal connected to a maintenance terminal port.
------------------------------------------------------------

4.9 Operating Systems

The following sections describe particulars associated with host operating systems which may help in understanding and servicing the HS controllers.
The two primary operating systems that support the HS controllers are the OpenVMS and DEC OSF/1 AXP operating systems, as shown in Table 4-1:

Table 4-1 Operating System Support
------------------------------------------------------------
Operating System   HSJ-series   HSD-series   HSZ-series
------------------------------------------------------------
OpenVMS AXP        V1.5 [1]     V1.5 [1]     N/S [2]
OpenVMS VAX        V5.5-2 [1]   V5.5-2       N/S [2]
VAX VMS            V5.5-1 [1]   N/S [2]      N/S [2]
DEC OSF/1 AXP      N/S [2]      N/S [2]      V2.0
------------------------------------------------------------
[1] Supported with limitations.
[2] Not supported at time of printing.
------------------------------------------------------------

Refer to your firmware release notes for updates to the list of operating system support.

Although certain specifics regarding operating systems are covered here, you should refer to the StorageWorks Array Controllers HS Family of Array Controllers User's Guide for complete information on operating system support.

4.9.1 Controller Disks as System Initialization Disks

HSJ-series controllers

HSJ-series controller disks as VAX 7000(TM) and VAX 10000(TM) initialization devices: HS operating firmware supports manual and automatic initialization for VAX 7000/10000 systems. For a disk drive connected to an HSJ-series controller to be both a VAX 7000/10000 manual and automatic initialization device, the following conditions must be met:
· VAX 7000/10000 console code must be at version V3.2 or higher.
· HS operating firmware must be at version V1.0B or higher.
------------------------------------------------------------
Note
------------------------------------------------------------
Contact Digital Multivendor Services if you need to upgrade to V3.2 or greater VAX 7000/10000 console code.
------------------------------------------------------------
If your VAX 7000/10000 console code version is earlier than V3.2, you are limited to manual initialization.
To manually initialize, perform the following steps:

1. Make sure that the disk drives attached to the HSJ-series controller are visible to the initialization driver by entering the SHOW DEVICE command repeatedly (from the virtual terminal) until the disk drives attached to the HSJ-series controller are reported (usually two repetitions are sufficient).
2. Enter the default initialization device string. (Refer to the VAX console instructions in the VAX console documentation.)
3. Enter BOOT.

HSD-series controllers

An HSD-series unit can be an OpenVMS operating system initialization disk.

HSZ-series controllers

An HSZ-series unit can be a DEC OSF/1 AXP operating system initialization disk if the system unit is LUN 0 as seen by the host CPU. [3]

------------------------------------------------------------
[3] See the HSZ-series firmware release notes for restrictions.

4.9.2 Operating System Nodes (OpenVMS)

Be aware of the following conditions for HSJ-series controllers:

· If a controller is already an active member of an OpenVMS cluster and you change its current CI node number but not its CI node name, and then restart the controller with the new node number, access to its devices and overall cluster operation will be adversely affected. This occurs because the OpenVMS operating system makes continuous attempts to establish new virtual circuits with new nodes, and it will find a known node name at a new node address. This operation is a security feature provided by the operating system to prevent one CI node from masquerading as another.

· If the controller CI node number and node name are both changed, and you restart the controller while the OpenVMS cluster remains operational, the operating system will establish communication with the controller using the new CI node address and CI node name. Normal operation will occur, with the exception that the controller's devices will be assigned new device names based on the controller's new node name.
· If it is necessary to change only the controller's CI node number, all CI host CPU nodes must be shut down and then restarted.

4.9.3 AUTOGEN.COM (OpenVMS)

The OpenVMS AUTOGEN.COM file must be edited for HSJ- and HSD-series controller-attached disks to be recognized. If AUTOGEN is run without modification in a system that includes such controller-attached disk drives, the following error message is displayed:

"** WARNING ** - unsupported system disk type. Using speed and size characteristics of an RK07."

The AUTOGEN program does not recognize the device types of the controller's attached devices. The OpenVMS DCL lexical F$GETDVI returns the following values:

OpenVMS VAX V6.0        VAX VMS V5.5-1
OpenVMS VAX V6.1        OpenVMS VAX V5.5-2
----------------        ------------------
141 - HSX00             35 - unknown device
142 - HSX01             35 - unknown device

The AUTOGEN.COM DCL procedure must be modified as follows to support these values:

VAX VMS V5.5-1 and OpenVMS V5.5-2

The AUTOGEN.COM DCL procedure will select a -1 (unsupported device) from the speed list. To circumvent this problem, perform the following steps:

1. Make a copy of the AUTOGEN.COM DCL file in case restoration of the original state is required.
2. The section of AUTOGEN.COM (from OpenVMS software V5.5-2) dealing with devices is shown below. Change one element in the speed list (the -1 shown enclosed in a box) to a 4.
$speed_list=" -1, 2, 2, 4, 4, 4, 4, 4, 4, 1, 1,-1,-1, 4,-1, 4,-1,-1, 1, 2"
$speed_list=speed_list + ", 4, 4, 4, 2, 2, 1,-1, 1, 1, 2, 4, 1, 1,-1,-1,-1,-1,-1, 4, 4"
                                                                        ^^
$speed_list=speed_list + ", 1, 1, 1, 4, 4, 1, 4,-1, 4, 4, 4, 4,-1,-1, 4,-1, 4, 4,-1, 4"
$speed_list=speed_list + ", 4, 4,-1,-1, 4, 4, 2,-1,-1,-1, 4,-1, 1,-1, 4, 4, 4, 4, 4, 4"
$speed_list=speed_list + ", 4, 4, 4, 4,-1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4"
$speed_list=speed_list + ", 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4"
$speed_list=speed_list + ", 4, 4, 4, 4, 4, 4, 4"
$diskspeed=-1
$temp = F$GETDVI("sys$sysdevice","DEVTYPE")
$IF (temp .LE. 126) .AND. (temp .GE. 1) -
 THEN diskspeed = F$ELEMENT(temp,",",speed_list)
$disksize = F$GETDVI("sys$sysdevice","MAXBLOCK")
$IF diskspeed .NE. -1 THEN GOTO getdata30

(The ^^ marker is not part of the file. It identifies the boxed -1 to change: the sixteenth value on the second line, which is element 35 of the list counting from 0.)

3. Run the AUTOGEN program.

Completing this procedure causes the disk drives to be recognized as supported device types.

OpenVMS VAX V6.0

The AUTOGEN.COM DCL procedure does not support device types above 137 although HSX00 and HSX01 are properly defined in the speed list. To circumvent this problem, perform the following steps:

1. Make a copy of the AUTOGEN.COM DCL file in case restoration of the original state is required.
2. Edit the AUTOGEN.COM file. Change the value 137 in the following statement to 142.

   $IF (temp .LE. 137) .AND. (temp .GE. 1) -

3. Run the AUTOGEN program.

This change will allow AUTOGEN to run successfully against the controller-attached disk drives used as system disks.

OpenVMS VAX V6.1

The OpenVMS VAX V6.1 operating system does not require modifications to AUTOGEN.COM as described in the previous sections.
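The lookup that the V5.5-1/V5.5-2 edit corrects can be sketched in Python as follows. This is a minimal illustration and not part of AUTOGEN.COM: the function names are hypothetical, only the first two lines of the speed list are reproduced, and f_element merely imitates the DCL F$ELEMENT lexical (which returns the delimiter when the element number is out of range).

```python
# Illustrative model of the AUTOGEN.COM speed-list lookup; names are hypothetical.
# Only the first 40 elements of the V5.5-2 speed list are reproduced here.
SPEED_LIST = (
    " -1, 2, 2, 4, 4, 4, 4, 4, 4, 1, 1,-1,-1, 4,-1, 4,-1,-1, 1, 2"
    ", 4, 4, 4, 2, 2, 1,-1, 1, 1, 2, 4, 1, 1,-1,-1,-1,-1,-1, 4, 4"
)


def f_element(index, delimiter, string):
    """Imitate F$ELEMENT: return element `index`, or the delimiter if out of range."""
    parts = string.split(delimiter)
    return parts[index] if 0 <= index < len(parts) else delimiter


def disk_speed(devtype, speed_list=SPEED_LIST, upper_bound=126):
    """Return the speed class for a device type, or -1 if it is unsupported."""
    if not 1 <= devtype <= upper_bound:
        return -1  # the bounds test fails, so diskspeed keeps its initial -1
    element = f_element(devtype, ",", speed_list)
    if element == ",":
        return -1  # element number beyond the (truncated) list in this sketch
    return int(element)


def patch_element(speed_list, index, value):
    """Return a copy of the list with one element replaced (the manual edit)."""
    parts = speed_list.split(",")
    parts[index] = f"{value:>2}"
    return ",".join(parts)


if __name__ == "__main__":
    print(disk_speed(35))                                # unmodified list
    print(disk_speed(35, patch_element(SPEED_LIST, 35, 4)))  # after the edit
    print(disk_speed(141, upper_bound=137))              # V6.0 bounds test
```

Device type 35 selects the -1 entry until element 35 is changed to 4, and a device type above the upper bound never reaches the list at all, which is why the V6.0 procedure changes the bound rather than the list.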
4.9.4 Other Conditions (OpenVMS)

The following conditions and recommendations also apply to controllers running under the OpenVMS operating system:

· MSCP and TMSCP controller timeouts

  The MSCP and TMSCP controller timeouts have been split and the TMSCP timeout has been increased from 200 to 255 seconds. This is to reduce host resets from the TU driver in OpenVMS VAX V5.5-2 that occur when the driver sends multiple position commands to a tape drive with shorter timeouts. This change in HSJ- and HSD-series controller firmware will reduce but not eliminate these host resets.

· Write history log

  The write history log has been increased from 512 to 2048 entries. The allocation failure entry table has also been increased from 128 to 512 entries. This should eliminate or drastically reduce VMS crashes from entries and tables filling up while the OpenVMS software is using Host-Based Volume Shadowing (HBVS) on the HSJ- or HSD-series controller.

· Increased storage set size

  Fourteen-member RAID 0 storage sets are now supported. Previous versions of HS operating firmware supported only five-member storage sets. The OpenVMS VAX operating system maximum capacity restriction for file-structured volumes, 16,777,216 blocks (about 8.5 gigabytes), remains in effect for operating system versions prior to V6.0.

· The CLUSTER_SIZE qualifier for large devices or storage sets

  Digital recommends that the formula displayed by the OpenVMS HELP DEVICE INIT/CLUSTER_SIZE command be used to determine the proper OpenVMS file system cluster size. Using too small a file system cluster size may prevent some of the device or storage set capacity from being accessed; too large a cluster size usually wastes storage capacity by allocating large blocks of storage for small files.
· Shadow set operation

  In the OpenVMS VAX operating system versions earlier than V6.0, timed-out I/O requests to shadow set members may lead to member disks attached to controllers being dropped from shadow sets. In some cases, this may lead to host crashes. To avoid this possibility, Digital recommends changing the value of the SYSGEN parameter SHADOW_MBR_TMO to at least 120 (seconds) for systems running operating system versions earlier than V6.0. (Be aware that your system may temporarily pause during the 120-second interval.) Version 6.0 of the OpenVMS VAX operating system avoids this problem by retrying timed-out operations to shadow set members several times.

· PAPOLLINTERVAL and PANUMPOLL parameters

  Digital recommends that the SYSGEN parameters PAPOLLINTERVAL and PANUMPOLL be set such that all nodes in the cluster are polled within 30 seconds. This ensures proper operation of the HSJ-series CI in the event of controller reinitialization. Failure to set these values may result in MSCP command timeouts. The default values are set to poll 16-node clusters every 5 seconds and 32-node clusters every 10 seconds.

4.10 Failover

Failover takes place when one controller fails in a dual-redundant configuration. To support failover, information is shared between the two controllers, such as:

· Physical device PTL configurations
· Storage set names
· Logical unit definitions

Prior to failover, all resources are considered unbound to a particular controller, until a logical unit is brought on line by the host through (one of) the controllers. At this point, all containers used by the logical unit become solely accessible through the one controller.
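The binding behavior described above can be pictured with a small Python model. This is a hedged sketch, not controller firmware: the class, its method names, and the unit names are hypothetical, and the model tracks only which controller a unit is on line through.

```python
# Toy model of unit ownership in a dual-redundant pair; all names are hypothetical.
class DualRedundantPair:
    def __init__(self, controllers=("THIS", "OTHER")):
        self.controllers = tuple(controllers)
        self.owner = {}  # unit name -> controller the unit is on line through

    def bring_online(self, unit, controller):
        """Bind a unit to the controller through which the host brings it on line."""
        if controller not in self.controllers:
            raise ValueError(f"unknown controller: {controller}")
        # A unit that is already bound stays with its first controller.
        return self.owner.setdefault(unit, controller)

    def fail_over(self, failed):
        """Move every unit bound to the failed controller to the survivor."""
        survivor = next(c for c in self.controllers if c != failed)
        for unit, controller in self.owner.items():
            if controller == failed:
                self.owner[unit] = survivor
        return survivor


if __name__ == "__main__":
    pair = DualRedundantPair()
    pair.bring_online("D100", "THIS")
    pair.bring_online("D200", "OTHER")
    pair.fail_over("OTHER")
    print(pair.owner)
```

A unit absent from the owner table is unbound and may be brought on line through either controller; once bound, it is reachable only through that controller until failover reassigns it to the survivor.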
In a failover configuration, all commands are shared between the two controllers except the following:

SET THIS_CONTROLLER
SET OTHER_CONTROLLER
SHOW THIS_CONTROLLER
SHOW OTHER_CONTROLLER
RESTART THIS_CONTROLLER
RESTART OTHER_CONTROLLER
SHUTDOWN THIS_CONTROLLER
SHUTDOWN OTHER_CONTROLLER

In these cases, the command will be directed to the correct controller:

· THIS_CONTROLLER refers to the controller to which the terminal is connected.
· OTHER_CONTROLLER refers to the other controller in the dual-redundant pair.

4.10.1 Setting Failover

To place two controllers into failover configuration, enter the following command:

CLI> SET FAILOVER COPY=configuration-source

where configuration-source is either THIS_CONTROLLER or OTHER_CONTROLLER, depending on where the "good" copy of device configuration information is found.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Digital recommends that the controllers be set for failover before any device configuration commands are entered. Then, as devices, storage sets, and units are added to one controller's configuration, they are automatically added to the other controller's configuration.

Given two controllers, it is possible to fully configure one controller, and then enter the SET FAILOVER command, but if the wrong configuration-source is specified, all device configuration information will be lost (overwritten).

Never blindly specify SET FAILOVER. Know where your good configuration information resides before entering the command. (A considerable amount of work and effort could easily be lost by overwriting good information.)
------------------------------------------------------------

------------------------------------------------------------
Note
------------------------------------------------------------
Due to the amount of information that must be passed between the two controllers, the SET FAILOVER command may take up to one minute to complete.
------------------------------------------------------------

4.10.2 Exiting Failover

To take two controllers out of the failover configuration, enter the following command:

CLI> SET NOFAILOVER

This removes the controller from the failover configuration (as well as the other controller, if it is reachable). No device configuration information is lost from either controller.

4.10.3 Failing Over

A failed or unresponsive controller in a dual-redundant configuration is disabled by its companion controller. The functioning controller sends a signal to the other controller to induce failover. The functioning controller assumes control of the storage devices that were on line to the disabled controller. Maintenance can now take place on the failed controller.

Failover should normally complete in 30 seconds or less (15 seconds or less for three-port controllers). If there is no outstanding drive I/O activity at the time of controller failure, failover should require substantially less than 30 seconds. If drive I/O is in progress at the time of failure, the surviving controller must reset any SCSI buses with outstanding I/O. These bus resets can require up to 5 seconds per port to complete.

Whenever you need to revive a controller that was disabled, you must enter the following command from a terminal connected to the functioning controller:

CLI> RESTART OTHER_CONTROLLER

Then, press the reset (//) button to initialize the controller.

You may test failover by removing the program card from one of the controllers.
The other controller will assume service to the dormant controller's devices until you reinsert the program card and reinitialize/restart the controller.

4.10.4 Failover Setup Mismatch

During failover mismatch, one controller will function while the second controller will not recognize any devices. Although it is rare, a failover mismatch may occur during the following scenarios:

· If the controllers initialize at exactly the same time, one controller may be set for failover while the other is not.
· If one controller is running (operating normally) while the second controller is initialized, mismatch may occur. For example, this can happen after one controller was undergoing maintenance.

To correct a failover mismatch, stop all processes on the devices for both controllers. Then enter the following commands to determine which controller has the desired, good configuration information:

CLI> SHOW UNITS
CLI> SHOW STORAGESETS
CLI> SHOW DEVICES

After deciding on one of the two configurations, use the SET FAILOVER command to copy the good information from one controller to the other.

4.11 Moving Devices Between Controllers

The moving of devices from one controller to another is supported under the following conditions:

· For nontransportable devices

  Under normal operation, the controller makes a small portion of a disk inaccessible to the host and uses this area to store metadata. Metadata improves error detection and media defect management. Devices utilizing metadata are called nontransportable. Initializing a device that is set as nontransportable will place/reset metadata on the device.

  When bringing other HS controller [4] (nontransportable) devices to an HS controller subsystem, simply add the device to your configuration using the ADD command. Do not initialize the device or you will reset/destroy forced error information on the device.

  When adding devices, the controller firmware will verify that metadata is present.
  If in doubt, try to add the device so that the controller will check for metadata. If an error stating that there is no metadata occurs, initialize the device before adding it.

  A nontransportable device is interchangeable with an HSC(TM) K.scsi module or another HS controller subsystem. Nontransportable devices are MSCP compliant and support forced error.

  ------------------------------------------------------------
  [4] For purposes of setting transportable/nontransportable devices, the HSC K.scsi controllers may be considered compatible with the HS controllers.

· For transportable devices

  A transportable feature is provided for transfer of devices between non-HS controller systems and HS controller arrays. Transportable devices do not have metadata on them, and initializing a device after setting it as transportable will destroy metadata (if any) on the device.

  Before moving devices from an HS controller subsystem to a non-HS controller system, delete the unit associated with the device and set the device as transportable. Then, initialize the device to remove any metadata.

  When bringing non-HS controller devices to an HS controller subsystem, initialize the device after setting it transportable, then copy the data on the device to another, nontransportable, unit. Then, reinitialize the device after setting it nontransportable (thereby putting metadata on the device). You must initialize these devices because they may contain intact metadata blocks, which can "fool" the controller into attempting to run with the device.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Do not keep any device set as transportable on an HS controller subsystem. Doing so sacrifices forced error support on all units attached to the device. Forced error support is mandatory for HBVS and improves data integrity on the entire array.
------------------------------------------------------------

  A transportable device is interchangeable with any SCSI interface that does not utilize the device metadata (for example, a VAX workstation, an SZ200, or a PC). Transportable devices are not MSCP compliant, do not support forced error, and may not be members of a shadow set. A controller error (see Chapter 5) will occur if the operating system attempts to write forced error information to a transportable device.

------------------------------------------------------------
Note
------------------------------------------------------------
Be careful not to confuse the terms "transportable" and "nontransportable" with the commands TRANSPORTABLE and NOTRANSPORTABLE. See Appendix B for more information on these commands.
------------------------------------------------------------

Transportable/nontransportable device support is summarized in Table 4-2.

Table 4-2 Transportable and Nontransportable Devices
------------------------------------------------------------------------------
Media Format       VAX or AXP Workstation   HSC K.scsi   HSD05   HS Controller
------------------------------------------------------------------------------
Transportable      Yes                      No           Yes     Yes
Nontransportable   No                       Yes          No      Yes
------------------------------------------------------------------------------

5
------------------------------------------------------------
Error Analysis and Fault Isolation

This chapter describes the errors, faults, and significant events that may occur during HS controller initialization and normal operation. A translation of the events, and in most cases how to respond to a specific event, is also given.

The error and event descriptions isolate failures to the field replaceable unit (FRU). However, in most cases additional information for diagnosis beyond the FRU is given. This information will help increase your knowledge of controller functions and assist with your report to depot repair personnel.
------------------------------------------------------------
CAUTION
------------------------------------------------------------
Do not attempt to replace or repair components within FRUs or equipment damage may result. Use the controller fault indications and error logs to isolate FRU-level failures.
------------------------------------------------------------

5.1 Special Considerations

Some or all of the situations presented in the following sections may apply when your controller detects a fault.

5.1.1 Nonredundant Configurations

When a controller (or its cache module, or both) fails in a nonredundant configuration, a short period of system down time is needed to remove the faulty unit and install a replacement. The devices attached to that controller will be off line for the duration of the remove and replace cycle.

5.1.2 Dual-redundant Configurations

When a controller fails in a dual-redundant configuration, fault isolation and corrective actions are similar to a nonredundant configuration. However, failover takes place, so the surviving controller takes over the failed controller's ports and devices.

5.1.3 Cache Module Failures

If a cache module fails, its controller still functions; however, Digital recommends that you replace the cache module as soon as possible. When a cache module fails in a dual-redundant configuration, cache failover occurs so that the companion cache module can take over all caching operations.

5.2 Types of Error Reporting

The controller can notify you of an error through one or more of the following means:

· The OCP
· Device LEDs
· Error messages at a host virtual terminal, or error messages at a maintenance terminal (if attached)
· Host error logs

5.3 Troubleshooting Basics

When an error occurs, use the following steps as top-level guidelines for fault isolation:

1. Make a note of all visual indicators (OCP, device LEDs, or error messages) available to you.
2. Extract and read host error logs (see Section 5.7).
3. Errors can be intermittent; reset the controller to see if the error clears. [1]
4. See if the error indication changes after resetting the controller. If the error remains the same, look up the cause for that error. If the indication changes, look up the cause for the newer error.

See Sections 5.4 through 5.6 for detailed information about errors and repair actions.

5.4 Operator Control Panel

The operator control panel (OCP) includes the following:

· One reset button with an embedded green LED
· One button per SCSI port
· Six amber LEDs [2]

Figure 5-1 shows the OCP from the HSZ40 controller. The buttons and LEDs serve different functions with respect to controlling the SCSI ports and/or reporting fault and normal conditions. Button and LED functions are discussed in the following sections.

------------------------------------------------------------
[1] Record which devices have lit/flashing fault LEDs before resetting as a reset may temporarily clear the LED even though the fault remains.
[2] The HSJ-series has the amber LEDs embedded in the port buttons.

Figure 5-1 HS Controller Operator Control Panel

5.4.1 Normal Operation

The green LED (//) reflects the state of the controller and the host interface. Once controller initialization completes and its firmware is functioning, the green button flashes continuously at 1 Hz. Pressing the green button during this normal operation resets the controller.

Under normal operation, the amber LEDs indicate the state of the respective SCSI-2 buses attached to the controller. When the devices on the buses are functioning correctly, the amber LEDs will not be lit or flashing. Pressing one of the port buttons at this time will light its corresponding amber LED and quiesce its SCSI-2 port. You must quiesce a port to remove or warm swap a device on the SCSI-2 bus for that port.
(Once you replace the device, you can press the button again to turn off the LED and reactivate the port.) See Chapter 7 for a detailed description of removing and replacing devices.

5.4.2 Fault Notification

The OCP LEDs display information when the HS controller encounters a problem with a device configuration, a device, or the controller itself. Should a configuration mismatch or a device fault occur, the amber LED for the affected device's bus will light continuously.

For controller problems, LED codes determined by internal diagnostics and operating firmware will indicate either controller faults or HS operating firmware program card faults. In either case, the single (green) reset (//) LED lights continuously when an error is detected. The remaining (amber) LEDs display the error codes in two different ways:

· The error code will be lit continuously for faults detected by internal diagnostic and initialization routines. See Figure 5-2 to determine what these codes mean.
· The error code will flash at 3 Hz representing faults that occur during normal controller operation. See Figure 5-3 to determine what these codes mean.
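As a reading aid, the LED rules in Sections 5.4.1 and 5.4.2 can be summarized in a short Python sketch. It is illustrative only: the function name and the state labels ("flashing", "solid", "off") are hypothetical, and treating every non-flashing amber pattern under a solid reset LED as a Figure 5-2 code is an assumption.

```python
# Illustrative summary of the OCP LED rules; the labels are hypothetical,
# not values reported by the controller.
def classify_ocp(reset_led, amber_leds):
    """Classify an OCP indication.

    reset_led:  "flashing" (about 1 Hz) or "solid"
    amber_leds: "off", "solid", or "flashing" (about 3 Hz)
    """
    if reset_led == "flashing":
        if amber_leds == "off":
            return "normal operation"
        if amber_leds == "solid":
            # A lit port LED has several possible causes during normal operation.
            return "port quiesced, configuration mismatch, or device fault"
        return "unrecognized indication"
    if reset_led == "solid":
        if amber_leds == "flashing":
            return "fault during normal operation (see Figure 5-3)"
        # Assumption: any non-flashing pattern is read as a solid code.
        return "fault found by diagnostics or initialization (see Figure 5-2)"
    return "unrecognized indication"


if __name__ == "__main__":
    print(classify_ocp("flashing", "off"))
    print(classify_ocp("solid", "flashing"))
```

The sketch separates the two cases that matter for lookup: the state of the reset LED decides whether the amber LEDs describe a port or a controller error code, and the flash rate decides which figure applies.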
Figure 5-2 Solid OCP Codes

(The Reset and LED 1 through 6 on/off patterns shown graphically in the original figure are not reproduced here.)

--------------------------------------------------------------------------------------
Code  Description of Error                        Action
--------------------------------------------------------------------------------------
00    DAEMON hard error.                          Replace controller module.
01    Repeated firmware bugcheck.                 Replace controller module.
02    NVMEM version mismatch.                     Replace program card with later version.
03    NVMEM write error.                          Replace controller module.
04    NVMEM read error.                           Replace controller module.
05    Inconsistent NVMEM structures repaired [1]. RESET (//) the controller.
06    NMI error.                                  Replace controller module.
07    Bugcheck with no restart.                   RESET (//) the controller.
08    NVMEM contents invalid.                     Replace controller module.
------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ 3FNo program card seen2.Replace controller module. ------------------------------------------------------------ ------------------------------------------------------------ off ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ lit continuously ------------------------------------------------------------ DAEMON = Diagnostic and Execution MonitorNVMEM = Nonvolatile MemoryNMI = Nonmaskable Interrupt1 A power failure or controller reset during an NVMEM update causes this error. If the error occurs on one controller in a dual­redundantconfiguration, a configuration mismatch will probably occur upon restart.2 Try the card in another module. If the problem moves with the card, replace the card. If the problem does not move with the card, replace thecontroller module. 
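The rows and notes above can be read as a small lookup plus one decision step. The following is a minimal sketch in Python, not part of the controller firmware or any Digital tool. The names SOLID_OCP_CODES, describe_solid_code, and resolve_missing_program_card are hypothetical, and the table holds only the codes listed in this part of the figure.

```python
# Hypothetical lookup: hex OCP code -> (description, action), copied from
# the rows above. Codes that appear earlier in the figure are not included.
SOLID_OCP_CODES = {
    0x05: ("Inconsistent NVMEM structures repaired", "RESET (//) the controller"),
    0x06: ("NMI error", "Replace controller module"),
    0x07: ("Bugcheck with no restart", "RESET (//) the controller"),
    0x08: ("NVMEM contents invalid", "Replace controller module"),
    0x3F: ("No program card seen", "Replace controller module"),
}


def describe_solid_code(code: int) -> str:
    """Return a one-line summary for a listed code; raises KeyError otherwise."""
    description, action = SOLID_OCP_CODES[code]
    return f"{code:02X}: {description}. {action}."


def resolve_missing_program_card(problem_moves_with_card: bool) -> str:
    """Note 2: try the card in another module, then pick the part to replace."""
    if problem_moves_with_card:
        return "Replace program card"
    return "Replace controller module"


if __name__ == "__main__":
    print(describe_solid_code(0x3F))
    print(resolve_missing_program_card(problem_moves_with_card=True))
```

Keeping note 2 as a separate function mirrors the manual: the table gives the default action for code 3F, and the card swap refines it.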
Figure 5-3 Flashing OCP Codes

Code  Description of Error                                Action
------------------------------------------------------------
01    Program card EDC error.                             Replace program card.
04    Timer zero in the timer chip will run when          Replace controller module.
      disabled.
05    Timer zero in the timer chip decrements             Replace controller module.
      incorrectly.
06    Timer zero in the timer chip did not interrupt      Replace controller module.
      the processor when requested.
07    Timer one in the timer chip decrements              Replace controller module.
      incorrectly.
08    Timer one in the timer chip did not interrupt       Replace controller module.
      the processor when requested.
09    Timer two in the timer chip decrements              Replace controller module.
      incorrectly.
0A    Timer two in the timer chip did not interrupt       Replace controller module.
      the processor when requested.
0B    Memory failure in the I/D cache.                    Replace controller module.
0C    No hit or miss to the I/D cache when expected.      Replace controller module.
0D    One or more bits in the diagnostic registers did    Replace controller module.
      not match the expected reset value.
0E    Memory error in the nonvolatile journal SRAM.       Replace controller module.
0F    Wrong image seen on program card.                   Replace program card.
10    At least one register in the controller DRAB chip   Replace controller module.
      does not read as written.
11    Main memory is fragmented into too many sections    Replace controller module.
      for the number of entries in the good memory list.
12    The controller DRAB chip does not arbitrate         Replace controller module.
      correctly.
13    The controller DRAB chip failed to detect forced    Replace controller module.
      parity or detected parity when not forced.
------------------------------------------------------------
LED legend (Reset, 1 through 6): off; lit continuously; flashing.

I/D = Instruction/Data (cache on the controller module)
DRAB = Dynamic RAM Controller and Arbitration Engine (operates controller shared memory)
ECC = Error Correction Code
EDC = Error Detection Code
SRAM = Static RAM
NXM = Nonexistent Memory
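For the flashing codes 01 through 13 listed so far, the action column reduces to a choice between two replaceable parts. The following is a minimal Python sketch of that reduction, assuming the code is read off the OCP as a two-digit hexadecimal string. The names PROGRAM_CARD_CODES, CONTROLLER_MODULE_CODES, and fru_for_flashing_code are hypothetical; codes that this part of the figure does not list (for example 02 and 03) are rejected, not guessed.

```python
# Hypothetical grouping of the flashing codes listed above by the part
# named in the Action column.
PROGRAM_CARD_CODES = {0x01, 0x0F}
# Codes 04 through 13 (hex) call for the controller module, except 0F.
CONTROLLER_MODULE_CODES = set(range(0x04, 0x14)) - PROGRAM_CARD_CODES


def fru_for_flashing_code(code_text: str) -> str:
    """Map a hex code string such as '0F' to the part the table says to replace."""
    code = int(code_text, 16)
    if code in PROGRAM_CARD_CODES:
        return "program card"
    if code in CONTROLLER_MODULE_CODES:
        return "controller module"
    raise ValueError(f"code {code_text!r} is not listed in this part of the figure")


if __name__ == "__main__":
    for text in ("01", "0B", "0F", "13"):
        print(text, "->", fru_for_flashing_code(text))
```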
Figure 5-3 (Cont.) Flashing OCP Codes

Code  Description of Error                                Action
------------------------------------------------------------
14    The controller DRAB chip failed to verify the EDC   Replace controller module.
      correctly.
15    The controller DRAB chip failed to report forced    Replace controller module.
      failed ECC.
16    The controller DRAB chip failed some operation in   Replace controller module.
      the reporting, validating, and testing of the
      multibit ECC memory error.
17    The controller DRAB chip failed some operation in   Replace controller module.
      the reporting, validating, and testing of the
      multiple single-bit ECC memory error.
18    The controller main memory did not write correctly  Replace controller module.
      in one or more sized memory transfers.
------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ 19The controller did not cause an I­to­N bus timeoutwhen accessing a reset host port chip.Replace controllermodule. 
------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ 1AThe controller DRAB chip did not report an I­to­Nbus timeout when accessing a reset host port chip.Replace controllermodule. 
------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ 1BThe controller DRAB did not interrupt thecontroller processor when expected.Replace controllermodule. 
------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ 1CThe controller DRAB did not report an NXM errorwhen nonexistent memory was accessed.Replace controllermodule. 
------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ 1DThe controller DRAB did not report an addressparity error when one was forced.Replace controllermodule. 
------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ 1EThere was an unexpected nonmaskable interruptfrom the controller DRAB during the DRABmemory test.Replace controllermodule. 
------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ 20The required amount of memory available for thecode image to be loaded from the program card isinsufficient.Replace controllermodule. 
------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ 21The required amount of memory available in thepool area is insufficient for the controller to run.Replace controllermodule. 
------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ 23The required amount of memory available in thebuffer area is insufficient for the controller to run.Replace controllermodule. 
------------------------------------------------------------
Legend (LED states): off, lit continuously, flashing

I/D  = Instruction/Data (cache on the controller module)
DRAB = Dynamic RAM Controller and Arbitration Engine (operates controller shared memory)
SRAM = Static RAM
ECC  = Error Correction Code
EDC  = Error Detection Code
NXM  = Nonexistent Memory
------------------------------------------------------------

Figure 5-3 (Cont.) Flashing OCP Codes
------------------------------------------------------------
Reset 1 2 3 4 5 6  Description of Error  Action
------------------------------------------------------------
24  The code image was not the same as the image on the card after the contents were copied to memory.  Replace controller module.
30  The journal SRAM battery is bad.  Replace controller module.
3A  There was an unexpected interrupt from a read cache or the present and lock bits are not working correctly.  Replace controller module.
3B  There is an interrupt pending to the controller's policy processor when there should be none.  Replace controller module.
3C  There was an unexpected fault during initialization.  Replace controller module.
3D  There was an unexpected maskable interrupt received during initialization.  Replace controller module.
3E  There was an unexpected nonmaskable interrupt received during initialization.  Replace controller module.
3F  An illegal process was activated during initialization.  Replace controller module.
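
For quick reference, the flashing OCP codes in Figure 5-3 can be held in a simple lookup table. The following Python sketch is illustrative only: `OCP_CODES`, `ACTION`, and `describe_ocp_code` are hypothetical names, not part of the controller firmware, and the table holds only a subset of the codes listed in the figure.

```python
# Illustrative lookup of flashing OCP codes (a subset of Figure 5-3).
# All names here are hypothetical; this is not controller firmware.
OCP_CODES = {
    0x15: "The controller DRAB chip failed to report forced failed ECC.",
    0x18: "The controller main memory did not write correctly in one or "
          "more sized memory transfers.",
    0x1C: "The controller DRAB did not report an NXM error when "
          "nonexistent memory was accessed.",
    0x24: "The code image was not the same as the image on the card "
          "after the contents were copied to memory.",
    0x30: "The journal SRAM battery is bad.",
    0x3C: "There was an unexpected fault during initialization.",
    0x3F: "An illegal process was activated during initialization.",
}

# Every code in this part of the figure lists the same action.
ACTION = "Replace controller module."


def describe_ocp_code(code: str) -> str:
    """Return the description and action for a hexadecimal OCP code."""
    value = int(code, 16)  # accepts "3f", "3F", or "0x3f"
    description = OCP_CODES.get(value)
    if description is None:
        return f"{value:02X}: not in this subset; see Figure 5-3."
    return f"{value:02X}: {description} Action: {ACTION}"


if __name__ == "__main__":
    print(describe_ocp_code("30"))
```

A table like this is only a convenience for note taking; the figure itself remains the reference for the LED patterns.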

5.5 Device LEDs

The storage devices (SBBs) and their power supplies have LEDs to indicate power and status. You can use these LEDs in conjunction with the OCP indicators to isolate certain faults, as discussed in the following sections.

5.5.1 Storage SBB Status

Device shelves monitor the status of the storage SBBs.
When a fault occurs, the fault and the SBB device address (SCSI target ID) are reported to the controller for processing. The SBB internal fault/identity bus controls the fault (lower) LED.

As shown in Figure 5-4, each storage SBB has two LED indicators that display the SBB's status. These LEDs have three states: on, off, and flashing.

· The upper LED (green) is the device activity LED and is on or flashing when the SBB is active.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Do not remove a storage SBB when the upper LED is on or flashing. This can cause the loss or corruption of data.
------------------------------------------------------------

· The lower LED (amber) is the storage SBB fault LED and indicates an error condition when it is either on or flashing. When this LED indicates a fault, the amber controller OCP LED for the device's port is lit continuously as well. Record which devices have lit or flashing fault LEDs before resetting the controller, because a reset may temporarily clear the LED even though the fault remains.

Figure 5-4 Storage SBB LEDs
[Callouts: DEVICE ACTIVITY (GREEN), DEVICE FAULT (AMBER). Artwork CXO-3671A-PH]

· When the lower LED is off, either there is an input power problem or the power supply is not functioning.
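
The storage SBB LED rules in Section 5.5.1 can be summarized as a small decision function. This is a minimal Python sketch under stated assumptions: `LedState` and `sbb_status` are hypothetical names introduced for illustration, and the function encodes only what the text states (an upper LED that is on or flashing means the SBB is active and must not be removed; a lower LED that is on or flashing means a fault).

```python
from enum import Enum


class LedState(Enum):
    """The three LED states named in the text."""
    OFF = "off"
    ON = "on"
    FLASHING = "flashing"


def sbb_status(activity_led: LedState, fault_led: LedState) -> dict:
    """Interpret the upper (green) and lower (amber) storage SBB LEDs.

    Hypothetical helper for illustration; not part of any Digital tool.
    """
    lit = (LedState.ON, LedState.FLASHING)
    active = activity_led in lit
    fault = fault_led in lit
    return {
        "active": active,
        "fault": fault,
        # Mirrors the CAUTION: never remove an SBB while it is active.
        "do_not_remove": active,
    }


if __name__ == "__main__":
    print(sbb_status(LedState.FLASHING, LedState.OFF))
```

The result says nothing about whether removal is otherwise safe; it only flags the one condition the CAUTION prohibits.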
Figure 5-5 Power Supply LEDs
(Callouts: SHELF STATUS LED; POWER SUPPLY STATUS LED. CXO-3613B-PH)

Table 5-3 Shelf and Dual Power Supply Status LEDs
------------------------------------------------------------
Status LED       PS1+   PS2++   Indication
------------------------------------------------------------
Shelf (upper)    On     On      Normal status. System is operating
PS (lower)       On     On      normally.

Shelf (upper)    Off    Off     Fault status. There is a shelf fault;
PS (lower)       On     On      there is no power supply fault. Replace
                                blower as described in Chapter 7.

Shelf (upper)    Off    Off     Fault status. PS1 is operational.
PS (lower)       On     Off     Replace PS2 as described in Chapter 7.

Shelf (upper)    Off    Off     Fault status. PS2 is operational.
PS (lower)       Off    On      Replace PS1 as described in Chapter 7.

Shelf (upper)    Off    Off     Fault status. Possible PS1 and PS2
PS (lower)       Off    Off     fault or input power problem.
------------------------------------------------------------
+ Shelf power supply installed in slot 7.
++ Redundant power supply installed in slot 6.
------------------------------------------------------------

5.6 Error Messages

The HS operating firmware is designed to send messages to a virtual terminal and/or maintenance terminal under certain fault conditions. The messages appear on the lines just before the CLI prompt, as shown in the following example:

   SWAP signal cleared - all SWAP interrupts re-enabled
   CLI>

You might not have a remote or maintenance terminal connected to display messages. In this case, the HS operating firmware saves messages for you.
You need only connect a terminal and press the Return key to see the 15 most recently received error messages. Often, messages will continue to appear each time Return is pressed. To clear the terminal of the errors, enter the CLEAR_ERRORS command. (You may want to make a note of the errors before clearing them because they cannot be recalled afterwards.)

------------------------------------------------------------
Note
------------------------------------------------------------
Because the severity of errors varies, the controller may or may not initialize or operate, or both, even though an error message appears. For example, if all of the SCSI ports, or the host port and local terminal port fail diagnostics, the controller will not operate. However, if the cache module fails during normal operation, the controller will continue to operate. You will have to extract the host error log to determine the cause of this error.
------------------------------------------------------------

The following sections list automatic messages you may encounter. The controller sends these messages when the specific fault is detected, regardless of whether or not you are interactively viewing or using the virtual or maintenance terminal. These messages differ in this respect from the ones listed in Appendix B, which appear based on your inputs to the CLI.

Be aware that not all the error messages listed in this section will pertain to your model of controller.
Some messages are specific to the HSJ-, HSD-, or HSZ-series controllers.

5.6.1 Diagnostic Messages

This section contains error messages that may be displayed if a fault occurs during initialization or self-test diagnostics. See Chapter 6 for more information on diagnostics.

Half CACHE FAILED Diagnostics
   Explanation: Up to 50% of the cache memory has failed diagnostic tests.

Whole CACHE FAILED Diagnostics
   Explanation: The cache module has failed diagnostic tests.

SCSI port n FAILED Diagnostics
   Explanation: A SCSI-2 port has failed diagnostics. This message can appear even if you do not have a host connection. The variable n indicates which port failed.

HOST port FAILED Diagnostics
   Explanation: The host port of the controller has failed diagnostics.

CI Path x has FAILED external loop-back Diagnostics
   Explanation: The CI path named by x has failed the loop-back diagnostics. x can be A or B.

Local Terminal Port FAILED Diagnostics
   Explanation: The maintenance (EIA-423) terminal port has failed diagnostics.

5.6.2 NVPM Messages

The messages listed in this section are displayed because of a problem or fault associated with the nonvolatile parameters in memory (NVPM).

------------------------------------------------------------
Note
------------------------------------------------------------
Some NVPM messages will read "NVPM component-name component initialized to default settings." For some of these initialization cases, corrective action may only clear the error message until the next time the controller is reset because the error could be caused by a fault in NVPM itself. If the error persists, replace the controller module.
------------------------------------------------------------

NVPM Revision level updated from n to N.
   Explanation: The format of the NVPM has changed as a result of installing a newer program card (containing updated firmware). However, all subsystem configuration information has been retained.
NVPM Failover Information component initialized to default settings.
   Explanation: The identity of the other controller in a dual-redundant pair has been lost. Enter the SET FAILOVER COPY=OTHER_CONTROLLER command to correct this problem.

NVPM Host Interconnect Parameters component initialized to default settings.
   Explanation: The SCS node name, CI node number, or Path A or Path B enable settings for this controller have been lost. To correct this problem, enter the SHOW THIS_CONTROLLER and SHOW OTHER_CONTROLLER commands to determine the current controller settings. Use the SET THIS_CONTROLLER and SET OTHER_CONTROLLER commands to restore settings.

NVPM Host Protocol Parameters component initialized to default settings.
   Explanation: The tape and disk MSCP allocation class settings for this controller have been lost. To correct this problem, enter the SHOW THIS_CONTROLLER and SHOW OTHER_CONTROLLER commands to determine the current controller settings. Use the SET THIS_CONTROLLER and SET OTHER_CONTROLLER commands to restore settings.

NVPM User Interface Parameters component initialized to default settings.
   Explanation: Terminal setting information has been lost. To correct this problem, enter the SHOW THIS_CONTROLLER and SHOW OTHER_CONTROLLER commands to determine the current terminal settings. Compare the terminal settings with the CONFIGURATION.INFO output information, and use the SET THIS_CONTROLLER and SET OTHER_CONTROLLER commands to restore terminal settings.

The following NVPM Configuration Information component elements were initialized to default settings: [n ....
   Explanation: The settings given by n have been initialized in connection with another NVPM error. To clear this error, perform the following procedure:

   1. Enter the following commands:

      CLI> SHOW DEVICES
      CLI> SHOW UNITS
      CLI> SHOW STORAGESETS

   2. Compare the information displayed with a printout of the CONFIGURATION.INFO file or with a copy of the most current configuration.

   3.
Reconfigure the necessary devices, units, or storage sets. (See the CLI commands described in Appendix B.)

------------------------------------------------------------
CAUTION: Replace the controller immediately if any of the following messages occur. Do not continue to use the controller.
------------------------------------------------------------
NVPM Controller Characteristics component initialized to default settings.
The following NVPM Manufacturing Failure Information component elements were initialized to default settings: [...list of component elements
NVPM Recursive Bugcheck Information component initialized to default settings.
NVPM System Information Page component initialized to default settings.
NVPM Volume Serial Number component initialized to default settings.
All NVPM components initialized to their default settings.
Unknown NVPM Revision Level.
Unknown reformat stage encountered during NVPM Revision Level 1 to 2 reformat.
Controller Characteristics component reformat failed during NVPM Revision Level 1 to 2 reformat.
Host Access Disabled.
------------------------------------------------------------

5.6.3 CLI Automatic Messages

This section lists the automatic messages displayed by the CLI.

Device and/or Storageset names changed to avoid conflicts
   Explanation: Digital adds new CLI keywords at each new HS operating firmware release that can conflict with existing device and/or storage set names. When this happens, HS operating firmware changes your device and/or storage set names and sends this message. The functional operation of your configuration is not changed when this message appears.

Controllers misconfigured. Type SHOW THIS_CONTROLLER
   Explanation: If this message appears, examine the SHOW THIS_CONTROLLER display to determine the source of the misconfiguration.
Taken out of failover due to serial number format error
   Explanation: An invalid serial number format was entered for the second controller of a dual-redundant pair.

Serial number initialized due to format error
   Explanation: An invalid serial number was entered for the second controller of a dual-redundant pair.

Configuration information deleted due to internal inconsistencies
   Explanation: This message is displayed if a test of nonvolatile memory shows corruption. The configuration information for the controller is deleted when this message is displayed.

Restart of the other controller required
   Explanation: When changing some parameters, you must reinitialize the companion controller in a dual-redundant pair to have the parameter take effect.

Restart of this controller required
   Explanation: A changed parameter requires reinitialization of this controller to take effect.

5.6.4 Shelf Messages

This section lists messages displayed by the controller shelf.

Unable to clear SWAP signal on shelf xx - all SWAP interrupts disabled
   Explanation: The subsystem is unable to clear the swap signal for a swapped device, where xx is the shelf number. This could indicate an unsupported SBB or no power to the device shelf.

SWAP signal cleared - all SWAP interrupts re-enabled
   Explanation: This message indicates that the swap signal is now cleared.

Shelf xx has a bad power supply or fan
   Explanation: Troubleshoot the system to isolate and replace the failed component.

Shelf xx fixed
   Explanation: Shelf number xx has been correctly repaired.

5.6.5 Failover Messages

The messages in this section are generated during failover between dual-redundant controllers.

Received LAST GASP message from other controller
   Explanation: One controller in a dual-redundant configuration is attempting an automatic restart after failing or undergoing a bugcheck. See Section 5.7 for more information on this message.
Other controller restarted
   Explanation: The other controller in a dual-redundant pair has successfully restarted after failing or undergoing a bugcheck. See Section 5.7 for more information on this message.

Other controller not responding - RESET signal asserted
   Explanation: One controller in a dual-redundant configuration is locked up, not responding, or the kill line to it is asserted.

SCSI Device and HSxxx controller both configured at SCSI address 6
   Explanation: This message appears when a device is accidentally configured as SCSI ID 6, and two controllers (SCSI IDs 6 and 7) are in a dual-redundant configuration.

Both HSxxx controllers are using SCSI address 6
   Explanation: There is a hardware problem with the BA350-MA shelf. This problem probably involves the shelf backplane.

Both HSxxx controllers are using SCSI address 7
   Explanation: There is a hardware problem with the BA350-MA shelf. This problem probably involves the shelf backplane.

5.6.6 Other CLI Messages

The previous sections detailed automatic messages you may encounter. For a list of other messages you may see during interactive use of the CLI, see Appendix B. Consult your firmware release notes for updates to the list of error messages.

5.7 Host Error Logs

Events related to controller and device operation are recorded in the host error log. If the OCP, device LEDs, or error messages cannot help you determine the cause of a problem, review the host error logs. They provide the greatest level of detail about the controller and connected devices.

5.7.1 Translation Utilities

OpenVMS systems have the Errorlog Report Formatter (ERF) to aid in error log translation. The tool reads the information from the log and provides the operator with more information about what the log means with respect to controller operation and repair. ERF provides bit-to-text translation of the (binary) log, so that the operator can read the information.
The OpenVMS DCL command ANALYZE/ERROR_LOG invokes ERF. For a description of the VMS Analyze Error Log Utility, including more information about this command and its qualifiers, refer to the VMS Error Log Utility Reference Manual, or call Digital Multivendor Services.

DEC OSF/1 AXP systems use the UNIX Errorlog Report Formatter (uerf) to assist in error log translation. This tool also reads information from the log and provides the operator with indications as to what the log means with respect to controller/host operation. Invoke uerf using the uerf -R -o full command.

5.7.2 Host Error Log Translation

The format of transmitted error information varies according to the model of HS controller. Consequently, the description of the error logs, and how to read them, is divided into separate appendices for each model:

· For HSJ-series controllers, see Appendix C.
· For HSD-series controllers, see Appendix D.
· For HSZ-series controllers, see Appendix E.

------------------------------------------------------------
Note
------------------------------------------------------------
Host error log translations are correct as of the date of publication of this manual. However, log information may change with firmware updates. Refer to your StorageWorks Array Controller Operating Firmware Release Notes for error log information updates.
------------------------------------------------------------

6
------------------------------------------------------------
Diagnostics, Exercisers, and Utilities

This chapter discusses the automatic and manual programs available to assist operation and diagnosis of the HS controller subsystem, including the following:

· Initialization and self-test routines
· Disk exerciser (HSJ- and HSD-series controllers)
· Tape exerciser (HSJ- and HSD-series controllers)
· Disk exerciser (HSZ-series controllers)
· VTDPY utility
· CONFIG utility
· HSZUTIL virtual terminal host-resident application

6.1 Initialization

The controller will initialize after any of the following conditions:

· Power is turned on.
· The firmware resets the controller.
· The operator presses the green reset (//) button.
· The host clears the controller.

Whenever the controller initializes, it steps through a three-phase series of tests designed to detect any hardware or firmware faults. The three test areas are as follows:

· Built-in self-test
· Core module integrity self-test
· Module integrity self-test DAEMON

Initialization time will vary depending on your model of controller and what size and type of cache module, if any, you are running. However, initialization will always complete in under 1 minute. Figure 6-1 shows the initialization process.

Figure 6-1 Controller Initialization

6.1.1 Built-In Self-Test

The controller begins initialization by executing its policy processor's internal built-in self-test (BIST). BIST always executes upon initialization, because it is an integral part of the Intel 80960CA chip (i960) microcode. BIST runs entirely from the i960 chip and a small portion of the firmware program card.

Successful completion of BIST means the i960 chip is functioning properly. If BIST fails, the controller will show no activity, and all port indicators on the OCP will be off.
(The green reset LED will be solidly lit.) BIST will fail if an incorrect program card is present.

6.1.2 Core Module Integrity Self-Test

After BIST completes successfully, initialization routines and diagnostics expand to testing of the controller module itself. The tests are part of the program card firmware and are known as core module integrity self-test (core MIST).

Just before beginning core MIST, the controller reads the initial boot record (IBR) to determine the address of hardware setup parameters and process control information. After reading the IBR, the firmware within the program card is initialized to the IBR parameters. Program card firmware then executes core MIST as follows:

1. MIST checks the initial state of the read/write diagnostic register.

2. The test validates program card contents by reading each memory location and computing an error detection code (EDC). The test then compares the computed EDC with a predetermined EDC. The program card contents are valid if both EDCs match.

3. Core MIST then tests and/or checks module hardware attached to the buses:

   · Timer operation
   · DUART operation
   · DRAB/DRAM (shared memory) operation
     - The test writes to and reads all legal addresses. Then, boundaries are checked by attempting to access nonexistent addresses. To pass this test, the first two megabytes of memory must test good. If bad segments are found, the bad segments may divide total memory into no more than 16 good, continuous sections.
     - The test selects a device, then checks whether or not the bus has selected that device.
     - The test verifies that each allowable memory transfer size works, and that illegal transfer sizes do not.
   · Bus parity
   · Registers (The test checks registers for frozen bits.)
   · Journal SRAM (The test writes to and reads all journal SRAM addresses.)
   · I/D cache

4.
After core MIST successfully tests the program card and bus hardware, the initialization routine loads the firmware into the first two megabytes of controller shared memory. The initialization routine then uses the EDC method to compare the memory contents with the program card to make sure of a successful download.

5. The policy processor is initialized to the new parameters (the ones read from the IBR). At this time control of initialization passes to the firmware executive (EXEC). EXEC runs from controller shared memory.

If, at any time, a fault occurs during core MIST, the OCP will display a code. (Refer to Chapter 5.)

6.1.3 Module Integrity Self-Test DAEMON

Once initialization control is passed to EXEC, EXEC calls the diagnostic and execution monitor (DAEMON). DAEMON tests the device port hardware, host port hardware, and cache module.

· To test the device ports, DAEMON checks each NCR 53C710 SCSI processor chip. Initialization continues unless all SCSI device ports fail testing. In other words, it is possible for the controller to run with only one functioning device port.

· DAEMON tests the host port hardware for the particular controller model. For HSJ-series controllers, this test focuses primarily on the YACI chip. For the HSD- and HSZ-series controllers, the NCR 53C720 host processor chip is tested. Initialization continues even if the host port tests fail. However, DAEMON stops initialization if both the DUART test (from core MIST) and the host port tests fail.

· DAEMON tests the cache module as follows:

------------------------------------------------------------
Note
------------------------------------------------------------
The controller still functions if the cache module fails its testing. In this case, the controller will use its on-board shared memory for caching operations.
------------------------------------------------------------

  - DAEMON tests the DRAB (memory controller) on the read cache module.
After DAEMON completes, and functional code takes control of the firmware, the cache manager tests the memory on the cache. At least the first megabyte of the memory must test good, or the cache will be declared bad. If cache is locked by the other controller (dual-redundant configurations), then all cache DAEMON diagnostics are postponed. During functional code, when the cache manager determines that the cache is unlocked, the cache manager will test the DRAB followed by the memory.

  - The tests run by DAEMON and the cache manager are summarized in Table 6-1.

Table 6-1 Cache Module Testing
------------------------------------------------------------
Test     DAEMON                         Cache Manager
------------------------------------------------------------
DRAB     · All memory is initialized.   · No memory is initialized.
         · Full address test.           · Address test on diagnostic
                                          pages only.
Memory   · Never invoked.               · Always invokes all memory
                                          tests.
                                        · Read only, or read/write.
------------------------------------------------------------

After successful test completion, DAEMON releases control. At this time, initialization is finished, and functional controller firmware takes over.

DAEMON handles all interrupts and errors received during cache module testing. (If DAEMON receives any interrupt, it stops initialization. DAEMON displays any errors as a code on the OCP.)

6.1.3.1 Self-Test

Self-test is a special function of DAEMON, where you set DAEMON to run in a continuous loop. Self-test allows you to diagnose intermittent hardware failures because the loop will continue until an error is detected. In addition, self-test checks the controller hardware without affecting devices on any ports.

Digital recommends you run self-test from the maintenance terminal because the host port will disconnect once the controller begins self-test. For self-test to properly execute, you must have a valid configuration and enable the host paths.
To run self-test, enter one of the following commands. (Which command you need depends on your configuration, which controller the terminal is connected to, and which controller you wish to test.)

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Do not use the OVERRIDE_ONLINE qualifier for the SELFTEST command, as customer data may be overwritten.
------------------------------------------------------------

   CLI> SELFTEST THIS_CONTROLLER
   CLI> SELFTEST OTHER_CONTROLLER

See Appendix B for more information on the command and its qualifiers.

When you run self-test, all outstanding I/O operations complete. The controller will also attempt to flush the cache. However, even if self-test fails to flush the cache, the program will continue to execute.

Self-test will halt if it detects a fault. Otherwise, the self-test loop continues until you press the reset (//) button or cycle the controller power off and on, after which the controller reinitializes.

6.2 Disk Inline Exerciser (HSJ- and HSD-Series Controllers)

The disk inline exerciser (DILX) is a diagnostic tool used to exercise the data transfer capabilities of selected disks connected to an HSJ- or HSD-series controller. DILX exercises disks in a way that simulates a high level of user activity. Using DILX, you can read and write to all customer-available data areas. DILX can also be run on CDROMs, but only in read-only mode. Thus, DILX can be used to determine the health of a controller and the disks connected to it and to acquire performance statistics. You can run DILX from a maintenance terminal, virtual terminal, or VCS.

DILX now allows for auto-configuring of drives. This allows for quick configuring and testing of all units at once. Please be aware that customer data will be lost by running this test. Digital recommends only using auto-configure during initial installations.
DILX tests logical units that may consist of storage sets of multiple physical devices. Error reports identify the logical units, not the physical devices. Therefore, if errors occur while running against a unit, reconfigure its storage set as individual devices, and then run DILX again against the individual devices.

There are no limitations on the number of units DILX may test at one time. However, Digital recommends only using DILX when no host activity is present. If you must run DILX during a live host connection, you should limit your testing to no more than half of any controller's units at one time. This conserves controller resources and minimizes performance degradation on the live units you are not testing.

DILX and the tape inline exerciser (TILX) may run concurrently with one initiated from a maintenance terminal and the other from a virtual terminal connection. Digital recommends, however, that the exercisers not be run while normal I/O operations are in progress, as system performance will degrade due to the heavy load the exercisers impose on the controller.

6.2.1 Invoking DILX

------------------------------------------------------------
Note
------------------------------------------------------------
Before running DILX, be sure that all units that you wish to test have been dismounted from the host.
------------------------------------------------------------

The following describes how to invoke DILX from a maintenance terminal at the CLI> prompt or from a VCS, or from a virtual terminal through a DUP connection:

· To invoke DILX from a maintenance terminal, enter the following command at the CLI> prompt:

   CLI> RUN DILX

· To invoke DILX from a maintenance terminal using a VCS, enter the following command at the CLI> prompt:

   CLI> VCS CONNECT node-name

  where node-name is the controller's SCS node name. Consult the VAXcluster Console System User's Guide for complete details on using a VCS.
------------------------------------------------------------
Note
------------------------------------------------------------
The node name must be specified for a VCS.
------------------------------------------------------------

· To invoke DILX from a virtual terminal using a DUP connection, enter the command (for the OpenVMS operating system):

   $ SET HOST/DUP/SERVER=MSCP$DUP/TASK=DILX SCS_nodename

  Specify the controller's SCS node-name to indicate where DILX will execute.

6.2.2 Interrupting DILX Execution

Use the following guidelines to interrupt DILX execution:

------------------------------------------------------------
Note
------------------------------------------------------------
The symbol ``^'' is equivalent to the Ctrl key. You must press and hold the Ctrl key and type the character key given.
------------------------------------------------------------

------------------------------------------------------------
Note
------------------------------------------------------------
Do not use Ctrl/G from a VCS because it will cause VCS to terminate. VCS acts on the sequence and the sequence is never sent to DILX. Use Ctrl/T when invoking DILX from a VCS.
------------------------------------------------------------

· Ctrl/G causes DILX to produce a performance summary. DILX continues normal execution without affecting the runtime parameters.
· Ctrl/C causes DILX to produce a performance summary, stop testing, and ask the ``reuse parameters'' question.
· Ctrl/Y causes DILX to abort. The ``reuse parameters'' question is not asked.
· Ctrl/T causes DILX to produce a performance summary. DILX then continues executing normally without affecting any of the runtime parameters.

6.2.3 DILX Tests

There are two DILX tests, as follows:

· The Basic Function test
· The User-Defined test

6.2.3.1 Basic Function Test--DILX

The Basic Function test for DILX executes in three or four phases.
The four phases are as follows:

· Initial Write Pass--Is the only optional phase and is always executed first (if selected). The initial write pass writes the selected data patterns to the entire specified data space or until the DILX execution time limit has been reached. Once the initial write pass has completed, it is not re-executed no matter how long the DILX execution time is set. The other phases are re-executed on a 10-minute cycle.

· Random I/O--Simulates typical I/O activity with random transfers from one byte to the maximum size I/O possible with the memory constraints DILX runs under. Note that the length of all I/Os is in bytes and is evenly divisible by the sector size (512 bytes). Read, write, access, and erase commands are issued using random logical block numbers (LBNs). In the read/write mode, DILX issues the reads and writes in the ratio specified previously under read/write ratio, and issues access and erase commands in the ratio specified previously under access/erase ratio. When read-only mode is chosen, only read and access commands are issued. If compares are enabled, compares are performed on write and read commands using the data compare modifier and DILX internal checks. The percentage of compares to perform can be specified. This phase is executed 60 percent of the time. It is the first phase executed after the initial write pass has completed. It is re-executed at 10-minute intervals with each cycle lasting approximately 6 minutes. Intervals are broken down into different cycles. The interval is repeated until the user-selected time interval expires.

   <------------------------10 min------------------------------------------>
   <------6 min Random I/O-----><--2 min Data Inten--><--2 min Seek Inten--->

· Data Intensive--Designed to test disk throughput by selecting a starting LBN and repeating transfers to the next sequential LBN that has not been written to by the previous I/O.
The transfer size of each I/O equals the maximum sized I/O that is possible with the memory constraints DILX must run under. This phase continues performing spiraling I/O to sequential tracks. Read and write commands are issued in read/write mode. This phase is executed 20 percent of the time after the initial write pass has completed. This phase always executes after the random I/O phase. It is re-executed at 10-minute intervals with each cycle approximately 2 minutes.

· Seek Intensive--Is designed to stimulate head motion on the selected disk units. Single sector erase and access commands are issued if the test is write enabled. Each I/O uses a different track on each subsequent transfer. The access and erase commands are issued in the ratio that you selected using the access/erase ratio parameter. This phase is executed 20 percent of the time after the initial write pass has completed. This phase always executes after the data intensive I/O phase. It is re-executed at 10-minute intervals with each cycle approximately 2 minutes.

6.2.3.2 User-Defined Test--DILX

------------------------------------------------------------
CAUTION
------------------------------------------------------------
The User-Defined test should be run only by very knowledgeable personnel. Otherwise, customer data can be destroyed.
------------------------------------------------------------

When this test is selected, DILX prompts you for input to define a specific test. In the DILX User-Defined test, a total of 20 or fewer I/O commands can be defined. Once all of the commands are issued, DILX issues the commands again in the same sequence. This is repeated until the selected time limit is reached. As you build the test, DILX collects the following information from you for each command:

· The I/O command name (write, read, access or erase, or quit).
Note that quit is not a command; instead it indicates to DILX that you have finished defining the test.

· The starting Logical Block Number (LBN).

· The size of the I/O in 512 byte blocks.

· The MSCP command modifiers.

6.2.4 DILX Test Definition Questions

The following text is displayed when running DILX. The text includes questions that are listed in the approximate order that they are displayed on your terminal. These questions prompt you to define the runtime parameters for DILX.

------------------------------------------------------------
Note
------------------------------------------------------------
Defaults for each question are given inside [ ]. If you press the Return key as a response to a question, the default is used as the response.
------------------------------------------------------------

After DILX has been started, the following message describing the Auto-Configure option is displayed:

The Auto-Configure option will automatically select, for testing, half or all of the disk units configured. It will perform a very thorough test with *WRITES* enabled. The user will only be able to select the run time and performance summary options and whether to test a half or full configuration. The user will not be able to specify specific units to test. The Auto-Configure option is only recommended for initial installations.

It is the first question asked.

Do you wish to perform an Auto-Configure (y/n) [n] ?

Explanation: Enter ``Y'' if you wish to invoke the Auto-Configure option.
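The default rule described in the Note above (pressing Return selects the value shown inside [ ]) can be sketched in a few lines of Python. This is only an illustration for someone scripting or logging a DILX session: the helper names `resolve_response` and `in_numeric_range` are hypothetical, and the prompt layout `(low:high) [default] ?` is inferred from the questions quoted in this section rather than taken from any DILX specification.

```python
import re

# Assumed prompt layout: optional "(range)" followed by "[default]" and "?".
# This pattern is inferred from the prompts quoted in this section.
PROMPT_RE = re.compile(
    r"(?:\((?P<range>[^()]*)\))?\s*\[(?P<default>[^\]]*)\]\s*\?\s*$"
)


def resolve_response(prompt: str, typed: str) -> str:
    """Return the typed answer, or the bracketed default if only Return was pressed."""
    match = PROMPT_RE.search(prompt)
    default = match.group("default").strip() if match else ""
    answer = typed.strip()
    return answer if answer else default


def in_numeric_range(prompt: str, answer: str) -> bool:
    """Check an answer against a numeric "(low:high)" range when the prompt has one."""
    match = PROMPT_RE.search(prompt)
    if not match or not match.group("range") or ":" not in match.group("range"):
        return True  # no numeric range to enforce, for example "(y/n)"
    low, high = (part.strip() for part in match.group("range").split(":", 1))
    if not (low.isdigit() and high.isdigit()):
        return True  # symbolic bound such as highest_lbn_on_the_disk
    return answer.isdigit() and int(low) <= int(answer) <= int(high)


if __name__ == "__main__":
    question = "Do you wish to perform an Auto-Configure (y/n) [n] ?"
    print(resolve_response(question, ""))   # Return pressed: default is used
    print(resolve_response(question, "y"))  # explicit answer wins
```

The range check deliberately accepts anything when the bounds are not plain numbers, because several prompts in this section use symbolic bounds that only DILX can resolve.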
DILX next displays the following information:

If you want to test a dual redundant subsystem, it is recommended that you pick option 2 on the first controller and then option 2 on the other controller.
Auto-Configure options are:
1. Configure all disk units for testing. This is recommended for a single controller subsystem.
2. Configure half of all disk units for testing. This is recommended for a dual controller subsystem.
3. Exit Auto-Configure and DILX.
Enter Auto-Configure option (1:3) [3] ?

Explanation: This is self-explanatory. After you enter the desired Auto-Configure option, DILX will display the following caution statement:

**CAUTION** All data on the Auto-Configured disks will be destroyed. You *MUST* be sure of yourself.
Are you sure you want to continue (y/n) [n] ?

Explanation: This question is only asked if the Auto-Configure option was selected and if the user selected Auto-Configure option 1 or 2 as described in the last question above.

Use All Defaults and Run in Read Only Mode (y/n)[y]?

Explanation: Enter ``Y'' to use the DILX defaults and run in read-only mode; most of the other DILX questions are then not asked. Enter ``N'' and the defaults are not used. You must then answer each question as it is displayed. The following defaults are assumed for all units selected for testing:

· Execution time limit = 10 minutes.
· Performance summary interval = 10 minutes.
· Displaying hard or soft error Event Information Packets (EIPs) and end messages is disabled.
· The hard error limit = 65535. Testing will stop if the limit is reached.
· A hex dump of the extended error log information is disabled.
· The I/O queue depth = 4. A maximum of 4 I/Os will be outstanding at any time.
· The Selected Test = the Basic Function test.
· Read-only mode.
· All user available LBNs are available for testing.
· Data compares are disabled.

Enter the execution time limit in minutes (1:65535)[10]?

Explanation: Enter the desired time you want DILX to run.
The default run time is 10 minutes.

Enter performance summary interval in minutes (1:65535)[10]?

Explanation: Enter a value to set the interval at which a performance summary is displayed. The default is 10 minutes.

Include performance statistics in performance summary (y/n)[n]?

Explanation: Enter ``Y'' to see a performance summary that includes the total count of read, write, access, and erase I/O requests and the kilobytes transferred for each command. Enter ``N'' and no performance statistics are displayed.

Display hard/soft errors (y/n)[n]?

Explanation: Enter ``Y'' to enable error reporting, including end messages and EIPs. Enter ``N'' to disable error reporting, including end messages and EIPs. The default is disabled error reporting.

Display hex dump of Event Information Packet requester specific Information (y/n)[n]?

Explanation: Enter ``Y'' to enable the hex dump display of the requester specific information contained in the EIP. Enter ``N'' to disable the hex dump.

When the hard error limit is reached, the unit will be dropped from testing.
Enter hard error limit (1:65535) [65535] ?

Explanation: Enter a value to specify the hard error limit for all units under test. If the hard error limit is reached, DILX discontinues testing the unit that reaches the hard error limit. If other units are currently being tested by DILX, testing continues for those units.

When the soft error limit is reached, soft errors will no longer be displayed but testing will continue for the unit.
Enter soft error limit (1:65535) [32] ?

Explanation: Enter a value to specify the soft error limit for all units under test. When the soft error limit is reached, soft errors are no longer displayed, but testing continues for the unit.

Enter IO queue depth (1:12) [4]?
Explanation: Enter the maximum number of outstanding I/Os for each unit selected for testing. The default is 4.

Enter unit number to be tested?

Explanation: Enter the unit number for the unit to be tested.

------------------------------------------------------------
Note
------------------------------------------------------------
When DILX asks for the unit number, it requires the number designator for the disk, where D117 would be specified as unit number 117.
------------------------------------------------------------

Unit x will be write enabled.
Do you still wish to add this unit (y/n) [n]?

Explanation: This is a reminder of the consequences of testing a unit while it is write enabled. This is the last chance to back out of testing the displayed unit. Enter ``Y'' to write enable the unit. Enter ``N'' to back out of testing that unit.

Select another unit (y/n) [n]?

Explanation: Enter ``Y'' to select another unit for testing. Enter ``N'' to begin testing the units already selected. The system will display the following test selections:

***Available tests are:
1. Basic Function
2. User Defined Test
Use the Basic Function 99.9% of the time. The User Defined test is for special problems only.
Enter test number (1:2) [1]?

Explanation: Enter ``1'' for the Basic Function test or ``2'' for the User-Defined test. After selecting a test, the system will then display the following message:

* IMPORTANT * If you answer yes to the next question, user data WILL BE destroyed.
Write enable disk unit (y/n) [n] ?

Explanation: Enter ``Y'' to write enable the unit. Write commands are enabled for the currently selected test. Data within your selected LBN range will be destroyed. Be sure of your actions before answering this question. This question applies to all DILX tests. Enter ``N'' to enable read only mode, where read and access commands are the only commands enabled.

Perform initial write (y/n) [n] ?
Explanation: Enter ``Y'' to write to the entire user-selected LBN range with the user-selected data patterns. Enter ``N'' for no initial write pass. If you respond with ``Y'', the system performs writes starting at the lowest user-selected LBN and issues spiral I/Os with the largest byte count possible. This continues until the specified LBN range has been completely written. Upon completion of the initial write pass, normal functions of the Random I/O phase start. The advantage of selecting the initial write pass is that compare host data commands can then be issued and the data previously written to the media can be verified for accuracy. It also ensures that all LBNs within the selected range are accessed by DILX. The disadvantage of using the initial write pass is that it may take a long time to complete if a large LBN range was specified. You can bypass this by selecting a smaller LBN range, but this creates another disadvantage in that the entire disk space is not tested. The initial write pass only applies to the Basic Function test.

The write percentage will be set automatically.
Enter read percentage for random IO and data intensive phase (0:100) [67] ?

Explanation: This question is displayed if read/write mode is selected. It allows you to select the read/write ratio to use in the Random I/O and Data Intensive phases. The default read/write ratio is similar to the I/O ratio generated by a typical OpenVMS system.

Enter data pattern number 0=all, 19=user_defined, (0:19) [0] ?

Explanation: The DILX data patterns are used in write commands. This question is displayed when writes are enabled for the Basic Function or User-Defined tests. There are 18 unique data patterns to select from. These patterns were carefully selected as worst case or most likely to produce errors for disks connected to the controller. (See Table 6-2 for a list of data patterns.) The default uses all 18 patterns in a random method.
This question also allows you to create a unique data pattern of your own choice.

Enter the 8-digit hexadecimal user defined data pattern [ ] ?

Explanation: This question is only displayed if you choose to use a user-defined data pattern for write commands. The data pattern is represented in a longword and can be specified with eight hexadecimal digits.

Enter start block number (0:highest_lbn_on_the_disk) [0] ?

Explanation: Enter the starting block number of the area on the disk you wish DILX to test. Zero is the default.

Enter end block number (starting_lbn:highest_lbn_on_the_disk) [highest_lbn_on_the_disk] ?

Explanation: Enter the highest block number of the area on the disk you wish DILX to test. The highest block number (of that type of disk) is the default.

Perform data compare (y/n) [n] ?

Explanation: Enter ``Y'' to enable the use of the compare modifier bit with read and write commands. Enter ``N'' and no data compare operations are done. This question only applies to the Basic Function test. If the compare modifier is set on write commands, the data are written to the disk. The data are then read from the disk and compared against the corresponding DILX buffers. On read commands, the data are read from the disk into the DILX buffers, read again, then compared against the corresponding DILX buffers. If a discrepancy is found, an error is reported. If the initial write was chosen for the Basic Function test and you enter ``Y'' to this question, compare host data commands are then enabled and data previously written to the media are verified for accuracy.

Enter compare percentage (1:100) [5] ?

Explanation: This question is displayed only if you choose to perform data compares. This question allows you to change the percentage of read and write commands that will have a data compare operation performed. Enter a value indicating the compare percentage. The default is 5.

The erase percentage will be set automatically.
Enter access percentage for Seek Intensive Phase (0:100) [90] ?

Explanation: This question only applies to the Seek Intensive phase if writes are enabled. It allows you to select the percentage of access and erase commands to be issued. Enter a value indicating the access percentage.

Enter command number x (read, write, access, erase, quit) [ ] ?

Explanation: This question only applies to the User-Defined test. It allows you to define command x as a read, write, access, or erase command. Enter quit to finish defining the test.

Enter starting LBN for this command (0:highest_lbn_on_the_disk) [ ] ?

Explanation: This question only applies to the User-Defined test. It allows you to set the starting LBN for the command currently being defined. Enter the starting LBN for this command.

Enter the IO size in 512 byte blocks for this command (1:size_in_blocks) [ ] ?

Explanation: This question only applies to the User-Defined test. It allows you to set the I/O size in 512-byte blocks for the command currently being defined. Enter values indicating the I/O size for this command.

Enter in HEX, the MSCP Command Modifiers[0] ?

Explanation: This question only applies to the User-Defined test. It allows you to specify the MSCP command modifiers. You must understand the meaning of the MSCP command modifiers before you enter any value other than the default.

Reuse parameters (stop, continue, restart, change_unit) [stop] ?

Explanation: This question is displayed after the DILX execution time limit expires, after the hard error limit is reached for every unit under test, or after you enter Ctrl/C. These options are as follows:

· Stop--DILX terminates normally.

· Continue--DILX resumes execution without resetting the remaining DILX execution time or any performance statistics. If the DILX execution time limit has expired, or all units have reached their hard error limit, DILX terminates.
· Restart--DILX resets all performance statistics and restarts execution so that the test will perform exactly as the one that just completed. However, there is one exception. If the previous test was the Basic Function test with the initial write pass and the initial write pass completed, the initial write pass is not performed when the test is restarted.

· Change_unit--DILX allows you to drop or add units to testing. For each unit dropped, another unit must be added, until all units in the configuration have been tested. The unit chosen will be tested with the same parameters that were used for the unit that was dropped from testing. When you have completed dropping and adding units, all performance statistics are initialized and DILX execution resumes with the same parameters as the last run.

Drop unit #x (y/n) [n] ?

Explanation: This question is displayed if you choose to change a unit as an answer to the reuse parameters (previous) question. Enter the unit number that you wish to drop from testing.

The new unit will be write enabled.
Do you wish to continue (y/n) [n] ?

Explanation: This question is displayed if you choose to change a unit as an answer to the reuse parameters question. It is only asked if the unit being dropped was write enabled. This question gives you the chance to terminate DILX testing if you do not want data destroyed on the new unit. Enter ``N'' to terminate DILX.

6.2.5 DILX Output Messages

The following message is displayed when DILX is started:

Copyright © Digital Equipment Corporation 1993
Disk Inline Exerciser - version 1.4

This message identifies the internal program as DILX and gives the DILX software version number.

Change Unit is not a legal option if Auto-Configure was chosen.

Explanation: This message will be displayed if the user selected the Auto-Configure option and selected the ``change unit'' response to the ``reuse parameters'' question.
You cannot drop a unit and add a unit if all units were selected for testing.

DILX - Normal Termination.

Explanation: This message is displayed when DILX terminates under normal conditions.

Insufficient resources.

Explanation: Following this line is a second line that gives more information about the problem, which could be one of the following messages:

· Unable to allocate memory.
DILX was unable to allocate the memory it needed to perform DILX tests. You should run DILX again but choose a lower queue depth and/or choose fewer units to test.

· Cannot perform tests.
DILX was unable to allocate all of the resources needed to perform DILX tests. You should run DILX again but choose a lower queue depth and/or choose fewer units to test.

· Unable to change operation mode to maintenance.
DILX tried to change the operation mode from normal to maintenance using the SYSAP$CHANGE_STATE( ) routine but was not successful due to insufficient resources. This problem should not occur. If it does occur, submit a CLD (error report), then reset the controller.

Disk unit x does not exist.

Explanation: An attempt was made to allocate a unit for testing that does not exist on the controller.

Unit x successfully allocated for testing.

Explanation: All processes that DILX performs to allocate a unit for testing have been completed. The unit is ready for DILX testing.

Unable to allocate unit.

Explanation: This message should be preceded by a reason why the unit could not be allocated for DILX testing.

DILX detected error, code x.

Explanation: The ``normal'' way DILX recognizes an error on a unit is through the reception of an EIP. This loosely corresponds to an MSCP error log. However, the following are some errors that DILX will detect without the reception of an EIP:

· Illegal Data Pattern Number found in data pattern header. Unit x
This is code 1.
DILX read data from the disk and found that the data were not in a pattern that DILX previously wrote to the disk.

· No write buffers correspond to data pattern Unit x.
This is code 2. DILX read a legal data pattern from the disk at a place where DILX wrote to the disk, but DILX does not have any write buffers that correspond to the data pattern. Thus, the data have been corrupted.

· Read data do not match what DILX thought was written to the media. Unit x.
This is code 3. DILX writes data to the disk and then reads it and compares it against what was written to the disk. This indicates a compare failure. More information is displayed to indicate where in the data buffer the compare failed and what the data were and should have been.

· Compare Host Data should have reported a compare error but did not. Unit x
This is code 4. A compare host data command was issued in a way that DILX expected to receive a compare error but no error was received.

DILX terminated. A termination, a print summary or a reuse parameters request was received but DILX is currently not testing any units.

Explanation: The user entered a Ctrl/Y (termination request), a Ctrl/G (print summary request), or a Ctrl/C (reuse parameters request) before DILX had started to test units. DILX cannot satisfy the latter two requests, so DILX treats all of these requests as a termination request.

DILX will not change the state of a unit if it is not NORMAL.

Explanation: DILX cannot allocate the unit for testing because it is already in Maintenance mode. (Maintenance mode can only be invoked by the firmware. If another DILX session is in use, the unit is considered in Maintenance mode.)

Unit is not available - if you dismount the unit from the host, it may correct this problem.

Explanation: The unit has been placed on line by another user (or host) or the media is not present. The most common reason for this message is that the unit is mounted on the host.
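The compare failure reported as code 3 above states where in the data buffer the compare failed and what the data were and should have been. A minimal Python sketch of that kind of check follows. It is not DILX firmware code; the function name `first_mismatch` and the byte-level comparison are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Hypothetical result type: (offset, expected byte, actual byte).
# A byte is None when one buffer is shorter than the other.
Mismatch = Tuple[int, Optional[int], Optional[int]]


def first_mismatch(expected: bytes, actual: bytes) -> Optional[Mismatch]:
    """Return the first position where the read data differ from the written data.

    Returns None when the two buffers are identical.
    """
    for offset, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return offset, want, got
    if len(expected) != len(actual):
        # One buffer ended early; report the first missing byte.
        offset = min(len(expected), len(actual))
        want = expected[offset] if offset < len(expected) else None
        got = actual[offset] if offset < len(actual) else None
        return offset, want, got
    return None


if __name__ == "__main__":
    written = bytes.fromhex("5555aaaa")
    read_back = bytes.fromhex("5554aaaa")
    print(first_mismatch(written, read_back))
```

Reporting only the first differing offset keeps the sketch short; a real exerciser might also report how many bytes differ in total.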
Soft error reporting disabled. Unit x.

Explanation: This message indicates that the soft error limit has been reached and therefore no more soft errors will be displayed for this unit.

Hard error limit reached, unit x dropped from testing.

Explanation: This message indicates that the hard error limit has been reached and the unit must be dropped from testing.

Soft error reporting disabled for controller errors.

Explanation: This indicates that the soft error limit has been reached for controller errors. Thus, controller soft error reporting is disabled.

Hard error limit reached for controller errors. All units dropped from testing.

Explanation: This message is self-explanatory.

Unit is already allocated for testing.

Explanation: This message is self-explanatory.

No drives selected.

Explanation: DILX parameter collection was exited without choosing any units to test.

Maximum number of units are now configured.

Explanation: This message is self-explanatory. (Testing will start after this message is displayed.)

Unit is write protected.

Explanation: The user wants to test a unit with write commands, erase commands, or both enabled, but the unit is write protected.

The unit status and/or the unit device type has changed unexpectedly. Unit x dropped from testing.

Explanation: The unit status may change if the unit experienced hard errors or if the unit is disconnected. Either way, DILX cannot continue testing the unit.

Last Failure Information follows. This error was NOT produced by running DILX. It represents the reason why the controller crashed on the previous controller run.

Explanation: This message may be displayed while allocating a unit for testing. It does not indicate any reason why the unit is or is not successfully allocated, but rather represents the reason why the controller went down in the previous run. The information that follows this message is the contents of an EIP.
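The hard and soft error limit messages above follow two different rules: reaching the soft error limit only stops soft errors from being displayed, while reaching the hard error limit drops the unit from testing. The Python sketch below models that difference for one unit. The class name `UnitErrorTracker` is hypothetical, the default limits are the ones quoted in this section (65535 and 32), and whether the error that reaches a limit is itself still displayed is an assumption.

```python
from dataclasses import dataclass


@dataclass
class UnitErrorTracker:
    """Illustrative per-unit error bookkeeping; not DILX firmware code."""

    hard_limit: int = 65535  # default hard error limit quoted in this section
    soft_limit: int = 32     # default soft error limit quoted in this section
    hard_errors: int = 0
    soft_errors: int = 0
    dropped: bool = False

    def record_soft(self) -> bool:
        """Count a soft error; return True if it should still be displayed.

        Assumption: errors up to and including the limit are displayed.
        Testing continues either way.
        """
        self.soft_errors += 1
        return self.soft_errors <= self.soft_limit

    def record_hard(self) -> bool:
        """Count a hard error; return True if the unit stays under test."""
        self.hard_errors += 1
        if self.hard_errors >= self.hard_limit:
            self.dropped = True
        return not self.dropped


if __name__ == "__main__":
    unit = UnitErrorTracker(hard_limit=2, soft_limit=1)
    print(unit.record_soft(), unit.record_soft())  # displayed, then suppressed
    print(unit.record_hard(), unit.record_hard())  # kept, then dropped
```

Keeping one tracker per unit mirrors the behavior described earlier, where testing continues on other units after one unit reaches its hard error limit.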
Disk unit numbers on this controller include:

Explanation: After this message is displayed, a list of disk unit numbers on the controller is displayed.

IO to unit x has timed out. DILX aborting.

Explanation: One of the DILX I/Os to this unit did not complete within the command timeout interval and, when examined, was found not to be progressing. This indicates a failing controller.

DILX terminated prematurely by user request.

Explanation: A Ctrl/Y was entered. DILX interprets this as a request to terminate. This message is displayed and DILX terminates.

Unit is owned by another sysap.

Explanation: DILX could not allocate the unit specified because the unit is currently allocated by another system application. Terminate the other system application or reset the controller.

Exclusive access is declared for this unit.

Explanation: The unit could not be allocated for testing because exclusive access has been declared for the unit.

The other controller has exclusive access declared for this unit.

Explanation: This message is self-explanatory.

This unit is marked inoperative.

Explanation: The unit could not be allocated for testing because the controller internal tables have the unit marked as inoperative.

The unit does not have any media present.

Explanation: The unit could not be allocated for testing because no media is present.

The RUNSTOP_SWITCH is set to RUN_DISABLED.

Explanation: The unit could not be allocated for testing because the RUNSTOP_SWITCH is set to RUN_DISABLED. This is enabled and disabled through the Command Line Interpreter (CLI).

Unable to continue, run time expired.

Explanation: A continue response was given to the ``reuse parameters'' question. This is not a valid response if the run time has expired. Reinvoke DILX.
When DILX starts to exercise the disk units, the following message is displayed with the current time of day:

DILX testing started at: xx:xx:xx
Test will run for x minutes
Type ^T(if running DILX through a VCS) or ^G(in all other cases) to get a current performance summary
Type ^C to terminate the DILX test prematurely
Type ^Y to terminate DILX prematurely

6.2.6 DILX End Message Display

To interpret the end message fields correctly, you must contact Digital Multivendor Services. Example 6-1 is an example of a DILX end message display.

Example 6-1 DILX End Message Display

Bad Value Added Completion Status for unit x, End message in hex
Event Code x
Op Code x
Cmd Ref Number x
Byte Count x
Error Byte Count x
Sequence Number x
Flags x

6.2.7 DILX Event Information Packet Displays

A DILX EIP display may or may not include a hex dump of the Requestor Specific Data. This is an option you can select as a DILX parameter. The EIP will be in one of the following formats that corresponds to MSCP error log formats:

· Controller Error
· Memory Error
· Disk Transfer Error
· Bad Block Replacement Attempt Error

Examples 6-2 through 6-5 are examples of each display. Each display includes the optional requestor specific information. In all cases, the Instance code, template type, and all requestor specific information correspond to event (error) log device dependent parameters, while everything else has a one-to-one correspondence to error log fields. See Appendices C and D for a translation of these codes.
Example 6-2 Controller Error

Error Information Packet in hex
Cmd Ref Number x
Unit Number x
Log Sequence x
Format x
Flags x
Event Code x
Controller ID x
Controller SW ver x
Controller HW ver x
Multi Unit Code x
Instance x
Template Type x
Requestor Information Size x
Requestor Specific Data bytes 0 7 xx xx xx xx xx xx xx xx
Requestor Specific Data bytes 8 15 xx xx xx xx xx xx xx xx
:
:
Requestor Specific Data bytes xx xx xx xx xx xx xx xx xx xx

Example 6-3 Memory Error

Error Information Packet in hex
Cmd Ref Number x
Unit Number x
Log Sequence x
Format x
Flags x
Event Code x
Controller ID x
Controller SW ver x
Controller HW ver x
Multi Unit Code x
Memory Address x
Instance x
Template Type x
Requestor Information Size x
Requestor Specific Data bytes 0 7 xx xx xx xx xx xx xx xx
Requestor Specific Data bytes 8 15 xx xx xx xx xx xx xx xx
:
:
Requestor Specific Data bytes xx xx xx xx xx xx xx xx xx xx

Example 6-4 Disk Transfer Error

Error Information Packet in hex
Cmd Ref Number x
Unit Number x
Log Sequence x
Format x
Flags x
Event Code x
Controller ID x
Controller SW ver x
Controller HW ver x
Multi Unit Code x
Unit ID[0] x
Unit ID[1] x
Unit Software Rev x
Unit Hardware Rev x
Recovery Level x
Retry Count x
Serial Number x
Header Code x
Instance x
Template Type x
Requestor Information Size x
Requestor Specific Data bytes 0 7 xx xx xx xx xx xx xx xx
Requestor Specific Data bytes 8 15 xx xx xx xx xx xx xx xx
:
:
Requestor Specific Data bytes xx xx xx xx xx xx xx xx xx xx

Example 6-5 Bad Block Replacement Attempt Error

Error Information Packet in hex
Cmd Ref Number x
Unit Number x
Log Sequence x
Format x
Flags x
Event Code x
Controller ID x
Controller SW ver x
Controller HW ver x
Multi Unit Code x
Unit ID[0] x
Unit ID[1] x
Unit Software Rev x
Unit Hardware Rev x
Replace Flags x
Serial Number x
Bad LBN x
Old RBN x
New RBN x
Cause x
Instance x
Template Type x
Requestor Information Size x
Requestor Specific Data bytes 0 7 xx xx xx xx xx xx xx xx
Requestor Specific Data bytes 8 15 xx xx xx xx xx xx xx xx
:
:
Requestor Specific Data bytes xx xx xx xx xx xx xx xx xx xx

6.2.8 DILX Data Patterns

Table 6-2 defines the data patterns used with the DILX Basic Function or User-Defined tests. There are 18 unique data patterns. These data patterns were selected as worst case, or the ones most likely to produce errors on disks connected to the controller.

Table 6-2 DILX Data Patterns
------------------------------------------------------------
Pattern Number          Pattern in hex
------------------------------------------------------------
1                       0000
2                       8B8B
3                       3333
4                       3091
5, shifting 1s          0001, 0003, 0007, 000F, 001F, 003F, 007F, 00FF,
                        01FF, 03FF, 07FF, 0FFF, 1FFF, 3FFF, 7FFF
6, shifting 0s          FFFE, FFFC, FFFC, FFFC, FFE0, FFE0, FFE0, FFE0,
                        FE00, FC00, F800, F000, F000, C000, 8000, 0000
7, alternating 1s, 0s   0000, 0000, 0000, FFFF, FFFF, FFFF, 0000, 0000,
                        FFFF, FFFF, 0000, FFFF, 0000, FFFF, 0000, FFFF
8                       B6D9
9                       5555, 5555, 5555, AAAA, AAAA, AAAA, 5555, 5555,
                        AAAA, AAAA, 5555, AAAA, 5555, AAAA, 5555, AAAA,
                        5555
10                      DB6C
11                      2D2D, 2D2D, 2D2D, D2D2, D2D2, D2D2, 2D2D, 2D2D,
                        D2D2, D2D2, 2D2D, D2D2, 2D2D, D2D2, 2D2D, D2D2
12                      6DB6
13, ripple 1            0001, 0002, 0004, 0008, 0010, 0020, 0040, 0080,
                        0100, 0200, 0400, 0800, 1000, 2000, 4000, 8000
14, ripple 0            FFFE, FFFD, FFFB, FFF7, FFEF, FFDF, FFBF, FF7F,
                        FEFF, FDFF, FBFF, F7FF, EFFF, BFFF, DFFF, 7FFF
15                      DB6D, B6DB, 6DB6, DB6D, B6DB, 6DB6, DB6D, B6DB,
                        6DB6, DB6D, B6DB, 6DB6, DB6D
16                      3333, 3333, 3333, 1999, 9999, 9999, B6D9, B6D9,
                        B6D9, B6D9, FFFF, FFFF, 0000, 0000, DB6C, DB6C
17                      9999, 1999, 699C, E99C, 9921, 9921, 1921, 699C,
                        699C, 0747, 0747, 0747, 699C, E99C, 9999, 9999
18                      FFFF
Default--Use all of the above patterns in a random method
------------------------------------------------------------
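Two of the sequences in Table 6-2 follow a simple bit rule, and the Python sketch below regenerates them as a way to check a transcription of the table. The function names are hypothetical and the 16-bit word width is inferred from the four-digit hex values in the table. The sketch covers only pattern 13 (ripple 1) and pattern 5 (shifting 1s); it makes no claim about how DILX builds its write buffers.

```python
from typing import List


def ripple_one(width: int = 16) -> List[int]:
    """Pattern 13 (ripple 1): a single 1 bit moving from bit 0 to the top bit."""
    return [1 << bit for bit in range(width)]


def shifting_ones(width: int = 16) -> List[int]:
    """Pattern 5 (shifting 1s): a growing run of 1 bits starting at bit 0.

    Stops at width - 1 ones, matching the 15 words listed in Table 6-2.
    """
    return [(1 << count) - 1 for count in range(1, width)]


def as_hex(words: List[int]) -> str:
    """Format words the way the table prints them: four uppercase hex digits."""
    return ", ".join(f"{word:04X}" for word in words)


if __name__ == "__main__":
    print("13, ripple 1  :", as_hex(ripple_one()))
    print("5, shifting 1s:", as_hex(shifting_ones()))
```

Comparing the printed sequences against the table rows is a quick way to catch a mistyped word when the patterns are copied by hand.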
6.2.9 DILX Examples

This section provides DILX examples using different options.

6.2.9.1 DILX Example--Using All Defaults

In Example 6-6, DILX is run using all defaults. DILX is executed in read-only mode. No data on the units under test are destroyed. The entire user-available LBN range on each disk is accessible for DILX testing. DILX was invoked from a maintenance terminal.

Example 6-6 Using All Defaults--DILX

HSJ> show disk
Name         Type         Port Targ  LUN        Used by
------------------------------------------------------------------------------
DISK100      disk            1    0    0        D10
DISK120      disk            1    2    0        D12
DISK140      disk            1    4    0        D14
DISK210      disk            2    1    0        D21
DISK230      disk            2    3    0        D23
DISK610      disk            6    1    0        D61
DISK630      disk            6    3    0        D63
HSJ> run dilx
Copyright © Digital Equipment Corporation 1993
Disk Inline Exerciser - version 1.4
The Auto-Configure option will automatically select, for testing, half or all of the disk units configured. It will perform a very thorough test with *WRITES* enabled. The user will only be able to select the run time and performance summary options and whether or not to test a half or full configuration. The user will not be able to specify specific units to test. The Auto-Configure option is only recommended for initial installations.
Do you wish to perform an Auto-Configure (y/n) [n] ?n
Use all defaults and run in read only mode (y/n) [y] ?y
Disk unit numbers on this controller include:
10 12 14 21 23 61 63
Enter unit number to be tested ?10
Unit 10 successfully allocated for testing
Select another unit (y/n) [n] ?y
Enter unit number to be tested ?12
Unit 12 successfully allocated for testing
Select another unit (y/n) [n] ?n
DILX testing started at: 13-JAN-1993 04:47:57
Test will run for 10 minutes
Type ^T(if running DILX through VCS) or ^G(in all other cases) to get a current performance summary
Type ^C to terminate the DILX test prematurely
Type ^Y to terminate DILX prematurely
DILX Summary at 13-JAN-1993 04:49:14
Test minutes remaining: 9, expired: 1
Unit 10 Total IO Requests 4530
No errors detected
Unit 12 Total IO Requests 2930
No errors detected
Reuse Parameters (stop, continue, restart, change_unit) [stop] ?
DILX - Normal Termination
HSJ>

6.2.9.2 DILX Example--Using All Functions

In Example 6-7, all functions are chosen for DILX. DILX was invoked from the virtual terminal using the DUP connection from an OpenVMS system. This is an extensive (long) run because the initial write pass was chosen, and because there was enough time for the initial write pass to complete and for normal testing to continue for a reasonable length of time after the initial write pass.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
This test writes to disks. All user data will be destroyed.
------------------------------------------------------------

Example 6-7 All Functions--DILX

$ SHOW CLUSTER/CONTINUOUS
View of Cluster from system ID 9038 node: ENGHRN 7-APR-1993 14:54:01
SYSTEMS              MEMBERS
NODE      SOFTWARE   STATUS
ENGHRN    VMS V5.5   MEMBER
FORCE     HSC V700
WODWND    VMS V5.5   MEMBER
CYMBAL    VMS V5.5   MEMBER
LUTE      VMS V5.5   MEMBER
MASS2     HSJ TM4I
MASS1     HSJ XM4I
(Entered a Ctrl/C here.)
DUP> set host/dup/server=mscp$dup MASS1/task=DILX
%HSCPAD-I-LOCPROGEXE, Local program executing - type ^\ to exit
Copyright © Digital Equipment Corporation 1993
Disk Inline Exerciser - version 1.4
The Auto-Configure option will automatically select, for testing, half or all of the disk units configured. It will perform a very thorough test with *WRITES* enabled. The user will only be able to select the run time and performance summary options and whether or not to test a half or full configuration. The user will not be able to specify specific units to test. The Auto-Configure option is only recommended for initial installations.
Do you wish to perform an Auto-Configure (y/n) [n] ?
Use all defaults and run in read only mode (y/n) [y] ?n
Enter execution time limit in minutes (1:65535) [10] ?45
Enter performance summary interval in minutes (1:65535) [10] ?45
Include performance statistics in performance summary (y/n) [n] ?y
Display hard/soft errors (y/n) [n] ?y
Display hex dump of Error Information Packet Requester Specific information (y/n) [n] ?y
When the hard error limit is reached, the unit will be dropped from testing.
Enter hard error limit (1:65535) [65535] ?
When the soft error limit is reached, soft errors will no longer be displayed but testing will continue for the unit.
Enter soft error limit (1:65535) [32] ?
Enter IO queue depth (1:20) [4] ?10
*** Available tests are:
1. Basic Function
2. User Defined
Use the Basic Function test 99.9% of the time.
The User Defined test is for special problems only. Enter test number (1:2) [1] ?1 **CAUTION** If you answer yes to the next question, user data WILL BE destroyed. Write enable disk unit(s) to be tested (y/n) [n] ?y The write percentage will be set automatically. Enter read percentage for Random IO and Data Intensive phase (0:100) [67] ? Enter data pattern number 0=ALL, 19=USER_DEFINED, (0:19) [0] ? Perform initial write (y/n) [n] ?y The erase percentage will be set automatically. Enter access percentage for Seek Intensive phase (0:100) [90] ? Perform data compare (y/n) [n] ?y Enter compare percentage (1:100) [5] ? Disk unit numbers on this controller include: 10 12 14 21 23 61 63 Enter unit number to be tested ?10 Unit 10 will be write enabled. Do you still wish to add this unit (y/n) [n] ?y Enter start block number (0:1664214) [0] ? Enter end block number (0:1664214) [1664214] ? Unit 10 successfully allocated for testing Select another unit (y/n) [n] ?y Enter unit number to be tested ?12 Unit 12 will be write enabled. Do you still wish to add this unit (y/n) [n] ?y Enter start block number (0:832316) [0] ? Enter end block number (0:832316) [832316] ? Unit 12 successfully allocated for testing Select another unit (y/n) [n] ?n DILX testing started at: 13-JAN-1993 04:52:26 Test will run for 45 minutes Type ^T(if running DILX through VCS) or ^G(in all other cases) to get a current performance summary Type ^C to terminate the DILX test prematurely Type ^Y to terminate DILX prematurely (continued on next page) 6-24 Diagnostics, Exercisers, and Utilities Example 6-7 (Cont.) 
All Functions--DILX DILX Summary at 13-JAN-1993 04:56:20 Test minutes remaining: 42, expired: 3 Unit 10 Total IO Requests 40794 Read Count 0 Write Count 40793 Access Count 0 Erase Count 0 KB xfer Read 0 Write 326344 Total 326344 No errors detected Unit 12 Total IO Requests 13282 Read Count 0 Write Count 13281 Access Count 0 Erase Count 0 KB xfer Read 0 Write 106248 Total 106248 No errors detected Reuse Parameters (stop, continue, restart, change_unit) [stop] ? DILX - Normal Termination HSJ> 6.2.9.3 DILX Examples--Auto-Configure with All Units In Example 6-8, DILX is run using the Auto-Configure option with the all units option. Example 6-8 Auto-Configuration with All Units HSJ> run dilx Copyright © Digital Equipment Corporation 1993 Disk Inline Exerciser - version 1.4 The Auto-Configure option will automatically select, for testing, half or all of the disk units configured. It will perform a very thorough test with *WRITES* enabled. The user will only be able to select the run time and performance summary options and whether or not to test a half or full configuration. The user will not be able to specify specific units to test. The Auto-Configure option is only recommended for initial installations. Do you wish to perform an Auto-Configure (y/n) [n] ?y If you want to test a dual redundant subsystem, it is recommended that you pick option 2 on the first controller and then option 2 on the other controller. Auto-Configure options are: 1. Configure all disk units for testing. This is recommended for a single controller subsystem. 2. Configure half of all disk units for testing, this is recommended for a dual controller subsystem. 3. Exit Auto-Configure and DILX. Enter Auto-Configure option (1:3) [3] ?1 **** C a u t i o n **** All data on the Auto-Configured disks will be destroyed. You *MUST* be sure of yourself. (continued on next page) Diagnostics, Exercisers, and Utilities 6-25 Example 6-8 (Cont.) 
Auto-Configuration with All Units Are you sure you want to continue (y/n) [n] ?y Enter execution time limit in minutes (1:65535) [60] ? Enter performance summary interval in minutes (1:65535) [60] ? Unit 10 successfully allocated for testing Unit 12 successfully allocated for testing Unit 14 successfully allocated for testing Unit 21 successfully allocated for testing Unit 23 successfully allocated for testing Unit 61 successfully allocated for testing Unit 63 successfully allocated for testing DILX testing started at: 13-JAN-1993 04:42:39 Test will run for 60 minutes Type ^T(if running DILX through VCS) or ^G(in all other cases) to get a current performance summary Type ^C to terminate the DILX test prematurely Type ^Y to terminate DILX prematurely DILX Summary at 13-JAN-1993 04:44:11 Test minutes remaining: 59, expired: 1 Unit 10 Total IO Requests 9595 No errors detected Unit 12 Total IO Requests 5228 No errors detected Unit 14 Total IO Requests 10098 No errors detected Unit 21 Total IO Requests 9731 No errors detected Unit 23 Total IO Requests 5230 No errors detected Unit 61 Total IO Requests 11283 No errors detected Unit 63 Total IO Requests 5232 No errors detected Reuse Parameters (stop, continue, restart, change_unit) [stop] ? DILX - Normal Termination HSJ> In Example 6-9, DILX is run using the Auto-Configure option with the half of all units option. Example 6-9 Auto-Configuration with Half of All Units HSJ> run dilx Copyright © Digital Equipment Corporation 1993 Disk Inline Exerciser - version 1.4 The Auto-Configure option will automatically select, for testing, half or all of the disk units configured. It will perform a very thorough test with *WRITES* enabled. The user will only be able to select the run time and performance summary options and whether or not to test a half or full configuration. The user will not be able to specify specific units to test. The Auto-Configure option is only recommended for initial installations. 
Do you wish to perform an Auto-Configure (y/n) [n] ?y (continued on next page) 6-26 Diagnostics, Exercisers, and Utilities Example 6-9 (Cont.) Auto-Configuration with Half of All Units If you want to test a dual redundant subsystem, it is recommended that you pick option 2 on the first controller and then option 2 on the other controller. Auto-Configure options are: 1. Configure all disk units for testing. This is recommended for a single controller subsystem. 2. Configure half of all disk units for testing, this is recommended for a dual controller subsystem. 3. Exit Auto-Configure and DILX. Enter Auto-Configure option (1:3) [3] ?2 **** C a u t i o n **** All data on the Auto-Configured disks will be destroyed. You *MUST* be sure of yourself. Are you sure you want to continue (y/n) [n] ?y Enter execution time limit in minutes (1:65535) [60] ? Enter performance summary interval in minutes (1:65535) [60] ? Unit 12 successfully allocated for testing Unit 21 successfully allocated for testing Unit 61 successfully allocated for testing DILX testing started at: 13-JAN-1993 04:39:20 Test will run for 60 minutes Type ^T(if running DILX through VCS) or ^G(in all other cases) to get a current performance summary Type ^C to terminate the DILX test prematurely Type ^Y to terminate DILX prematurely DILX Summary at 13-JAN-1993 04:41:39 Test minutes remaining: 58, expired: 2 Unit 12 Total IO Requests 8047 No errors detected Unit 21 Total IO Requests 15239 No errors detected Unit 61 Total IO Requests 19270 No errors detected Reuse Parameters (stop, continue, restart, change_unit) [stop] ? 
DILX - Normal Termination
HSJ>

6.2.10 Interpreting the DILX Performance Summaries

A DILX performance display is produced under the following conditions:

· When a specified performance summary interval elapses
· When DILX terminates for any condition except an abort
· When Ctrl/G is entered (or Ctrl/T when running from a VCS)

The format of the performance display depends on whether performance statistics were requested in the user-specified parameters and on whether errors were detected.

The following is an example of a DILX performance display where performance statistics were not selected and where no errors were detected:

    DILX Summary at 18-JUN-1993 06:18:41
    Test minutes remaining: 0, expired: 6
    Unit 1 Total IO Requests 482
    No errors detected
    Unit 2 Total IO Requests 490
    No errors detected

The following is an example of a DILX performance display where performance statistics were selected and no errors were detected:

    DILX Summary at 18-JUN-1993 06:18:41
    Test minutes remaining: 0, expired: 6
    Unit 1 Total IO Requests 482
    Read Count 292 Write Count 168 Access Count 21 Erase Count 0
    KB xfer Read 7223 Write 4981 Total 12204
    No errors detected

The following is an example of a DILX performance display where performance statistics were not selected and where errors were detected on a unit under test:

    DILX Summary at 18-JUN-1993 06:18:41
    Test minutes remaining: 0, expired: 6
    (1) Unit 10 Total IO Requests 153259
        No errors detected
    (2) Unit 40 Total IO Requests 2161368
        Err in Hex: IC:031A4002 PTL:04/00/00 Key:04 ASC/Q:B0/00 HC:0 SC:1
        Total Errs Hard Cnt 0 Soft Cnt 1
    (3) Unit 55 Total IO Requests 2017193
        Err in Hex: IC:03094002 PTL:05/05/00 Key:01 ASC/Q:18/89 HC:0 SC:1
        Err in Hex: IC:03094002 PTL:05/05/00 Key:01 ASC/Q:18/86 HC:0 SC:1
    (4) Total Errs Hard Cnt 0 Soft Cnt 2

where:

(1) Represents the unit number and the total I/O requests to this unit.

(2) Represents the unit number and total I/O requests to this unit, together with the following items associated with this error and the total number of hard and soft errors for this unit. All values for these codes are described in Appendices C and D.

· The HSJ-/HSD-series Instance code (in hex)
· The Port Target LUN (PTL)
· The SCSI Sense Key
· The SCSI ASC and ASCQ (ASC/Q) codes
· The total hard and soft count for this error

(3) Represents information about the first two unique errors, together with the following items associated with each error and the total number of hard and soft errors for this unit. All values for these codes are described in Appendices C and D.

· The HSJ-/HSD-series Instance code (in hex)
· The Port Target LUN (PTL)
· The SCSI Sense Key
· The SCSI ASC and ASCQ (ASC/Q) codes
· The total hard and soft count for this error

A line of this format may be displayed up to three times in a performance summary: one line for each unique error reported to DILX for this unit, up to three errors.

(4) Represents the total hard and soft errors experienced for this unit.

The following is an example of a DILX performance display where performance statistics were not selected and where a controller error was detected:

    DILX Summary at 18-JUN-1993 06:18:41
    Test minutes remaining: 0, expired: 6
    Cnt err in HEX IC:07080064 Key:06 ASC/Q:A0/05 HC:1 SC:0
    Total Cntrl Errs Hard Cnt 1 Soft Cnt 0
    Unit 1 Total IO Requests 482
    No errors detected
    Unit 2 Total IO Requests 490
    No errors detected

For the previous examples, the following definitions apply. These codes are translated in Appendices C and D.

· IC--The HSJ-/HSD-series Instance code
· Key--The SCSI Sense Key
· ASC/Q--The SCSI ASC and ASCQ code associated with this error
· HC--The hard count of this error
· SC--The soft count of this error
· PTL--The location of the unit (Port Target LUN)

The performance displays contain error information for up to three unique errors. Hard errors always have precedence over soft errors.
A soft error represented in one display may be replaced with information on a hard error in subsequent performance displays.

6.2.11 DILX Abort Codes

Table 6-3 lists the DILX abort codes and definitions.

Table 6-3 DILX Abort Codes and Definitions
------------------------------------------------------------
Value  Definition
------------------------------------------------------------
1      An IO has timed out.
2      dcb_p->htb_used_count reflects an available HTB to test IOs but none could be found.
3      FAO returned either FAO_BAD_FORMAT or FAO_OVERFLOW.
4      TS$SEND_TERMINAL_DATA returned either ABORTED or INVALID_BYTE_COUNT.
5      TS$READ_TERMINAL_DATA returned either ABORTED or INVALID_BYTE_COUNT.
6      A timer is in an unexpected expired state that prevents it from being started.
7      The semaphore was set after a oneshot IO was issued but nothing was found in the received HTB queue.
8      A termination, a print summary, or a reuse parameters request was received when DILX was not testing any units.
9      User requested an abort via ^Y.
------------------------------------------------------------

6.2.12 DILX Error Codes

Table 6-4 lists the DILX error codes and definitions for DILX-detected errors.

Table 6-4 DILX Error Codes and Definitions
------------------------------------------------------------
Value  Definition
------------------------------------------------------------
1      Illegal Data Pattern Number found in data pattern header.
2      No write buffers correspond to data pattern.
3      Read data does not match write buffer.
4      Compare Host Data should have reported a compare error but did not.
------------------------------------------------------------

6.3 Tape Inline Exerciser (HSJ- and HSD-Series Controllers)

TILX is a diagnostic tool used to exercise the data transfer capabilities of selected tape drives connected to an HSJ- or HSD-series controller. TILX exercises tape drives in a way that simulates a high level of user activity.
Thus, TILX can be used to determine the health of the controller and the tape drives connected to it.

You can run TILX from a maintenance terminal or from a virtual terminal. DILX and TILX may run concurrently, with one initiated from a maintenance terminal and the other from a virtual terminal connection. Digital recommends, however, that the exercisers not be run while normal I/O operations are in progress, as system performance will degrade due to the heavy load the exercisers impose on the controller.

6.3.1 Invoking TILX

------------------------------------------------------------
Note
------------------------------------------------------------
Before running TILX, be sure that all units you wish to test have been dismounted from the host.
------------------------------------------------------------

The following describes how to invoke TILX from a maintenance terminal at the CLI> prompt or a VCS, or from a virtual terminal through the DUP connection.

· To invoke TILX from a maintenance terminal, enter the following command at the CLI> prompt:

    CLI> RUN TILX

· To invoke TILX from a maintenance terminal using a VCS, enter the following command at the CLI> prompt:

    CLI> VCS CONNECT node name

  where node name is the controller's SCS node name. Consult the VAXcluster Console System User's Guide for complete details on using a VCS.

------------------------------------------------------------
Note
------------------------------------------------------------
The node name must be specified for a VCS.
------------------------------------------------------------

· To invoke TILX from a virtual terminal, enter the following command (for OpenVMS software):

    $ SET HOST/DUP/SERVER=MSCP$DUP/TASK=TILX SCS_nodename

  where SCS_nodename indicates where TILX will execute.
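
As a rough illustration of the invocation methods above, the following Python sketch assembles the command text for the maintenance-terminal and virtual-terminal cases. The function name tilx_invoke_command and its method labels are hypothetical, chosen only for this example; the command strings are the ones given in this section. The VCS case is left out because it depends on an interactive VCS session.

```python
def tilx_invoke_command(method, scs_nodename=None):
    """Return the command text that starts TILX for the given access method.

    The method labels are hypothetical and exist only in this sketch:
      "maintenance" - typed at the controller's CLI> prompt
      "virtual"     - typed at the OpenVMS $ prompt (DUP connection)
    """
    if method == "maintenance":
        return "RUN TILX"
    if method == "virtual":
        if not scs_nodename:
            raise ValueError("the virtual terminal method needs the SCS node name")
        return "SET HOST/DUP/SERVER=MSCP$DUP/TASK=TILX " + scs_nodename
    raise ValueError("unknown method: " + repr(method))


# MASS1 is a node name taken from the SHOW CLUSTER display in Example 6-7.
print(tilx_invoke_command("virtual", "MASS1"))
```

The prompt characters (CLI> and $) are deliberately not part of the returned text, since they are displayed by the controller or host rather than typed.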
6.3.2 Interrupting TILX Execution

Use the following guidelines to interrupt TILX execution:

------------------------------------------------------------
Note
------------------------------------------------------------
The symbol "^" is equivalent to the Ctrl key. You must press and hold the Ctrl key and type the character key given.
------------------------------------------------------------

------------------------------------------------------------
Note
------------------------------------------------------------
Do not use Ctrl/G from a VCS because it will cause VCS to terminate. VCS acts on the sequence and the sequence is never sent to TILX. Use Ctrl/T when invoking TILX from a VCS.
------------------------------------------------------------

· Ctrl/G causes TILX to produce a performance summary. TILX continues normal execution without affecting the runtime parameters.
· Ctrl/C causes TILX to produce a performance summary, stop testing, and ask the "reuse parameters" question.
· Ctrl/Y causes TILX to terminate. The "reuse parameters" question is not asked.
· Ctrl/T causes TILX to produce a performance summary. TILX then continues executing normally without affecting any of the runtime parameters.

6.3.3 TILX Tests

There are three TILX tests, as follows:

· The Basic Function test
· The User-Defined test
· The Read Only test

6.3.3.1 Basic Function Test--TILX

The Basic Function test executes a write pass followed by a read pass. The write pass executes in two phases, as follows:

· Data Intensive--The first one-third of the records are written in this phase. All records written to the tape have a byte count of 16 kilobytes. With this high byte count and the default queue depth, this phase should test the streaming capability (if supported) of the tape unit.
· Random--This test is performed for the remaining two-thirds of the selected record count. It consists of writes with random byte counts.
Intermixed is the sequence write, reposition back one record, read. This sequence is performed three times in a row. Tape mark writing is also intermixed in the test.

The write pass is complete when the selected record count is reached, or if the end of tape (EOT) is reached. The tape is rewound and the read pass is started. The read pass consists of the following three phases:

· Data Intensive--Consists of reads of fixed record sizes with a byte count equal to the expected tape record byte count. When tape marks are encountered, forward position commands are issued.
· Random--Begins at the point where random sized records were written to the tape. Most reads are issued with a byte count equal to the expected tape record byte count. Occasionally, reads will be intermixed with a byte count less than or greater than the expected tape record byte count. When tape marks are encountered, forward position commands are issued.
· Position Intensive--Begins halfway down from the start of the area where random sized records are located. In the Position Intensive phase, reads and position commands are intermixed so that the test gradually proceeds toward the EOT. When tape marks are encountered, forward position commands are issued.

In all phases, if the EOT is detected, the tape is rewound to the beginning of tape (BOT), and the write pass is again entered.

6.3.3.2 User-Defined Test--TILX

------------------------------------------------------------
CAUTION
------------------------------------------------------------
The User-Defined test should be run only by very knowledgeable personnel. Otherwise, customer data can be destroyed.
------------------------------------------------------------

When the TILX User-Defined test is selected, TILX prompts you for input to allow a specific test to be defined. In a User-Defined test, up to 20 I/O commands can be defined.
Once all of the commands are issued, TILX issues the commands again in the same sequence. This is repeated until the selected time limit is reached. As you build the test, TILX collects the following information for each command: · The I/O command operation (write, read, reposition record, reposition file, write tape mark, rewind, quit. Note that quit is not a command; instead it indicates to TILX that you have finished defining the test). · The number of times to repeat the command. (Applies only to write, read, and write tape mark.) · The number of records or file marks to reposition. · The data pattern to use. · The direction of reposition operation (toward EOT or BOT). · The size of the I/O in bytes. · The TMSCP command modifiers. 6.3.3.3 Read Only Test--TILX The Read Only test should only be used to verify that a tape is readable. The Read Only test reads records until the EOT or the selected record count is reached. At that point, the tape is rewound and another read pass proceeds. Tape marks are ignored. This test will most likely issue reads with incorrect record sizes. If there are record size mismatches, they will be ignored. All other errors will be recorded. 6.3.4 TILX Test Definition Questions The following section lists the questions that TILX asks to collect the parameters needed to perform a TILX test. Each of the following sections discusses specific TILX questions. The test questions are listed in the approximate order that they are displayed on your terminal. These questions prompt you to define the runtime parameters for TILX. ------------------------------------------------------------ Note ------------------------------------------------------------ Defaults for each question are given inside [ ]. 
If you press the Return key as a response to a question, the default is used as the response.
------------------------------------------------------------

Use all defaults (y/n) [y] ?

Explanation: Enter "Y" to use the defaults for TILX; most of the other TILX questions are then not asked. Enter "N" and the defaults are not used. You must then answer each question as it is displayed. The following defaults are assumed for all units selected for testing:

· Execution time limit = 10 minutes.
· Performance summary interval = 10 minutes.
· Displaying performance statistics is disabled.

------------------------------------------------------------
Note
------------------------------------------------------------
This does not include total I/O requests.
------------------------------------------------------------

· Displaying hard/soft EIPs and end messages is disabled.
· Hard error limit = 65535. Testing will stop if the limit is reached.
· Hex dump of extended error log information is disabled.
· I/O queue depth = 4. A maximum of 4 I/Os will be outstanding at one time.
· Selected test = Basic Function test.
· The record count = 4096.
· All data patterns are used.
· Data compares are disabled.

Enter execution time limit in minutes (1:65535) [10] ?

Explanation: Enter the desired time you want TILX to run. The default run time is 10 minutes.

Enter performance summary interval in minutes (1:65535) [10] ?

Explanation: Enter a value to set the interval at which a performance summary is displayed. The default is 10 minutes.

Include performance statistics in performance summary (y/n) [n] ?
Explanation: Enter "Y" to see a performance summary that includes performance statistics: the total count of read and write I/O requests and the kilobytes transferred for each command. Enter "N" and no performance statistics are displayed.

Display hard/soft errors (y/n) [n] ?

Explanation: Enter "Y" to enable error reporting, including end messages and EIPs. Enter "N" to disable error reporting, including end messages and EIPs. The default is disabled error reporting.

Display hex dump of Event Information Packet Requester Specific information(y/n) [n] ?

Explanation: Enter "Y" to enable the hex dump display of the requester specific information contained in the EIP. Enter "N" to disable the hex dump.

When the hard error limit is reached, the unit will be dropped from testing.
Enter hard error limit (1:65535) [65535] ?

Explanation: Enter a value to specify the hard error limit for all units under test. If the hard error limit is reached, TILX discontinues testing the unit that reached it. If other units are currently being tested by TILX, testing continues for those units.

When the soft error limit is reached, soft errors will no longer be displayed but testing will continue for the unit.
Enter soft error limit (1:65535) [32] ?

Explanation: Enter a value to specify the soft error limit for all units under test. If the soft error limit is reached for a unit under test, soft error reporting is disabled for that unit only. However, testing continues for that unit.

Enter IO queue depth (1:20) [4] ?

Explanation: Enter the maximum number of outstanding I/Os for each unit selected for testing. The default is 4.

Enter unit number to be tested ?

Explanation: Enter the unit number for the (tape drive) unit to be tested.
------------------------------------------------------------
Note
------------------------------------------------------------
When TILX asks for the unit number, it requires the actual number of the tape, where T177 would be specified as unit number 177.
------------------------------------------------------------

Is a tape loaded and ready, answer Yes when ready ?

Explanation: This question is self-explanatory.

Select another unit (y/n) [n] ?

Explanation: Enter "Y" to select another unit to test. Enter "N" to begin testing the units selected. The system will display the following test selections:

    *** Available tests are:
    1. Basic Function
    2. User Defined Test
    3. Read Only
    Use the Basic Function test 99.9% of the time. The User-Defined test is for special problems only.

Enter test number (1:3) [1] ?

Explanation: This question allows you to pick which TILX test you want to run on all selected units. The following questions define the TILX tests.

Enter data pattern number 0=all, 19=user_defined, (0:19) [0] ?

Explanation: The TILX data patterns are used in write commands. This question is displayed for the Basic Function and User-Defined tests. There are 18 unique data patterns from which to select. These patterns were carefully selected as worst case or most likely to produce errors for tapes connected to the controller. (See Table 6-5 for a list of the data patterns.) The default uses all 18 patterns in a random method. This question also allows you to create a unique data pattern of your choice.

Enter record count (1:4294967295) [4096] ?

Explanation: Enter the number of records to write to the tape.

------------------------------------------------------------
Note
------------------------------------------------------------
The record count does not include tape marks that are intermixed with the records written to the tape in the Basic Function test.
------------------------------------------------------------

Enter the 8-digit hexadecimal user defined data pattern [ ] ?

Explanation: This question is only displayed if you choose to use a User-Defined data pattern for write commands. The data pattern is represented in a longword and can be specified with eight hexadecimal digits.

Perform data compare (y/n) [n] ?

Explanation: Enter "Y" to enable the compare modifier bit with the read and write commands. This question only applies to the Basic Function test. If the compare modifier is set on write commands, the data are written to the tape. The data are then read from the tape and compared against the corresponding TILX buffers. On read commands, the data are read from the tape into the TILX buffers, read again, and then compared against the corresponding TILX buffers. If a discrepancy is found, an error is reported. Enter "N" and the compare modifier bit is disabled. The default is to have the bit disabled.

Enter compare percentage (1:100) [2] ?

Explanation: This question is displayed only if you choose to perform data compares. It allows you to enter the percentage of read and write commands that will have a data compare operation performed.

Enter command number x (red, wrt, rew, wtm, rpr, rpf, quit) [ ] ?

Explanation: This question only applies to the User-Defined test. It allows you to define command x as a read, write, rewind, write tape mark, reposition records, or reposition file marks. Enter quit to finish defining the test.

Reposition towards EOT (y=EOT/n=BOT) [y] ?

Explanation: If you specify the reposition records or reposition file marks command in the User-Defined test, this question is displayed. Enter the direction of the reposition operation you want, either towards the end of tape (EOT) or towards the beginning of tape (BOT).

Enter number of records to reposition (1:255) [1] ?

Explanation: If you specify the reposition records command in the User-Defined test, this question is displayed.
The question is self-explanatory.

Enter number of tape marks to reposition (1:255) [1] ?

Explanation: If you specify the reposition file marks command in the User-Defined test, this question is displayed. The question is self-explanatory.

Enter IO size in bytes (1,65535) [ ] ?

Explanation: This question is only asked in the User-Defined test for read or write commands. The question is self-explanatory.

Enter in HEX, the TMSCP Command Modifiers [0] ?

Explanation: This question only applies to the User-Defined test. It allows you to specify the TMSCP command modifiers. You must understand the meaning of the TMSCP command modifiers before entering any value other than the default. Contact Digital Multivendor Services if you wish to use other than default values.

Reuse Parameters (stop, continue, restart, change_unit) [stop] ?

Explanation: This question is displayed after the TILX execution time limit expires, after the hard error limit is reached for every unit under test, or after you enter Ctrl/C. The options are as follows:

· Stop--TILX terminates normally.
· Continue--TILX resumes execution without resetting the remaining TILX execution time or any performance statistics. If the TILX execution time limit has expired, or all units have reached their hard error limit, TILX terminates.
· Restart--TILX resets all performance statistics and restarts execution so that the test will perform exactly as the test that just completed.
· Change_unit--If you select this option, TILX allows you to drop a unit from testing and add a unit to testing. For each unit dropped, another unit must be added until all units in the configuration have been tested. The unit chosen will be tested with the same parameters chosen for the unit that was dropped from testing. When you have completed adding and dropping units, all performance statistics are initialized and TILX execution resumes with the same parameters as the last run.
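
The four answers to the "reuse parameters" question can be pictured as a small decision function. The Python sketch below is only a model of the behavior described above, not TILX code. The Run class, its fields, and the reuse_parameters function are hypothetical names introduced for illustration. Two points are assumptions rather than statements from this manual: that the units list holds only units not yet dropped for reaching the hard error limit, and that Restart and Change_unit reset the remaining time to the full limit.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical snapshot of a TILX run at the "Reuse Parameters" prompt."""
    time_limit: int        # execution time limit, in minutes
    minutes_left: int      # time remaining when the prompt appeared
    units: list            # unit numbers still under test (assumed model)
    io_counts: dict = field(default_factory=dict)  # per-unit statistics


def reuse_parameters(run, choice="stop", drop=None, add=None):
    """Apply one answer to the prompt; return the Run to resume, or None to end."""
    if choice == "stop":
        return None  # normal termination
    if choice == "continue":
        # Time and statistics are kept; if nothing is left to run, terminate.
        if run.minutes_left <= 0 or not run.units:
            return None
        return run
    if choice == "restart":
        # Statistics are reset. Resetting the clock is an assumption.
        return Run(run.time_limit, run.time_limit, list(run.units))
    if choice == "change_unit":
        # For each unit dropped, another unit must be added.
        if (drop is None) != (add is None):
            raise ValueError("each dropped unit must be replaced by an added unit")
        units = [add if unit == drop else unit for unit in run.units]
        # Statistics are initialized. Resetting the clock is an assumption.
        return Run(run.time_limit, run.time_limit, units)
    raise ValueError("unknown choice: " + repr(choice))


expired = Run(time_limit=10, minutes_left=0, units=[177], io_counts={177: 4096})
print(reuse_parameters(expired, "continue"))
print(reuse_parameters(expired, "restart"))
```

In this model, answering Continue after the time limit has expired ends the run, while Restart returns a fresh run with empty statistics.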
Drop unit #x (y/n) [n] ?

Explanation: This question is displayed if you choose to change a unit as an answer to the "reuse parameters" question. It is asked for every unit that was tested. After entering "Y", you are prompted for the unit number. Enter the unit number to drop from testing. Enter "N" if you do not wish to drop a unit from testing.

------------------------------------------------------------
Note
------------------------------------------------------------
For each unit dropped from testing, one must be added.
------------------------------------------------------------

6.3.5 TILX Output Messages

The following message is displayed when TILX is started:

    Copyright © Digital Equipment Corporation 1993
    Tape Inline Exerciser - version 1.4

This message identifies the internal program as TILX and gives the TILX software version number.

TILX - Normal Termination.

Explanation: This message is displayed when TILX terminates under normal conditions.

Insufficient resources.

Explanation: Following this line is a second line that gives more information about the problem, which could be one of the following messages:

· Unable to allocate memory. TILX was unable to allocate the memory needed to perform TILX tests. You should run TILX again but choose a lower queue depth and/or choose fewer units to test.
· Cannot perform tests. TILX was unable to allocate all of the resources needed to perform TILX tests. You should run TILX again but choose a lower queue depth and/or choose fewer units to test.
· Unable to change operation mode to maintenance. TILX tried to change the operation mode from normal to maintenance using the SYSAP$CHANGE_STATE( ) routine, but was not successful due to insufficient resources. This problem should not occur. If it does occur, submit an error report. Then reset the controller.

Tape unit x does not exist.
Explanation: An attempt was made to allocate a unit for testing that does not exist on the controller.

Unit x successfully allocated for testing.

Explanation: All processes that TILX performs to allocate a unit for testing have been completed. The unit is ready for TILX testing.

Unable to allocate unit.

Explanation: This message should be preceded by a reason why the unit could not be allocated for TILX testing.

Cannot enable eip notification.

Explanation: This message indicates that TILX was not successful in enabling EIP notification. This should only occur if another copy of TILX is running. Wait for the first copy to finish or terminate the second copy. If there are no copies of TILX running, submit a CLD (error report) and restart the controller.

TILX detected error, code x.

Explanation: The ``normal'' way TILX recognizes an error on a unit is through the reception of an EIP, which loosely corresponds to an error log. However, there are some errors that TILX will detect without the reception of an EIP. These errors are as follows:

· Illegal Data Pattern Number found in data pattern header. Unit x. This is code 1. TILX read data from the tape unit and found that the data were not in a pattern that TILX previously wrote to the tape.
· No write buffers correspond to data pattern. Unit x. This is code 2. TILX read a legal data pattern from the tape at a place where TILX wrote to the tape, but TILX does not have any write buffers that correspond to the data pattern. Thus, the data have been corrupted.
· Read data do not match what TILX thought was written to the media. This is code 3. TILX writes data to the tape and then reads it and compares it against what TILX thought it wrote to the tape. This indicates a compare failure. More information is displayed to indicate where in the data buffer the compare failed, and what the data were and should have been.
· TILX/Tape record size mismatch. This is code 4.
This error would only be detected on a read pass. Because TILX knows what was written to the tape, TILX expects to encounter the records (of different sizes), tape marks, and the EOT in exactly the same positions as previously written. This error most likely means that the tape unit has a positioning problem.
· A tape mark was detected in a place not expected by TILX. This is code 5. This error would only be detected on a read pass. Because TILX knows what was written to the tape, TILX expects to encounter the records, tape marks, and the EOT in exactly the same positions as previously written. This error most likely means that the tape unit has a positioning problem.
· Record Data Truncated not generated. This is code 6. This error would only be detected on a read pass. Occasionally, TILX issues a read with a byte count less than what TILX knows was written to the current tape record. Thus, TILX would expect to receive a Record Data Truncated status. If TILX does not receive the Record Data Truncated status when expected, this TILX detected error is reported.
· EOT encountered in unexpected position. This is code 7. This error would only be detected on a read pass. Because TILX knows what was written to the tape, TILX expects to encounter the records, tape marks, and the EOT in exactly the same positions as previously written. This error most likely means that the tape unit has a positioning problem.

TILX terminated. A termination, a print summary or a reuse parameters request was received but TILX is currently not testing any units.

Explanation: A Ctrl/Y (termination request), Ctrl/G (print summary request), or a Ctrl/C (reuse parameters request) was entered before TILX started to test units. TILX cannot satisfy the latter two requests, so TILX treats all of these requests as a termination request.

TILX will not change the state of a unit if it is not NORMAL.

Explanation: TILX cannot allocate the unit for testing because it is already in Maintenance mode.
(Maintenance mode can only be invoked by the firmware. If another TILX session is in use, the unit is considered in Maintenance mode.)

Unit is not available - if you dismount the unit from the host, it may correct this problem.

Explanation: The unit has been placed on line by another user (or host) or the media is not present.

Soft error reporting disabled. Unit x.

Explanation: This message indicates that the soft error limit has been reached and that no more soft errors will be printed for this unit.

Hard error limit reached, unit x dropped from testing.

Explanation: This message indicates that the hard error limit has been reached and the unit must be dropped from testing.

Soft error reporting disabled for controller errors.

Explanation: This indicates that the soft error limit has been reached for controller errors. Controller soft error reporting is disabled.

Hard error limit reached for controller errors. All units dropped from testing.

Explanation: This message is self-explanatory.

Unit is already allocated for testing.

Explanation: This message is self-explanatory.

No drives selected.

Explanation: TILX parameter collection was exited without choosing any units to test.

Maximum number of units are now configured.

Explanation: This message is self-explanatory. (Testing will start after this message is displayed.)

Unit is write protected.

Explanation: The user wants to test a unit with write and/or erase commands enabled but the unit is write protected.

The unit status and/or the unit device type has changed unexpectedly. Unit x dropped from testing.

Explanation: The unit status may change if the unit experienced hard errors or if the unit is disconnected. Either way, TILX cannot continue testing the unit.

Last Failure Information follows. This error, was NOT produced by running TILX. It represents the reason why the controller crashed on the previous controller run.
Explanation: This message may be displayed while allocating a unit for testing. It does not indicate any reason why the unit is or is not successfully allocated, but rather represents the reason why the controller went down in the previous run. The information that follows this message is the contents of an EIP.

Tape unit numbers on this controller include:

Explanation: After this message is displayed, a list of tape unit numbers on the controller is displayed.

IO to unit x has timed out. TILX aborting.

Explanation: One of the TILX I/Os to this unit did not complete within the command timeout interval and, when examined, was found not to be progressing. This indicates a failing controller.

TILX terminated prematurely by user request.

Explanation: A Ctrl/Y was entered. TILX interprets this as a request to terminate. This message is then displayed and TILX terminates.

Unit is owned by another sysap.

Explanation: TILX could not allocate the unit specified because the unit is currently allocated by another system application. Terminate the other system application or reset the controller.

Exclusive access is declared for this unit.

Explanation: The unit could not be allocated for testing because exclusive access has been declared for the unit.

The other controller has exclusive access declared for this unit.

Explanation: This message is self-explanatory.

This unit is marked inoperative.

Explanation: The unit could not be allocated for testing because the controller internal tables have the unit marked as inoperative.

The unit does not have any media present.

Explanation: The unit could not be allocated for testing because no media is present.

The RUNSTOP_SWITCH is set to RUN_DISABLED.

Explanation: The unit could not be allocated for testing because the RUNSTOP_SWITCH is set to RUN_DISABLED. This is enabled and disabled through the Command Line Interpreter (CLI).

Unable to continue, run time expired.
Explanation: A continue response was given to the ``reuse parameters'' question. This is not a valid response if the run time has expired. Reinvoke TILX.

When TILX starts to exercise the tape units, the following is displayed with the current time of day:

TILX testing started at: xx:xx:xx
Test will run for x minutes
Type ^T(if running TILX through a VCS) or ^G(in all other cases) to get a current performance summary
Type ^C to terminate the TILX test prematurely
Type ^Y to terminate TILX prematurely

6.3.6 TILX End Message Display

The Value Added Status field corresponds to the TMSCP end message status. Example 6-10 is an example of a TILX end message display.

Example 6-10 TILX End Message Display

Bad Value Added Completion Status for unit x, End message in hex
Event Code x
Op Code x
Cmd Ref Number x
End Flags x
Host Xfer Byte Count x
Tape Rec Byte Count x
Tape Position x
Sequence Number x

6.3.7 TILX Error Information Packet Displays

Contact Digital Multivendor Services for assistance in deciphering the EIP fields. A TILX EIP display may or may not include a hex dump of the Requestor Specific Data. This is an option you can select among the TILX selectable parameters. The EIP will be in one of the following formats, which correspond to MSCP error log formats:

· Controller Error
· Memory Error
· Tape Error

Examples 6-11 through 6-13 are samples of each display. Each display includes the optional requestor specific information. In all cases, the Instance code, template type, and all requestor specific information correspond to event (error) log device dependent parameters, while everything else has a one-to-one correspondence to error log fields. See Appendices C and D for a translation of these codes.
Example 6-11 Controller Error

Error Information Packet in hex
Cmd Ref Number x
Unit Number x
Log Sequence x
Format x
Flags x
Event Code x
Controller ID x
Controller SW ver x
Controller HW ver x
Multi Unit Code x
Instance x
Template Type x
Requestor Information Size x
Requestor Specific Data bytes 0 7 xx xx xx xx xx xx xx xx
Requestor Specific Data bytes 8 15 xx xx xx xx xx xx xx xx
:
:
Requestor Specific Data bytes xx xx xx xx xx xx xx xx xx xx

Example 6-12 Memory Error

Error Information Packet in hex
Cmd Ref Number x
Unit Number x
Log Sequence x
Format x
Flags x
Event Code x
Controller ID x
Controller SW ver x
Controller HW ver x
Multi Unit Code x
Memory Address x
Instance x
Template Type x
Requestor Information Size x
Requestor Specific Data bytes 0 7 xx xx xx xx xx xx xx xx
Requestor Specific Data bytes 8 15 xx xx xx xx xx xx xx xx
:
:
Requestor Specific Data bytes xx xx xx xx xx xx xx xx xx xx

Example 6-13 Tape Error

Error Information Packet in hex
Cmd Ref Number x
Unit Number x
Log Sequence x
Format x
Flags x
Event Code x
Controller ID x
Controller SW ver x
Controller HW ver x
Multi Unit Code x
Unit ID[0] x
Unit ID[1] x
Unit Software Rev x
Unit Hardware Rev x
Recovery Level x
Retry Count x
Position x
Formatter SW version x
Formatter HW version x
Instance x
Template Type x
Requestor Information Size x
Requestor Specific Data bytes 0 7 xx xx xx xx xx xx xx xx
Requestor Specific Data bytes 8 15 xx xx xx xx xx xx xx xx
:
:
Requestor Specific Data bytes xx xx xx xx xx xx xx xx xx xx

6.3.8 TILX Data Patterns

Table 6-5 defines the data patterns used with the TILX Basic Function or User-Defined tests. There are 18 unique data patterns. These data patterns were selected as worst case, or the ones most likely to produce errors on tapes connected to the controller.
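Two of the multiword patterns in Table 6-5 follow simple bit rules and can be regenerated when checking a data buffer by hand. The Python sketch below is an illustration and not TILX code; the function names are assumptions made for this example. Only pattern 5 (shifting 1s) and pattern 13 (ripple 1) are reproduced, because their table entries follow the rule exactly.

```python
def shifting_ones():
    """Pattern 5: 0001, 0003, 0007, ... 7FFF (15 sixteen-bit words)."""
    # Each word has its n low-order bits set, for n = 1 through 15.
    return [(1 << n) - 1 for n in range(1, 16)]


def ripple_one():
    """Pattern 13: a single 1 bit moved from bit 0 up to bit 15 (16 words)."""
    return [1 << n for n in range(16)]


def as_hex(words):
    """Format words the way the table prints them: 4 uppercase hex digits."""
    return ", ".join(f"{w:04X}" for w in words)


if __name__ == "__main__":
    print("5 :", as_hex(shifting_ones()))
    print("13:", as_hex(ripple_one()))
```

Comparing the printed lists with the table entries for patterns 5 and 13 is a quick way to confirm that a transcribed copy of the table is intact.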
Table 6-5 TILX Data Pattern Definitions
------------------------------------------------------------
Pattern Number            Pattern in hex
------------------------------------------------------------
1                         0000
2                         8B8B
3                         3333
4                         3091
5, shifting 1s            0001, 0003, 0007, 000F, 001F, 003F, 007F, 00FF,
                          01FF, 03FF, 07FF, 0FFF, 1FFF, 3FFF, 7FFF
6, shifting 0s            FFFE, FFFC, FFFC, FFFC, FFE0, FFE0, FFE0, FFE0,
                          FE00, FC00, F800, F000, E000, C000, 8000, 0000
7, alternating 1s, 0s     0000, 0000, 0000, FFFF, FFFF, FFFF, 0000, 0000,
                          FFFF, FFFF, 0000, FFFF, 0000, FFFF, 0000, FFFF
8                         B6D9
9                         5555, 5555, 5555, AAAA, AAAA, AAAA, 5555, 5555,
                          AAAA, AAAA, 5555, AAAA, 5555, AAAA, 5555, AAAA,
                          5555
10                        DB6C
11                        2D2D, 2D2D, 2D2D, D2D2, D2D2, D2D2, 2D2D, 2D2D,
                          D2D2, D2D2, 2D2D, D2D2, 2D2D, D2D2, 2D2D, D2D2
12                        6DB6
13, ripple 1              0001, 0002, 0004, 0008, 0010, 0020, 0040, 0080,
                          0100, 0200, 0400, 0800, 1000, 2000, 4000, 8000
14, ripple 0              FFFE, FFFD, FFFB, FFF7, FFEF, FFDF, FFBF, FF7F,
                          FEFF, FDFF, FBFF, F7FF, EFFF, BFFF, DFFF, 7FFF
15                        DB6D, B6DB, 6DB6, DB6D, B6DB, 6DB6, DB6D, B6DB,
                          6DB6, DB6D, B6DB, 6DB6, DB6D
16                        3333, 3333, 3333, 1999, 9999, 9999, B6D9, B6D9,
                          B6D9, B6D9, FFFF, FFFF, 0000, 0000, DB6C, DB6C
17                        9999, 1999, 699C, E99C, 9921, 9921, 1921, 699C,
                          699C, 0747, 0747, 0747, 699C, E99C, 9999, 9999
18                        FFFF
Default--Use all of the above patterns in a random method
------------------------------------------------------------

6.3.9 TILX Examples

This section provides some TILX examples with different options chosen.

6.3.9.1 TILX Example--Using All Defaults

In Example 6-14, TILX is run using all defaults. This is a semi-extensive test even though the test only runs for 10 minutes. The only function not performed is data compares. Data compares are a time-consuming operation with tapes. TILX is invoked from a maintenance terminal.
------------------------------------------------------------
CAUTION
------------------------------------------------------------
TILX should only be run using scratch tapes. This test will write to the tape and destroy any data that exist on the tape.
------------------------------------------------------------

Example 6-14 Using All Defaults--TILX

HSJ> show tape
Name        Type    Port Targ LUN    Used by
------------------------------------------------------------------------------
TAPE500     tape    5    0    0      T50
TAPE520     tape    5    2    0      T52
HSJ> run tilx
Copyright © Digital Equipment Corporation 1993
Tape Inline Exerciser - version 1.4
Use all defaults (y/n) [y] ?
Tape unit numbers on this controller include:
50
52
Enter unit number to be tested ?50
Is a tape loaded and ready, answer Yes when ready ?y
Unit 50 successfully allocated for testing
Select another unit (y/n) [n] ?y
Enter unit number to be tested ?52
Is a tape loaded and ready, answer Yes when ready ?y
Unit 52 successfully allocated for testing
Maximum number of units are now configured
TILX testing started at: 13-JAN-1993 04:35:08
Test will run for 10 minutes
Type ^T(if running TILX through VCS) or ^G(in all other cases) to get a current performance summary
Type ^C to terminate the TILX test prematurely
Type ^Y to terminate TILX prematurely
TILX Summary at 13-JAN-1993 04:36:24
Test minutes remaining: 9, expired: 1
Unit 50 Total IO Requests 868
No errors detected
Unit 52 Total IO Requests 860
No errors detected
Reuse Parameters (stop, continue, restart, change_unit) [stop] ?
TILX - Normal Termination
HSJ>

6.3.9.2 TILX Example--Using All Functions

In Example 6-15, TILX is run using all functions and using a longer run time and higher record count than the default. The performance statistics and a performance summary are displayed every 15 minutes. TILX is invoked from a maintenance terminal. This is an extensive test.
Example 6-15 Using All Functions--TILX

HSJ> run tilx
Copyright © Digital Equipment Corporation 1993
Tape Inline Exerciser - version 1.4
Enter TILX hex debug flags (0:ffff) [0] ?
Use all defaults (y/n) [y] ?n
Enter execution time limit in minutes (10:65535) [10] ?
Enter performance summary interval in minutes (1:65535) [10] ?
Include performance statistics in performance summary (y/n) [n] ?y
Display hard/soft errors (y/n) [n] ?y
Display hex dump of Error Information Packet requester specific information (y/n) [n] ?y
When the hard error limit is reached, the unit will be dropped from testing.
Enter hard error limit (1:65535) [32] ?
When the soft error limit is reached, soft errors will no longer be displayed but testing will continue for the unit.
Enter soft error limit (1:65535) [32] ?
Enter IO queue depth (1:20) [4] ?6
Suppress caching (y,n) [n] ?
*** Available tests are:
1. Basic Function
2. User Defined
3. Read Only
Use the Basic Function test 99.9% of the time. The User Defined test is for special problems only.
Enter test number (1:3) [1] ?1
Enter data pattern number 0=ALL, 19=USER_DEFINED, (0:19) [0] ?
Enter record count (1:4294967295) [4096] ?1000
Perform data compare (y/n) [n] ?y
Enter compare percentage (1:100) [2] ?1
Tape unit numbers on this controller include:
50
52
Enter unit number to be tested ?50
Is a tape loaded and ready, answer Yes when ready ?y
Unit 50 successfully allocated for testing
Select another unit (y/n) [n] ?y
Enter unit number to be tested ?52
Is a tape loaded and ready, answer Yes when ready ?y
Unit 52 successfully allocated for testing
Maximum number of units are now configured
TILX testing started at: 13-JAN-1993 04:38:15
Test will run for 10 minutes
Type ^T(if running TILX through VCS) or ^G(in all other cases) to get a current performance summary
Type ^C to terminate the TILX test prematurely
Type ^Y to terminate TILX prematurely
TILX Summary at 13-JAN-1993 04:40:14
Test minutes remaining: 9, expired: 1
Unit 50 Total IO Requests 724
Read Count 3 Write Count 681 Reposition Count 3
Total KB xfer 6718 Read 10 Write 6707
No errors detected
Unit 52 Total IO Requests 731
Read Count 3 Write Count 687 Reposition Count 3
Total KB xfer 6743 Read 10 Write 6733
No errors detected
Reuse Parameters (stop, continue, restart, change_unit) [stop] ?
TILX - Normal Termination
HSJ>

6.3.10 Interpreting the TILX Performance Summaries

A TILX performance display is produced under the following conditions:

· When the user-selectable performance summary interval elapses
· When TILX terminates for any condition except an abort
· When Ctrl/G is entered (or Ctrl/T when running from a VCS)

The performance display has different formats depending on whether or not performance statistics were requested in the user-specified parameters and whether errors were detected.
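The error lines in a performance summary (the ``Err in Hex'' and ``Cnt err in HEX'' lines in the displays in this section) have a fixed field layout, so they are easy to pull out of a captured terminal log. The following Python sketch shows one way to do that. It is not a Digital-supplied tool: the regular expression, the function name, and the dictionary keys are assumptions made for this example, and all values are kept as the hex strings shown on the display.

```python
import re

# Field layout assumed from the sample displays in this section:
#   IC:<instance> [PTL:<port>/<target>/<lun>] Key:<key> ASC/Q:<asc>/<ascq> HC:<n> SC:<n>
# The PTL field is optional because controller error lines do not carry one.
ERR_LINE = re.compile(
    r"IC:(?P<ic>[0-9A-Fa-f]+)\s+"
    r"(?:PTL:(?P<ptl>[0-9A-Fa-f]+/[0-9A-Fa-f]+/[0-9A-Fa-f]+)\s+)?"
    r"Key:(?P<key>[0-9A-Fa-f]+)\s+"
    r"ASC/Q:(?P<asc>[0-9A-Fa-f]+)/(?P<ascq>[0-9A-Fa-f]+)\s+"
    r"HC:(?P<hc>[0-9A-Fa-f]+)\s+SC:(?P<sc>[0-9A-Fa-f]+)"
)


def parse_error_line(line):
    """Return the fields of one error line as a dict, or None if no match."""
    match = ERR_LINE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = "Err in Hex: IC:031A4002 PTL:04/00/00 Key:04 ASC/Q:B0/00 HC:0 SC:1"
    print(parse_error_line(sample))
```

The Instance code, Key, and ASC/Q values returned this way still need to be looked up in Appendices C and D; the sketch only separates the fields.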
The following is an example of a TILX performance display where performance statistics were not selected and where no errors were detected:

TILX Summary at 18-JUN-1993 06:18:41
Test minutes remaining: 0, expired: 6
Unit 1 Total IO Requests 482
No errors detected
Unit 2 Total IO Requests 490
No errors detected

The following is an example of a TILX performance display where performance statistics were selected and no errors were detected:

TILX Summary at 18-JUN-1993 06:18:41
Test minutes remaining: 0, expired: 6
Unit 1 Total IO Requests 482
Read Count 292 Write Count 168 Access Count 21 Erase Count 0
KB xfer Read 7223 Write 4981 Total 12204
No errors detected

The following is an example of a TILX performance display where performance statistics were not selected and where errors were detected:

TILX Summary at 18-JUN-1993 06:18:41
Test minutes remaining: 0, expired: 6
(1) Unit 10 Total IO Requests 153259
    No errors detected
(2) Unit 40 Total IO Requests 2161368
    Err in Hex: IC:031A4002 PTL:04/00/00 Key:04 ASC/Q:B0/00 HC:0 SC:1
    Total Errs Hard Cnt 0 Soft Cnt 1
(3) Unit 55 Total IO Requests 2017193
    Err in Hex: IC:03094002 PTL:05/05/00 Key:01 ASC/Q:18/89 HC:0 SC:1
    Err in Hex: IC:03094002 PTL:05/05/00 Key:01 ASC/Q:18/86 HC:0 SC:1
    Total Errs Hard Cnt 0 Soft Cnt 2

where:

(1) Represents the unit number and the total I/O requests to this unit.

(2) Represents the unit number and total I/O requests to this unit. All values for the following codes are described in Appendices C and D. This also includes the items associated with this error and the total number of hard and soft errors for this unit:

· The HSJ-/HSD-series Instance code (in hex)
· The Port Target LUN (PTL)
· The SCSI Sense (Key)
· The SCSI ASC and ASQ (ASC/Q) codes
· The hard and soft count for this error

(3) Represents information about the first two unique errors for this unit. All values for the following codes are described in Appendices C and D.
This also includes the items associated with this error and the total number of hard and soft errors for this unit:

· The HSJ-/HSD-series Instance code (in hex)
· The Port Target LUN (PTL)
· The SCSI Sense (Key)
· The SCSI ASC and ASQ (ASC/Q) codes
· The hard and soft count for this error

A line of this format may be displayed up to three times in a performance summary. There would be a line for each unique error reported to TILX for this unit, up to three errors.

The following is an example of a TILX performance display where performance statistics were not selected and where a controller error was detected:

TILX Summary at 18-JUN-1993 06:18:41
Test minutes remaining: 0, expired: 6
Cnt err in HEX IC:07080064 Key:06 ASC/Q:A0/05 HC:1 SC:0
Total Cntrl Errs Hard Cnt 1 Soft Cnt 0
Unit 1 Serial Number 1 Total IO Requests 482
No errors detected
Unit 2 Serial Number 2 Total IO Requests 490
No errors detected

The performance displays contain error information on up to three unique errors. Hard errors always have precedence over soft errors. A soft error represented in one display may be replaced with information on a hard error in subsequent performance displays.

6.3.11 TILX Abort Codes

Table 6-6 lists TILX abort codes and definitions.

Table 6-6 TILX Abort Codes and Definitions
------------------------------------------------------------
Value   Definition
------------------------------------------------------------
1       An IO has timed out.
2       An HTB was not available to issue an IO when it should have been.
3       FAO returned either FAO_BAD_FORMAT or FAO_OVERFLOW.
4       TS$SEND_TERMINAL_DATA returned either ABORTED or INVALID_BYTE_COUNT.
5       TS$READ_TERMINAL_DATA returned either ABORTED or INVALID_BYTE_COUNT.
6       A timer is in an unexpected expired state that prevents it from being started.
7       The semaphore was set after a oneshot IO was issued but nothing was found in the received HTB queue.
8       A termination, print summary, or reuse parameters request was received when TILX was not testing any units.
9       User requested abort via Ctrl/Y.
------------------------------------------------------------

6.3.12 TILX Error Codes

Table 6-7 lists TILX defined error codes and definitions for TILX-detected errors.

Table 6-7 TILX Error Codes and Definitions
------------------------------------------------------------
Value   Definition
------------------------------------------------------------
1       Illegal Data Pattern Number found in data pattern header.
2       No write buffers correspond to data pattern.
3       Read data do not match write buffer.
4       TILX/TAPE record size mismatch.
5       A tape mark was detected in a place where it was not expected.
6       Record Data Truncated not generated.
7       EOT encountered in unexpected position.
------------------------------------------------------------

6.4 Disk Inline Exerciser (HSZ-Series Controllers)

------------------------------------------------------------
Note
------------------------------------------------------------
The information on DILX for the HSZ-series controllers is presented separately because the messages and performance summaries differ from those of the HSJ- and HSD-series controllers.
------------------------------------------------------------

DILX is a diagnostic tool used to exercise the data transfer capabilities of selected disks connected to an HSZ-series controller. DILX exercises disks in a way that simulates a high level of user activity. Using DILX, you can read and write to all customer-available data areas. DILX can also be run on CDROMs, but only in read-only mode. Thus, DILX can be used to determine the health of a controller and the disks connected to it and to acquire performance statistics. You can run DILX from a maintenance terminal.

DILX now allows for auto-configuring of drives. This allows for quick configuring and testing of all units at once.
Please be aware that customer data will be lost by running this test. Digital recommends only using the Auto-Configure option during initial installations.

DILX tests logical units that may consist of storage sets of multiple physical devices. Error reports identify the logical units, not the physical devices. Therefore, if errors occur while running against a unit, its storage set should be reconfigured as individual devices, and DILX should then be run again against the individual devices.

There are no limitations on the number of units DILX may test at one time. However, Digital recommends only using DILX when no host activity is present. If you must run DILX during a live host connection, you should limit your testing to no more than half of any controller's units at one time. This conserves controller resources and minimizes performance degradation on the live units you are not testing.

6.4.1 Invoking DILX

To invoke DILX from a maintenance terminal, enter the following command at the CLI> prompt:

CLI> RUN DILX

6.4.2 Interrupting DILX Execution

Use the following guidelines to interrupt DILX execution.

------------------------------------------------------------
Note
------------------------------------------------------------
The symbol ``^'' is equivalent to the Ctrl key. You must press and hold the Ctrl key and type the character key given.
------------------------------------------------------------

· Ctrl/G or Ctrl/T causes DILX to produce a performance summary. DILX continues normal execution without affecting the runtime parameters.
· Ctrl/C causes DILX to produce a performance summary, stop testing, and ask the ``reuse parameters'' question.
· Ctrl/Y causes DILX to abort. The ``reuse parameters'' question is not asked.

6.4.3 DILX Tests

There are two DILX tests, as follows:

· The Basic Function test
· The User-Defined test

6.4.3.1 Basic Function Test--DILX

The Basic Function test for DILX executes in two or three phases.
The three phases are as follows:

· Initial Write Pass--Is the only optional phase and is always executed first (if selected). The initial write pass writes the selected data patterns to the entire specified data space or until the DILX execution time limit has been reached. Once the initial write pass has completed, it is not re-executed no matter how long the DILX execution time is set. The other phases are re-executed on a 10-minute cycle.
· Random I/O--Simulates typical I/O activity with random transfers from one byte to the maximum size I/O possible with the memory constraints DILX runs under. Note that the length of all I/Os is in bytes and is evenly divisible by the sector size (512 bytes). Read and write (if enabled) commands are issued using random logical block numbers (LBNs). In the read/write mode, DILX issues the reads and writes in the ratio specified previously under read/write ratio. When read-only mode is chosen, only read commands are issued. If compares are enabled, compares are performed on read commands using DILX internal checks. The percentage of compares to perform can be specified. This phase is executed 80 percent of the time. It is the first phase executed after the initial write pass has completed. It is re-executed at 10-minute intervals with each cycle lasting approximately 8 minutes. Intervals are broken down into different cycles. The interval is repeated until the user-selected time interval expires.

<--------------------------------10 min---------------------------------->
<-----------------8 min Random I/O----------------><--2 min Data Inten--->

· Data Intensive--Designed to test disk throughput by selecting a starting LBN and repeating transfers to the next sequential LBN that has not been accessed by the previous I/O. The transfer size of each I/O equals the maximum sized I/O that is possible with the memory constraints DILX must run under.
This phase continues performing spiraling I/O to sequential tracks. Read and write commands are issued in read/write mode. This phase is executed 20 percent of the time after the initial write pass has completed. This phase always executes after the random I/O phase. It is re-executed at 10-minute intervals with each cycle approximately 2 minutes.

6.4.3.2 User-Defined Test--DILX

------------------------------------------------------------
CAUTION
------------------------------------------------------------
The User-Defined test should be run only by very knowledgeable personnel. Otherwise, customer data can be destroyed.
------------------------------------------------------------

When this test is selected, DILX prompts you for input to define a specific test. In the DILX User-Defined test, a total of 20 or fewer I/O commands can be defined. Once all of the commands are issued, DILX issues the commands again in the same sequence. This is repeated until the selected time limit is reached. As you build the test, DILX collects the following information from you for each command:

· The I/O command name (write, read, or quit). Quit is not a command; instead it indicates to DILX that you have finished defining the test.
· The starting logical block number (LBN).
· The size of the I/O in 512 byte blocks.

6.4.4 DILX Test Definition Questions

The following text is displayed when running DILX. The text includes questions that are listed in the approximate order that they are displayed on your terminal. These questions prompt you to define the runtime parameters for DILX.

------------------------------------------------------------
Note
------------------------------------------------------------
Defaults for each question are given inside [ ].
If you press the Return key as a response to a question, the default is used as the response.
------------------------------------------------------------

After DILX has been started, the following message and prompt are displayed:

It is recommended that DILX only be run when there is no host activity present on the HSZ-series controller. Do you want to continue (y/n) [n] ?

The following message describing the Auto-Configure option is displayed:

The Auto-Configure option will automatically select, for testing, all of the disk units configured. It will perform a very thorough test with *WRITES* enabled. The user will only be able to select the run time and performance summary options. The user will not be able to specify specific units to test. The Auto-Configure option is only recommended for initial installations. It is the first question asked.

Do you wish to perform an Auto-Configure (y/n) [n] ?

Explanation: Enter ``Y'' if you wish to invoke the Auto-Configure option.

After the Auto-Configure option is selected, DILX will display the following caution statement:

**CAUTION** All data on the Auto-Configured disks will be destroyed. You *MUST* be sure of yourself.

Are you sure you want to continue (y/n) [n] ?

Explanation: This question is self-explanatory.

Use All Defaults and Run in Read Only Mode (y/n)[y]?

Explanation: Enter ``Y'' to use the DILX defaults and run in read-only mode; most of the other DILX questions are then not asked. Enter ``N'' if you do not want to use the defaults; you must then answer each question as it is displayed. The following defaults are assumed for all units selected for testing:

· Execution time limit = 10 minutes.
· Performance summary interval = 10 minutes.
· Displaying sense data for hard or soft errors is disabled.
· The hard error limit = 65535. Testing will stop if the limit is reached.
· The I/O queue depth = 4. A maximum of 4 I/Os will be outstanding at any time.
· The Selected Test = the Basic Function test.
· Read-only mode.
· All user available LBNs are available for testing.
· Data compares are disabled.

Enter the execution time limit in minutes (1:65535)[10]?

Explanation: Enter the length of time that you want DILX to run. The default run time is 10 minutes.

Enter performance summary interval in minutes (1:65535)[10]?

Explanation: Enter a value to set the interval at which a performance summary is displayed. The default is 10 minutes.

Include performance statistics in performance summary (y/n)[n]?

Explanation: Enter ``Y'' to see a performance summary that includes the performance statistics: the total count of read and write I/O requests and the kilobytes transferred for each command type. Enter ``N'' and no performance statistics are displayed.

Display hard/soft errors (y/n)[n]?

Explanation: Enter ``Y'' to enable displays of sense data and deferred errors. Enter ``N'' to disable error reporting. The default is disabled error reporting.

When the hard error limit is reached, the unit will be dropped from testing.
Enter hard error limit (1:65535) [65535] ?

Explanation: Enter a value to specify the hard error limit for all units under test. If the hard error limit is reached, DILX discontinues testing the unit that reaches the hard error limit. If other units are currently being tested by DILX, testing continues for those units.

When the soft error limit is reached, soft errors will no longer be displayed but testing will continue for the unit.
Enter soft error limit (1:65535) [32] ?

Explanation: Enter a value to specify the soft error limit for all units under test.
When the soft error limit is reached, soft errors are no longer displayed, but testing continues for the unit.

    Enter IO queue depth (1:12) [4]?

Explanation: Enter the maximum number of outstanding I/Os for each unit selected for testing. The default is 4.

    Enter unit number to be tested?

Explanation: Enter the unit number for the unit to be tested.

------------------------------------------------------------
Note
------------------------------------------------------------
When DILX asks for the unit number, it requires the number designator for the disk, where D117 would be specified as unit number 117.
------------------------------------------------------------

    Unit x will be write enabled.
    Do you still wish to add this unit (y/n) [n]?

Explanation: This is a reminder of the consequences of testing a unit while it is write enabled. This is the last chance to back out of testing the displayed unit. Enter ``Y'' to write enable the unit. Enter ``N'' to back out of testing that unit.

    Select another unit (y/n) [n]?

Explanation: Enter ``Y'' to select another unit for testing. Enter ``N'' to begin testing the units already selected.

The system will display the following test selections:

    ***Available tests are:
    1. Basic Function
    2. User Defined Test
    Use the Basic Function 99.9% of the time. The User Defined test is for special problems only.
    Enter test number (1:2) [1]?

Explanation: Enter ``1'' for the Basic Function test or ``2'' for the User-Defined test.

After selecting a test, the system will then display the following messages:

    In the User-Defined test, you may define up to 20 commands. They will be executed in the order entered. The commands will be repeated until the execution time limit expires.
    ** CAUTION ** If you define write commands, user data will be destroyed.
    Enter command number x (read, write, quit) [ ]?

Explanation: This question only applies to the User-Defined test.
It allows you to define command x as a read or write command. Enter quit to finish defining the test.

After making your command selection(s), the following message is displayed by DILX:

    * IMPORTANT * If you answer yes to the next question, user data WILL BE destroyed.
    Write enable disk unit (y/n) [n] ?

Explanation: Enter ``Y'' to write enable the unit. Write commands are enabled for the currently selected test. Data within your selected LBN range will be destroyed. Be sure of your actions before answering this question. This question applies to all DILX tests. Enter ``N'' to enable read-only mode, where read and access commands are the only commands enabled.

    Perform initial write (y/n) [n] ?

Explanation: Enter ``Y'' to write to the entire user-selected LBN range with the user-selected data patterns. Enter ``N'' for no initial write pass.

If you respond with ``Y'', the system performs writes starting at the lowest user-selected LBN and issues spiral I/Os with the largest byte count possible. This continues until the specified LBN range has been completely written. Upon completion of the initial write pass, normal functions of the Random I/O phase start.

The advantage of selecting the initial write pass is that compare host data commands can then be issued and the data previously written to the media can be verified for accuracy. The initial write pass also makes sure that all LBNs within the selected range are accessed by DILX.

The disadvantage of using the initial write pass is that it may take a long time to complete if a large LBN range was specified. You can bypass this by selecting a smaller LBN range, but this creates another disadvantage in that the entire disk space is not tested. The initial write pass only applies to the Basic Function test.

    The write percentage will be set automatically.
    Enter read percentage for random IO and data intensive phase (0:100) [67] ?
Explanation: This question is displayed if read/write mode is selected. It allows you to select the read/write ratio to use in the Random I/O and Data Intensive phases. The default read/write ratio is similar to the I/O ratio generated by a typical OpenVMS system.

    Enter data pattern number 0=all, 19=user_defined, (0:19) [0] ?

Explanation: The DILX data patterns are used in write commands. This question is displayed when writes are enabled for the Basic Function or User-Defined tests. There are 18 unique data patterns to select from. These patterns were carefully selected as worst case or most likely to produce errors for disks connected to the controller. (See Section 6.4.8 for a list of data patterns.) The default uses all 18 patterns in a random method. This question also allows you to create a unique data pattern of your own choice.

    Enter the 8-digit hexadecimal user defined data pattern [ ] ?

Explanation: This question is only displayed if you choose to use a User-Defined data pattern for write commands. The data pattern is represented in a longword and can be specified with eight hexadecimal digits.

    Enter start block number (0:highest_lbn_on_the_disk) [0] ?

Explanation: Enter the starting block number of the area on the disk you wish DILX to test. Zero is the default.

    Enter end block number (starting_lbn:highest_lbn_on_the_disk) [highest_lbn_on_the_disk] ?

Explanation: Enter the highest block number of the area on the disk you wish DILX to test. The highest block number (of that type of disk) is the default.

    Perform data compare (y/n) [n] ?

Explanation: Enter ``Y'' to enable data compares. Enter ``N'' and no data compare operations are done. This question is only asked if you select the initial write option. Data compares are only performed on reads. This option can be used to test data integrity.

    Enter compare percentage (1:100) [5] ?

Explanation: This question is displayed only if you choose to perform data compares.
This question allows you to change the percentage of read and write commands that will have a data compare operation performed. Enter a value indicating the compare percentage. The default is 5.

    Enter command number x (read, write, quit) [ ] ?

Explanation: This question only applies to the User-Defined test. It allows you to define command x as a read, write, access, or erase command. Enter quit to finish defining the test.

    Enter starting LBN for this command (0:highest_lbn_on_the_disk) [ ] ?

Explanation: This question only applies to the User-Defined test. It allows you to set the starting LBN for the command currently being defined. Enter the starting LBN for this command.

    Enter the IO size in 512 byte blocks for this command (1:size_in_blocks) [ ] ?

Explanation: This question only applies to the User-Defined test. It allows you to set the I/O size in 512-byte blocks for the command currently being defined. Enter a value indicating the I/O size for this command.

    Reuse parameters (stop, continue, restart, change_unit) [stop] ?

Explanation: This question is displayed after the DILX execution time limit expires, after the hard error limit is reached for every unit under test, or after you enter Ctrl/C. The options are as follows:

· Stop--DILX terminates normally.
· Continue--DILX resumes execution without resetting the remaining DILX execution time or any performance statistics. If the DILX execution time limit has expired, or all units have reached their hard error limit, DILX terminates.
· Restart--DILX resets all performance statistics and restarts execution so that the test will perform exactly as the one that just completed. However, there is one exception. If the previous test was the Basic Function test with the initial write pass and the initial write pass completed, the initial write pass is not performed when the test is restarted.
· Change_unit--DILX allows you to drop or add units to testing.
For each unit dropped, another unit must be added until all units in the configuration have been tested. The unit chosen will be tested with the same parameters that were used for the unit that was dropped from testing. When you have completed dropping and adding units, all performance statistics are initialized and DILX execution resumes with the same parameters as the last run.

    Drop unit #x (y/n) [n] ?

Explanation: This question is displayed if you choose to change a unit as an answer to the ``reuse parameters'' (previous) question. Enter the unit number that you wish to drop from testing.

    The new unit will be write enabled.
    Do you wish to continue (y/n) [n] ?

Explanation: This question is displayed if you choose to change a unit as an answer to the ``reuse parameters'' question. It is only asked if the unit being dropped was write enabled. This question gives you the chance to terminate DILX testing if you do not want data destroyed on the new unit. Enter ``N'' to terminate DILX.

6.4.5 DILX Output Messages

The following message is displayed when DILX is started:

    Copyright © Digital Equipment Corporation 1993
    Disk Inline Exerciser - version 1.4

This message identifies the internal program as DILX and gives the DILX software version number.

    Change Unit is not a legal option if Auto-Configure was chosen.

Explanation: This message is displayed if the user selected the Auto-Configure option and then gave the ``change_unit'' response to the ``reuse parameters'' question. You cannot drop a unit and add a unit if all units were selected for testing.

    DILX - Normal Termination.

Explanation: This message is displayed when DILX terminates under normal conditions.

    Insufficient resources.

Explanation: Following this line is a second line that gives more information about the problem, which could be one of the following messages:

· Unable to allocate memory.
  DILX was unable to allocate the memory it needed to perform DILX tests.
  You should run DILX again but choose a lower queue depth and/or choose fewer units to test.
· Cannot perform tests.
  DILX was unable to allocate all of the resources needed to perform DILX tests. You should run DILX again but choose a lower queue depth and/or choose fewer units to test.
· Unable to change operation mode to maintenance.
  DILX tried to change the operation mode from normal to maintenance using the SYSAP$CHANGE_STATE( ) routine but was not successful due to insufficient resources. This problem should not occur. If it does occur, submit a CLD (error report), then reset the controller.

    Disk unit x does not exist.

Explanation: An attempt was made to allocate a unit for testing that does not exist on the controller.

    Unit x successfully allocated for testing.

Explanation: All processes that DILX performs to allocate a unit for testing have been completed. The unit is ready for DILX testing.

    Unable to allocate unit.

Explanation: This message should be preceded by a reason why the unit could not be allocated for DILX testing.

    DILX detected error, code x.

Explanation: The ``normal'' way DILX recognizes an error on a unit is through the reception of SCSI sense data. This loosely corresponds to an MSCP error log. However, the following are some errors that DILX will detect using internal checks without SCSI sense data:

· Illegal Data Pattern Number found in data pattern header. Unit x
  This is code 1. DILX read data from the disk and found that the data were not in a pattern that DILX previously wrote to the disk.
· No write buffers correspond to data pattern Unit x.
  This is code 2. DILX read a legal data pattern from the disk at a place where DILX wrote to the disk, but DILX does not have any write buffers that correspond to the data pattern. Thus, the data have been corrupted.
· Read data do not match what DILX thought was written to the media. Unit x.
  This is code 3.
  DILX writes data to the disk and then reads it and compares it against what was written to the disk. This indicates a compare failure. More information is displayed to indicate where in the data buffer the compare failed and what the data were and should have been.

    DILX terminated. A termination, a print summary or a reuse parameters request was received but DILX is currently not testing any units.

Explanation: You entered a Ctrl/Y (termination request), a Ctrl/G (print summary request) or a Ctrl/C (reuse parameters request) before DILX had started to test units. DILX cannot satisfy the latter two requests, so DILX treats all of these requests as a termination request.

    DILX will not change the state of a unit if it is not NORMAL.

Explanation: DILX cannot allocate the unit for testing because it is already in Maintenance mode. (Maintenance mode can only be invoked by the firmware. If another DILX session is in use, the unit is considered in Maintenance mode.)

    Unable to bring unit online.

Explanation: This message is self-explanatory.

    Soft error reporting disabled. Unit x.

Explanation: This message indicates that the soft error limit has been reached and therefore no more soft errors will be displayed for this unit.

    Hard error limit reached, unit x dropped from testing.

Explanation: This message indicates that the hard error limit has been reached and the unit is dropped from testing.

    Soft error reporting disabled for controller errors.

Explanation: This indicates that the soft error limit has been reached for controller errors. Thus, controller soft error reporting is disabled.

    Hard error limit reached for controller errors. All units dropped from testing.

Explanation: This message is self-explanatory.

    Unit is already allocated for testing.

Explanation: This message is self-explanatory.

    No drives selected.

Explanation: DILX parameter collection was exited without choosing any units to test.
    Maximum number of units are now configured.

Explanation: This message is self-explanatory. (Testing will start after this message is displayed.)

    Unit is write protected.

Explanation: The user wants to test a unit with write and/or erase commands enabled but the unit is write protected.

    The unit status and/or the unit device type has changed unexpectedly. Unit x dropped from testing.

Explanation: The unit status may change if the unit experienced hard errors or if the unit is disconnected. Either way, DILX cannot continue testing the unit.

    Last Failure Information follows. This error was NOT produced by running DILX. It represents the reason why the controller crashed on the previous controller run.

Explanation: This message may be displayed while allocating a unit for testing. It does not indicate any reason why the unit is or is not successfully allocated, but rather represents the reason why the controller went down in the previous run. The information that follows this message is the contents of an EIP.

    Disk unit numbers on this controller include:

Explanation: After this message is displayed, a list of disk unit numbers on the controller is displayed.

    IO to unit x has timed out. DILX aborting.

Explanation: One of the DILX I/Os to this unit did not complete within the command timeout interval and, when examined, was found not progressing. This indicates a failing controller.

    DILX terminated prematurely by user request.

Explanation: A Ctrl/Y was entered. DILX interprets this as a request to terminate. This message is displayed and DILX terminates.

    Unit is owned by another sysap.

Explanation: DILX could not allocate the unit specified because the unit is currently allocated by another system application. Terminate the other system application or reset the controller.

    This unit is reserved.

Explanation: The unit could not be allocated for testing because a host has reserved the unit.

    This unit is marked inoperative.
Explanation: The unit could not be allocated for testing because the controller internal tables have the unit marked as inoperative.

    The unit does not have any media present.

Explanation: The unit could not be allocated for testing because no media is present.

    The RUNSTOP_SWITCH is set to RUN_DISABLED.

Explanation: The unit could not be allocated for testing because the RUNSTOP_SWITCH is set to RUN_DISABLED. This is enabled and disabled through the Command Line Interpreter (CLI).

    Unable to continue, run time expired.

Explanation: A continue response was given to the ``reuse parameters'' question. This is not a valid response if the run time has expired. Reinvoke DILX.

When DILX starts to exercise the disk units, the following message is displayed with the current time of day:

    DILX testing started at: xx:xx:xx
    Test will run for x minutes
    Type ^T(if running DILX through a VCS) or ^G(in all other cases) to get a current performance summary
    Type ^C to terminate the DILX test prematurely
    Type ^Y to terminate DILX prematurely

6.4.6 DILX Sense Data Display

To interpret the sense data fields correctly, refer to SCSI-2 specifications. Example 6-16 is an example of a DILX sense data display.

Example 6-16 DILX Sense Data Display

    Sense data in hex for unit x
    Sense Key x
    Sense ASC x
    Sense ASQ x
    Instance x

6.4.7 DILX Deferred Error Display

Example 6-17 is an example of a DILX deferred error display.

Example 6-17 DILX Deferred Error Display

    Deferred error detected, hard error counted against each unit.
    Sense Key x
    Sense ASC x
    Sense ASQ x
    Instance x

6.4.8 DILX Data Patterns

Table 6-8 defines the data patterns used with the DILX Basic Function or User-Defined tests. There are 18 unique data patterns. These data patterns were selected as worst case, or the ones most likely to produce errors on disks connected to the controller.
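
Two of the multi-word rows in Table 6-8 (pattern 5, shifting 1s, and pattern 13, ripple 1) follow simple bit rules. The Python sketch below is only an illustration of those two rules for 16-bit words; it is not DILX firmware code, and the function names are hypothetical. It does not attempt to reproduce the other rows of the table.

```python
def shifting_ones(width=16):
    """Words with 1, 2, ..., width-1 low-order bits set (pattern 5 rule)."""
    return [(1 << k) - 1 for k in range(1, width)]


def ripple_one(width=16):
    """Words with a single 1 bit moving from bit 0 to the top bit (pattern 13 rule)."""
    return [1 << k for k in range(width)]


def as_hex(words):
    """Format words as four-digit uppercase hex, comma separated, as in Table 6-8."""
    return ", ".join(f"{w:04X}" for w in words)


if __name__ == "__main__":
    print("shifting 1s:", as_hex(shifting_ones()))
    print("ripple 1:   ", as_hex(ripple_one()))
```

Comparing the printed words with the table rows is a quick way to check a transcription of the table.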
Table 6-8 DILX Data Patterns
------------------------------------------------------------
Pattern Number            Pattern in hex
------------------------------------------------------------
1                         0000
2                         8B8B
3                         3333
4                         3091
5, shifting 1s            0001, 0003, 0007, 000F, 001F, 003F, 007F, 00FF,
                          01FF, 03FF, 07FF, 0FFF, 1FFF, 3FFF, 7FFF
6, shifting 0s            FFFE, FFFC, FFFC, FFFC, FFE0, FFE0, FFE0, FFE0,
                          FE00, FC00, F800, F000, F000, C000, 8000, 0000
7, alternating 1s, 0s     0000, 0000, 0000, FFFF, FFFF, FFFF, 0000, 0000,
                          FFFF, FFFF, 0000, FFFF, 0000, FFFF, 0000, FFFF
8                         B6D9
9                         5555, 5555, 5555, AAAA, AAAA, AAAA, 5555, 5555,
                          AAAA, AAAA, 5555, AAAA, 5555, AAAA, 5555, AAAA,
                          5555
10                        DB6C
11                        2D2D, 2D2D, 2D2D, D2D2, D2D2, D2D2, 2D2D, 2D2D,
                          D2D2, D2D2, 2D2D, D2D2, 2D2D, D2D2, 2D2D, D2D2
12                        6DB6
13, ripple 1              0001, 0002, 0004, 0008, 0010, 0020, 0040, 0080,
                          0100, 0200, 0400, 0800, 1000, 2000, 4000, 8000
14, ripple 0              FFFE, FFFD, FFFB, FFF7, FFEF, FFDF, FFBF, FF7F,
                          FEFF, FDFF, FBFF, F7FF, EFFF, BFFF, DFFF, 7FFF
15                        DB6D, B6DB, 6DB6, DB6D, B6DB, 6DB6, DB6D, B6DB,
                          6DB6, DB6D, B6DB, 6DB6, DB6D
16                        3333, 3333, 3333, 1999, 9999, 9999, B6D9, B6D9,
                          B6D9, B6D9, FFFF, FFFF, 0000, 0000, DB6C, DB6C
17                        9999, 1999, 699C, E99C, 9921, 9921, 1921, 699C,
                          699C, 0747, 0747, 0747, 699C, E99C, 9999, 9999
18                        FFFF
Default                   Use all of the above patterns in a random method
------------------------------------------------------------

6.4.9 Interpreting the DILX Performance Summaries

A DILX performance display is produced under the following conditions:

· When a specified performance summary interval elapses
· When DILX terminates for any conditions except an abort
· When Ctrl/G or Ctrl/T is entered

The performance display has different formats depending on whether or not performance statistics are requested in the user-specified parameters and if errors are detected.

The following is an example of a DILX performance display where performance statistics were not selected and where no errors were detected:

    DILX Summary at 18-JUN-1993 06:18:41
    Test minutes remaining: 0, expired: 6
    Unit 1 Total IO Requests 482
    No errors detected
    Unit 2 Total IO Requests 490
    No errors detected

The following is an example of a DILX performance display where performance statistics were selected and no errors were detected:

    DILX Summary at 18-JUN-1993 06:18:41
    Test minutes remaining: 0, expired: 6
    Unit 1 Total IO Requests 482
    Read Count 292 Write Count 168
    KB xfer Read 7223 Write 4981 Total 12204
    No errors detected

The following is an example of a DILX performance display where performance statistics were not selected and where errors were detected on a unit under test:

    DILX Summary at 18-JUN-1993 06:18:41
    Test minutes remaining: 0, expired: 6
    (1) Unit 10 Total IO Requests 153259
        No errors detected
    (2) Unit 40 Total IO Requests 2161368
        Err in Hex: IC:031A4002 PTL:04/00/00 Key:04 ASC/Q:B0/00 HC:0 SC:1
        Total Errs Hard Cnt 0 Soft Cnt 1
    (3) Unit 55 Total IO Requests 2017193
        Err in Hex: IC:03094002 PTL:05/05/00 Key:01 ASC/Q:18/89 HC:0 SC:1
        Err in Hex: IC:03094002 PTL:05/05/00 Key:01 ASC/Q:18/86 HC:0 SC:1
    (4) Total Errs Hard Cnt 0 Soft Cnt 2

where:

(1) Represents the unit number and the total I/O requests to this unit.

(2) Represents the unit number and total I/O requests to this unit. All values for the following codes are described in Appendix E. This also includes the following items associated with this error, and the total number of hard and soft errors for this unit:

    · The HSZ-series Instance code (in hex)
    · The Port Target LUN (PTL)
    · The SCSI Sense Key
    · The SCSI ASC and ASQ (ASC/Q) codes
    · The total hard and soft count for this error

(3) Represents information about the first two unique errors for this unit. All values for the following codes are described in Appendix E. This also includes the following items associated with this error, and the total number of hard and soft errors for this unit:

    · The HSZ-series Instance code (in hex)
    · The Port Target LUN (PTL)
    · The SCSI Sense Key
    · The SCSI ASC and ASQ (ASC/Q) codes
    · The total hard and soft count for this error

    A line of this format may be displayed up to three times in a performance summary. There would be a line for each unique error reported to DILX for up to three errors for each unit.

(4) Represents the total hard and soft errors experienced for this unit.
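
When many summaries have been captured from a terminal log, the ``Err in Hex'' lines can be split into their fields mechanically. The following Python sketch is a hypothetical log-parsing helper, not part of DILX or the controller firmware. It assumes the field layout shown in the example above, and it assumes that every value on the line, including the HC and SC counts, is hexadecimal because the line is labeled ``Err in Hex''. The values still have to be translated with Appendix E.

```python
import re

HEX = r"[0-9A-Fa-f]"

# Layout taken from the example summary; treating all fields as hex is an assumption.
ERR_LINE = re.compile(
    rf"Err in Hex:\s*IC:(?P<ic>{HEX}+)\s+"
    rf"PTL:(?P<port>{HEX}{{2}})/(?P<target>{HEX}{{2}})/(?P<lun>{HEX}{{2}})\s+"
    rf"Key:(?P<key>{HEX}{{2}})\s+"
    rf"ASC/Q:(?P<asc>{HEX}{{2}})/(?P<ascq>{HEX}{{2}})\s+"
    rf"HC:(?P<hc>{HEX}+)\s+SC:(?P<sc>{HEX}+)"
)


def parse_err_line(line):
    """Return the fields of one 'Err in Hex' line as a dict, or None if it does not match."""
    match = ERR_LINE.search(line)
    if match is None:
        return None
    f = match.groupdict()
    return {
        "instance_code": f["ic"].upper(),
        "ptl": (int(f["port"], 16), int(f["target"], 16), int(f["lun"], 16)),
        "sense_key": int(f["key"], 16),
        "asc": int(f["asc"], 16),
        "ascq": int(f["ascq"], 16),
        "hard_count": int(f["hc"], 16),
        "soft_count": int(f["sc"], 16),
    }


if __name__ == "__main__":
    sample = "Err in Hex: IC:03094002 PTL:05/05/00 Key:01 ASC/Q:18/89 HC:0 SC:1"
    print(parse_err_line(sample))
```

Lines that do not carry error information, such as ``No errors detected'', return None and can be skipped.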
The following is an example of a DILX performance display where performance statistics were not selected and where a controller error was detected:

    DILX Summary at 18-JUN-1993 06:18:41
    Test minutes remaining: 0, expired: 6
    Cnt err in HEX IC:07080064 Key:06 ASC/Q:A0/05 HC:1 SC:0
    Total Cntrl Errs Hard Cnt 1 Soft Cnt 0
    Unit 1 Total IO Requests 482
    No errors detected
    Unit 2 Total IO Requests 490
    No errors detected

For the previous examples, the following definitions apply. These codes are translated in Appendix E.

· IC--The HSZ-series Instance code.
· ASC/Q--The SCSI ASC and ASCQ code associated with this error.
· HC--The hard count of this error.
· SC--The soft count of this error.
· PTL--The location of the unit (Port Target LUN).

The performance displays contain error information for up to three unique errors. Hard errors always have precedence over soft errors. A soft error represented in one display may be replaced with information on a hard error in subsequent performance displays.

6.4.10 DILX Abort Codes

Table 6-9 lists the DILX abort codes and definitions.

Table 6-9 DILX Abort Codes and Definitions
------------------------------------------------------------
Value   Definition
------------------------------------------------------------
1       An IO has timed out.
2       dcb_p->htb_used_count reflects an available HTB to test IOs but none could be found.
3       FAO returned either FAO_BAD_FORMAT or FAO_OVERFLOW.
4       TS$SEND_TERMINAL_DATA returned either an ABORTED or INVALID_BYTE_COUNT.
5       TS$READ_TERMINAL_DATA returned either an ABORTED or INVALID_BYTE_COUNT.
6       A timer is in an unexpected expired state that prevents it from being started.
7       The semaphore was set after a oneshot IO was issued but nothing was found in the received HTB que.
8       A termination, a print summary, or a reuse parameters request was received when DILX was not testing any units.
9       User requested an abort via ^Y.
------------------------------------------------------------

6.4.11 DILX Error Codes

Table 6-10 lists the DILX error codes and definitions for DILX-detected errors.

Table 6-10 DILX Error Codes and Definitions
------------------------------------------------------------
Value   Definition
------------------------------------------------------------
1       Illegal Data Pattern Number found in data pattern header.
2       No write buffers correspond to data pattern.
3       Read data do not match write buffer.
------------------------------------------------------------

6.5 VTDPY Utility

The VTDPY utility gathers and displays system state and performance information for the HS family of modular storage controllers. The information displayed includes processor utilization, host port activity and status, device state, logical unit state, and cache and I/O performance.

The VTDPY utility requires a video terminal that supports ANSI control sequences, such as a VT220, VT320, or VT420 terminal. A graphics display that provides emulation of an ANSI-compatible video terminal also can be used. For DSSI- and CI-based HS controllers, VTDPY can be run on terminals either directly connected to the HS controller, or on terminals connected through a host-based DUP connection. For SCSI-based HS controllers, the VTDPY utility can be run only on terminals connected to the HS controller maintenance terminal port.

------------------------------------------------------------
Note
------------------------------------------------------------
VCS can only be used from a terminal attached to the EIA-423 terminal port of the controller.
------------------------------------------------------------

The VTDPY utility is conceptually based on the HSC utility of the same name.
Though the information displayed differs from the HSC utility due to system implementation differences, a user familiar with the HSC utility should be able to easily understand this display terminology. The following sections show how to use the VTDPY utility.

6.5.1 How to Run VTDPY

Only one VTDPY session can be run on each controller at one time.

------------------------------------------------------------
Note
------------------------------------------------------------
Prior to running VTDPY, be sure the terminal is set in NOWRAP mode. Otherwise, the top line of the display scrolls off the screen.
------------------------------------------------------------

To initiate VTDPY from the maintenance terminal at the CLI> prompt, enter the following command:

    CLI>RUN VTDPY

To initiate VTDPY from a virtual terminal, refer to Chapter 4.

6.5.1.1 Using the VTDPY Control Keys

Use the following control key sequences to control the VTDPY display:

Table 6-11 VTDPY Control Keys
------------------------------------------------------------
Control Key Sequence   Function
------------------------------------------------------------
Ctrl/C                 Prompts for commands
Ctrl/G                 Updates the screen (same as Ctrl/Z)
Ctrl/O                 Pauses or resumes screen updates
Ctrl/R                 Refreshes current screen display (same as Ctrl/W)
Ctrl/W                 Refreshes current screen display (same as Ctrl/R)
Ctrl/Y                 Terminates VTDPY and resets screen characteristics
Ctrl/Z                 Updates the screen (same as Ctrl/G)
------------------------------------------------------------

------------------------------------------------------------
Note
------------------------------------------------------------
While VTDPY and the maintenance terminal interface support passing all of the listed control characters, some host-based terminal interfaces restrict passing some of the characters. All of the listed characters have equivalent text string commands.
------------------------------------------------------------

6.5.1.2 Using the VTDPY Command Line

VTDPY contains a command line interpreter that is invoked by entering Ctrl/C any time after the program has begun execution. The command line interpreter is used to modify the characteristics of the VTDPY display. Commands also exist to duplicate the function of the control keys listed in Section 6.5.1.1.

Table 6-12 VTDPY Commands
------------------------------------------------------------
Command String     Function
------------------------------------------------------------
DISPLAY CACHE      Uses 132-column unit caching statistics display
DISPLAY DEFAULT    Uses default 132-column system performance display
DISPLAY DEVICE     Uses 132-column device performance display
DISPLAY STATUS     Uses 80-column controller status display
EXIT               Terminates program (same as QUIT)
INTERVAL           Changes update interval
HELP               Displays help message text
REFRESH            Refreshes the current display
QUIT               Terminates program (same as EXIT)
UPDATE             Updates screen display
------------------------------------------------------------

The keywords in the command strings can be abbreviated to the minimum number of characters that are necessary to uniquely identify the keyword. Typing a question mark (?) after a keyword causes the parser to provide a list of keywords or values that may follow the supplied keyword. The CLI is not case sensitive, so keywords may be entered in uppercase, lowercase, or mixed case.

Upon successful execution of a command other than HELP, the CLI is exited and the display is resumed. Entering a carriage return without a command also exits the CLI and resumes the display. If an error occurs in the command, the user requests command expansion help, or the HELP command is entered, the CLI prompts for an additional command instead of returning to the display.
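
The abbreviation rule described above (case-insensitive keywords, accepted once enough characters are typed to identify one keyword uniquely) can be sketched in a few lines of Python. This is only an illustration of the rule using the keywords from Table 6-12. It is not the controller's parser; the function names and error messages are hypothetical, and any words after a command such as INTERVAL are passed through unchanged because their syntax is not described here.

```python
COMMANDS = ["DISPLAY", "EXIT", "INTERVAL", "HELP", "REFRESH", "QUIT", "UPDATE"]
DISPLAY_KEYWORDS = ["CACHE", "DEFAULT", "DEVICE", "STATUS"]


def resolve(word, keywords):
    """Return the one keyword that `word` abbreviates, ignoring case.

    Raises ValueError if the abbreviation matches no keyword or more than one.
    """
    prefix = word.upper()
    matches = [k for k in keywords if k.startswith(prefix)]
    if prefix in matches:
        return prefix  # an exact match always wins
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown keyword: {word}")
    raise ValueError(f"ambiguous keyword {word}: {', '.join(matches)}")


def parse(line):
    """Expand abbreviations in a command line; None means a bare carriage return."""
    words = line.split()
    if not words:
        return None
    command = resolve(words[0], COMMANDS)
    if command == "DISPLAY":
        if len(words) < 2:
            raise ValueError("DISPLAY requires a keyword")
        return (command, resolve(words[1], DISPLAY_KEYWORDS))
    return (command,) + tuple(words[1:])


if __name__ == "__main__":
    print(parse("dis def"))  # two-word command, both words abbreviated
    print(parse("q"))        # single-letter abbreviation
```

With this keyword set, DE is ambiguous between DEFAULT and DEVICE, so at least three characters are needed for those two keywords.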
6.5.1.3 How to Interpret the VTDPY Display Fields

This section describes the major fields in the VTDPY displays. Examples of the VTDPY screens are shown followed by an explanation of each field of the screens.

Figure 6-2 VTDPY Default Display for CI Controllers
Figure 6-3 VTDPY Default Display for DSSI Controllers
Figure 6-4 VTDPY Default Display for SCSI Controllers
Figure 6-5 VTDPY Device Performance Display
Figure 6-6 VTDPY Unit Cache Performance Display
Figure 6-7 VTDPY Brief CI Status Display
Figure 6-8 VTDPY Brief DSSI Status Display
Figure 6-9 VTDPY Brief SCSI Status Display

Display Header

    HSJ40 (1)  S/N: CX00000002 (2)  SW: V14J (3)  HW: A-02 (4)
    VTDPY Monitor Copyright © 1994, Digital Equipment Corp. (5)

Description

This subdisplay provides title information for the display. For 132-column displays, this subdisplay will be spread across one line of the display.

(1) Controller model.
(2) Controller serial number.
(3) Controller firmware version.
(4) Controller hardware version.
(5) Copyright notice.

Date and Time

    29-JAN-1994 13:46:34 (1)  Up: 1 3:45.19 (2)

Description

This subdisplay provides time information for the display.

(1) System date and time. This information is not displayed for SCSI-based HS controllers.
(2) Time in days, hours, minutes, and seconds since the last controller boot.

Controller Performance Summary

    88% I/D Hit (1)  47.2% Idle (2)  1225 KB/S (3)  106 Rq/S (4)

Description

This subdisplay provides total system performance information.

(1) Instruction and data cache hit rate.
(2) Policy processor idle rate.
(3) Cumulative data transfer rate in kilobytes per second. When logical units are being displayed, this is the transfer rate between the host and the controller. When physical devices are being displayed, this is the transfer rate between the controller and the devices.
(4) Cumulative unit or device request rate per second. When logical units are being displayed, this is the request rate between the host and the controller. When physical devices are being displayed, this is the request rate between the controller and the devices.

Controller Threads Display

    Pr(1) Name(2)  Stk/Max(3) Typ(4) Sta(5) CPU%(6)
     0    NULL      0/ 0             Rn     47.2
     3    HPT      40/ 7      FNC    Rn     40.3
     8    VTDPY    10/ 3      DUP    Rn      0.1
    18    FMTHRD   10/ 2      FNC    Bl      0.0
    19    DS_HB    10/ 2      FNC    Bl      0.0
    20    DUP      10/ 2      FNC    Bl      1.3
    21    SCS      10/ 2      FNC    Bl      0.0
    22    MSCP     20/ 6      FNC    Bl      0.0
    24    VA       10/ 3      FNC    Bl      1.2
    25    DS_1     40/ 6      FNC    Rn      8.9
    26    DS_0     20/ 4      FNC    Bl      0.0
    27    HIS      10/ 2      FNC    Bl      0.0
    28    CLIMAIN  16/ 6      FNC    Bl      0.0
    30    FOC      16/ 4      FNC    Bl      0.0
    31    DUART    10/ 2      FNC    Bl      0.0

Description

This display shows the status and characteristics of the active threads in the controller. Threads that are not active, such as DUP Local Program threads, will not be displayed until they become active. If the number of active threads exceeds the available space, not all of them will be displayed.

(1) The Pr column lists the thread priority. The higher the number, the higher the priority.

(2) The Name column contains the thread name. For DUP Local Program threads, this is the name used to invoke the program.

(3) The Stk column lists the allocated stack size in 512-byte pages. The Max column lists the number of stack pages actually used.

(4) The Typ column lists the thread type. The following thread types may be displayed:

    · FNC--Functional thread. Those threads that are started when the controller boots and never exit.
    · DUP--DUP Local Program threads. These threads are only active when run either from a DUP connection or through the command line interpreter's RUN command.
    · NULL--The NULL thread does not have a thread type because it is a special type of thread that only executes when no other thread is executable.

(5) The Sta column lists the current thread state. The following thread states may be displayed:

    · Bl--The thread is blocked waiting for timer expiration, resources, or a synchronization event.
    · Io--A DUP Local Program is blocked waiting for terminal I/O completion.
    · Rn--The thread is currently executable.

(6) The CPU% column lists the percentage of execution time credited to each thread since the last screen update. The values may not add up to exactly 100 percent due to both rounding errors and the fact that there may not be enough room to display all of the threads. An unexpected amount of time may be credited to some threads because the controller's firmware architecture allows code from one thread to execute in the context of another thread without a context switch.

Table 6-13 describes the processes that may be displayed in the active thread display.

------------------------------------------------------------
Note
------------------------------------------------------------
It is possible that different versions of the controller firmware will have different threads or different names for the threads.
------------------------------------------------------------

Table 6-13 Thread Description
------------------------------------------------------------
Thread Name   Description
------------------------------------------------------------
CLI           A local program that provides an interface to the controller's command line interpreter thread.
CLIMAIN       The command line interpreter (CLI) thread.
CONFIG        A local program that locates and adds devices to an HS array controller configuration.
DILX          A Local Program that exercises disk devices.
DIRECT A local program that returns a listing of available Local Programs. DS_0 Device error recovery management thread. DS_1 The thread that handles successful completion of physical device requests. DS_HB The thread that manages the device and controller error indicator lights and port reset buttons. DUART The console terminal interface thread. DUP The DUP protocol server thread. FMTHREAD The thread that performs error log formatting and fault reporting for the controller. FOC The thread that manages communication between the controllers in a dual-controller configuration. HIS The SCS protocol interface thread for CI and DSSI controllers. HPT The thread that handles interaction with the host port logic and PPD protocol for CI and DSSI controllers. MSCP The MSCP and TMSCP protocol server thread. NULL The process that is scheduled when no other process can be run. NVFOC The thread that initiates state change requests for the other controller in a dual-controller configuration. REMOTE The thread that manages state changes initiated by the other controller in a dual-controller configuration. RMGR The thread that manages the data buffer pool. (continued on next page) 6-80 Diagnostics, Exercisers, and Utilities Table 6-13 (Cont.) Thread Description ------------------------------------------------------------ Thread Name Description ------------------------------------------------------------ SCS The SCS directory thread. SCSIVT A thread that provides a virtual terminal connection to the CLI over the host SCSI bus. SHIS The host SCSI protocol interface thread for SCSI controllers. TILX A Local Program that exercises tape devices. VA The thread that provides host protocol independent logical unit services. VTDPY A Local Program thread that provides a dynamic display of controller configuration and performance information. 
------------------------------------------------------------ Diagnostics, Exercisers, and Utilities 6-81 CI/DSSI Host Port Characteristics Node HSJ501 ! Port 13 " SysId 4200100D0720 # Description This subdisplay shows the current host port identification information. This subdisplay is only available for CI- or DSSI-based controllers. ! SCS node name. " Port number. # SCS system ID. 6-82 Diagnostics, Exercisers, and Utilities SCSI Host Port Characteristics Xfer Rate T!W"I#Mhz$ 1 W 7 10.00 2 W Async% Description This subdisplay shows the current host port SCSI target identification, any initiator that has negotiated synchronous transfers, and the negotiated transfer method currently in use between the controller and the initiators. This subdisplay is only available for SCSI-based HS controllers. ! SCSI host port target ID. " Transfer width. W indicates 16-bit or wide transfers are being used. A space indicates 8-bit transfers are being used. # The initiator with which synchronous commication has been negotiated. $ A numeric value indicates the synchronous data rate that has been negotiated with the initiator at the specified SCSI ID. The value is listed in megahertz (Mhz). In this example, the negotiated synchronous transfer rate is approximately 3.57 Mhz. To convert this number to the nanosecond period, invert and multiply by 1000. The period for this is approximately 280 nanoseconds. % Async indicates communication between this target and all initiators is being done in asynchronous mode. This is the default communication mode and will be used unless the initiator successfully negotiates for synchronous communications. If there is no communication with a given target ID, the communication mode will be listed as asynchronous. Diagnostics, Exercisers, and Utilities 6-83 CI Performance Display Path A Pkts Pkts/S RCV 5710 519 ! 
ACK 11805 1073 " NAK 2073 188 # NOR 1072 97 $ Path B Pkts Pkts/S RCV 5869 533 ACK 11318 1028 NAK 2164 196 NOR 445 40 Description This display indicates the number of packets sent and received over each CI path and the packet rate. This display is only available on CI-based controllers. ! Packets received from a remote node. " Packets sent to a remote node that were acknowledged (ACK). # Packets sent to a remote node that were not acknowledged (NAK). $ Packets sent to a remote node for which no response was received. 6-84 Diagnostics, Exercisers, and Utilities DSSI Performance Display DSSI Pkts Pkts/S RCV 5710 519 ! ACK 11805 1073 " NAK 2073 188 # NOR 1072 97 $ Description This display indicates the number of packets sent and received through the DSSI port and the packet rate. This display is only available on DSSI-based controllers. ! Packets received from a remote node. " Packets sent to a remote node that were acknowledged (ACK). # Packets sent to a remote node that were not acknowledged (NAK). $ Packets sent to a remote node for which no response was received. Diagnostics, Exercisers, and Utilities 6-85 CI/DSSI Connection Status Connections 0123456789 ! 0........MM " 1..C.MV.... 2.......... 3.. Description This display shows the current status of any connections to a remote CI or DSSI node. This display is available only on CI- and DSSI-based controllers. ! Each position in the data field represents one of the possible nodes to which the controller can communicate. To locate the connection status for a given node, use the column on the left to determine the high order digit of the node number and use the second row to determine the low order digit. For CI controllers, the number of nodes displayed is determined by the controllers MAX NODE parameter. The maximum supported value for this parameter is 32. For DSSI controllers, the number of nodes is fixed at 8. 
" Each location in the grid contains a character to indicate the connection status: · C indicates one connection to that node. In this example, node 12 shows one connection. This usually happens if a host has multiple adaptors and is using more than one adaptor for load balancing. · M indicates multiple connections to that node. Because each host system can make a separate connection to each of the disk, tape, and DUP servers, this field frequently shows multiple connections to a host system. In this example, nodes 8, 9, and 14 show multiple connections. · V indicates that only a virtual circuit is open and no connection is present. This happens prior to establishing a connection. It also will happen when there is another controller on the same network and when there are systems with multiple adaptors connected to the same network. Node 15 demonstrates this principle. · If a period ``.'' is in a position corresponding to a node, that node does not have any virtual circuits or connections to this controller. · A space indicates the address is beyond the visible node range for this controller. 6-86 Diagnostics, Exercisers, and Utilities CI/DSSI Host Path Status Path Status 0123456789 ! 0........^^ " 1..A.B^.... 2.......X.. 3.. Description This display indicates the path status to any system for which a virtual circuit exists. This display is available only on CI- and DSSI-based controllers. ! Each position in the data field represents one of the possible nodes to which the controller can communicate. To locate the path status for a given node, use the column on the left to determine the high order digit of the node number and use the second row to determine the low order digit. For CI controllers, the number of nodes displayed is determined by the controllers MAX NODE parameter. The maximum supported value for this parameter is 32. For DSSI controllers, the number of nodes is fixed at 8. 
" Each location in the grid contains a character to indicate the path status: · A indicates only CI path A is functioning properly. In this example, node 12 demonstrates this. This value will not be displayed for DSSI-based controllers. · B indicates only CI path B is functioning properly. In this example, node 14 demonstrates this. This value will not be displayed for DSSI-based controllers. · X indicates the CI cables are crossed. In this example, node 27 demonstrates this. This value will not be displayed for DSSI-based controllers. · ^ indicates the single DSSI path or both CI paths are functioning properly. In this example, nodes 8, 9, and 15 demonstrate this. · If a period ``.'' is in a position corresponding to a node, that node does not have any virtual circuits or connections to this controller so either the path status cannot be determined, or neither path is functioning properly. · A space indicates the address is beyond the visible node range for this controller. Diagnostics, Exercisers, and Utilities 6-87 Device SCSI Status Target 01234567 ! P1 DDDDFhH " o2TTT T hH r3DDD hH t4DDDDDDhH 5DDDD hH 6 hH # Description This display shows what devices the controller has been able to identify on the device busses. ------------------------------------------------------------ Note ------------------------------------------------------------ The controller will not look for devices that are not configured into the nonvolatile memory using the CLI ADD command. ------------------------------------------------------------ ! The column headings indicate the SCSI target numbers for the devices. SCSI Targets are in the range 0 through 7. Target 7 is always used by a controller. In a dual controller configuration, target 6 is used by the second controller. " The device grid contains a letter signifying the device type in each port/target location where a device has been found: · C indicates a CDROM device. · D indicates a disk device. 
· F indicates a device type not listed above. · H indicates bus position of this controller. · h indicates bus position of the other controller. · L indicates a media loader. · T indicates a tape device. · A period ``.'' indicates the device type is unknown. · A space indicates there is no device configured at this location. # This subdisplay contains a row for each SCSI device port supported by the controller. The subdisplay for a controller that has six SCSI device ports is shown. 6-88 Diagnostics, Exercisers, and Utilities Unit Status (abbrievated) Unit! ASWC" KB/S# Rd%$ Wr%% Cm%& HT%' D0110 a^ r 0 0 0 0 0 D0120 a^ r 0 0 0 0 0 D0130 o^ r 236 100 0 0 100 T0220 av 0 0 0 0 0 T0230 o^ 123 0 100 0 0 Description This subdisplay shows the status of the logical units that are known to the controller firmware. It also indicates performance information for the units. Up to 42 units may be displayed in this subdisplay. ! The Unit column contains a letter indicating the type of unit followed by the unit number of the logical unit. The list is sorted by unit number. There may be duplication of unit numbers between devices of different types. If this happens, the order of these devices is arbitrary. The following device type letters that may be displayed are as follow: · D indicates a disk device. · T indicates a tape device. · L indicates a media loader. · C indicates a CDROM device. · F indicates a device type not listed above. · U indicates the device type is unknown. " The ASWC columns indicate respectively the availability, spindle state, write protect state, and cache state of the logical unit. The availability state is indicated using the following letters: · a--Available. Available to be mounted by a host system. · d--Offline, Disabled by Digital Multivendor Services. The unit has been disabled for service. · e--Online, Exclusive Access. Unit has been mounted for exclusive access by a user. · f--Offline, Media Format Error. 
The unit cannot be brought available due to a media format inconsistancy. · i--Offline, Inoperative. The unit is inoperative and cannot be brought available by the controller. · m--Offline, Maintenance. The unit has been placed in Maintenance mode for diagnostic or other purposes. · o--Online. Mounted by at least one of the host systems. · r--Offline, Rundown. The CLI SET NORUN command has been issued for this unit. · v--Offline, No Volume Mounted. The device does not contain media. · x--Online to other controller. Not available for use by this controller. Diagnostics, Exercisers, and Utilities 6-89 · A space in this column indicates the availability is unknown. The spindle state is indicated using the following characters: · ^--For disks, this symbol indicates the device is at speed. For tapes, it indicates the tape is loaded. · >--For disks, this symbol indicates the device is spinning up. For tapes, it indicates the tape is loading. · <--For disks, this symbol indicates the device is spinning down. For tapes, it indicates the tape is unloading. · v--For disks, this symbol indicates the device is stopped. For tapes, it indicates the tape is unloaded. · For other types of devices, this column is left blank. For disks and tapes, a w in the write protect column indicates the unit is write protected. This column is left blank for other device types. The data caching state is indicated using the following letters: · r--Read caching is enabled. · A space in this column indicates caching is disabled. # KB/S--This column indicates the average amount of kilobytes of data transferred to and from the unit in the previous screen update interval. This data is only available for disk and tape units. $ Rd%--This column indicates what percentage of data transferred between the host and the unit were read from the unit. This data is only contained in the DEFAULT display for disk and tape device types. 
% Wr%--This column indicates what percentage of data transferred between the host and the unit were written to the unit. This data is only contained in the DEFAULT display for disk and tape device types. & Cm%--This column indicates what percentage of data transferred between the host and the unit were compared. A compare operation may be accompanied by either a read or a write operation, so this column is not cumulative with read percentage and write percentage columns. This data is only contained in the DEFAULT display for disk and tape device types. ' HT%--This column indicates the cache hit percentage for data transferred between the host and the unit. 6-90 Diagnostics, Exercisers, and Utilities Unit Status (full) Unit! ASWC" KB/S# Rd%$ Wr%% Cm%& HT%' PH%( MS%) Purge+> BlChd+? BlHit+@ D0003 o^ r 382 0 100 0 0 0 0 0 6880 0 D0250 o^ r 382 100 0 0 0 0 100 0 6880 0 D0251 o^ r 284 100 0 0 0 0 100 0 5120 0 D0262 a^ r 0 0 0 0 0 0 0 0 0 0 D0280 o^ r 497 44 55 0 0 0 100 0 9011 0 D0351 a^ r 0 0 0 0 0 0 0 0 0 0 D0911 a^ r 0 0 0 0 0 0 0 0 0 0 D1000 a^ r 0 0 0 0 0 0 0 0 0 0 Description This subdisplay shows the status of the logical units that are known to the controller firmware. It also shows I/O performance information and caching statistics for the units. Up to 42 units may be displayed in this subdisplay. ! The Unit column contains a letter indicating the type of unit followed by the unit number of the logical unit. The list is sorted by unit number. There may be duplication of unit numbers between devices of different types. If this happens, the order of these devices is arbitrary. The device type letters that may displayed are as follow: · D indicates a disk device. · T indicates a tape device. · L indicates a media loader. · C indicates a CDROM device. · F indicates a device type not listed above. · U indicates the device type is unknown. 
" The ASWC columns indicate the availability, spindle state, write protect state, and cache state respectively of the logical unit. The availability state is indicated using the following letters: · a--Available. Available to be mounted by a host system. · d--Offline, Disabled by Digital Multivendor Services. The unit has been disabled for service. · e--Online, Exclusive Access. Unit has been mounted for exclusive access by a user. · f--Offline, Media Format Error. The unit cannot be brought available due to a media format inconsistancy. · i--Offline, Inoperative. The unit is inoperative and cannot be brought available by the controller. · m--Offline, Maintenance. The unit has been placed in maintenance mode for diagnostic or other purposes. · o--Online. Mounted by at least one of the host systems. · r--Offline, Rundown. The CLI SET NORUN command has been issued for this unit. Diagnostics, Exercisers, and Utilities 6-91 · v--Offline, No Volume Mounted. The device does not contain media. · x--On line to other controller. Not available for use by this controller. · A space in this column indicates the availability is unknown. The spindle state is indicated using the following characters: · ^--For disks, this symbol indicates the device is at speed. For tapes, it indicates the tape is loaded. · >--For disks, this symbol indicates the device is spinning up. For tapes, it indicates the tape is loading. · <--For disks, this symbol indicates the device is spinning down. For tapes, it indicates the tape is unloading. · v--For disks, this symbol indicates the device is stopped. For tapes, it indicates the tape is unloaded. · For other types of devices, this column is left blank. For disks and tapes, a w in the write protect column indicates the unit is write protected. This column is left blank for other device types. The data caching state is indicated using the following letters: · r--Read caching is enabled. · A space in this column indicates caching is disabled. 
# KB/S--This column indicates the average amount of kilobytes of data transferred to and from the unit in the previous screen update interval. This data is only available for disk and tape units. $ Rd%--This column indicates what percentage of data transferred between the host and the unit were read from the unit. This data is only contained in the DEFAULT display for disk and tape device types. % Wr%--This column indicates what percentage of data transferred between the host and the unit were written to the unit. This data is only contained in the DEFAULT display for disk and tape device types. & Cm%--This column indicates what percentage of data transferred between the host and the unit were compared. A compare operation may be accompanied by either a read or a write operation, so this column is not cumulative with read percentage and write percentage columns. This data is only contained in the DEFAULT display for disk and tape device types. ' HT%--This column indicates the cache hit percentage for data transferred between the host and the unit. ( PH%--This column indicates the partial cache hit percentage for data transferred between the host and the unit. ) MS%--This column indicates the cache miss percentage for data transferred between the host and the unit. +> Purge--This column shows the number of blocks purged from the cache in the last update interval. +? BlChd--This column shows the number of blocks added to the cache in the last update interval. 6-92 Diagnostics, Exercisers, and Utilities +@ BlHit--This column shows the number of cached data blocks "hit" in the last update interval. Diagnostics, Exercisers, and Utilities 6-93 Device Status PTL! 
ASWF" Rq/S# RdKB/S$ WrKB/S% Que& Tg' CR( BR) TR+> D100 A^ 0 0 0 11 0 0 0 0 D120 A^ 0 0 0 0 0 0 0 0 D140 A^ 0 0 0 0 0 0 0 0 D210 A^ 11 93 0 1 1 0 0 0 D230 A^ 0 0 0 0 0 0 0 0 D300 A^ 11 93 0 2 1 0 0 0 D310 A^ 0 0 0 0 0 0 0 0 D320 A^ 36 247 0 12 10 0 0 0 D400 A^ 11 93 0 2 1 0 0 0 D410 A^ 0 0 0 0 0 0 0 0 D420 A^ 36 247 0 10 8 0 0 0 D430 A^ 0 0 0 0 0 0 0 0 D440 A^ 0 0 0 0 0 0 0 0 D450 A^ 0 0 0 0 0 0 0 0 D500 A^ 11 93 0 1 1 0 0 0 D510 A^ 0 0 0 0 0 0 0 0 D520 A^ 0 0 0 0 0 0 0 0 D530 A^ 47 0 375 6 5 0 0 0 Description This subdisplay shows the status of the physical storage devices that are known to the controller firmware. It also shows I/O performance information and bus statistics for these devices. Up to 42 devices may be displayed in this subdisplay. ! The PTL column contains a letter indicating the type of device followed by the SCSI Port, Target, and LUN of the device. The list is sorted by port, target, and LUN. The device type letters that may be displayed are as follow: · D indicates a disk device. · T indicates a tape device. · L indicates a media loader. · C indicates a CDROM device. · F indicates a device type not listed above. · U indicates the device type is unknown. " The ASWF columns indicate the allocation, spindle state, write protect state, and fault state respectively of the device. The availability state is indicated using the following letters: · A--Allocated to this controller. · a--Allocated to the other controller. · U--Unallocated, but owned by this controller. · u--Unallocated, but owned by the other controller. · A space in this column indicates the allocation is unknown. The spindle state is indicated using the following characters: · ^--For disks, this symbol indicates the device is at speed. For tapes, it indicates the tape is loaded. 6-94 Diagnostics, Exercisers, and Utilities · >--For disks, this symbol indicates the device is spinning up. For tapes, it indicates the tape is loading. 
· <--For disks, this symbol indicates the device is spinning down. For tapes, it indicates the tape is unloading. · v--For disks, this symbol indicates the device is stopped. For tapes, it indicates the tape is unloaded. · For other types of devices, this column is left blank. For disks and tapes, a W in the write protect column indicates the device is hardware write protected. This column is left blank for other device types. A F in the fault column indicates an unrecoverable device fault. If this field is set, the device fault indicator will also be illuminated. # Rq/S--This column shows the average I/O request rate for the device during the last update interval. These requests are up to 8 kilobytes long and are either generated by host requests or cache flush activity. $ RdKB/S--This column shows the average data transfer rate from the device in kilobytes during the previous screen update interval. % WrKB/S--This column shows the average data transfer rate to the device in kilobytes during the previous screen update interval. & Que--This column shows the maximum number of transfer requests waiting to be transferred to the device during the last screen update interval. ' Tg--This column shows the maximum number of transfer requests queued to the device during the last screen update interval. If a device does not support tagged queueing, the maximum value will be 1. ( CR--This column indicates the number of SCSI command resets that occurred since VTDPY was started. ) BR--This column indicates the number of SCSI bus resets that occurred since VTDPY was started. +> TR--This column indicates the number of SCSI target resets that occurred since VTDPY was started. Diagnostics, Exercisers, and Utilities 6-95 Device SCSI Port Performance Port! 
Rq/S" RdKB/S# WrKB/S$ CR% BR& TR' 1 0 0 0 0 0 0 2 11 93 0 0 0 0 3 48 341 0 0 0 0 4 48 340 0 0 0 0 5 58 93 375 0 0 0 6 0 0 0 0 0 0 Description This subdisplay shows the accumulated I/O performance values and bus statistics for the SCSI device ports. The subdisplay for a controller that has six SCSI device ports is shown. ! The Port column indicates the number of the SCSI device port. " Rq/S--This column shows the average I/O request rate for the port during the last update interval. These requests are up to 8 kilobytes long and are either generated by host requests or cache flush activity. # RdKB/S--This column shows the average data transfer rate from all devices on the SCSI bus in kilobytes during the previous screen update interval. $ WrKB/S--This column shows the average data transfer rate to all devices on the SCSI bus in kilobytes during the previous screen update interval. % CR--This column indicates the number of SCSI command resets that occurred since VTDPY was started. & BR--This column indicates the number of SCSI bus resets that occurred since VTDPY was started. ' TR--This column indicates the number of SCSI target resets that occurred since VTDPY was started. 6-96 Diagnostics, Exercisers, and Utilities Help Example VTDPY> HELP Available VTDPY commands: ^C - Prompt for commands ^G or ^Z - Update screen ^O - Pause/Resume screen updates ^Y - Terminate program ^R or ^W - Refresh screen DISPLAY CACHE - Use 132 column unit caching statistics display DISPLAY DEFAULT - Use default 132 column system performance display DISPLAY DEVICE - Use 132 column device performance display DISPLAY STATUS - Use 80 column controller status display EXIT - Terminate program (same as QUIT) INTERVAL - Change update interval HELP - Display this help message REFRESH - Refresh the current display QUIT - Terminate program (same as EXIT) UPDATE - Update screen display VTDPY> Description This is the sample output from executing the HELP command. 
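
As a recap of the node lookup rule given earlier in this section for the CI/DSSI Connection Status and CI/DSSI Host Path Status grids, the following is a minimal sketch in Python. The function name node_status and the representation of the grid as a list of row strings are assumptions made for illustration; they are not part of VTDPY. The sample rows are taken from the Connection Status example.

```python
from typing import Sequence


def node_status(grid_rows: Sequence[str], node: int) -> str:
    """Return the status character shown for a node number.

    grid_rows holds the data rows of the grid (row 0, row 1, ...)
    without the row label in the leftmost column.
    """
    if node < 0:
        raise ValueError("node numbers start at 0")
    # The row label is the high order digit, the column heading the low order digit.
    row, col = divmod(node, 10)
    if row >= len(grid_rows) or col >= len(grid_rows[row]):
        return " "  # beyond the visible node range for this controller
    return grid_rows[row][col]


# Data rows from the CI/DSSI Connection Status example.
connections = ["........MM", "..C.MV....", "..........", ".."]
print(node_status(connections, 12))  # C: one connection
print(node_status(connections, 15))  # V: virtual circuit only
```

The same lookup applies to the Host Path Status grid, where the returned character is A, B, X, ^, a period, or a space.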
6.6 The CONFIG Utility

The CONFIG utility locates and adds devices to the controller. You should run the CONFIG utility whenever new devices are added to the controller.

The CONFIG utility searches all port/target/LUN device combinations to determine what devices exist on the subsystem. It adds all new devices that are found. The CONFIG utility does not initialize these devices, and it does not add units or storage sets.

If a device somewhere in the cluster already has the PTL that the CONFIG utility plans to assign, the program appends an alphabetic character to the numbers. For example, if another device is already called DISK100, the program assigns the name DISK100A to the new device. (The program compares DISK100A to other PTLs in the cluster, and if DISK100A has already been used, the program increments to DISK100B, and so forth.) This avoids the assignment of duplicate PTLs in the same cluster.

6.6.1 Running the CONFIG Utility

You can run the CONFIG utility on either a virtual terminal or a maintenance terminal. Before running the CONFIG utility, you may use the SHOW DEVICES command to verify the list of devices that are currently configured on the controller, as shown in the following example. The example shows the CONFIG utility as it is run on an HSJ- or HSD-series controller. The text of the prompts may change slightly when run on other controllers in the HS controller family.

    HSJ> SHOW DEVICES
    No devices
    HSJ> RUN CONFIG

    Copyright © Digital Equipment Corporation 1993
    Config Local Program Invoked

    Config will search all port/target/LUN combinations to determine what
    devices exist on the subsystem. It will then add all disk, tape and
    cdrom devices that are found. It will not initialize devices, add
    units or storage sets.

    Do you want to continue (y/n) [y] ? YES

    Config is building its tables and determining what devices exist on
    the subsystem. Please be patient.

    add disk DISK100 1 0 0
    add disk DISK120 1 2 0
    add disk DISK140 1 4 0
    add disk DISK210 2 1 0
    add disk DISK230 2 3 0
    add disk DISK500 5 0 0
    add disk DISK520 5 2 0
    add tape TAPE600 6 0 0
    add tape TAPE610 6 1 0

    Config - Normal Termination
    HSJ>

    HSJ> SHOW DEVICES
    Name      Type    Port  Targ  LUN   Used by
    ------------------------------------------------------------------------------
    DISK100   disk    1     0     0
    DISK120   disk    1     2     0
    DISK140   disk    1     4     0
    DISK210   disk    2     1     0
    DISK230   disk    2     3     0
    DISK500   disk    5     0     0
    DISK520   disk    5     2     0
    TAPE600   tape    6     0     0
    TAPE610   tape    6     1     0
    HSJ>

After you run the CONFIG utility, you may have to initialize your containers using the INITIALIZE command as described in Appendix B.

6.7 HSZUTIL Virtual Maintenance Terminal Application

This section describes the virtual maintenance terminal application, HSZUTIL. The HSZUTIL program is a host-resident user application that provides a virtual maintenance terminal facility for communicating with an HSZ-series controller over its host SCSI bus interface. The virtual maintenance terminal communication protocol was developed explicitly for the HSZ-series controller.

6.7.1 General Implementation Considerations

The HSZUTIL application is written entirely in the C language. The portion of the code that is system dependent is contained in separate system-specific modules. The terminal interface uses portable C I/O functions and therefore does not support asynchronous terminal I/O. This is not a restriction of the virtual maintenance terminal protocol.
The HSZUTIL application uses the following SCSI commands to communicate with the HSZ-series controller:

· TEST UNIT READY
· INQUIRY
· SEND DIAGNOSTIC
· RECEIVE DIAGNOSTIC RESULTS

6.7.2 Restrictions

Note the following restrictions before running the HSZUTIL application:

· Though the programming interface allows access to most SCSI commands, HSZUTIL is not intended to provide functions beyond those required for maintaining a virtual terminal session. The existing source contains code for several additional SCSI functions, but this code is currently disabled.
· The HSZUTIL application does not support the RZxx SCSI DUP protocol.

6.7.3 DEC OSF/1 for Alpha AXP Implementations

The DEC OSF/1 AXP version issues SCSI commands through the CAM User Agent interface. The user identifies the HSZ-series controller through its bus, target, and LUN identifiers. The HSZ-series controller, therefore, does not need to be configured into the system prior to accessing it through HSZUTIL. SUPERUSER privilege is required to run the HSZUTIL application on DEC OSF/1 AXP.

6.7.3.1 Running HSZUTIL Under DEC OSF/1 AXP

The HSZUTIL application is installed in the /USR/LOCAL/BIN directory by SETLD. The program is invoked as follows:

    # HSZUTIL bus target LUN

where:

    bus     is the number of the SCSI bus.
    target  is the target ID of the HSZ-series controller.
    LUN     is the logical unit number of one of the devices connected
            to the HSZ-series controller.

If specified, the parameters must be given in this order. HSZUTIL prompts for missing parameters. The specified device need not be known to the operating system.

To exit the program, enter Ctrl/D.

Control characters to be delivered to the HSZ-series controller CLI are entered by typing the ``^'' character followed by the appropriate letter. For example, Ctrl/G would be entered as ``^G''.
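
The caret notation described above can be illustrated with a minimal Python sketch. This is not the HSZUTIL source (which is written in C); the function name caret_to_control, the acceptance of lowercase letters, and the choice to return other input unchanged are assumptions made for illustration.

```python
def caret_to_control(text: str) -> str:
    """Convert input that starts with '^' plus a letter to a control character.

    Any other input is returned unchanged (an assumption of this sketch).
    """
    if len(text) >= 2 and text[0] == "^" and text[1].isascii() and text[1].isalpha():
        # Ctrl/A through Ctrl/Z are ASCII codes 1 through 26, which are the
        # low five bits of the letter's own ASCII code.
        return chr(ord(text[1].upper()) & 0x1F)
    return text


print(repr(caret_to_control("^G")))  # '\x07', the ASCII BEL character (Ctrl/G)
```

Only the first letter after the caret is used, so any trailing characters typed after the letter are dropped by this sketch.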
6.7.4 Description of HSZ-series Controller Virtual Terminal Protocol Diagnostic Pages

Figures 6-10 and 6-11 present the send and receive diagnostic page formats.

Figure 6-10 HSZ-series Controller CLI Send Diagnostic Page Format

    +----+-------------------------------+
    | Bit| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
    |Byte|   |   |   |   |   |   |   |   |
    +----+-------------------------------+
    |  0 |        Page Code (80h)        |  CLI Data page (vendor specific)
    +----+-------------------------------+
    |  1 |           Reserved            |
    +----+-------------------------------+
    |  2 |(MSB)                          |
    +----+---    Page Length (n-3)    ---+
    |  3 |                          (LSB)|
    +----+-------------------------------+
    |  4 |         CLI Cmd Code          |  INQUIRY (1) or ANSWER (2)
    +----+-------------------------------+
    |  5 |           Reserved            |
    +----+-------------------------------+
    |  6 |                               |  Used for ANSWER only
    +-  -+---       ASCII Text        ---+  (132 bytes maximum)
    |  n |                               |
    +----+-------------------------------+

Figure 6-11 HSZ-series Controller CLI Receive Diagnostic Page Format

    +----+-------------------------------+
    | Bit| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
    |Byte|   |   |   |   |   |   |   |   |
    +----+-------------------------------+
    |  0 |        Page Code (80h)        |  CLI Data page (vendor specific)
    +----+-------------------------------+
    |  1 |           Reserved            |
    +----+-------------------------------+
    |  2 |(MSB)                          |
    +----+---    Page Length (n-3)    ---+
    |  3 |                          (LSB)|
    +----+-------------------------------+
    |  4 |            Status             |  SUCCESS (1) or INPUT_REQUESTED (2)
    +----+-------------------------------+
    |  5 |             Delay             |  0.10 second delay before next cmd
    +----+-------------------------------+
    |  6 |                               |  (132 bytes maximum)
    +-  -+---       ASCII Text        ---+
    |  n |                               |
    +----+-------------------------------+

6.7.5 Virtual Maintenance Terminal Communications Protocol

The following sections describe the communications protocol developed to support the virtual maintenance terminal utility.
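
The page layouts in Figures 6-10 and 6-11 can be sketched in Python as follows. This is an illustration only, not the HSZUTIL implementation: the constant and function names are hypothetical, and the sketch only builds and parses the page bytes without issuing any SCSI command. The Delay byte is returned as a raw value; reading it as a count of 0.10-second units is an assumption based on the figure annotation.

```python
import struct

PAGE_CODE = 0x80                                # CLI Data page (vendor specific)
CMD_INQUIRY, CMD_ANSWER = 1, 2                  # byte 4 of the send page
STATUS_SUCCESS, STATUS_INPUT_REQUESTED = 1, 2   # byte 4 of the receive page
MAX_TEXT = 132                                  # maximum ASCII text bytes


def build_send_page(cmd, text=b""):
    """Build a CLI send diagnostic page laid out as in Figure 6-10."""
    if cmd not in (CMD_INQUIRY, CMD_ANSWER):
        raise ValueError("unknown CLI command code")
    if cmd == CMD_INQUIRY and text:
        raise ValueError("ASCII text is used for ANSWER only")
    if len(text) > MAX_TEXT:
        raise ValueError("ASCII text is limited to 132 bytes")
    body = bytes([cmd, 0]) + text   # command code, reserved byte, text
    # Page Length (n-3) counts every byte that follows the 4-byte header,
    # stored most significant byte first.
    return struct.pack(">BBH", PAGE_CODE, 0, len(body)) + body


def parse_receive_page(page):
    """Split a CLI receive diagnostic page (Figure 6-11) into its fields."""
    if len(page) < 6:
        raise ValueError("page too short")
    code, _reserved, length = struct.unpack_from(">BBH", page, 0)
    if code != PAGE_CODE:
        raise ValueError("not a CLI Data page")
    body = page[4:4 + length]
    if len(body) < 2:
        raise ValueError("page length too small")
    status, delay = body[0], body[1]   # delay: raw byte (units assumed 0.10 s)
    return status, delay, body[2:]


# A Ctrl/C delivered as an ANSWER page.
page = build_send_page(CMD_ANSWER, b"\x03")
print(page.hex())
```

With this layout, a page that carries no ASCII text has a Page Length of 2, because only the command (or status) byte and the following byte remain after the header.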
6.7.5.1 Protocol Notes

The virtual maintenance terminal protocol allows asynchronous delivery of control characters using the CLI SEND DIAGNOSTIC PAGE command. The CLI Command Code field is set to ANSWER, and the control character is placed in the first byte of the ASCII text buffer. Any other characters in the ASCII text buffer are ignored.

There is no fixed connection made between the host process and the HSZ-series controller. It is therefore possible to implement a host interface that allows a user to exit the host program while a program is running within the HSZ-series controller. The terminal session could be resumed at a later time. This also implies that if multiple users attempt to have simultaneous virtual terminal sessions, the resulting responses from the controller may be unpredictable.

6.7.5.2 Host Virtual Terminal I/O Algorithm

The host virtual maintenance terminal I/O algorithm performs the following sequence of events:

1. Obtain the device information.
2. Issue a SCSI INQUIRY command and display the returned INQUIRY information.
3. Make sure the remote device supports the protocol's diagnostic pages by issuing a SCSI RECEIVE DIAGNOSTIC RESULTS command for page 0 and comparing the received list with the virtual terminal protocol list. If the diagnostic pages are not supported, then exit.
4. Issue SCSI TEST UNIT READY commands until either the device becomes available or a failure occurs. If a failure occurs, then exit.
5. To start communication, send Ctrl/C to place the HSZ-series controller CLI into a known state. This is done by issuing a SCSI SEND DIAGNOSTIC command for the CLI DATA PAGE, with the CLI Command Code set to ANSWER and a Ctrl/C character in the first byte of the ASCII Text field. If this fails, exit.
6. Process the following code:

   Do
       If a message was received from the drive, process it.
       If the message length is greater than 2,
       {
           Print the message.
           If we have a log file, log the message.
           If the message was a SCSI_CLI_INPUT_REQUEST,
           {
               Get terminal input.
               If we have a log file, log the terminal input.
               If the first character is a '^', the user is trying to
               send a control character, so convert the string into
               the appropriate control character.
               If we got "End of File" on the input string,
                   put a ^C in the input string to abort the program.
               Send the input string to the remote program.
           }
       }
       Else
       {
           This is a keep alive message, so ignore it.
       }

7. If the CLI has asked for a polling delay, sleep for the delay period. Repeat from step 6 until End Of File is received on the terminal read or until an error occurs while communicating with the HSZ-series controller.

7
------------------------------------------------------------
Removing and Replacing Field Replaceable Units

This chapter describes how to remove and replace/install the following FRUs in both dual-redundant and nonredundant configurations:

· Controller module (including its mounting bracket, OCP, and bezel)
· Cache module
· Program card
· Internal host cable (CI)
· External host cables (CI)
· Host cable (DSSI and SCSI)
· SCSI device port cables
· Blowers
· Power supplies

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Do not attempt to replace or repair components within FRUs or equipment damage may result. Use the controller fault indications and error logs to isolate FRU-level failures.
------------------------------------------------------------

This chapter also discusses how to warm swap controllers and storage devices.
7.1 Controller Module

Servicing a controller module involves several considerations:

· Diagnosing the controller
· Shutting down controllers
· Deciding what to replace
  - A nonredundant controller
  - One dual-redundant controller
  - Both dual-redundant controllers

7.1.1 Diagnosing the Controller

When you are presented with a controller failure, keep the following in mind. Generally, if the green OCP reset (//) button is lit continuously, the controller module needs replacing. However, you need to be as familiar as possible with the failure or reason for replacing the module. Be sure you have followed troubleshooting basics:

1. Make a note of all visual indicators (OCP, device LEDs, and/or error messages) available to you.
2. Extract and read host error logs (Chapter 5).
3. Errors can be intermittent. Reset the controller to see if the error clears. (Record which devices have lit/flashing fault LEDs before resetting, as a reset may temporarily clear the LED even though the fault remains.)
4. See if the error indication changes after resetting the controller. If the error remains the same, look up information for that error. If the indication changes, look up information for the newer error.

Refer to Chapter 5 for detailed information about errors and repair actions.

------------------------------------------------------------
Before Proceeding
------------------------------------------------------------
You should decide exactly what you will be servicing (a nonredundant controller, one dual-redundant controller, or both dual-redundant controllers) before proceeding to the following sections, as each procedure varies and has different consequences.
------------------------------------------------------------

7.1.2 Shutting Down a Controller

Controller failures are not the only reason to remove and replace a controller module. You may be moving resources, or removing a functioning controller for use as a replacement somewhere else in your system.

------------------------------------------------------------
Note
------------------------------------------------------------
If you wish to quickly remove and replace one controller in a dual-redundant configuration, you may warm swap (see Section 7.11.2) the controller with a replacement, if you have one. This method provides the fastest, most transparent way of exchanging controllers with minimal system impact and no down time. Unless you are warm swapping a controller, you must shut down a functional controller before removing it.
------------------------------------------------------------

Use the following guidelines to shut down a controller:

· Always stop all processes on, and dismount, devices attached to a controller you intend to shut down.
· To enter any CLI> SHUTDOWN command, your terminal must be connected to a fully or partially functional controller. A fully functional controller's green OCP reset (//) LED flashes at 1 Hz. A partially functional controller's green LED may flash at 3 Hz.
· You cannot enter CLI> SHUTDOWN commands from terminals connected to failed controllers (green LED lit continuously).
  - For dual-redundant configurations only: You may enter the CLI> SHUTDOWN OTHER_CONTROLLER command from a terminal connected to one of the controllers. The other (shutdown) controller's green LED will light continuously when shutdown completes. After you shut down one controller in a dual-redundant configuration, the surviving controller takes over service to the shut-down controller's devices. This process is called failover.
  - For both nonredundant and dual-redundant configurations: You may enter the CLI> SHUTDOWN THIS_CONTROLLER command from a terminal connected to the controller you want to shut down. The shutdown controller's green LED will light continuously when shutdown completes.

See Appendix B for a complete description of the SHUTDOWN command and its qualifiers. Be sure to understand the consequences to data and devices when using any qualifiers.

7.1.3 Nonredundant Controller

When you replace the controller module in a nonredundant configuration, device service is interrupted for the duration of the service cycle. (Nonredundant controllers will always be installed in slot (SCSI ID) 7. Slot 7 is the controller shelf slot furthest from the SCSI device cable connectors.)

(HSZ-series controllers) In effect, following these procedures to remove and replace an HSZ-series controller is ``warm swapping'' the controller. This is because other targets on the host SCSI bus remain unaffected. However, take care not to confuse removing and replacing an HSZ-series controller with the special warm swap procedure for HSJ-series controllers described in Section 7.11.2.

7.1.3.1 Tools Required

You will need the following tools to remove or replace the controller module:

· ESD strap
· 3/32-inch Allen wrench
· 5/32-inch Allen wrench
· Flat-head screwdriver
· Small flat-head screwdriver

7.1.3.2 Precautions

Refer to Chapter 1 for ESD, grounding, module handling, and program card handling guidelines. Ground yourself to the cabinet grounding stud, shown in Figure 7-1, before servicing the controller module.

Figure 7-1 Cabinet Grounding Stud

7.1.3.3 Module Removal

Use the following procedure to remove the controller module:

1. If you have not done so already, unlock and open the cabinet doors (SW800 series) using a 5/32-inch Allen wrench.
2. Examine the green OCP reset (//) LED, shown in Figure 7-2, on the controller. If the green LED stays lit continuously after troubleshooting (refer to Section 7.1.1), the controller has failed and is already shut down. Proceed to step 6.
3. If the controller is fully or partially functioning (green LED flashing), connect a maintenance terminal to its MMJ, shown in Figure 7-2, and enter the following commands:

    CLI> SHOW THIS_CONTROLLER FULL
    CLI> SHOW DEVICES FULL
    CLI> SHOW UNITS FULL

Figure 7-2 Reset LED, HSJ40 Controller

4. Record the output from the commands and keep it available for reference.

------------------------------------------------------------
Note
------------------------------------------------------------
Never remove a controller while it is still servicing devices.
------------------------------------------------------------

5. Because the controller is still functioning, you must shut down the controller by following the guidelines listed in Section 7.1.2.

------------------------------------------------------------
Note
------------------------------------------------------------
Earlier controller models had a program card EMI shield. This shield may be discarded.
------------------------------------------------------------

6. Unsnap and discard the program card EMI shield (if attached; see Figure 7-2).

Figure 7-3 Eject Button, HSJ40 Controller

7. Remove the program card by pushing the eject button, shown in Figure 7-3. Pull the card out and save it for use in the replacement controller module.
8. HSJ-series: Loosen the captive screws on the CI cable connector, shown in Figure 7-3, with a flat-head screwdriver and remove the cable from the front of the controller module.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Do not remove host port cables from an HSD-series controller while the power is on to any members on the DSSI bus, including the controller and host. Doing so risks short circuits that may blow fuses on all the members.
------------------------------------------------------------

   HSD-series: Turn off power to all members on the DSSI bus. Then, with a flat-head screwdriver, loosen the captive screws on the DSSI cable connector and terminator, and remove them from the trilink connector, shown in Figure 7-4.

Figure 7-4 Trilink Connector

   HSZ-series: With a small flat-head screwdriver, loosen the captive screws on the trilink connector and remove the trilink from the front of the controller. You will have to work around any SCSI cable or terminator connections when removing the trilink. Do not remove cables or terminators from the trilink or you will interrupt the host SCSI bus.
9. Remove the maintenance terminal cable (if attached).
10. Loosen the four mounting screws (refer to Figure 7-3) on each side of the front bezel with a 3/32-inch Allen wrench (HSJ-series controllers) or flat-head screwdriver (HSD- and HSZ-series controllers).
11. Use a gentle up-and-down rocking motion to loosen the module from the shelf backplane.
12. Slide the module out of the shelf (noting which rails the module was seated in) and place it on an approved ESD work surface or mat.
13. If necessary, you may now remove the cache module as described in Section 7.2.3.

7.1.3.4 Module Replacement/Installation

Use the following procedure to replace or install the controller module:

1. You should replace the cache module now, if you removed it. See Section 7.2.4 for further information on replacing or installing the cache module.
2. Make sure the OCP cable (HSJ-series only) is correctly plugged into the underside of the module, as shown in Figure 7-5.
3. Slide the controller module into the shelf using its slot's rightmost rails as guides (see Figure 7-6).

Figure 7-5 OCP Cable, HSJ-Series Controller

4. Use a gentle up-and-down rocking motion to help seat the module into the backplane. Press firmly on the module until it is seated. Finally, press firmly once more to make sure the module is seated.
5. Tighten the four screws on the front bezel using a 3/32-inch Allen wrench (HSJ-series controllers) or flat-head screwdriver (HSD- and HSZ-series controllers).
6. Connect a maintenance terminal to the MMJ of the new controller.

------------------------------------------------------------
Before Proceeding
------------------------------------------------------------
Set initial controller parameters by following the steps in Section 7.1.3.5.
------------------------------------------------------------

7. Press and hold the controller's green reset (//) button. Then insert the program card into the new controller. The program card eject button will extend when the card is fully inserted.
8. Release the reset button.
9. Enter the following command to initialize the controller:

    CLI> RESTART THIS_CONTROLLER

   If the controller initializes correctly, its green reset LED will begin to flash at 1 Hz. If an error occurs during initialization, the OCP will display a code. Refer to Chapter 5 to analyze the code.
10. If you wish, you may disconnect the maintenance terminal. The terminal is not required for normal controller operation.

Figure 7-6 Controller Shelf Rails

11. Close and lock the cabinet doors (SW800 series) using a 5/32-inch Allen wrench.

7.1.3.5 Restoring Initial Parameters

A new controller module has no initial parameters, so you must use the maintenance terminal to enter them.
Refer to information in a CONFIGURATION.INFO file or on the configuration sheet packaged with your system, whichever is most current, for parameters. Be sure to use the same parameters from the removed controller when installing a replacement.

After installation of a nonredundant controller, use the CLI to define its parameters in the following order (from a maintenance terminal).

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Do not install HSJ-series CI host port cables until after setting all parameters listed here. Failure to follow this procedure may result in adverse effects on the host/cluster.
------------------------------------------------------------

------------------------------------------------------------
Note
------------------------------------------------------------
Not all steps are applicable to all controller models. Steps applicable to certain models are designated as such.
------------------------------------------------------------

1. (HSD-series controller) Turn the controller on before entering parameters.
2. Enter the following command to set the MAX_NODES (HSJ-series controllers):

    CLI> SET THIS_CONTROLLER MAX_NODES=n

   where n is 8, 16, or 32.
3. Enter the following command to set a valid controller ID:

    CLI> SET THIS_CONTROLLER ID=n

   where n is the (HSJ-series controller) CI node number (0 through (MAX_NODES - 1)).
   or
   n is the (HSD-series controller) one-digit DSSI node number (0 through 7). Each controller DSSI node number must be unique on its DSSI interconnect.
   or
   n is the (HSZ-series controller) SCSI target ID(s) (0 through 7).
4. Enter the following command to set the SCS node (HSJ- and HSD-series controllers):

    CLI> SET THIS_CONTROLLER SCS_NODENAME="xxxxxx"

   where xxxxxx is a one- to six-character alphanumeric name for this node. The node name must be enclosed in quotes with an alphabetic character first.
   Each SCS node name must be unique within its VMScluster. (Refer to Chapter 4 for important information about VMS node names.)
5. Enter the following command to set the MSCP allocation class (HSJ- and HSD-series controllers):

    CLI> SET THIS_CONTROLLER MSCP_ALLOCATION_CLASS=n

   where n is 0 through 255.
6. Enter the following command to set the TMSCP allocation class (HSJ- and HSD-series controllers):

    CLI> SET THIS_CONTROLLER TMSCP_ALLOCATION_CLASS=n

   where n is 0 through 255.

------------------------------------------------------------
Note
------------------------------------------------------------
Always restart the controller after setting the ID, SCS node name, or allocation classes.
------------------------------------------------------------

7. Restart the controller either by pressing the green reset (//) button, or entering the following command:

    CLI> RESTART THIS_CONTROLLER

8. Enter the following command to verify the preceding parameters were set:

    CLI> SHOW THIS_CONTROLLER FULL

9. Connect the host port cable to the front of the controller.

   HSJ-series: Connect the CI cable and tighten its captive screws with a flat-head screwdriver.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Do not connect host port cables to an HSD-series controller while the power is on to any members on the DSSI bus, including the controller and host. Doing so risks short circuits that may blow fuses on all the members.
------------------------------------------------------------

   HSD-series: Disconnect controller power. Then connect the DSSI cable and the terminator to the trilink connector, and tighten their captive screws. Restore power to all members on the DSSI bus.

   HSZ-series: Connect the SCSI cable trilink connector to the front of the controller and tighten its captive screws with a small flat-head screwdriver. You will have to work around any SCSI cable or terminator connections when replacing the trilink. Do not remove cables or terminators from the trilink or you will interrupt the host SCSI bus.
10. Enter the following commands to enable CI paths A and B to the host (HSJ-series controllers):

    CLI> SET THIS_CONTROLLER PATH_A
    CLI> SET THIS_CONTROLLER PATH_B

   Enter the following command to enable the host port path (HSD-series controllers):

    CLI> SET THIS_CONTROLLER PATH

   The host port path for HSZ-series controllers is always on, so no command is needed.

To automatically configure devices on the controller, use the CONFIG utility described in Chapter 6.

For manual configuration, the following steps add devices, storage sets, and logical units. Use the CLI to complete these steps so that the host will recognize the storage device. (These steps can be run from a virtual terminal.)

1. Add the physical devices by using the following command:

    CLI> ADD device-type device-name scsi-location

   where:

    device-type is the type of device to be added. This can be DISK, TAPE, or CDROM.
    device-name is the name used to refer to that device. The name is referenced when creating units or storage sets.
    scsi-location is the port, target, and LUN (PTL) for the device. When entering the PTL, at least one space must separate the port, target, and LUN.

   For example:

    CLI> ADD DISK DISK100 1 0 0
    CLI> ADD TAPE TAPE510 5 1 0
    CLI> ADD CDROM CDROM0 6 0 0

2. Add the storage sets for the devices. See Appendix B for examples of adding storage sets. (If you do not want storage sets in your configuration, proceed to step 3.)

------------------------------------------------------------
CAUTION
------------------------------------------------------------
The INITIALIZE command destroys all data on a container.
See Appendix B for specific information on this command.
------------------------------------------------------------

3. Enter the following command to initialize the containers (devices and/or storage sets) prior to adding logical units to the configuration:

    CLI> INITIALIZE container-name

   where container-name is a device or storage set that will become part of a unit.

   When initializing a single-device container:

   · If NOTRANSPORTABLE (the default) was specified when the device was added, a small amount of disk space was made inaccessible to the host and used for metadata. The metadata will now be initialized.
   · If TRANSPORTABLE was specified, any metadata on the device will now be destroyed.

   Refer to Chapter 4 for details on metadata and when INITIALIZE is required.
4. Add the units that use either the devices or the storage sets built from the devices by entering the following command:

    CLI> ADD UNIT logical-unit-number container-name

   where:

    logical-unit-number is the unit number the host uses to access the device.
    container-name identifies the device or the storage set.

5. Use the following commands to verify that your configuration matches the earlier, printed configuration:

    CLI> SHOW DEVICES FULL
    CLI> SHOW UNITS FULL

7.1.4 One Dual-Redundant Controller

------------------------------------------------------------
CAUTION
------------------------------------------------------------
To perform the procedures in this section, at least one controller must be functioning.
------------------------------------------------------------

To replace one controller in a dual-redundant configuration (or both, one at a time), use the second controller to service devices while the first controller is absent. This procedure causes no service outage, but system performance will decrease slightly while one controller does the work of two.

------------------------------------------------------------
Note
------------------------------------------------------------
HSD-series controllers: You cannot effectively remove the HSD-series controller in slot (SCSI ID) 7 because of interference from the trilink connector attached to the companion controller. Remove the companion's trilink connector first in this case.
------------------------------------------------------------

7.1.4.1 Tools Required

You will need the following tools to remove or replace the controller module:

· ESD strap
· 3/32-inch Allen wrench
· 5/32-inch Allen wrench
· Flat-head screwdriver

7.1.4.2 Precautions

Refer to Chapter 1 for ESD, grounding, module handling, and program card handling guidelines. Ground yourself to the cabinet grounding stud (refer to Figure 7-1) before servicing the controller module.

7.1.4.3 Module Removal

Use the following procedure to remove the controller module:

1. If you have not done so already, unlock and open the cabinet doors (SW800 series) using a 5/32-inch Allen wrench.
2. Examine the green OCP reset (//) LED (refer to Figure 7-2) on both controllers. At least one green LED should not remain lit continuously after basic troubleshooting (refer to Section 7.1.1). If both green LEDs stay lit continuously, both controllers have failed. Refer to Section 7.1.5.
3. Connect a maintenance terminal to the MMJ (refer to Figure 7-2) of each functioning or partially functioning controller, and enter the following commands:

    CLI> SHOW THIS_CONTROLLER FULL
    CLI> SHOW DEVICES FULL
    CLI> SHOW UNITS FULL

4. Record the output from the commands and keep it available for reference.

------------------------------------------------------------
Note
------------------------------------------------------------
Never remove a controller while it is still servicing devices.
------------------------------------------------------------

5. If the controller you are removing is still functioning (green LED flashing), you must shut down the controller by following the guidelines in Section 7.1.2. If the controller's green LED is lit continuously, it has already shut down, and the surviving controller has assumed service to its devices.

------------------------------------------------------------
Note
------------------------------------------------------------
Early controller models had a program card EMI shield. This shield may be discarded.
------------------------------------------------------------

6. On the controller you are removing, unsnap and discard the program card EMI shield (if attached; refer to Figure 7-2).
7. Remove the program card by pushing the eject button (refer to Figure 7-3) next to the card. Pull the card out and save it for use in the replacement controller module.
8. HSJ-series: Loosen the captive screws on the CI cable connector (refer to Figure 7-3) with a flat-head screwdriver and remove the cable from the front of the controller module.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Do not remove host port cables from an HSD-series controller while the power is on to any members on the DSSI bus, including the controller and host. Doing so risks short circuits that may blow fuses on all the members.
------------------------------------------------------------

   HSD-series: Turn off power to all members on the DSSI bus. Then, with a flat-head screwdriver, loosen the captive screws on the DSSI cable connector and terminator, and remove them from the trilink connector. (If necessary for controller access, loosen the captive screws on the trilink connector and remove it from the front of the companion controller.)
9. Remove the maintenance terminal cable (if attached).
10. Loosen the four screws (refer to Figure 7-3) on each side of the front bezel with a 3/32-inch Allen wrench (HSJ-series controllers) or flat-head screwdriver (HSD- and HSZ-series controllers).
11. Use a gentle up-and-down rocking motion to loosen the module from the shelf backplane.
12. Slide the module out of the shelf (noting which rails the module was seated in) and place it on an approved ESD work surface or mat.
13. If necessary, you may now remove the cache module as described in Section 7.2.3.

7.1.4.4 Module Replacement/Installation

Use the following procedure to replace the controller module:

1. Replace the cache module now, if you removed it. Refer to Section 7.2.4.
2. Make sure the OCP cable (HSJ-series only) is correctly plugged into the underside of the module (refer to Figure 7-5).
3. Slide the controller module into the shelf using its slot's rightmost rails as guides (refer to Figure 7-6).
4. Use a gentle up-and-down rocking motion to help seat the module into the backplane. Press firmly on the module until it is seated. Finally, press firmly once more to make sure the module is seated.
5. Tighten the four screws on the front bezel using a 3/32-inch Allen wrench (HSJ-series controllers) or flat-head screwdriver (HSD- and HSZ-series controllers).
6. Connect a maintenance terminal to the MMJ of the new controller.

------------------------------------------------------------
Before Proceeding
------------------------------------------------------------
Restore initial controller parameters by following the steps in Section 7.1.4.5.
------------------------------------------------------------

7. Press and hold both controllers' green reset (//) buttons. Then insert the program card into the new controller. The program card eject button will extend when the card is fully inserted.
8. Release both reset buttons.
9. Enter the following command to initialize the controller:

    CLI> RESTART THIS_CONTROLLER

   If the controllers initialize correctly, their green LEDs will begin to flash at 1 Hz. If an error occurs during initialization, the OCP will display a code. Refer to Chapter 5 to analyze the code.
10. If you wish, you may disconnect the maintenance terminal. The terminal is not required for normal controller operation.
11. Close and lock the cabinet doors (SW800 series) using a 5/32-inch Allen wrench.

7.1.4.5 Restoring Initial Parameters

A new controller module has no initial parameters, so you must use a maintenance terminal to enter them. Refer to information in the CONFIGURATION.INFO file or on the configuration sheet packaged with your system, whichever is most current, for parameters. Be sure to use the same parameters from the removed controller when installing a replacement. Follow these steps:

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Do not install HSJ-series CI host port cables until after setting all parameters listed here. Failure to follow this procedure may result in adverse effects on the host/cluster.
------------------------------------------------------------

------------------------------------------------------------
CAUTION
------------------------------------------------------------
SET FAILOVER establishes controller-to-controller communication and copies configuration information. Always enter this command on one controller only. COPY=configuration-source specifies where the good configuration data are located. Never blindly specify SET FAILOVER. Know where your good configuration information resides before entering the command.
------------------------------------------------------------

------------------------------------------------------------
Note
------------------------------------------------------------
Not all steps are applicable to all controller models. Steps applicable to certain models are designated as such.
------------------------------------------------------------

1. (HSD-series controller) Power the controller on before entering parameters.
2. Enter the following command to copy configuration information to the new controller:

    CLI> SET FAILOVER COPY=OTHER_CONTROLLER

3. Enter the following command to set the MAX_NODES (HSJ-series controllers):

    CLI> SET THIS_CONTROLLER MAX_NODES=n

   where n is 8, 16, or 32.
4. Enter the following command to set a valid controller ID:

    CLI> SET THIS_CONTROLLER ID=n

   where n is the (HSJ-series controller) CI node number (0 through (MAX_NODES - 1)).
   or
   n is the (HSD-series controller) one-digit DSSI node number (0 through 7). Each controller DSSI node number must be unique on its DSSI interconnect.
5. Enter the following command to set the SCS node:

    CLI> SET THIS_CONTROLLER SCS_NODENAME="xxxxxx"

   where xxxxxx is a one- to six-character alphanumeric name for this node. The node name must be enclosed in quotes with an alphabetic character first. Each SCS node name must be unique within its VMScluster. (Refer to Chapter 4 for important information about VMS node names.)
6. Enter the following command to set the MSCP allocation class:

    CLI> SET THIS_CONTROLLER MSCP_ALLOCATION_CLASS=n

   where n is 1 through 255. Digital recommends providing a unique allocation class value for every pair of dual-redundant controllers in the same cluster.
7. Enter the following command to set the TMSCP allocation class:

    CLI> SET THIS_CONTROLLER TMSCP_ALLOCATION_CLASS=n

   where n is 1 through 255.

------------------------------------------------------------
Note
------------------------------------------------------------
Always restart the controllers after setting the ID, SCS node name, or allocation classes.
------------------------------------------------------------

8. Restart both controllers either by pressing the green reset (//) buttons, or entering the following commands:

    CLI> RESTART OTHER_CONTROLLER
    CLI> RESTART THIS_CONTROLLER

9. Enter the following commands to verify the preceding parameters were set:

    CLI> SHOW THIS_CONTROLLER
    CLI> SHOW OTHER_CONTROLLER

10. Connect the host port cables to the front of the controllers. Do not connect the controllers in a dual-redundant pair to separate, different host CPUs.

   HSJ-series: Connect the CI cable and tighten its captive screws with a flat-head screwdriver.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Do not connect host port cables to an HSD-series controller while the power is on to any members on the DSSI bus, including the controller and host. Doing so risks short circuits that may blow fuses on all the members.
------------------------------------------------------------

   HSD-series: Disconnect controller power. Then connect the DSSI cable and the terminator to the trilink connector, and tighten their captive screws. Restore power to all members on the DSSI bus.
11. Enter the following commands to enable CI paths A and B to the host (HSJ-series controllers):

    CLI> SET THIS_CONTROLLER PATH_A
    CLI> SET THIS_CONTROLLER PATH_B
    CLI> SET OTHER_CONTROLLER PATH_A
    CLI> SET OTHER_CONTROLLER PATH_B

   Enter the following commands to enable the host port path (HSD-series controllers):

    CLI> SET THIS_CONTROLLER PATH
    CLI> SET OTHER_CONTROLLER PATH

12. Use the following commands to verify your configuration matches the earlier, printed configuration before proceeding:

    CLI> SHOW DEVICES FULL
    CLI> SHOW UNITS FULL

7.1.5 Both Dual-Redundant Controllers

In the rare event that both controllers in your dual-redundant configuration fail, both controllers' green OCP reset (//) LEDs will be lit continuously. You will have to replace both controller modules.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Simultaneously replacing both controllers in a dual-redundant configuration causes system down time for the duration of the service cycle. Digital recommends using this procedure only if both controllers fail, or if your system is off line already for another reason. Otherwise, to replace both controllers, follow the steps in Section 7.1.4. Replace the controllers one at a time and maintain device service.
------------------------------------------------------------

Use the following guidelines to simultaneously replace both controllers:

1. Examine the green OCP reset (//) LED on both controllers. Follow basic troubleshooting guidelines (refer to Section 7.1.1), if necessary.
2. For any fully or partially functioning controller, connect a terminal and enter the following commands:

    CLI> SHOW THIS_CONTROLLER FULL
    CLI> SHOW DEVICES FULL
    CLI> SHOW UNITS FULL

3. Record the output from the commands and keep it available for reference.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Never remove a controller while it is still servicing devices.
------------------------------------------------------------

4. Shut down any fully or partially functioning controller (green LED flashing) by following the guidelines in Section 7.1.2.
5. Remove both controllers by referring to steps 6 through 13 in Section 7.1.3.3.
6.
Replace the first of the controllers as if this were a nonredundant configuration (refer to Section 7.1.3.4). 7. Replace the second controller by following the dual-redundant procedure (refer to Section 7.1.4.4). 7.2 Cache Module Most controller modules will have a read cache module installed behind them in the controller shelf. Currently there are two read cache modules available: 16 MB and 32 MB. 7.2.1 Tools Required You will need the following tools to remove or replace the read cache module: · ESD strap · nonconductive ESD mat · 3/32-inch Allen wrench · 5/32-inch Allen wrench · Flat-head screwdriver 7.2.2 Precautions Refer to Chapter 1 for ESD, grounding, module handling, and program card handling guidelines. Ground yourself to the cabinet grounding stud (Figure 7-1) before servicing the read cache module. 7.2.3 Module Removal Use the following procedure to remove the read cache module: 1. The controller module is seated in front of the read cache module. Any time you service a read cache, you must shut down the controller(s) based on considerations of configuration, down time, and so on. Refer to Section 7.1. 2. To access the read cache module, remove its controller module. Refer to Section 7.1. 3. Use a gentle up-and-down rocking motion to loosen the module from the shelf backplane. 4. Slide the read cache module out of the shelf, noting which rails it was seated in, and place it on an approved ESD mat. 7.2.4 Module Replacement/Installation Use the following procedure to replace the read cache module: 1. The controller module is seated in front of the read cache module. Any time you service a read cache, you must shut down controller(s) based on considerations of configuration, down time, and so on. Refer to Section 7.1. 2. To replace the read cache module, its controller module must already be removed. (You should replace the read cache module before reinstalling the controller module.) 3. 
Slide the read cache module into the shelf using its slot's leftmost rails as guides (refer to Figure 7-6).
4. Press firmly and use a gentle up-and-down rocking motion on the module until it is seated. Finally, press firmly once more to make sure the module is seated.
5. Replace the controller module. Refer to Section 7.1.
7.2.5 Upgrading Cache Modules
You can upgrade a cache module by increasing memory capacity as follows:
1. Determine your cache module type by entering the SHOW THIS_CONTROLLER command. The following information is displayed:
   CLI> SHOW THIS_CONTROLLER
   Controller:
       HSJ40 CX01234561 Software V1.4, Hardware 0000
       Not configured for dual-redundancy
       SCSI address 7
   Host port:
       Node name: HSJA7, valid CI node 29, 32 max nodes
       System ID 4200101DF52F
       Path A is ON
       Path B is ON
       MSCP allocation class 3
       TMSCP allocation class 3
   Cache:
       16 megabyte read cache, version 1
       Cache is GOOD
   Note the cache module size, cache version number, and firmware version.
------------------------------------------------------------
Note
------------------------------------------------------------
If you upgrade from 16- to 32-MB read cache, you will need to return the 16-MB module to Digital for replacement when you order the upgrade.
An HSJ40 controller may have a version 1 or 2 cache module. All HSJ30, HSD30, and HSZ40 models will have version 2 cache modules. You must also run HS operating firmware Version 1.4 or higher to operate any version 2 or higher cache module. (Version 1 cache modules are also compatible with firmware Version 1.4.)
------------------------------------------------------------
2. See Tables 7-1 through 7-4 to find and order the part number you need for the upgrade:

Table 7-1 Cache Upgrade, HSJ40 Controller
------------------------------------------------------------
Current Cache          Desired Cache    Option Required
------------------------------------------------------------
16 MB (Ver. 1 or 2)    32 MB            HSJ40-XE
------------------------------------------------------------

Table 7-2 Cache Upgrade, HSJ30 Controller
------------------------------------------------------------
Current Cache          Desired Cache    Option Required
------------------------------------------------------------
None                   16 MB            HSJ30-XD
None                   32 MB            HSJ30-XF
16 MB                  32 MB            HSJ30-XE
------------------------------------------------------------

Table 7-3 Cache Upgrade, HSD30 Controller
------------------------------------------------------------
Current Cache          Desired Cache    Option Required
------------------------------------------------------------
None                   16 MB            HSD30-XD
None                   32 MB            HSD30-XF
16 MB                  32 MB            HSD30-XE
------------------------------------------------------------

Table 7-4 Cache Upgrade, HSZ40 Controller
------------------------------------------------------------
Current Cache          Desired Cache    Option Required
------------------------------------------------------------
None                   16 MB            HSZ40-XD
None                   32 MB            HSZ40-XF
16 MB                  32 MB            HSZ40-XE
------------------------------------------------------------

3. If necessary, remove the cache module as described in Section 7.2.3.
4. Insert the upgraded cache module by following the steps in Section 7.2.4.
7.3 Program Card
Whenever you remove a failed controller module (refer to Section 7.1), you remove the PCMCIA program card. However, there are times when you need to remove only the program card, such as when you install updated firmware. You are allowed to remove one or both program cards from a dual-redundant configuration, or one card from a nonredundant configuration.
------------------------------------------------------------
Note
------------------------------------------------------------
When you update firmware, you must remove both program cards from a dual-redundant configuration. Furthermore, the two cards in a dual-redundant configuration must contain the same version of firmware.
------------------------------------------------------------
Use the procedures in this section when you are removing and replacing only the program card.
7.3.1 Tools Required
You will need a 5/32-inch Allen wrench to remove or replace the program card.
7.3.2 Precautions
Refer to Chapter 1 for program card handling guidelines.
7.3.3 Card Removal
Use the following procedure to remove the program card:
1. If you have not done so already, unlock and open the cabinet doors (SW800 series) using a 5/32-inch Allen wrench.
2. Examine the green OCP reset (//) LED(s) on the controller(s). They should be flashing. If a green LED is lit continuously, its controller has failed. To service the controller, refer to Section 7.1.
------------------------------------------------------------
Note
------------------------------------------------------------
You need not record configuration information; the configuration information is not lost when you remove a program card.
------------------------------------------------------------
3. Connect a maintenance terminal to the MMJ of the controller(s) you are removing the program card from, and shut down the controller(s) by following the guidelines in Section 7.1.2. The green LED(s) should light continuously when shutdown completes.
------------------------------------------------------------
Note
------------------------------------------------------------
Earlier controller models had a program card EMI shield. This shield may be discarded.
------------------------------------------------------------
4. Unsnap and discard the program card EMI shield(s), if attached.
5. Remove the program card(s) by pushing the eject button(s) (refer to Figure 7-3) next to the card(s).
6. Pull the card(s) out.
7. If you are updating firmware, follow the instructions included with your new firmware for used card return or disposal.
7.3.4 Card Replacement/Installation
Use the following procedure to replace the program card:
------------------------------------------------------------
Note
------------------------------------------------------------
If you are updating firmware, install your new program card(s) by following the instructions included with the card(s). Otherwise, you may use the following guidelines to replace the program card(s).
------------------------------------------------------------
1. For a nonredundant configuration: Press and hold the controller green OCP reset (//) button. Then insert the program card. The program card eject button will extend when the card is fully inserted.
   For a dual-redundant configuration: Press and hold both green reset buttons at the same time, even if you are only replacing one of the cards. Then insert the program card(s). The program card eject button will extend when the card is fully inserted.
2. Release the reset button(s) to initialize the controller(s). If the controller(s) initialize correctly, the green reset LED(s) will begin to flash at 1 Hz. If an error occurs during initialization, the OCP(s) will display a code. Refer to Chapter 5 to analyze any codes.
3. If you wish, you may disconnect the maintenance terminal. The terminal is not required for normal controller operation.
4. Close and lock the cabinet doors (SW800 series) using a 5/32-inch Allen wrench.
7.4 External CI Cables (HSJ-Series)
Use the procedures in this section when you are removing and replacing external CI cables.
7.4.1 Tools Required
You will need a 5/32-inch Allen wrench to remove or replace external CI cables.
7.4.2 Precautions
Refer to Chapter 1 for CI cable handling guidelines.
7.4.3 Cable Removal
Use the following procedure to remove external CI cables:
1. The CI interface includes two connections (paths A and B). Determine which paths are suspect before proceeding.
Refer to Chapter 5 for troubleshooting guidelines.
------------------------------------------------------------
Note
------------------------------------------------------------
When only one external CI cable requires replacement, you need only halt activity and disconnect cables for the (one) suspect path.
------------------------------------------------------------
2. For the suspect path(s), enter one or both of the following commands to halt activity on the suspect host path(s):
   CLI> SET THIS_CONTROLLER NOPATH_A
   CLI> SET THIS_CONTROLLER NOPATH_B
------------------------------------------------------------
CAUTION
------------------------------------------------------------
Always disconnect the external CI cable from the star coupler first, then disconnect it from the internal CI cable second. Never leave unterminated paths on the star coupler. Never leave cables, terminated or not, attached at the star coupler and disconnected at the internal CI cable connector. This minimizes adverse effects on the cluster and prevents a short circuit between the two ground references.
------------------------------------------------------------
3. Disconnect the external CI cable connectors from the star coupler one at a time, in the following order (see Figure 7-7):
   TXA RXA TXB RXB
4. Attach terminators to the open star coupler connectors.
5. If necessary to access the internal/external CI cable connector, unlock and open the cabinet (SW800 series) using a 5/32-inch Allen wrench.
6. Disconnect the external CI cables from the internal CI cable.
7. Remove the cable.
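The connector ordering in the removal procedure above, and its reverse in the replacement procedure of Section 7.4.4, can be captured in a short checklist helper. The following Python sketch is illustrative only and is not part of the controller firmware or any Digital tool. The function names and the 'A'/'B' path labels are assumptions made for this example; the connector names and their order come from the procedures in this chapter.

```python
# Star coupler connector order for cable removal, as listed in Section 7.4.3.
REMOVAL_ORDER = ["TXA", "RXA", "TXB", "RXB"]


def disconnect_sequence(suspect_paths):
    """Return the star coupler connectors to disconnect, in order.

    suspect_paths is an iterable of path labels ('A' and/or 'B').
    Only connectors belonging to the suspect path(s) are returned,
    because only the suspect path needs to be disconnected.
    """
    paths = {p.upper() for p in suspect_paths}
    unknown = paths - {"A", "B"}
    if unknown:
        raise ValueError(f"unknown CI path(s): {sorted(unknown)}")
    # The last character of each connector name identifies its path.
    return [c for c in REMOVAL_ORDER if c[-1] in paths]


def connect_sequence(replaced_paths):
    """Return the connect order, which is the removal order reversed."""
    return list(reversed(disconnect_sequence(replaced_paths)))


if __name__ == "__main__":
    print(disconnect_sequence(["A", "B"]))  # ['TXA', 'RXA', 'TXB', 'RXB']
    print(connect_sequence(["A", "B"]))     # ['RXB', 'TXB', 'RXA', 'TXA']
    print(disconnect_sequence(["B"]))       # ['TXB', 'RXB']
```

A helper like this is useful only as a printed or on-screen checklist; it does not replace the CAUTION above about terminating open star coupler connections.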
Figure 7-7 External and Internal CI Cables (HSJ-series) 7-24 Removing and Replacing Field Replaceable Units 7.4.4 Cable Replacement/Installation Use the following procedure to replace the external CI cables: ------------------------------------------------------------ CAUTION ------------------------------------------------------------ Always connect the external CI cable to the internal CI cable first, then connect it to the star coupler second. Never leave unterminated paths on the star coupler. Never leave cables, terminated or not, attached at the star coupler and disconnected at the internal CI cable connector. This minimizes adverse effects on the cluster and prevents a short circuit between the two ground references. ------------------------------------------------------------ 1. Connect the external CI cables to the internal CI cable. 2. If necessary, close and lock the cabinet doors (SW800 series) using a 5/32-inch Allen wrench. 3. Remove any terminators from the star coupler connections. 4. Connect the external CI cable connectors to the star coupler one at a time, in the following order (refer to Figure 7-7): RXB TXB RXA TXA 5. For the replaced path(s), enter the following commands to resume activity on the replaced host path(s): CLI> SET THIS_CONTROLLER PATH_A CLI> SET THIS_CONTROLLER PATH_B 7.5 Internal CI Cables (HSJ-series) Servicing internal CI cables causes down time for the affected controller because both host paths (A and B) must be disabled for the duration of the procedure. Use the procedures in this section when you are removing and replacing internal CI cables. 7.5.1 Tools Required You will need the following tools to remove or replace internal CI cables: · 5/32-inch Allen wrench · Tie wrap cutters · Flat-head screwdriver 7.5.2 Precautions Refer to Chapter 1 for CI cable handling guidelines. Removing and Replacing Field Replaceable Units 7-25 7.5.3 Cable Removal Use the following procedure to remove internal CI cables: 1. 
You should determine that paths are, in fact, suspect before proceeding. Refer to Chapter 5 for troubleshooting guidelines. 2. Enter the following commands to halt activity on both host paths: CLI> SET THIS_CONTROLLER NOPATH_A CLI> SET THIS_CONTROLLER NOPATH_B ------------------------------------------------------------ CAUTION ------------------------------------------------------------ Always disconnect the external CI cable from the star coupler first, then disconnect it from the internal CI cable second. Never leave unterminated paths on the star coupler. Never leave cables, terminated or not, attached at the star coupler and disconnected at the internal CI cable connector. This minimizes adverse effects on the cluster and prevents a short circuit between the two ground references. ------------------------------------------------------------ 3. Disconnect the external CI cable connectors from the star coupler one at a time, in the following order (refer to Figure 7-7): TXA RXA TXB RXB 4. Attach terminators to the open star coupler connectors. 5. Unlock and open the cabinet (SW800 series) using a 5/32-inch Allen wrench. 6. Disconnect the external CI cables from the internal CI cable. 7. Loosen the captive screws on the internal CI cable where it attaches to the front of the controller using a flat-head screwdriver, and disconnect the internal CI cable from the controller. 8. Remove the internal CI cable from the cabinet, cutting tie wraps as necessary. 7.5.4 Cable Replacement/Installation Use the following procedure to replace internal CI cables: 1. Position and route the internal CI cable within the cabinet. 2. Connect the internal CI cable to the front of the controller, and tighten the captive screws on the internal CI cable where it attaches to the controller using a flat-head screwdriver. 
------------------------------------------------------------ CAUTION ------------------------------------------------------------ Always connect the external CI cable to the internal CI cable first, then connect it to the star coupler second. Never leave unterminated paths on the star coupler. Never leave cables, terminated or not, attached at the star coupler and disconnected at the internal CI cable connector. This minimizes adverse effects on the cluster and prevents a short circuit between the two ground references. ------------------------------------------------------------ 7-26 Removing and Replacing Field Replaceable Units 3. Connect the external CI cables to the internal CI cable. 4. Remove any terminators from the star coupler connections. 5. Connect the external CI cable connectors to the star coupler one at a time, in the following order (refer to Figure 7-7): RXB TXB RXA TXA 6. Install any tie wraps as necessary to hold the internal CI cable in place. 7. Close and lock the cabinet doors (SW800 series) using a 5/32-inch Allen wrench. 8. Enter the following commands to resume activity on the host paths: CLI> SET THIS_CONTROLLER PATH_A CLI> SET THIS_CONTROLLER PATH_B 7.6 DSSI Host Cables (HSD-series) Servicing DSSI host cables (Figure 7-8) causes system down time for all bus members because all power must be disconnected from every member on the DSSI bus before cable removal/replacement. Use the procedures in this section when you are removing and replacing DSSI host cables. (Optional) The trilink connector may be considered part of the DSSI host cable during service. ------------------------------------------------------------ CAUTION ------------------------------------------------------------ Do not service the host port cables of an HSD-series controller while the power is on to any members on the DSSI bus, including the controller and host. Doing so risks short circuits that may blow fuses on all the members. 
------------------------------------------------------------ 7.6.1 Tools Required You will need the following tools to remove or replace DSSI host cables: · 5/32-inch Allen wrench · Tie wrap cutters · Flat-head screwdriver 7.6.2 Precautions Refer to Chapter 1 for DSSI host cable handling guidelines. Removing and Replacing Field Replaceable Units 7-27 Figure 7-8 DSSI Host Cables 7.6.3 Cable Removal Use the following procedure to remove DSSI host cables: 1. Enter the following command to halt activity on the host path: CLI> SET THIS_CONTROLLER NOPATH 2. Disconnect power from all members, including the HSD-series controller and host, on the DSSI bus. 3. Disconnect the DSSI host cable from the host or other device (the device at the other end of the cable from the controller). 4. If necessary to access the HSD-series controller, unlock and open the cabinet (SW800 series) using a 5/32-inch Allen wrench. 5. Loosen the captive screws on the DSSI host cable where it attaches to the trilink connector on the front of the controller, and disconnect the cable. 6. Remove the DSSI host cable from the cabinet, cutting tie wraps as necessary. 7-28 Removing and Replacing Field Replaceable Units 7. (Optional) Loosen captive screws and remove the terminator or secondary DSSI host cable attached to the trilink connector. 8. (Optional) Loosen captive screws and remove the trilink connector from the front of the controller. 7.6.4 Cable Replacement/Installation Use the following procedure to replace DSSI host cables: 1. (Optional) Attach the trilink connector to the front of the controller and tighten its captive screws. 2. Position and route the DSSI host cable within the cabinet. 3. Connect the DSSI host cable to the trilink connector on the front of the controller, and tighten the captive screws on the DSSI host cable connector. 4. (Optional) Connect and tighten captive screws for the terminator or secondary DSSI host cable (at the open connection of the trilink connector). 5. 
Install any tie wraps as necessary to hold the DSSI host cable in place. 6. Close and lock the cabinet doors (SW800 series) using a 5/32-inch Allen wrench. 7. Connect the other end of the cable to the appropriate device on the bus. 8. Reapply power to the controller and devices on the DSSI bus. 9. Enter the following command to resume activity on the host path: CLI> SET THIS_CONTROLLER PATH 7.7 SCSI Host Cables (HSZ-Series) Servicing SCSI host cables (Figure 7-9) causes subsystem down time because the host path will be disconnected for the duration of the procedure. Use the procedures in this section when you are removing and replacing SCSI host cables. ------------------------------------------------------------ CAUTION ------------------------------------------------------------ Never leave active SCSI host buses unterminated during service. How you service your cables, and what devices you may leave running, terminated, and so on, will depend on your configuration. ------------------------------------------------------------ (Optional) The trilink connector may be considered part of the SCSI host cable during service. 7.7.1 Tools Required You will need the following tools to remove or replace SCSI host cables: · 5/32-inch Allen wrench · Tie wrap cutters · Flat-head screwdriver Removing and Replacing Field Replaceable Units 7-29 Figure 7-9 SCSI Host Cable 7.7.2 Precautions Refer to Chapter 1 for SCSI host cable handling guidelines. 7.7.3 Cable Removal Use the following procedure to remove SCSI host cables: 1. Disconnect the SCSI host cable from the host or other device (the device at the other end of the cable from the controller). 2. If necessary to access the HSZ-series controller, unlock and open the cabinet (SW800 series) using a 5/32-inch Allen wrench. 3. Loosen the captive screws on the SCSI host cable where it attaches to the trilink connector on the front of the controller, and disconnect the cable. 4. 
Remove the SCSI host cable from the cabinet, cutting tie wraps as necessary. 5. (Optional) Loosen captive screws and remove the terminator or secondary SCSI host cable attached to the trilink connector. 6. (Optional) Loosen captive screws and remove the trilink connector from the front of the controller. 7-30 Removing and Replacing Field Replaceable Units 7.7.4 Cable Replacement/Installation Use the following procedure to replace SCSI host cables: 1. (Optional) Attach the trilink connector to the front of the controller and tighten its captive screws. 2. Position and route the SCSI host cable within the cabinet. 3. Connect the SCSI host cable to the trilink connector on the front of the controller, and tighten the captive screws on the SCSI host cable connector. 4. (Optional) Connect and tighten captive screws for the terminator or secondary SCSI host cable (at the open connection of the trilink connector). 5. Install any tie wraps as necessary to hold the SCSI host cable in place. 6. Close and lock the cabinet doors (SW800 series) using a 5/32-inch Allen wrench. 7. Connect the other end of the cable to the appropriate device on the bus, removing terminators as necessary. 7.8 SCSI Device Port Cables Servicing SCSI device port cables causes subsystem down time because you must remove devices to access SCSI connectors on the BA350-MA (controller) and BA350-SB (device) shelf backplanes. ------------------------------------------------------------ Note ------------------------------------------------------------ If the desired cable connects to a device shelf in the lower part of a cabinet, it may be easier to remove the device shelf rather than attempt this procedure with the shelf installed. Refer to the StorageWorks Solutions Shelf and SBB User 's Guide for procedures to remove a device shelf and for correct SCSI cable lengths. 
------------------------------------------------------------ 7.8.1 Tools Required You will need the following tools to remove or replace device port cables: · ESD strap · 3/32-inch Allen wrench · 5/32-inch Allen wrench · Flat-head screwdriver 7.8.2 Precautions Refer to Chapter 1 for ESD, grounding, module handling, and cable handling guidelines. Removing and Replacing Field Replaceable Units 7-31 7.8.3 Cable Removal Use the following procedure to remove device port cables: 1. Unlock and open the cabinet (SW800 series) using a 5/32-inch Allen wrench. 2. Remove the controller(s) and cache module(s) by referencing the procedures described in Sections 7.1 and 7.2. 3. Using a flat-head screwdriver, loosen the two captive screws on each side of the volume shield, and remove the shield (see Figure 7-10). Figure 7-10 Volume Shield 4. Remove the cable from the BA350-MA (controller) shelf backplane by pinching the cable connector side clips and disconnecting the cable. ------------------------------------------------------------ CAUTION ------------------------------------------------------------ Digital recommends labelling devices to indicate what slot they were removed from. If SBBs are removed and then returned to a different slot, customer data may be destroyed. 7-32 Removing and Replacing Field Replaceable Units Let disk drives spin down for at least 30 seconds prior to removing them from the device shelf. Gyroscopic motion from a spinning disk may cause you to drop and damage the SBB. ------------------------------------------------------------ 5. Remove any SBBs necessary to access the SCSI cable, as shown in Figure 7-11. (Press down on the two SBB mounting tabs to release it from the shelf, and pull the device straight out.) 6. Remove the cable from the BA350-SB (device) shelf backplane by pinching the cable connector side clips and disconnecting the cable. 
Figure 7-11 SCSI Device Cables 7.8.4 Cable Replacement/Installation Use the following procedure to replace device port cables: ------------------------------------------------------------ CAUTION ------------------------------------------------------------ Be very careful when inserting cable connectors into connectors within the BA350-MA and BA350-SB shelves. Inserting a poorly aligned cable connector can damage the shelf connector. You must replace the entire shelf if its connectors are damaged. ------------------------------------------------------------ 1. For the device shelf connector, gently slide the cable connector in from one side to the other, and rock the connector from top to bottom to seat it. 2. Listen for the connector to snap into place. 3. For the controller shelf connector, gently slide the cable connector in from one side to the other, and rock the connector from top to bottom to seat it. Removing and Replacing Field Replaceable Units 7-33 4. Listen for the connector to snap into place. ------------------------------------------------------------ CAUTION ------------------------------------------------------------ Return a device to the slot from which it was removed. If SBBs are removed and then returned to a different slot, customer data may be destroyed. ------------------------------------------------------------ 5. Insert the SBBs into the device shelf making sure that all SBBs are returned to their original slots. The SBB mounting tabs will snap into place as the SBBs are locked into the shelf. 6. Replace the volume shield in the controller shelf and tighten the captive screws finger tight using a flat-head screwdriver (refer to Figure 7-10). 7. Replace the cache module(s) and controller(s) by referencing the procedures described in Sections 7.1 and 7.2. 8. Close and lock the cabinet doors (SW800 series) using a 5/32-inch Allen wrench. 
------------------------------------------------------------ WARNING ------------------------------------------------------------ Service procedures described in this manual that involve blower removal or access to the rear of the shelf must be performed only by qualified service personnel. ------------------------------------------------------------ 7.9 Blowers The BA350-MA and BA350-SB StorageWorks shelves have two rear-mounted blowers that cool the controllers and storage devices (see Figure 7-12). Connectors on the shelf backplane provide +12 Vdc power to operate them. When either blower fails, the shelf status (upper) LED on the power SBB turns off, and an error message is passed to the controller or host. As long as one blower is operating, there is sufficient air flow to prevent an overtemperature condition. If both blowers fail, the shelf can overheat in as little as 60 seconds. 7.9.1 Tools Required You will need the following tools to remove or replace the blower: · 5/32-inch Allen wrench · Phillips screwdriver (#2) 7.9.2 Precautions Refer to Chapter 1 for safety guidelines. 
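The blower failure behavior described in Section 7.9 can be summarized as a small lookup. The Python sketch below is an illustration under stated assumptions, not a Digital utility: the function and field names are hypothetical, and the model considers blowers only (in practice the shelf status LED can also be off for other reasons, such as a power supply fault).

```python
# Per Section 7.9, a shelf with no operating blower can overheat
# in as little as 60 seconds.
MIN_SECONDS_TO_OVERHEAT = 60


def shelf_cooling_status(operating_blowers):
    """Summarize shelf cooling state for 0, 1, or 2 operating blowers.

    Hypothetical helper: field names are illustrative, and only the
    blowers are modeled (power supply faults are ignored).
    """
    if operating_blowers not in (0, 1, 2):
        raise ValueError("a shelf has two blowers; expected 0, 1, or 2")
    return {
        # The shelf status (upper) LED turns off when either blower fails.
        "shelf_status_led_on": operating_blowers == 2,
        # An error message is passed to the controller or host on any failure.
        "error_reported": operating_blowers < 2,
        # One operating blower still provides sufficient air flow.
        "airflow_sufficient": operating_blowers >= 1,
        # With both blowers failed, overheating can follow quickly.
        "overheat_risk": operating_blowers == 0,
    }


if __name__ == "__main__":
    for count in (2, 1, 0):
        print(count, shelf_cooling_status(count))
```

The case with one operating blower is the one to note: air flow is still sufficient, but the LED is off and an error is reported, so the failed blower should be replaced before the second one fails.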
Figure 7-12 Replacing a Blower
[Figure callouts: guide, connector, Phillips screw, mounting tab, blower. Illustration CXO-3659A-PH.]
7.9.4 Blower Replacement/Installation
------------------------------------------------------------
WARNING
------------------------------------------------------------
To reduce the risk of electrical energy hazard, disconnect the power cables from the shelf power supplies before replacing shelf blower assemblies or performing service in the backplane area.
------------------------------------------------------------
Use the following procedure to replace a blower:
1. Align the replacement blower connector and push the blower straight in, making sure it is fully seated and that both mounting tabs lock in place.
2. Replace the safety screw in the corner of the blower using a Phillips screwdriver.
3. If you had to remove the shelf to access the blowers, replace the shelf as described in the StorageWorks Solutions Shelf and SBB User's Guide. Then replace its SCSI device cables as described in Section 7.8.
4. Connect the shelf power cables and verify that the shelf and all SBBs are operating properly.
------------------------------------------------------------
Note
------------------------------------------------------------
If the upper power supply LED (shelf status) does not come on and all the shelf power supplies are operating, the second blower may have failed or the wrong blower may have been replaced.
------------------------------------------------------------ 5. Close and lock the cabinet doors (SW800 series) using the 5/32-inch Allen wrench. 7.10 Power Supplies There are two methods for replacing power supply SBBs: hot swap and cold swap. · Use hot swap to replace a power supply only when there are two power supplies in a shelf. Hot swap allows you to remove the defective power supply while the other supply furnishes power. ------------------------------------------------------------ Note ------------------------------------------------------------ Hot swap does not disable the shelf or its contents. ------------------------------------------------------------ · Use cold swap during installation or when there is no operational shelf power supply. Should this occur on a controller shelf, the controller, cache module, and all associated SCSI buses are disabled until power is restored. On a device shelf, those particular devices are disabled, though their controller will still service devices on other shelves. 7-36 Removing and Replacing Field Replaceable Units 7.10.1 Tools Required You will need a 5/32-inch Allen wrench to remove or replace a power supply. 7.10.2 Precautions Refer to Chapter 1 for safety guidelines. 7.10.3 Power Supply Removal Use the following procedure to remove a power supply (see Figure 7-13): Figure 7-13 Power Supply Removal ------------------------------------------------------------ Note ------------------------------------------------------------ The cold swap procedure is identical, except you should take the shelf contents (devices or controllers) off line before removing the power supply. ------------------------------------------------------------ 1. Unlock and open the cabinet doors (SW800 series) using a 5/32-inch Allen wrench. 2. Make sure the power status (lower) LED on the power supply is off. 3. Unplug the power supply. 4. Press the two mounting tabs together to release the power supply from the shelf. 
------------------------------------------------------------ CAUTION ------------------------------------------------------------ The power supply is relatively heavy and can be damaged if dropped. Always use both hands to fully support the power supply during removal. ------------------------------------------------------------ 5. Use both hands to pull the power supply out of the shelf. Removing and Replacing Field Replaceable Units 7-37 7.10.4 Power Supply Replacement/Installation Use the following procedure to replace a power supply (refer to Figure 7-13): ------------------------------------------------------------ CAUTION ------------------------------------------------------------ The power supply is relatively heavy and can be damaged if dropped. Always use both hands to fully support the power supply during replacement. ------------------------------------------------------------ 1. Hold the power supply in both hands and firmly push it into the shelf until you hear the mounting tabs snap into place. 2. Plug the power cord back into the power supply. 3. Observe the power and shelf status LEDs to make sure both turn on. If both LEDs do not turn on, refer to Chapter 5 for troubleshooting basics. 4. Close and lock the cabinet doors (SW800 series) using a 5/32-inch Allen wrench. 7.11 Warm Swap When you warm swap a storage SBB or a controller, you quickly and efficiently remove the hardware and install a replacement. Warm swap is possible without taking your controllers out of service or adversely affecting activity on the rest of the subsystem. Using warm swap also preserves data integrity. ------------------------------------------------------------ Note ------------------------------------------------------------ Warm swap is not applicable to service on unpowered StorageWorks shelves. Do not attempt to execute warm swap on an unpowered shelf. 
------------------------------------------------------------ 7.11.1 SBB Warm Swap Device warm swap involves quickly removing and replacing a disk drive, tape drive, or other storage SBB. You can safely remove SBBs without taking your system or controller off line. However, before removing a device, either the controller or the operator must determine that the swap is necessary. 5 · The controller determines that a device is bad by trying to access the device, receiving no response from the device, or detecting excessive errors from the device. · The operator decides to remove a device by examining the OCP codes, the SBB LEDs, system messages, or system error log information. 7.11.1.1 Tools Required You will need a 5/32-inch Allen wrench to warm swap a device. ------------------------------------------------------------ 5 You may also use the SBB warm swap procedure to add a device to an empty shelf slot. 7-38 Removing and Replacing Field Replaceable Units 7.11.1.2 Precautions Refer to Chapter 1 for safety guidelines. 7.11.1.3 Device Removal ------------------------------------------------------------ CAUTION ------------------------------------------------------------ Warm swap supports removal and replacement of only one SBB at a time. Should another SBB need to be swapped, you must repeat the entire warm swap procedure. You must follow steps in this section in their exact order so that the following is ensured: · Preserve data integrity (especially for devices with older SCSI interface designs). · Reduce chances of making a port unusable for a long period, which can render several devices inaccessible. · Prevent the controller from performing unpredictably. ------------------------------------------------------------ Use the following procedure to remove a device: 1. You must dismount the device from the host before proceeding. (For example, enter the DISMOUNT command if you are using the OpenVMS operating system.) 
Refer to your operating system documentation for procedures necessary for dismounting a device.
2. Unlock and open the cabinet doors (SW800 series) using a 5/32-inch Allen wrench.
3. Quiesce the SBB's port by pressing and holding the controller port button for the SBB. Continue holding the button until all amber OCP LEDs light.
------------------------------------------------------------
Note
------------------------------------------------------------
Only one port may be quiesced at any time. If the button is not held long enough, or multiple buttons are pushed in quick succession, all buttons are ignored (no ports are quiesced). You must press and hold the button again to quiesce the port.
------------------------------------------------------------
4. Wait until the chosen port LED flashes alternately with the other port LEDs (this indicates I/O has stopped). The alternating pattern flashes for approximately 30 seconds, during which you may remove the SBB. If the pattern does not appear after a minute or two, another shelf is asserting a fault signal that prevents any quiesce function on this controller. To correct the problem, you must locate the suspect shelf and do one of three things:
· Remove all devices from the shelf.
· Disconnect the shelf's SCSI device cables (Section 7.8).
· Repair/replace the shelf power supply (Section 7.10).
5. To remove the SBB, press its two mounting tabs together to release it from the shelf, and pull it out using both hands (see Figure 7-14).

Figure 7-14 SBB Warm Swap
(Drawing number CXO-3611B-PH.)

2. Wait until the chosen port LED flashes alternately with the other port LEDs (this indicates I/O has stopped). The alternating pattern flashes for approximately 30 seconds, during which you may insert the SBB.
If the pattern does not appear after a minute or two, another shelf is asserting a fault signal that prevents any quiesce function on this controller. To correct the problem, you must locate the suspect shelf and do one of three things: · Remove all devices from the shelf. · Disconnect the shelf 's SCSI device cables (Section 7.8). · Repair/replace the shelf power supply (Section 7.10). While the OCP LEDs are flashing, any SBBs on the quiesced port that have status LEDs will also flash. ------------------------------------------------------------ Note ------------------------------------------------------------ The length of time required for I/O to stop can vary from zero seconds to several minutes, depending on load, device type, and cache status. ------------------------------------------------------------ 3. Hold the SBB in both hands, and firmly push it into the shelf until you hear the mounting tabs snap into place. 7.11.1.5 Restoring the Device to the Configuration After you insert the SBB, the flashing pattern on the OCP stops, and normal operation on the ports resumes. At this time the port LEDs will turn off. · If you inserted a new device in a previously unused slot, that port's LED remains lit until the device is added by entering the following command (see Appendix B): CLI> ADD device · If a tape SBB is inserted in a slot where a disk SBB was previously installed, the port LED remains lit until the device is added using the ADD command, and you delete the previously installed device from the list of known devices, as follows: CLI> DELETE device-name · If the new disk is to be part of a storage set, you must delete the storage set from the configuration and create (ADD) it again. · Initialize a newly inserted disk by entering the following: CLI> INITIALIZE container where container is either the disk, or a group of disks linked as a storage set. This initializes the metadata on each disk in the container, including the one that was just swapped. 
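The restore rules above can be summarized as a small decision sketch. The following Python is illustrative only; it is not part of the controller firmware or any Digital tool. The function name, its parameters, and the order of the returned commands are assumptions made for this example. It returns the generic command forms shown above (ADD device, DELETE device-name, INITIALIZE container) and deliberately leaves out the storage set case, where the storage set must be deleted and added again.

```python
from typing import List, Optional


def restore_commands(slot_was_empty: bool,
                     replaced_device_name: Optional[str],
                     new_device_is_disk: bool,
                     container: str = "container") -> List[str]:
    """Return the CLI command forms to enter after inserting an SBB.

    replaced_device_name is the name of a previously installed device of a
    different type (for example, a disk that is being replaced by a tape),
    or None when no such device was configured. All names are hypothetical.
    """
    commands: List[str] = []
    # A device in a previously unused slot, or one that replaces a different
    # type of device, must be added before the port LED turns off.
    if slot_was_empty or replaced_device_name is not None:
        commands.append("ADD device")
    # The previously installed device is removed from the list of known devices.
    if replaced_device_name is not None:
        commands.append(f"DELETE {replaced_device_name}")
    # A newly inserted disk needs its metadata initialized.
    if new_device_is_disk:
        commands.append(f"INITIALIZE {container}")
    return commands
```

For example, restore_commands(False, "DISK0", False) lists the two commands needed when a tape SBB is inserted where a disk named DISK0 was previously installed. The manual does not state the order in which ADD and DELETE should be entered, so the order returned here should not be treated as authoritative.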
------------------------------------------------------------
Note
------------------------------------------------------------
If you think you have failed to perform warm swap exactly as stated here, you should reinitialize the controller. Otherwise, the controller may perform unpredictably.
------------------------------------------------------------
Remember to close and lock cabinet doors (SW800 series) using a 5/32-inch Allen wrench after finishing the device warm swap.

7.11.2 Controller Warm Swap (HSJ-Series Controllers)
Use warm swap to efficiently remove and replace one controller in a dual-redundant configuration. When you warm swap a controller, you are changing out a controller in the most transparent method available to the HS controller subsystem.
Performing warm swap involves removing one controller, while forcing the other controller into failover. Because the remaining controller executes failover, it assumes control of the absent controller's devices. This minimizes the impact on system performance and downtime.
------------------------------------------------------------
Note
------------------------------------------------------------
You must warm swap only one controller at a time. Never attempt to remove both controllers in your dual-redundant configuration using warm swap.
Try to have a replacement controller available prior to starting warm swap. Otherwise, you must terminate the warm swap program and restart it later when you have a replacement.
------------------------------------------------------------
7.11.2.1 Tools Required
You will need the following tools to warm swap a controller:
· ESD strap
· 3/32-inch Allen wrench
· 5/32-inch Allen wrench
· Flat-head screwdriver

7.11.2.2 Precautions
Refer to Chapter 1 for ESD, grounding, module handling, and program card handling guidelines.
Ground yourself to the cabinet grounding stud (refer to Figure 7-1) before servicing the controller module. 7.11.2.3 Controller Removal Use the following procedure to remove the controller: 1. Apply either a virtual terminal connection or a maintenance terminal to the controller you will not be removing. 2. Enter the RUN C_SWAP command. The system responds with the following: Controller Warm Swap, Software Version -V1.4 Copyright © Digital Equipment Corporation 1993. *** Sequence to REMOVE other HSJ40 has begun. *** Do you wish to REMOVE the other HSJ40 Y/N [N]? 3. Enter ``Y'' to continue the procedure. Will its cache module also be removed Y/N [N]? 7-42 Removing and Replacing Field Replaceable Units 4. Enter ``Y'' only if you will be removing the controller 's cache module as well. Killing other controller. Attempting to quiese all ports. Port 1 quiesced. Port 2 quiesced. Port 3 quiesced. Port 4 quiesced. Port 5 quiesced. Port 6 quiesced. All ports quiesced. Remove the other HSJ40 (the one WITHOUT a blinking green LED) within 5 minutes. ------------------------------------------------------------ Note ------------------------------------------------------------ Do not remove the controller with the blinking green LED reset (//) button. ------------------------------------------------------------ 5. You have 5 minutes to remove the controller following the steps described in Table 7-5. Your terminal will update you with the time remaining to complete the removal procedure, as shown in the following example: Time remaining 4 minutes, 40 seconds. ------------------------------------------------------------ Note ------------------------------------------------------------ If you fail to remove the controller within five minutes, the subsystem will restart the quiesced ports, and you will have to begin this procedure again. 
------------------------------------------------------------
Table 7-5 Module Removal
------------------------------------------------------------
Step  Description
------------------------------------------------------------
1     Ground yourself to the cabinet grounding stud (refer to Figure 7-1).
2     Unlock and open the cabinet doors (SW800 series) using a 5/32-inch Allen wrench.
3     Unsnap and discard the program card EMI shield (if attached; refer to Figure 7-2).
4     Remove the program card by pushing the eject button (refer to Figure 7-3) next to the card. Pull the card out and save it for use in the replacement controller module.
5     Loosen the captive screws on the host interface (CI) cable connector (refer to Figure 7-3) with a flat-head screwdriver and remove the cable from the front of the controller module.
6     Loosen the four screws (refer to Figure 7-3) on each side of the front bezel with a 3/32-inch Allen wrench.
7     Use a gentle up-and-down rocking motion to loosen the module from the shelf backplane.
8     Slide the module out of the shelf (noting which rails the module was seated in) and place on an approved ESD work surface or mat.
9     If necessary, you may now remove the cache module as described in Section 7.2.3.
------------------------------------------------------------
Once you remove the controller, you will see the following displayed as the subsystem uses the remaining controller to service the quiesced ports:

Restarting ALL ports.
Port 1 restarted.
Port 2 restarted.
Port 3 restarted.
Port 4 restarted.
Port 5 restarted.
Port 6 restarted.

7.11.2.4 Controller Replacement
Use the following procedure to replace the controller:
1. The system will prompt you with the following to replace the controller:

Do you have a replacement HSJ40 readily available [N]?

Try to have a replacement available. If you do not have one, you must answer with ``N''.
Then, the warm swap sequence will terminate, and you must restart the routine later when you have a replacement.
2. When you find a replacement, you can restart the sequence by entering the RUN C_SWAP command again. The system responds with the following:

Do you have a replacement HSJ40 readily available [N]?

Answer ``Y'' if you have the controller.
3. The following is displayed next:

*** Sequence to INSERT other HSJ40 has begun. ***
Do you wish to INSERT the other HSJ40 [N]?

Answer ``Y'' to insert the controller. Remember to first reinsert the cache module if applicable.

Attempting to quiese all ports.
Port 1 quiesced.
Port 2 quiesced.
Port 3 quiesced.
Port 4 quiesced.
Port 5 quiesced.
Port 6 quiesced.
All ports quiesced.
Insert the other HSJ40, WITHOUT its program card, and press Return

4. Insert the cache (if applicable) and controller now. Follow the steps outlined in Table 7-6.

Table 7-6 Module Replacement
------------------------------------------------------------
Step  Description
------------------------------------------------------------
1     Ground yourself to the cabinet grounding stud (refer to Figure 7-1).
2     You should replace the cache module now, if you removed it. Refer to Section 7.2.4.
3     Make sure the OCP cable is correctly plugged into side two of the module (refer to Figure 7-5).
4     Slide the controller module into the shelf using its slot's rightmost rails as guides (refer to Figure 7-6).
5     Use a gentle up-and-down rocking motion to help seat the module into the backplane. Press firmly on the module until it is seated. Finally, press firmly once more to make sure the module is seated.
6     Tighten the four screws on the front bezel using a 3/32-inch Allen wrench.
8     Connect a maintenance terminal to the MMJ of the other controller (the one you did not replace) if one is not already connected.
------------------------------------------------------------
Restarting ALL ports.
Port 1 restarted.
Port 2 restarted. Port 3 restarted. Port 4 restarted. Port 5 restarted. Port 6 restarted. The configuration has two contollers. 5. Follow the steps in the system message: The Controller Warm Swap program has terminated. To restart the other controller: 1) Enter the RESTART OTHER command. 2) Press and hold the Reset button (//) while inserting the program card. 3) Release Reset (//) and the controller will initialize. 4) Configure new controller by referring to the HS Array Controller User's Guide. If the controller initializes correctly, its green reset LED will begin to flash at 1 Hz. If an error occurs during initialization, the OCP will display a code. Refer to Chapter 5 to analyze the code. 6. Restore parameters for the new controller using the steps in Section 7.11.2.5. 7.11.2.5 Restoring Parameters The new controller module has no initial parameters, so you must use a maintenance terminal to enter them. Refer to information in the CONFIGURATION.INFO file or on the configuration sheet packaged with your system, whichever is most current, for parameters. Be sure to use the same parameters from the removed controller when installing a replacement. Follow these steps: ------------------------------------------------------------ CAUTION ------------------------------------------------------------ Do not install HSJ-series CI host port cables until after setting all parameters listed here. Failure to follow this procedure may result in adverse effects on the host/cluster. ------------------------------------------------------------ Removing and Replacing Field Replaceable Units 7-45 ------------------------------------------------------------ CAUTION ------------------------------------------------------------ SET FAILOVER establishes controller-to-controller communication and copies configuration information. Always enter this command on one controller only. COPY=configuration-source specifies where the good configuration data are located. 
Never blindly specify SET FAILOVER. Know where your good configuration information resides before entering the command.
------------------------------------------------------------
1. Enter the following command to copy configuration information to the new controller:
CLI> SET FAILOVER COPY=THIS_CONTROLLER
2. Enter the following command to set the MAX_NODES:
CLI> SET OTHER_CONTROLLER MAX_NODES=n
where n is 8, 16, or 32.
3. Enter the following command to set a valid controller ID:
CLI> SET OTHER_CONTROLLER ID=n
where n is the CI node number (0 through (MAX_NODES - 1)).
4. Enter the following command to set the SCS node:
CLI> SET OTHER_CONTROLLER SCS_NODENAME="xxxxxx"
where xxxxxx is a one- to six-character alphanumeric name for this node. The node name must be enclosed in quotes with an alphabetic character first. Each SCS node name must be unique within its VMScluster. (Refer to Chapter 4 for important information about VMS node names.)
5. Enter the following command to set the MSCP allocation class:
CLI> SET OTHER_CONTROLLER MSCP_ALLOCATION_CLASS=n
where n is 1 through 255. Digital recommends providing a unique allocation class value for every pair of dual-redundant controllers in the same cluster.
6. Enter the following command to set the TMSCP allocation class:
CLI> SET OTHER_CONTROLLER TMSCP_ALLOCATION_CLASS=n
where n is 1 through 255.
------------------------------------------------------------
Note
------------------------------------------------------------
Always restart the new controller after setting the ID, SCS node name, or allocation classes.
------------------------------------------------------------
7. Restart the new controller either by pressing its green reset (//) button, or entering the following command:
CLI> RESTART OTHER_CONTROLLER
8. Enter the following command to verify that the preceding parameters were set:
CLI> SHOW OTHER_CONTROLLER FULL 9. Connect the host port cable to the front of the new controller. Do not connect the controllers in a dual-redundant pair to separate, different host CPUs. 10. Enter the following commands to enable CI paths A and B to the host: CLI> SET OTHER_CONTROLLER PATH_A CLI> SET OTHER_CONTROLLER PATH_B 11. If you wish, you may disconnect the maintenance terminal. The terminal is not required for normal controller operation. 12. Close and lock the cabinet doors (SW800 series) using a 5/32-inch Allen wrench. Removing and Replacing Field Replaceable Units 7-47 A ------------------------------------------------------------ Field Replaceable Units This appendix lists HS controller field replaceable units (FRUs), required tools and equipment, and related FRUs. A.1 Controller Field Replaceable Units The following FRUs come with the various controller modules. Part numbers are correct as of publication of this manual but are subject to change. Always verify your information in case part numbers or ordering methods have changed. 
Table A-1 HSJ40 FRUs
------------------------------------------------------------
FRU                                                         Part Number
------------------------------------------------------------
HSJ40 CI SCSI controller module (including OCP and bezel)   70-30097-01
16 MB read cache module (Version 1)                         54-22229-02 (discontinued)
32 MB read cache module (Version 1)                         54-22229-01 (discontinued)
16 MB read cache module (Version 2)                         54-22910-02
32 MB read cache module (Version 2)                         54-22910-01
StorageWorks HSJ40 program card                             BG-PYU6A-0A
CI internal cables                                          GRAY-17-03427-02
SCSI-2 device port cables                                   BN21H-02
------------------------------------------------------------

Table A-2 HSJ30 FRUs
------------------------------------------------------------
FRU                                                         Part Number
------------------------------------------------------------
HSJ30 CI SCSI controller module (including OCP and bezel)   70-30097-02
16 MB read cache module (Version 2)                         54-22910-02
32 MB read cache module (Version 2)                         54-22910-01
StorageWorks HSJ30 program card                             BG-PYU6A-0A
CI internal cables                                          GRAY-17-03427-02
SCSI-2 device port cables                                   BN21H-02
------------------------------------------------------------

Table A-3 HSD30 FRUs
------------------------------------------------------------
FRU                                                         Part Number
------------------------------------------------------------
HSD30 DSSI SCSI controller module                           70-31458-01
  (including bezel and trilink connector)
16 MB read cache module (Version 2)                         54-22910-02
32 MB read cache module (Version 2)                         54-22910-01
StorageWorks HSD30 program card                             BG-Q6HL0-0A
SCSI-2 device port cables                                   BN21H-02
Trilink connector                                           12-39921-02 (included in 70-31458-01)
50-pin DSSI bus terminator                                  12-31281-01
------------------------------------------------------------

Table A-4 HSZ40 FRUs
------------------------------------------------------------
FRU                                                         Part Number
------------------------------------------------------------
HSZ40 SCSI-to-SCSI controller module                        70-31457-01
  (including bezel and trilink connector)
16 MB read cache module (Version 2)                         54-22910-02
32 MB read cache module (Version 2)                         54-22910-01
StorageWorks HSZ40 program card                             BG-Q6HN0-0A
SCSI-2 device port cables                                   BN21H-02
Trilink connector                                           12-39921-01 (included in 70-31457-01)
68-pin SCSI bus terminator                                  12-37004-03
------------------------------------------------------------

A.2 Required Tools and Equipment
The following tools and equipment are required for controller maintenance:
· Portable antistatic kit, part number 29-26246-00
· ESD mat--for all module replacement service
· 3/32-inch Allen wrench--for replacing HSJ-series controllers
· 5/32-inch Allen wrench--for opening the front door of an SW800 series data center cabinet
· Flat-head screwdriver--for replacing host cables, HSD-series controllers, and HSZ controllers
· Small flat-head screwdriver--for replacing trilink connectors while SCSI cables are attached
An EIA-423 compatible terminal is needed for setting the initial configuration. When using this terminal, a connecting cable (between the terminal and the controller) that supports EIA-423 communication is required.

A.3 Related Field Replaceable Units
The following FRUs are related to the HS controllers. (Refer to the appropriate StorageWorks documentation for removal and replacement procedures for these components if not found in this manual.)

Table A-5 Controller Related FRUs
------------------------------------------------------------
FRU                                                         Part Number
------------------------------------------------------------
CI external cable                                           BLUE-17-01551-xx†
Controller shelf (with backplane)                           BA350-MA
Device shelf (with backplane)                               BA350-SB
Shelf power supply                                          H7429-AA
NULL modem DECconnect laptop 9-pin cable                    H8571-J
DECconnect cable                                            BC16E-xx†
SCSI-1-to-SCSI-2 transition cable, 0.2 meter (8-inch)‡      17-03831-01
------------------------------------------------------------
†Where xx equals the length in feet.
‡When using a TZ8x7, a transition cable must be routed between the TZ8x7 device and the SCSI-2 cable (because the device is SCSI-1).
------------------------------------------------------------

B
------------------------------------------------------------
Command Line Interpreter
This appendix provides the following information:
· A comprehensive list of all CLI commands
· CLI error messages the operator may encounter
· Examples of some common CLI-based procedures
An overview of using the CLI, as well as a description of how to access and exit the CLI, is provided in Chapter 4.

B.1 CLI Commands
The following sections detail each of the allowable commands in the CLI with required parameters and qualifiers. The defaults for each qualifier are indicated by a capital ``D'' in parentheses (D). Examples are given after the command format, parameters, description, and qualifiers.

------------------------------------------------------------
ADD CDROM
Adds a CDROM drive to the known list of CDROM drives.
------------------------------------------------------------
Note
------------------------------------------------------------
This command is valid for HSJ and HSD controllers only.
------------------------------------------------------------
Format
ADD CDROM container-name SCSI-location

Parameters
container-name
Specifies the name that will be used to refer to this CDROM drive. This name will be referred to when creating units and stripesets. The name must start with a letter (A through Z) and can then consist of up to eight more characters made up of A through Z, 0 through 9, period (.), dash (-) and underscore (_), for a total of nine characters.
SCSI-location
The location of the CDROM drive to be added in the form PTL, where P designates the port (1 through 6 or 1 through 3, depending on the controller model), T designates the target ID of the CDROM drive (0 through 6 in a nonfailover configuration, or 0 through 5 if the controller is in a failover configuration), and L designates the LUN of the CDROM drive (0 through 7).
When entering PTL, at least one space must separate the port, target, and LUN.

Description
Adds a CDROM drive to the known list of CDROM drives and names the drive. This command must be used when a new SCSI-2 CDROM drive is to be added to the configuration.

Examples
1. CLI> ADD CDROM CD_PLAYER 1 0 0
A CDROM drive is added to port 1, target 0, LUN 0, and named CD_PLAYER.

------------------------------------------------------------
ADD DISK
Adds a disk drive to the known list of disk drives.

Format
ADD DISK container-name SCSI-location

Parameters
container-name
Specifies the name that will be used to refer to this disk drive. This name will be referred to when creating units and stripesets. The name must start with a letter (A through Z) and can then consist of up to eight more characters made up of A through Z, 0 through 9, period (.), dash (-) and underscore (_), for a total of nine characters.
SCSI-location
The location of the disk drive to be added in the form PTL, where P designates the port (1 through 6 or 1 through 3, depending on the controller model), T designates the target ID of the disk drive (0 through 6 in a nonfailover configuration, or 0 through 5 if the controller is in a failover configuration), and L designates the LUN of the disk drive (0 through 7).
When entering PTL, at least one space must separate the port, target, and LUN.

Description
Adds a disk drive to the known list of disk drives and names the drive. This command must be used when a new SCSI-2 disk drive is to be added to the configuration.
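As an illustration of the naming and PTL rules given in the parameter descriptions, the following Python sketch checks a container name and a SCSI location before they are typed at the CLI. It is not part of the controller firmware or of any Digital utility. The function names, the max_port and failover parameters, and the choice to accept uppercase letters only are assumptions made for this example.

```python
import re
from typing import Tuple

# Name rule from the text: a letter first, then up to eight more characters
# drawn from A-Z, 0-9, period, dash, and underscore (nine characters total).
# Assumption: only uppercase letters are accepted in this sketch.
_NAME_RE = re.compile(r"[A-Z][A-Z0-9._-]{0,8}")


def is_valid_container_name(name: str) -> bool:
    """Return True if name satisfies the container-name rule."""
    return _NAME_RE.fullmatch(name) is not None


def parse_ptl(text: str, max_port: int = 6,
              failover: bool = False) -> Tuple[int, int, int]:
    """Parse 'P T L' and range-check each field.

    max_port is 6 or 3 depending on the controller model (hypothetical
    parameter); failover selects the narrower target range of 0 through 5.
    """
    fields = text.split()  # one or more spaces between fields is accepted
    if len(fields) != 3 or not all(f.isdecimal() for f in fields):
        raise ValueError("expected port, target, and LUN separated by spaces")
    port, target, lun = (int(f) for f in fields)
    max_target = 5 if failover else 6
    if not 1 <= port <= max_port:
        raise ValueError(f"port must be 1 through {max_port}")
    if not 0 <= target <= max_target:
        raise ValueError(f"target must be 0 through {max_target}")
    if not 0 <= lun <= 7:
        raise ValueError("LUN must be 0 through 7")
    return port, target, lun
```

For example, is_valid_container_name("RZ26_100") and parse_ptl("1 0 0") both accept the values used in the ADD DISK examples, while parse_ptl("1 6 0", failover=True) raises ValueError because target 6 is outside the failover range.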
Qualifiers TRANSPORTABLE NOTRANSPORTABLE (D) In normal operations, the controller makes a small portion of the disk inaccessible to the host and uses this area to store metadata, which improves data reliability, error detection, and recovery. This vast improvement comes at the expense of transportability. If NOTRANSPORTABLE is specified (or allowed to default) and there is no valid metadata on the unit, the unit must be initialized. If TRANSPORTABLE is specified and there is valid metadata on the unit, the unit will have to be initialized in order to remove the metadata. ------------------------------------------------------------ Note ------------------------------------------------------------ Digital recommends that you avoid specifying TRANSPORTABLE unless transportability of disk drives or media is imperative and there is no other way to accomplish the movement of data. ------------------------------------------------------------ When entering an ADD DISK command, NOTRANSPORTABLE is the default. Command Line Interpreter B-3 ADD DISK Examples 1. CLI> ADD DISK RZ26_100 1 0 0 A nontransportable disk is added to port 1, target 0, LUN 0, and named RZ26_100. 2. CLI> ADD DISK DISK0 2 3 0 NOTRANSPORTABLE A nontransportable disk is added to port 2, target 3, LUN 0, and named DISK0. 3. CLI> ADD DISK TDISK0 3 2 0 TRANSPORTABLE A transportable disk is added to port 3, target 2, LUN 0, and named TDISK0. B-4 Command Line Interpreter ADD STRIPESET ------------------------------------------------------------ ADD STRIPESET Creates a stripeset from a number of containers. Format ADD STRIPESET container-name container-name1 container-name2 [container-nameN] Parameters container-name Specifies the name that will be used to refer to this stripeset. The name must start with a letter (A through Z) and can then consist of up to eight more characters made up of A through Z, 0 through 9, period (.), dash (-) and underscore (_), for a total of nine characters. 
container-name1 container-name2 container-nameN The containers that will make up this stripeset. A stripeset may be made up of from two to fourteen containers. Description Adds a stripeset to the known list of stripesets and names the stripeset. This command must be used when a new stripeset is to be added to the configuration. Qualifiers CHUNKSIZE=n CHUNKSIZE=DEFAULT (D) Specifies the chunksize to be used. The chunksize may be specified in blocks (CHUNKSIZE=n), or you may let the controller determine the optimal chunksize (CHUNKSIZE=DEFAULT). When entering an ADD command, CHUNKSIZE=DEFAULT is the default. Examples 1. CLI> ADD STRIPESET STRIPE0 DISK0 DISK1 DISK2 DISK3 A STRIPESET is created out of four disks (DISK0, DISK1, DISK2 and DISK3). Because the chunksize was not specified, the chunksize will be the default. 2. CLI> ADD STRIPESET STRIPE0 DISK0 DISK1 DISK2 DISK3 CHUNKSIZE=16 A STRIPESET is created out of four disks (DISK0, DISK1, DISK2 and DISK3). The chunksize will be 16 blocks. Command Line Interpreter B-5 ADD TAPE ------------------------------------------------------------ ADD TAPE Adds a tape drive to the known list of tape drives. ------------------------------------------------------------ Note ------------------------------------------------------------ This command is valid for HSJ and HSD controllers only. ------------------------------------------------------------ Format ADD TAPE device-name SCSI-location Parameters device-name Specifies the name that will be used to refer to this tape drive. This name will be referred to when creating units. The name must start with a letter (A through Z) and can then consist of up to eight more characters made up of A through Z, 0 through 9, period (.), dash (-) and underscore (_), for a total of nine characters. 
SCSI-location
The location of the tape drive to be added, in the form PTL, where P designates the port (1 through 6 or 1 through 3, depending on the controller model), T designates the target ID of the tape drive (0 through 6 in a nonfailover configuration, or 0 through 5 if the controller is in a failover configuration), and L designates the LUN of the tape drive (0 through 7). When entering PTL, at least one space must separate the port, target, and LUN.

Description

Adds a tape drive to the known list of tape drives and names the drive. This command must be used when a new SCSI-2 tape drive is to be added to the configuration.

Examples

1. CLI> ADD TAPE TAPE0 1 0 0

   A tape drive is added to port 1, target 0, LUN 0, and named TAPE0.

------------------------------------------------------------
ADD UNIT

Adds a logical unit to the controller.

Format

ADD UNIT unit-number container-name

Parameters

unit-number (HSJ and HSD only)
The device type letter followed by the logical unit number that the host will use to access the unit. The device type letter is either ``D'' for disk devices (including CDROMs) or ``T'' for tape devices. Using this format, logical unit 3, which is made up of a disk or disks (such as a stripeset), would be specified as D3. Logical unit 7, which is made up of a tape device, would be T7.

unit-number (HSZ only)
The unit number determines both the target (0 through 7) and the LUN from which the device will be made available. The 100's place of the unit number is the target and the 1's place is the LUN. For example, D401 would be target 4, LUN 1. D100 would be target 1, LUN 0. D5 would be target 0, LUN 5.

------------------------------------------------------------
Note
------------------------------------------------------------
Any target number used in a unit number must already have been specified in the SET THIS_CONTROLLER ID=(n1, n2) command.
A target number may not be specified that has not been previously specified by the SET THIS_CONTROLLER ID= command.
------------------------------------------------------------

container-name
The name of the container that will be used to create the unit.

Description

The ADD UNIT command is used to add a logical unit for the host to access. All requests by the host to the logical unit number will be mapped as requests to the container specified in the ADD UNIT command.

For disk devices (and stripesets built out of disk devices), the metadata on the container must be initialized before a unit may be created from it. If the container's metadata cannot be found, or is incorrect, an error will be displayed and the unit will not be created.

Qualifiers for a unit created from a CDROM drive (HSJ and HSD only)

MAXIMUM_CACHED_TRANSFER=n
MAXIMUM_CACHED_TRANSFER=32 (D)
Specifies the maximum size transfer in blocks to be cached by the controller. Any transfers over this size will not be cached. Valid values are 1 through 1024. When entering the ADD UNIT command, MAXIMUM_CACHED_TRANSFER=32 is the default.

READ_CACHE (D)
NOREAD_CACHE
Enables and disables the controller's read cache on this unit. When entering an ADD UNIT command, READ_CACHE is the default.

RUN (D)
NORUN
Enables and disables a unit's ability to be spun up. When RUN is specified, the devices that make up the unit will be spun up. If NORUN is specified, the unit will be spun down. When entering an ADD UNIT command, RUN is the default.

Qualifiers for a unit created from a disk drive

MAXIMUM_CACHED_TRANSFER=n
MAXIMUM_CACHED_TRANSFER=32 (D)
Specifies the maximum size transfer in blocks to be cached by the controller. Any transfers over this size will not be cached. Valid values are 1 through 1024. When entering the ADD UNIT command, MAXIMUM_CACHED_TRANSFER=32 is the default.

READ_CACHE (D)
NOREAD_CACHE
Enables and disables the controller's read cache on this unit.
When entering an ADD UNIT command, READ_CACHE is the default.

RUN (D)
NORUN
Enables and disables a unit's ability to be spun up. When RUN is specified, the devices that make up the unit will be spun up. If NORUN is specified, the unit will be spun down. When entering an ADD UNIT command, RUN is the default.

WRITE_PROTECT
NOWRITE_PROTECT (D)
Enables and disables write protection of the unit. When entering an ADD UNIT command, NOWRITE_PROTECT is the default.

Qualifiers for a unit created from a stripeset

MAXIMUM_CACHED_TRANSFER=n
MAXIMUM_CACHED_TRANSFER=32 (D)
Specifies the maximum size transfer in blocks to be cached by the controller. Any transfers over this size will not be cached. Valid values are 1 through 1024. When entering the ADD UNIT command, MAXIMUM_CACHED_TRANSFER=32 is the default.

READ_CACHE (D)
NOREAD_CACHE
Enables and disables the controller's read cache on this unit. When entering an ADD UNIT command, READ_CACHE is the default.

RUN (D)
NORUN
Enables and disables a unit's ability to be spun up. When RUN is specified, the devices that make up the unit will be spun up. If NORUN is specified, the unit will be spun down. When entering an ADD UNIT command, RUN is the default.

WRITE_PROTECT
NOWRITE_PROTECT (D)
Enables and disables write protection of the unit. When entering an ADD UNIT command, NOWRITE_PROTECT is the default.

Qualifiers for a unit created from a tape drive (HSJ and HSD only)

DEFAULT_FORMAT=format
DEFAULT_FORMAT=DEVICE_DEFAULT (D)
Specifies the tape format to be used unless overridden by the host. Note that not all devices support all formats. The easiest way to determine which formats a specific device supports is to enter ``SHOW DEFAULT_FORMAT= ?''; the valid options will be displayed.
Supported tape formats are as follows:

· DEVICE_DEFAULT
  The default tape format is the default that the device uses or, in the case of devices that are settable via switches on the front panel, the settings of those switches.
· 800BPI_9TRACK
· 1600BPI_9TRACK
· 6250BPI_9TRACK
· TZ85
· TZ86
· TZ87_NOCOMPRESSION
· TZ87_COMPRESSION
· DAT_NOCOMPRESSION
· DAT_COMPRESSION
· 3480_NOCOMPRESSION
· 3480_COMPRESSION

When entering the ADD UNIT command for a tape device, DEFAULT_FORMAT=DEVICE_DEFAULT is the default.

Examples

1. CLI> ADD UNIT D0 DISK0

   Disk unit number 0 is created from container DISK0.

2. CLI> ADD UNIT T0 TAPE12

   Tape unit number 0 is created from container TAPE12.

------------------------------------------------------------
CLEAR_ERRORS CLI

Stops the display of errors at the CLI prompt.

Format

CLEAR_ERRORS CLI

Description

Errors detected by controller firmware are listed before the CLI prompt. These errors are listed even after the error condition is rectified, until either the controller is restarted or the CLEAR_ERRORS CLI command is entered.

------------------------------------------------------------
Note
------------------------------------------------------------
This command does not clear the error conditions; it only clears the reporting of the errors at the CLI prompt.
------------------------------------------------------------

Examples

1. CLI> All NVPM components initialized to their default settings.
   CLI> CLEAR_ERRORS CLI
   CLI>

   This clears the message ``All NVPM components initialized to their default settings.'' that was displayed at the CLI prompt.

------------------------------------------------------------
DELETE container-name

Deletes a container from the list of known containers.

Format

DELETE container-name

Parameters

container-name
Specifies the name that identifies the container.
This is the name given the container when it was created using the ADD command (ADD DEVICE, ADD STRIPESET, and so forth).

Description

Checks to see if the container is used by any other containers or a unit. If the container is in use, an error will be displayed and the container will not be deleted. If the container is not in use, it is deleted.

Examples

1. CLI> DELETE DISK0

   DISK0 is deleted from the known list of containers.

2. CLI> DELETE STRIPE0

   STRIPE0 is deleted from the known list of containers.

------------------------------------------------------------
DELETE unit-number

Deletes a unit from the list of known units.

Format

DELETE unit-number

Parameters

unit-number
Specifies the logical unit number (on HSDs and HSJs D0-D4094 or T0-T4094, on HSZs D0-D7 or T0-T7) that is to be deleted. This is the name given the unit when it was created using the ADD UNIT command.

Description

If the logical unit specified is on line to a host, the unit will not be deleted unless the OVERRIDE_ONLINE qualifier is specified. If any errors occur when trying to flush the user data, the logical unit will not be deleted.

Qualifiers for HSD and HSJ controllers

OVERRIDE_ONLINE
NOOVERRIDE_ONLINE (D)
If the logical unit is on line to the controller, it will not be deleted unless OVERRIDE_ONLINE is specified. If the OVERRIDE_ONLINE qualifier is specified, the unit will be spun down, the user data will be flushed to disk, and the logical unit will be deleted.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the OVERRIDE_ONLINE qualifier is specified.
------------------------------------------------------------

NOOVERRIDE_ONLINE is the default.

Examples

1. CLI> DELETE D12

   Disk unit number 12 is deleted from the known list of units.

2.
CLI> DELETE T3 OVERRIDE_ONLINE

   Tape unit number 3 is deleted from the known list of units even if it is currently on line to a host.

------------------------------------------------------------
DIRECTORY

Lists the diagnostics and utilities available on THIS_CONTROLLER.

Format

DIRECTORY

Description

The DIRECTORY command lists the various diagnostics and utilities that are available on THIS_CONTROLLER. A directory of diagnostics and utilities available on this controller is displayed.

For specific information about the diagnostics and utilities available, refer to the StorageWorks Array Controllers HS Family of Array Controllers Service Manual.

Examples

1. CLI> DIRECTORY
   TILX   X067 D
   DILX   X067 D
   VTDPY  X067 D
   ECHO   X067 D
   DIRECT X067 D
   CLI    X067 D

   A directory listing.

------------------------------------------------------------
EXIT

Exits the CLI and breaks a virtual terminal connection.

Format

EXIT

Description

When entering the EXIT command from a host, using a virtual terminal connection, the connection is broken and control is returned to the host. If entered from a maintenance terminal, the EXIT command restarts the CLI, displaying the copyright notice, the controller type, and the last fail packet.

Examples

1. CLI> EXIT
   Copyright © Digital Equipment Corporation 1993
   HSJ40 Software version E140, Hardware version 0000
   Last fail code: 01800080
   Press " ?" at any time for help.
   CLI>

   An EXIT command issued from a maintenance terminal.

2. CLI> EXIT
   Control returned to host
   $

   An EXIT command entered on a terminal that was connected to the CLI via a DUP connection.

------------------------------------------------------------
HELP

Displays an overview of how to get help.

Format

HELP

Description

The HELP command displays a brief description on how to use the question mark (?) to obtain help on any command or function of the CLI.

Examples

1.
CLI> HELP
   Help may be requested by typing a question mark (?) at the CLI prompt. This will display a list of all available commands
   For further information you may enter a partial command and type a space followed by a "?" to print a list of all available options at that point in the command. For example:
   SET THIS_CONTROLLER ?
   Will print a list of all legal SET THIS_CONTROLLER commands

   Displaying help using the HELP command.

2. CLI> SET ?
   Your options are:
   FAILOVER
   OTHER_CONTROLLER
   NOFAILOVER
   THIS_CONTROLLER
   Unit number or container name

   Obtaining help on the SET command, using the ``?'' facility.

------------------------------------------------------------
INITIALIZE

Initializes the metadata on the container specified.

Format

INITIALIZE container-name

Parameters

container-name
Specifies the container name to initialize.

Description

The INITIALIZE command initializes a container so a logical unit may be created from it. When initializing a single disk drive container, if NOTRANSPORTABLE was specified or allowed to default on the ADD DISK or SET disk-name commands, a small amount of disk space is made inaccessible to the host and used for metadata. The metadata will be initialized. If TRANSPORTABLE was specified, any metadata will be destroyed on the device and the full device will be accessible to the host.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
The INITIALIZE command destroys all customer data on the container.
------------------------------------------------------------

When an initialize is required:

· When a unit is going to be created from a newly installed disk
· When a unit is going to be created from a newly created storage set (stripeset)

When an initialize is specifically not required:

· When a unit has been deleted, and a new unit is going to be created from the same container.
· When a storage set that was initialized in the past has been deleted, then re-added using the same members as before.

Examples

1. CLI> INITIALIZE DISK0

   Container DISK0 is initialized. If NOTRANSPORTABLE was specified (or allowed to default), metadata is written on it.

2. CLI> INITIALIZE STRIPE0

   Container STRIPE0 is initialized and metadata is written on it.

------------------------------------------------------------
LOCATE

Locates devices (disks, tapes, and storage sets) by lighting the amber device fault LED on the StorageWorks building block (SBB).

Format

LOCATE

Description

The LOCATE command illuminates the amber device fault LEDs (the lower LED on the front of an SBB) of the containers specified. The LOCATE command can also be used as a lamp test.

Qualifiers

ALL
The LOCATE ALL command turns on the amber device fault LEDs of all configured devices. This qualifier can also be used as a lamp test. See LOCATE CANCEL to turn off the LEDs. An error is displayed if no devices have been configured.

CANCEL
The LOCATE CANCEL command turns off all amber device fault LEDs on all configured devices. An error is displayed if no devices have been configured.

DISKS
The LOCATE DISKS command turns on the amber device fault LEDs of all configured disks. See LOCATE CANCEL to turn off the LEDs. An error is displayed if no disks have been configured.

TAPES
The LOCATE TAPES command turns on the amber device fault LEDs of all configured tape devices. See LOCATE CANCEL to turn off the LEDs. An error is displayed if no tape devices have been configured.

UNITS
The LOCATE UNITS command turns on the amber device fault LEDs of all devices used by units. This command is useful to determine which devices are not currently configured into logical units. See LOCATE CANCEL to turn off the LEDs. An error is displayed if no units have been configured.
PTL SCSI-location
The LOCATE PTL SCSI-location command turns on the amber device fault LEDs at the given SCSI location. SCSI-location is specified in the form PTL, where P designates the port (1 through 6 or 1 through 3, depending on the controller model), T designates the target ID of the device (0 through 6 in a nonfailover configuration, or 0 through 5 if the controller is in a failover configuration), and L designates the LUN of the device (0 through 7). When entering the PTL, at least one space must separate the port, target, and LUN. See LOCATE CANCEL to turn off the LEDs. An error is displayed if the port, target, or LUN is invalid, or if no device is configured at that location.

device or storage set name or unit number (entity)
The LOCATE entity command turns on the amber device fault LEDs of the devices that make up the entity supplied. If a device name is given, the device's LED is lit. If a storage set name is given, all device LEDs that make up the storage set are lit. If a unit number is given, all device LEDs that make up the unit are lit. See LOCATE CANCEL to turn off the LEDs. An error is displayed if no entity by that name or number has been configured.

Examples

1. CLI> LOCATE DISK0

   Turns on the device fault LED on device DISK0.

2. CLI> LOCATE D12

   Turns on the device fault LEDs on all devices that make up disk unit number 12.

3. CLI> LOCATE DISKS

   Turns on the device fault LEDs on all disk devices.

------------------------------------------------------------
RENAME

Renames a container.

Format

RENAME old-container-name new-container-name

Parameters

old-container-name
Specifies the existing name that identifies the container.

new-container-name
Specifies the new name to identify the container. This name is referred to when creating units and storage sets.
The name must start with a letter (A through Z) and can then consist of up to eight more characters made up of A through Z, 0 through 9, period (.), dash (-), and underscore (_), for a total of nine characters.

Description

Gives a known container a new name by which to be referred.

Examples

1. CLI> RENAME DISK0 DISK100

   Rename container DISK0 to DISK100.

------------------------------------------------------------
RESTART OTHER_CONTROLLER

Restarts the other controller.

------------------------------------------------------------
Note
------------------------------------------------------------
This command is valid for HSJ and HSD controllers only.
------------------------------------------------------------

Format

RESTART OTHER_CONTROLLER

Description

The RESTART OTHER_CONTROLLER command restarts the other controller. If any disks are on line to the other controller, the controller will not restart unless the OVERRIDE_ONLINE qualifier is specified (HSD and HSJ only). If any user data cannot be flushed to disk, the controller will not restart unless the IGNORE_ERRORS qualifier is specified. Specifying IMMEDIATE will cause the other controller to restart immediately without flushing any user data to the disks, even if drives are on line to the host.

The RESTART OTHER_CONTROLLER command will not cause a failover to this controller in a dual-redundant configuration. The other controller will restart and resume operations where it was interrupted.

Qualifiers for HSD and HSJ controllers

IGNORE_ERRORS
NOIGNORE_ERRORS (D)
If errors result when trying to write user data, the controller will not be restarted unless IGNORE_ERRORS is specified.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IGNORE_ERRORS qualifier is specified.
------------------------------------------------------------

NOIGNORE_ERRORS is the default.

IMMEDIATE
NOIMMEDIATE (D)
If IMMEDIATE is specified, immediately restart the controller without checking for online devices.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IMMEDIATE qualifier is specified.
------------------------------------------------------------

NOIMMEDIATE is the default.

OVERRIDE_ONLINE
NOOVERRIDE_ONLINE (D)
If any units are on line to the controller, the controller will not be restarted unless OVERRIDE_ONLINE is specified. If the OVERRIDE_ONLINE qualifier is specified, the controller will restart after all customer data is written to disk.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the OVERRIDE_ONLINE qualifier is specified.
------------------------------------------------------------

NOOVERRIDE_ONLINE is the default.

Examples

1. CLI> RESTART OTHER_CONTROLLER

   Restart the other controller as long as the other controller does not have any units that are on line.

2. CLI> RESTART OTHER_CONTROLLER OVERRIDE_ONLINE

   Restart the other controller even if there are units on line to the other controller.

------------------------------------------------------------
RESTART THIS_CONTROLLER

Restarts this controller.

Format

RESTART THIS_CONTROLLER

Description

The RESTART THIS_CONTROLLER command restarts this controller. If any disks are on line to this controller, the controller will not restart unless the OVERRIDE_ONLINE qualifier is specified (HSD and HSJ only). If any user data cannot be flushed to disk, the controller will not restart unless the IGNORE_ERRORS qualifier is specified.
Specifying IMMEDIATE will cause this controller to restart immediately without flushing any user data to the disks, even if drives are on line to a host.

The RESTART THIS_CONTROLLER command will not cause a failover to the other controller in a dual-redundant configuration. This controller will restart and resume operations where it was interrupted.

------------------------------------------------------------
Note
------------------------------------------------------------
If you enter a RESTART THIS_CONTROLLER command and you are using a virtual terminal to communicate with the controller, the connection will be lost when this controller restarts.
------------------------------------------------------------

Qualifiers for HSD and HSJ controllers

IGNORE_ERRORS
NOIGNORE_ERRORS (D)
If errors result when trying to write user data, the controller will not be restarted unless IGNORE_ERRORS is specified.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IGNORE_ERRORS qualifier is specified.
------------------------------------------------------------

NOIGNORE_ERRORS is the default.

IMMEDIATE
NOIMMEDIATE (D)
If IMMEDIATE is specified, immediately restart the controller without checking for online devices.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IMMEDIATE qualifier is specified.
------------------------------------------------------------

NOIMMEDIATE is the default.

OVERRIDE_ONLINE
NOOVERRIDE_ONLINE (D)
If any units are on line to the controller, the controller will not be restarted unless OVERRIDE_ONLINE is specified. If the OVERRIDE_ONLINE qualifier is specified, the controller will restart after all customer data is written to disk.
------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the OVERRIDE_ONLINE qualifier is specified.
------------------------------------------------------------

NOOVERRIDE_ONLINE is the default.

Qualifiers for HSZ controllers

IGNORE_ERRORS
NOIGNORE_ERRORS (D)
If errors result when trying to write user data, the controller will not be restarted unless IGNORE_ERRORS is specified.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IGNORE_ERRORS qualifier is specified.
------------------------------------------------------------

NOIGNORE_ERRORS is the default.

IMMEDIATE
NOIMMEDIATE (D)
If IMMEDIATE is specified, immediately restart the controller without checking for online devices.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IMMEDIATE qualifier is specified.
------------------------------------------------------------

NOIMMEDIATE is the default.

Examples

1. CLI> RESTART THIS_CONTROLLER

   Restart this controller as long as this controller does not have any units that are on line.

2. CLI> RESTART THIS_CONTROLLER OVERRIDE_ONLINE

   Restart this controller even if there are units on line to this controller.

------------------------------------------------------------
RUN

Runs a diagnostic or utility on THIS_CONTROLLER.

Format

RUN program-name

Parameters

program-name
The name of the diagnostic or utility to be run. DILX and TILX are examples of utilities and diagnostics that can be run from the CLI.

Description

The RUN command enables various diagnostics and utilities on THIS_CONTROLLER.
Diagnostics and utilities can only be run on the controller where the terminal or DUP connection is connected.

For specific information about available diagnostics and utilities, refer to the StorageWorks Array Controllers HS Family of Array Controllers Service Manual.

Examples

1. CLI> RUN DILX
   Copyright © Digital Equipment Corporation 1993
   Disk Inline Exerciser - version 1.0
   .
   .
   .

   How the diagnostic DILX would be run.

------------------------------------------------------------
SELFTEST OTHER_CONTROLLER

Runs a self-test on the other controller.

------------------------------------------------------------
Note
------------------------------------------------------------
This command is valid for HSJ and HSD controllers only.
------------------------------------------------------------

Format

SELFTEST OTHER_CONTROLLER

Description

The SELFTEST OTHER_CONTROLLER command shuts down the other controller, then restarts it in DAEMON loop-on-self-test mode. The OCP reset (//) button must be pushed to take the other controller out of loop-on-self-test mode.

If any disks are on line to the other controller, the controller will not self-test unless the OVERRIDE_ONLINE qualifier is specified (HSD and HSJ only). If any user data cannot be flushed to disk, the controller will not self-test unless the IGNORE_ERRORS qualifier is specified. Specifying IMMEDIATE will cause the other controller to self-test immediately without flushing any user data to the disks, even if drives are on line to the host.

Qualifiers for HSD and HSJ controllers

IGNORE_ERRORS
NOIGNORE_ERRORS (D)
If errors result when trying to write user data, the controller will not start self-test unless IGNORE_ERRORS is specified.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IGNORE_ERRORS qualifier is specified.
------------------------------------------------------------

NOIGNORE_ERRORS is the default.

IMMEDIATE
NOIMMEDIATE (D)
If IMMEDIATE is specified, immediately start the self-test on the controller without checking for online devices.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IMMEDIATE qualifier is specified.
------------------------------------------------------------

NOIMMEDIATE is the default.

OVERRIDE_ONLINE
NOOVERRIDE_ONLINE (D)
If any units are on line to the controller, self-test will not take place unless OVERRIDE_ONLINE is specified. If the OVERRIDE_ONLINE qualifier is specified, the controller will start self-test after all customer data is written to disk.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the OVERRIDE_ONLINE qualifier is specified.
------------------------------------------------------------

NOOVERRIDE_ONLINE is the default.

Examples

1. CLI> SELFTEST OTHER_CONTROLLER

   Start the self-test on the other controller, as long as the other controller does not have any units that are on line.

2. CLI> SELFTEST OTHER_CONTROLLER OVERRIDE_ONLINE

   Start the self-test on the other controller even if there are units on line to the other controller.

------------------------------------------------------------
SELFTEST THIS_CONTROLLER

Runs a self-test on this controller.

Format

SELFTEST THIS_CONTROLLER

Description

The SELFTEST THIS_CONTROLLER command shuts down this controller, then restarts it in DAEMON loop-on-self-test mode. The OCP reset (//) button must be pushed to take this controller out of loop-on-self-test mode.
If any disks are on line to this controller, the controller will not self-test unless the OVERRIDE_ONLINE qualifier is specified (HSD and HSJ only). If any user data cannot be flushed to disk, the controller will not self-test unless the IGNORE_ERRORS qualifier is specified. Specifying IMMEDIATE will cause this controller to self-test immediately without flushing any user data to the disks, even if drives are on line to a host.

------------------------------------------------------------
Note
------------------------------------------------------------
If you enter a SELFTEST THIS_CONTROLLER command, and you are using a virtual terminal to communicate with the controller, the connection will be lost when this controller starts the self-test.
------------------------------------------------------------

Qualifiers for HSD and HSJ controllers

IGNORE_ERRORS
NOIGNORE_ERRORS (D)
If errors result when trying to write user data, the controller will not start self-test unless IGNORE_ERRORS is specified.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IGNORE_ERRORS qualifier is specified.
------------------------------------------------------------

NOIGNORE_ERRORS is the default.

IMMEDIATE
NOIMMEDIATE (D)
If IMMEDIATE is specified, immediately start the self-test on the controller without checking for online devices.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IMMEDIATE qualifier is specified.
------------------------------------------------------------

NOIMMEDIATE is the default.

OVERRIDE_ONLINE
NOOVERRIDE_ONLINE (D)
If any units are on line to the controller, SELFTEST will not take place unless OVERRIDE_ONLINE is specified.
If the OVERRIDE_ONLINE qualifier is specified, the controller will start self-test after all customer data is written to disk.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the OVERRIDE_ONLINE qualifier is specified.
------------------------------------------------------------

NOOVERRIDE_ONLINE is the default.

Qualifiers for HSZ controllers

IGNORE_ERRORS
NOIGNORE_ERRORS (D)
If errors result when trying to write user data, the controller will not start self-test unless IGNORE_ERRORS is specified.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IGNORE_ERRORS qualifier is specified.
------------------------------------------------------------

NOIGNORE_ERRORS is the default.

IMMEDIATE
NOIMMEDIATE (D)
If IMMEDIATE is specified, immediately start the self-test on the controller without checking for online devices.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IMMEDIATE qualifier is specified.
------------------------------------------------------------

NOIMMEDIATE is the default.

Examples

1. CLI> SELFTEST THIS_CONTROLLER

   Start the self-test on this controller as long as this controller does not have any units on line.

2. CLI> SELFTEST THIS_CONTROLLER OVERRIDE_ONLINE

   Start the self-test on this controller even if there are units on line to this controller.

------------------------------------------------------------
SET disk-container-name

Modifies the characteristics of a disk drive.

Format

SET disk-container-name

Parameters

disk-container-name
Specifies the name of the disk drive whose characteristics will be modified.
Description Changes the characteristics of a disk drive. Qualifiers TRANSPORTABLE NOTRANSPORTABLE (D) In normal operations, the controller makes a small portion of the disk inaccessible to the host and uses this area to store metadata, which improves data reliability, error detection, and recovery. This vast improvement comes at the expense of transportability. If NOTRANSPORTABLE is specified (or allowed to default) and there is no valid metadata on the unit, the unit must be initialized. If TRANSPORTABLE is specified and there is valid metadata on the unit, the unit will have to be initialized in order to remove the metadata. ------------------------------------------------------------ Note ------------------------------------------------------------ Digital recommends that you avoid specifying TRANSPORTABLE unless transportability of disk drives or media is imperative and there is no other way to accomplish the movement of data. ------------------------------------------------------------ When entering an ADD DISK command, NOTRANSPORTABLE is the default. Examples 1. CLI> SET DISK130 TRANSPORTABLE DISK130 is made transportable. B-30 Command Line Interpreter SET FAILOVER ------------------------------------------------------------ SET FAILOVER Places THIS_CONTROLLER and OTHER_CONTROLLER into a dual-redundant configuration. Format SET FAILOVER COPY=configuration-source Parameters COPY=configuration-source Specifies where the ``good'' copy of the device configuration resides. If THIS_CONTROLLER is specified for configuration-source, all the device configuration information on THIS_CONTROLLER (the one that either the maintenance terminal is connected to or the virtual terminal is connected to) is copied to the other controller. 
If OTHER_CONTROLLER is specified for configuration-source, all the device configuration information on the OTHER_CONTROLLER (the controller that neither the maintenance terminal nor the virtual terminal is connected to) is copied to this controller.

Description
The SET FAILOVER command places THIS_CONTROLLER and the OTHER_CONTROLLER in a dual-redundant configuration. After this command is entered, if one of the two controllers fails, the devices attached to the failed controller become available to and accessible through the operating controller.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
All device configuration information on the controller not specified by the COPY= parameter is destroyed and overwritten by the configuration information found in the controller specified by the COPY= parameter. Make sure you know where your good configuration information is stored, or that you have a complete copy of the device configuration, BEFORE entering this command. A considerable amount of work and effort will be lost by overwriting a good configuration with incorrect information if the wrong controller is specified by the COPY= parameter.
Also note that due to the amount of information that must be passed between the two controllers, this command may take up to 1 minute to complete.
------------------------------------------------------------

Examples

1. CLI> SET FAILOVER COPY=THIS_CONTROLLER
This places two controllers into a dual-redundant configuration, where the "good" data was on the controller that the maintenance terminal or virtual terminal connection was connected to.

2. CLI> SET FAILOVER COPY=OTHER_CONTROLLER
This places two controllers into a dual-redundant configuration, where the "good" data was on the controller that the maintenance terminal or virtual terminal connection was not connected to.
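Because the COPY= parameter decides which controller's device configuration is destroyed, it can help to spell out the direction of the copy before typing the command. The following Python sketch is only an illustration of the rule stated in the CAUTION above; the function name failover_plan and the dictionary it returns are hypothetical and are not part of the controller firmware or the CLI.

```python
# Illustrative helper only: failover_plan() is a hypothetical name, not a
# controller or CLI facility. It restates the SET FAILOVER COPY= rule:
# the controller named by COPY= keeps its configuration, the other one
# has its configuration overwritten.

VALID_SOURCES = ("THIS_CONTROLLER", "OTHER_CONTROLLER")


def failover_plan(copy_source: str) -> dict:
    """Return the command text and which controller's configuration is overwritten."""
    source = copy_source.strip().upper()
    if source not in VALID_SOURCES:
        raise ValueError(f"COPY= must be one of {VALID_SOURCES}, got {copy_source!r}")
    overwritten = (
        "OTHER_CONTROLLER" if source == "THIS_CONTROLLER" else "THIS_CONTROLLER"
    )
    return {
        "command": f"SET FAILOVER COPY={source}",
        "kept": source,
        "overwritten": overwritten,
    }


if __name__ == "__main__":
    plan = failover_plan("this_controller")
    print(plan["command"])
    print("Configuration kept on:       ", plan["kept"])
    print("Configuration overwritten on:", plan["overwritten"])
```

A check like this is most useful in an operator script that asks for confirmation before the command is sent to the maintenance terminal.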
------------------------------------------------------------
SET NOFAILOVER

Removes THIS_CONTROLLER and OTHER_CONTROLLER (if reachable) from a dual-redundant configuration.

Format
SET NOFAILOVER

Description
The SET NOFAILOVER command removes THIS_CONTROLLER and the OTHER_CONTROLLER (if currently reachable) from a dual-redundant configuration. Before or immediately after entering this command, one controller should be physically removed because the sharing of devices is not supported by single controller configurations.
The controller on which the command was entered will always be removed from a dual-redundant state, even if the other controller is not currently reachable. No configuration information is lost when leaving a dual-redundant state.

Examples

1. CLI> SET NOFAILOVER
The two controllers are taken out of dual-redundant configuration.

------------------------------------------------------------
SET OTHER_CONTROLLER

Modifies the other controller's parameters (in a dual-redundant configuration, the controller that the maintenance terminal is not connected to, or the controller that is not the target of the DUP connection).

------------------------------------------------------------
Note
------------------------------------------------------------
This command is valid for HSJ and HSD controllers only.
------------------------------------------------------------

Format
SET OTHER_CONTROLLER

Description
The SET OTHER_CONTROLLER command allows you to modify the controller parameters of the other controller in a dual-redundant configuration.

Qualifiers for HSD controllers

ID=n
Specifies the DSSI node number (0 through 7).

MSCP_ALLOCATION_CLASS=n
Specifies the allocation class (0 through 255 in a single controller configuration or 1 through 255 in a dual-redundant configuration).
When first installed, the controller's MSCP_ALLOCATION_CLASS is set to 0.
PATH
NOPATH
Enables or disables the DSSI port.
When first installed, NOPATH is set.

PROMPT="new prompt"
Specifies a 1- to 16-character prompt, enclosed in quotes, that is displayed when the controller's CLI prompts for input. Only printable ASCII characters are valid.
When first installed, the CLI prompt is set to the first three letters of the controller's model number (for example, HSJ>, HSD>, or HSZ>).

SCS_NODENAME="xxxxxx"
Specifies a one- to six-character name for the node.

TERMINAL_PARITY={ODD,EVEN}
NOTERMINAL_PARITY
Specifies the parity transmitted and expected. Parity options are ODD or EVEN. NOTERMINAL_PARITY causes the controller not to check for or transmit any parity on the terminal lines.
When first installed, the controller's terminal parity is set to NOTERMINAL_PARITY.

TERMINAL_SPEED=baud_rate
Sets the terminal speed to 300, 600, 1200, 2400, 4800, or 9600 baud. The transmit speed is always equal to the receive speed.
When first installed, the controller's terminal speed is set to 9600 baud.

TMSCP_ALLOCATION_CLASS=n
Specifies the allocation class (0 through 255 in a single controller configuration or 1 through 255 in a dual-redundant configuration).
When first installed, the controller's TMSCP_ALLOCATION_CLASS is set to 0.

Qualifiers for HSJ controllers

ID=n
Specifies the CI node number (0 through (MAX_NODES - 1)).

MAX_NODES=n
Specifies the maximum number of nodes (8, 16, or 32).
When first installed, the controller's MAX_NODES is set to 16.

MSCP_ALLOCATION_CLASS=n
Specifies the allocation class (0 through 255 in a single controller configuration or 1 through 255 in a dual-redundant configuration).
When first installed, the controller's MSCP_ALLOCATION_CLASS is set to 0.

PATH_A
NOPATH_A
Enables or disables CI Path A.
When first installed, NOPATH_A is set.

PATH_B
NOPATH_B
Enables or disables CI Path B.
When first installed, NOPATH_B is set.
PROMPT="new prompt" Specifies a 1- to 16-character prompt enclosed in quotes that will be displayed when the controller 's CLI prompts for input. Only printable ASCII characters are valid. When first installed, the CLI prompt is set to the first three letters of the controller 's model number (for example, HSJ>, HSD> or HSZ>). SCS_NODENAME="xxxxxx" Specifies a one to six character name for node. TERMINAL_PARITY={ODD,EVEN} NOTERMINAL_PARITY Specifies the parity transmitted and expected. Parity options are ODD or EVEN. NOTERMINAL_PARITY causes the controller not to check for or transmit any parity on the terminal lines. Command Line Interpreter B-35 SET OTHER_CONTROLLER When first installed, the controller 's terminal parity is set to NOTERMINAL_PARITY. TERMINAL_SPEED=baud_rate Sets the terminal speed to 300, 600, 1200, 2400, 4800, or 9600 baud. The transmit speed is always equal to the receive speed. When first installed, the controller 's terminal speed is set to 9600 baud. TMSCP_ALLOCATION_CLASS=n Specifies the allocation class (0 through 255 in a single controller configuration or 1 through 255 in a dual-redundant configuration). When first installed, the controller 's TMSCP_ALLOCATION_CLASS is set to 0. Examples 1. CLI> SET OTHER_CONTROLLER PATH_A PATH_B SPEED=1200 Turns on the other HSJ controller 's two CI paths and sets the terminal speed to 1200 baud. B-36 Command Line Interpreter SET stripeset-container-name ------------------------------------------------------------ SET stripeset-container-name Modifies the characteristics of a stripeset. Format SET stripeset-container-name Parameters stripeset-container-name Specifies the name of the stripeset whose characteristics will be modified. Description Changes the characteristics of a stripeset. Qualifiers CHUNKSIZE=n CHUNKSIZE=DEFAULT (D) Specifies the chunksize to be used. The chunksize may be specified in blocks (CHUNKSIZE=n), or you may let the controller determine the optimal chunksize (CHUNKSIZE=DEFAULT). 
When entering an ADD command, CHUNKSIZE=DEFAULT is the default. ------------------------------------------------------------ Note ------------------------------------------------------------ The chunksize may not be changed if the stripeset is currently in use by a unit. To change the chunksize, the unit must first be deleted, then the chunksize may be changed. ------------------------------------------------------------ ------------------------------------------------------------ CAUTION ------------------------------------------------------------ If the chunksize is changed the stripeset must be initialized, which will destroy all customer data on the stripeset. ------------------------------------------------------------ Examples 1. CLI> SET STRIPE0 CHUNKSIZE=32 Stripeset STRIPE0's chunksize is set to 32. Command Line Interpreter B-37 SET THIS_CONTROLLER ------------------------------------------------------------ SET THIS_CONTROLLER Modifies this controller 's parameters (the controller that the maintenance terminal is connected to or the target of the DUP connection). Format SET THIS_CONTROLLER Description The SET THIS_CONTROLLER command allows you to modify controller parameters on THIS_CONTROLLER in single and dual-redundant configurations. Qualifiers for HSD controllers ID=n Specifies the DSSI node number (0 through 7). MSCP_ALLOCATION_CLASS=n Specifies the allocation class (0 through 255 in a single controller configuration or 1 through 255 in a dual-redundant configuration). When first installed, the controller 's MSCP_ALLOCATION_CLASS is set to 0. PATH NOPATH Enables or disables the DSSI port. When first installed, NOPATH is set. PROMPT="new prompt" Specifies a 1- to 16-character prompt enclosed in quotes that will be displayed when the controller 's CLI prompts for input. Only printable ASCII characters are valid. When first installed, the CLI prompt is set to the first three letters of the controller 's model number (for example, HSJ>, HSD> or HSZ>). 
SCS_NODENAME="xxxxxx"
Specifies a one- to six-character name for the node.

TERMINAL_PARITY={ODD,EVEN}
NOTERMINAL_PARITY
Specifies the parity transmitted and expected. Parity options are ODD or EVEN. NOTERMINAL_PARITY causes the controller not to check for or transmit any parity on the terminal lines.
When first installed, the controller's terminal parity is set to NOTERMINAL_PARITY.

TERMINAL_SPEED=baud_rate
Sets the terminal speed to 300, 600, 1200, 2400, 4800, or 9600 baud. The transmit speed is always equal to the receive speed.
When first installed, the controller's terminal speed is set to 9600 baud.

TMSCP_ALLOCATION_CLASS=n
Specifies the allocation class (0 through 255 in a single controller configuration or 1 through 255 in a dual-redundant configuration).
When first installed, the controller's TMSCP_ALLOCATION_CLASS is set to 0.

Qualifiers for HSJ controllers

ID=n
Specifies the CI node number (0 through (MAX_NODES - 1)).

MAX_NODES=n
Specifies the maximum number of nodes (8, 16, or 32).
When first installed, the controller's MAX_NODES is set to 16.

MSCP_ALLOCATION_CLASS=n
Specifies the allocation class (0 through 255 in a single controller configuration or 1 through 255 in a dual-redundant configuration).
When first installed, the controller's MSCP_ALLOCATION_CLASS is set to 0.

PATH_A
NOPATH_A
Enables or disables CI Path A.
When first installed, NOPATH_A is set.

PATH_B
NOPATH_B
Enables or disables CI Path B.
When first installed, NOPATH_B is set.

PROMPT="new prompt"
Specifies a 1- to 16-character prompt, enclosed in quotes, that is displayed when the controller's CLI prompts for input. Only printable ASCII characters are valid.
When first installed, the CLI prompt is set to the first three letters of the controller's model number (for example, HSJ>, HSD>, or HSZ>).

SCS_NODENAME="xxxxxx"
Specifies a one- to six-character name for the node.
TERMINAL_PARITY={ODD,EVEN}
NOTERMINAL_PARITY
Specifies the parity transmitted and expected. Parity options are ODD or EVEN. NOTERMINAL_PARITY causes the controller not to check for or transmit any parity on the terminal lines.
When first installed, the controller's terminal parity is set to NOTERMINAL_PARITY.

TERMINAL_SPEED=baud_rate
Sets the terminal speed to 300, 600, 1200, 2400, 4800, or 9600 baud. The transmit speed is always equal to the receive speed.
When first installed, the controller's terminal speed is set to 9600 baud.

TMSCP_ALLOCATION_CLASS=n
Specifies the allocation class (0 through 255 in a single controller configuration or 1 through 255 in a dual-redundant configuration).
When first installed, the controller's TMSCP_ALLOCATION_CLASS is set to 0.

Qualifiers for HSZ controllers

ID=n or ID=(n1,n2)
Specifies one or two SCSI target IDs (0 through 7). If two target IDs are specified, they must be enclosed in parentheses and separated by a comma.

------------------------------------------------------------
Note
------------------------------------------------------------
The unit number determines which target the LUN will be available under. For example, D203 would be target 2, LUN 3. D500 would be target 5, LUN 0. D5 would be target 0, LUN 5.
------------------------------------------------------------

PROMPT="new prompt"
Specifies a 1- to 16-character prompt, enclosed in quotes, that is displayed when the controller's CLI prompts for input. Only printable ASCII characters are valid.
When first installed, the CLI prompt is set to the first three letters of the controller's model number (for example, HSJ>, HSD>, or HSZ>).

TERMINAL_PARITY={ODD,EVEN}
NOTERMINAL_PARITY
Specifies the parity transmitted and expected. Parity options are ODD or EVEN. NOTERMINAL_PARITY causes the controller not to check for or transmit any parity on the terminal lines.
When first installed, the controller 's terminal parity is set to NOTERMINAL_PARITY. TERMINAL_SPEED=baud_rate Sets the terminal speed to 300, 600, 1200, 2400, 4800, or 9600 baud. The transmit speed is always equal to the receive speed. When first installed, the controller 's terminal speed is set to 9600 baud. Examples 1. CLI> SET THIS_CONTROLLER PATH_A PATH_B SPEED=1200 Turns on this HSJ controller 's two CI paths and sets the terminal speed to 1200 baud. 2. CLI> SET THIS_CONTROLLER ID=5 Sets this HSZ controller so it responds to requests for target 5. 3. CLI> SET THIS_CONTROLLER ID=(2,5) Sets this HSZ controller so it responds to requests for targets 2 and 5. B-40 Command Line Interpreter SET unit-number ------------------------------------------------------------ SET unit-number Modifies the unit parameters. Format SET unit-number Parameters unit-number Specifies the logical unit number (on HSDs and HSJs D0-D4094 or T0-T4094, on HSZs D0-D7 or T0-T7) whose software switches are to be modified. This is the name given the unit when it was created using the ADD UNIT command. Description The SET command is used to change logical unit parameters. Qualifiers for a unit created from a CDROM drive (HSJ and HSD only) MAXIMUM_CACHED_TRANSFER=n MAXIMUM_CACHED_TRANSFER=32 (D) Specifies the maximum size transfer in blocks to be cached by the controller. Any transfers over this size will not be cached. Valid values are 1 through 1024. When entering the ADD UNIT command, MAXIMUM_CACHED_TRANSFER=32 is the default. READ_CACHE (D) NOREAD_CACHE Enables and disables the controller 's read cache on this unit. When entering an ADD UNIT command, READ_CACHE is the default. RUN (D) NORUN Enables and disables a unit's ability to be spun up. When RUN is specified, the devices that make up the unit will be spun up. If NORUN is specified the unit will be spun down. When entering an ADD UNIT command, RUN is the default. 
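The note under the HSZ ID qualifier of SET THIS_CONTROLLER gives three examples of how a unit number maps to a SCSI target and LUN (D203, D500, and D5). The Python sketch below reproduces those three examples under the assumption that the hundreds portion of the unit number is the target and the remainder is the LUN. That rule is inferred from the examples only; the function name hsz_target_lun is hypothetical and is not a controller facility.

```python
import re

# Assumption (inferred from the manual's three examples, not stated as a rule):
#   target = unit number // 100, LUN = unit number % 100.
# hsz_target_lun() is a hypothetical helper name used for illustration.

_UNIT_RE = re.compile(r"^[DT](\d{1,4})$")


def hsz_target_lun(unit_name: str) -> tuple:
    """Split an HSZ unit name such as 'D203' into (target, lun)."""
    match = _UNIT_RE.match(unit_name.strip().upper())
    if match is None:
        raise ValueError(f"not a unit name: {unit_name!r}")
    target, lun = divmod(int(match.group(1)), 100)
    # The ID qualifier allows SCSI target IDs 0 through 7 only.
    if target > 7:
        raise ValueError(f"{unit_name!r} implies target {target}, outside 0 through 7")
    return target, lun


if __name__ == "__main__":
    for name in ("D203", "D500", "D5"):
        target, lun = hsz_target_lun(name)
        print(f"{name}: target {target}, LUN {lun}")
```

The target returned for a unit is only reachable if the controller has been set to respond to it with the ID qualifier.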
Qualifiers for a unit created from a disk drive MAXIMUM_CACHED_TRANSFER=n MAXIMUM_CACHED_TRANSFER=32 (D) Specifies the maximum size transfer in blocks to be cached by the controller. Any transfers over this size will not be cached. Valid values are 1 through 1024. When entering the ADD UNIT command, MAXIMUM_CACHED_TRANSFER=32 is the default. READ_CACHE (D) NOREAD_CACHE Enables and disables the controller 's read cache on this unit. When entering an ADD UNIT command, READ_CACHE is the default. Command Line Interpreter B-41 SET unit-number RUN (D) NORUN Enables and disables a unit's ability to be spun up. When RUN is specified, the devices that make up the unit will be spun up. If NORUN is specified the unit will be spun down. When entering an ADD UNIT command, RUN is the default. WRITE_PROTECT NOWRITE_PROTECT (D) Enables and disables write protection of the unit. When entering an ADD UNIT command, NOWRITE_PROTECT is the default. Qualifiers for a unit created from a stripeset MAXIMUM_CACHED_TRANSFER=n MAXIMUM_CACHED_TRANSFER=32 (D) Specifies the maximum size transfer in blocks to be cached by the controller. Any transfers over this size will not be cached. Valid values are 1 through 1024. When entering the ADD UNIT command, MAXIMUM_CACHED_TRANSFER=32 is the default. READ_CACHE (D) NOREAD_CACHE Enables and disables the controller 's read cache on this unit. When entering an ADD UNIT command, READ_CACHE is the default. RUN (D) NORUN Enables and disables a unit's ability to be spun up. When RUN is specified, the devices that make up the unit will be spun up. If NORUN is specified the unit will be spun down. When entering an ADD UNIT command, RUN is the default. WRITE_PROTECT NOWRITE_PROTECT (D) Enables and disables write protection of the unit. When entering an ADD UNIT command, NOWRITE_PROTECT is the default. 
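The qualifiers for disk and stripeset units follow a regular pattern: each switch has a NO form, and MAXIMUM_CACHED_TRANSFER accepts values from 1 through 1024. The Python sketch below assembles a SET command line from those rules and rejects an out-of-range transfer size before anything is sent to the controller. The function name build_set_unit is hypothetical, and the sketch assumes that qualifier order on the command line does not matter. WRITE_PROTECT is listed for disk and stripeset units only, so it should not be passed for a CDROM unit.

```python
# Illustrative sketch: build_set_unit() is a hypothetical helper, not part of
# the CLI. It only formats text; it does not talk to a controller.
# Assumption: the CLI accepts qualifiers in any order.


def build_set_unit(unit, max_cached_transfer=None, write_protect=None,
                   read_cache=None, run=None):
    """Return a SET unit-number command string for a disk or stripeset unit."""
    parts = [f"SET {unit}"]
    if max_cached_transfer is not None:
        # The manual gives the valid range as 1 through 1024 blocks.
        if not 1 <= max_cached_transfer <= 1024:
            raise ValueError("MAXIMUM_CACHED_TRANSFER must be 1 through 1024")
        parts.append(f"MAXIMUM_CACHED_TRANSFER={max_cached_transfer}")
    switches = (
        ("WRITE_PROTECT", write_protect),
        ("READ_CACHE", read_cache),
        ("RUN", run),
    )
    for name, value in switches:
        if value is not None:
            # True selects the switch, False selects its NO form.
            parts.append(name if value else "NO" + name)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_set_unit("D1", write_protect=True, read_cache=False))
    print(build_set_unit("D100", max_cached_transfer=64))
```

Leaving an argument as None omits the qualifier, so the unit keeps whatever setting it already has.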
Qualifiers for a unit created from a tape drive (HSJ and HSD only)

DEFAULT_FORMAT=format
DEFAULT_FORMAT=DEVICE_DEFAULT (D)
Specifies the tape format to be used unless overridden by the host. Note that not all devices support all formats. The easiest way to determine what formats are supported by a specific device is to enter "SHOW DEFAULT_FORMAT= ?"; the valid options will be displayed.

Supported tape formats are as follows:

· DEVICE_DEFAULT
  The default tape format is the default that the device uses, or, in the case of devices that are settable via switches on the front panel, the settings of those switches.
· 800BPI_9TRACK
· 1600BPI_9TRACK
· 6250BPI_9TRACK
· TZ85
· TZ86
· TZ87_NOCOMPRESSION
· TZ87_COMPRESSION
· DAT_NOCOMPRESSION
· DAT_COMPRESSION
· 3480_NOCOMPRESSION
· 3480_COMPRESSION

When entering the ADD UNIT command for a tape device, DEFAULT_FORMAT=DEVICE_DEFAULT is the default.

Examples

1. CLI> SET D1 WRITE_PROTECT NOREAD_CACHE
Write protect and turn off the read cache on unit D1.

2. CLI> SET T47 DEFAULT_FORMAT=1600BPI_9TRACK
Set unit T47 to 1600 bpi.

------------------------------------------------------------
SHOW CDROMS

Shows all CDROM drives and drive information.

------------------------------------------------------------
Note
------------------------------------------------------------
This command is valid for HSJ and HSD controllers only.
------------------------------------------------------------

Format
SHOW CDROMS

Description
The SHOW CDROMS command displays all the CDROM drives known to the controller.

Qualifiers
FULL
If the FULL qualifier is specified, additional amplifying information may be displayed after each device.

Examples

1. CLI> SHO CDROM
Name         Type   Port Targ Lun  Used by
------------------------------------------------------------------------------
CDROM230     cdrom     2    3   0  D623
CDROM240     cdrom     2    4   0  D624
A normal listing of CDROMs.

2.
CLI> SHO CDROM FULL Name Type Port Targ Lun Used by ------------------------------------------------------------------------------ CDROM230 cdrom 2 3 0 D623 DEC RRD44 (C) DEC 3593 CDROM240 cdrom 2 4 0 D624 DEC RRD44 (C) DEC 3593 A full listing of CDROMs B-44 Command Line Interpreter SHOW cdrom-container-name ------------------------------------------------------------ SHOW cdrom-container-name Shows information about a CDROM. Format SHOW cdrom-container-name Parameters cdrom-container-name The name of the CDROM drive that will be displayed. Description The SHOW cdrom-container-name command is used to show specific information about a particular CDROM drive. Examples 1. CLI> SHO CDROM230 Name Type Port Targ Lun Used by ------------------------------------------------------------------------------ CDROM230 cdrom 2 3 0 D623 DEC RRD44 (C) DEC 3593 A listing of CDROM CDROM230. Command Line Interpreter B-45 SHOW DEVICES ------------------------------------------------------------ SHOW DEVICES Shows physical devices and physical device information. Format SHOW DEVICES Description The SHOW DEVICES command displays all the devices known to the controller. First disks are shown, then tapes and finally CDROMs. Qualifiers FULL If the FULL qualifier is specified, additional amplifying information may be displayed after each device. Information contained in the amplifying information is dependent on the device type. Examples 1. CLI> SHOW DEVICES Name Type Port Targ Lun Used by ------------------------------------------------------------------------------ DI0 disk 1 0 0 D100 DI1 disk 1 1 0 D110 TAPE110 tape 3 1 0 T110 TAPE130 tape 3 3 0 T130 CDROM230 cdrom 2 3 0 D623 CDROM240 cdrom 2 4 0 D624 A basic listing of devices attached to the controller. 2. 
CLI> SHOW DEVICES FULL Name Type Port Targ Lun Used by ------------------------------------------------------------------------------ DI0 disk 1 0 0 D100 DEC RZ35 (C) DEC X388 DI1 disk 1 1 0 D110 DEC RZ26 (C) DEC T386 TAPE110 tape 3 1 0 T110 DEC TZ877 (C) DEC 930A TAPE130 tape 3 3 0 T130 DEC TZ877 (C) DEC 930A CDROM230 cdrom 2 3 0 D623 DEC RRD44 (C) DEC 3593 CDROM240 cdrom 2 4 0 D624 DEC RRD44 (C) DEC 3593 A full listing of devices attached to the controller. B-46 Command Line Interpreter SHOW DISKS ------------------------------------------------------------ SHOW DISKS Shows all disk drives and drive information. Format SHOW DISKS Description The SHOW DISKS command displays all the disk drives known to the controller. Qualifiers FULL If the FULL qualifier is specified, additional amplifying information may be displayed after each device. Examples 1. CLI> SHOW DISKS Name Type Port Targ Lun Used by ------------------------------------------------------------------------------ DI0 disk 1 0 0 D100 DI1 disk 1 1 0 D110 A basic listing of disks attached to the controller. 2. CLI> SHOW DISKS FULL Name Type Port Targ Lun Used by ------------------------------------------------------------------------------ DI0 disk 1 0 0 D100 DEC RZ35 (C) DEC X388 DI1 disk 1 1 0 D110 DEC RZ26 (C) DEC T386 A full listing of disks attached to the controller. Command Line Interpreter B-47 SHOW disk-container-name ------------------------------------------------------------ SHOW disk-container-name Shows information about a disk drive. Format SHOW disk-container-name Parameters disk-container-name The name of the disk drive that will be displayed. Description The SHOW disk-container-name command is used to show specific information about a particular disk. Examples 1. CLI> SHOW DI3 Name Type Port Targ Lun Used by ------------------------------------------------------------------------------ DI3 disk 1 3 0 D130 DEC RZ26 (C) DEC X388 A listing of disk DI3. 
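The SHOW DEVICES, SHOW DISKS, and SHOW disk-container-name listings share the same six columns (Name, Type, Port, Targ, Lun, Used by). When a captured terminal session needs to be checked against a configuration record, the listing can be parsed into records. The Python sketch below is one way to do that. It assumes that, on the terminal, each device occupies one line with whitespace-separated columns and that the amplifying information from FULL appears on its own line; the names Device and parse_devices are hypothetical.

```python
from typing import List, NamedTuple, Optional

# Illustrative sketch: Device and parse_devices() are hypothetical names.
# Assumption: one device per line, whitespace-separated columns, with any
# FULL amplifying information (for example "DEC RZ35 (C) DEC X388") on a
# separate line, which this parser skips.


class Device(NamedTuple):
    name: str
    type: str
    port: int
    target: int
    lun: int
    used_by: Optional[str]


DEVICE_TYPES = {"disk", "tape", "cdrom"}


def parse_devices(listing: str) -> List[Device]:
    """Extract device rows from captured SHOW DEVICES or SHOW DISKS output."""
    devices = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header, the dashed rule, and amplifying-information lines.
        if len(fields) < 5 or fields[1] not in DEVICE_TYPES:
            continue
        try:
            port, target, lun = (int(value) for value in fields[2:5])
        except ValueError:
            continue
        used_by = fields[5] if len(fields) > 5 else None
        devices.append(Device(fields[0], fields[1], port, target, lun, used_by))
    return devices


SAMPLE = """\
Name Type Port Targ Lun Used by
------------------------------------------------------------------------------
DI0 disk 1 0 0 D100
DEC RZ35 (C) DEC X388
DI1 disk 1 1 0 D110
DEC RZ26 (C) DEC T386
"""

if __name__ == "__main__":
    for device in parse_devices(SAMPLE):
        print(device)
```

A device that is not yet assigned to a unit has an empty Used by column, which the sketch reports as None.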
B-48 Command Line Interpreter SHOW OTHER_CONTROLLER ------------------------------------------------------------ SHOW OTHER_CONTROLLER Shows the other controller 's information. ------------------------------------------------------------ Note ------------------------------------------------------------ This command is valid for HSJ and HSD controllers only. ------------------------------------------------------------ Format SHOW OTHER_CONTROLLER Description Shows all controller, port, and terminal information for the other controller. Qualifiers FULL If the FULL qualifier is specified, additional amplifying information is displayed after the normal controller information. Examples 1. CLI> SHOW OTHER_CONTROLLER Controller: HSJ40 ZG313FF115 Software E140, Hardware 0000 Configured for dual-redundancy with ZG30355555 In dual-redundant configuration SCSI address 6 Host port: Node name: HSJ306, valid CI node 6, 32 max nodes System ID 420010061120 Path A is ON Path B is ON MSCP allocation class 3 TMSCP allocation class 3 Cache: 32 megabyte read cache, version 2 The basic HSJ controller information. Command Line Interpreter B-49 SHOW OTHER_CONTROLLER 2. CLI> SHOW OTHER_CONTROLLER Controller: HSD30 ZG33400026 Software E140, Hardware 0000 Configured for dual-redundancy with CX40100000 All devices failed over to this controller SCSI address 7 Host port: Node name: HSD001, valid DSSI node 1 Host path is ON MSCP allocation class 9 TMSCP allocation class 9 Cache: 32 megabyte read cache, version 2 The basic HSD controller information. 3. 
CLI> SHOW OTHER_CONTROLLER FULL Controller: HSJ40 ZG313FF115 Software E140, Hardware 0000 Configured for dual-redundancy with ZG30355555 In dual-redundant configuration SCSI address 6 Host port: Node name: HSJ306, valid CI node 6, 32 max nodes System ID 420010061120 Path A is ON Path B is ON MSCP allocation class 3 TMSCP allocation class 3 Cache: 32 megabyte read cache, version 2 Extended information: Terminal speed 19200 baud, eight bit, no parity, 1 stop bit Operation control: 00000005 Security state code: 41415 A full HSJ controller information listing. B-50 Command Line Interpreter SHOW STORAGESETS ------------------------------------------------------------ SHOW STORAGESETS Shows storage sets and storage set information. Format SHOW STORAGESETS Description The SHOW STORAGESETS command displays all the storage sets known by the controller. A storage set is any collection of containers, such as stripesets. Stripesets will be displayed first. Qualifiers FULL If the FULL qualifier is specified, additional amplifying information may be displayed after each storage set. Examples 1. CLI> SHOW STORAGESETS Name Storageset Uses Used by ------------------------------------------------------------------------------ ST1 stripeset DISK500 D1 DISK510 DISK520 A basic listing of all storage sets. 2. CLI> SHOW STORAGESETS FULL Name Storageset Uses Used by ------------------------------------------------------------------------------ ST1 stripeset DISK500 D1 DISK510 DISK520 CHUNKSIZE = DEFAULT ST2 stripeset DISK400 D17 DISK410 DISK420 CHUNKSIZE = DEFAULT A full listing of all storage sets. Command Line Interpreter B-51 SHOW STRIPESETS ------------------------------------------------------------ SHOW STRIPESETS Shows stripesets and related stripeset information. Format SHOW STRIPESETS Description The SHOW STRIPESET command displays all the stripesets known by the controller. 
Qualifiers FULL If the FULL qualifier is specified, additional amplifying information may be displayed after each storage set. Examples 1. CLI> SHOW STRIPESETS Name Storageset Uses Used by ------------------------------------------------------------------------------ ST1 stripeset DISK500 D1 DISK510 DISK520 ST2 stripeset DISK400 D17 DISK410 DISK420 A basic listing of all stripesets. 2. CLI> SHOW STRIPESETS FULL Name Storageset Uses Used by ------------------------------------------------------------------------------ ST1 stripeset DISK500 D1 DISK510 DISK520 CHUNKSIZE = DEFAULT ST2 stripeset DISK400 D17 DISK410 DISK420 CHUNKSIZE = DEFAULT A full listing of all stripesets. B-52 Command Line Interpreter SHOW stripeset-container-name ------------------------------------------------------------ SHOW stripeset-container-name Shows information about a stripeset. Format SHOW stripeset-container-name Parameters stripeset-container-name The name of the stripeset that will be displayed. Description The SHOW stripeset-container-name command is used to show specific information about a particular stripeset. Examples 1. CLI> SHOW STRIPE0 Name Storageset Uses Used by ------------------------------------------------------------------------------ STRIPE0 stripeset DISK500 D1 DISK510 DISK520 CHUNKSIZE = DEFAULT A listing of stripeset STRIPE0. Command Line Interpreter B-53 SHOW TAPES ------------------------------------------------------------ SHOW TAPES Shows all tape drives and tape drive information. ------------------------------------------------------------ Note ------------------------------------------------------------ This command is valid for HSJ and HSD controllers only. ------------------------------------------------------------ Format SHOW TAPES Description The SHOW TAPES command displays all the tape drives known to the controller. Qualifiers FULL If the FULL qualifier is specified, additional amplifying information may be displayed after each device. Examples 1. 
CLI> sho t0 MSCP unit Uses -------------------------------------------------------------- T0 TAPE0 Switches: DEFAULT_FORMAT = TZ87_NOCOMPRESSION State: AVAILABLE No exclusive access CLI> Shows an individual tape unit. B-54 Command Line Interpreter SHOW tape-container-name ------------------------------------------------------------ SHOW tape-container-name Shows information about a tape drive. Format SHOW tape-container-name Parameters tape-container-name The name of the tape drive that will be displayed. Description The SHOW tape-container-name command is used to show specific information about a particular tape drive. Examples 1. HSJB0> SHOW TAPE230 Name Type Port Targ Lun Used by ------------------------------------------------------------------------------ TAPE230 tape 2 3 0 T230 DEC TSZ07 0309 A listing of TAPE230. Command Line Interpreter B-55 SHOW THIS_CONTROLLER ------------------------------------------------------------ SHOW THIS_CONTROLLER Shows this controller 's information. Format SHOW THIS_CONTROLLER Description Shows all controller, port, and terminal information for this controller. Qualifiers FULL If the FULL qualifier is specified, additional amplifying information is displayed after the normal controller information. Examples 1. CLI> SHOW THIS_CONTROLLER Controller: HSJ40 ZG313FF115 Software E140, Hardware 0000 Configured for dual-redundancy with ZG30355555 In dual-redundant configuration SCSI address 6 Host port: Node name: HSJ306, valid CI node 6, 32 max nodes System ID 420010061120 Path A is ON Path B is ON MSCP allocation class 3 TMSCP allocation class 3 Cache: 32 megabyte read cache, version 2 The basic HSJ controller information. 2. 
CLI> SHOW THIS_CONTROLLER
   Controller:
       HSD30 ZG33400026 Software E140, Hardware 0000
       Configured for dual-redundancy with CX40100000
       All devices failed over to this controller
       SCSI address 7
   Host port:
       Node name: HSD001, valid DSSI node 1
       Host path is ON
       MSCP allocation class 9
       TMSCP allocation class 9
   Cache:
       32 megabyte read cache, version 2

   The basic HSD controller information.

3. CLI> SHOW THIS_CONTROLLER
   Controller:
       HSZ40 SC00103056 Software E140, Hardware 0000
       SCSI address 6
   Host port:
       valid SCSI target 2
   Cache:
       32 megabyte read cache, version 2

   The basic HSZ controller information.

4. CLI> SHOW THIS_CONTROLLER FULL
   Controller:
       HSJ40 ZG313FF115 Software E140, Hardware 0000
       Configured for dual-redundancy with ZG30355555
       In dual-redundant configuration
       SCSI address 6
   Host port:
       Node name: HSJ306, valid CI node 6, 32 max nodes
       System ID 420010061120
       Path A is ON
       Path B is ON
       MSCP allocation class 3
       TMSCP allocation class 3
   Cache:
       32 megabyte read cache, version 2
   Extended information:
       Terminal speed 19200 baud, eight bit, no parity, 1 stop bit
       Operation control: 00000005 Security state code: 41415

   A full HSJ controller information listing.

------------------------------------------------------------
SHOW UNITS

Shows all units and unit information.

Format

SHOW UNITS

Description

The SHOW UNITS command displays all the units known by the controller. Disks (including CDROMs) are listed first, then tapes.

Qualifiers

FULL
If the FULL qualifier is specified after UNITS, additional amplifying information, such as the switch settings, may be displayed after each unit number.

Examples

1. CLI> SHOW UNITS
   MSCP unit     Uses
   --------------------------------------------------------------
   D100          DI0
   D110          DI1
   D150          DI5

   A basic listing of units available on the controller.

2.
CLI> SHOW UNITS FULL
   MSCP unit     Uses
   --------------------------------------------------------------
   D100          DI0
       Switches:
         RUN  READ_CACHE
         NOWRITE_PROTECT  NOTRANSPORTABLE
         MAXIMUM_CACHED_TRANSFER_SIZE = 32
       State:
         ONLINE to this controller
         No exclusive access
   D110          DI1
       Switches:
         RUN  READ_CACHE
         NOWRITE_PROTECT  NOTRANSPORTABLE
         MAXIMUM_CACHED_TRANSFER_SIZE = 32
       State:
         ONLINE to this controller
         No exclusive access
   D150          DI5
       Switches:
         RUN  READ_CACHE
         NOWRITE_PROTECT  NOTRANSPORTABLE
         MAXIMUM_CACHED_TRANSFER_SIZE = 32
       State:
         ONLINE to this controller
         No exclusive access

   A full listing of units available on the controller.

------------------------------------------------------------
SHOW unit-number

Shows information about a unit.

Format

SHOW unit-number

Parameters

unit-number
The unit number of the unit to display.

Description

The SHOW unit-number command is used to show specific information about a particular unit.

Examples

1. CLI> SHOW D150
   MSCP unit     Uses
   --------------------------------------------------------------
   D150          DI5
       Switches:
         RUN  READ_CACHE
         NOWRITE_PROTECT  NOTRANSPORTABLE
         MAXIMUM_CACHED_TRANSFER_SIZE = 32
       State:
         ONLINE to this controller
         No exclusive access

   A listing of a specific disk unit.

2. CLI> sho t110
   MSCP unit     Uses
   --------------------------------------------------------------
   T110          TAPE110
       Switches:
         DEFAULT_FORMAT = DEVICE_DEFAULT
       State:
         AVAILABLE
         No exclusive access

   A listing of a specific tape unit.

------------------------------------------------------------
SHUTDOWN OTHER_CONTROLLER

Shuts down and does not restart the other controller.

------------------------------------------------------------
Note
------------------------------------------------------------
This command is valid for HSJ and HSD controllers only.
------------------------------------------------------------

Format

SHUTDOWN OTHER_CONTROLLER

Description

The SHUTDOWN OTHER_CONTROLLER command shuts down the other controller. If any disks are on line to the other controller, the controller will not shut down unless the OVERRIDE_ONLINE qualifier is specified (HSD and HSJ only). If any user data cannot be flushed to disk, the controller will not shut down unless the IGNORE_ERRORS qualifier is specified. Specifying IMMEDIATE will cause the other controller to shut down immediately without flushing any user data to the disks, even if drives are on line to the host.

Qualifiers for HSD and HSJ controllers

IGNORE_ERRORS
NOIGNORE_ERRORS (D)
If errors result when trying to write user data, the controller will not be shut down unless IGNORE_ERRORS is specified.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IGNORE_ERRORS qualifier is specified.
------------------------------------------------------------

NOIGNORE_ERRORS is the default.

IMMEDIATE
NOIMMEDIATE (D)
If IMMEDIATE is specified, the controller shuts down immediately without checking for online devices.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IMMEDIATE qualifier is specified.
------------------------------------------------------------

NOIMMEDIATE is the default.

OVERRIDE_ONLINE
NOOVERRIDE_ONLINE (D)
If any units are on line to the controller, the controller will not be shut down unless OVERRIDE_ONLINE is specified. If the OVERRIDE_ONLINE qualifier is specified, the controller will shut down after all customer data is written to disk.
------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the OVERRIDE_ONLINE qualifier is specified.
------------------------------------------------------------

NOOVERRIDE_ONLINE is the default.

Examples

1. CLI> SHUTDOWN OTHER_CONTROLLER

   Shuts down the other controller as long as the other controller does not have any units on line.

2. CLI> SHUTDOWN OTHER_CONTROLLER OVERRIDE_ONLINE

   Shuts down the other controller even if there are units on line to the other controller.

------------------------------------------------------------
SHUTDOWN THIS_CONTROLLER

Shuts down and does not restart this controller.

Format

SHUTDOWN THIS_CONTROLLER

Description

The SHUTDOWN THIS_CONTROLLER command shuts down this controller. If any disks are on line to this controller, the controller will not shut down unless the OVERRIDE_ONLINE qualifier is specified (HSD and HSJ only). If any user data cannot be flushed to disk, the controller will not shut down unless the IGNORE_ERRORS qualifier is specified. Specifying IMMEDIATE will cause this controller to shut down immediately without flushing any user data to the disks, even if drives are on line to a host.

------------------------------------------------------------
Note
------------------------------------------------------------
If you enter a SHUTDOWN THIS_CONTROLLER command, communication with the controller will be lost when this controller shuts down.
------------------------------------------------------------

Qualifiers for HSD and HSJ controllers

IGNORE_ERRORS
NOIGNORE_ERRORS (D)
If errors result when trying to write user data, the controller will not be shut down unless IGNORE_ERRORS is specified.
------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IGNORE_ERRORS qualifier is specified.
------------------------------------------------------------

NOIGNORE_ERRORS is the default.

IMMEDIATE
NOIMMEDIATE (D)
If IMMEDIATE is specified, the controller shuts down immediately without checking for online devices.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IMMEDIATE qualifier is specified.
------------------------------------------------------------

NOIMMEDIATE is the default.

OVERRIDE_ONLINE
NOOVERRIDE_ONLINE (D)
If any units are on line to the controller, the controller will not be shut down unless OVERRIDE_ONLINE is specified. If the OVERRIDE_ONLINE qualifier is specified, the controller will shut down after all customer data is written to disk.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the OVERRIDE_ONLINE qualifier is specified.
------------------------------------------------------------

NOOVERRIDE_ONLINE is the default.

Qualifiers for HSZ controllers

IGNORE_ERRORS
NOIGNORE_ERRORS (D)
If errors result when trying to write user data, the controller will not be shut down unless IGNORE_ERRORS is specified.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IGNORE_ERRORS qualifier is specified.
------------------------------------------------------------

NOIGNORE_ERRORS is the default.

IMMEDIATE
NOIMMEDIATE (D)
If IMMEDIATE is specified, the controller shuts down immediately without checking for online devices.
------------------------------------------------------------
CAUTION
------------------------------------------------------------
Customer data may be lost or corrupted if the IMMEDIATE qualifier is specified.
------------------------------------------------------------

NOIMMEDIATE is the default.

Examples

1. CLI> SHUTDOWN THIS_CONTROLLER

   Shuts down this controller as long as this controller does not have any units on line.

2. CLI> SHUTDOWN THIS_CONTROLLER OVERRIDE_ONLINE

   Shuts down this controller even if there are units on line to this controller.

B.2 CLI Messages

The following sections describe messages you may encounter during interactive use of the CLI.

B.2.1 Error Conventions

An Error nnnn: means that the command did not complete. Except for a few of the failover messages (6000 series), no part of the command was executed. When encountering an error going into or exiting dual-redundant mode, some synchronization problems are unavoidable; the error message in such a case will tell you what to do to get things back in synchronization.

Multiple error messages may result from one command.

Items in angle brackets (<>) will be replaced at run time with names, numbers, and so on.

B.2.2 CLI Error Messages

For HSJ and HSD30 controllers:
Error 1000: Unit number must be from 0 to 4094
For HSZ controllers:
Error 1000: The LUN portion of the unit number must be from 0 to 7
Explanation: This error results from an ADD UNIT command where the n in the Dn or Tn specified is out of range. The MSCP or TMSCP unit number after the ``D'' or ``T'' must be in the range of 0 to 4094. Retry the ADD UNIT command with a correct number.

Error 1010: Maximum cached transfer size must be 1 through 1024 blocks
Explanation: This error results from a SET or an ADD UNIT command where MAXIMUM_CACHED_TRANSFER_SIZE was specified. MAXIMUM_CACHED_TRANSFER_SIZE must be in the range 1 through 1024. Retry the SET or ADD command with a correct number.
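
The range checks behind Errors 1000 and 1010 can be sketched as a host-side pre-flight validation, run before a command is typed at the CLI. This is an illustrative Python sketch only, not controller firmware: the function names are hypothetical, it covers only the HSJ/HSD unit-number range (0 to 4094) and the 1 through 1024 block limit quoted above, and the "malformed" result for names that are not of the form Dn or Tn is a placeholder rather than a real CLI message.

```python
import re

UNIT_MAX = 4094        # HSJ/HSD upper limit quoted in Error 1000
CACHE_XFER_MIN = 1     # limits quoted in Error 1010
CACHE_XFER_MAX = 1024


def check_unit_number(unit):
    """Return the error numbers an HSJ/HSD unit name such as 'D150' would raise.

    Hypothetical helper: names not of the form Dn or Tn return
    ['malformed'], a placeholder that is not a real CLI message.
    """
    match = re.fullmatch(r"[DT](\d+)", unit.upper())
    if match is None:
        return ["malformed"]
    if int(match.group(1)) > UNIT_MAX:
        return [1000]
    return []


def check_max_cached_transfer_size(blocks):
    """Return [1010] if the value is outside 1 through 1024 blocks."""
    if not CACHE_XFER_MIN <= blocks <= CACHE_XFER_MAX:
        return [1010]
    return []
```

For example, `check_unit_number("D4095")` reports 1000, while `check_max_cached_transfer_size(32)` (the value shown in the SHOW UNITS FULL examples) reports nothing.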
Error 1020: CHUNKSIZE must be from to
Explanation: This error results from a SET storageset-container-name or an ADD (storage set type) command where CHUNKSIZE was specified. The chunksize must be DEFAULT, VOLUME, or greater than 15. Retry the SET or ADD command with DEFAULT, VOLUME, or a correct number.

Error 1030: Cannot set chunksize on a storageset that is still part of a configuration
Explanation: Chunksize must be set before a storage set is bound to a unit. If you wish to change the chunksize, delete the unit and then change it.

------------------------------------------------------------
CAUTION
------------------------------------------------------------
After changing the chunksize, an INITIALIZE command is required to rewrite the container's metadata. This will destroy customer data.
------------------------------------------------------------

Error 1090: Tape unit numbers must start with the letter 'T'
Explanation: All tape unit numbers are of the form ``Tn.'' This error is displayed if you add a tape unit and do not begin the unit number with the letter ``T.'' Retry the ADD command with a ``T'' at the start of the unit number.

Error 1100: Disk unit numbers must start with the letter 'D'
Explanation: All disk unit numbers are of the form ``Dn.'' This error is displayed if you add a disk unit and do not begin the unit number with the letter ``D.'' Retry the ADD command with a ``D'' at the beginning of the unit number.

Error 1110: Unit numbers may not have leading zeros
Explanation: Tape and disk unit numbers may not have leading zeros; for example, specify ``D3'' rather than ``D03.'' Retry the ADD command without any leading zeros.

Error 1120: LUN is already used
Explanation: The LUN has already been used by a disk or tape. Retry the ADD command specifying a different LUN.

Error 1130: The unit number cannot exceed
Explanation: You specified a unit number that was out of bounds.
Try to add the unit again using a unit number that is less than or equal to .

Error 1140: Invalid unit number. Valid unit number range(s) are: to
Explanation: You attempted to create a unit out of the valid unit ranges. The valid unit ranges are given by the and values. Retry the ADD command specifying a unit number in the correct range.

Error 2000: Port must be 1 -
Explanation: When adding a device, you specified a port less than 1 or greater than . Retry the command specifying a port within the range given.

Error 2010: Target must be 0 -
Explanation: When adding a device, you specified a target greater than . In single controller configurations, is 6. In dual-redundant configurations, is 5.

Error 2020: LUN must be 0 - 7
Explanation: When adding a device, you specified a LUN greater than 7.

Error 2030: This port, target, and LUN already in use by another device.
Explanation: When adding a device, you specified a PTL that is already in use by another device.

Error 2040: Cannot set TRANSPORTABLE when device in use by an upper layer
Explanation: A disk cannot be set to TRANSPORTABLE once it is being used by an upper level (unit or storage set).

Error 2050: Cannot set NOTRANSPORTABLE when device in use by an upper layer
Explanation: A disk cannot be set to NOTRANSPORTABLE once it is being used by an upper level (unit or storage set).

Error 4000: The CLI prompt must have 1 to 16 characters.
Explanation: This error results from a SET THIS_CONTROLLER or SET OTHER_CONTROLLER command with the qualifier PROMPT=. The length of the CLI prompt must be at least one character and may not exceed 16 characters. Retry the command with the correct number of characters.

Error 4010: Illegal character in CLI prompt.
Explanation: A nonprintable character was specified. Only ASCII characters space `` '' through tilde ``~'' may be specified (hex 20-7E).
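
The two prompt rules in Errors 4000 and 4010 can be checked before a PROMPT= value is submitted. The following Python sketch is illustrative only: `check_cli_prompt` is a hypothetical helper, and reporting both error numbers for one value is an assumption based on the statement in B.2.1 that multiple error messages may result from one command.

```python
def check_cli_prompt(prompt):
    """Return the error numbers a PROMPT= value would be expected to raise.

    Rules taken from the text above: 1 to 16 characters (Error 4000),
    each in the ASCII range hex 20 through 7E (Error 4010).
    """
    errors = []
    if not 1 <= len(prompt) <= 16:
        errors.append(4000)
    if any(not 0x20 <= ord(ch) <= 0x7E for ch in prompt):
        errors.append(4010)
    return errors
```

A prompt such as `"HSJB0> "` passes both checks; an empty string fails the length rule, and a string containing a tab character fails the character rule.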
Error 4020: Terminal speed must be 300, 1200, 2400, 4800, 9600 or 19200
Explanation: This error results from a SET THIS_CONTROLLER or SET OTHER_CONTROLLER command with the argument TERMINAL_SPEED=. The only valid baud rates that may be specified are 300, 1200, 2400, 4800, 9600, or 19200 baud. Retry the command with a correct terminal speed.

Error 4030: Controller ID must be in the range 0 to .
Explanation: The ID= was specified with a number greater than . If increasing the controller's ID, set MAX_NODES first, then the controller's ID.

Error 4040: SCS nodename length must be from 1 to 6 characters.
Explanation: This error results from a SET THIS_CONTROLLER or SET OTHER_CONTROLLER command with the argument SCS_NODENAME=. The SCS node name must consist of one to six alphanumeric characters enclosed in quotes with an alphabetic character first. Retry the command with a correct SCS node name length.

Error 4050: SCS nodename must start with an alpha character and contain only A-Z and 0-9.
Explanation: This error results from a SET THIS_CONTROLLER or SET OTHER_CONTROLLER command with the argument SCS_NODENAME=. The SCS node name must consist of alphanumeric characters enclosed in quotes with an alphabetic character first. Retry the command with a correct SCS node name.

Error 4060: Allocation class must be from to 255.
Explanation: An illegal MSCP or TMSCP allocation class was specified. The is 0 for a single controller configuration, or 1 for a dual-redundant configuration.

Error 4070: Max nodes must be 2, 8, 16 or 32
Explanation: This error results from a SET THIS_CONTROLLER or SET OTHER_CONTROLLER command with the argument MAX_NODES=. Max nodes must be 2, 8, 16, or 32 nodes. Retry the command with a correct max node number.

Error 4080: Current node ID too large for requested max nodes setting.
Explanation: This error results from a SET THIS_CONTROLLER or SET OTHER_CONTROLLER command with the arguments MAX_NODES= or ID=.
MAX_NODES= was specified with a number less than the controller's ID or the controller's ID was specified with a number greater than (MAX_NODES - 1). If decreasing MAX_NODES, set the controller's ID first, then MAX_NODES.

Error 4090: Module has invalid serial number. This controller cannot be used. Call Digital Services.
Explanation: This error means that an uninitialized controller has slipped out of manufacturing, or the NV memory was destroyed. Contact Digital Multivendor Services.

Error 4100: Unable to RESTART other controller.
Explanation: A communication error occurred when trying to restart the other controller. Retry the RESTART command.

Error 4110: Unable to SHUTDOWN other controller.
Explanation: A communication error occurred when trying to shut down the other controller. Retry the SHUTDOWN command.

Error 4120: Unable to SELFTEST other controller.
Explanation: A communication error occurred when trying to self-test the other controller. Retry the SELFTEST command.

Error 4130: Unable to setup controller restart.
Explanation: A communication error occurred when trying to restart or self-test the other controller. Retry the RESTART or SELFTEST command.

Error 4140: Unable to lock the other controller's NV memory
Explanation: Most configuration commands, such as ADD, DELETE, and SET, require both controllers in a dual-redundant configuration to be up and functioning so configuration changes can be recorded in both controllers. If one controller is not running, this message results when you attempt to change the configuration. Restart the other controller and try the command again, or SET NOFAILOVER on the remaining controller.

Error 4150: Unable to rundown the following units on the other controller:
Explanation: When attempting to SHUTDOWN, RESTART, or SELFTEST the other controller, some units could not be successfully spun down. This can be caused either by online units or errors when trying to spin down the units.
Either rectify the problems on the problem units or enter the SHUTDOWN, RESTART, or SELFTEST command with the qualifier OVERRIDE_ONLINE or IGNORE_ERRORS.

Error 4160: Unable to rundown the following units on this controller:
Explanation: When attempting to SHUTDOWN, RESTART, or SELFTEST this controller, some units could not be successfully spun down. This can be caused either by online units or errors when trying to spin down the units. Either rectify the problems on the problem units or enter the SHUTDOWN, RESTART, or SELFTEST command with the qualifier OVERRIDE_ONLINE or IGNORE_ERRORS.

Error 4170: Only targets may be specified
Explanation: When setting THIS_CONTROLLER ID=, you specified too many IDs; you may only specify up to IDs. Retry the SET THIS_CONTROLLER ID= command with no more than IDs specified.

Error 4180: Invalid unit number(s) still present that must be deleted before the controller ID may be changed. All unit numbers must be in the range(s): to
Explanation: You attempted to change the controller ID(s) when there were still units using those IDs. The current valid unit ranges are given by the and values. Either delete the units that use the ID that will no longer be specified, or retry the SET THIS_CONTROLLER ID= specifying the ID being used by the existing units.

Error 5000: A program name may be from 1 to 6 characters.
Explanation: This error results from a ``RUN .''

Error 5010: The requested program is currently busy.
Explanation: This error results from a ``RUN .'' The program requested is being run by someone else.

Error 5020: The requested program is unknown.
Explanation: This error results from a ``RUN ''. Enter ``DIR'' to get a list of available programs.

Error 5030: Insufficient memory for request.
Explanation: This error results from a ``RUN '' resource problem. Retry the command later.

Error 6000: Communication failure with other controller.
Explanation: There was a communication problem with the other controller. This typically happens if the other controller is shutting down. If these messages happen often when the other controller is not shutting down, call Digital Multivendor Services.

Error 6010: Other controller not present
Explanation: When asked to communicate with another controller (the result of any one of a number of commands), the other controller was found not to be running. If the other controller is in the process of restarting, retry the command later. If the other controller is shut down or turned off, start it. If the other controller is no longer present, enter a SET NOFAILOVER command to take it out of dual-redundant mode.

Error 6020: Initial failover handshake not yet complete
Explanation: For a short period of time after start up, the two controllers must communicate to set up a dual-redundant mode. This setup time is typically less than 1 minute. If commands that require controller-to-controller communication are entered during this setup time, error 6020 results. Retry the command later.

Error 6030: Unable to communicate with the other controller to setup FAILOVER
Explanation: FAILOVER could not be set up due to communication problems between the controllers. Retry the command later.

Error 6040: The write of the other controller's configuration information did not succeed; information may be in an inconsistent state. Before further use both controllers should be removed from dual-redundant mode (SET NOFAILOVER) and then placed back into dual-redundant mode (SET FAILOVER) to assure consistency
Explanation: Communication was lost in the middle of a SET FAILOVER command. Follow the instructions included in the error message.

Error 6050: Communication failure with other controller while putting controllers into dual-redundant mode. Reissue the SET FAILOVER command
Explanation: Communication was lost in the middle of a SET FAILOVER command.
Follow the instructions included in the error message.

Error 6070: Illegal command--this controller not configured for dual-redundancy
Explanation: A command was entered to a single controller configuration that requires two controllers to be in dual-redundant mode. If two controllers are supposed to be in dual-redundant mode, enter a SET FAILOVER command. If not, do not enter the command that resulted in the error.

Error 6080: Illegal command--this controller not currently in dual-redundant mode
Explanation: A command was entered to a dual-redundant-configured controller, but the other controller was not available for communication. Restart the other controller and wait until it is communicating with this controller. If this controller is no longer supposed to be in dual-redundant mode, enter a SET NOFAILOVER command.

Error 6090: In failover no device may be configured at target 6 is at PTL
Explanation: Target addresses 6 and 7 are used by the controllers when in a dual-redundant configuration. When in a single controller configuration, target 6 is available for use by devices. If devices are configured at target 6 and you attempted to install a dual-redundant configuration, this error is displayed for all devices that use target 6 and the controllers will not be placed in a dual-redundant configuration. You should both logically and physically reconfigure the drives so that target 6 is not used.

Error 6100: Allocation classes cannot be zero for a dual-redundant configuration. Set MSCP and TMSCP allocation classes to non-zero.
Explanation: If in a dual-redundant configuration, the allocation class must not be set to zero.

Error 6110: This controller already in failover mode. You must issue a SET NOFAILOVER command first
Explanation: A SET FAILOVER cannot be entered on a controller already in failover.

Error 6120: Other controller already in failover mode.
You must issue a SET NOFAILOVER command first
Explanation: A SET FAILOVER command was entered and although this controller was not configured for dual redundancy, the other controller was.

Error 6170: An and cannot configured for failover
Explanation: Two different controllers (such as an HSJ and an HSZ) cannot be configured for failover. Replace the other controller with the same model as this one and reenter the command.

Error 9000: Cannot rename a unit
Explanation: Only devices and storage sets may be renamed. If you attempt to rename a unit, this message results.

Error 9010: is an illegal name, it must be from 1 to 9 characters.
Explanation: This error results from an ADD command with an illegal name given.

Error 9020: is an illegal name, it must start with A-Z
Explanation: This error results from an ADD command with an illegal name given.

Error 9030: is an illegal name, characters may consist only of A-Z, 0-9, ., - or _
Explanation: This error results from an ADD command with an illegal name given.

Error 9040: conflicts with keyword
Explanation: The name given in an ADD command conflicts with a CLI keyword. Specify another name.

Error 9050: Configuration area full
Explanation: The total number of units, devices, and storage sets that can be configured is 195 in any combination. This error results when you exceed that number of nodes. Delete some units or devices in order to recover some configuration nodes.

Error 9060: does not exist
Explanation: Some operation (SET, DELETE, INITIALIZE, and so forth) specified a name that does not exist. Check the name and retry the command.

Error 9070: is still part of a configuration. Delete upper configuration first.
Explanation: Devices may not be deleted if they are still in use by storage sets or units. Storage sets may not be deleted if they are still used by units. Delete configurations from the top down; delete units, then stripesets, and then finally devices.
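
The naming rules in Errors 9010 through 9030 can be sketched as a check on a proposed device or storage set name. This Python sketch is illustrative only: `check_container_name` is a hypothetical helper, and it assumes names are compared after conversion to uppercase, which the manual does not state (the examples show lowercase commands such as `sho t0` being accepted). It does not cover Error 9040, because the list of CLI keywords is not given in this section.

```python
import re


def check_container_name(name):
    """Return the error numbers a proposed ADD name would be expected to raise.

    Rules taken from the messages above:
      9010 - length must be 1 to 9 characters
      9020 - first character must be A-Z
      9030 - characters limited to A-Z, 0-9, '.', '-', '_'
    Assumption: the name is uppercased before checking.
    """
    name = name.upper()
    errors = []
    if not 1 <= len(name) <= 9:
        errors.append(9010)
    if not re.match(r"[A-Z]", name):
        errors.append(9020)
    if re.search(r"[^A-Z0-9._-]", name):
        errors.append(9030)
    return errors
```

Names used in the examples, such as `STRIPE0` and `DISK500`, pass all three checks.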
Error 9080: is already used
Explanation: An ADD command specified a name that is already in use. Specify another name.

Error 9090: A cannot be used in a
Explanation: The device specified cannot be used in the storage set specified; for example, tapes cannot be bound into a stripeset. Reexamine the configuration and correct the incompatibility.

Error 9100: A must have from to entities
Explanation: The wrong number of devices was specified for this storage set. Different storage sets require different numbers of devices. Reexamine the configuration, then correct the number of devices.

Error 9130: Cannot delete ONLINE unit
Explanation: The unit specified in a DELETE command is on line to a host. Dismount the unit at the host and retry the command, or add the OVERRIDE_ONLINE qualifier to the DELETE command.

Error 9140: Cannot delete exclusive access unit
Explanation: The unit specified in a DELETE command is set up for exclusive access. Take the unit out of exclusive access mode and retry the command.

Error 9150: INITIALIZE is no longer supported at the unit level. You must INITIALIZE the container that makes up this unit
Explanation: You tried to initialize a unit. Units may no longer be initialized. The container that makes up the unit must be initialized before a unit is created out of the container.

Error 9160: Non-disk devices cannot be INITIALIZED
Explanation: Tapes and CDROMs may not be initialized.

Error 9170: at PTL No device installed
Explanation: When a unit is added or initialized, the configuration of the devices that make up the unit is checked. If no device is found at the PTL specified, this error is displayed. Check both the logical and physical configuration of the unit and correct any mismatches.

Error 9180: at PTL Incorrect device type installed
Explanation: When a unit is added or initialized, the configuration of the devices that make up the unit is checked.
If a non-disk device is found at the PTL specified, this error is displayed. Check both the logical and physical configuration of the unit and correct any mismatches.

Error 9190: Unit is currently online
Explanation: When a SHUTDOWN, RESTART, or SELFTEST command is entered without the OVERRIDE_ONLINE qualifier and online devices are found, the command is aborted and the units that are currently on line are listed. Either retry the command with the OVERRIDE_ONLINE qualifier or dismount all devices from the hosts.

Error 9200: conflicts with unit names
Explanation: This error results from an ADD command. Names in the format of Dn and Tn, where n is a number from 0 to 4094, are reserved for units. Rename the storage set or device that is being added so it does not conflict with the unit names and retry the command.

Error 9210: Cannot check if drives are online to the other controller
Explanation: When trying to check for online drives on the other controller, there was a communication failure. Retry the command.

Error 9230: Unable to modify switches requested
Explanation: This error results from a SET command. The system is currently busy. Retry the SET command later.

Error 9240: Cannot delete unit in maintenance mode
Explanation: When trying to delete a unit, the unit was found to be in Maintenance mode. This is typically the result of trying to delete a unit that is in use by DILX or TILX. Make sure that DILX and TILX are not being run against the unit that is to be deleted, and retry the command.

Error 9250: Initialize of disk failed
Explanation: Unable to write metadata on disk. Make sure the disk is functioning properly.

Error 9260: Cannot INITIALIZE a container that is still part of a configuration. Delete upper configuration first
Explanation: A container that is part of another configuration or is being used by a unit cannot be initialized. Delete the upper configuration and reenter the INITIALIZE command.
Error 9270: No metadata found on container, unit not created. An INITIALIZE must be issued before this container may be used
Explanation: You attempted to create a unit from a container that did not have valid metadata. INITIALIZE the metadata on the container, then create a unit out of it.

Error 9300: Metadata found on container. Are you sure this is a TRANSPORTABLE container? An INITIALIZE must be issued before this container may be used.
Explanation: Metadata was found on a TRANSPORTABLE container. Enter an INITIALIZE command.

Error 9330: NV memory write collision. Please try again
Explanation: Two users were trying to configure the CLI at the same time. Check the configuration you were trying to modify to make sure it is unchanged and retry the command.

Error 9350: Metadata found on container but the chunksize is different. Either a SET CHUNKSIZE= or an INITIALIZE must be issued before this container may be used
Explanation: The chunksize defined by the ADD or SET command is different from that on the media. Either INITIALIZE the storageset or SET the chunksize to the given value.

Error 9360: A tape is not installed at the PTL . Cannot set tape switches unless a tape is installed
Explanation: A SET or ADD command specified a tape format, but there was no tape installed at the tape's PTL. Install a tape and retry the command.

Error 9370: A is an unsupported device. Tape switches cannot be set on unsupported devices
Explanation: The tape installed is not currently supported by the controller. Replace the tape with a supported device and retry the command.

Error 9380: Unable to allocate unit for NORUN to RUN transition
Explanation: The unit could not be allocated so the controller could do a RUN/NORUN transition. Retry the command. If this error persists, call Digital Multivendor Services.
Error 9390: Cannot change default tape format while tape drive online to host Explanation: The default tape format cannot be changed when the tape drive is on line to a host. Dismount the tape drive from the host and retry the command. Error 9400: Cannot rundown or allocate unit in order to delete it Explanation: Retry the command. If this error persists, call Digital Multivendor Services. B.2.3 Warning Conventions A Warning nnnn: means that the command completed, but there is a situation that you should be aware of. Typically, a warning will result in an unusable configuration; you will have to either logically reconfigure the cabinet using the CLI or physically reconfigure the cabinet by moving the disks around. Multiple warning messages may result from one command. Items in angle brackets (<>) will be replaced at run time with names, numbers, and so on. B.2.4 CLI Warning Messages Warning 3000: This storageset is configured with more than one disk per port. This will cause a degradation in performance Explanation: This error results from an ADD storageset-type command. The storage set specified has more than one member per port. One method of increasing the controller 's performance is through parallel transfers to members of a storage set. If multiple members of a storage set are on one port, transfers must be done in serial to those members. Though multiple storage set members on one port will work, it is strongly recommended that the storage set be deleted and reconfigured with one member per port. Warning 3010: Unable to check all device types that make up this storageset. If the storageset is made up of different device types, it may result in a storageset of reduced size Explanation: This error results from an ADD storageset-type command. Device types being added to a storage set are checked to make sure that they are the correct device types. If one or more devices could not be checked, this warning is displayed. 
You should check all the devices to make sure that they are correctly installed and configured.

Warning 3020: This storageset is configured with different device types. This may result in a storageset of reduced size
Explanation: This warning results from an ADD storageset-type command. Device types being added to a storage set are checked to ensure that they are the same type. If the devices are not all the same, this warning is displayed. Storage set size is determined by the size of the smallest device, so the storage set configured will be of reduced size. If a reduced-size storage set is acceptable, nothing need be done in response to this warning. To realize the maximum storage set size, all devices that make up the storage set should be identical.

Warning 4000: A restart of this controller will be required before all the parameters modified will take effect
Explanation: This warning results from a SET THIS_CONTROLLER command. Some controller parameters require a restart before they can take effect. If any of those parameters are changed, this warning is displayed. It is recommended that you restart the controller with the RESTART THIS_CONTROLLER command as soon as possible.

Warning 4010: A restart of the other controller will be required before all the parameters modified will take effect
Explanation: This warning results from a SET OTHER_CONTROLLER command. Some controller parameters require a restart before they can take effect. If any of those parameters are changed, this warning is displayed. Restart the controller and retry the command.

Warning 4020: A restart of both this and the other controller will be required before all the parameters modified will take effect
Explanation: This warning results from a SET THIS_CONTROLLER or a SET OTHER_CONTROLLER command. Some controller parameters require a restart of both controllers before they can take effect. If any of those parameters are changed, this warning is displayed.
Restart both controllers and retry the command. Warning 6000: Communication failure with other controller while taking controllers out of dual-redundant mode. Enter a SET NOFAILOVER command on the other controller Explanation: This error results from a SET NOFAILOVER command. This controller was unable to communicate with the other controller to notify it that it is no longer in dual-redundant mode. Typically, this occurs when the other controller has already been removed prior to the SET NOFAILOVER command. A SET NOFAILOVER command should be entered on the other controller as soon as possible. Warning 9030: Cannot determine if the correct device type is at the PTL specified Explanation: When a device is added, the location specified is checked to see if the correct device type is present. This error results when no device responds from the location specified. Check the physical configuration and the PTL that was specified. Command Line Interpreter B-75 Warning 9040: There is currently a at the PTL specified Explanation: When a device is added, the location specified is checked to see if the correct device type is present. This error results when a device different from the one specified is found at the location specified (for example, a tape is found where a disk was added). Check the physical configuration and the PTL that was specified. Warning 9050: at PTL No device installed. Explanation: When a unit is added, the configuration of the disks that make up the unit is checked. If no device is found at the PTL specified, this warning is displayed. Check both the logical and physical configuration of the devices that make up the unit and correct any mismatches. Warning 9060: at PTL Incorrect device type installed Explanation: When a unit is added, the configuration of the disks that make up the unit is checked. If a non-disk device is found at the PTL specified, this warning is displayed. 
Check both the logical and physical configuration of the devices that make up the unit and correct any mismatches.

B.3 Examples

The following examples show some commonly performed CLI functions. Your subsystem parameters will differ from those shown here.

B.3.1 Setting HSD-Series Parameters, Nonredundant

    SET THIS_CONTROLLER ID=5
    SET THIS_CONTROLLER SCS_NODENAME="HSD03"
    SET THIS_CONTROLLER MSCP_ALLOCATION_CLASS=4
    SET THIS_CONTROLLER TMSCP_ALLOCATION_CLASS=4
    RESTART THIS_CONTROLLER
    [this controller restarts at this point]
    SET THIS_CONTROLLER PATH

These commands could optionally be entered on fewer lines:

    SET THIS_CONTROLLER ID=5 SCS_NODENAME="HSD03"
    SET THIS_CONTROLLER MSCP_ALLOCATION_CLASS=4 TMSCP_ALLOCATION_CLASS=4
    RESTART THIS_CONTROLLER
    [this controller restarts at this point]
    SET THIS_CONTROLLER PATH

B.3.2 Setting HSJ-Series Parameters, Dual-Redundant

    SET THIS_CONTROLLER MAX_NODES=16
    SET THIS_CONTROLLER ID=5 SCS_NODENAME="HSJ01"
    SET THIS_CONTROLLER MSCP_ALLOCATION_CLASS=4 TMSCP_ALLOCATION_CLASS=4
    SET FAILOVER COPY=THIS
    SET OTHER_CONTROLLER MAX_NODES=16
    SET OTHER_CONTROLLER ID=7 SCS_NODENAME="HSJ02"
    RESTART OTHER_CONTROLLER
    [other controller restarts at this point]
    RESTART THIS_CONTROLLER
    [this controller restarts at this point]
    SET THIS_CONTROLLER PATH_A PATH_B
    SET OTHER_CONTROLLER PATH_A PATH_B

B.3.3 Setting HSZ-Series Parameters

    SET THIS_CONTROLLER ID=5
    RESTART THIS_CONTROLLER
    [this controller restarts at this point]

B.3.4 Setting Terminal Speed and Parity

    SET THIS_CONTROLLER TERMINAL_SPEED=19200 NOTERMINAL_PARITY

------------------------------------------------------------ Note ------------------------------------------------------------
Garbled characters will appear on the terminal after you set the controller's terminal speed, until you set the terminal's speed to match the new speed.
------------------------------------------------------------

B.3.5 Adding Devices

This example shows how to define the devices on a six-port controller. Define devices one at a time through the ADD command, specifying device type (DISK/TAPE/CDROM), device name, and device PTL location.

    CLI> ADD DISK DISK0 1 0 0
    CLI> ADD DISK DISK1 2 0 0
    CLI> ADD DISK DISK2 3 0 0
    CLI> ADD DISK DISK3 4 0 0
    CLI> ADD DISK DISK4 4 1 0
    CLI> ADD TAPE TAPE0 5 1 0
    CLI> ADD CDROM CDROM0 6 0 0

This example created the following devices:

------------------------------------------------------------
Device Type   Device Name   Port   Target   LUN
------------------------------------------------------------
Disk          DISK0         1      0        0
Disk          DISK1         2      0        0
Disk          DISK2         3      0        0
Disk          DISK3         4      0        0
Disk          DISK4         4      1        0
Tape          TAPE0         5      1        0
CDROM         CDROM0        6      0        0
------------------------------------------------------------

B.3.6 Adding Storage Sets

Storage sets are created from disks. In the previous example, devices were given names to make them identifiable. Use these names when creating storage sets.

    CLI> ADD STRIPESET STRIPE0 DISK0 DISK1 DISK2 DISK3

This example creates a stripeset (named STRIPE0) using disks DISK0, DISK1, DISK2, and DISK3 from Section B.3.5. All members of the storage set (a stripeset) must have been previously defined using ADD DISK. Tapes and CDROMs cannot be bound to storage sets.

B.3.7 Initializing Containers

Disks and storage sets are also called containers. Containers must be initialized before they are made available to a host via the ADD UNIT command. The following initializes containers from the previous examples:

    CLI> INITIALIZE STRIPE0
    CLI> INITIALIZE DISK4

Initializing a tape or CDROM is not required (and is not allowed).

B.3.8 Adding Logical Units

Units can be created from any container (either device or storage set). Tapes and CDROMs are always bound directly to units because they cannot be members of a storage set.
The following makes the devices and containers from the previous examples available to the host as units:

    CLI> ADD UNIT D0 STRIPE0
    CLI> ADD UNIT D100 DISK4
    CLI> ADD UNIT D120 CDROM0
    CLI> ADD UNIT T0 TAPE0

This creates disk unit 0 from stripeset STRIPE0, disk unit 100 from DISK4, disk unit 120 from CDROM0, and tape unit 0 from TAPE0. At the UNIT level, CDROMs are treated as disks (but only a subset of the disk SET commands is available for CDROMs).

B.3.9 Device Configuration Examples

The following examples show some different device configurations.

Creating a Unit From a Disk Device

    CLI> ADD DISK DISK0 2 0 0
    CLI> INITIALIZE DISK0
    CLI> ADD UNIT D0 DISK0

Creating a Unit From a Tape Device

    CLI> ADD TAPE TAPE0 3 0 0
    CLI> ADD UNIT T0 TAPE0

Creating a Unit From a Four-Member Stripeset

    CLI> ADD DISK DISK0 1 0 0
    CLI> ADD DISK DISK1 2 0 0
    CLI> ADD DISK DISK2 3 0 0
    CLI> ADD DISK DISK3 1 1 0
    CLI> ADD STRIPESET STRIPE0 DISK0 DISK1 DISK2 DISK3
    Warning 3000: This storageset is configured with more than one disk per port
                  This will cause a degradation in performance
    CLI> INITIALIZE STRIPE0
    CLI> ADD UNIT D0 STRIPE0

Creating a Write-Protected Unit From a Disk

    CLI> ADD DISK DISK0 2 0 0
    CLI> INITIALIZE DISK0
    CLI> ADD UNIT D0 DISK0 WRITE_PROTECT

Write Protecting an Existing Unit

    CLI> ADD DISK DISK0 2 0 0
    CLI> INITIALIZE DISK0
    CLI> ADD UNIT D0 DISK0
    CLI> SET D0 WRITE_PROTECT

Renumbering Disk Unit 0 to Disk Unit 100

    CLI> ADD DISK DISK0 2 0 0
    CLI> INITIALIZE DISK0
    CLI> ADD UNIT D0 DISK0
    CLI> DELETE D0
    CLI> ADD UNIT D100 DISK0

Note that no second INITIALIZE is required because DISK0 has already been initialized.
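The four-member stripeset example above draws Warning 3000 because two members share port 1 (PTLs 1 0 0 and 1 1 0). A quick way to check a planned stripeset for this before entering the ADD STRIPESET command is sketched below in Python. This is an off-line planning aid only; the function name and the dictionary layout are hypothetical and are not part of the controller firmware or CLI.

```python
from collections import Counter


def ports_with_multiple_members(members):
    """Return the sorted list of ports that hold more than one member.

    `members` maps a device name to its (port, target, lun) tuple, in the
    same order the values are given to the ADD DISK command.
    """
    counts = Counter(port for port, _target, _lun in members.values())
    return sorted(port for port, n in counts.items() if n > 1)


# Planned members of STRIPE0 from the four-member stripeset example.
stripe0 = {
    "DISK0": (1, 0, 0),
    "DISK1": (2, 0, 0),
    "DISK2": (3, 0, 0),
    "DISK3": (1, 1, 0),
}

shared = ports_with_multiple_members(stripe0)
if shared:
    print("Expect Warning 3000; ports with more than one member:", shared)
```

An empty result means each member is on its own port, which is the configuration the Warning 3000 explanation recommends.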
Creating a Transportable Unit From a Disk Device

    CLI> ADD DISK DISK0 2 0 0 TRANSPORTABLE
    CLI> INITIALIZE DISK0
    CLI> ADD UNIT D0 DISK0

or:

    CLI> ADD DISK DISK0 2 0 0
    CLI> SET DISK0 TRANSPORTABLE
    CLI> INITIALIZE DISK0
    CLI> ADD UNIT D0 DISK0

Deleting the Unit, Stripeset and All Disks Associated With the Stripeset

    CLI> DELETE D0
    CLI> DELETE STRIPE0
    CLI> DELETE DISK0
    CLI> DELETE DISK1
    CLI> DELETE DISK2
    CLI> DELETE DISK3

C
------------------------------------------------------------
HSJ-Series Error Logging

This appendix details errors the HSJ-series controller reports in its host event logs under the OpenVMS operating system, as well as how to extract the information from the logs.

------------------------------------------------------------ Note ------------------------------------------------------------
Host event log translations are correct as of the date of publication of this manual. However, log information may change with firmware updates. Refer to your StorageWorks Array Controller Operating Firmware Release Notes for event log information updates.
------------------------------------------------------------

C.1 Reading an HSJ-Series Error Log

To understand the error logs, use the following guidelines:

· Each error log contains an ``MSLG$B_FORMAT'' field (in the upper portion of the log), plus a ``CONTROLLER DEPENDENT INFORMATION'' area (in the lower portion of the log). The content of ``CONTROLLER DEPENDENT INFORMATION'' varies according to the ``MSLG$B_FORMAT'' field. Example C-1 shows an ERF-translated host error log (a Disk Transfer Event log). See Example C-1 to find ``MSLG$B_FORMAT'' and ``CONTROLLER DEPENDENT INFORMATION''.

· The key to interpreting error logs is a 32-bit instance code located in the ``CONTROLLER DEPENDENT INFORMATION'' area.
The instance code uniquely identifies the following: · The error or condition · The component reporting the condition · The recommended repair action · The threshold when the repair action should be taken ------------------------------------------------------------ Note ------------------------------------------------------------ The instance code is the single, most important part of interpreting the error log. This is a departure from HSC-based error logs, where other fields in the error information contained values of primary interest. ------------------------------------------------------------ HSJ-Series Error Logging C-1 Example C-1 Disk Transfer Error Event Log V A X / V M S SYSTEM ERROR REPORT COMPILED 16-MAR-1993 11:05:04 PAGE 146. ******************************* ENTRY 12. ******************************* ERROR SEQUENCE 2832. LOGGED ON: SID 05903914 DATE/TIME 16-MAR-1993 10:27:58.95 SYS_TYPE 00000000 SYSTEM UPTIME: 4 DAYS 02:11:34 SCS NODE: CNOTE VAX/VMS V5.5-2 ERL$LOGMESSAGE ENTRY KA825 HW REV# B PATCH REV# 28. UCODE REV# 20. BI NODE # 2. I/O SUB-SYSTEM, UNIT _FRED$DUA115: MESSAGE TYPE 0001 DISK MSCP MESSAGE MSLG$L_CMD_REF 9DB30013 MSLG$W_UNIT 0073 UNIT #115. MSLG$W_SEQ_NUM 0002 SEQUENCE #2. MSLG$B_FORMAT 02 DISK TRANSFER LOG MSLG$B_FLAGS 00 UNRECOVERABLE ERROR MSLG$W_EVENT 000B DRIVE ERROR UNKNOWN SUBCODE #0000(X) MSLG$Q_CNT_ID 00134534 01280001 UNIQUE IDENTIFIER, 000100134534(X) MASS STORAGE CONTROLLER MODEL = 40. MSLG$B_CNT_SVR FF CONTROLLER SOFTWARE VERSION #255. MSLG$B_CNT_HVR 00 CONTROLLER HARDWARE REVISION #0. MSLG$W_MULT_UNT 0005 MSLG$Q_UNIT_ID 00000001 02FF0000 UNIQUE IDENTIFIER, 000000000001(X) DISK CLASS DEVICE (166) MODEL = 255. MSLG$B_UNIT_SVR 0B UNIT SOFTWARE VERSION #11. MSLG$B_UNIT_HVR 0C UNIT HARDWARE REVISION #12. MSLG$B_LEVEL 01 MSLG$B_RETRY 00 MSLG$L_VOL_SER 00001492 VOLUME SERIAL #5266. MSLG$L_HDR_CODE 000659B6 LOGICAL BLOCK #416182. GOOD LOGICAL SECTOR (continued on next page) C-2 HSJ-Series Error Logging Example C-1 (Cont.) 
Disk Transfer Error Event Log

CONTROLLER DEPENDENT INFORMATION
    LONGWORD 1.   03094002   /.@../
    LONGWORD 2.   00003C51   /Q<../
    LONGWORD 3.   00000000   /..../
    LONGWORD 4.   000016D4   /Ô.../
    LONGWORD 5.   00000000   /..../
    LONGWORD 6.   00030002   /..../
    LONGWORD 7.   56415246   /CNOT/
    LONGWORD 8.   20205355   /E   /
    LONGWORD 9.   00000501   /..../
    LONGWORD 10.  36325A52   /RZ26/
    LONGWORD 11.  20202020   /    /
    LONGWORD 12.  29432820   / (C)/
    LONGWORD 13.  43454420   / DEC/
    LONGWORD 14.  20202020   /    /
    LONGWORD 15.  31202020   /   1/
    LONGWORD 16.  00F0002A   /*.ð./
    LONGWORD 17.  59060004   /...Y/
    LONGWORD 18.  000016B6   /¶.../
    LONGWORD 19.  01030000   /..../
    LONGWORD 20.  000A8001   /..../

The 32-bit instance code always appears in ``LONGWORD 1'' of ``CONTROLLER DEPENDENT INFORMATION'', with the following exceptions:

- When MSLG$B_FORMAT reads ``09 BAD BLOCK REPLACEMENT ATTEMPT'', the instance code does not appear, because ERF does not provide ``CONTROLLER DEPENDENT INFORMATION''.

- When MSLG$B_FORMAT reads ``0A MEDIA LOADER LOG'', the instance code appears in ``LONGWORD 2''.

- When MSLG$B_FORMAT reads ``00 CONTROLLER LOG'', the instance code appears in part of both ``LONGWORD 1'' and ``LONGWORD 2.'' For this ``MSLG$B_FORMAT'', the code is skewed and not directly readable as a longword. (The code's low-order bytes appear in the two high-order bytes of ``LONGWORD 1'', and the code's high-order bytes appear in the two low-order bytes of ``LONGWORD 2''). For example:

    CONTROLLER DEPENDENT INFORMATION
        LONGWORD 1.   030A0000   /..../
        LONGWORD 2.   24010102   /...$/

  In this case, the instance code is 0102030A. An OpenVMS DCL command procedure is provided at the end of this appendix (see Section C.6) for deskewing this particular instance code. Running the command procedure makes the error log directly readable when used in conjunction with the other information supplied in this appendix.
· Once you locate and identify the instance code, see the following sections for further information: - Section C.3 contains the Event Log Code tables, Tables C-2 through C-49. These tables list specific code descriptions. - Section C.2 contains detailed error packet descriptions, based on template type. - Section C.4 contains error threshold values. - Section C.5 contains recommended repair actions. · When you look up a specific instance code, you will notice that each error belongs to one of fifteen template types. Each template type has a one byte value identifying it, which is also located in the ``CONTROLLER DEPENDENT INFORMATION'' area longwords, as shown in Table C-1. You may be able to use Table C-1 to quickly identify the template type, after examining the longwords in the ``CONTROLLER DEPENDENT INFORMATION'' area. However, since the location of the value identifying the template varies, the safest way to determine the template is to use the instance code. The template type is always the very next byte after the instance code. 
Table C-1 Template Types

----------------------------------------------------------------------------------------------
Description                                          Template  Longword  Value     Deskewed Value
----------------------------------------------------------------------------------------------
Last Failure Event Log                               01+       2         2401xxxx  00002401
Failover Event Log                                   05+       2         0005xxxx  00000005
Host buffer Access Error Event Log                   10        2         00000C10
Nonvolatile Parameter Memory Component Event Log     11        2         00000811
Backup Battery Failure Event Log                     12        2         00000012
Subsystem Built-In Self Test Failure Event Log       13+       2         2413xxxx  00002413
Cache Memory Failure Event Log                       14        2         00002414
CI Port Event Log                                    31+       2         0C31xxxx  00000C31
CI Port/Port Driver Event Log                        32+       2         1032xxxx  00001032
CI System Communication Services Event Log           33+       2         2C33xxxx  00002C33
Device Services Nontransfer Event Error Log          41+       2         0441xxxx  00000441
Disk Transfer Error Event Log                        51        2         00003C51
Disk Bad Block Replacement (BBR) Attempt Event Log   57        No Longwords
Tape Transfer Error Event Log                        61        2         00003C61
Media Loader Error Event Log                         71        3         00003C71
----------------------------------------------------------------------------------------------
+The MSLG$B_FORMAT field for these templates will read ``00 CONTROLLER LOG'', so you may want to run the OpenVMS DCL command procedure provided at the end of this appendix (Section C.6) for deskewing the longwords.
----------------------------------------------------------------------------------------------

· You should use the template type to learn even more from the error log. Information available in longwords, other than the instance code, includes the following:

  · Template type
  · Template information size
  · Event time
  · Drive sense data
  · Other information specific to the template

  Knowing the template type allows you to better use Section C.2 to obtain a complete description of each template and determine where information is located within the associated ``CONTROLLER DEPENDENT INFORMATION''.
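The rules in this section for locating the instance code and the template type can be summarized in a short Python sketch. It is an illustration only and is not the DCL command procedure of Section C.6; the function and constant names are hypothetical. It assumes the longwords have been copied by hand from the ``CONTROLLER DEPENDENT INFORMATION'' area of an ERF report, and it takes the template type to be the byte immediately following the instance code, as stated above.

```python
# Hypothetical names; values of MSLG$B_FORMAT are taken from Section C.1.
FORMAT_CONTROLLER_LOG = 0x00   # "00 CONTROLLER LOG": instance code is skewed
FORMAT_BBR_ATTEMPT = 0x09      # "09 BAD BLOCK REPLACEMENT ATTEMPT": no longwords
FORMAT_MEDIA_LOADER = 0x0A     # "0A MEDIA LOADER LOG": code is in LONGWORD 2


def deskew(longword1, longword2):
    """Rebuild a skewed instance code.

    The code's low-order half is in the high-order half of LONGWORD 1 and
    its high-order half is in the low-order half of LONGWORD 2.
    """
    low_half = (longword1 >> 16) & 0xFFFF
    high_half = longword2 & 0xFFFF
    return (high_half << 16) | low_half


def instance_code_and_template(mslg_format, longwords):
    """Return (instance_code, template), or None when ERF supplies no longwords.

    `longwords` is a list in report order, so longwords[0] is LONGWORD 1.
    """
    if mslg_format == FORMAT_BBR_ATTEMPT:
        return None
    if mslg_format == FORMAT_CONTROLLER_LOG:
        code = deskew(longwords[0], longwords[1])
        template = (longwords[1] >> 16) & 0xFF   # byte after the skewed code
    elif mslg_format == FORMAT_MEDIA_LOADER:
        code = longwords[1]
        template = longwords[2] & 0xFF
    else:
        code = longwords[0]
        template = longwords[1] & 0xFF
    return code, template


# Skewed example from Section C.1: prints 0102030A 01
print("%08X %02X" % instance_code_and_template(0x00, [0x030A0000, 0x24010102]))
# Disk Transfer log of Example C-1: prints 03094002 51
print("%08X %02X" % instance_code_and_template(0x02, [0x03094002, 0x00003C51]))
```

The template values returned (01 and 51) agree with the Last Failure and Disk Transfer Error rows of Table C-1.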
HSJ-Series Error Logging C-5 C.2 Event Log Formats ------------------------------------------------------------ Note ------------------------------------------------------------ The numeric code values discussed in the figures and tables of this appendix are hexadecimal, unless otherwise stated. ------------------------------------------------------------ The HSJ30/40 controller reports significant events that occur during normal controller operation using the following standard MSCP and TMSCP error log message formats: · Controller Errors · Memory Errors · Disk Transfer Errors · Bad Block Replacement Attempts · Tape Errors · Media Loader Errors · Disk Copy Data Correlation To more fully use the remainder of this appendix, you should become familiar with MSCP and TMSCP protocols, especially in the area of error log message formats. C.2.1 Implementation Dependent Information Area With the exception of the Disk Copy Data Correlation error log message format, each of the error log message formats listed in Section C.2 provide an ``implementation dependent information'' area located at the end of the message. For HSJ30/40 controller-specific event logs, this area is formatted as shown in Figure C-1. Note that the fields shown in Figure C-1 always begin on a longword boundary within HSJ30/40 controller-specific event logs. If the ``implementation dependent information'' area of a particular MSCP error log message format does not begin on a longword boundary, a ``reserved'' field containing the appropriate number of bytes is appended to the format to provide the necessary alignment (such as offset 16 in Figure C-15). Implementation Dependent Information Fields: instance code A number that uniquely identifies the event being reported. The format of this field is shown in Figure C-2. 
C-6 HSJ-Series Error Logging Figure C-1 Implementation Dependent Information Format Figure C-2 Instance Code Format Instance Code Specific Subfields: NR Threshold The notification/recovery threshold assigned to the event. This value indicates when notification/recovery action should be taken. See Section C.4 for more detail. Repair Action The recommended repair action code assigned to the event. This value indicates what notification/recovery action should be taken when the NR Threshold is reached. See Section C.5 for more detail. Event Number A number, when combined with the value contained in the Component ID subfield, uniquely identifies the event. HSJ-Series Error Logging C-7 Component ID A number that uniquely identifies the firmware component that detected the event as shown in Table C-2. templ A number that uniquely describes the format of the ``template dependent information'' field. tdisize The number of bytes contained in the ``template dependent information'' field. reserved Reserved for future use. event time The time the event occurred according to the power on time value maintained by the HSJ30/40 controller operational firmware. The power on time value is a 64-bit unsigned integer that represents the total number of seconds HSJ30/40 controller operational firmware has executed on the HSJ30/40 controller board. Note that the time expended during controller reinitializations, power-on diagnostics, and system initialization is not accounted for by this value. template dependent information A variable length field containing information specific to the event being reported. This field is divided into separate fields specific to the template identified in the ``templ'' field. The template-specific fields common to multiple event logs are described in separate subsections of Section C.2.2 to avoid duplication of the field descriptions in Section C.2.3. C.2.2 Common Event Log Fields Common fields are generated across certain event logs. 
These common fields are described in Sections C.2.2.1 through C.2.2.5. C.2.2.1 CI Host Interconnect Services Common Event Log Fields The fields common to certain event logs generated by the CI Host Interconnect Services firmware component are shown in Figure C-3. C-8 HSJ-Series Error Logging Figure C-3 CI Host Interconnect Services Common Event Log Fields CI Host Interconnect Services Common Fields: his status The Host Interconnect Services status code as shown in Table C-3. error id The address of the Host Interconnect Services routine that detected the event. src The CI source node address. dst The CI destination node address. intopcd The CI message opcode as shown in Table C-4. vcstate The virtual circuit state code as shown in Table C-5. ------------------------------------------------------------ Note ------------------------------------------------------------ The setting of the high order bit (Bit 7) in this field indicates the state of ID polling for the virtual circuit. If Bit 7 is set, ID polling is complete. Otherwise, ID polling is incomplete. ------------------------------------------------------------ ppd opcode The Port/Port Driver layer opcode as shown in Table C-6. HSJ-Series Error Logging C-9 scs opcode The System Communication Services layer opcode as shown in Table C-7. C.2.2.2 Host/Server Connection Common Fields The fields common to certain event logs generated by the Disk and Tape MSCP Server, CI Host Interconnect Services, Device Services, and Value Added firmware components are shown in Figure C-4. Figure C-4 Host/Server Connection Common Fields Host/Server Connection Common Fields: connection id Identifies the host/server connection associated with the event being reported. If this value is zero, the host/server connection information was invalidated before the event could be reported. remote node name An 8-byte ASCII string that represents the node name associated with the host/server connection identified in the ``connection id'' field. 
If the ``connection id'' field is zero, the content of this field is undefined. C.2.2.3 Byte Count/Logical Block Number Common Fields The fields common to certain event logs generated by the Device Services and Value Added firmware components are shown in Figure C-5. C-10 HSJ-Series Error Logging Figure C-5 Byte Count/Logical Block Number Common Fields Byte Count/Logical Block Number Common Fields: byte count Number of bytes of the HSJ30/40 controller firmware component initiated transfer successfully transferred. logical block number Starting logical block number of the HSJ30/40 controller firmware component initiated transfer. reserved Reserved for future use, currently contains the value 0. C.2.2.4 Device Location/Identification Common Fields The fields common to certain event logs generated by the Device Services and Value Added firmware components are shown in Figure C-6. Device Location/Identification Common Fields: device locator The location within the HSJ30/40 controller 's subsystem of the target device involved in the event being reported. This field is formatted as shown in Figure C-7. HSJ-Series Error Logging C-11 Figure C-6 Device Location/Identification Common Fields Figure C-7 Device Locator Field Format Device Locator Specific Subfields: port The SCSI bus number to which the target device is connected. target The SCSI target number on the ``port'' to which the target device is connected. lun The logical unit number on the ``target'' by which the target device is logically addressed. C-12 HSJ-Series Error Logging devtype The SCSI device type of the device. The various SCSI device types supported by the HSJ30/40 controller are shown in Table C-9. device identification Sixteen bytes of ASCII data as defined by the device vendor in the Product Identification field of the SCSI INQUIRY command data. 
The most significant character of the product identification data will appear in the low order byte of the first longword of this field while the least significant character appears in the high order byte of the last long word. device serial number Eight bytes of ASCII data as defined by the device vendor in the Product Serial Number field of the SCSI Unit Serial Number Page data. The most significant character of the serial number data will appear in the low order byte of the first longword of this field while the least significant character appears in the high order byte of the last longword. Note that the number of characters of serial number data supplied may vary from vendor to vendor as well as from device to device. If the serial number data supplied is less than eight characters, this field is ASCII space filled from the lowest order byte (relative to the low order byte of the first longword) containing a serial number character through the high order byte of the last longword. If the serial number data supplied are greater than eight characters, the serial number data are truncated at eight bytes (that is, the least significant character(s) of the serial number data are lost). If the serial number data are not available at all, this field is ASCII space filled. C.2.2.5 SCSI Device Sense Data Common Fields The fields common to certain event logs generated by the Device Services and Value Added firmware components are shown in Figure C-8. The first two fields shown in Figure C-8, the ``cmdopcd'' and ``sdqual'' fields, are supplied by the HSJ30/40 controller to provide qualifying information required to interpret the other SCSI Sense Data Common fields. The other fields, ``ercdval'' through ``keyspec'', contain standard Sense Data, returned in the response of a SCSI REQUEST SENSE command issued to the target device or generated by the HSJ30/40 controller on the target device's behalf. 
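The byte ordering described for the ``device identification'' and ``device serial number'' fields in Section C.2.2.4 (most significant character in the low order byte of the first longword) means the text can be recovered by laying each longword out low byte first. The following Python sketch shows this; the function name is hypothetical. The sample values are LONGWORD 10 through LONGWORD 13 of Example C-1, whose ASCII column shows the same characters; treating those four longwords as a device identification field is an assumption made for illustration.

```python
import struct


def longwords_to_ascii(longwords):
    """Convert longwords (in report order) to the ASCII text they carry.

    Each longword is packed little-endian, so its low order byte comes
    first, matching the ordering described for these fields.
    """
    raw = struct.pack("<%dI" % len(longwords), *longwords)
    return raw.decode("ascii", errors="replace")


# LONGWORD 10 through LONGWORD 13 as they appear in Example C-1.
text = longwords_to_ascii([0x36325A52, 0x20202020, 0x29432820, 0x43454420])
print(repr(text))            # 'RZ26     (C) DEC'
print(repr(text.rstrip()))   # trailing space fill removed, if any
```

Because short serial numbers are space filled, stripping trailing spaces (as in the last line) is usually what you want before comparing values.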
HSJ-Series Error Logging C-13 Figure C-8 SCSI Device Sense Data Common Fields Figure C-9 Sense Data Qualifier Field Format SCSI Device Sense Data Common Fields: cmdopcd The operation code of the SCSI command issued to the target device. SCSI command operation codes vary according to device type (see Table C-10) so the content of this field depends on the content of the ``devtype'' field. See the description of the ``ercdval'' field for information regarding the validity of this field. sdqual This field contains information necessary to determine whether or not the Sense Data contained in the ``ercdval'' through ``keyspec'' fields are supplied by an attached device or generated by the HSJ30/40 controller itself and to qualify the content of the ``info'' field. This field is formatted as shown in Figure C-9. Sense Data Qualifier Specific Subfields: bufmode The SCSI buffered mode selected on the device. The various SCSI Buffered Modes are shown in Table C-11. C-14 HSJ-Series Error Logging uweuo This bit is set to one if and only if an unrecoverable write error was detected while unwritten objects (that is, data blocks, filemarks, or setmarks) remain in the buffer. msbd This bit is set to one if and only if the MODE SENSE block descriptor is nonzero. fbw This bit is set to one if and only if the Fixed bit of the WRITE command is set to one. rsvd Reserved for future use. dssd This bit is set to one if and only if the Sense Data contained in the ``ercdval'' through ``keyspec'' fields are supplied by the target device. If this bit is zero, the Sense Data contained in the ``ercdval'' through ``keyspec'' fields are generated by the HSJ30/40 controller on behalf of the target device because the Sense Data could not be obtained from that device. ercdval This field contains byte 0 of the Sense Data returned in the response of a SCSI REQUEST SENSE command. This field is formatted as shown in Figure C-10. 
Figure C-10 SCSI Sense Data Byte Zero (``ercdval'') Field Format SCSI Sense Data Byte Zero (``ercdval'') Specific Subfields: Error Code An error code of 70 indicates that the event being reported occurred during the execution of the current command, identified in the ``cmdopcd'' field. HSJ-Series Error Logging C-15 An error code of 71 indicates that the event being reported occurred during execution of a previous command for which GOOD status has already been returned. The ``cmdopcd'' field is undefined in this case. For error codes 70 and 71 the remaining fields of the event log (such as segment, snsflgs, info, and so forth) will contain the standard SCSI Sense Data fields (bytes 1 through 17) returned in the response of a SCSI REQUEST SENSE command. An error code of 7F indicates that the Sense Data fields are in a vendor-specific format so the content of the remaining event log fields can only be determined from documentation provided by the vendor of the target device. The SCSI standard states that error code values 72 through 7E are currently reserved for future use and that error codes 00 through 6F are not defined. Should this field contain any of those codes the remaining event log fields are undefined. Valid If this bit is set to one, the content of the Sense Data Information field (bytes 3 through 6) is valid and its content is as defined by the SCSI standard (see the description of the ``info'' field for the SCSI definition of the Sense Data Information field). Otherwise, the Sense Data Information field is not as defined by the SCSI standard (refer to documentation provided by the device vendor for their definition of the field). segment This field contains byte 1 (Segment field) of the Sense Data returned in the response of a SCSI REQUEST SENSE command. If the ``cmdopcd'' is an 18 (COPY), 39 (COMPARE), or 3A (COPY AND VERIFY), this field contains the number of the current segment descriptor. 
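The interpretation of the ``ercdval'' field described above can be sketched in Python as follows. Figure C-10 is not reproduced in this text, so the bit positions used here are an assumption taken from the SCSI-2 sense data layout (Valid in bit 7, Error Code in bits 6 through 0). The function name and the returned strings are hypothetical.

```python
def decode_ercdval(ercdval):
    """Split Sense Data byte 0 into (valid, error_code, meaning).

    Assumes the SCSI-2 layout: bit 7 is Valid, bits 6..0 are Error Code.
    """
    valid = bool(ercdval & 0x80)
    error_code = ercdval & 0x7F
    if error_code == 0x70:
        meaning = "current command"          # ``cmdopcd'' identifies the command
    elif error_code == 0x71:
        meaning = "previous command"         # ``cmdopcd'' is undefined
    elif error_code == 0x7F:
        meaning = "vendor-specific format"   # consult the device vendor
    else:
        meaning = "undefined"                # 00-6F not defined, 72-7E reserved
    return valid, error_code, meaning


valid, code, meaning = decode_ercdval(0xF0)
print(valid, "%02X" % code, meaning)   # True 70 current command
```

When the returned meaning is "undefined", the remaining event log fields should not be interpreted.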
snsflgs
This field contains byte 2 of the Sense Data returned in the response of a SCSI REQUEST SENSE command. This field is formatted as shown in Figure C-11.

Figure C-11 SCSI Sense Data Byte Two (``snsflgs'') Field Format

SCSI Sense Data Byte Two (``snsflgs'') Specific Subfields:

Sense Key
The sense key provides generic categories in which events can be reported. The sense keys are described in Table C-12.

ILI
An incorrect length indicator (ILI) bit of one usually indicates that the requested logical block length did not match the logical block length of the data on the medium.

EOM
For sequential-access devices (that is, ``devtype'' is 1), an end-of-medium (EOM) bit set to one indicates that the unit is at or past the early-warning if the direction was forward, or that the command could not be completed because beginning-of-partition was encountered if the direction was reverse.

FM
A filemark (FM) bit set to one indicates that the current command has read a filemark or setmark. The Additional Sense Code field (see ``asc'' field description) may be used to indicate whether a filemark or a setmark was read. Note that the reporting of setmarks is optional.

info
This field contains bytes 3 through 6 (Information field) of the Sense Data returned in the response of a SCSI REQUEST SENSE command. The content of this field varies depending on the values contained in the ``devtype'' and ``cmdopcd'' fields and the ``bufmode'', ``uweuo'', ``msbd'', and ``fbw'' subfields of the ``sdqual'' field as follows:

· Regardless of the value of the ``devtype'' field and the ``sdqual'' subfields, if the ``cmdopcd'' is an 18 (COPY), 39 (COMPARE), or 3A (COPY AND VERIFY), this field contains the difference (residue) of the requested number of blocks minus the actual number of blocks copied or compared for the current segment descriptor.
HSJ-Series Error Logging C-17 · Regardless of the value of the ``sdqual'' subfields, if ``devtype'' is 0 (Direct-Access Devices--such as magnetic disk) or 5 (CDROM Devices) and ``cmdopcd'' is not an 18 (COPY), 39 (COMPARE), or 3A (COPY AND VERIFY), this field contains the unsigned logical block address associated with the value contained in the Sense Key subfield of the ``snsflgs'' field (see Figure C-11). · Regardless of the value of ``cmdopcd,'' if ``devtype'' is 1 (Sequential- Access Devices--such as magnetic tape) and ``uweuo'' is 1 and ``bufmode'' is either 1 or 2, this field contains the following: - The total number of objects in the buffer if ``msbd'' and ``fbw'' are both 1. - The number of bytes in the buffer, including filemarks and setmarks, if ``msbd'' is 1 and ``fbw'' is 0. addsnsl This field contains byte 7 (Additional Sense Length field) of the Sense Data returned in the response of a SCSI REQUEST SENSE command. This field contains the number of additional Sense Data bytes to follow. If this value is less than 10, the content of some or all of the remaining event log fields (that is, cmdspec, asc, ascq, frucode, and keyspec) may be undefined. The ``cmdspec'' field is undefined unless this value is 4 or greater. The ``asc'' and ``ascq'' fields are undefined unless this value is 6 or greater. The ``frucode'' field is undefined unless this value is 7 or greater. The ``keyspec'' field is undefined unless this value is 10 or greater. If this value is greater than 10, the device supplied the Additional Sense Bytes field, which begins at byte 12 of the Sense Data. The content of the Additional Sense Bytes field is not included in the event log. cmdspec If the value contained in the ``addsnsl'' field is 4 or greater, this field contains bytes 8 through 0B (Command-Specific Information field) of the Sense Data returned in the response of a SCSI REQUEST SENSE command. 
The content of this field varies depending on the value contained in the ``cmdopcd'' field as follows: · If the ``cmdopcd'' is an 18 (COPY), 39 (COMPARE), or 3A (COPY AND VERIFY), the low order byte of this field contains the starting byte number of an area relative to Sense Data byte 0 that contains (unchanged) the source logical unit's status byte and sense data and the next higher order byte contains the starting byte number of an area relative to Sense Byte 0 that contains (unchanged) the destination logical unit's status byte and sense data. If the low order or next higher order byte of this field contains the value zero, no status byte or sense data was supplied for the corresponding (source or destination) logical unit. The content of the highest order two bytes of this field is undefined. C-18 HSJ-Series Error Logging · If the ``cmdopcd'' is a 7 (REASSIGN BLOCKS), this field contains the logical block address of the first defect descriptor not reassigned. If information about the first defect descriptor not reassigned is not available, or if all the defects have been reassigned, this field will contain the value FFFFFFFF. · If the ``cmdopcd'' is a 31 (SEARCH DATA EQUAL), 30 (SEARCH DATA HIGH), or 32 (SEARCH DATA LOW) and the Sense Key subfield of the ``snsflgs'' field (refer to Figure C-11) value is EQUAL, this field contains the record offset of the matching record. asc ascq If the value contained in the ``addsnsl'' field is 6 or greater and the ``dssd'' subfield of the ``sdqual'' field is equal to 1, the ``asc'' and ``ascq'' fields contain the values supplied in the byte 0C (Additional Sense Code) and byte 0D (Additional Sense Code Qualifier) fields, respectively, of the Sense Data returned in the response of a SCSI REQUEST SENSE command issued to the target device. The Additional Sense Code (ASC) field and the Additional Sense Code Qualifier (ASCQ) field together describe the event being reported. 
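The ``addsnsl'' thresholds given in the field descriptions can be read as a lookup from field name to the minimum Additional Sense Length at which that field is defined. The thresholds are decimal byte counts, whereas the byte offsets in the text are hexadecimal: ``addsnsl'' is byte 7, so a value of 10 covers bytes 8 through 11 hexadecimal (7 + 10 = 17 decimal), which is the last byte of ``keyspec''. The sketch below is illustrative only; MIN_ADDSNSL, defined_fields, and has_additional_sense_bytes are hypothetical names.

```python
# Minimum addsnsl value (decimal) at which each event log field is defined,
# taken from the addsnsl field description.
MIN_ADDSNSL = {
    "cmdspec": 4,   # Sense Data bytes 08 through 0B
    "asc": 6,       # Sense Data byte 0C
    "ascq": 6,      # Sense Data byte 0D
    "frucode": 7,   # Sense Data byte 0E
    "keyspec": 10,  # Sense Data bytes 0F through 11
}


def defined_fields(addsnsl: int) -> set:
    """Return the names of the event log fields defined for this addsnsl value."""
    return {name for name, minimum in MIN_ADDSNSL.items() if addsnsl >= minimum}


def has_additional_sense_bytes(addsnsl: int) -> bool:
    """True if the device supplied Additional Sense Bytes (not kept in the event log)."""
    return addsnsl > 10
```

With an ``addsnsl'' value of 6, for instance, only ``cmdspec'', ``asc'', and ``ascq'' are defined; ``frucode'' and ``keyspec'' are not.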
The standard SCSI ASC/ASCQ codes are ``devtype'' dependent as shown in Tables C-13 through C-16. Note that the SCSI standard defines ASCs within the range 80 through FF in combination with ASCQs within the range 00 through FF and ASCQs within the range 80 through FF regardless of ASC value as being vendor specific. Refer to documentation provided by the vendor of the target device for a description of an ASC/ASCQ value that falls within the defined vendor specific ranges. If the value contained in the ``addsnsl'' field is 6 or greater and the ``dssd'' subfield of the ``sdqual'' field is equal to 0, the ``asc'' and ``ascq'' fields contain HSJ30/40 controller vendor-specific SCSI ASC/ASCQ codes generated by the HSJ30/40 on behalf of the target device. See Table C-17 for the descriptions of the HSJ30/40 controller vendor- specific SCSI ASC/ASCQ codes. frucode If the value contained in the ``addsnsl'' field is 7 or greater, this field contains byte 0E (Field Replaceable Unit field) of the Sense Data returned in the response of a SCSI REQUEST SENSE command. If this field is nonzero, the target device is identifying the ``field replaceable unit'' that has failed. Refer to documentation for the target device for complete details of the meaning of this value. keyspec If the value contained in the ``addsnsl'' field is 10 or greater, this field contains bytes 0F through 11 (Sense-Key Specific field) of the Sense Data returned in the response of a SCSI REQUEST SENSE command. The definition of this field is determined by the value of the Sense Key subfield of the ``snsflgs'' field. This field is reserved HSJ-Series Error Logging C-19 for Sense Key values other than ILLEGAL REQUEST, RECOVERED ERROR, HARDWARE ERROR, MEDIUM ERROR, and NOT READY. If the Sense Key value is ILLEGAL REQUEST, the format of this field is as shown in Figure C-12. 
Figure C-12 SCSI Sense Data Byte 0F through 11 (``keyspec'') Field--Field Pointer Bytes Format SCSI Sense Data Byte 0F through 11 (``keyspec'')--Field Pointer Bytes Specific Subfields: Bit Pointer and BPV A bit pointer valid (BPV) bit of zero indicates that the value in the Bit Pointer subfield is not valid. A BPV bit of one indicates that the Bit Pointer subfield specifies which bit of the byte designated by the Field Pointer field is in error. When a multiple-bit field is in error, the Bit Pointer subfield points to the most-significant (left-most) bit of the field. C/D A command data (C/D) bit of one indicates that the illegal parameter is in the command descriptor block. A C/D bit of zero indicates that the illegal parameter is in the data parameters sent by the initiator during the DATA OUT phase. SKSV The content of the ``keyspec'' field is valid if and only if this bit is set to one. Field Pointer The Field Pointer subfield indicates which byte of the command descriptor block or of the parameter data was in error. When a multiple-byte field is in error, the pointer points to the most-significant (left-most) byte of the field. C-20 HSJ-Series Error Logging If the Sense Key value is RECOVERED ERROR, HARDWARE ERROR, or MEDIUM ERROR, the format of this field is as shown in Figure C-13. Figure C-13 SCSI Sense Data Byte 0F through 11 (``keyspec'') Field--Actual Retry Count Bytes Format SCSI Sense Data Byte 0F through 11 (``keyspec'')--Actual Retry Count Bytes Specific Subfields: SKSV The content of the ``keyspec'' field is valid if and only if this bit is set to one. Actual Retry Count The actual retry count subfield contains the implementation-specific information on the actual number of retries of the recovery algorithm used in attempting to recover an error or exception condition. If the Sense Key value is NOT READY and the last command issued to the device was a FORMAT UNIT, the format of this field is as shown in Figure C-14. 
Figure C-14 SCSI Sense Data Byte 0F through 11 (``keyspec'') Field--Progress Indication Bytes Format HSJ-Series Error Logging C-21 SCSI Sense Data Byte 0F through 11 (``keyspec'')--Progress Indication Bytes Specific Subfields: SKSV The content of the ``keyspec'' field is valid if and only if this bit is set to one. Progress Indication This subfield is a percent complete indication in which the returned value is the numerator that has 10000 as its denominator. The progress indication is based upon the total format operation including any certification or initialization operations. C.2.3 Specific Event Log Formats In addition to the common fields generated across certain event logs, there is specific information for each log, based on template type. The specific information is described in Sections C.2.3.1 through C.2.3.14. C.2.3.1 Last Failure Event Log (Template 01) Unrecoverable conditions detected by either the firmware or hardware, and certain operator initiated conditions result in the termination of HSJ30/40 controller operation. In most cases, following such a termination, the controller will attempt to restart (that is, reinitialization) with hardware components and firmware data structures initialized to the states necessary to perform normal operations. If the restart is successful and communications are reestablished with the host system(s), and ``Miscellaneous'' error logging is enabled by one or more host systems, the HSJ30/40 controller will send a Last Failure Event Log that describes the condition that caused controller operation to terminate to all host systems that have enabled ``Miscellaneous'' error logging on a connection or connections established with the HSJ30/40 controller 's Disk and/or Tape MSCP Server. The Last Failure Event Log is reported via the T/MSCP Controller Errors error log message format. The format of this event log, including the HSJ30/40 controller specific fields, is shown in Figure C-15. 
Last Failure Event Log Format Specific Fields: format This field contains the value 00 (that is, T/MSCP Controller Errors error log format code). C-22 HSJ-Series Error Logging Figure C-15 Last Failure Event Log (Template 01) Format event code The values that can be reported in this field for this event log are shown in Table C-18. reserved (offset 16) This field contains the value 0. instance code See Section C.2.1 for the description of this field. The values that can be reported in this field for this event log are shown in Table C-18. HSJ-Series Error Logging C-23 templ See Section C.2.1 for the description of this field. This field contains the value 01 for this event log. tdisize See Section C.2.1 for the description of this field. This field contains the value 24 for this event log. reserved (offset 1E) This field contains the value 0. event time See Section C.2.1 for the description of this field. last failure code A number that uniquely describes the unrecoverable condition being reported as shown in Tables C-33 through C-48. The format of this field is shown in Figure C-16. ------------------------------------------------------------ Note ------------------------------------------------------------ Do not confuse this field with the ``instance code'' field. They are similar in format but convey different information. ------------------------------------------------------------ Figure C-16 Last Failure Code Format Last Failure Code Specific Subfields: Parameter Count The number of longwords of supplemental information provided in the ``last failure parameters'' field. Restart Code A number that describes the actions taken to restart the controller after the unrecoverable condition was detected, as shown in Table C-49. C-24 HSJ-Series Error Logging HW Hardware/firmware flag. If this flag is equal to 1, the unrecoverable condition is due to a hardware-detected fault. 
If this flag is equal to 0, the unrecoverable condition is due to a firmware-detected inconsistency.

Repair Action
The recommended repair action code assigned to the condition. This value indicates what notification/recovery action should be taken. See Section C.5 for more detail.

Error Number
A number that, when combined with the value contained in the Component ID subfield, uniquely identifies the condition detected.

Component ID
A number that uniquely identifies the firmware component that reported the condition, as shown in Table C-2.

last failure parameters
This field contains supplemental information specific to the failure being reported. The content of the parameters supplied (if any) is described in the individual ``last failure code'' descriptions contained in Tables C-33 through C-48.

C.2.3.2 Failover Event Log (Template 05)

The HSJ30/40 controller Failover Control firmware component reports errors and other conditions encountered during redundant controller communications and failover operation via the Failover Event Log.

The Failover Event Log will be sent to all host systems that have enabled ``Miscellaneous'' error logging on a connection or connections established with the HSJ30/40 controller's Disk and/or Tape MSCP Server.

The Failover Event Log is reported via the T/MSCP Controller Errors error log message format. The format of this event log, including the HSJ30/40 controller specific fields, is shown in Figure C-17.

Figure C-17 Failover Event Log (Template 05) Format

Failover Event Log Format Specific Fields:

format
This field contains the value 00 (that is, T/MSCP Controller Errors error log format code).

event code
The values that can be reported in this field for this event log are shown in Table C-19.

reserved (offset 16)
This field contains the value 0.

instance code
See Section C.2.1 for the description of this field.
The values that can be reported in this field for this event log are shown in Table C-19. templ See Section C.2.1 for the description of this field. This field contains the value 05 for this event log. tdisize See Section C.2.1 for the description of this field. This field contains the value 24 for this event log. reserved (offset 1E) This field contains the value 0. event time See Section C.2.1 for the description of this field. last failure code last failure parameters These fields contain the last failure information supplied in the last gasp message sent by the other HSJ30/40 controller in a dual- redundant configuration as a normal part of terminating controller operation. See Section C.2.3.1 for the description of the format of these fields. Note that the content of certain of the fields described previously may be undefined depending on the value supplied in the ``instance code'' field. See Table C-19 for more detail. C.2.3.3 Nonvolatile Parameter Memory Component Event Log (Template 11) The HSJ30/40 controller Executive firmware component reports errors detected while accessing a Nonvolatile Parameter Memory Component via the Nonvolatile Parameter Memory Component Event Log. The Nonvolatile Parameter Memory Component Event Log will be sent to all host systems that have enabled ``Miscellaneous'' error logging on a connection or connections established with the HSJ30/40 controller 's Disk and/or Tape MSCP Server. The Nonvolatile Parameter Memory Component Event Log is reported via the T/MSCP Memory Errors error log message format. The format of this event log, including the HSJ30/40 controller specific fields, is shown in Figure C-18. HSJ-Series Error Logging C-27 Figure C-18 Nonvolatile Parameter Memory Component Event Log (Template 11) Format Nonvolatile Parameter Memory Component Event Log Format Specific Fields: format This field contains the value 01 (that is, T/MSCP Memory Errors error log format code). 
event code The values that can be reported in this field for this event log are shown in Table C-20. C-28 HSJ-Series Error Logging memory address The physical address of the beginning of the affected Nonvolatile Parameter Memory component area. instance code See Section C.2.1 for the description of this field. The values that can be reported in this field for this event log are shown in Table C-20. templ See Section C.2.1 for the description of this field. This field contains the value 11 for this event log. tdisize See Section C.2.1 for the description of this field. This field contains the value 08 for this event log. reserved (offset 22) This field contains the value 0. event time See Section C.2.1 for the description of this field. byte count The number of bytes contained in the affected Nonvolatile Parameter Memory component area (that is, the area bounded by: ``memory address'' through ``memory address'' + ``byte count'' -1). number of times written The number of times the affected Nonvolatile Parameter Memory component area has been written. undef This field is only present to provide longword alignment; its content is undefined. C.2.3.4 Backup Battery Failure Event Log (Template 12) The HSJ30/40 controller Value Added Services firmware component reports backup battery failure conditions for the various hardware components that use a battery to maintain state during power-failures via the Backup Battery Failure Event Log. HSJ-Series Error Logging C-29 The Backup Battery Failure Event Log will be sent to all host systems that have enabled ``Miscellaneous'' error logging on a connection or connections established with the HSJ30/40 controller 's Disk and/or Tape MSCP Server. The Backup Battery Failure Event Log is reported via the T/MSCP Memory Errors error log message format. The format of this event log, including the HSJ30/40 controller specific fields, is shown in Figure C-19. 
Figure C-19 Backup Battery Failure Event Log (Template 12) Format Backup Battery Failure Event Log Format Specific Fields: format This field contains the value 01 (that is, T/MSCP Memory Errors error log format code). C-30 HSJ-Series Error Logging event code The values that can be reported in this field for this event log are shown in Table C-21. memory address The content of this field depends on the value supplied in the ``instance code'' field. See Table C-21 for more detail. instance code See Section C.2.1 for the description of this field. The values that can be reported in this field for this event log are shown in Table C-21. templ See Section C.2.1 for the description of this field. This field contains the value 12 for this event log. tdisize See Section C.2.1 for the description of this field. This field contains the value 00 for this event log. reserved (offset 22) This field contains the value 0. event time See Section C.2.1 for the description of this field. C.2.3.5 Subsystem Built-In Self-Test Failure Event Log (Template 13) The HSJ30/40 controller Subsystem Built-In Self-Tests firmware component reports errors detected during test execution via the Subsystem Built-In Self-Test Failure Event Log. The Subsystem Built-In Self-Test Failure Event Log will be sent to all host systems that have enabled ``Miscellaneous'' error logging on a connection or connections established with the HSJ30/40 controller 's Disk and/or Tape MSCP Server. The Subsystem Built-In Self-Test Failure Event Log is reported via the T/MSCP Controller Errors error log message format. The format of this event log, including the HSJ30/40 controller specific fields, is shown in Figure C-20. HSJ-Series Error Logging C-31 Figure C-20 Subsystem Built-In Self-Test Failure Event Log (Template 13) Format Subsystem Built-In Self-Test Failure Event Log Format Specific Fields: format This field contains the value 00 (that is, T/MSCP Controller Errors error log format code). 
event code
The values that can be reported in this field for this event log are shown in Table C-22.

reserved (offset 16)
This field contains the value 0.

instance code
See Section C.2.1 for the description of this field. The values that can be reported in this field for this event log are shown in Table C-22.

templ
See Section C.2.1 for the description of this field. This field contains the value 13 for this event log.

tdisize
See Section C.2.1 for the description of this field. This field contains the value 24 for this event log.

reserved (offset 1E)
This field contains the value 0.

event time
See Section C.2.1 for the description of this field.

undefined
This field is only present to provide longword alignment; its content is undefined.

hdrtype hdrflgs te tnum tcmd tflags error code return code address of error expected error data actual error data extra status 1 extra status 2 extra status 3
The content of these fields varies depending on the HSJ30/40 controller Subsystem Built-In Self-Test that detected the error condition and the error condition that was detected.

C.2.3.6 Memory System Failure Event Log (Template 14)

The HSJ30/40 controller Executive firmware component and the Cache Manager, part of the Value Added Services firmware component, report the occurrence of memory errors via the Memory System Failure Event Log.

The Memory System Failure Event Log will be sent to all host systems that have enabled ``Miscellaneous'' error logging on a connection or connections established with the HSJ30/40 controller's Disk and/or Tape MSCP Server.

The Memory System Failure Event Log is reported via the T/MSCP Memory Errors error log message format. The format of this event log, including the HSJ30/40 controller specific fields, is shown in Figure C-21.

Memory System Failure Event Log Format Specific Fields:

format
This field contains the value 01 (that is, T/MSCP Memory Errors error log format code).
event code The values that can be reported in this field for this event log are shown in Table C-23. memory address The content of this field depends on the value supplied in the ``instance code'' field. See Table C-23 for more detail. instance code See Section C.2.1 for the description of this field. The values that can be reported in this field for this event log are shown in Table C-23. templ See Section C.2.1 for the description of this field. This field contains the value 14 for this event log. tdisize See Section C.2.1 for the description of this field. This field contains the value 34 for this event log. reserved (offset 22) This field contains the value 0. event time See Section C.2.1 for the description of this field. C-34 HSJ-Series Error Logging Figure C-21 Memory System Failure Event Log (Template 14) Format byte count The number of bytes contained in the bad memory area (that is, the area bounded by: ``memory address'' through ``memory address'' + ``byte count'' -1). HSJ-Series Error Logging C-35 dsr csr dcsr der ear edr err rsr These fields contain the values contained in the registers of the DRAB that detected the memory failure. rdr0 rdr1 wdr0 wdr1 These fields contain the values contained in the HSJ30/40 controller 's Read and Write Diagnostic registers. Note that the content of certain of the fields described previously may be undefined depending on the value supplied in the ``instance code'' field. See Table C-23 for more detail. C.2.3.7 CI Port Event Log (Template 31) The HSJ30/40 controller Host Interconnect Services firmware component reports errors detected while performing work related to the CI Port communication layer via the CI Port Event Log. The CI Port Event Log will be sent to all host systems that have enabled ``Miscellaneous'' error logging on a connection or connections established with the HSJ30/40 controller 's Disk and/or Tape MSCP Server. 
The CI Port Event Log is reported via the T/MSCP Controller Errors error log message format. The format of this event log, including the HSJ30/40 controller specific fields, is shown in Figure C-22. CI Port Event Log Format Specific Fields: format This field contains the value 00 (that is, T/MSCP Controller Errors error log format code). event code The values that can be reported in this field for this event log are shown in Table C-24. reserved (offset 16) This field contains the value 0. C-36 HSJ-Series Error Logging Figure C-22 CI Port Event Log (Template 31) Format instance code See Section C.2.1 for the description of this field. The values that can be reported in this field for this event log are shown in Table C-24. templ See Section C.2.1 for the description of this field. This field contains the value 31 for this event log. tdisize See Section C.2.1 for the description of this field. This field contains the value 0C for this event log. HSJ-Series Error Logging C-37 reserved (offset 1E) This field contains the value 0. event time See Section C.2.1 for the description of this field. his status error id src dst intopcd See Section C.2.2.1 for the description of these fields. undef This field is only present to provide longword alignment; its content is undefined. C.2.3.8 CI Port/Port Driver Event Log (Template 32) The HSJ30/40 controller Host Interconnect Services firmware component reports errors detected while performing work related to the CI Port/Port Driver (PPD) communication layer via the CI Port/Port Driver Event Log. The CI Port/Port Driver Event Log will be sent to all host systems that have enabled ``Miscellaneous'' error logging on a connection or connections established with the HSJ30/40 controller 's Disk and/or Tape MSCP Server. The CI Port/Port Driver Event Log is reported via the T/MSCP Controller Errors error log message format. The format of this event log, including the HSJ30/40 controller specific fields, is shown in Figure C-23. 
Figure C-23 CI Port/Port Driver Event Log (Template 32) Format

CI Port/Port Driver Event Log Format Specific Fields:

format
This field contains the value 00 (that is, T/MSCP Controller Errors error log format code).

event code
The values that can be reported in this field for this event log are shown in Table C-25.

reserved (offset 16)
This field contains the value 0.

instance code
See Section C.2.1 for the description of this field. The values that can be reported in this field for this event log are shown in Table C-25.

templ
See Section C.2.1 for the description of this field. This field contains the value 32 for this event log.

tdisize
See Section C.2.1 for the description of this field. This field contains the value 10 for this event log.

reserved (offset 1E)
This field contains the value 0.

event time
See Section C.2.1 for the description of this field.

his status error id src dst intopcd vcstate ppd opcode
See Section C.2.2.1 for the description of these fields.

undefined
This field is only present to provide longword alignment; its content is undefined.

Note that the content of certain of the fields described previously may be undefined depending on the value supplied in the ``instance code'' field. See Table C-25 for more detail.

C.2.3.9 CI System Communication Services Event Log (Template 33)

The HSJ30/40 controller Host Interconnect Services firmware component reports errors detected while performing work related to the CI System Communication Services (SCS) communication layer via the CI System Communication Services Event Log.

The CI System Communication Services Event Log will be sent to all host systems that have enabled ``Miscellaneous'' error logging on a connection or connections established with the HSJ30/40 controller's Disk and/or Tape MSCP Server.

The CI System Communication Services Event Log is reported via the T/MSCP Controller Errors error log message format.
The format of this event log, including the HSJ30/40 controller specific fields, is shown in Figure C-24. CI System Communication Services Event Log Format Specific Fields: format This field contains the value 00 (that is, T/MSCP Controller Errors error log format code). event code The values that can be reported in this field for this event log are shown in Table C-26. C-40 HSJ-Series Error Logging Figure C-24 CI System Communication Services Event Log (Template 33) Format reserved (offset 16) This field contains the value 0. instance code See Section C.2.1 for the description of this field. The values that can be reported in this field for this event log are shown in Table C-26. templ See Section C.2.1 for the description of this field. This field contains the value 33 for this event log. HSJ-Series Error Logging C-41 tdisize See Section C.2.1 for the description of this field. This field contains the value 2C for this event log. reserved (offset 1E) This field contains the value 0. event time See Section C.2.1 for the description of this field. his status error id src dst intopcd vcstate ppd opcode scs opcode See Section C.2.2.1 for the description of these fields. connection id remote node name See Section C.2.2.2 for the description of these fields. remote connection id The remote connection identifier supplied by the host node. received connection id The connection identifier of the System Application (SYSAP) that is receiving the message contained in the Host Transaction Block. send connection id The connection identifier of the System Application (SYSAP) that is sending the message contained in the Host Transaction Block. connection state The connection state code as shown in Table C-8. undefined This field is only present to provide longword alignment; its content is undefined. Note that the content of certain of the fields described previously may be undefined depending on the value supplied in the ``instance code'' field. See Table C-26 for more detail. 
C.2.3.10 Device Services Nontransfer Error Event Log (Template 41)

The HSJ30/40 controller Device Services firmware component reports errors detected while performing nontransfer work related to disk, tape, or media loader device operations via the Device Services Nontransfer Error Event Log.

If the error is associated with a command issued by a host system, the Device Services Nontransfer Error Event Log will be sent to the host system that issued the command, on the same connection upon which the command was received, if ``This Host'' error logging is enabled on that connection. It will also be sent to all host systems that have enabled ``Other Host'' error logging on a connection or connections established with the HSJ30/40 controller's Disk and/or Tape MSCP Server.

If the error is associated with a command issued by an HSJ30/40 controller firmware component, the Device Services Nontransfer Error Event Log will be sent to all host systems that have enabled ``Miscellaneous'' error logging on a connection established with the HSJ30/40 controller's Disk and/or Tape MSCP Server.

The Device Services Nontransfer Error Event Log is reported via the T/MSCP Controller Errors error log message format. The format of this event log, including the HSJ30/40 controller specific fields, is shown in Figure C-25.

Device Services Nontransfer Error Event Log Format Specific Fields:

format
This field contains the value 00 (that is, T/MSCP Controller Errors error log format code).

event code
The values that can be reported in this field for this event log are shown in Table C-27.

reserved (offset 16)
This field contains the value 0.

instance code
See Section C.2.1 for the description of this field. The values that can be reported in this field for this event log are shown in Table C-27.

templ
See Section C.2.1 for the description of this field. This field contains the value 41 for this event log.

tdisize
See Section C.2.1 for the description of this field.
HSJ-Series Error Logging C-43 Figure C-25 Device Services Nontransfer Error Event Log (Template 41) Format This field contains the value 04 for this event log. reserved (offset 1E) This field contains the value 0. event time See Section C.2.1 for the description of this field. port The SCSI bus number affected by the error being reported. target The SCSI target number on the ``port'' affected by the error being reported. C-44 HSJ-Series Error Logging asc ascq The ``asc'' and ``ascq'' fields contain the values supplied in byte 0C (Additional Sense Code) and byte 0D (Additional Sense Code Qualifier) fields, respectively, of the Sense Data returned in the response of a SCSI REQUEST SENSE command issued to the target device. The description of the value supplied in the ``instance code'' field (see Table C-27) describes the Sense Key value supplied in the Sense Data returned. Note that the content of certain of the fields described previously may be undefined depending on the value supplied in the ``instance code'' field. See Table C-27 for more detail. C.2.3.11 Disk Transfer Error Event Log (Template 51) The HSJ30/40 controller Device Services and Value Added Services firmware components report errors detected while performing work related to disk unit transfer operations via the Disk Transfer Error Event Log. If the error is associated with a command issued by a host system, the Disk Transfer Error Event Log will be sent to the host system that issued the command on the same connection upon which the command was received if ``This Host'' error logging is enabled on that connection and to all host systems that have enabled ``Other Host'' error logging on a connection or connections established with the HSJ30/40 controller 's Disk and/or Tape MSCP Server. 
If the error is associated with a command issued by an HSJ30/40 controller firmware component, the Disk Transfer Error Event Log will be sent to all host systems that have enabled ``Miscellaneous'' error logging on a connection established with the HSJ30/40 controller's Disk MSCP Server.

The Disk Transfer Error Event Log is reported via the MSCP Disk Transfer Errors error log message format. The format of this event log, including the HSJ30/40 controller specific fields, is shown in Figure C-26.

Figure C-26 Disk Transfer Error Event Log (Template 51) Format

Disk Transfer Error Event Log Format Specific Fields:

format
  This field contains the value 02 (that is, MSCP Disk Transfer Errors error log format code).

event code
  The values that can be reported in this field for this event log are shown in Table C-28.

instance code
  See Section C.2.1 for the description of this field. The values that can be reported in this field for this event log are shown in Table C-28.

templ
  See Section C.2.1 for the description of this field. This field contains the value 51 for this event log.

tdisize
  See Section C.2.1 for the description of this field. This field contains the value 3C for this event log.

reserved (offset 32)
  This field contains the value 0.

event time
  See Section C.2.1 for the description of this field.

ancillary information
  The format of this field varies depending on whether the event being reported is associated with a command issued by a host system or one issued by an HSJ30/40 controller firmware component. If the event is associated with a command issued by a host system, this field is formatted as described in Section C.2.2.2. If the event is associated with a command issued by an HSJ30/40 controller firmware component, this field is formatted as described in Section C.2.2.3.

device locator
devtype
device identification
device serial number
  See Section C.2.2.4 for the description of these fields.
cmdopcd
infoq
ercdval
segment
snsflgs
info
addsnsl
cmdspec
asc
ascq
frucode
keyspec
  See Section C.2.2.5 for the description of these fields.

Note that the content of certain of the fields described previously may be undefined depending on the value supplied in the ``instance code'' field. See Table C-28 for more detail.

C.2.3.12 Disk Bad Block Replacement Attempt Event Log (Template 57)

The HSJ30/40 controller Value Added firmware component reports disk unit bad block replacement attempt results via the Disk Bad Block Replacement Attempt Event Log.

If the replacement is associated with a command issued by a host system, the Disk Bad Block Replacement Attempt Event Log will be sent to the host system that issued the command on the same connection upon which the command was received if ``This Host'' error logging is enabled on that connection, and to all host systems that have enabled ``Other Host'' error logging on a connection or connections established with the HSJ30/40 controller's Disk and/or Tape MSCP Server.

If the replacement is associated with a command issued by an HSJ30/40 controller firmware component, the Disk Bad Block Replacement Attempt Event Log will be sent to all host systems that have enabled ``Miscellaneous'' error logging on a connection established with the HSJ30/40 controller's Disk MSCP Server.

The Disk Bad Block Replacement Attempt Event Log is reported via the MSCP Bad Block Replacement Attempt error log message format. The format of this event log, including the HSJ30/40 controller specific fields, is shown in Figure C-27.

Disk Bad Block Replacement Attempt Event Log Format Specific Fields:

format
  This field contains the value 09 (that is, MSCP Bad Block Replacement Attempt error log format code).

event code
  The values that can be reported in this field for this event log are shown in Table C-29.

reserved (offset 36)
  This field contains the value 0.
instance code
  See Section C.2.1 for the description of this field. The values that can be reported in this field for this event log are shown in Table C-29.

templ
  See Section C.2.1 for the description of this field. This field contains the value 57 for this event log.

tdisize
  See Section C.2.1 for the description of this field. This field contains the value 1C for this event log.

Figure C-27 Disk Bad Block Replacement Attempt Event Log (Template 57) Format

reserved (offset 3E)
  This field contains the value 0.

event time
  See Section C.2.1 for the description of this field.

device locator
devtype
device identification
device serial number
  See Section C.2.2.4 for the description of these fields.

Note that the content of certain of the fields described previously may be undefined depending on the value supplied in the ``instance code'' field. See Table C-29 for more detail.

C.2.3.13 Tape Transfer Error Event Log (Template 61)

The HSJ30/40 controller Device Services and Value Added Services firmware components report errors detected while performing work related to tape unit transfer operations via the Tape Transfer Error Event Log.

If the error is associated with a command issued by a host system, the Tape Transfer Error Event Log will be sent to the host system that issued the command on the same connection upon which the command was received if ``This Host'' error logging is enabled on that connection, and to all host systems that have enabled ``Other Host'' error logging on a connection or connections established with the HSJ30/40 controller's Disk and/or Tape MSCP Server.

If the error is associated with a command issued by an HSJ30/40 controller firmware component, the Tape Transfer Error Event Log will be sent to all host systems that have enabled ``Miscellaneous'' error logging on a connection established with the HSJ30/40 controller's Tape MSCP Server.
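The delivery rules that recur for each event log in this section (``This Host'', ``Other Host'', and ``Miscellaneous'' error logging) can be sketched as a small selection function. This is an illustration only, not the controller firmware logic. The Connection class, its attribute names, and the recipients function are hypothetical. The sketch also assumes that ``Other Host'' logging applies only to connections other than the one on which the command arrived, which is one reading of the text above, and it ignores which MSCP Server (Disk or Tape) the connection is established with.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Connection:
    """Hypothetical model of one host connection and its error log enables."""
    host: str
    this_host: bool = False      # ``This Host'' error logging enabled
    other_host: bool = False     # ``Other Host'' error logging enabled
    miscellaneous: bool = False  # ``Miscellaneous'' error logging enabled


def recipients(connections: List[Connection],
               issuing: Optional[Connection] = None) -> List[Connection]:
    """Select the connections that would receive an event log.

    issuing is the connection on which a host command arrived, or None when
    the command was issued by a controller firmware component.
    """
    if issuing is None:
        # Firmware-issued command: only ``Miscellaneous'' logging applies.
        return [c for c in connections if c.miscellaneous]

    selected = []
    for c in connections:
        if c is issuing:
            # Issuing host: reported only if ``This Host'' is enabled.
            if c.this_host:
                selected.append(c)
        elif c.other_host:
            # Assumption: ``Other Host'' covers every other connection.
            selected.append(c)
    return selected


if __name__ == "__main__":
    a = Connection("HOSTA", this_host=True)
    b = Connection("HOSTB", other_host=True)
    c = Connection("HOSTC", miscellaneous=True)
    print([x.host for x in recipients([a, b, c], issuing=a)])
    print([x.host for x in recipients([a, b, c])])
```

In the usage shown, a command from HOSTA selects HOSTA and HOSTB, while a firmware-issued command selects only HOSTC.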
The Tape Transfer Error Event Log is reported via the TMSCP Tape Errors error log message format. The format of this event log, including the HSJ30/40 controller specific fields, is shown in Figure C-28.

Figure C-28 Tape Transfer Error Event Log (Template 61) Format

Tape Transfer Error Event Log Format Specific Fields:

format
  This field contains the value 05 (that is, TMSCP Tape Errors error log format code).

event code
  The values that can be reported in this field for this event log are shown in Table C-30.

instance code
  See Section C.2.1 for the description of this field. The values that can be reported in this field for this event log are shown in Table C-30.

templ
  See Section C.2.1 for the description of this field. This field contains the value 61 for this event log.

tdisize
  See Section C.2.1 for the description of this field. This field contains the value 3C for this event log.

reserved (offset 32)
  This field contains the value 0.

event time
  See Section C.2.1 for the description of this field.

ancillary information
  The format of this field varies depending on whether the event being reported is associated with a command issued by a host system or one issued by an HSJ30/40 controller firmware component. If the event is associated with a command issued by a host system, this field is formatted as described in Section C.2.2.2. If the event is associated with a command issued by an HSJ30/40 controller firmware component, this field is considered ``reserved'' and contains the value 0.

device locator
devtype
device identification
device serial number
  See Section C.2.2.4 for the description of these fields.

cmdopcd
infoq
ercdval
segment
snsflgs
info
addsnsl
cmdspec
asc
ascq
frucode
keyspec
  See Section C.2.2.5 for the description of these fields.

Note that the content of certain of the fields described previously may be undefined depending on the value supplied in the ``instance code'' field.
See Table C-30 for more detail.

C.2.3.14 Media Loader Error Event Log (Template 71)

The HSJ30/40 controller Device Services firmware component reports errors detected while performing work related to media loader operations via the Media Loader Error Event Log.

If the error is associated with a command issued by a host system, the Media Loader Error Event Log will be sent to the host system that issued the command on the same connection upon which the command was received if ``This Host'' error logging is enabled on that connection, and to all host systems that have enabled ``Other Host'' error logging on a connection or connections established with the HSJ30/40 controller's Disk and/or Tape MSCP Server.

If the error is associated with a command issued by an HSJ30/40 controller firmware component, the Media Loader Error Event Log will be sent to all host systems that have enabled ``Miscellaneous'' error logging on a connection established with the HSJ30/40 controller's Tape MSCP Server.

The Media Loader Error Event Log is reported via the T/MSCP Media Loader Errors error log message format. The format of this event log, including the HSJ30/40 controller specific fields, is shown in Figure C-29.

Figure C-29 Media Loader Error Event Log (Template 71) Format

Media Loader Error Event Log Format Specific Fields:

format
  This field contains the value 0A (that is, T/MSCP Media Loader Errors error log format code).

event code
  The values that can be reported in this field for this event log are shown in Table C-31.

instance code
  See Section C.2.1 for the description of this field. The values that can be reported in this field for this event log are shown in Table C-31.

templ
  See Section C.2.1 for the description of this field. This field contains the value 71 for this event log.

tdisize
  See Section C.2.1 for the description of this field. This field contains the value 3C for this event log.
reserved (offset 36)
  This field contains the value 0.

event time
  See Section C.2.1 for the description of this field.

ancillary information
  The format of this field varies depending on whether the event being reported is associated with a command issued by a host system or one issued by an HSJ30/40 controller firmware component. If the event is associated with a command issued by a host system, this field is formatted as described in Section C.2.2.2. If the event is associated with a command issued by an HSJ30/40 controller firmware component, this field is considered ``reserved'' and contains the value 0.

device locator
devtype
device identification
device serial number
  See Section C.2.2.4 for the description of these fields.

cmdopcd
infoq
ercdval
segment
snsflgs
info
addsnsl
cmdspec
asc
ascq
frucode
keyspec
  See Section C.2.2.5 for the description of these fields.

C.2.3.15 Disk Copy Data Correlation Event Log

The HSJ30/40 controller Disk MSCP Server firmware component reports errors detected while performing Disk Copy Data commands via the Disk Copy Data Correlation Event Log. The format of the Disk Copy Data Correlation Event Log is identical to the format of the MSCP Disk Copy Data Correlation error log message. The HSJ30/40 controller generates Disk Copy Data Correlation Event Logs in accordance with MSCP protocol.

If a Controller Error (subcode ``Local Connection Request Failed, Insufficient Resources to Request Local Connection'') or a Controller Error (subcode ``Remote Connection Request Failed, Insufficient Resources to Request Remote Connection'') condition is detected, the HSJ30/40 controller will store one of the values shown in Table C-32 in the first longword of the ``event dependent information'' field of the MSCP Disk Copy Data Correlation error log message to identify the resource that is lacking.
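The fixed ``format'' and ``tdisize'' values stated for each template in Sections C.2.3.9 through C.2.3.14 can be collected into one lookup for checking a decoded event log. The following is a sketch under stated assumptions: the values are transcribed from those sections and kept as the strings printed in this appendix, so no radix is assumed for the template numbers. The TemplateInfo type and the check_event_log function are hypothetical names introduced for illustration.

```python
from typing import Dict, List, NamedTuple


class TemplateInfo(NamedTuple):
    name: str
    format_code: str  # value of the ``format'' field, as printed
    tdisize: str      # value of the ``tdisize'' field, as printed


# Values transcribed from Sections C.2.3.9 through C.2.3.14.
TEMPLATES: Dict[str, TemplateInfo] = {
    "33": TemplateInfo("CI System Communication Services Event Log", "00", "2C"),
    "41": TemplateInfo("Device Services Nontransfer Error Event Log", "00", "04"),
    "51": TemplateInfo("Disk Transfer Error Event Log", "02", "3C"),
    "57": TemplateInfo("Disk Bad Block Replacement Attempt Event Log", "09", "1C"),
    "61": TemplateInfo("Tape Transfer Error Event Log", "05", "3C"),
    "71": TemplateInfo("Media Loader Error Event Log", "0A", "3C"),
}


def check_event_log(templ: str, format_code: str, tdisize: str) -> List[str]:
    """Return a list of mismatches between a decoded log and the table above."""
    info = TEMPLATES.get(templ)
    if info is None:
        return [f"template {templ} is not covered by this sketch"]
    problems = []
    if format_code.upper() != info.format_code:
        problems.append(
            f"format is {format_code}, expected {info.format_code}")
    if tdisize.upper() != info.tdisize:
        problems.append(f"tdisize is {tdisize}, expected {info.tdisize}")
    return problems


if __name__ == "__main__":
    print(check_event_log("51", "02", "3C"))
    print(check_event_log("57", "09", "3C"))
```

An empty list means the decoded ``format'' and ``tdisize'' values agree with the values this appendix states for that template.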
C.3 Event Log Codes

Tables C-2 through C-49 list specific codes contained within the event log information.

Table C-2 Firmware Component Identifier Codes
------------------------------------------------------------
Code  Description
------------------------------------------------------------
01    Executive Services
02    Value Added Services
03    Device Services
04    Fault Manager
06    Dual Universal Asynchronous Receiver/Transmitter Services
07    Failover Control
08    Nonvolatile Parameter Memory Failover Control
20    Command Line Interpreter
40    Host Interconnect Services
42    Host Interconnect Port Services
60    Disk and Tape MSCP Server
61    Diagnostics and Utilities Protocol Server
62    System Communication Services Directory Service
80    Disk Inline Exerciser (DILX)
81    Tape Inline Exerciser (TILX)
82    Subsystem Built-In Self-Tests (BIST)
83    Automatic Device Configuration Program (CONFIG)
------------------------------------------------------------

Table C-3 Host Interconnect Services Status Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
00000000  Request succeeded.
00000001  The remote sent a message over a connection that has
          been invalidated.
00000002  The remote sent a message for which no receive credit
          is available.
00000003  Received a message from the remote while in an invalid
          or illegal connection state.
00000004  Pending work exists but connection state is invalid or
          illegal.
00000009  Request failed, no additional information available.
00000032  A PPD message was received from the remote but the
          Virtual Circuit is in an invalid or illegal state.
00000033  A PPD START was received from the remote but the
          Virtual Circuit state indicates that the Virtual
          Circuit is already OPEN.
00000034  A PPD NODE_STOP was received from the remote.
00000035  The ``PPD START send without receiving a PPD START in
          response'' limit has been reached; the remote node is
          acknowledging the packets but not responding to them.
00000036  The ``PPD STACK send without receiving a PPD ACK in
          response'' limit has been reached; the remote node is
          acknowledging the packets but not responding to them.
00000064  The ``CI IDREQ send without receiving a CI ID in
          response'' limit has been reached on both Path A and
          Path B; the remote node is acknowledging the packets
          but not responding to them.
00000065  A CI ID or CI CNF packet (transmitted by the thread on
          behalf of Host Interconnect Services) could not be
          successfully transmitted.
00010009  VC closed due to CI ID request failure.
00020009  VC closed due to unexpected SCS state.
00030009  VC closed due to CI START failure.
00040009  VC closed due to CI STACK failure.
00050009  VC closed due to PPD ACK failure.
00060009  VC closed due to PPD NODE_STOP or PPD START message
          received.
00070009  VC closed due to NAK ADP retry CI ID transmit failure.
00080009  VC closed due to NAK ADP retry transmit failure.
00090009  VC closed due to NOR DDL retry transmit failure on
          Path A.
000A0009  VC closed due to NOR DDL retry transmit failure on
          Path B.
000B0009  VC closed due to NOR ADP retry CI ID transmit failure.
000C0009  VC closed due to NOR ADP retry transmit failure.
000D0009  VC closed due to NAK DDL retry transmit failure on
          Path A.
000E0009  VC closed due to NAK DDL retry transmit failure on
          Path B.
000F0009  VC closed due to arbitration timeout on Path A.
00100009  VC closed due to arbitration timeout on Path B.
00110009  VC closed due to Path A off.
00120009  VC closed due to Path B off.
00130009  VC closed due to dual receive.
00140009  VC closed due to invalid receive data structure state.
00150009  VC closed due to no path.
00160009  VC closed due to message transmit closed.
00170009  VC closed due to data transmit closed.
00180009  VC closed due to message scan.
00190009  VC closed due to data scan.
001A0009  VC closed due to data timeout.
001B0009  VC closed due to unrecognized packet.
001C0009  VC closed due to data transmit failure.
001D0009  VC closed due to CI ID complete failure.
001E0009  VC closed due to lost command.
001F0009  Not implemented in CI environment.
------------------------------------------------------------

Table C-4 CI Message Operation Codes
------------------------------------------------------------
Code  Description
------------------------------------------------------------
00    Reserved
01    DG
02    MSG
03    CNF
04    MCNF
05    IDREQ
06    RST
07    STRT
08    DATREQ0
09    DATREQ1
0A    DATREQ2
0B    ID
0C    PSREQ
0D    LB
0E    MDATREQ
0F    RETPS
10    SNTDAT
11    RETDAT
12    SNTMDAT
13    RETMDAT
------------------------------------------------------------

Table C-5 CI Virtual Circuit State Codes
------------------------------------------------------------
Code  Description
------------------------------------------------------------
0001  VC_CLOSED
0002  START_SENT
0003  START_REC
0004  VC_OPEN
0005  VC_CLOSING
------------------------------------------------------------

Table C-6 Port/Port Driver Message Operation Codes
------------------------------------------------------------
Code  Description
------------------------------------------------------------
0000  START
0001  STACK
0002  ACK
0003  SCS_DG
0004  SCS_MSG
0005  ERROR_LOG
0006  NODE_STOP
------------------------------------------------------------

Table C-7 System Communication Services Message Operation Codes
------------------------------------------------------------
Code  Description
------------------------------------------------------------
0000  CONNECT_REQ
0001  CONNECT_RSP
0002  ACCEPT_REQ
0003  ACCEPT_RSP
0004  REJECT_REQ
0005  REJECT_RSP
0006  DISCONNECT_REQ
0007  DISCONNECT_RSP
0008  CREDIT_REQ
0009  CREDIT_RSP
000A  APPL_MSG
000B  APPL_DG
------------------------------------------------------------

Table C-8 CI Connection State Codes
------------------------------------------------------------
Code  Description
------------------------------------------------------------
0000  CLOSED
0001  LISTENING
0002  CONNECT_SENT
0003  CONNECT_ACK
0004  CONNECT_REC
0005  ACCEPT_SENT
0006  REJECT_SENT
0007  OPEN
0008  DISCONNECT_SENT
0009  DISCONNECT_REC
000A  DISCONNECT_ACK
000B  DISCONNECT_MATCH
------------------------------------------------------------

Table C-9 Supported SCSI Device Type Codes
------------------------------------------------------------
Code  Description
------------------------------------------------------------
00    Direct-Access Devices (such as magnetic disk).
01    Sequential-Access Devices (such as magnetic tape).
05    CDROM Devices.
08    Medium Changer Devices (such as jukeboxes).
------------------------------------------------------------

Table C-10 SCSI Command Operation Codes
------------------------------------------------------------
      Supported Device
Code  Types (See Table C-9)  Description
------------------------------------------------------------
00    00, 01, 05, 08         TEST UNIT READY
01    01                     REWIND
01    00, 05, 08             REZERO UNIT
03    00, 01, 05, 08         REQUEST SENSE
04    00                     FORMAT UNIT
05    01                     READ BLOCK LIMITS
07    08                     INITIALIZE ELEMENT STATUS
07    00                     REASSIGN BLOCKS
08    00, 01, 05             READ (6 byte)
0A    00, 01                 WRITE (6 byte)
0B    00, 05                 SEEK (6 byte)
0F    01                     READ REVERSE
10    01                     WRITE FILEMARKS
11    01                     SPACE
12    00, 01, 05, 08         INQUIRY
13    01                     TAPE VERIFY
14    01                     RECOVER BUFFERED DATA
15    00, 01, 05, 08         MODE SELECT (6 byte)
16    00, 01, 05, 08         RESERVE UNIT
17    00, 01, 05, 08         RELEASE UNIT
18    00, 01, 05             COPY
19    01                     ERASE
1A    00, 01, 05, 08         MODE SENSE (6 byte)
1B    00, 05                 START STOP UNIT
1B    01                     LOAD UNLOAD
1C    00, 01, 05, 08         RECEIVE DIAGNOSTIC RESULTS
1D    00, 01, 05, 08         SEND DIAGNOSTIC
1E    00, 01, 05, 08         PREVENT-ALLOW MEDIUM REMOVAL
25    00, 05                 READ CAPACITY
28    00, 05                 READ (10 byte)
2A    00                     WRITE (10 byte)
2B    08                     POSITION TO ELEMENT
2B    01                     LOCATE
2B    00, 05                 SEEK (10 byte)
2E    00                     WRITE AND VERIFY (10 byte)
2F    00, 05                 VERIFY (10 byte)
30    00, 05                 SEARCH DATA HIGH (10 byte)
31    00, 05                 SEARCH DATA EQUAL (10 byte)
32    00, 05                 SEARCH DATA LOW (10 byte)
33    00, 05                 SET LIMITS (10 byte)
34    01                     READ POSITION
34    00, 05                 PRE-FETCH
35    00, 05                 SYNCHRONIZE CACHE
36    00, 05                 LOCK-UNLOCK CACHE
37    00                     READ DEFECT DATA (10 byte)
39    00, 01, 05             COMPARE
3A    00, 01, 05             COPY AND VERIFY
3B    00, 01, 05, 08         WRITE BUFFER
3C    00, 01, 05, 08         READ BUFFER
3E    00, 05                 READ LONG
3F    00                     WRITE LONG
40    00, 01, 05, 08         CHANGE DEFINITION
41    00                     WRITE SAME
42    05                     READ SUB-CHANNEL
43    05                     READ TOC (table of contents)
44    05                     READ HEADER
45    05                     PLAY AUDIO (10 byte)
47    05                     PLAY AUDIO MSF
48    05                     PLAY AUDIO TRACK/INDEX
49    05                     PLAY TRACK RELATIVE (10 byte)
4B    05                     PAUSE/RESUME
4C    00, 01, 05, 08         LOG SELECT
4D    00, 01, 05, 08         LOG SENSE
55    00, 01, 05, 08         MODE SELECT (10 byte)
5A    00, 01, 05, 08         MODE SENSE (10 byte)
A5    05                     PLAY AUDIO (12 byte)
A5    08                     MOVE MEDIUM
A6    08                     EXCHANGE MEDIUM
A8    05                     READ (12 byte)
A9    05                     PLAY TRACK RELATIVE (12 byte)
AF    05                     VERIFY (12 byte)
B0    05                     SEARCH DATA HIGH (12 byte)
B1    05                     SEARCH DATA EQUAL (12 byte)
B2    05                     SEARCH DATA LOW (12 byte)
B3    05                     SET LIMITS (12 byte)
B5    08                     REQUEST VOLUME ELEMENT ADDRESS
B6    08                     SEND VOLUME TAG
B8    08                     READ ELEMENT STATUS
------------------------------------------------------------

Table C-11 SCSI Buffered Modes Codes
------------------------------------------------------------
Code  Description
------------------------------------------------------------
0     The target shall not report GOOD status on write commands
      until the data blocks are actually written on the medium.
1     The target may report GOOD status on write commands as
      soon as all the data specified in the write command has
      been transferred to the target's buffer. One or more
      blocks may be buffered prior to writing the block(s) to
      the medium.
2     The target may report GOOD status on write commands as
      soon as: (1) All the data specified in the write command
      has been successfully transferred to the target's buffer,
      and (2) All buffered data from different initiators has
      been successfully written to the medium.
3     Reserved for future use.
4     Reserved for future use.
5     Reserved for future use.
6     Reserved for future use.
7     Reserved for future use.
------------------------------------------------------------

Table C-12 SCSI Sense Key Codes
------------------------------------------------------------
Code  Description
------------------------------------------------------------
0     NO SENSE. Indicates that there is no specific sense key
      information to be reported for the designated logical
      unit. This would be the case for a successful command or
      a command that received CHECK CONDITION or COMMAND
      TERMINATED status because one of the FM, EOM, or ILI bits
      is set to one in the ``snsflgs'' field.
1     RECOVERED ERROR. Indicates that the last command completed
      successfully with some recovery action performed by the
      target. Details may be determinable by examining the
      ``info'' field.
2     NOT READY. Indicates that the logical unit addressed
      cannot be accessed. Operator intervention may be required
      to correct this condition.
3     MEDIUM ERROR. Indicates that the command terminated with a
      non-recovered error condition that was probably caused by
      a flaw in the medium or an error in the recorded data.
      This sense key may also be returned if the target is
      unable to distinguish between a flaw in the medium and a
      specific hardware failure (sense key 4).
4     HARDWARE ERROR. Indicates that the target detected a
      non-recoverable hardware failure (for example, controller
      failure, device failure, parity error, and so forth)
      while performing the command or during a self-test.
5     ILLEGAL REQUEST. Indicates that there was an illegal
      parameter in the command descriptor block or in the
      additional parameters supplied as data for some commands
      (FORMAT UNIT, SEARCH DATA, and so forth). If the target
      detects an invalid parameter in the command descriptor
      block, then it shall terminate the command without
      altering the medium. If the target detects an invalid
      parameter in the additional parameters supplied as data,
      then the target may have already altered the medium. This
      sense key may also indicate that an invalid IDENTIFY
      message was received.
6     UNIT ATTENTION. Indicates that the removable medium may
      have been changed or the target has been reset.
7     DATA PROTECT. Indicates that a command that reads or
      writes the medium was attempted on a block that is
      protected from this operation. The read or write
      operation is not performed.
8     BLANK CHECK. Indicates that a write-once device or a
      sequential-access device encountered blank medium or
      format-defined end-of-data indication while reading or a
      write-once device encountered a non-blank medium while
      writing.
9     Vendor Specific. This sense key is available for
      reporting vendor specific conditions.
A     COPY ABORTED. Indicates a COPY, COMPARE, or COPY AND
      VERIFY command was aborted due to an error condition on
      the source device, the destination device, or both.
B     ABORTED COMMAND. Indicates that the target aborted the
      command. The initiator may be able to recover by trying
      the command again.
C     EQUAL. Indicates a SEARCH DATA command has satisfied an
      equal comparison.
D     VOLUME OVERFLOW. Indicates that a buffered peripheral
      device has reached the end-of-partition and data may
      remain in the buffer that has not been written to the
      medium. A RECOVER BUFFERED DATA command(s) may be issued
      to read the unwritten data from the buffer.
E     MISCOMPARE. Indicates that the source data did not match
      the data read from the medium.
F     RESERVED.
------------------------------------------------------------

Table C-13 SCSI ASC/ASCQ Codes For Direct-Access Devices (such as magnetic disk)
------------------------------------------------------------
ASC Code  ASCQ Code  Description
------------------------------------------------------------
00        00         No additional sense information.
00        06         I/O process terminated.
01        00         No index/sector signal.
02        00         No seek complete.
03        00         Peripheral device write fault.
04        00         Logical unit not ready, cause not reportable.
04        01         Logical unit is in process of becoming ready.
04        02         Logical unit not ready, initializing command required.
04        03         Logical unit not ready, manual intervention required.
04        04         Logical unit not ready, format in progress.
06        00         No reference position found.
07        00         Multiple peripheral devices selected.
08        00         Logical unit communication failure.
08        01         Logical unit communication time-out.
08        02         Logical unit communication parity error.
09        00         Track following error.
0A        00         Error log overflow.
0C        01         Write error recovered with auto reallocation.
0C        02         Write error--auto reallocation failed.
10        00         ID CRC or ECC error.
11        00         Unrecovered read error.
11        01         Read retries exhausted.
11        02         Error too long to correct.
11        03         Multiple read errors.
11        04         Unrecovered read error--auto reallocate failed.
11        0A         Miscorrected error.
11        0B         Unrecovered read error--recommend reassignment.
11        0C         Unrecovered read error--recommend rewrite the data.
12        00         Address mark not found for ID field.
13        00         Address mark not found for data field.
14        00         Recorded entity not found.
14        01         Record not found.
15        00         Random positioning error.
15        01         Mechanical positioning error.
15        02         Positioning error detected by read of medium.
16        00         Data synchronization mark error.
17        00         Recovered data with no error correction applied.
17        01         Recovered data with retries.
17        02         Recovered data with positive head offset.
17        03         Recovered data with negative head offset.
17        05         Recovered data using previous sector ID.
17        06         Recovered data without ECC--data auto-reallocated.
17        07         Recovered data without ECC--recommend reassignment.
17        08         Recovered data without ECC--recommend rewrite.
18        00         Recovered data with error correction applied.
18        01         Recovered data with error correction and retries applied.
18        02         Recovered data--data auto-reallocated.
18        05         Recovered data--recommend reassignment.
18        06         Recovered data--recommend rewrite.
19        00         Defect list error.
19        01         Defect list not available.
19        02         Defect list error in primary list.
19        03         Defect list error in grown list.
1A        00         Parameter list length error.
1B        00         Synchronous data transfer error.
1C        00         Defect list not found.
1C        01         Primary defect list not found.
1C        02         Grown defect list not found.
1D        00         Miscompare during verify operation.
1E        00         Recovered ID with ECC correction.
20        00         Invalid command operation code.
21        00         Logical block address out of range.
22        00         Illegal function (should use 0020, 0024, or 0026)
24        00         Invalid field in CDB.
25        00         Logical unit not supported.
26        00         Invalid field in parameter list.
26        01         Parameter not supported.
26        02         Parameter value invalid.
26        03         Threshold parameters not supported.
27        00         Write protected.
28        00         Not ready to ready transition, medium may have changed.
29        00         Power on, reset, or bus device reset occurred.
29        01         Power on occurred.
29        02         SCSI bus reset occurred.
29        03         Bus device reset occurred.
2A        00         Parameters changed.
2A        01         Mode parameters changed.
2A        02         Log parameters changed.
2B        00         Copy cannot execute because host cannot disconnect.
2C        00         Command sequence error.
2F        00         Commands cleared by another initiator.
30        00         Incompatible medium installed.
30        01         Cannot read medium--unknown format.
30        02         Cannot read medium--incompatible format.
30        03         Cleaning cartridge installed.
31        00         Medium format corrupted.
31        01         Format command failed.
32        00         No defect spare location available.
32        01         Defect list update failure.
37        00         Rounded parameter.
39        00         Saving parameters not supported.
3A        00         Medium not present.
3D        00         Invalid bits in identify message.
3E        00         Logical unit has not self-configured yet.
3F        00         Target operating conditions have changed.
3F        01         Microcode has been changed.
3F        02         Changed operating definition.
3F        03         Inquiry data has changed.
40        00         RAM failure (should use 8040 through FF40).
41        00         Data path failure (should use 8040 through FF40).
42        00         Power-on or self-test failure (should use 8040 through FF40).
43        00         Message error.
44        00         Internal target failure.
45        00         Select or reselect failure.
46        00         Unsuccessful soft reset.
47        00         SCSI parity error.
48        00         Initiator detected error message received.
49        00         Invalid message error.
4A        00         Command phase error.
4B        00         Data phase error.
4C        00         Logical unit failed self-configuration.
4E        00         Overlapped commands attempted.
53        00         Media load or eject failed.
53        02         Medium removal prevented.
5A        00         Operator request or state change input (unspecified).
5A        01         Operator medium removal request.
5A        02         Operator selected write protect.
5A        03         Operator selected write permit.
5B        00         Log exception.
5B        01         Threshold condition met.
5B        02         Log counter at maximum.
5B        03         Log list codes exhausted.
5C        00         RPL status change.
5C        01         Spindles synchronized.
5C        02         Spindles not synchronized.
40        nn         Diagnostic failure detected on component nn; where nn identifies a specific target device component (nn range 80 through FF). Refer to documentation provided by the vendor of the target device for a description of the component identified by nn.
------------------------------------------------------------

Table C-14 SCSI ASC/ASCQ Codes For Sequential-Access Devices (such as magnetic tape)
------------------------------------------------------------
ASC Code  ASCQ Code  Description
------------------------------------------------------------
00        00         No additional sense information.
00        01         Filemark detected.
00        02         End-of-partition/medium detected.
00        03         Setmark detected.
00        04         Beginning-of-partition/medium detected.
00        05         End-of-data detected.
00        06         I/O process terminated.
03        00         Peripheral device write fault.
03        01         No write current.
03        02         Excessive write errors.
04        00         Logical unit not ready, cause not reportable.
04        01         Logical unit is in process of becoming ready.
04        02         Logical unit not ready, initializing command required.
04        03         Logical unit not ready, manual intervention required.
04        04         Logical unit not ready, format in progress.
07        00         Multiple peripheral devices selected.
08        00         Logical unit communication failure.
08        01         Logical unit communication time-out.
08        02         Logical unit communication parity error.
09        00         Track following error.
0A        00         Error log overflow.
0C        00         Write error.
11        00         Unrecovered read error.
11        01         Read retries exhausted.
11        02         Error too long to correct.
11        03         Multiple read errors.
11        08         Incomplete block read.
11        09         No gap found.
11        0A         Miscorrected error.
14        00         Recorded entity not found.
14        01         Record not found.
14        02         Filemark or setmark not found.
14        03         End-of-data not found.
14        04         Block sequence error.
15        00         Random positioning error.
15        01         Mechanical positioning error.
15        02         Positioning error detected by read of medium.
17        00         Recovered data with no error correction applied.
17        01         Recovered data with retries.
17        02         Recovered data with positive head offset.
17        03         Recovered data with negative head offset.
18        00         Recovered data with error correction applied.
1A        00         Parameter list length error.
1B        00         Synchronous data transfer error.
20        00         Invalid command operation code.
21        00         Logical block address out of range.
24        00         Invalid field in CDB.
25        00         Logical unit not supported.
26        00         Invalid field in parameter list.
26        01         Parameter not supported.
26        02         Parameter value invalid.
26        03         Threshold parameters not supported.
27        00         Write protected.
28        00         Not ready to ready transition, medium may have changed.
29        00         Power on, reset, or bus device reset occurred.
29        01         Power on occurred.
29        02         SCSI bus reset occurred.
29        03         Bus device reset occurred.
2A        00         Parameters changed.
2A        01         Mode parameters changed.
2A        02         Log parameters changed.
2B        00         Copy cannot execute because host cannot disconnect.
2C        00         Command sequence error.
2D        00         Overwrite error on update in place.
2F        00         Commands cleared by another initiator.
30        00         Incompatible medium installed.
30        01         Cannot read medium--unknown format.
30        02         Cannot read medium--incompatible format.
30        03         Cleaning cartridge installed.
31        00         Medium format corrupted.
33        00         Tape length error.
37        00         Rounded parameter.
39        00         Saving parameters not supported.
3A        00         Medium not present.
3B        00         Sequential positioning error.
3B        01         Tape position error at beginning-of-medium.
3B        02         Tape position error at end-of-medium.
3B        08         Reposition error.
3D        00         Invalid bits in identify message.
3E        00         Logical unit has not self-configured yet.
3F        00         Target operating conditions have changed.
3F        01         Microcode has been changed.
3F        02         Changed operating definition.
3F        03         Inquiry data has changed.
43        00         Message error.
44        00         Internal target failure.
45        00         Select or reselect failure.
46        00         Unsuccessful soft reset.
47        00         SCSI parity error.
48        00         Initiator detected error message received.
49        00         Invalid message error.
4A        00         Command phase error.
4B        00         Data phase error.
4C        00         Logical unit failed self-configuration.
4E        00         Overlapped commands attempted.
50        00         Write append error.
50        01         Write append position error.
50        02         Position error related to timing.
51        00         Erase failure.
52        00         Cartridge fault.
53        00         Media load or eject failed.
53        01         Unload tape failure.
53        02         Medium removal prevented.
5A        00         Operator request or state change input (unspecified).
5A        01         Operator medium removal request.
5A        02         Operator selected write protect.
5A        03         Operator selected write permit.
5B        00         Log exception.
5B        01         Threshold condition met.
5B        02         Log counter at maximum.
5B        03         Log list codes exhausted.
40        nn         Diagnostic failure detected on component nn; where nn identifies a specific target device component (nn range 80 through FF). Refer to documentation provided by the vendor of the target device for a description of the component identified by nn.
------------------------------------------------------------

Table C-15 SCSI ASC/ASCQ Codes For CDROM Devices.
------------------------------------------------------------
ASC Code  ASCQ Code  Description
------------------------------------------------------------
00        00         No additional sense information.
00        06         I/O process terminated.
00        11         Audio play operation in progress.
00        12         Audio play operation paused.
00        13         Audio play operation successfully completed.
00        14         Audio play operation stopped due to error.
00        15         No current audio status to return.
02        00         No seek complete.
04        00         Logical unit not ready, cause not reportable.
04        01         Logical unit is in process of becoming ready.
04        02         Logical unit not ready, initializing command required.
04        03         Logical unit not ready, manual intervention required.
06        00         No reference position found.
07        00         Multiple peripheral devices selected.
08        00         Logical unit communication failure.
08        01         Logical unit communication time-out.
08        02         Logical unit communication parity error.
09        00         Track following error.
09        01         Tracking servo failure.
09        02         Focus servo failure.
09        03         Spindle servo failure.
0A        00         Error log overflow.
11        00         Unrecovered read error.
11        05         L-EC uncorrectable error.
11        06         CIRC unrecovered error.
14        00         Recorded entity not found.
14        01         Record not found.
15        00         Random positioning error.
15        01         Mechanical positioning error.
15        02         Positioning error detected by read of medium.
17        00         Recovered data with no error correction applied.
17        01         Recovered data with retries.
17        02         Recovered data with positive head offset.
17        03         Recovered data with negative head offset.
17        04         Recovered data with retries and/or CIRC applied.
17        05         Recovered data using previous sector ID.
18        00         Recovered data with error correction applied.
18        01         Recovered data with error correction and retries applied.
18        02         Recovered data--data auto-reallocated.
18        03         Recovered data with CIRC.
18        04         Recovered data with LEC.
18        05         Recovered data--recommend reassignment.
18        06         Recovered data--recommend rewrite.
1A        00         Parameter list length error.
1B        00         Synchronous data transfer error.
20        00         Invalid command operation code.
21        00         Logical block address out of range.
24        00         Invalid field in CDB.
25        00         Logical unit not supported.
26        00         Invalid field in parameter list.
26        01         Parameter not supported.
26        02         Parameter value invalid.
26        03         Threshold parameters not supported.
28        00         Not ready to ready transition, medium may have changed.
29        00         Power on, reset, or bus device reset occurred.
29        01         Power on occurred.
29        02         SCSI bus reset occurred.
29        03         Bus device reset occurred.
2A        00         Parameters changed.
2A        01         Mode parameters changed.
2A        02         Log parameters changed.
2B        00         Copy cannot execute because host cannot disconnect.
2C        00         Command sequence error.
2F        00         Commands cleared by another initiator.
30        00         Incompatible medium installed.
30        01         Cannot read medium--unknown format.
30        02         Cannot read medium--incompatible format.
37        00         Rounded parameter.
39        00         Saving parameters not supported.
3A        00         Medium not present.
3D        00         Invalid bits in identify message.
3E        00         Logical unit has not self-configured yet.
3F        00         Target operating conditions have changed.
3F        01         Microcode has been changed.
3F        02         Changed operating definition.
3F        03         Inquiry data has changed.
43        00         Message error.
44        00         Internal target failure.
45        00         Select or reselect failure.
46        00         Unsuccessful soft reset.
47        00         SCSI parity error.
48        00         Initiator detected error message received.
49        00         Invalid message error.
4A        00         Command phase error.
4B        00         Data phase error.
4C        00         Logical unit failed self-configuration.
4E        00         Overlapped commands attempted.
53        00         Media load or eject failed.
53        02         Medium removal prevented.
57        00         Unable to recover table-of-contents.
5A        00         Operator request or state change input (unspecified).
5A        01         Operator medium removal request.
5B        00         Log exception.
5B        01         Threshold condition met.
5B        02         Log counter at maximum.
5B        03         Log list codes exhausted.
63        00         End of user area encountered on this track.
64        00         Illegal mode for this track.
40        nn         Diagnostic failure detected on component nn; where nn identifies a specific target device component (nn range 80 through FF). Refer to documentation provided by the vendor of the target device for a description of the component identified by nn.
------------------------------------------------------------

Table C-16 SCSI ASC/ASCQ Codes For Medium Changer Devices (such as jukeboxes)
------------------------------------------------------------
ASC Code  ASCQ Code  Description
------------------------------------------------------------
00        00         No additional sense information.
00        06         I/O process terminated.
02        00         No seek complete.
04        00         Logical unit not ready, cause not reportable.
04        01         Logical unit is in process of becoming ready.
04        02         Logical unit not ready, initializing command required.
04        03         Logical unit not ready, manual intervention required.
06        00         No reference position found.
07        00         Multiple peripheral devices selected.
08        00         Logical unit communication failure.
08        01         Logical unit communication time-out.
08        02         Logical unit communication parity error.
0A        00         Error log overflow.
15        00         Random positioning error.
15        01         Mechanical positioning error.
1A        00         Parameter list length error.
1B        00         Synchronous data transfer error.
20        00         Invalid command operation code.
21        00         Logical block address out of range.
21        01         Invalid element address.
24        00         Invalid field in CDB.
25        00         Logical unit not supported.
26        00         Invalid field in parameter list.
26        01         Parameter not supported.
26        02         Parameter value invalid.
26        03         Threshold parameters not supported.
28        00         Not ready to ready transition, medium may have changed.
28        01         Import or export element accessed.
29        00         Power on, reset, or bus device reset occurred.
29        01         Power on occurred.
29        02         SCSI bus reset occurred.
29        03         Bus device reset occurred.
2A        00         Parameters changed.
2A        01         Mode parameters changed.
2A        02         Log parameters changed.
2C        00         Command sequence error.
2F        00         Commands cleared by another initiator.
30        00         Incompatible medium installed.
37        00         Rounded parameter.
39        00         Saving parameters not supported.
3A        00         Medium not present.
3B        0D         Medium destination element full.
3B        0E         Medium source element empty.
3D        00         Invalid bits in identify message.
3E        00         Logical unit has not self-configured yet.
3F        00         Target operating conditions have changed.
3F        01         Microcode has been changed.
3F        02         Changed operating definition.
3F        03         Inquiry data has changed.
43        00         Message error.
44        00         Internal target failure.
45        00         Select or reselect failure.
46        00         Unsuccessful soft reset.
47        00         SCSI parity error.
48        00         Initiator detected error message received.
49        00         Invalid message error.
4A        00         Command phase error.
4B        00         Data phase error.
4C        00         Logical unit failed self-configuration.
4E        00         Overlapped commands attempted.
53        00         Media load or eject failed.
53        02         Medium removal prevented.
5A        00         Operator request or state change input (unspecified).
5A        01         Operator medium removal request.
5B        00         Log exception.
5B        01         Threshold condition met.
5B        02         Log counter at maximum.
5B        03         Log list codes exhausted.
40        nn         Diagnostic failure detected on component nn; where nn identifies a specific target device component (nn range 80 through FF). Refer to documentation provided by the vendor of the target device for a description of the component identified by nn.
------------------------------------------------------------

Table C-17 HSJ30/40 Controller Vendor Specific SCSI ASC/ASCQ Codes
------------------------------------------------------------
ASC Code  ASCQ Code  Description
------------------------------------------------------------
3F        85         Test Unit Ready or Read Capacity command failed.
3F        87         Drive failed by a Host Mode Select command.
3F        88         Drive failed due to a deferred error reported by drive.
3F        90         Unrecovered Read/Write error.
3F        C0         No response from one or more drives.
3F        C2         NV memory and drive metadata indicate conflicting drive configurations.
3F        D2         Synchronous Transfer Value differences between drives.
80        03         Fault Manager detected an unknown error code.
80        06         Maximum number of errors for this I/O exceeded.
80        07         Drive reported recovered error without transferring all data.
82        01         No command control structures available.
84        04         Command failed--SCSI ID verification failed.
85        05         Data returned from drive is invalid.
89        00         Request Sense command to drive failed.
8A        00         Illegal command for pass through mode.
8C        04         Data transfer request error.
8F        00         Premature completion of a drive command.
93        00         Drive returned vendor unique sense data.
A0        00         Last failure event report.
A0        01         Nonvolatile parameter memory component event report.
A0        02         Backup battery failure event report.
A0        03         Subsystem built-in self-test failure event report.
A0        04         Memory system failure event report.
A0        05         Failover event report.
A1        00         Shelf OK is not properly asserted.
A1        01         Unable to clear SWAP interrupt, interrupt disabled.
A1        02         Swap interrupt reenabled.
A1        03         Asynchronous SWAP detected.
B0        00         Command timeout.
B0        01         Watchdog timer timeout.
D0        01         Disconnect timeout.
D0        02         Chip command timeout.
D0        03         Byte transfer timeout.
D1        00         Bus errors.
D1        02         Unexpected bus phase.
D1        03         Disconnect expected.
D1        04         ID Message not sent.
D1        05         Synchronous negotiation error.
D1        07         Unexpected disconnect.
D1        08         Unexpected message.
D1        09         Unexpected Tag message.
D1        0A         Channel busy.
D1        0B         Device initialization failure, device sense data available.
D2        00         Miscellaneous SCSI driver error.
D3        00         Drive SCSI chip reported gross error.
D4        00         Non-SCSI bus parity error.
D5        02         Message Reject received on a valid message.
D7        00         Source driver programming error.
------------------------------------------------------------

Table C-18 Last Failure Event Log (Template 01) Instance/MSCP Event Codes
------------------------------------------------------------
Instance Code  MSCP Event Code  Description
------------------------------------------------------------
01010302       03EA             EXEC$BUGCHECK called with HW flag set (that is, an unrecoverable hardware-detected fault occurred).
0102030A       040A             EXEC$BUGCHECK called with HW flag clear (that is, an unrecoverable firmware inconsistency was detected).
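
When decoding sense data by script, the ASC/ASCQ tables above (Tables C-13 through C-17) reduce to a keyed lookup plus the one range rule they share: ASC 40 with an ASCQ of 80 through FF reports a diagnostic failure on component nn. The following is a minimal sketch only. The function and dictionary names are hypothetical and are not part of the controller firmware or of any Digital utility, and the dictionary holds only a handful of rows transcribed from Table C-17.

```python
# Hypothetical helper for illustration; not part of any HSJ firmware or tool.
# Only a few rows of Table C-17 are transcribed here.
VENDOR_SPECIFIC = {
    (0x3F, 0x85): "Test Unit Ready or Read Capacity command failed.",
    (0xA0, 0x00): "Last failure event report.",
    (0xA0, 0x05): "Failover event report.",
    (0xB0, 0x00): "Command timeout.",
    (0xD1, 0x07): "Unexpected disconnect.",
}


def describe_asc_ascq(asc, ascq):
    """Return a description for an ASC/ASCQ pair given as integers."""
    # Shared last row of Tables C-13 through C-16: ASC 40, ASCQ nn,
    # where nn ranges from 80 through FF.
    if asc == 0x40 and 0x80 <= ascq <= 0xFF:
        return "Diagnostic failure detected on component %02X." % ascq
    # Anything else falls back to the (partial) lookup table.
    return VENDOR_SPECIFIC.get(
        (asc, ascq), "No entry in this sketch; see Tables C-13 through C-17."
    )


if __name__ == "__main__":
    print(describe_asc_ascq(0x40, 0x85))
    print(describe_asc_ascq(0xD1, 0x07))
```

The range test is placed ahead of the lookup so that a component number is never mistaken for a fixed table entry.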
------------------------------------------------------------

Table C-19 Failover Event Log (Template 05) Instance/MSCP Event Codes
------------------------------------------------------------
Instance Code  MSCP Event Code  Description
------------------------------------------------------------
07030B0A       022A             Failover Control detected a receive packet sequence number mismatch. The HSJ30/40s are out of synchronization with each other and are unable to communicate. Note that in this instance the "last failure code" and "last failure parameters" fields are undefined.
07040B0A       022A             Failover Control detected a transmit packet sequence number mismatch. The HSJ30/40s are out of synchronization with each other and are unable to communicate. Note that in this instance the "last failure code" and "last failure parameters" fields are undefined.
07050064       022A             Failover Control received a Last Gasp message from the other HSJ30/40. The other HSJ30/40 is expected to restart itself within a given time period. If it does not, it will be held reset with the "Kill" line.
07060C01       022A             Failover Control detected that both HSJ30/40s are acting as SCSI ID 6. Because IDs are determined by hardware, it is unknown which HSJ30/40 is the real SCSI ID 6. Note that in this instance the "last failure code" and "last failure parameters" fields are undefined.
07070C01       022A             Failover Control detected that both HSJ30/40s are acting as SCSI ID 7. Because IDs are determined by hardware, it is unknown which HSJ30/40 is the real SCSI ID 7. Note that in this instance the "last failure code" and "last failure parameters" fields are undefined.
07080B0A       022A             Failover Control was unable to send keepalive communication to the other HSJ30/40. It is assumed that the other HSJ30/40 is hung or not started. Note that in this instance the "last failure code" and "last failure parameters" fields are undefined.
------------------------------------------------------------

Table C-20 Nonvolatile Parameter Memory Component Event Log (Template 11) Instance/MSCP Event Codes
------------------------------------------------------------
Instance Code  MSCP Event Code  Description
------------------------------------------------------------
01032002       012A             Nonvolatile parameter memory component EDC check failed; content of the component reset to default settings.
------------------------------------------------------------

Table C-21 Backup Battery Failure Event Log (Template 12) Instance/MSCP Event Codes
------------------------------------------------------------
Instance Code  MSCP Event Code  Description
------------------------------------------------------------
02032001       012A             Journal SRAM backup battery failure; detected during system restart. The "memory address" field contains the starting physical address of the Journal SRAM.
02042001       012A             Journal SRAM backup battery failure; detected during periodic check. The "memory address" field contains the starting physical address of the Journal SRAM.
------------------------------------------------------------

Table C-22 Subsystem Built-In Self-Test Failure Event Log (Template 13) Instance/MSCP Event Codes
------------------------------------------------------------
Instance Code  MSCP Event Code  Description
------------------------------------------------------------
82012002       020A             An unrecoverable error was detected during execution of the NCR710 Subsystem Built-In Self-Test. One of the ports on the controller module has failed; some/all of the attached storage is no longer accessible via this controller.
82022202       020A             An unrecoverable error was detected during execution of the Cache Memory/DRAB Chip Subsystem Built-In Self-Test that rendered half of the cache memory unusable.
82032202       020A             An unrecoverable error was detected during execution of the Cache Memory/DRAB Chip Subsystem Built-In Self-Test that rendered the entire cache memory unusable.
82042002       020A             A spurious interrupt was detected during the execution of a Subsystem Built-In Self-Test.
82052002       020A             An unrecoverable error was detected during execution of the HOST PORT Subsystem Test. The system will not be able to communicate with the host.
82062002       020A             An unrecoverable error was detected during execution of the UART/DUART Subsystem Test. This will cause the console to be unusable. This will cause failover communications to fail.
82072002       020A             An unrecoverable error was detected during execution of the FX Subsystem Test.
82082002       020A             An unrecoverable error was detected during execution of the nbuss init Test.
------------------------------------------------------------

Table C-23 Memory System Failure Event Log (Template 14) Instance/MSCP Event Codes
------------------------------------------------------------
Instance Code  MSCP Event Code  Description
------------------------------------------------------------
02072201       012A             The CACHE Dynamic RAM Controller and Arbitration engine 0 (DRAB0) failed testing performed by the Cache Diagnostics. The "memory address" field contains the starting physical address of the CACHEA0 memory.
02082201       012A             The CACHE Dynamic RAM Controller and Arbitration engine 1 (DRAB1) failed testing performed by the Cache Diagnostics. The "memory address" field contains the starting physical address of the CACHEA1 memory.
020C2201       012A             Cache Diagnostics have declared the cache bad during testing. The "memory address" field contains the starting physical address of the CACHEA0 memory.
------------------------------------------------------------

Table C-24 CI Port Event Log (Template 31) Instance/MSCP Event Codes
------------------------------------------------------------
Instance Code  MSCP Event Code  Description
------------------------------------------------------------
40016001       006A             CI A/B transmit cables are crossed.
40026001       006A             CI A/B receive cables are crossed.
4009640A       006A             CI Port detected bad Path A upon attempting to transmit a packet.
400A640A       006A             CI Port detected bad Path B upon attempting to transmit a packet.
400B640A       006A             CI Port detected bad Path A upon attempting to transmit a packet.
400C640A       006A             CI Port detected bad Path B upon attempting to transmit a packet.
400D640A       006A             CI Port detected bad Path A upon attempting to transmit a packet.
400E640A       006A             CI Port detected bad Path B upon attempting to transmit a packet.
------------------------------------------------------------

Table C-25 CI Port/Port Driver Event Log (Template 32) Instance/MSCP Event Codes
------------------------------------------------------------
Instance Code  MSCP Event Code  Description
------------------------------------------------------------
4003640A       006A             CI Port detected a Dual Receive condition that resulted in the closure of the Virtual Circuit. This error condition will be eliminated in a future CI interface chip.
4004020A       006A             Host Interconnect Services detected protocol error upon validating a received packet.
4007640A       006A             CI Port detected error upon attempting to transmit a packet. This resulted in the closure of the Virtual Circuit.
403D020A       006A             Received packet with an unrecognized PPD opcode. Note that the content of the "vcstate" field is undefined in this instance.
40440064       006A             Received a PPD NODE_STOP and closed virtual circuit.
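
Each row of the event log tables pairs an 8-digit hexadecimal instance code with a 4-digit hexadecimal MSCP event code and a description. Where these tables are kept as text for use by a script, the rows can be loaded as sketched below. This is an illustrative sketch only. It assumes one row per line with whitespace between the fields, which is an assumption about how the text is stored, and the names `load_event_table` and `describe_instance_code` are hypothetical. The sample rows are copied from Tables C-24 and C-25.

```python
import re

# Assumed row layout: instance code, MSCP event code, description.
ROW = re.compile(r"^([0-9A-F]{8})\s+([0-9A-F]{4})\s+(.+)$")

# A few rows copied from Tables C-24 and C-25.
SAMPLE_ROWS = """\
40016001 006A CI A/B transmit cables are crossed.
40026001 006A CI A/B receive cables are crossed.
4004020A 006A Host Interconnect Services detected protocol error upon validating a received packet.
40440064 006A Received a PPD NODE_STOP and closed virtual circuit.
"""


def load_event_table(text):
    """Build {instance code: (MSCP event code, description)} from row text."""
    table = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # rules, titles, and blank lines are skipped
            instance, event, description = match.groups()
            table[int(instance, 16)] = (int(event, 16), description)
    return table


def describe_instance_code(table, instance_code):
    """Format one instance code, or report that it is not loaded."""
    entry = table.get(instance_code)
    if entry is None:
        return "%08X: not in this excerpt" % instance_code
    event, description = entry
    return "%08X (MSCP event %04X): %s" % (instance_code, event, description)


CI_EVENTS = load_event_table(SAMPLE_ROWS)

if __name__ == "__main__":
    print(describe_instance_code(CI_EVENTS, 0x40440064))
```

Keying the table on the integer value rather than the text avoids mismatches caused by letter case or leading zeros in a logged code.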
------------------------------------------------------------

Table C-26 CI System Communication Services Event Log (Template 33) Instance/MSCP Event Codes
------------------------------------------------------------
Instance Code  MSCP Event Code  Description
------------------------------------------------------------
4015020A       006A             Remote SYSAP sent an SCS APPL_MSG but no receive credit was available.
4029010A       006A             Illegal connection state. Not in CONNECT_REC connection state when an SCS ACCEPT_REQ is pending.
402A010A       006A             Illegal connection state. Not in CONNECT_REC connection state when an SCS REJECT_REQ is pending.
402B010A       006A             Illegal connection state. Not in CLOSED connection state when an SCS CONNECT_REQ is pending.
402C010A       006A             Illegal connection state. Not in OPEN or DISCONNECT_REC connection state when an SCS DISCONNECT_REQ is pending.
4051020A       006A             Received SCS CONNECT_RSP when not in CONNECT_SENT connection state.
4052020A       006A             Received SCS CONNECT_RSP when the connection is no longer valid.
4053020A       006A             Received SCS ACCEPT_REQ when not in CONNECT_ACK connection state.
4054020A       006A             Received SCS ACCEPT_RSP when not in the ACCEPT_SENT connection state.
4055020A       006A             Received SCS REJECT_REQ when not in the CONNECT_ACK connection state.
4056020A       006A             Received SCS REJECT_RSP when not in the REJECT_SENT connection state.
4057020A       006A             Received SCS DISCONNECT_REQ when not in the OPEN, DISCONNECT_SENT or DISCONNECT_ACK connection state.
4058020A       006A             Received SCS DISCONNECT_RSP when not in the DISCONNECT_SENT or DISCONNECT_MATCH connection state.
4059020A       006A             Received SCS CREDIT_REQ when in the DISCONNECT_REC or DISCONNECT_MATCH connection state.
405A020A       006A             Received SCS APPL_MSG when in the DISCONNECT_SENT or DISCONNECT_ACK connection state.
405B020A       006A             Received SCS ACCEPT_REQ on a connection that is no longer valid.
Note that in this instance if the "connection ID" field is zero, the contents of the "VCSTATE", "remote node name", "remote connection id", and "connection state" fields are undefined.
405C020A       006A             Received SCS ACCEPT_RSP on a connection that is no longer valid. Note that in this instance if the "connection ID" field is zero, the contents of the "VCSTATE", "remote node name", "remote connection id", and "connection state" fields are undefined.
405D020A       006A             Received SCS REJECT_REQ on a connection that is no longer valid. Note that in this instance if the "connection ID" field is zero, the contents of the "VCSTATE", "remote node name", "remote connection id", and "connection state" fields are undefined.
405E020A       006A             Received SCS REJECT_RSP on a connection that is no longer valid. Note that in this instance if the "connection ID" field is zero, the contents of the "VCSTATE", "remote node name", "remote connection id", and "connection state" fields are undefined.
405F020A       006A             Received SCS DISCONNECT_REQ on a connection that is no longer valid. Note that in this instance if the "connection ID" field is zero, the contents of the "VCSTATE", "remote node name", "remote connection id", and "connection state" fields are undefined.
4060020A       006A             Received SCS DISCONNECT_RSP on a connection that is no longer valid. Note that in this instance if the "connection ID" field is zero, the contents of the "VCSTATE", "remote node name", "remote connection id", and "connection state" fields are undefined.
4061020A       006A             Received SCS CREDIT_REQ on a connection that is no longer valid.
Note that in this instance if the "connection ID" field is zero, the contents of the "VCSTATE", "remote node name", "remote connection id", and "connection state" fields are undefined.
4062020A       006A             Received SCS CREDIT_RSP on a connection that is no longer valid. Note that in this instance if the "connection ID" field is zero, the contents of the "VCSTATE", "remote node name", "remote connection id", and "connection state" fields are undefined.
4063020A       006A             Received SCS APPL_MSG on a connection that is no longer valid. Note that in this instance if the "connection ID" field is zero, the contents of the "VCSTATE", "remote node name", "remote connection id", and "connection state" fields are undefined.
4064020A       006A             Received an unrecognized SCS message. Note that in this instance if the "connection ID" field is zero, the contents of the "VCSTATE", "remote node name", "remote connection id", and "connection state" fields are undefined.
4065020A       006A             Received SCS CONNECT_RSP with an unrecognized status. Connection is broken by Host Interconnect Services.
4066020A       006A             Received SCS REJECT_REQ with an invalid reason.
4067020A       006A             Received SCS APPL_MSG with no receive credit available.
------------------------------------------------------------

Table C-27 Device Services Nontransfer Error Event Log (Template 41) Instance/MSCP Event Codes
------------------------------------------------------------
Instance Code  MSCP Event Code  Description
------------------------------------------------------------
021B0064       0014             Disk Bad Block Replacement attempt completed for a read of controller metadata from a location outside the user data area of the disk. Note that due to the way Bad Block Replacement is performed on SCSI disk drives, information on the actual replacement blocks is not available to the controller and is therefore not included in the event report.
021A0064       0014             Disk Bad Block Replacement attempt completed for a write of controller metadata to a location outside the user data area of the disk. Note that due to the way Bad Block Replacement is performed on SCSI disk drives, information on the actual replacement blocks is not available to the controller and is therefore not included in the event report.
03010101       006A             No command control structures available for disk operation. Note that in this instance the "asc" and "ascq" fields are undefined.
03820101       006A             No command control structures available for tape operation. Note that in this instance the "asc" and "ascq" fields are undefined.
03B40101       006A             No command control structures available for media loader operation. Note that in this instance the "asc" and "ascq" fields are undefined.
03C80101       006A             No command control structures available for operation to a device that is unknown to the controller. Note that in this instance the "asc" and "ascq" fields are undefined.
03022002       002A             SCSI interface chip command timeout during disk operation. Note that in this instance the "asc" and "ascq" fields are undefined.
03832002       002A             SCSI interface chip command timeout during tape operation. Note that in this instance the "asc" and "ascq" fields are undefined.
03B52002       002A             SCSI interface chip command timeout during media loader operation. Note that in this instance the "asc" and "ascq" fields are undefined.
03C92002       002A             SCSI interface chip command timeout during operation to a device that is unknown to the controller. Note that in this instance the "asc" and "ascq" fields are undefined.
03034002       016A             Byte transfer timeout during disk operation. Note that in this instance the "asc" and "ascq" fields are undefined.
03844002       016A             Byte transfer timeout during tape operation. Note that in this instance the "asc" and "ascq" fields are undefined.
03B64002       016A             Byte transfer timeout during media loader operation.
Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03CA4002 016A Byte transfer timeout during operation to a device that is unknown to the controller. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03044402 01AA SCSI bus errors during disk operation. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. (continued on next page) C-84 HSJ-Series Error Logging Table C-27 (Cont.) Device Services Nontransfer Error Event Log (Template 41) Instance/MSCP Event Codes ------------------------------------------------------------ Instance Code MSCP Event Code Description ------------------------------------------------------------ 03854402 01AA SCSI bus errors during tape operation. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03B74402 01AA SCSI bus errors during media loader operation. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03D24402 01AA SCSI bus errors during device operation. The device type is unknown to the controller. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03052002 002A Device port SCSI chip reported gross error during disk operation. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03862002 002A Device port SCSI chip reported gross error during tape operation. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03B82002 002A Device port SCSI chip reported gross error during media loader operation. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03CD2002 002A Device port SCSI chip reported gross error during operation to a device that is unknown to the controller. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03062002 008A Non-SCSI bus parity error during disk operation. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03872002 008A Non-SCSI bus parity error during tape operation. 
Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03B92002 008A Non-SCSI bus parity error during media loader operation. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03CE2002 008A Non-SCSI bus parity error during operation to a device that is unknown to the controller. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03070101 01CA Source driver programming error encountered during disk operation. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03880101 01CA Source driver programming error encountered during tape operation. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03BA0101 01CA Source driver programming error encountered during media loader operation. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03CF0101 01CA Source driver programming error encountered during operation to a device that is unknown to the controller. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03080101 01EA Miscellaneous SCSI Port Driver coding error detected during disk operation. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. (continued on next page) HSJ-Series Error Logging C-85 Table C-27 (Cont.) Device Services Nontransfer Error Event Log (Template 41) Instance/MSCP Event Codes ------------------------------------------------------------ Instance Code MSCP Event Code Description ------------------------------------------------------------ 03890101 01EA Miscellaneous SCSI Port Driver coding error encountered during tape operation. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 03BB0101 01EA Miscellaneous SCSI Port Driver coding error detected during media loader operation. Note that in this instance the ``asc'' and ``ascq'' fields are undefined. 
03CB0101  01EA  Miscellaneous SCSI Port Driver coding error detected during operation to a device that is unknown to the controller. Note that in this instance the ``asc'' and ``ascq'' fields are undefined.
03270101  01EA  A disk-related error code was reported that was unknown to the Fault Management firmware. Note that in this instance the ``asc'' and ``ascq'' fields are undefined.
038A0101  01EA  A tape-related error code was reported that was unknown to the Fault Management firmware. Note that in this instance the ``asc'' and ``ascq'' fields are undefined.
03BC0101  01EA  A media loader-related error code was reported that was unknown to the Fault Management firmware. Note that in this instance the ``asc'' and ``ascq'' fields are undefined.
03CC0101  01EA  An error code was reported that was unknown to the Fault Management firmware. Note that in this instance the ``asc'' and ``ascq'' fields are undefined.
03D04002  01AA  A failure occurred while attempting a SCSI Test Unit Ready or Read Capacity command to a device. The device type is unknown to the controller. Note that in this instance the ``asc'' and ``ascq'' fields are undefined.
03D14002  006A  The identification of a device does not match the configuration information. The actual device type is unknown to the controller. Note that in this instance the ``asc'' and ``ascq'' fields are undefined.
03F00402  00EB  The shelf indicated by the ``port'' field is reporting a problem. This could mean one or both of the following:
                · If the shelf is using dual power supplies, one power supply has failed.
                · One of the shelf cooling fans has failed.
                Note that in this instance the ``target'', ``asc'', and ``ascq'' fields are undefined.
03F10502  00EB  The SWAP interrupt from the shelf indicated by the ``port'' field cannot be cleared. All SWAP interrupts from all ports will be disabled until corrective action is taken.
                When SWAP interrupts are disabled, the HSJ30/40 controller detects neither front panel button presses nor removal/insertion of devices. Note that in this instance the ``target'', ``asc'', and ``ascq'' fields are undefined.
03F20064  00EB  The SWAP interrupts have been cleared and reenabled for all shelves. Note that in this instance the ``port'', ``target'', ``asc'', and ``ascq'' fields are undefined.
03F30064  00EB  An asynchronous SWAP interrupt was detected by the HSJ30/40 controller for the shelf indicated by the ``port'' field. Possible reasons for this occurrence include:
                · Device insertion/removal
                · Shelf power failure
                · SWAP interrupts reenabled.
                Note that in this instance the ``target'', ``asc'', and ``ascq'' fields are undefined.
03D3450A  00EB  During device initialization, the device reported the SCSI Sense Key NO SENSE. This indicates that there is no specific sense key information to be reported for the designated logical unit. This would be the case for a successful command or a command that received CHECK CONDITION or COMMAND TERMINATED status because one of the FM, EOM, or ILI bits is set to one in the sense data flags field.
03D4450A  00EB  During device initialization, the device reported the SCSI Sense Key RECOVERED ERROR. This indicates the last command completed successfully with some recovery action performed by the target.
03D5450A  00EB  During device initialization, the device reported the SCSI Sense Key NOT READY. This indicates that the logical unit addressed cannot be accessed. Operator intervention may be required to correct this condition.
03D6450A 00EB During device initialization, the device reported the SCSI Sense Key MEDIUM ERROR. This indicates that the command terminated with a nonrecovered error condition that was probably caused by a flaw in the medium or an error in the recorded data. This sense key may also be returned if the target is unable to distinguish between a flaw in the medium and a specific hardware failure (HARDWARE ERROR sense key). 03D7450A 00EB During device initialization, the device reported the SCSI Sense Key HARDWARE ERROR. This indicates that the target detected a nonrecoverable hardware failure (for example, controller failure, device failure, parity error, and so forth) while performing the command or during a self-test. (continued on next page) HSJ-Series Error Logging C-87 Table C-27 (Cont.) Device Services Nontransfer Error Event Log (Template 41) Instance/MSCP Event Codes ------------------------------------------------------------ Instance Code MSCP Event Code Description ------------------------------------------------------------ 03D8450A 00EB During device initialization, the device reported the SCSI Sense Key ILLEGAL REQUEST. Indicates that there was an illegal parameter in the command descriptor block or in the additional parameters supplied as data for some commands (FORMAT UNIT, SEARCH DATA, and so forth). If the target detects an invalid parameter in the command descriptor block, then it shall terminate the command without altering the medium. If the target detects an invalid parameter in the additional parameters supplied as data, then the target may have already altered the medium. This sense key may also indicate that an invalid IDENTIFY message was received. 03D9450A 00EB During device initialization, the device reported the SCSI Sense Key UNIT ATTENTION. This indicates that the removable medium may have been changed or the target has been reset. 03DA450A 00EB During device initialization, the device reported the SCSI Sense Key DATA PROTECT. 
This indicates that a command that reads or writes the medium was attempted on a block that is protected from this operation. The read or write operation is not performed. 03DB450A 00EB During device initialization, the device reported the SCSI Sense Key BLANK CHECK. This indicates that a write-once device or a sequential-access device encountered blank medium or format-defined end-of-data indication while reading or a write-once device encountered a non-blank medium while writing. 03DC450A 00EB During device initialization, the device reported a SCSI Vendor Specific Sense Key. This sense key is available for reporting vendor specific conditions. 03DD450A 00EB During device initialization, the device reported the SCSI Sense Key COPY ABORTED. This indicates a COPY, COMPARE, or COPY AND VERIFY command was aborted due to an error condition on the source device, the destination device, or both. 03DE450A 00EB During device initialization, the device reported the SCSI Sense Key ABORTED COMMAND. This indicates the target aborted the command. The initiator may be able to recover by trying the command again. 03DF450A 00EB During device initialization, the device reported the SCSI Sense Key EQUAL. This indicates a SEARCH DATA command has satisfied an equal comparison. 03E0450A 00EB During device initialization, the device reported the SCSI Sense Key VOLUME OVERFLOW. This indicates a buffered peripheral device has reached the end-of-partition and data may remain in the buffer that has not been written to the medium. A RECOVER BUFFERED DATA command(s) may be issued to read the unwritten data from the buffer. 03E1450A 00EB During device initialization, the device reported the SCSI Sense Key MISCOMPARE. This indicates the source data did not match the data read from the medium. (continued on next page) C-88 HSJ-Series Error Logging Table C-27 (Cont.) 
Device Services Nontransfer Error Event Log (Template 41) Instance/MSCP Event Codes ------------------------------------------------------------ Instance Code MSCP Event Code Description ------------------------------------------------------------ 03E2450A 00EB During device initialization, the device reported a reserved SCSI Sense Key. ------------------------------------------------------------ Table C-28 Disk Transfer Error Event Log (Template 51) Instance/MSCP Event Codes ------------------------------------------------------------ Instance Code MSCP Event Code Description ------------------------------------------------------------ 02090064 0007 A data compare error was detected during the execution of a compare modified READ or WRITE command. Note that in this instance the SCSI Device Sense Data fields, ``cmdopcd'' through ``keyspec'', are undefined. 03094002 000B An unrecoverable disk drive error was encountered while performing work related to disk unit operations. 0328450A 000B The disk device reported standard SCSI Sense Data. 030C4002 014B A Drive failed because a Test Unit Ready command or a Read Capacity command failed. 030D000A 0103 Drive was failed by a Mode Select command received from the host. 030E4002 00EB Drive failed due to a deferred error reported by drive. 030F4002 00E8 Unrecovered Read or Write error. 03104002 002B No response from one or more drives. 0311430A 012B Nonvolatile memory and drive metadata indicate conflicting drive configurations. 0312430A 012B The Synchronous Transfer Value differs between drives in the same storageset. 03134002 012B Maximum number of errors for this data transfer operation exceeded. 03144002 00CB Drive reported recovered error without transferring all data. 03154002 00E8 Data returned from drive is invalid. 03164002 012B Request Sense command to drive failed. 03170064 0016 Illegal command for pass through mode. 03180064 0016 Data transfer request error. 03194002 012B Premature completion of a drive command. 
031A4002 002B Command timeout. 031B0101 002B Watchdog timer timeout. 031C4002 002B Disconnect timeout. 031D4002 012B Unexpected bus phase. 031E4002 012B Disconnect expected. (continued on next page) HSJ-Series Error Logging C-89 Table C-28 (Cont.) Disk Transfer Error Event Log (Template 51) Instance/MSCP Event Codes ------------------------------------------------------------ Instance Code MSCP Event Code Description ------------------------------------------------------------ 031F4002 012B ID Message not sent by drive. 03204002 012B Synchronous negotiation error. 03214002 012B The drive unexpectedly disconnected from the SCSI bus. 03224002 012B Unexpected message. 03234002 012B Unexpected Tag message. 03244002 012B Channel busy. 03254002 012B Message Reject received on a valid message. 0326450A 00EB The disk device reported Vendor Unique SCSI Sense Data. ------------------------------------------------------------ Table C-29 Disk Bad Block Replacement Attempt Event Log (Template 57) Instance/MSCP Event Codes ------------------------------------------------------------ Instance Code MSCP Event Code Description ------------------------------------------------------------ 02110064 0014 Disk Bad Block Replacement attempt completed for a read within the user data area of the disk. Note that due to the way Bad Block Replacement is performed on SCSI disk drives, information on the actual replacement blocks is not available to the controller and is therefore not included in the ``Old RBN'' and ``New RBN'' fields. The content of those fields is undefined. 02020064 0014 Disk Bad Block Replacement attempt completed for a write within the user data area of the disk. Note that due to the way Bad Block Replacement is performed on SCSI disk drives, information on the actual replacement blocks is not available to the controller and is therefore not included in the ``Old RBN'' and ``New RBN'' fields. The content of those fields is undefined. 
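When an error log is reviewed by script rather than by eye, rows such as those in Table C-28 can be held in a small lookup table. The following Python fragment is a minimal sketch only: it copies four rows from Table C-28, and the names DISK_TRANSFER_EVENTS and describe_instance_code are hypothetical and are not part of any Digital utility.

```python
# Four rows copied from Table C-28 (instance code -> MSCP event code, description).
# The dictionary layout and function name are illustrative assumptions.
DISK_TRANSFER_EVENTS = {
    "031A4002": ("002B", "Command timeout."),
    "031B0101": ("002B", "Watchdog timer timeout."),
    "031C4002": ("002B", "Disconnect timeout."),
    "03244002": ("012B", "Channel busy."),
}


def describe_instance_code(instance_code):
    """Return a one-line description for an instance code, if it is known."""
    key = instance_code.strip().upper()
    entry = DISK_TRANSFER_EVENTS.get(key)
    if entry is None:
        # Codes outside this small excerpt are reported as unknown, not guessed.
        return f"{key}: not in this excerpt of Table C-28"
    mscp_event, text = entry
    return f"{key} (MSCP event code {mscp_event}): {text}"


print(describe_instance_code("031a4002"))
```

Unknown codes are reported as such instead of being matched to the nearest entry, because several tables in this appendix reuse the same description text with different instance codes.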
------------------------------------------------------------ C-90 HSJ-Series Error Logging Table C-30 Tape Transfer Error Event Log (Template 61) Instance/MSCP Event Codes ------------------------------------------------------------ Instance Code MSCP Event Code Description ------------------------------------------------------------ 020A0064 0007 A data compare error was detected during the execution of a compare modified READ or WRITE command. Note that in this instance the SCSI Device Sense Data fields, ``cmdopcd'' through ``keyspec'', are undefined. 03644002 000B An unrecoverable tape drive error was encountered while performing work related to tape unit operations. 038B450A 000B The tape device reported standard SCSI Sense Data. 03674002 014B A Drive failed because a Test Unit Ready command or a Read Capacity command failed. 0368000A 0103 Drive was failed by a Mode Select command received from the host. 03694002 00EB Drive failed due to a deferred error reported by drive. 036A4002 00E8 Unrecovered Read or Write error. 036B4002 002B No response from one or more drives. 036C430A 012B Nonvolatile memory and drive metadata indicate conflicting drive configurations. 036D430A 012B The Synchronous Transfer Value differs between drives in the same storageset. 036E4002 012B Maximum number of errors for this data transfer operation exceeded. 036F4002 00CB Drive reported recovered error without transferring all data. 03704002 00E8 Data returned from drive is invalid. 03714002 012B Request Sense command to drive failed. 03720064 0016 Illegal command for pass through mode. 03730064 0016 Data transfer request error. 03744002 012B Premature completion of a drive command. 03754002 002B Command timeout. 03760101 002B Watchdog timer timeout. 03774002 002B Disconnect timeout. 03784002 012B Unexpected bus phase. 03794002 012B Disconnect expected. 037A4002 012B ID Message not sent by drive. 037B4002 012B Synchronous negotiation error. 
037C4002 012B The drive unexpectedly disconnected from the SCSI bus. 037D4002 012B Unexpected message. 037E4002 012B Unexpected Tag message. 037F4002 012B Channel busy. 03804002 012B Message Reject received on a valid message. 0381450A 00EB The tape device reported Vendor Unique SCSI Sense Data. ------------------------------------------------------------ HSJ-Series Error Logging C-91 Table C-31 Media Loader Error Event Log (Template 71) Instance/MSCP Event Codes ------------------------------------------------------------ Instance Code MSCP Event Code Description ------------------------------------------------------------ 03964002 0097 An unrecoverable media loader error was encountered while performing work related to media loader operations. 03BD450A 0097 The media changer device reported standard SCSI Sense Data. 03994002 0097 A Drive failed because a Test Unit Ready command or a Read Capacity command failed. 039A000A 0077 Drive was failed by a Mode Select command received from the host. 039B4002 0097 Drive failed due to a deferred error reported by drive. 039C4002 0097 Unrecovered Read or Write error. 039D4002 0037 No response from one or more drives. 039E430A 0097 Nonvolatile memory and drive metadata indicate conflicting drive configurations. 039F430A 0097 The Synchronous Transfer Value differs between drives in the same storageset. 03A04002 0097 Maximum number of errors for this data transfer operation exceeded. 03A14002 0097 Drive reported recovered error without transferring all data. 03A24002 0097 Data returned from drive is invalid. 03A34002 0097 Request Sense command to drive failed. 03A40064 0016 Illegal command for pass through mode. 03A50064 0016 Data transfer request error. 03A64002 0097 Premature completion of a drive command. 03A74002 0037 Command timeout. 03A80101 0037 Watchdog timer timeout. 03A94002 0037 Disconnect timeout. 03AA4002 0097 Unexpected bus phase. 03AB4002 0097 Disconnect expected. 03AC4002 0097 ID Message not sent by drive. 
03AD4002  0097  Synchronous negotiation error.
03AE4002  0097  The drive unexpectedly disconnected from the SCSI bus.
03AF4002  0097  Unexpected message.
03B04002  0097  Unexpected Tag message.
03B14002  0097  Channel busy.
03B24002  0097  Message Reject received on a valid message.
03B3450A  0097  The media changer device reported Vendor Unique SCSI Sense Data.
------------------------------------------------------------

Table C-32 Disk Copy Data Correlation Event Log ``event dependent information'' Values
------------------------------------------------------------
Value     Description
------------------------------------------------------------
00000001  Unable to allocate a sufficient number of DCD Context Blocks to support this host.
00000002  Unable to find an inactive Unit Path Block.
00000003  Unable to find an inactive Source Unit Block.
00000004  Insufficient resources returned by HIS$CONNECT.
------------------------------------------------------------

Table C-33 Executive Services Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
01000100  Memory allocation failure during executive initialization.
01010100  An interrupt without any handler was triggered.
01020100  Entry on timer queue was not of type AQ or BQ.
01030100  Memory allocation for a facility lock failed.
01040100  Memory initialization called with invalid memory type.
01050104  The I960 reported a fault.
          · Last Failure Parameter[0] contains the PC value.
          · Last Failure Parameter[1] contains the AC value.
          · Last Failure Parameter[2] contains the fault type and subtype values.
          · Last Failure Parameter[3] contains the address of the faulting instruction.
01060100  An attempt was made to do EXEC UART I/O when there is no support for it.
01070100  Timer chip setup failed.
01082004  The core diagnostics reported a fault.
          · Last Failure Parameter[0] contains the error code value (same as blinking OCP LEDs error code).
          · Last Failure Parameter[1] contains the address of the fault.
          · Last Failure Parameter[2] contains the actual data value.
          · Last Failure Parameter[3] contains the expected data value.
01800080  A powerfail interrupt occurred.
01812088  A processor interrupt was generated by the Master Dynamic RAM Controller and Arbitration engine (DRAB) with an indication that an unrecoverable memory access problem occurred.
          · Last Failure Parameter[0] contains the Master DRAB Setup Register value.
          · Last Failure Parameter[1] contains the Master DRAB CSR Register value.
          · Last Failure Parameter[2] contains the Master DRAB Diagnostic CSR Register value.
          · Last Failure Parameter[3] contains the Master DRAB Diagnostic Error Register value.
          · Last Failure Parameter[4] contains the Master DRAB Error Address Register value.
          · Last Failure Parameter[5] contains the Master DRAB Error Data Register value.
          · Last Failure Parameter[6] contains the Master DRAB Error Region Register value.
          · Last Failure Parameter[7] contains the Master DRAB Region Setup Register value.
01822288  A processor interrupt was generated by the CACHEA0 Dynamic RAM Controller and Arbitration engine (DRAB) with an indication that an unrecoverable memory access problem occurred.
          · Last Failure Parameter[0] contains the CACHEA0 DRAB Setup Register value.
          · Last Failure Parameter[1] contains the CACHEA0 DRAB CSR Register value.
          · Last Failure Parameter[2] contains the CACHEA0 DRAB Diagnostic CSR Register value.
          · Last Failure Parameter[3] contains the CACHEA0 DRAB Diagnostic Error Register value.
          · Last Failure Parameter[4] contains the CACHEA0 DRAB Error Address Register value.
· Last Failure Parameter[5] contains the CACHEA0 DRAB Error Data Register value. · Last Failure Parameter[6] contains the CACHEA0 DRAB Error Region Register value. · Last Failure Parameter[7] contains the CACHEA0 DRAB Region Setup Register value. (continued on next page) C-94 HSJ-Series Error Logging Table C-33 (Cont.) Executive Services Last Failure Codes ------------------------------------------------------------ Code Description ------------------------------------------------------------ 01832288 A processor interrupt was generated by the CACHEA1 Dynamic RAM Controller and Arbitration engine (DRAB) with an indication that an unrecoverable memory access problem occurred. · Last Failure Parameter[0] contains the CACHEA1 DRAB Setup Register value. · Last Failure Parameter[1] contains the CACHEA1 DRAB CSR Register value. · Last Failure Parameter[2] contains the CACHEA1 DRAB Diagnostic CSR Register value. · Last Failure Parameter[3] contains the CACHEA1 DRAB Diagnostic Error Register value. · Last Failure Parameter[4] contains the CACHEA1 DRAB Error Address Register value. · Last Failure Parameter[5] contains the CACHEA1 DRAB Error Data Register value. · Last Failure Parameter[6] contains the CACHEA1 DRAB Error Region Register value. · Last Failure Parameter[7] contains the CACHEA1 DRAB Region Setup Register value. 01842288 A processor interrupt was generated by the CACHEB0 Dynamic RAM Controller and Arbitration engine (DRAB) with an indication that an unrecoverable memory access problem occurred. · Last Failure Parameter[0] contains the CACHEB0 DRAB Setup Register value. · Last Failure Parameter[1] contains the CACHEB0 DRAB CSR Register value. · Last Failure Parameter[2] contains the CACHEB0 DRAB Diagnostic CSR Register value. · Last Failure Parameter[3] contains the CACHEB0 DRAB Diagnostic Error Register value. · Last Failure Parameter[4] contains the CACHEB0 DRAB Error Address Register value. 
· Last Failure Parameter[5] contains the CACHEB0 DRAB Error Data Register value. · Last Failure Parameter[6] contains the CACHEB0 DRAB Error Region Register value. · Last Failure Parameter[7] contains the CACHEB0 DRAB Region Setup Register value. (continued on next page) HSJ-Series Error Logging C-95 Table C-33 (Cont.) Executive Services Last Failure Codes ------------------------------------------------------------ Code Description ------------------------------------------------------------ 01852288 A processor interrupt was generated by the CACHEB1 Dynamic RAM Controller and Arbitration engine (DRAB) with an indication that an unrecoverable memory access problem occurred. · Last Failure Parameter[0] contains the CACHEB1 DRAB Setup Register value. · Last Failure Parameter[1] contains the CACHEB1 DRAB CSR Register value. · Last Failure Parameter[2] contains the CACHEB1 DRAB Diagnostic CSR Register value. · Last Failure Parameter[3] contains the CACHEB1 DRAB Diagnostic Error Register value. · Last Failure Parameter[4] contains the CACHEB1 DRAB Error Address Register value. · Last Failure Parameter[5] contains the CACHEB1 DRAB Error Data Register value. · Last Failure Parameter[6] contains the CACHEB1 DRAB Error Region Register value. · Last Failure Parameter[7] contains the CACHEB1 DRAB Region Setup Register value. 01860080 A processor interrupt was generated with an indication that the other controller in a dual controller configuration asserted the KILL line to disable this controller. 01870080 A processor interrupt was generated with an indication that the (//) RESET button on the controller module was depressed. 01880080 A processor interrupt was generated with an indication that the program card was removed. 01890080 A powerfail interrupt occurred because of watch dog timeout. 018A0080 Cache region timeout with no other DRAB errors. 
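In every Table C-33 entry, the low-order hexadecimal digit of the code matches the number of Last Failure Parameters listed for it (0 for 01000100, 4 for 01050104, 8 for 01812088), and the high-order byte is 01 throughout the Executive Services table. The following Python fragment is a minimal sketch that splits a code on that basis. The field names ``component'' and ``param_count'', and the reading of those digits as fields, are assumptions inferred from the table entries and are not definitions taken from this appendix.

```python
def split_last_failure_code(code_hex):
    """Split an 8-digit last failure code into two assumed fields."""
    value = int(code_hex, 16)
    return {
        # High-order byte: 0x01 for every entry in Table C-33 (assumed component field).
        "component": (value >> 24) & 0xFF,
        # Low-order hex digit: matches the number of parameters listed (assumed count field).
        "param_count": value & 0x0F,
    }


for code in ("01000100", "01050104", "01812088"):
    print(code, split_last_failure_code(code))
```

A decoder built this way can check that the number of parameters present in a last failure report agrees with the count implied by the code before the parameters are interpreted.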
------------------------------------------------------------ C-96 HSJ-Series Error Logging Table C-34 Value Added Services Last Failure Codes ------------------------------------------------------------ Code Description ------------------------------------------------------------ 02000100 Initialization code was unable to allocate enough memory to set up the receive data descriptors. 02010100 Initialization code was unable to allocate enough memory to set up the send data descriptors. 02040100 Unable to allocate memory necessary for data buffers. 02050100 Unable to allocate memory for the Free Buffer Array. 02080100 A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when populating the disk read DWD stack. 02090100 A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when populating the disk write DWD stack. 020A0100 A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when populating the tape read DWD stack. 020B0100 A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when populating the tape write DWD stack. 020C0100 A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when populating the miscellaneous DWD stack. 020E0100 A call to RESMGR$ALLOCATE_SEND_DATA_DESC failed to return a send data descriptor when populating the send_dd_stack. 020F0100 A call to RESMGR$ALLOCATE_RCV_DATA_DESC failed to return a receive data descriptor when populating the rcv_dd_stack. 02100100 A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when creating the device services state table. 02170100 Unable to allocate memory for the Free Node Array. 02180100 Unable to allocate memory for the Free Buffer Descriptor Array. 021B0100 A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when populating the disk read EDC DWD stack. 021C0100 A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when populating the disk write EDC DWD stack. 021D0100 Unable to allocate memory for the Free Buffer Array. 
021E0100  Unable to allocate memory for the Free Strip Node Array.
021F0100  Unable to allocate memory for WARPs and RMDs.
02210100  Invalid parameters in CACHE$OFFER_META call.
02220100  No buffer found for CACHE$MARK_META_DIRTY call.
02270104  A callback from DS on a transfer request has returned a bad or illegal DWD status.
          · Last Failure Parameter[0] contains the DWD Status.
          · Last Failure Parameter[1] contains the DWD address.
          · Last Failure Parameter[2] contains the PUB Address.
          · Last Failure Parameter[3] contains the Device Port.
022E0102  An invalid mapping type was specified for a logical unit.
          · Last Failure Parameter[0] contains the USB address.
          · Last Failure Parameter[1] contains the Unit Mapping Type.
02360101  Unrecognized state supplied to FOC$SEND callback routine va_dap_snd_cmd_complete. Last Failure Parameter[0] contains the unrecognized value.
02370102  Unsupported return from HIS$GET_CONN_INFO routine.
          · Last Failure Parameter[0] contains the DD address.
          · Last Failure Parameter[1] contains the invalid status.
02392084  A processor interrupt was generated by the HSJ30/40 controller's XOR engine (FX), with no bits set in the CSR to indicate a reason for the interrupt.
          · Last Failure Parameter[0] contains the FX Control and Status Register (CSR).
          · Last Failure Parameter[1] contains the FX DMA Indirect List Pointer register (DILP).
          · Last Failure Parameter[2] contains the FX DMA Page Address register (DADDR).
          · Last Failure Parameter[3] contains the FX DMA Command and control register (DCMD).
023A2084  A processor interrupt was generated by the HSJ30/40 controller's XOR engine (FX), indicating an unrecoverable error condition.
          · Last Failure Parameter[0] contains the FX Control and Status Register (CSR).
          · Last Failure Parameter[1] contains the FX DMA Indirect List Pointer register (DILP).
          · Last Failure Parameter[2] contains the FX DMA Page Address register (DADDR).
          · Last Failure Parameter[3] contains the FX DMA Command and control register (DCMD).
02440100  The logical unit mapping type was found to be invalid in va_set_disk_geometry().
02530102  An invalid status was returned from CACHE$LOOKUP_LOCK().
          · Last Failure Parameter[0] contains the DD address.
          · Last Failure Parameter[1] contains the invalid status.
02560102  An invalid status was returned from CACHE$LOOKUP_LOCK().
          · Last Failure Parameter[0] contains the DD address.
          · Last Failure Parameter[1] contains the invalid status.
02570102  An invalid status was returned from VA$XFER() during an operation.
          · Last Failure Parameter[0] contains the DD address.
          · Last Failure Parameter[1] contains the invalid status.
025A0102  An invalid status was returned from CACHE$LOOKUP_LOCK().
          · Last Failure Parameter[0] contains the DD address.
          · Last Failure Parameter[1] contains the invalid status.
025B0102  An invalid mapping type was specified for a logical unit.
          · Last Failure Parameter[0] contains the USB address.
          · Last Failure Parameter[1] contains the Unit Mapping Type.
025C0102  An invalid mapping type was specified for a logical unit.
          · Last Failure Parameter[0] contains the USB address.
          · Last Failure Parameter[1] contains the Unit Mapping Type.
02620102  An invalid status was returned from CACHE$LOOKUP_LOCK().
          · Last Failure Parameter[0] contains the DD address.
          · Last Failure Parameter[1] contains the invalid status.
02690102  An invalid status was returned from CACHE$OFFER_WRITE_DATA().
          · Last Failure Parameter[0] contains the DD address.
          · Last Failure Parameter[1] contains the invalid status.
02720100  A request was made to read a device metadata block with an invalid block type.
02730100  A request was made to write a device metadata block with an invalid block type.
02790102  An invalid status was returned from VA$XFER( ) in a complex read operation.
          · Last Failure Parameter[0] contains the DD address.
          · Last Failure Parameter[1] contains the invalid status.
027B0102  An invalid status was returned from VA$XFER( ) in a complex ACCESS operation.
          · Last Failure Parameter[0] contains the DD address.
          · Last Failure Parameter[1] contains the invalid status.
027D0100  Unable to allocate memory for a Failover Control Block.
027E0100  Unable to allocate memory for a Failover Control Block.
027F0100  Unable to allocate memory for a Failover Control Block.
02800100  Unable to allocate memory for a Failover Control Block.
02820100  Unable to allocate memory for the Dirty Count Array.
02830100  Unable to allocate memory for the Cache Buffer Index Array.
02840100  Unable to allocate memory for the XNode Array.
02850100  Cache was declared bad by the Cache Diagnostics after first Meg was tested. Cannot recover and use local memory because those initial buffers cannot be retrieved.
02860100  Unable to allocate memory for the Fault Management Event Information Packet used by the Cache Manager in generating error logs to the host.
02880100  Invalid FOC Message in cmfoc_snd_cmd.
02890100  Invalid FOC Message in cmfoc_rcv_cmd.
028A0100  Invalid return status from DIAG$CACHE_MEMORY_TEST.
028B0100  Invalid return status from DIAG$CACHE_MEMORY_TEST.
028C0100  Invalid error status given to cache_fail.
028D0100  Invalid number of banks in cache.
028E0100  Cache module is locked when not expected.
028F0100  Invalid status returned from CACHE$CHECK_METADATA.
02900100  Unable to allocate memory for the First Cache Buffer Index Array.
02910100  Invalid metadata combination detected in build_raid_node.
02920100  Unable to handle that many bad dirty pages (exceeded MAX_BAD_DIRTY). Cache memory is bad.
02950100  Invalid DCA state detected in start_crashover.
02960100  Invalid DCA state detected in start_failover.
02970100  Invalid DCA state detected in init_failover.
029B0100  The host port software has insufficient resources to set up a block data transfer operation for a WRITE command.
029C0100  The host port software has insufficient resources to set up a block data transfer operation for a COMPARE command.
------------------------------------------------------------

Table C-35 Device Services Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
03020101  Invalid SCSI direct-access device opcode in miscellaneous command DWD. Last Failure Parameter[0] contains the SCSI command opcode.
03030101  Invalid SCSI sequential-access device opcode in miscellaneous command DWD. Last Failure Parameter[0] contains the SCSI command opcode.
03040101  Invalid SCSI CDROM device opcode in miscellaneous command DWD. Last Failure Parameter[0] contains the SCSI command opcode.
03050101  Invalid SCSI medium changer device opcode in miscellaneous command DWD. Last Failure Parameter[0] contains the SCSI command opcode.
03060101  Invalid SCSI device type in PUB. Last Failure Parameter[0] contains the SCSI device type.
03070101  Invalid CDB Group Code detected during create of miscellaneous command DWD. Last Failure Parameter[0] contains the SCSI command opcode.
03080101  Invalid SCSI OPTICAL MEMORY device opcode in miscellaneous command DWD. Last Failure Parameter[0] contains the SCSI command opcode.
030A0100  Error DWD not found in port in_proc_q.
030B0188  A dip error was detected when pcb_busy was set.
          · Last Failure Parameter[0] contains the PCB reg710_ptr value.
          · Last Failure Parameter[1] contains the new info NULL - SSTAT0 - DSTAT - ISTAT.
          · Last Failure Parameter[2] contains the PCB copy of the 710 DBC register.
          · Last Failure Parameter[3] contains the PCB copy of the 710 DNAD register.
          · Last Failure Parameter[4] contains the PCB copy of the 710 DSP register.
          · Last Failure Parameter[5] contains the PCB copy of the 710 DSPS register.
          · Last Failure Parameter[6] contains the PCB copies of the 710 SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
          · Last Failure Parameter[7] contains the PCB copies of the 710 LCRC/RESERVED/ISTAT/DFIFO registers.
03150100  More DBDs than allowed for in mask.
031E0100  Cannot find in_error DWD on in-process queue.
031F0100  Either DWD_PTR is null or bad value in DSPS.
03280100  SCSI CDB contains an invalid group code for a transfer command.
03290100  The required error information packet (EIP) or device work descriptor (DWD) was not supplied to the Device Services error logging code.
032A0100  HIS$GET_CONN_INFO( ) returned an unexpected completion code.
032B0100  A Device Work Descriptor (DWD) was supplied with a NULL Physical Unit Block (PUB) pointer.
03320101  An invalid code was passed to the error recovery thread in the error_stat field of the PCB. Last Failure Parameter[0] contains the PCB error_stat code.
03330188  A parity error was detected by a 710 while sending data out onto the SCSI bus.
          · Last Failure Parameter[0] contains the PCB reg710_ptr value.
          · Last Failure Parameter[1] contains the PCB copy of the 710 TEMP register.
          · Last Failure Parameter[2] contains the PCB copy of the 710 DBC register.
          · Last Failure Parameter[3] contains the PCB copy of the 710 DNAD register.
          · Last Failure Parameter[4] contains the PCB copy of the 710 DSP register.
          · Last Failure Parameter[5] contains the PCB copy of the 710 DSPS register.
          · Last Failure Parameter[6] contains the PCB copies of the 710 SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
          · Last Failure Parameter[7] contains the PCB copies of the 710 LCRC/RESERVED/ISTAT/DFIFO registers.
03350188  The TEA (bus fault) signal was asserted into a 710.
          · Last Failure Parameter[0] contains the PCB reg710_ptr value.
          · Last Failure Parameter[1] contains the PCB copy of the 710 TEMP register.
          · Last Failure Parameter[2] contains the PCB copy of the 710 DBC register.
          · Last Failure Parameter[3] contains the PCB copy of the 710 DNAD register.
          · Last Failure Parameter[4] contains the PCB copy of the 710 DSP register.
          · Last Failure Parameter[5] contains the PCB copy of the 710 DSPS register.
          · Last Failure Parameter[6] contains the PCB copies of the 710 SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
          · Last Failure Parameter[7] contains the PCB copies of the 710 LCRC/RESERVED/ISTAT/DFIFO registers.
03360188  A 710's host bus watchdog timer expired.
          · Last Failure Parameter[0] contains the PCB reg710_ptr value.
          · Last Failure Parameter[1] contains the PCB copy of the 710 TEMP register.
          · Last Failure Parameter[2] contains the PCB copy of the 710 DBC register.
          · Last Failure Parameter[3] contains the PCB copy of the 710 DNAD register.
          · Last Failure Parameter[4] contains the PCB copy of the 710 DSP register.
          · Last Failure Parameter[5] contains the PCB copy of the 710 DSPS register.
          · Last Failure Parameter[6] contains the PCB copies of the 710 SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
          · Last Failure Parameter[7] contains the PCB copies of the 710 LCRC/RESERVED/ISTAT/DFIFO registers.
03370108  A 710 detected an illegal script instruction.
          · Last Failure Parameter[0] contains the PCB reg710_ptr value.
          · Last Failure Parameter[1] contains the PCB copy of the 710 TEMP register.
          · Last Failure Parameter[2] contains the PCB copy of the 710 DBC register.
          · Last Failure Parameter[3] contains the PCB copy of the 710 DNAD register.
          · Last Failure Parameter[4] contains the PCB copy of the 710 DSP register.
          · Last Failure Parameter[5] contains the PCB copy of the 710 DSPS register.
          · Last Failure Parameter[6] contains the PCB copies of the 710 SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
          · Last Failure Parameter[7] contains the PCB copies of the 710 LCRC/RESERVED/ISTAT/DFIFO registers.
03380188  A 710's DSTAT register contains multiple asserted bits, or an invalidly asserted bit, or both.
          · Last Failure Parameter[0] contains the PCB reg710_ptr value.
          · Last Failure Parameter[1] contains the PCB copy of the 710 TEMP register.
          · Last Failure Parameter[2] contains the PCB copy of the 710 DBC register.
          · Last Failure Parameter[3] contains the PCB copy of the 710 DNAD register.
          · Last Failure Parameter[4] contains the PCB copy of the 710 DSP register.
          · Last Failure Parameter[5] contains the PCB copy of the 710 DSPS register.
          · Last Failure Parameter[6] contains the PCB copies of the 710 SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
          · Last Failure Parameter[7] contains the PCB copies of the 710 LCRC/RESERVED/ISTAT/DFIFO registers.
03390108  An unknown interrupt code was found in a 710's DSPS register.
          · Last Failure Parameter[0] contains the PCB reg710_ptr value.
          · Last Failure Parameter[1] contains the PCB copy of the 710 TEMP register.
          · Last Failure Parameter[2] contains the PCB copy of the 710 DBC register.
          · Last Failure Parameter[3] contains the PCB copy of the 710 DNAD register.
          · Last Failure Parameter[4] contains the PCB copy of the 710 DSP register.
          · Last Failure Parameter[5] contains the PCB copy of the 710 DSPS register.
          · Last Failure Parameter[6] contains the PCB copies of the 710 SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
          · Last Failure Parameter[7] contains the PCB copies of the 710 LCRC/RESERVED/ISTAT/DFIFO registers.
033C0101  An invalid code was seen by the error recovery thread in the er_funct_step field of the PCB. Last Failure Parameter[0] contains the PCB er_funct_step code.
033E0108  An attempt was made to restart a 710 at the SDP DBD.
          · Last Failure Parameter[0] contains the PCB reg710_ptr value.
          · Last Failure Parameter[1] contains the PCB copy of the 710 TEMP register.
          · Last Failure Parameter[2] contains the PCB copy of the 710 DBC register.
          · Last Failure Parameter[3] contains the PCB copy of the 710 DNAD register.
          · Last Failure Parameter[4] contains the PCB copy of the 710 DSP register.
          · Last Failure Parameter[5] contains the PCB copy of the 710 DSPS register.
          · Last Failure Parameter[6] contains the PCB copies of the 710 SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
          · Last Failure Parameter[7] contains the PCB copies of the 710 LCRC/RESERVED/ISTAT/DFIFO registers.
033F0108  An EDC error was detected on a read of a soft-sectored device - path not yet implemented.
          · Last Failure Parameter[0] contains the PCB reg710_ptr value.
          · Last Failure Parameter[1] contains the PCB copy of the 710 TEMP register.
          · Last Failure Parameter[2] contains the PCB copy of the 710 DBC register.
          · Last Failure Parameter[3] contains the PCB copy of the 710 DNAD register.
          · Last Failure Parameter[4] contains the PCB copy of the 710 DSP register.
          · Last Failure Parameter[5] contains the PCB copy of the 710 DSPS register.
          · Last Failure Parameter[6] contains the PCB copies of the 710 SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
          · Last Failure Parameter[7] contains the PCB copies of the 710 LCRC/RESERVED/ISTAT/DFIFO registers.
03410101  Invalid SCSI device type in PUB. Last Failure Parameter[0] contains the PUB SCSI device type.
03420188  A UDC interrupt could not be associated with either a DWD or the non-callable scripts.
          · Last Failure Parameter[0] contains the PCB reg710_ptr value.
          · Last Failure Parameter[1] contains the PCB copy of the 710 TEMP register.
          · Last Failure Parameter[2] contains the PCB copy of the 710 DBC register.
          · Last Failure Parameter[3] contains the PCB copy of the 710 DNAD register.
          · Last Failure Parameter[4] contains the PCB copy of the 710 DSP register.
          · Last Failure Parameter[5] contains the PCB copy of the 710 DSPS register.
          · Last Failure Parameter[6] contains the PCB copies of the 710 SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
          · Last Failure Parameter[7] contains the PCB copies of the 710 LCRC/RESERVED/ISTAT/DFIFO registers.
03470100  Insufficient memory available for static structure allocation.
03480100  Insufficient memory available for static structure allocation.
03490100  DWDs exhausted.
034A2080  Diagnostics report all NCR710s are broken.
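
Many Device Services entries pack four 710 register copies into a single parameter (for example, Last Failure Parameter[6] holds SSTAT2/SSTAT1/SSTAT0/DSTAT). The Python sketch below splits such a 32-bit value into four named bytes. It assumes the registers are listed most-significant byte first, which this table does not state, so check the order against a known log before relying on it. The function name and the sample value are hypothetical.

```python
def unpack_710_registers(value, names):
    """Split a 32-bit parameter into four named byte fields.

    ASSUMPTION: `names` is ordered most-significant byte first. The
    table does not state the byte order; verify it on real data.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("parameter must fit in 32 bits")
    if len(names) != 4:
        raise ValueError("expected exactly four register names")
    shifts = (24, 16, 8, 0)
    return {name: (value >> shift) & 0xFF for name, shift in zip(names, shifts)}


# Register groupings as listed for Parameter[6] and Parameter[7].
PARAM6_FIELDS = ("SSTAT2", "SSTAT1", "SSTAT0", "DSTAT")
PARAM7_FIELDS = ("LCRC", "RESERVED", "ISTAT", "DFIFO")

if __name__ == "__main__":
    sample = 0x12345678  # hypothetical value, not taken from a real log
    for name, byte in unpack_710_registers(sample, PARAM6_FIELDS).items():
        print(f"{name}: 0x{byte:02X}")
```

If a known-good log shows the opposite byte order, reversing the tuple of names is the only change needed.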
------------------------------------------------------------

Table C-36 Fault Manager Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
04010101  The requestor ID component of the instance code passed to FM$REPORT_EVENT is larger than the maximum allowed for this environment. Last Failure Parameter[0] contains the instance code value.
04020102  The requestor's error table index passed to FM$REPORT_EVENT is larger than the maximum allowed for this requestor.
          · Last Failure Parameter[0] contains the instance code value.
          · Last Failure Parameter[1] contains the requester error table index value.
04030102  The USB index supplied in the EIP is larger than the maximum number of USBs.
          · Last Failure Parameter[0] contains the instance code value.
          · Last Failure Parameter[1] contains the USB index value.
04040103  The event log format found in V_fm_template_table is not supported by the Fault Manager. The bad format was discovered while trying to fill in a supplied EIP.
          · Last Failure Parameter[0] contains the instance code value.
          · Last Failure Parameter[1] contains the format code value.
          · Last Failure Parameter[2] contains the requester error table index value.
04050100  The Fault Manager could not allocate memory for its Event Information Packet (EIP) buffers.
04060100  The Fault Manager could not allocate a Datagram HTB in its initialization routine.
04070103  There is more EIP information than will fit into a datagram. The requestor specific size is probably too large.
          · Last Failure Parameter[0] contains the instance code value.
          · Last Failure Parameter[1] contains the format code value.
          · Last Failure Parameter[2] contains the requester error table index value.
04080102  The event log format found in the already-built EIP is not supported by the Fault Manager.
          The bad format was discovered while trying to copy the EIP information into a datagram HTB.
          · Last Failure Parameter[0] contains the format code value.
          · Last Failure Parameter[1] contains the instance code value.
04090100  The caller of FM$CANCEL_EVENT_NOTIFICATION passed an address of an event notification routine that does not match the address of any routines for which event notification is enabled.
040D0100  FM$ENABLE_EVENT_NOTIFICATION was called to enable EIP notification but the specified routine was already enabled to receive EIP notification.
040F0102  The eip->generic.mscp1.flgs field of the EIP passed to FM$REPORT_EVENT contains an invalid flag.
          · Last Failure Parameter[0] contains the instance code value.
          · Last Failure Parameter[1] contains the value supplied in the eip->generic.mscp1.flgs field.
------------------------------------------------------------

Table C-37 Dual Universal Asynchronous Receiver/Transmitter Services Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
06010100  The DUART was unable to allocate enough memory to establish a connection to the CLI.
06020100  A port other than terminal port A was referred to by a set terminal characteristics command. This is illegal.
06030100  A DUP question or default question message type was passed to the DUART driver, but the pointer to the input area to receive the response to the question was NULL.
------------------------------------------------------------

Table C-38 Failover Control Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
07010100  All available slots in the FOC notify table are filled.
07020100  FOC$CANCEL_NOTIFY( ) was called to disable notification for a routine that did not have notification enabled.
07030100  Unable to start the Failover Control Timer before main loop.
07040100  Unable to restart the Failover Control Timer.
07050100  Unable to allocate flush buffer.
07060100  Unable to allocate active receive FCB.
------------------------------------------------------------

Table C-39 Nonvolatile Parameter Memory Failover Control Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
08010101  A remote state change was received from the FOC thread that NVFOC does not recognize. Last Failure Parameter[0] contains the unrecognized state value.
08020100  No memory could be allocated for a NVFOC information packet.
08030101  Work received on the S_nvfoc_bque did not have a NVFOC work ID. Last Failure Parameter[0] contains the ID type value that was received on the NVFOC work queue.
08040101  Unknown work value received by the S_nvfoc_bque. Last Failure Parameter[0] contains the unknown work value.
08050100  An unlock was received and the controller was not locked by the other controller.
08060100  A read write command was received when the NV memory was not locked.
08070100  A write to NV memory was received while not locked.
08080000  The other controller requested this controller to restart.
08090010  The other controller requested this controller to shutdown.
080A0000  The other controller requested this controller to selftest.
080B0100  Could not get enough memory to build a FCB to send to the remote routines on the other controller.
080C0100  Could not get enough memory for FCBs to receive information from the other controller.
080D0100  Could not get enough memory to build a FCB to reply to a request from the other controller.
080E0101  An out-of-range receiver ID was received by the NVFOC communication utility (master send to slave send ACK). Last Failure Parameter[0] contains the bad ID value.
080F0101  An out-of-range receiver ID was received by the NVFOC communication utility (received by master). Last Failure Parameter[0] contains the bad ID value.
08100101  A call to NVFOC$TRANSACTION had a from field (id) that was out of range for the NVFOC communication utility. Last Failure Parameter[0] contains the bad ID value.
08110101  NVFOC tried to defer more than one FOC send. Last Failure Parameter[0] contains the master ID of the connection that had the multiple delays.
08120100  Unable to lock other controller's NVmemory despite the fact that the running and handshake_complete flags are set.
08130100  Could not allocate memory to build a callback context block on an unlock NVmemory call.
08140100  Could not allocate memory to build a workblock to queue to the NVFOC thread.
08150100  A lock was requested by the other controller but the memory is already locked by the other controller.
08160100  A request to clear the remote configuration was received but the memory was not locked.
08170100  A request to read the next configuration was received but the memory was not locked.
08180100  Could not get enough memory for FLS FCBs to receive information from the other controller.
08190100  An unlock command was received when the NV memory was not locked.
081A0100  Unable to allocate memory for remote work.
081B0101  Bad remote work received on remote work queue. Last Failure Parameter[0] contains the ID type value that was received on the NVFOC remote work queue.
------------------------------------------------------------

Table C-40 Command Line Interpreter Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
20010100  The action for work on the CLI queue should be CLI_CONNECT, CLI_COMMAND_IN or CLI_PROMPT. If it is not one of these three, this bugcheck will result.
20020100  The FAO returned a non-successful response. This will only happen if a bad format is detected or the formatted string overflows the output buffer.
20030100  The type of work received on the CLI work queue was not of type CLI.
20070100  A work item of an unknown type was placed on the CLI's DUP Virtual Terminal thread work queue by the CLI.
20080000  This controller requested this controller to restart.
20090010  This controller requested this controller to shut down.
200A0000  This controller requested this controller to self-test.
200B0100  Could not get enough memory for FCBs to receive information from the other controller.
200C0100  After a CLI command the NV memory was still locked. The CLI should always unlock NV memory when the command is complete (if it had an error or not).
200D0101  After many calls to DS$PORT_BLOCKED, we never got a FALSE status back (which signals that nothing is blocked). Last Failure Parameter[0] contains the port number (1-n) that we were waiting on to be unblocked.
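
The codes in these tables show a visible pattern: the first two hexadecimal digits are constant within each table, and the last digit matches the number of Last Failure Parameters listed for the entry (for example, 030B0188 lists eight parameters and 08090010 lists none). The Python sketch below splits a code on that basis. The layout is inferred from the entries listed in these tables, not from a format definition, so the field names are hypothetical and the remaining digits are returned without interpretation.

```python
# Leading byte -> table title, as observed in Tables C-34 through C-43.
COMPONENT_TABLES = {
    0x02: "Value Added Services",
    0x03: "Device Services",
    0x04: "Fault Manager",
    0x06: "Dual Universal Asynchronous Receiver/Transmitter Services",
    0x07: "Failover Control",
    0x08: "Nonvolatile Parameter Memory Failover Control",
    0x20: "Command Line Interpreter",
    0x40: "Host Interconnect Services",
    0x42: "Host Interconnect Port Services",
    0x60: "Disk and Tape MSCP Server",
}


def split_last_failure_code(code_text):
    """Split an 8-digit hexadecimal last failure code into byte fields.

    ASSUMPTION: the field boundaries are inferred from the table
    entries, not taken from a documented format.
    """
    if len(code_text) != 8:
        raise ValueError("expected 8 hexadecimal digits")
    code = int(code_text, 16)
    component = (code >> 24) & 0xFF
    return {
        "component": component,
        "component_name": COMPONENT_TABLES.get(component, "unknown"),
        "error_number": (code >> 16) & 0xFF,
        "middle_byte": (code >> 8) & 0xFF,   # left uninterpreted
        "low_byte": code & 0xFF,             # high nibble left uninterpreted
        "parameter_count": code & 0x0F,
    }


if __name__ == "__main__":
    for text in ("030B0188", "08090010", "424B0001"):
        print(text, split_last_failure_code(text))
```

The parameter count is useful as a cross-check when reading a log: it indicates how many Last Failure Parameter values to expect for the entry.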
------------------------------------------------------------

Table C-41 Host Interconnect Services Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
40000101  An unrecognized CI opcode was received by HIS. These packets are packets with CI opcodes recognized by the port but not by HIS. Last Failure Parameter[0] contains the CI opcode value.
40150100  LOCAL VC Timer in unexpected state.
40280100  Failed to allocate Buffer Name Table.
40290100  Failed to allocate ACB.
402A0100  Failed to allocate ID member template.
402B0100  Failed to allocate DG HTBs.
402C0100  Failed to allocate message HTBs.
402D0101  S_max_node greater than MAX_VC_ENTRIES. Last Failure Parameter[0] contains the S_ci_max_nodes value.
402E0101  S_max_node not set to valid value (8, 16, 32, 64, 128, 256). Last Failure Parameter[0] contains the S_ci_max_nodes value.
402F0100  Failure to allocate a HIS EIP structure.
40300100  Failure in memory allocation.
40510100  htb_id type not DG, when attempting to deallocate DG HTB.
40520100  htb_id type not RCV_SND, when attempting to dealloc recv queue HTB.
40530100  htb_id type not RCV_SND, when attempting to dealloc SCS queue HTB.
40560100  Failed to find a virtual circuit entry for CCB during his_close_connection routine.
407B0100  SCS command timeout unexpectedly inactive during SCS Accept Request.
407C0100  SCS command timeout unexpectedly inactive during SCS Reject Request.
408E0100  Message receive queue count disagrees with # HTBs on the queue.
408F0100  Unrecognized HTB ID type.
40900100  htb_id type not DG, when attempting to xmit DG HTB.
40930100  Message receive queue count disagrees with # HTBs on the queue.
40950100  Create transfer request with 0-byte count.
40960100  Create transfer request with 0-byte count.
40970100  Create transfer request with 0-byte count.
40980100  Create transfer request with 0-byte count.
409C0100  Illegal return value from HIS$MAP.
409D0100  Illegal return value from HIS$MAP.
40B40101  Invalid value in max_nodes field of se_params structure. Last Failure Parameter[0] contains the max_nodes field value.
------------------------------------------------------------

Table C-42 Host Interconnect Port Services Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
42000100  Cmpl_main routine found invalid port transmit status.
42020100  Cannot start timer.
42030100  Cannot restart work timer.
42060100  HP_INIT could not allocate initial buffers.
420B0100  HP_INIT could not allocate initial bufs for path a dl_ctl table.
42332080  Receive_main found destination address in the rcv packet does not match node address.
42340100  HP could not allocate buffers for I/O rundown in VC Close.
42382080  Ci_isr found that the YACI hardware had invalid transmit status on Path A, no bits set.
42392080  Ci_isr found that the YACI hardware had invalid transmit status on Path B, no bits set.
423A2080  CI_ISR found the abort bit set without any valid reason; Path A.
423B2080  CI_ISR found transmit parity error without abort bit set; Path A.
423C2080  CI_ISR found buffer underflow without abort bit set; Path A.
423D2080  CI_ISR found the abort bit set without any valid reason; Path B.
423E2080  CI_ISR found transmit parity error without abort bit set; Path B.
423F2080  CI_ISR found buffer underflow without abort bit set; Path B.
42442080  Ci_isr found that yaci hardware had a parity error.
42452080  Ci_isr found that yaci hardware had a bus timeout error.
42472080  Ci_isr found Data parity on Transmit Path A.
42482080  Ci_isr found Data parity on Transmit Path B.
424B0001  Ci_isr found Host Reset on Path A. Last Failure Parameter[0] contains the node number of the resetting node.
424C0001  Ci_isr found Host Reset on Path B. Last Failure Parameter[0] contains the node number of the resetting node.
424D2080  Ci_isr found Fetch parity on Transmit Path A.
424E2080  Ci_isr found Fetch parity on Transmit Path B.
------------------------------------------------------------

Table C-43 Disk and Tape MSCP Server Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
60000100  Invalid return value from routine HIS$PREPARE_MSG_XMIT, processing write command.
60010100  Invalid return value from routine HIS$PREPARE_MSG_XMIT, processing read command.
60030100  Invalid return value from routine HIS$XMIT_APPL_MSG, processing completed non-automatic end message.
60040100  Invalid return value from routine HIS$XFER_BLOCK_DATA, processing return of Write History Log to host buffers.
60050100  Invalid return value from routine HIS$CONNECT, while DCD attempting to establish connection to a remote subsystem.
60060100  Invalid return value from routine HIS$XMIT_APPL_MSG, while dmscp_dcd_send_cmd attempting to send a command to a remote subsystem.
60070100  Invalid return value from routine HIS$MAP, while dmscp_dcd_allocate_bh attempting to map a buffer.
60080100  Invalid return value from routine HIS$XMIT_APPL_MSG, while dmscp_dcd_src_gcs_send attempting to send a GCS command to a remote subsystem.
60090100  Invalid return value from routine HIS$DISCONNECT, while dmscp_dcd_comm_path_event attempting to disconnect a remote source connection.
600B0100  Invalid return value from routine HIS$PREPARE_MSG_XMIT, processing TMSCP Write, Read or Compare Host Data command.
600C0100  Invalid return value from routine RESMGR$ALLOCATE_DATA_SEGMENT.
600D0100  Opcode field in command being aborted is not valid.
600E0100  Opcode of command to be initiated is invalid.
600F0100  Opcode of command to be initiated is invalid.
60100100  Opcode field in non-sequential command being initiated is invalid.
60110100  Opcode of command to be initiated is invalid.
60120100  Opcode of TMSCP command to be aborted is invalid.
60130100  tmscp_clear_cdl_cmpl_rtn detected an unexpected opcode.
60140100  tmscp_clear_cdl_cmpl_rtn detected an unexpected opcode.
60150100  VA$CHANGE_STATE failed to change the SW Write protect when requested to do so as part of the Disk Set Unit Characteristics command.
60160100  VA$CHANGE_STATE failed to change the SW Write protect when requested to do so as part of the Tape Set Unit Characteristics command.
60170100  Invalid type in entry of long interval work queue.
60180100  mscp_short_interval found an Invalid type in entry of long interval work queue.
60190100  dmscp_dcd_send_cmd found that the SIWI Work Item code supplied is unrecognized or invalid in this context during DCD inhibited processing.
601A0100  dmscp_dcd_send_cmd found that the SIWI Work Item code supplied is unrecognized or invalid in this context during HIS$XMIT_APPL_MSG failure processing.
601B0100  Invalid EVENT_CODE parameter in call to dmscp_connection_event.
601C0100  Invalid EVENT_CODE parameter in call to tmscp_connection_event.
601D0100  Invalid EVENT_CODE parameter in call to dmscp_dcd_comm_path_event.
601E0100  Invalid EVENT_CODE parameter in call to dmscp_dcd_comm_path_event.
601F0100  Invalid EVENT_CODE parameter in call to mscp_do_disconnect.
60250100  An attempt was about to be made to return a progress indicator to the host that was 0xFFFFFFFF, the only invalid value.
60260100  A WH_DAF command was requested to be performed by the wrong process.
60270100  A non-immediate WHM operation was passed to the dmscp_exec_whm_immediate routine.
60280100  This routine found an invalid xfer_state so cannot continue.
60290100  HIS did not allocate an HTB when there should have been one reserved for this connection as determined by mscp_rcv_listen.
602A0100  HIS did not allocate an HTB when there should have been one reserved for this connection as determined by dmscp_dcd_src_gcs_send.
602B0100  HIS did not allocate an HTB when there should have been one reserved for this connection as determined by dmscp_dcd_comm_path_event.
602C0100  When trying to put the extra send-HTB on the connection's send_htb_list there was already one on the queue.
602D0100  The VA$CHANGE_STATE service did not set the Software write protect as requested (for disk).
602E0100  The VA$CHANGE_STATE service did not set the Software write protect as requested (for tape).
603B0100  Initial HIS$LISTEN call for MSCP$DISK was unsuccessful.
603C0100  Initial HIS$LISTEN call for MSCP$TAPE was unsuccessful.
603F0100  dmscp_dcd_send_cmd received a command on an idle remote source connection that is no longer valid.
60400100  Unrecognized or invalid in this context return value from routine RESMGR$ALLOCATE_DATA_SEGMENT, while dmscp_dcd_allocate_dseg attempting to allocate a data segment.
60410100  Unrecognized or invalid in this context return value from routine RESMGR$ALLOCATE_DATA_BUFFERS, while dmscp_dcd_allocate_dbuf attempting to allocate a data buffer.
60420100  dmscp_dcd_rmte_end_msg was unable to find a command message that corresponds to end message it is currently processing.
60430100  dmscp_dcd_src_gcs_send was entered even though remote connection lost is indicated. This condition should not occur because the command timer is deactivated when a connection is lost (and the server is running at the same priority as HIS and cannot invalidate a connection).
60440100  dmscp_dcd_src_gcs_cmpl found the command being GCSed is no longer at the head of the remote connection's queue.
60450100  dmscp_dcd_errlog_rvc found that an error log is not associated with a command, internal miscellaneous error logs are assumed to not be associated with a connection and remote miscellaneous error logs generation was not requested.
60460100  dmscp_dcd_elrt_scc_send was entered to issue a remote source connection SCC but was unable to find an available HTB on the connection's htb_list. With no active DCDs the connection should always have HTBs available.
60480100  tmscp_suc_avl_cmpl_rtn found the unit not in the available state.
60490100  tmscp_clear_cdl_cmpl_rtn found the state change failed.
604A0100  tmscp_clear_cdl_cmpl_rtn found the state change failed.
604B0100  Subroutine process_event returned a value to dmscp_dcd_comm_path_event that indicates that an internal disconnect request occurred while processing an immediate communications event.
604D0100  Subroutine process_event returned a value to dmscp_dcd_comm_path_event that indicates that a connection established event occurred while no DCD commands were active.
604F0100  tmscp_set_cmpl_rtn found the state change failed.
60500100  dmscp_dcd_op_cmpl found an unrecognized P_STS value in a DCD HTB status field.
60550100  mscp_initialize unable to get LOCAL STATIC memory from exec for use as a local connection ITB.
60560100  mscp_initialize unable to get LOCAL STATIC memory from exec for use as an AVAILABLE ITB.
60570100  mscp_initialize unable to get LOCAL STATIC memory from exec for use as an AVAILABLE state change ITB.
60580100  mscp_initialize unable to get LOCAL STATIC memory from exec for use as a state change ITB.
605D0100  tmscp_onl_cleanup_rtn detected a failure in enabling variable speed mode suppression.
605E0100  tmscp_suc_cmpl_rtn detected a failure in enabling variable speed mode suppression.
605F0100  tmscp_suc_cmpl_rtn detected a failure in enabling variable speed mode suppression.
60610100  mscp_initialize unable to get BUFFER STATIC memory from exec for use as Write History Logs.
60620100  mscp_initialize unable to get LOCAL STATIC memory from exec for use as Write History Log Allocation Failure Lists.
60640100  Invalid condition when there exists no unused Write History Log Entries.
60650100  Attempting to block incoming requests for the tape/loader when it was unexpectedly found already blocked.
60660100  Loader boundary block request to stall incoming requests to the tape/loader unit was not set up as expected.
60670100  Invalid return value from routine HIS$XMIT_APPL_MSG.
60680100  VA$ENABLE_NOTIFICATION failed with insufficient resources at init time.
606B0100  mscp_foc_receive_cmd detected that the message sent from the other controller had an illegal usb index.
606C0100  mscp_foc_receive_cmd detected that the message sent from the other controller had an illegal exclusive access state.
606D0100  FOC provided mscp_foc_send_cmpl_rtn with an invalid status for the FOC$SEND transmit command completion.
606E0100  FOC provided mscp_foc_send_rsp_done with an invalid transmit status for the FOC$SEND transmit response completion.
------------------------------------------------------------

Table C-44 Diagnostics and Utilities Protocol Server Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
61020100  HIS$LISTEN call failed with INSUFFICIENT_RESOURCES.
61090100  LISTEN_CONNECTION_ESTABLISHED event from HIS specified a connection ID for a connection we already know about.
610C0100  HIS has reported a connection event that should not be possible.
------------------------------------------------------------

Table C-45 System Communication Services Directory Service Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
62000100  HIS$LISTEN call failed with INSUFFICIENT_RESOURCES.
62010100  Failure to allocate associated work queue.
62020100  Failure to allocate associated timer queue.
62030100  Failure to allocate connection ID timers.
------------------------------------------------------------

Table C-46 Disk Inline Exerciser (DILX) Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
80010100  An HTB was not available to issue an IO when it should have been.
80020100  A unit could not be dropped from testing because an available command failed.
80030100  DILX tried to release a facility that was not reserved by DILX.
80040100  DILX tried to change the unit state from MAINTENANCE_MODE to NORMAL but was rejected because of insufficient resources.
80050100  DILX tried to change the USB unit state from MAINTENANCE_MODE to NORMAL but DILX never received notification of a successful state change.
80060100  DILX tried to switch the unit state from MAINTENANCE_MODE to NORMAL but was not successful.
80070100  DILX aborted all commands via va$d_abort( ) but the HTBs have not been returned.
80080100  While DILX was deallocating HIS EIP buffers, at least one could not be found.
80090100  DILX received an end message that corresponds to an opcode not supported by DILX.
800A0100  DILX was not able to restart HIS timer.
800B0100  DILX tried to issue an IO for an opcode that is not supported.
800C0100  DILX tried to issue a oneshot IO for an opcode that is not supported.
800D0100  A DILX device control block contains an unsupported unit_state.
800E0100  While trying to print an Event Information Packet, DILX discovered an unsupported MSCP error log format.
80100100  DILX could not compare buffers because no memory was available from EXEC$ALLOCATE_MEM_ZEROED.
80120100  DILX expected an EIP to be on the receive EIP queue but no EIPs were there.
80130100  DILX was asked to fill a data buffer with an unsupported data pattern.
80140100  DILX could not process an unsupported answer in dx$reuse_params( ).
------------------------------------------------------------

Table C-47 Tape Inline Exerciser (TILX) Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
81010100  An HTB was not available to issue an IO when it should have been.
81020100  A unit could not be dropped from testing because an available command failed.
81030100  TILX tried to release a facility that was not reserved by TILX.
81040100  TILX tried to change the unit state from MAINTENANCE_MODE to NORMAL but was rejected because of insufficient resources.
81050100  TILX tried to change the USB unit state from MAINTENANCE_MODE to NORMAL but TILX never received notification of a successful state change.
81060100  TILX tried to switch the unit state from MAINTENANCE_MODE to NORMAL but was not successful.
81070100  TILX aborted all commands via va$d_abort( ) but the HTBs have not been returned.
81080100  While TILX was deallocating HIS EIP buffers, at least one could not be found.
81090100  TILX received an end message that corresponds to an opcode not supported by TILX.
810A0100  TILX was not able to restart HIS timer.
810B0100  TILX tried to issue an IO for an opcode that is not supported.
810C0100  TILX tried to issue a oneshot IO for an opcode that is not supported.
810D0100  A TILX device control block contains an unsupported unit_state.
810E0100  TILX received an unsupported Value Added status in a Value Added completion message.
810F0100  TILX found an unsupported device control block substate while trying to build a command for the Basic Function test.
81100100  TILX found an unsupported device control block substate while trying to build a command for the Read-Only test.
81110100  TILX found an unsupported device control block substate while trying to build a command for the User-Defined test.
81120100  TILX received an EOT encountered while in a substate where EOT encountered should not occur.
81130100  TILX calculated an illegal position type value while trying to generate a command for the position intensive phase of the Basic Function test.
81140100  While trying to display an EIP, TILX discovered an unsupported MSCP error log format.
811A0100  TILX expected a deferred error to be on the receive deferred error queue but no deferred errors were there.
811B0100  TILX was asked to fill a data buffer with an unsupported data pattern.
811C0100  TILX could not process an unsupported answer in tx$reuse_params( ).
------------------------------------------------------------

Table C-48 Automatic Device Configuration Program (CONFIG) Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
83010100  The CLI prompt was not returned to the Auto-Config virtual terminal code within a reasonable amount of time.
83020100  An unsupported message type or terminal request was received by the Auto-Config virtual terminal code from the CLI.
83030100  Not all alter_device requests completed within the timeout interval.
------------------------------------------------------------

Table C-49 Controller Restart Codes
------------------------------------------------------------
Code  Description
------------------------------------------------------------
0     Full restart
1     No restart
------------------------------------------------------------

C.4 Event Notification/Recovery Threshold

An Event Notification/Recovery Threshold value is assigned to each significant event that can be reported by an HSJ30/40 controller. The Event Notification/Recovery Threshold values and their meanings are shown in Table C-50.

Table C-50 Event Notification/Recovery Threshold Classifications
------------------------------------------------------------
Threshold Value  Classification  Description
------------------------------------------------------------
01               IMMEDIATE       Failure or potential failure of a component critical to proper controller operation is indicated; immediate attention is required.
02               HARD            Failure of a component that affects controller performance or precludes access to a device connected to the controller is indicated.
0A               SOFT            An unexpected condition detected by a controller firmware component (such as protocol violations, host buffer access errors, internal inconsistencies, and so forth) is indicated.
64               INFORMATIONAL   An event having little or no effect on proper controller or device operation is indicated.
------------------------------------------------------------

With the exception of events reported via the Disk Copy Data Correlation Event Log, the Event Notification/Recovery Threshold value assigned to a particular event is supplied in the NR Threshold subfield of the "instance code" field of the event log used to report the event. See Section C.2 for "instance code" field details.
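
As a rough illustration of how Table C-50 can be applied when post-processing event logs, the following Python sketch maps an NR Threshold value to its classification. The function name and the "UNKNOWN" fallback are assumptions made for this example only; the sketch also assumes the NR Threshold subfield has already been extracted from the "instance code" field as described in Section C.2, and it does not attempt that extraction itself.

```python
# Threshold values and classifications copied from Table C-50.
NR_THRESHOLDS = {
    0x01: "IMMEDIATE",
    0x02: "HARD",
    0x0A: "SOFT",
    0x64: "INFORMATIONAL",
}


def classify_threshold(value):
    """Return the Table C-50 classification for an NR Threshold value.

    `value` is the already-extracted NR Threshold subfield (an integer).
    Values not listed in Table C-50 yield "UNKNOWN" (a hypothetical
    fallback chosen for this sketch, not a controller-defined class).
    """
    return NR_THRESHOLDS.get(value, "UNKNOWN")


if __name__ == "__main__":
    for raw in (0x01, 0x02, 0x0A, 0x64):
        print(f"{raw:02X} -> {classify_threshold(raw)}")
```

A lookup table is used rather than numeric comparison because Table C-50 defines four discrete values and says nothing about values that fall between them.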
Disk Copy Data Correlation Event Log Conditions

The Event Notification/Recovery Threshold Classification assigned to the following conditions reported via a Disk Copy Data Correlation Event Log is SOFT (see Table C-50):

· Subcommand Error (subcode "Destination--Command Timed Out")
· Subcommand Error (subcode "Source--Command Timed Out")
· Subcommand Error (subcode "Destination--Inconsistent State"), cases A, B, C, D, E, and F.
· Controller Error (subcode "Local Connection Request Failed, Insufficient Resources to Request Local Connection")
· Controller Error (subcode "Remote Connection Request Failed, Insufficient Resources to Request Local Connection")

All other conditions that can be reported via the Disk Copy Data Correlation Event Log are not assigned a specific Event Notification/Recovery Threshold Classification because they can be correlated with the associated condition specific event log.

C.5 Recommended Repair Action

A Recommended Repair Action code is assigned to each significant event that can be reported by an HSJ30/40 controller. The Recommended Repair Action codes and their meanings are shown in Table C-51.

Table C-51 Recommended Repair Action Codes
------------------------------------------------------------
Code  Description
------------------------------------------------------------
00    No action necessary.
01    An unrecoverable hardware detected fault occurred or an unrecoverable firmware inconsistency was detected; proceed with HSJ30/40 controller support avenues. Contact Digital Multivendor Services.
02    Inconsistent/erroneous information received from the operating system; proceed with operating system software support avenues. Contact Digital Multivendor Services.
03    Follow the recommended repair action contained in the "last failure code" field.
04    Two possible problem sources are indicated:
      · In the case of a shelf with dual power supplies, one of the power supplies has failed. Follow repair action 07 for the power supply with the power LED out.
      · One of the shelf blowers has failed. Follow repair action 06.
05    Four possible problem sources are indicated:
      · Total power supply failure on a shelf. Follow repair action 09.
      · A device inserted into a shelf that has a broken internal SBB connector. Follow repair action 0A.
      · A standalone device is connected to the HSJ30/40 controller with an incorrect cable. Follow repair action 08.
      · An HSJ30/40 controller hardware failure. Follow repair action 20.
06    Determine which blower has failed and replace it. Refer to Chapter 7 for the blower removal procedure.
07    Replace power supply. Refer to Chapter 7 for the power supply removal procedure.
08    Replace the cable. Refer to the specific device documentation.
09    Determine power failure cause.
0A    Determine which SBB has a failed connector and replace it. Refer to Chapter 7.
0B    The other HSJ30/40 controller in a dual-redundant configuration has been reset with the "Kill" line by the HSJ30/40 controller that reported the event. To restart the "Killed" HSJ30/40 controller, enter the CLI RESTART OTHER command on the "Surviving" HSJ30/40 controller and then press the (//) RESET button on the "Killed" HSJ30/40 controller. If the other HSJ30/40 controller is repeatedly being "Killed" for the same or a similar reason, follow repair action 20.
0C    Both HSJ30/40 controllers in a dual-redundant configuration are attempting to use the same SCSI ID (either 6 or 7 as indicated in the event report). Note that the other HSJ30/40 controller of the dual-redundant pair has been reset with the "Kill" line by the HSJ30/40 controller that reported the event. Two possible problem sources are indicated:
      · An HSJ30/40 controller hardware failure.
      · A controller backplane failure.
      Follow repair action 20 for the "Killed" HSJ30/40 controller. If the problem persists, then follow repair action 20 for the "Surviving" HSJ30/40 controller. If the problem still persists, then replace the controller backplane.
20    Replace HSJ30/40 controller module. Refer to Chapter 7 for proper replacement procedure.
22    Replace indicated HSJ30/40 cache module.
40    If the Sense Data FRU field is non-zero, follow repair action 41. Otherwise, replace the appropriate FRU associated with the device's SCSI interface or the entire device.
41    Consult the device's maintenance manual for guidance on replacing the indicated device FRU.
43    Update the configuration data to correct the problem.
44    Replace the SCSI cable for the failing SCSI bus. If the problem persists, replace the controller backplane, drive backplane, or controller module.
45    Interpreting the device supplied Sense Data is beyond the scope of the HSJ30/40 controller firmware. Refer to the device documentation to determine the appropriate repair action, if any.
60    Swap the transmit and receive cables for the indicated path.
61    Check indicated path cables for proper installation.
63    Check the CI adapter on the host system identified in the "remote node name" field for proper operation.
------------------------------------------------------------

Recommended Repair Action codes apply to each reportable event (except those reported via the Disk Copy Data Correlation Event Log) as identified by the value contained in the Repair Action subfield of the "instance code" field of the event logs described in Section C.2. For events reported via the Last Failure Event Log the Recommended Repair Action code is contained in the Repair Action subfield of the "last failure code" field of that event log.
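
Several entries in Table C-51 refer the reader on to other repair actions (for example, code 04 points to actions 07 and 06). The Python sketch below shows one way to list every action that a reported code may lead to. The cross-references are transcribed from Table C-51; the names FOLLOW_UPS and expand_repair_action are hypothetical and exist only for this illustration. Which branch actually applies still has to be decided from the conditions given in the table.

```python
# Cross-references between repair actions, transcribed from Table C-51.
# Key: reported code. Value: the codes its description says to follow.
FOLLOW_UPS = {
    0x04: [0x07, 0x06],
    0x05: [0x09, 0x0A, 0x08, 0x20],
    0x0B: [0x20],
    0x0C: [0x20],
    0x40: [0x41],
}


def expand_repair_action(code):
    """Return `code` followed by every repair action it may refer to.

    Codes are visited in the order the table mentions them, and each
    code appears at most once in the result.
    """
    visited = []
    pending = [code]
    while pending:
        current = pending.pop(0)
        if current in visited:
            continue
        visited.append(current)
        # Follow-ups of the current code are examined before older entries.
        pending = FOLLOW_UPS.get(current, []) + pending
    return visited


if __name__ == "__main__":
    chain = expand_repair_action(0x04)
    print(" -> ".join(f"{c:02X}" for c in chain))  # 04 -> 07 -> 06
```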
Disk Copy Data Correlation Event Log Conditions

The Recommended Repair Action Code assigned to the following conditions reported via a Disk Copy Data Correlation Event Log is 01 (see Table C-51):

· Subcommand Error (subcode "Destination--Command Timed Out")
· Subcommand Error (subcode "Source--Command Timed Out")
· Subcommand Error (subcode "Destination--Inconsistent State"), cases C, D, E, and F.
· Controller Error (subcode "Local Connection Request Failed, Insufficient Resources to Request Local Connection")
· Controller Error (subcode "Remote Connection Request Failed, Insufficient Resources to Request Local Connection")

The Recommended Repair Action Code assigned to the following condition reported via a Disk Copy Data Correlation Event Log is 02 (see Table C-51):

· Subcommand Error (subcode "Source--Inconsistent State"), cases A and B.

All other conditions that can be reported via the Disk Copy Data Correlation Event Log are not assigned a specific Recommended Repair Action Code because they can be correlated with the associated condition specific event log.

C.6 Deskew Command Procedure

Example C-2 presents a command procedure to deskew the "CONTROLLER DEPENDENT INFORMATION" for a "CONTROLLER LOG" type error log.

Example C-2 Deskew Command Procedure Example

$! P1 = Input file name
$! P2 = Output file name
$ on warning then $exit
$! Search strings used to recognize lines of the ERF report
$ new_entry = " ******************************* ENTRY"
$ ctrl_entry = " CONTROLLER LOG"
$ lw_entry = " LONGWORD"
$ ctrl_inp = "FALSE"
$ lw_string = ""
$ open/read inf 'p1'
$ open/write ouf 'p2'
$in_loop:
$ read/end=in_done inf inr
$ inrlen = f$length(inr)
$ if f$locate(new_entry,inr) .ne. inrlen
$ then
$!    A new entry begins; flush longwords collected for the previous one
$     write sys$output inr
$     if ctrl_inp
$     then
$         gosub convert_longs
$         ctrl_inp = "FALSE"
$     endif
$ else
$     if f$locate(ctrl_entry,inr) .ne. inrlen
$     then
$         write sys$output inr
$         ctrl_inp = "TRUE"
$         lw_string = ""
$     endif
$     if f$locate(lw_entry,inr) .ne. inrlen .and. ctrl_inp
$     then
$!        Keep 4 digits of the first longword; prepend each later longword
$         lw = f$element(2," ",f$edit(inr,"TRIM,COMPRESS"))
$         if lw_string .eqs. ""
$         then
$             lw_string = f$extract(0,4,lw)
$         else
$             lw_string = lw + lw_string
$         endif
$     endif
$ endif
$ write ouf inr
$ goto in_loop
$in_done:
$ close inf
$ if ctrl_inp
$ then
$     gosub convert_longs
$ endif
$ close ouf
$ exit
$convert_longs:
$ index = 1
$ write ouf ""
$ write ouf ""
$ write ouf ""
$ write ouf ""
$ write ouf "LONGWORD DESKEW:"
$ write ouf ""
$ write ouf ""
$convert_longs_loop:
$! Peel 8 hex digits at a time off the end of the collected string
$ len = f$length(lw_string)
$ if len .le. 4 then goto convert_longs_done
$ lw = f$extract(len - 8,8,lw_string)
$ write ouf " LONGWORD[''index'] = ",lw
$ lw_string = f$extract(0,len - 8, lw_string)
$ index = index + 1
$ goto convert_longs_loop
$convert_longs_done:
$ write ouf ""
$ return

Example C-3 shows an ERF error log before running the command procedure.

Example C-3 ERF Error Log Before Command Procedure

V A X / V M S   SYSTEM ERROR REPORT   COMPILED 16-MAR-1993 12:30:07   PAGE 144.

******************************* ENTRY 11. *******************************
ERROR SEQUENCE 2820.                 LOGGED ON: SID 05903914
DATE/TIME 16-MAR-1993 11:35:45.39               SYS_TYPE 00000000
SYSTEM UPTIME: 2 DAYS 22:48:03
SCS NODE: CNOTE                      VAX/VMS V5.5-2

ERL$LOGMESSAGE ENTRY KA825 HW REV# B PATCH REV# 28. UCODE REV# 20. BI NODE # 2.

I/O SUB-SYSTEM, UNIT _HSJ402$DUA20:

MESSAGE TYPE              0001
                                     DISK MSCP MESSAGE
MSLG$L_CMD_REF        5B54001E
MSLG$W_SEQ_NUM            0039
                                     SEQUENCE #57.
MSLG$B_FORMAT               00
                                     CONTROLLER LOG
MSLG$B_FLAGS                00
                                     UNRECOVERABLE ERROR
MSLG$W_EVENT              01CA
                                     CONTROLLER ERROR
                                     POLICY PROCESS ERROR
MSLG$Q_CNT_ID         00000021
                      01280001
                                     UNIQUE IDENTIFIER, 000100000021(X)
                                     MASS STORAGE CONTROLLER
                                     MODEL = 40.
MSLG$B_CNT_SVR              FF
                                     CONTROLLER SOFTWARE VERSION #255.
MSLG$B_CNT_HVR              00
                                     CONTROLLER HARDWARE REVISION #0.

CONTROLLER DEPENDENT INFORMATION

LONGWORD 1.           01010000       /..../
LONGWORD 2.           044103CF       /Ï.A./
LONGWORD 3.           00000000       /..../
LONGWORD 4.           00470000       /..G./
LONGWORD 5.           00000000       /..../
LONGWORD 6.           00020000       /..../
LONGWORD 7.           00000000       /..../

Example C-4 shows the same ERF error log after running the command procedure (notice the deskewed longwords).

Example C-4 ERF Error Log After Command Procedure

V A X / V M S   SYSTEM ERROR REPORT   COMPILED 16-MAR-1993 12:30:07   PAGE 144.

******************************* ENTRY 11. *******************************
ERROR SEQUENCE 2820.                 LOGGED ON: SID 05903914
DATE/TIME 16-MAR-1993 11:35:45.39               SYS_TYPE 00000000
SYSTEM UPTIME: 2 DAYS 22:48:03
SCS NODE: CNOTE                      VAX/VMS V5.5-2

ERL$LOGMESSAGE ENTRY KA825 HW REV# B PATCH REV# 28. UCODE REV# 20. BI NODE # 2.

I/O SUB-SYSTEM, UNIT _HSJ402$DUA20:

MESSAGE TYPE              0001
                                     DISK MSCP MESSAGE
MSLG$L_CMD_REF        5B54001E
MSLG$W_SEQ_NUM            0039
                                     SEQUENCE #57.
MSLG$B_FORMAT               00
                                     CONTROLLER LOG
MSLG$B_FLAGS                00
                                     UNRECOVERABLE ERROR
MSLG$W_EVENT              01CA
                                     CONTROLLER ERROR
                                     POLICY PROCESS ERROR
MSLG$Q_CNT_ID         00000021
                      01280001
                                     UNIQUE IDENTIFIER, 000100000021(X)
                                     MASS STORAGE CONTROLLER
                                     MODEL = 40.
MSLG$B_CNT_SVR              FF
                                     CONTROLLER SOFTWARE VERSION #255.
MSLG$B_CNT_HVR              00
                                     CONTROLLER HARDWARE REVISION #0.

CONTROLLER DEPENDENT INFORMATION

LONGWORD 1.           01010000       /..../
LONGWORD 2.           044103CF       /Ï.A./
LONGWORD 3.           00000000       /..../
LONGWORD 4.           00470000       /..G./
LONGWORD 5.           00000000       /..../
LONGWORD 6.           00020000       /..../
LONGWORD 7.           00000000       /..../

LONGWORD DESKEW:

 LONGWORD[1] = 03CF0101
 LONGWORD[2] = 00000441
 LONGWORD[3] = 00000000
 LONGWORD[4] = 00000047
 LONGWORD[5] = 00000000
 LONGWORD[6] = 00000002

D
------------------------------------------------------------
HSD-Series Error Logging

This appendix details errors the HSD-series controller will report in its host error logs under the OpenVMS operating system, as well as how to extract the information from the logs.
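
Extracting information from a "CONTROLLER LOG" entry relies on the longword deskew performed by the DCL command procedure in Example C-2. For readers who post-process ERF output away from an OpenVMS system, the same string manipulation can be sketched in Python as follows. This is an illustration only, not a replacement for the supplied procedure: the function name deskew_longwords is hypothetical, and the input is assumed to be the eight-digit hexadecimal longword values in the order ERF lists them.

```python
def deskew_longwords(longwords):
    """Deskew ERF "CONTROLLER DEPENDENT INFORMATION" longwords.

    `longwords` is a list of 8-hex-digit strings in ERF display order.
    Mirrors Example C-2: keep the first four digits of the first
    longword, prepend each later longword, then peel eight digits at a
    time off the end of the combined string.
    """
    if not longwords:
        return []
    combined = longwords[0][:4]
    for lw in longwords[1:]:
        combined = lw + combined
    deskewed = []
    # Stop when only the leftover four digits (or fewer) remain.
    while len(combined) > 4:
        deskewed.append(combined[-8:])
        combined = combined[:-8]
    return deskewed


if __name__ == "__main__":
    # Longword values taken from Example C-3.
    raw = ["01010000", "044103CF", "00000000", "00470000",
           "00000000", "00020000", "00000000"]
    for index, value in enumerate(deskew_longwords(raw), start=1):
        print(f" LONGWORD[{index}] = {value}")
```

Run against the seven longwords of Example C-3, the sketch produces the six deskewed values listed in Example C-4.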
------------------------------------------------------------
Note
------------------------------------------------------------
Host error log translations are correct as of the date of publication of this manual. However, log information may change with firmware updates. Refer to your StorageWorks Array Controller Operating Firmware Release Notes for error log information updates.
------------------------------------------------------------

D.1 Reading an HSD-series Error Log

You can interpret an HSD-series error log the same way as an HSJ-series error log (Appendix C), with the following exceptions:

· Template type 31 does not exist for HSD-series error logs.
· Template types 32 and 33 have changed as shown in Table D-1.

Table D-1 Template Types
------------------------------------------------------------
Description                                   Template  Longword  Value     Deskewed Value
------------------------------------------------------------
DSSI Port/Port Driver Event Log               32+       2         1032xxxx  00001032
DSSI System Communication Services Event Log  33+       2         2C33xxxx  00002C33
------------------------------------------------------------
+The MSLG$B_FORMAT field for these templates will read "00 CONTROLLER LOG," so you may want to run the OpenVMS DCL command procedure provided at the end of Appendix C for deskewing the longwords.
------------------------------------------------------------

D.2 Event Log Formats

In general, the event log formats for the HSD-series controller are identical to those for the HSJ-series. However, where the HSJ-series uses "CI" to describe the host interface, the HSD-series controller uses "DSSI". For example, in the following table, the terms in the first column for HSJ-series controllers translate to the terms in the second column for HSD-series controllers. Be aware of this change in terminology as you use Appendix C to decode your error logs.
HSJ-Series Term                                           HSD-Series Term
CI Host Interconnect Services Common Event Log Fields     DSSI Host Interconnect Services Common Event Log Fields
CI source node address                                    DSSI source node address
CI destination node address                               DSSI destination node address
CI Virtual Circuit State Codes                            DSSI Virtual Circuit State Codes
CI Port/Port Driver Event Log (Template 32)               DSSI Port/Port Driver Event Log (Template 32)
CI System Communication Services Event Log (Template 33)  DSSI System Communication Services Event Log (Template 33)

D.3 Event Log Codes

Tables D-2 through D-5 show some important differences in reported codes between HSJ- and HSD-series controllers. Some entries may show identical numeric codes with different description text, while other entries are in fact different (HSD-series controller only) codes and descriptions. Be aware of these differences when decoding HSD-series controller error logs using Appendix C.

Table D-2 Host Interconnect Services Status Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
00000064  The "DSSI IDREQ send without receiving a DSSI ID in response" limit has been reached on Path A; the remote node is acknowledging the packets but not responding to them.
00000065  A DSSI ID or DSSI CNF packet (transmitted by the thread on behalf of Host Interconnect Services) could not be successfully transmitted.
00010009  Virtual circuit closed due to DSSI ID request failure.
00030009  Virtual circuit closed due to DSSI START failure.
00040009  Virtual circuit closed due to DSSI STACK failure.
00070009  Virtual circuit closed due to NAK ADP retry DSSI ID transmit failure.
000A0009  Not implemented in DSSI environment.
000B0009  Virtual circuit closed due to NOR ADP retry DSSI ID transmit failure.
000E0009  Not implemented in DSSI environment.
00100009  Not implemented in DSSI environment.
00120009  Not implemented in DSSI environment.
001D0009  Virtual circuit closed due to DSSI ID complete failure.
001F0009  Virtual circuit closed due to DSSI retry.
------------------------------------------------------------

Table D-3 DSSI Port/Port Driver Event Log (Template 32) Instance/MSCP Event Codes
------------------------------------------------------------
Instance Code  MSCP Event Code  Description
------------------------------------------------------------
4007640A       006A             DSSI Port detected error upon attempting to transmit a packet. This resulted in the closure of the Virtual Circuit.
------------------------------------------------------------

Table D-4 Host Interconnect Services Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
40000101  An unrecognized DSSI opcode was received by HIS. These packets are packets with DSSI opcodes recognized by the port but not by HIS. Last Failure Parameter[0] contains the DSSI opcode value.
------------------------------------------------------------

Table D-5 Host Interconnect Port Services Last Failure Codes
------------------------------------------------------------
Code      Description
------------------------------------------------------------
420C0100  HP_INIT could not allocate initial HTB for Path A.
420D0100  HP_INIT could not allocate HPHW structure.
42350100  HP found a negative offset in a Host Data transfer operation.
42640100  Scan packet que found bad path select case for DSSI.
42680102  Dssi_err_isr routine found that 720 reported an unexpected status for target mode. Last Failure Parameter[0] contains the 720 chip dstat register value. Last Failure Parameter[1] contains the 720 chip sist1 register value.
42690101  Dssi_isr routine found that the 720 script reported an invalid Receive status. Last Failure Parameter[0] contains the receive interrupt status written by the 720 chip.
426B0101  Dssi_err_isr routine found that 720 interrupted without status. Last Failure Parameter[0] contains the 720 chip istat register value.
42742001  Dssi_err_isr routine found that 720 reported a bus error on the FIB internal bus. Last Failure Parameter[0] contains the 720 chip dstat register value.
42752002  Dssi_err_isr routine found that 720 reported a bus error on the FIB internal bus. Last Failure Parameter[0] contains the 720 chip dstat register value. Last Failure Parameter[1] contains the 720 chip dcmd register value.
42760102  Dssi_err_isr routine found that 720 reported an unexpected status for initiator mode. Last Failure Parameter[0] contains the 720 chip dstat register value. Last Failure Parameter[1] contains the 720 chip sist1 register value.
42770102  Dssi_err_isr routine found that 720 reported an unexpected status for initiator mode. Last Failure Parameter[0] contains the 720 chip dstat register value. Last Failure Parameter[1] contains the 720 chip sist1 register value.
------------------------------------------------------------

D.4 Recommended Repair Action

Table D-6 shows a difference in description text for recommended repair actions on HSD-series controllers. Be aware of the difference when decoding HSD-series controller error logs using Appendix C.
Table D-6 Recommended Repair Action Codes ------------------------------------------------------------ Code Description ------------------------------------------------------------ 63 Check the DSSI adapter on the host system identified in the "remote node name" field for proper operation. ------------------------------------------------------------ D-4 HSD-Series Error Logging E ------------------------------------------------------------ HSZ-Series Error Logging This appendix details errors the HSZ-series controller will report in its host event logs under the DEC OSF/1 AXP operating system, as well as how to extract the information from the logs. ------------------------------------------------------------ Note ------------------------------------------------------------ Host event log translations are correct as of the date of publication of this manual. However, log information may change with firmware updates. Refer to your StorageWorks Array Controllers HSZ40 Array Controller Operating Firmware Release Notes for error log information updates. ------------------------------------------------------------ E.1 Reading an HSZ-Series Error Log Example E-1 shows an example of a uerf translated host error log. The uerf utility under the DEC OSF/1 AXP operating system will show the target and LUN of the unit in question. Use your current configuration information to match the unit to the devices it is mapped to. Then, test and/or service the devices on a case-by-case basis. HSZ-Series Error Logging E-1 Example E-1 was generated using the uerf -o full command on an HSZ40 controller with a KZTSA host adapter. Example E-1 The uerf utility Error Event Log ********************************* ENTRY 19. ********************************* ----- EVENT INFORMATION ----- EVENT CLASS ERROR EVENT OS EVENT TYPE 199. CAM SCSI SEQUENCE NUMBER 19. 
OPERATING SYSTEM                        DEC OSF/1
OCCURRED/LOGGED ON                      Tue Mar 15 12:36:47 1994
OCCURRED ON SYSTEM                      dombek
SYSTEM ID                   x0004000F
                                        CPU TYPE: DEC
                                        CPU SUBTYPE: KN15AA

----- UNIT INFORMATION -----

CLASS                       x0000       DISK
SUBSYSTEM                   x0000       DISK
BUS #                       x000E
                            x0392       LUN x2
                                        TARGET x2

----- CAM STRING -----

ROUTINE NAME                            cdisk_check_sense

----- CAM STRING -----

ROUTINE NAME                            cdisk_check_sense

----- CAM STRING -----

                                        Hardware Error bad block number: 0

----- CAM STRING -----

ERROR TYPE                              Hard Error Detected

----- CAM STRING -----

DEVICE NAME                             DEC HSZ40

----- CAM STRING -----

                                        Active CCB at time of error

----- CAM STRING -----

                                        CCB request completed with an error

ERROR - os_std, os_type = 11, std_type = 10

----- ENT_CCB_SCSIIO -----

*MY ADDR                    x8A960728
CCB LENGTH                  x00C0
FUNC CODE                   x01
CAM_STATUS                  x0084       CAM_REQ_CMP_ERR
                                        AUTOSNS_VALID
PATH ID                     14.
TARGET ID                   2.
TARGET LUN                  2.
CAM FLAGS                   x00000442   CAM_QUEUE_ENABLE
                                        CAM_DIR_IN
                                        CAM_SIM_QFRZDIS
*PDRV_PTR                   x8A960428
*NEXT_CCB                   x00000000
*REQ_MAP                    x8A971E00
VOID (*CAM_CBFCNP)()        x003B5520
*DATA_PTR                   x40023230
DXFER_LEN                   x00000200
*SENSE_PTR                  x8A960450
SENSE_LEN                   xA0
CDB_LEN                     x06
SGLIST_CNT                  x0000
CAM_SCSI_STATUS             x0002       SCSI_STAT_CHECK_CONDITION
SENSE_RESID                 x00
RESID                       x00000000
CAM_CDB_IO                  x0000000000000001DA681B08
CAM_TIMEOUT                 x0000003C
MSGB_LEN                    x0000
VU_FLAGS                    x4000
TAG_ACTION                  x20

----- CAM STRING -----

                                        Error, exception, or abnormal
                                        _condition

----- CAM STRING -----

                                        HARDWARE ERROR - Nonrecoverable
                                        _hardware error

----- ENT_SENSE_DATA -----

ERROR CODE                  x0070       CODE x70
SEGMENT                     x00
SENSE KEY                   x0004       HARDWARE ERR
INFO BYTE 3                 x00
INFO BYTE 2                 x00
INFO BYTE 1                 x00
INFO BYTE 0                 x00
ADDITION LEN                x98
CMD SPECIFIC 3              x00
CMD SPECIFIC 2              x00
CMD SPECIFIC 1              x00
CMD SPECIFIC 0              x00
ASC                         x44
ASQ                         x00
FRU                         x00
SENSE SPECIFIC              x000000

ADDITIONAL SENSE
0000: 00030000 01080108 00000206 40020000 *...............@*
0010: 01510309 08002800 01DA681B 01000000 *..Q..(...h......*
0020: 00000700 20202020 58432020 33323130 *....      CX0123*
0030: 37363534 5A373845 00000000 36333400 *4567E87Z.....436*
0040: 325A5241 20202038 43282020 45442029 *ARZ28     (C) DE*
0050: 00000043 00000000 00000004 00000000 *C...............*
0060: 01080000 00000000 00000000 00000000 *................*
0070: 00000000 00000000 00000000 00000000 *................*
0080: 00000000 00000000 00000000 00000000 *................*
0090: 7E250000 00005E3C 00000000 00000000 *..%~<^..........*

------------------------------------------------------------
Glossary

ac distribution
The method of distributing ac power in a cabinet.

ac power supply
A power supply designed to produce dc power from an ac input.

adapter
A device that converts the protocol and hardware interface of one bus type into that of another without changing the functionality of the bus. See signal converter.

American National Standards Institute
See ANSI.

ANSI
American National Standards Institute. An organization that develops and publishes electronic and mechanical standards.

array controller
A hardware/software device that facilitates communications between a host and one or more devices organized in an array. The HS controllers are array controllers.

BA350-Mx controller shelf
The StorageWorks controller shelf used for HS-family controller modules, cache modules, and shelf power units.

BA350-Sx SBB shelf
A StorageWorks shelf used for only power units and SBBs.

bad block
A block containing a defect that:
· Exceeds the correction capability of the subsystem error correction scheme.
· Exceeds a drive-specified error threshold. Once a block exceeds this threshold, data integrity is not guaranteed.
· Imposes too great a strain on system performance. In this case, the subsystem still assures data integrity, but the extensive error correction required for each block access causes too great a strain on system performance.

bad block replacement
See BBR.

battery backup unit
See BBU.

BBR
Bad block replacement.

BBU
StorageWorks battery backup unit that extends power availability after the loss of primary ac power or a power supply to protect against the corruption or loss of data.

BIST
Built-in self-test. BIST is the internal self-test routine for the HS controller module microprocessor chip.

block
A stream of data transferred as a unit. Used interchangeably with the term sector for disk drives to represent 512 bytes (for 16- and 32-bit host architectures) or 576 bytes (for 36-bit architectures). A block is the smallest data unit addressable on a subunit. It occupies a specific physical position relative to the index and is available for reading or writing once per disk rotation. The five types of blocks follow:
1. Diagnostic block--Used for drive read or write diagnostics. The diagnostic block area is not visible to the host operating system. However, it is visible to the controller. Diagnostic block addresses are 28 bits wide and are called diagnostic block numbers (DBNs).
2. External block--Contains the format control tables. The external block area is not visible to the host operating system. However, it is visible to the controller. External block addresses are 28 bits wide and are called external block numbers (XBNs).
3. Logical block--Contains the host applications area and the Replacement Control Table. All logical blocks are visible to the host operating system.
Logical block addresses are 28 bits wide and are called logical block numbers (LBNs).
4. Physical block--Contains all the blocks on a subunit. DBNs, LBNs, RBNs, and XBNs are subsets of the physical block area. Physical block addresses are 28 bits wide and are called physical block numbers (PBNs).
5. Replacement block--A reserved block used as a replacement for a bad block on a subunit. Replacement block addresses are 28 bits wide and are called replacement block numbers (RBNs).

blower
An airflow device mounted in a StorageWorks shelf.

Built-in self-test
See BIST.

cable distribution unit
See CDU.

carrier
A standard, StorageWorks shelf-compatible, plastic shell into which a device can be installed. Sometimes called SBB carrier.

CDU
Cable distribution unit. The power entry device for StorageWorks center cabinets. The unit provides the connections necessary to distribute ac power to cabinet shelves and fans.

CI bus
Digital's computer interconnect bus using two serial paths, each with a transfer rate of 70 Mb/s (8.75 MB/s).

CIRT
CI receiver/transmitter.

CI20
DECSYSTEM-20 interface to the CI bus.

CI750
VAX 11/750 and VAX 11/751 interface to the CI bus.

CI780
VAX 11/780 and VAX 11/782 interface to the CI bus.

CLI
Command line interpreter for, and user interface to, the HS-family controller firmware.

cluster
A collection of processors called nodes, attached to each other by a high-speed bus. These processors are independent and survivable. They may be general-purpose computers or special-purpose servers, such as the HS controller, providing a special set of services to the rest of the nodes.

command line interpreter
See CLI.

cold swap
A method of device replacement that requires that power be removed from all shelves in a cabinet. This method is used when conditions preclude the use of the warm swap or hot swap methods.

container
Either a single disk device, or group of disk devices linked as a storage set.

controller
A hardware/software device that facilitates communications between a host and one or more devices.

controller shelf
A StorageWorks shelf designed to contain controller and cache memory modules.

CRC
A checkword (polynomial checksum) generally appended to a disk data transfer. CRC is computed using data message bits as coefficients divided by a generating polynomial. The resulting remainder is the CRC. When a transmitter computes and transmits a CRC following a data transfer, the receiver can recompute and compare it with the received version to verify correct reception. EDC and ECC (both used by disks) are examples of CRC checkwords.

cyclic redundancy check
See CRC.

DAEMON
Diagnostic And Execution MONitor. DAEMON is a part of HS controller self-testing that includes port and cache initialization and self-test routines.

DAT
Digital Audio Tape. A format for recording digital data on a cartridge tape.

data center cabinet
A generic reference to the large cabinets, such as the SW800 series, in which StorageWorks components can be mounted.

device driver
An operating system software module used to physically control an I/O device. In DSA, conventional device drivers are replaced by a single driver for an entire class of devices, such as disk drives, and a single port driver for the host-to-controller transport mechanism. For example, a host computer communicating with an HSJ-series controller uses disk and tape class drivers and the CI port driver.

device shelf
A StorageWorks shelf designed to contain SBBs.

Diagnostic And Execution MONitor
See DAEMON.

Diagnostics and Utilities Protocol
See DUP.

digital audio tape
See DAT.

DIGITAL Standard Disk Format
See DSDF.

DSDF
The Digital Storage Architecture (DSA) standard for disk media format. DSDF specifies the mechanism for mapping a contiguous logical block address space into a possibly imperfect physical space, as well as defining diagnostic and factory areas. DSDF is transparent to the system.

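As a rough illustration of the CRC entry above, the following Python sketch computes a checkword by dividing the message bits by a generating polynomial and keeping the remainder. The 16-bit CCITT polynomial (x^16 + x^12 + x^5 + 1), the zero initial value, and the name crc16 are assumptions chosen for illustration; this glossary does not state which polynomial the HS controllers or their drives use.

```python
def crc16(data: bytes, poly: int = 0x1021) -> int:
    """Return the 16-bit remainder of data divided by the generating polynomial.

    Assumption: CCITT polynomial, zero initial value, no final XOR.
    """
    remainder = 0
    for byte in data:
        remainder ^= byte << 8          # bring the next 8 message bits in
        for _ in range(8):
            if remainder & 0x8000:      # top bit set: subtract (XOR) the polynomial
                remainder = ((remainder << 1) ^ poly) & 0xFFFF
            else:
                remainder = (remainder << 1) & 0xFFFF
    return remainder


# Transmitter side: compute the CRC and append it to the transfer.
message = b"512-byte block payload (shortened)"
crc = crc16(message)
frame = message + crc.to_bytes(2, "big")

# Receiver side: recompute over the data and compare with the received CRC.
received_data, received_crc = frame[:-2], int.from_bytes(frame[-2:], "big")
assert crc16(received_data) == received_crc

# A single flipped bit changes the recomputed CRC, so the mismatch is detected.
corrupted = bytes([received_data[0] ^ 0x01]) + received_data[1:]
assert crc16(corrupted) != received_crc
```

With these particular parameters, running the same division over the data plus its appended CRC leaves a zero remainder, which is another common way for a receiver to perform the check.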
DIGITAL Storage Architecture
See DSA.

DSA
A set of specifications and interfaces describing standards for designing mass storage products. DSA defines the functions performed by host computers, controllers, and drives. It also specifies how they interact to accomplish mass storage management.

DIGITAL Storage System Interconnect
See DSSI.

DILX
Disk inline exerciser. Diagnostic firmware used to test the data transfer capabilities of disk drives in a way that simulates a high level of user activity.

Disk Inline Exerciser
See DILX.

DSSI
Digital's storage system interconnect bus with an 8-bit data transfer rate of 4-5 MB/s.

dual universal asynchronous receiver transmitter
See DUART.

dual cabinet power configuration
A cabinet ac power configuration in which two ac sources and two ac power supplies are used to supply dc power to the cabinet's SBB shelves.

dual porting (or dual access)
The ability of a disk or tape drive to be accessed by two controllers. All DSA drives have a standard dual-port feature. DSA drives can be online to only one controller at a time. However, they are able to disconnect themselves from a failed controller (or be disconnected by a failing controller) and become available for continued service through the other controller.

dual shelf power configuration
A cabinet ac power configuration in which one ac source and two ac power supplies are used to supply dc power to the cabinet's SBB shelves.

dual-redundant configuration
A controller configuration consisting of a primary and backup controller in one controller shelf. Both controllers normally share access to each other's devices. If the primary controller fails, the backup controller assumes control over the failing controller's devices.

DUART
Dual Universal Asynchronous Receiver Transmitter. An integrated circuit containing two serial, asynchronous transceiver circuits.

DUP
Diagnostic and Utility Protocol.
Host application software that allows a host operator terminal to connect to the controller's command line interpreter. See also virtual terminal.

ECC
One or more cyclic redundancy check (CRC) words that allow detection of a mismatch between transmitted and received data in a communications system, or between stored and retrieved data in a storage system. The ECC allows for location and correction of an error in the received/retrieved data. All ECCs have limited correction power.

EDC
One or more checksum words that allow detection of a mismatch between transmitted and received data in a communications system, or between stored and retrieved data in a storage system. The EDC has no data correction capability.

EIP
Error information packet. The EIP includes bytes of data meant to be decoded into information explaining error events.

electromagnetic interference
See EMI.

electrostatic discharge
See ESD.

EMI
Electromagnetic interference. The impairment of a signal by an electromagnetic disturbance.

error correction code
See ECC.

error detection code
See EDC.

error information packet
See EIP.

ESD
Electrostatic discharge. The discharge of a potentially harmful static electric voltage as a result of improper grounding.

EXEC
Firmware executive. EXEC is the portion of HS controller firmware that acts as the operating system for the controller.

extended status
An additional set of status information maintained by the drive that is of interest to a host error log. Extended status is drive-type specific and is not utilized by the controller except as input to the host error log and diagnostic processes.

failover
A software process that takes place when one controller fails in a dual-redundant configuration and the other controller takes over service to the devices of the failed controller.

fan
An airflow device mounted in a StorageWorks cabinet.

fast, differential SCSI
See FD SCSI.

fast, wide, differential SCSI
See FWD SCSI.

FD SCSI
The fast, differential SCSI bus with an 8-bit data transfer rate of 10 MB/s. See also FWD SCSI and SCSI.

field replaceable unit
See FRU.

filler panel
A sheet metal or plastic panel used to cover unused mounting areas in StorageWorks cabinets and shelves.

firmware executive
See EXEC.

flush
To write cached data to storage.

FRU
Field replaceable unit.

full-height device
A single device that occupies an entire 5.25 inch SBB carrier. StorageWorks full-height devices have an order number suffix of "-VA".

FWD SCSI
The fast, wide, differential SCSI bus with a 16-bit data transfer rate of up to 20 MB/s. See also FD SCSI and SCSI.

half-height device
A device that occupies half of a 5.25 inch SBB carrier. Two half-height devices can be mounted in a 5.25 inch SBB carrier. The first half-height device is normally mounted in the lower part of the carrier. The second device is normally mounted in the upper part of the carrier.

HBVS
Host-Based Volume Shadowing. Also known as Phase 2 Volume Shadowing.

HBVS assistance
RAID level 1a. The HS controller performs HBVS assistance by independently directing shadow copy operations that were requested by the host between two units under the given controller.

Hierarchical Storage Controller
See HSC.

HIS
Host Interconnect Services. The firmware that communicates with the host in HS-family controllers.

host
The primary or controlling computer to which a storage subsystem is attached.

Host-Based Volume Shadowing
See HBVS.

Host Interconnect Services
See HIS.

host logical unit
A virtual group of devices addressable as a unit. See also logical unit.

hot swap
A method of device replacement whereby the complete system remains on line and active during device removal and reinstallation. The device being removed or reinstalled is the only device that cannot perform operations during this process.

HSC
Hierarchical Storage Controller. An intelligent mass storage server used on the CI bus.
Capable of supporting a total of eight disk and/or tape data channels, the HSC is part of the System Interconnect Architecture and Digital Storage Architecture. By performing as an I/O manager, the HSC can be classified as an I/O server, removing the burden of I/O management from the CPU.

initiator
The SCSI bus member that requests an operation be performed by another member (target). When the HS controller interacts with physical storage devices, it is the initiator. Furthermore, when the host CPU interacts with the HSZ-series controller, the host is the initiator.

instance code
The four-byte value transmitted in the error log packet that is key to interpreting the error.

KILL line
The controller-to-controller disable signal used in a dual-redundant configuration.

least recently used
See LRU.

logical unit
A virtual group of devices addressable as a unit. Also called host logical unit.

logical unit number
See LUN.

LRU
Least recently used. This is cache terminology for the block replacement policy for the read cache.

LUN
A value of 0 through 7 that identifies a logical unit to a SCSI initiator.

maintenance terminal
The operator terminal used to identify an HS-family controller, to enable its host paths, to define its subsystem configuration, and to check its status. The HS-family maintenance terminal interface is designed to accept any terminal conforming to EIA-423. A maintenance terminal is only required to initially configure a controller and is not required for normal operations.

Mass Storage Control Protocol
See MSCP.

MIST
Module integrity self-test. MIST tests controller functions upon initialization. See also DAEMON.

Module integrity self-test
See MIST.

MSCP
Mass Storage Control Protocol. The message-level protocol used by the HSJ- and HSD-series controllers to communicate with a host computer. The three types of MSCP communication are sequential messages, datagrams, and block data transfers.

Network Interconnect
See NI.

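The LRU entry above names the block replacement policy for the read cache. A minimal Python sketch of that policy follows, assuming a hypothetical ReadCache class keyed by logical block number and a caller-supplied read_from_device function. Neither name comes from this manual, and the sketch does not reflect the controller firmware's actual data structures.

```python
from collections import OrderedDict
from typing import Callable


class ReadCache:
    """Toy read cache that evicts the least recently used block when full."""

    def __init__(self, capacity: int, read_from_device: Callable[[int], bytes]):
        self.capacity = capacity
        self.read_from_device = read_from_device  # hypothetical backing-store read
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, lbn: int) -> bytes:
        if lbn in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(lbn)      # mark as most recently used
            return self.blocks[lbn]
        self.misses += 1
        data = self.read_from_device(lbn)
        self.blocks[lbn] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # drop the least recently used block
        return data


cache = ReadCache(capacity=2, read_from_device=lambda lbn: b"block-%d" % lbn)
cache.read(10)   # miss
cache.read(11)   # miss
cache.read(10)   # hit; block 10 becomes most recently used
cache.read(12)   # miss; evicts block 11, the least recently used
```

After these four reads the cache holds blocks 10 and 12: block 11 was touched least recently, so it is the one replaced.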
NI
One of two standard interconnects used in the System Interconnect Architecture (CI is the other). The NI (also known as the Ethernet) connects communications servers and compute servers, creating a local area network.

node
An intelligent entity in a distributed computing configuration. Nodes are independent but linked, as in a network or a cluster, becoming parts of a whole. In a cluster, HSJ-series controllers and host computers are cluster nodes.

nonredundant
A configuration in which there is no backup hardware in place for the hardware that is present.

nontransportable
A device setting that indicates the device is MSCP compliant and contains metadata. Nontransportable devices can be moved amongst HS controller subsystems, but not taken directly to non-HS controller systems. See also transportable.

nonvolatile
See NV.

nonvolatile memory
See NVMEM.

nonvolatile parameter memory
See NVPM.

NV
Nonvolatile. A term used to describe memory, the contents of which survive loss of power.

NVMEM
Nonvolatile memory. NVMEM is the battery backed-up SRAM on the controller module.

NVPM
Nonvolatile parameter memory. NVPM is a portion of NVMEM used to store controller configuration data.

OCP
Operator control panel. The control/indicator panel associated with a device. The OCP is usually mounted on the device and is accessible to the operator.

offline
One of the possible status conditions of a mass storage device or server. When a device is offline, it is not capable of communicating with the controller. When the controller is offline, it is inaccessible to any node in the configuration.

operator control panel
See OCP.

PCMCIA
Personal Computer Memory Card International Association. An organization that develops standards for ROM memory cards.

Personal Computer Memory Card International Association
See PCMCIA.

port
The hardware and software used to connect a host controller to a communication bus, such as a CI, SCSI, or SDI bus.

port/target/LUN
See PTL.

program card
The PCMCIA card containing the HS controller operating firmware.

PTL
Port/target/LUN. PTL is a three-number hierarchical value representing a device location to a SCSI initiator. For example, PTL 143 is a device on port 1 of the initiator, target 4 on port 1, and LUN 3 under target 4.

qualified device
A device that has been fully tested in all appropriate StorageWorks hardware and software configurations, and is in complete compliance with Digital and country-specific standards (for example, FCC and TÜV).

quiesce
To make a bus inactive or dormant. The operator must quiesce SCSI bus operations, for example, during a device warm swap.

radio frequency interference
See RFI.

redundant array of independent disks
See RAID.

read cache
A block of high-speed memory used by a controller to buffer data being read from storage devices by a host. A read cache increases the controller's effective device access speed by satisfying host read requests from its local cache memory when possible, instead of from external storage devices. The controller maintains in the cache copies of data recently requested by the host, and may fetch blocks of data ahead in anticipation that the host will access the next sequential blocks. In a basic read cache, host write requests are handled without involving the cache. See also write through cache.

RAID
Redundant array of independent disks. A set of storage techniques devised to increase the performance and availability of a storage subsystem.

restore
Data previously backed up on tape is retrieved for disk storage using the normal priority. Backup is used to preserve information in the event of a disk failure. Restore is used to recover the information.

RFI
Radio frequency interference. The impairment of a signal by an unwanted radio signal or radio disturbance.

SBB
StorageWorks building block. A device housed in a standard StorageWorks SBB carrier.
An SBB has a standard physical and electrical interface that is compatible with those of StorageWorks shelves and enclosures.

SBB shelf
A StorageWorks shelf, such as the BA350-SB, designed to house plug-in SBB modules.

SCA
The interface specifications and protocols defining the connection of independent computer systems into clusters.

SCS
System Communication Services. A delivery protocol for packets of information (commands or data) to or from the host.

SCSI
Small Computer System Interface. An ANSI interface defining the physical and electrical parameters of a parallel I/O bus used to connect hosts to a maximum of seven devices. The StorageWorks device interface is implemented according to the SCSI-2 standard, allowing the synchronous transfer of 8-bit data at rates of up to 10 MB/s.

shelf brackets
Sheet metal components designed to attach and position StorageWorks shelves in their associated enclosures.

signal converter
A device that converts the protocol and hardware interface of one bus type into that of another without changing the functionality of the bus. See adapter.

single cabinet power configuration
A cabinet ac power configuration in which only one ac source and one ac power supply are used to supply dc power to the cabinet's SBB shelves.

skirt
A trim panel designed to mount around the base of the cabinet.

Small Computer System Interface
See SCSI.

standard disk interface
See SDI.

standard tape interface
See STI.

storage set
A grouping of disk drives that make up a new distinct container.

StorageWorks
Digital's family of modular data storage products that allows customers to design and configure their own storage subsystems. Components include power, packaging, cabling, devices, controllers, and software. Customers can integrate devices and array controllers in StorageWorks enclosures to form storage subsystems.

StorageWorks building block
See SBB.

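The PTL entry in this glossary gives PTL 143 as port 1, target 4, LUN 3. The short Python sketch below splits such a value into its three parts. It assumes each field is a single decimal digit, as in the manual's example; the names PTL and parse_ptl are hypothetical and are not part of the controller CLI.

```python
from typing import NamedTuple


class PTL(NamedTuple):
    port: int
    target: int
    lun: int


def parse_ptl(value: str) -> PTL:
    """Split a three-digit PTL string such as "143" into port, target, and LUN."""
    if len(value) != 3 or not value.isdigit():
        raise ValueError(f"expected three decimal digits, got {value!r}")
    port, target, lun = (int(digit) for digit in value)
    if lun > 7:
        # The LUN entry in this glossary limits LUNs to 0 through 7.
        raise ValueError(f"LUN must be 0 through 7, got {lun}")
    return PTL(port, target, lun)


location = parse_ptl("143")
print(f"port {location.port}, target {location.target}, LUN {location.lun}")
```

Keeping the three fields in a named tuple preserves the hierarchy the entry describes: the target is only meaningful on its port, and the LUN only under its target.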
stripeset
In a RAID configuration, a virtual disk drive with its physical data spread across multiple physical disks. Stripeset configurations do not include a data recovery mechanism.

supported device
A device tested as functionally compatible with an approved StorageWorks hardware and software configuration.

surviving controller
The controller in a dual-redundant pair that assumes service to its companion's devices when the companion fails. See also failover.

System Communication Architecture
See SCA.

System Communications Services
See SCS.

Tape Inline Exerciser
See TILX.

Tape Mass Storage Control Protocol
See TMSCP.

target
A member of a SCSI bus responsible for carrying out operations requested by an initiator. The physical storage devices are targets of the HS controller. Also, the HSZ-series controller is a target of its host CPU.

TILX
Tape Inline Exerciser. Diagnostic firmware used to test the data transfer capabilities of tape drives in a way that simulates a high level of user activity.

TMSCP
Tape Mass Storage Control Protocol. An applications protocol used by the HSJ- and HSD-series controllers to communicate with the host computer. TMSCP is tape-specific but overlaps and shares certain portions of MSCP.

transportable
A device setting that indicates the device is not MSCP compliant and does not contain metadata. Transportable devices can be moved between HS controller subsystems and non-HS controller systems. However, such devices do not support forced error, and should not be set to transportable after correct installation in an HS controller subsystem. See also nontransportable.

VAXcluster System Console
See VCS.

VCS
VAXcluster System Console. This terminal allows access to hosts (by networks) and provides another method of accessing the controller. See also DUP.

virtual terminal
A software path from an operator terminal on the host to the controller's CLI interface.
The path can be established via the host port on the controller (using DUP) or via the maintenance port through an intermediary host (VCS). A virtual terminal is also sometimes called a host console.

warm swap
A controller function that allows devices to be added, removed, or replaced while the subsystem remains operational. All activity on the device's SCSI bus must normally be halted for the duration of the warm swap operation.

write through cache
A technique for handling host write requests in read caches. When the host requests a write operation, the cache writes data directly to the external storage device and updates the cache memory to make sure that the memory does not contain obsolete data. This technique increases the chances that future host read requests can be filled from the cache. The host sees the write operation as complete only after the external storage device has been updated. Also see read cache.

------------------------------------------------------------
Index

3½-Inch SBBs configurations, 3-10 restrictions, 3-9 5¼-Inch SBBs configurations, 3-13 restrictions, 3-9 A ------------------------------------------------------------ Abort codes HSJ-, HSD-series DILX, 6-29 TILX, 6-49 HSZ-series DILX, 6-65 Acceptance test, 4-10 ADD CDROM command, B-2 ADD DISK command, B-3 ADD STRIPESET command, B-5 ADD TAPE command, B-6 ADD UNIT command, B-7 Adding physical devices, 4-9, 7-12, B-78 Adding storage sets, B-78 Adding units, B-78 Allocation class, 4-5, 4-7, 7-10, 7-17, 7-46 Amber LEDs, 5-3 AUTOGEN.COM file recognized devices, 4-13 required modifications, 4-13 Availability configuration, 3-18, 3-19 B ------------------------------------------------------------ Basic function test HSJ-, HSD-series DILX, 6-7 TILX, 6-32 HSZ-series DILX, 6-51 BIST, 6-2 Bit Flags Connection State Codes 0000, C-60 0001, C-60 0002, C-60 0003, C-60 0004, C-60 0005, C-60 0006, C-60 0007, C-60 0008, C-60 0009, C-60 000A, C-60 000B,
C-60 Virtual Circuit State Codes 0001, C-58 0002, C-58 0003, C-58 0004, C-58 0005, C-58 Blower, 7-34 installing, 7-36 removing, 7-35 replacing, 7-36 service of, 7-34 service precautions, 7-34 tools, 7-34 Boot See Initialization Built-in self-test See BIST Bus exchanger, 2-4 C ------------------------------------------------------------ Cabinet grounding stud, 7-3 Cabinets configurations, 3-1 Cable See also CI cable, external See also CI cable, internal See also Device port cable See also DSSI host cable See also SCSI host cable CI, 1-9, 7-23, 7-25 DSSI, 1-9, 7-27 handling guidelines, 1-8 SCSI, 1-9, 7-29 SCSI (device port), 7-31 Cache module, 2-5, 6-4, 7-19 See also Read cache DAEMON, 6-4 Index-1 Cache module (cont'd) error messages, 5-12 failover, 5-1 how to identify, 7-20 operation, 2-5 read cache, 2-5 service consideration, 5-1 service of, 7-19 size restriction, 1-3 specifications, 1-9 testing of, 6-4 upgrading, 7-20 write-through, 2-5 Certification Class A, xxi EMI, xxi Federal Republic of Germany, xxi Chunksize How to change, B-37 CI cable service precautions, 1-9 CI cable, external, 7-23 installing, 7-25 order for removal, 7-24 order for replacement, 7-25 removing, 7-23 replacing, 7-25 service of, 7-23 service precautions, 7-23 tools, 7-23 CI cable, internal, 7-25 installing, 7-26 removing, 7-26 replacing, 7-26 service of, 7-25 service precautions, 7-25 tools, 7-25 CI host interconnection supported protocols, 2-9 CI node number, 4-5, 4-6, 7-10, 7-16, 7-46 restriction, 4-12 CLEAR_ERRORS CLI command, B-11 CLI accessing, 4-2 command sets, 4-3 described, 4-2 error conventions, B-64 error messages, B-64 error messages, automatic, 5-14 error messages, interactive, 5-16 exiting, 4-3 firmware, 2-10 warning conventions, B-74 warning messages, B-74 CLI commands, B-1 Cluster size, 4-14 Codes CI Message Operation Codes 00, C-58 01, C-58 02, C-58 03, C-58 04, C-58 05, C-58 06, C-58 07, C-58 08, C-58 09, C-58 10, C-58 11, C-58 12, C-58 13, C-58 0A, C-58 0B, C-58 0C, C-58 
0D, C-58 0E, C-58 0F, C-58 Controller Restart Codes 0, C-118 1, C-118 Event Codes 0007, C-89, C-91 0014, C-84, C-90 0016, C-89, C-91, C-92 0037, C-92 0077, C-92 0097, C-92 0103, C-89, C-91 002A, C-84, C-85 006A, C-81, C-82, C-83, C-84, C-86, D-3 008A, C-85 012A, C-79, C-80 016A, C-84 020A, C-80 022A, C-78, C-79 040A, C-78 01AA, C-84, C-85, C-86 000B, C-89, C-91 002B, C-89, C-91 012B, C-89, C-90, C-91 014B, C-89, C-91 01CA, C-85 00CB, C-89, C-91 00E8, C-89, C-91 01EA, C-85, C-86 03EA, C-78 00EB, C-86, C-87, C-88, C-89, C-90, C-91 Event Notification/Recovery Threshold Classification Value 01, C-119 02, C-119 64, C-119 Index-2 Codes Event Notification/Recovery Threshold Classification Value (cont'd) 0A, C-119 Firmware Component Identifier Codes 01, C-56 02, C-56 03, C-56 04, C-56 06, C-56 07, C-56 08, C-56 20, C-56 40, C-56 42, C-56 60, C-56 61, C-56 62, C-56 80, C-56 81, C-56 82, C-56 83, C-56 flashing OCP, 5-4 Format Codes 00, C-22, C-26, C-32, C-36, C-38, C-40, C-43 01, C-28, C-30, C-34 02, C-45 05, C-50 09, C-48 0A, C-54 Host Interconnect Services Status Codes 00000000, C-56 00000001, C-56 00000002, C-56 00000003, C-56 00000004, C-56 00000009, C-56 00000032, C-56 00000033, C-56 00000034, C-56 00000035, C-56 00000036, C-57 00000064, C-57, D-2 00000065, C-57, D-2 00010009, C-57, D-2 00020009, C-57 00030009, C-57, D-2 00040009, C-57, D-2 00050009, C-57 00060009, C-57 00070009, C-57, D-2 00080009, C-57 00090009, C-57 00100009, C-57, D-2 00110009, C-57 00120009, C-57, D-2 00130009, C-57 Codes Host Interconnect Services Status Codes (cont'd) 00140009, C-57 00150009, C-57 00160009, C-57 00170009, C-57 00180009, C-57 00190009, C-57 000A0009, C-57, D-2 001A0009, C-57 000B0009, C-57, D-2 001B0009, C-57 000C0009, C-57 001C0009, C-57 000D0009, C-57 001D0009, C-57, D-2 000E0009, C-57, D-2 001E0009, C-57 000F0009, C-57 001F0009, C-57, D-3 HSJ30/40 Controller Vendor Specific SCSI ASC/ASCQ Codes 80 03, C-77 80 06, C-77 80 07, C-77 82 01, C-77 84 04, C-77 85 05, C-77 89 00, C-77 
93 00, C-77 8A 00, C-77 A0 00, C-77 A0 01, C-77 A0 02, C-77 A0 03, C-77 A0 04, C-77 A0 05, C-77 A1 00, C-77 A1 01, C-77 A1 02, C-77 A1 03, C-77 B0 00, C-77 B0 01, C-77 8C 04, C-77 D0 01, C-77 D0 02, C-77 D0 03, C-77 D1 00, C-77 D1 02, C-77 D1 03, C-77 D1 04, C-77 D1 05, C-78 D1 07, C-78 D1 08, C-78 D1 09, C-78 D1 0A, C-78 D1 0B, C-78 D2 00, C-78 Index-3 Codes HSJ30/40 Controller Vendor Specific SCSI ASC/ASCQ Codes (cont'd) D3 00, C-78 D4 00, C-78 D5 02, C-78 D7 00, C-78 8F 00, C-77 3F 85, C-77 3F 87, C-77 3F 88, C-77 3F 90, C-77 3F C0, C-77 3F C2, C-77 3F D2, C-77 Instance Codes 01010302, C-78 01032002, C-79 02020064, C-90 02032001, C-79 02042001, C-79 02072201, C-80 02082201, C-80 02090064, C-89 02110064, C-90 03010101, C-84 03022002, C-84 03034002, C-84 03044402, C-84 03052002, C-85 03062002, C-85 03070101, C-85 03080101, C-85 03094002, C-89 03104002, C-89 03134002, C-89 03144002, C-89 03154002, C-89 03164002, C-89 03170064, C-89 03180064, C-89 03194002, C-89 03204002, C-90 03214002, C-90 03224002, C-90 03234002, C-90 03244002, C-90 03254002, C-90 03270101, C-86 03644002, C-91 03674002, C-91 03694002, C-91 03704002, C-91 03714002, C-91 03720064, C-91 03730064, C-91 03744002, C-91 03754002, C-91 Codes Instance Codes (cont'd) 03760101, C-91 03774002, C-91 03784002, C-91 03794002, C-91 03804002, C-91 03820101, C-84 03832002, C-84 03844002, C-84 03854402, C-85 03862002, C-85 03872002, C-85 03880101, C-85 03890101, C-86 03964002, C-92 03994002, C-92 07050064, C-78 40016001, C-81 40026001, C-81 40440064, C-81 82012002, C-80 82022202, C-80 82032202, C-80 82042002, C-80 82052002, C-80 82062002, C-80 82072002, C-80 82082002, C-80 0102030A, C-78 0311430A, C-89 0312430A, C-89 0326450A, C-90 0328450A, C-89 0368000A, C-91 0381450A, C-91 4003640A, C-81 4004020A, C-81 4007640A, C-81, D-3 4009640A, C-81 4015020A, C-82 4029010A, C-82 4051020A, C-82 4052020A, C-82 4053020A, C-82 4054020A, C-82 4055020A, C-82 4056020A, C-82 4057020A, C-82 4058020A, C-82 4059020A, C-82 4060020A, 
C-83 4061020A, C-83 4062020A, C-83 4063020A, C-83 4064020A, C-83 4065020A, C-83 4066020A, C-83 Index-4 Codes Instance Codes (cont'd) 4067020A, C-83 039A000A, C-92 020A0064, C-91 021A0064, C-84 038A0101, C-86 402A010A, C-82 405A020A, C-82 03A04002, C-92 03A14002, C-92 03A24002, C-92 03A34002, C-92 031A4002, C-89 036A4002, C-91 037A4002, C-91 03A40064, C-92 03A50064, C-92 03A64002, C-92 400A640A, C-81 03A74002, C-92 03A80101, C-92 03A94002, C-92 03AA4002, C-92 03AB4002, C-92 03AC4002, C-92 03AD4002, C-92 03AE4002, C-92 03AF4002, C-92 021B0064, C-84 031B0101, C-89 402B010A, C-82 405B020A, C-82 03B04002, C-92 07030B0A, C-78 07040B0A, C-78 07080B0A, C-79 03B14002, C-92 03B24002, C-92 03B3450A, C-92 036B4002, C-91 037B4002, C-91 039B4002, C-92 03B40101, C-84 038B450A, C-91 03B52002, C-84 03B64002, C-84 400B640A, C-81 03B74402, C-85 03B82002, C-85 03B92002, C-85 03BA0101, C-85 03BB0101, C-86 03BC0101, C-86 03BD450A, C-92 07060C01, C-79 07070C01, C-79 402C010A, C-82 Codes Instance Codes (cont'd) 405C020A, C-82 020C2201, C-80 030C4002, C-89 031C4002, C-89 037C4002, C-91 039C4002, C-92 036C430A, C-91 400C640A, C-81 03C80101, C-84 03C92002, C-84 03CA4002, C-84 03CB0101, C-86 03CC0101, C-86 03CD2002, C-85 03CE2002, C-85 03CF0101, C-85 030D000A, C-89 403D020A, C-81 405D020A, C-83 03D04002, C-86 03D14002, C-86 03D24402, C-85 03D3450A, C-87 031D4002, C-89 037D4002, C-91 039D4002, C-92 036D430A, C-91 03D4450A, C-87 03D5450A, C-87 400D640A, C-81 03D6450A, C-87 03D7450A, C-87 03D8450A, C-88 03D9450A, C-88 03DA450A, C-88 03DB450A, C-88 03DC450A, C-88 03DD450A, C-88 03DE450A, C-88 03DF450A, C-88 405E020A, C-83 03E0450A, C-88 03E1450A, C-88 03E2450A, C-89 030E4002, C-89 031E4002, C-89 036E4002, C-91 037E4002, C-91 039E430A, C-92 400E640A, C-81 03F00402, C-86 405F020A, C-83 03F10502, C-86 03F20064, C-87 03F30064, C-87 030F4002, C-89 Index-5 Codes Instance Codes (cont'd) 031F4002, C-90 036F4002, C-91 037F4002, C-91 039F430A, C-92 Last Failure Codes firmware 01000100, C-93 01010100, C-93 
01020100, C-93 01030100, C-93 01040100, C-93 01050104, C-93 01060100, C-93 01070100, C-93 01082004, C-93 02000100, C-97 02010100, C-97 02040100, C-97 02050100, C-97 02080100, C-97 02090100, C-97 02100100, C-97 02170100, C-97 02180100, C-97 02210100, C-97 02220100, C-97 02270104, C-97 02360101, C-98 02370102, C-98 02440100, C-98 02530102, C-98 02560102, C-99 02570102, C-99 02620102, C-99 02690102, C-99 02720100, C-99 02730100, C-99 02790102, C-99 02800100, C-100 02820100, C-100 02830100, C-100 02840100, C-100 02850100, C-100 02860100, C-100 02880100, C-100 02890100, C-100 02900100, C-100 02910100, C-100 02920100, C-100 02950100, C-100 02960100, C-100 02970100, C-100 03020101, C-101 03030101, C-101 03040101, C-101 03050101, C-101 Codes Last Failure Codes firmware (cont'd) 03060101, C-101 03070101, C-101 03080101, C-101 03150100, C-101 03280100, C-101 03290100, C-101 03320101, C-102 03370108, C-103 03390108, C-104 03410101, C-105 03470100, C-106 03480100, C-106 03490100, C-106 04010101, C-107 04020102, C-107 04030102, C-107 04040103, C-107 04050100, C-107 04060100, C-107 04070103, C-107 04080102, C-107 04090100, C-107 06010100, C-108 06020100, C-108 06030100, C-108 07010100, C-108 07020100, C-108 07030100, C-108 07040100, C-108 07050100, C-108 07060100, C-108 08010101, C-109 08020100, C-109 08030101, C-109 08040101, C-109 08050100, C-109 08060100, C-109 08070100, C-109 08080000, C-109 08090010, C-109 08100101, C-109 08110101, C-109 08120100, C-109 08130100, C-109 08140100, C-109 08150100, C-109 08160100, C-109 08170100, C-109 08180100, C-110 08190100, C-110 20010100, C-110 20020100, C-110 20030100, C-110 20070100, C-110 20080000, C-110 Index-6 Codes Last Failure Codes firmware (cont'd) 20090010, C-110 40000101, C-111, D-3 40150100, C-111 40280100, C-111 40290100, C-111 40300100, C-111 40510100, C-111 40520100, C-111 40530100, C-111 40560100, C-111 40900100, C-111 40930100, C-111 40950100, C-111 40960100, C-111 40970100, C-111 40980100, C-111 42000100, C-112 42020100, 
C-112 42030100, C-112 42060100, C-112 42340100, C-112 42350100, D-3 42640100, D-3 42680102, D-3 42690101, D-3 42742001, D-3 42752002, D-3 42760102, D-3 42770102, D-4 60000100, C-113 60010100, C-113 60030100, C-113 60040100, C-113 60050100, C-113 60060100, C-113 60070100, C-113 60080100, C-113 60090100, C-113 60100100, C-113 60110100, C-113 60120100, C-113 60130100, C-113 60140100, C-113 60150100, C-113 60160100, C-113 60170100, C-113 60180100, C-113 60190100, C-113 60250100, C-114 60260100, C-114 60270100, C-114 60280100, C-114 60290100, C-114 60400100, C-114 60410100, C-114 Codes Last Failure Codes firmware (cont'd) 60420100, C-114 60430100, C-114 60440100, C-114 60450100, C-114 60460100, C-115 60480100, C-115 60490100, C-115 60500100, C-115 60550100, C-115 60560100, C-115 60570100, C-115 60580100, C-115 60610100, C-115 60620100, C-115 60640100, C-115 60650100, C-115 60660100, C-115 60670100, C-115 60680100, C-115 61020100, C-116 61090100, C-116 62000100, C-116 62010100, C-116 62020100, C-116 62030100, C-116 80010100, C-116 80020100, C-116 80030100, C-116 80040100, C-116 80050100, C-116 80060100, C-116 80070100, C-116 80080100, C-116 80090100, C-116 80100100, C-117 80120100, C-117 80130100, C-117 80140100, C-117 81010100, C-117 81020100, C-117 81030100, C-117 81040100, C-117 81050100, C-117 81060100, C-117 81070100, C-117 81080100, C-117 81090100, C-117 81100100, C-117 81110100, C-118 81120100, C-118 81130100, C-118 81140100, C-118 83010100, C-118 83020100, C-118 83030100, C-118 Index-7 Codes Last Failure Codes firmware (cont'd) 080A0000, C-109 200A0000, C-110 020A0100, C-97 028A0100, C-100 030A0100, C-101 032A0100, C-101 081A0100, C-110 402A0100, C-111 601A0100, C-113 602A0100, C-114 604A0100, C-115 800A0100, C-117 810A0100, C-117 811A0100, C-118 025A0102, C-99 424B0001, C-112 020B0100, C-97 021B0100, C-97 028B0100, C-100 029B0100, C-100 032B0100, C-101 080B0100, C-109 200B0100, C-110 402B0100, C-111 407B0100, C-111 420B0100, C-112 600B0100, C-113 601B0100, C-113 
602B0100, C-114 603B0100, C-114 604B0100, C-115 606B0100, C-115 800B0100, C-117 810B0100, C-117 811B0100, C-118 081B0101, C-110 426B0101, D-3 025B0102, C-99 027B0102, C-100 40B40101, C-111 424C0001, C-112 020C0100, C-97 021C0100, C-97 028C0100, C-100 029C0100, C-100 080C0100, C-109 200C0100, C-110 402C0100, C-111 407C0100, C-111 409C0100, C-111 420C0100, D-3 600C0100, C-113 601C0100, C-114 602C0100, C-114 603C0100, C-114 Codes Last Failure Codes firmware (cont'd) 606C0100, C-115 610C0100, C-116 800C0100, C-117 810C0100, C-117 811C0100, C-118 033C0101, C-104 025C0102, C-99 021D0100, C-97 027D0100, C-100 028D0100, C-100 040D0100, C-108 080D0100, C-109 409D0100, C-111 420D0100, D-3 600D0100, C-113 601D0100, C-114 602D0100, C-114 604D0100, C-115 605D0100, C-115 606D0100, C-116 800D0100, C-117 810D0100, C-117 200D0101, C-110 402D0101, C-111 020E0100, C-97 021E0100, C-97 027E0100, C-100 028E0100, C-100 031E0100, C-101 408E0100, C-111 600E0100, C-113 601E0100, C-114 602E0100, C-114 605E0100, C-115 606E0100, C-116 800E0100, C-117 810E0100, C-117 080E0101, C-109 402E0101, C-111 022E0102, C-98 033E0108, C-105 020F0100, C-97 021F0100, C-97 027F0100, C-100 028F0100, C-100 031F0100, C-101 402F0100, C-111 408F0100, C-111 600F0100, C-113 601F0100, C-114 603F0100, C-114 604F0100, C-115 605F0100, C-115 810F0100, C-117 080F0101, C-109 Index-8 Codes Last Failure Codes firmware (cont'd) 040F0102, C-108 033F0108, C-105 hardware 01800080, C-93 01812088, C-94 01822288, C-94 01832288, C-95 01842288, C-95 01852288, C-96 01860080, C-96 01870080, C-96 01880080, C-96 01890080, C-96 02392084, C-98 03330188, C-102 03350188, C-102 03360188, C-103 03380188, C-104 03420188, C-106 42332080, C-112 42382080, C-112 42392080, C-112 42442080, C-112 42452080, C-112 42472080, C-112 42482080, C-112 018A0080, C-96 034A2080, C-106 423A2080, C-112 023A2084, C-98 030B0188, C-101 423B2080, C-112 423C2080, C-112 423D2080, C-112 424D2080, C-112 423E2080, C-112 424E2080, C-112 423F2080, C-112 Port/Port Driver 
Message Operation Codes 0000, C-59 0001, C-59 0002, C-59 0003, C-59 0004, C-59 0005, C-59 0006, C-59 Recommended Repair Action Codes 00, C-120 01, C-120 02, C-120 03, C-120 04, C-120 05, C-120 06, C-120 07, C-120 Codes Recommended Repair Action Codes (cont'd) 08, C-120 09, C-120 20, C-121 22, C-121 40, C-121 41, C-121 43, C-121 44, C-121 45, C-121 60, C-121 61, C-121 63, C-121, D-4 0A, C-120 0B, C-120 0C, C-121 SCSI ASC/ASCQ Codes 00 00, C-65, C-68, C-72, C-75 00 01, C-68 00 02, C-68 00 03, C-68 00 04, C-68 00 05, C-68 00 06, C-65, C-68, C-72, C-75 00 11, C-72 00 12, C-72 00 13, C-72 00 14, C-72 00 15, C-72 01 00, C-65 02 00, C-65, C-72, C-75 03 00, C-65, C-68 03 01, C-68 03 02, C-68 04 00, C-65, C-68, C-72, C-75 04 01, C-65, C-68, C-72, C-75 04 02, C-65, C-69, C-72, C-75 04 03, C-65, C-69, C-72, C-75 04 04, C-65, C-69 06 00, C-65, C-72, C-75 07 00, C-65, C-69, C-72, C-75 08 00, C-65, C-69, C-72, C-75 08 01, C-65, C-69, C-72, C-75 08 02, C-65, C-69, C-72, C-75 09 00, C-65, C-69, C-72 09 01, C-72 09 02, C-72 09 03, C-72 10 00, C-65 11 00, C-65, C-69, C-72 11 01, C-65, C-69 11 02, C-65, C-69 11 03, C-65, C-69 11 04, C-65 11 05, C-72 11 06, C-72 11 08, C-69 Index-9 Codes SCSI ASC/ASCQ Codes (cont'd) 11 09, C-69 12 00, C-65 13 00, C-65 14 00, C-65, C-69, C-72 14 01, C-65, C-69, C-72 14 02, C-69 14 03, C-69 14 04, C-69 15 00, C-65, C-69, C-72, C-75 15 01, C-65, C-69, C-72, C-75 15 02, C-65, C-69, C-72 16 00, C-65 17 00, C-65, C-69, C-72 17 01, C-66, C-69, C-72 17 02, C-66, C-69, C-72 17 03, C-66, C-69, C-72 17 04, C-72 17 05, C-66, C-72 17 06, C-66 17 07, C-66 17 08, C-66 18 00, C-66, C-69, C-72 18 01, C-66, C-73 18 02, C-66, C-73 18 03, C-73 18 04, C-73 18 05, C-66, C-73 18 06, C-66, C-73 19 00, C-66 19 01, C-66 19 02, C-66 19 03, C-66 20 00, C-66, C-69, C-73, C-75 21 00, C-66, C-69, C-73, C-75 21 01, C-75 22 00, C-66 24 00, C-66, C-69, C-73, C-75 25 00, C-66, C-69, C-73, C-75 26 00, C-66, C-70, C-73, C-75 26 01, C-66, C-70, C-73, C-75 26 02, C-66, C-70, C-73, C-75 26 
03, C-66, C-70, C-73, C-75 27 00, C-66, C-70 28 00, C-66, C-70, C-73, C-75 28 01, C-75 29 00, C-66, C-70, C-73, C-75 29 01, C-66, C-70, C-73, C-75 29 02, C-67, C-70, C-73, C-75 29 03, C-67, C-70, C-73, C-75 30 00, C-67, C-70, C-73, C-76 30 01, C-67, C-70, C-73 30 02, C-67, C-70, C-73 30 03, C-67, C-70 31 00, C-67, C-70 31 01, C-67 32 00, C-67 Codes SCSI ASC/ASCQ Codes (cont'd) 32 01, C-67 33 00, C-70 37 00, C-67, C-70, C-73, C-76 39 00, C-67, C-70, C-73, C-76 40 00, C-67 41 00, C-67 42 00, C-67 43 00, C-67, C-71, C-74, C-76 44 00, C-67, C-71, C-74, C-76 45 00, C-67, C-71, C-74, C-76 46 00, C-67, C-71, C-74, C-76 47 00, C-67, C-71, C-74, C-76 48 00, C-67, C-71, C-74, C-76 49 00, C-67, C-71, C-74, C-76 50 00, C-71 50 01, C-71 50 02, C-71 51 00, C-71 52 00, C-71 53 00, C-68, C-71, C-74, C-76 53 01, C-71 53 02, C-68, C-71, C-74, C-76 57 00, C-74 63 00, C-74 64 00, C-74 11 0A, C-65, C-69 0A 00, C-65, C-69, C-72, C-75 1A 00, C-66, C-69, C-73, C-75 2A 00, C-67, C-70, C-73, C-75 3A 00, C-67, C-70, C-73, C-76 4A 00, C-67, C-71, C-74, C-76 5A 00, C-68, C-71, C-74, C-76 2A 01, C-67, C-70, C-73, C-75 5A 01, C-68, C-71, C-74, C-76 2A 02, C-67, C-70, C-73, C-75 5A 02, C-68, C-71 5A 03, C-68, C-71 11 0B, C-65 1B 00, C-66, C-69, C-73, C-75 2B 00, C-67, C-70, C-73 3B 00, C-70 4B 00, C-68, C-71, C-74, C-76 5B 00, C-68, C-71, C-74, C-76 3B 01, C-70 5B 01, C-68, C-71, C-74, C-76 3B 02, C-70 5B 02, C-68, C-71, C-74, C-76 5B 03, C-68, C-71, C-74, C-76 3B 08, C-70 3B 0D, C-76 3B 0E, C-76 11 0C, C-65 0C 00, C-69 1C 00, C-66 2C 00, C-67, C-70, C-73, C-75 4C 00, C-68, C-71, C-74, C-76 Index-10 Codes SCSI ASC/ASCQ Codes (cont'd) 5C 00, C-68 0C 01, C-65 1C 01, C-66 5C 01, C-68 0C 02, C-65 1C 02, C-66 5C 02, C-68 1D 00, C-66 2D 00, C-70 3D 00, C-67, C-70, C-73, C-76 1E 00, C-66 3E 00, C-67, C-70, C-73, C-76 4E 00, C-68, C-71, C-74, C-76 2F 00, C-67, C-70, C-73, C-75 3F 00, C-67, C-70, C-73, C-76 3F 01, C-67, C-70, C-73, C-76 3F 02, C-67, C-70, C-74, C-76 3F 03, C-67, C-70, C-74, C-76 40 nn, 
C-68, C-71, C-74, C-76 SCSI Buffered Modes Codes 0, C-63 1, C-63 2, C-63 3, C-63 4, C-63 5, C-63 6, C-63 7, C-63 SCSI Command Operation Codes 00, C-61 01, C-61 03, C-61 04, C-61 05, C-61 07, C-61 08, C-61 10, C-61 11, C-61 12, C-61 13, C-61 14, C-61 15, C-61 16, C-61 17, C-61 18, C-61 19, C-61 25, C-61 28, C-61 30, C-61 31, C-62 32, C-62 33, C-62 34, C-62 35, C-62 36, C-62 37, C-62 Codes SCSI Command Operation Codes (cont'd) 39, C-62 40, C-62 41, C-62 42, C-62 43, C-62 44, C-62 45, C-62 47, C-62 48, C-62 49, C-62 55, C-62 0A, C-61 1A, C-61 2A, C-61 3A, C-62 5A, C-62 A5, C-62 A6, C-62 A8, C-62 A9, C-62 AF, C-62 0B, C-61 1B, C-61 2B, C-61 3B, C-62 4B, C-62 B0, C-62 B1, C-62 B2, C-62 B3, C-63 B5, C-63 B6, C-63 B8, C-63 1C, C-61 3C, C-62 4C, C-62 1D, C-61 4D, C-62 1E, C-61 2E, C-61 3E, C-62 0F, C-61 2F, C-61 3F, C-62 SCSI Device Type Codes 00, C-60 01, C-60 05, C-60 08, C-60 SCSI Sense Key Codes 0, C-64 1, C-64 2, C-64 3, C-64 4, C-64 5, C-64 Index-11 Codes SCSI Sense Key Codes (cont'd) 6, C-64 7, C-64 8, C-64 9, C-64 A, C-64 B, C-64 C, C-64 D, C-64 E, C-64 F, C-64 solid OCP, 5-4 System Communication Services Message Operation Codes 0000, C-59 0001, C-59 0002, C-59 0003, C-59 0004, C-59 0005, C-59 0006, C-59 0007, C-59 0008, C-59 0009, C-59 000A, C-59 000B, C-59 Template Codes 01, C-22 05, C-25 11, C-27 12, C-29, C-55 13, C-31 14, C-34 31, C-36 32, C-38 33, C-40 41, C-43 51, C-45 57, C-47 61, C-50 71, C-52 Cold swap power supply, 7-36 Command line interpreter See CLI Commands ADD CDROM, B-2 ADD DISK, B-3 ADD STRIPESET, B-5 ADD TAPE, B-6 ADD UNIT, B-7 CLEAR_ERRORS CLI, B-11 DELETE container-name, B-12 DELETE unit-number, B-13 DIRECTORY, B-14 EXIT, B-15 HELP, B-16 Commands (cont'd) INITIALIZE, B-17 LOCATE, B-18 LOCATE CANCEL, B-18 LOCATE DISKS, B-18 LOCATE entity, B-19 LOCATE PTL SCSI-location, B-18 LOCATE TAPES, B-18 LOCATE UNITS, B-18 RENAME, B-20 RESTART OTHER_CONTROLLER, B-21 RESTART THIS_CONTROLLER, B-23 RUN, B-25 SELFTEST OTHER_CONTROLLER, B-26 SELFTEST 
THIS_CONTROLLER, B-28 SET disk-container-name, B-30 SET FAILOVER, B-31 SET NOFAILOVER, B-33 SET OTHER_CONTROLLER, B-34 SET stripeset-container-name, B-37 SET THIS_CONTROLLER, B-38 SET unit-number, B-41 SHOW cdrom-container-name, B-45 SHOW CDROMS, B-44 SHOW DEVICES, B-46 SHOW disk-container-name, B-48 SHOW DISKS, B-47 SHOW OTHER_CONTROLLER, B-49 SHOW STORAGESETS, B-51 SHOW stripeset-container-name, B-53 SHOW STRIPESETS, B-52 SHOW tape-container-name, B-55 SHOW TAPES, B-54 SHOW THIS_CONTROLLER, B-56 SHOW unit-number, B-59 SHOW UNITS, B-58 SHUTDOWN OTHER_CONTROLLER, B-60 SHUTDOWN THIS_CONTROLLER, B-62 CONFIG command, 2-10, 6-98 CONFIG utility, 6-98 Configuration 3½-inch SBB restrictions, 3-9 5¼-inch SBB restrictions, 3-9 3½-inch SBBs, 3-10 5¼-inch SBBs, 3-13 atypical, 3-14 available, 1-1 cabinets, 3-1 combination, 3-1 CONFIGURATION.INFO file, 4-3 designation, 3-10 devices, 3-9 dual-redundant, 1-1, 3-16, 4-6, 7-16, 7-46 restrictions, 1-3 highest availability, 3-19 highest performance, 3-17 mismatch, 5-4 mixing disk and tape, 3-9 Index-12 Configuration (cont'd) mixing SBB sizes, 3-14 nonredundant, 1-1, 4-4, 7-9 nonredundant controller, 3-15 optimal availability, 3-18 optimal performance, 3-16 ordering, 3-1 predefined, 3-1 shelf, 3-8 small shelf count, 3-14 starter subsystem, 3-1 SW500-series cabinets, 3-6 SW800-series cabinets, 3-2 Configured-to-order See CTO Containers initializing, B-78 Controller ID, 4-5, 4-6, 7-10, 7-16, 7-46 Controller module failures, 7-2 shutting down, 7-2 warm swap, 7-2 Controller storage explained, 2-13 Controller warm swap, 7-42 controller removal, 7-42 controller replacement, 7-44 precautions, 7-42 tools, 7-42 Core functions, firmware, 2-9 Core MIST, 6-2 hardware tests, 6-2 IBR, 6-2 program card validation, 6-2 CTO, 3-1 C_SWAP command, 2-10, 7-42 D ------------------------------------------------------------ DAEMON, 6-3, 6-4 manually running, 6-4 manually stopping, 6-5 Data test patterns HSJ-, HSD-series DILX, 6-21 TILX, 6-44 HSZ-series DILX, 
6-62 DDL, 2-6 DEC OSF/1 AXP initialization disk, 4-12 support, 4-11 Defaults HSJ-, HSD-series DILX, 6-9 HSZ-series DILX, 6-54 Deferred error display HSZ-series DILX, 6-62 DELETE container-name command, B-12 DELETE unit-number command, B-13 Device LEDs, 5-8 SBB active LED, 5-8 SBB fault LED, 5-8 storage SBB faults, 5-8 Device port cable, 7-31 installing, 7-33 removing, 7-32 replacing, 7-33 service of, 7-31 service precautions, 7-31 tools, 7-31 Device ports, 2-5 running on fewer, 6-3 testing, 6-3 Device services firmware, 2-11 Device shelf status power supply faults, 5-9 power supply LEDs, 5-9 shelf faults, 5-9 single power supply power supply faults, 5-10 shelf faults, 5-10 Device warm swap, 7-38 device removal, 7-39 device replacement, 7-40 precautions, 7-39 tools, 7-38 Devices adding, 4-9, 7-12, B-78 configurations, 3-9 configuring, automatic, 6-98 initializing, 4-17, 4-18, B-78 moving between controllers, 4-17 nontransportable, 4-17 transportable, 4-18 Diagnostic and execution monitor See DAEMON Diagnostic registers, 2-2 Diagnostic utility protocol See DUP Diagnostics, 4-1, 6-1 DILX, 1-5, 2-10 HSJ-, HSD-series abort codes, 6-29 basic function test, 6-7 configuring all units, 6-25 data test patterns, 6-21 defaults, 6-9 defined, 6-5 end message display, 6-18 error codes, 6-30 Index-13 DILX HSJ-, HSD-series (cont'd) error information packets, 6-18 examples, 6-22 interrupting, 6-6 output messages, 6-14 performance summary, 6-27 running from maintenance terminal, 6-6 running from VCS, 6-6 running from virtual terminal, 6-6 test definition questions, 6-8 tests available, 6-7 user-defined test, 6-8 using all defaults, 6-22 using all functions, 6-23 HSZ-series abort codes, 6-65 basic function test, 6-51 data test patterns, 6-62 defaults, 6-54 deferred error display, 6-62 defined, 6-50 error codes, 6-65 interrupting, 6-51 output messages, 6-58 performance summary, 6-63 running from maintenance terminal, 6-51 sense data display, 6-61 test definition questions, 6-53 tests 
available, 6-51 user-defined test, 6-52 DIRECTORY command, B-14 Disk in-line exerciser See DILX DRAB See Shared memory DRAM See Shared memory DSSI cable service precautions, 1-9 DSSI host cable, 3-19, 7-27 installing, 7-29 length, 3-19 removing, 7-28 replacing, 7-29 service of, 7-27 service precautions, 7-27 tools, 7-27 DSSI host interconnection supported protocols, 2-9 DSSI node number, 4-5, 4-6, 7-10, 7-16 DSSI trilink installing, 7-29 removing, 7-28 replacing, 7-29 Dual controller port, 2-4 Dual data link See DDL Dual-redundant controller and downtime, 5-1 configuration, 3-16 failover, 2-4, 2-12, 5-1, 7-3, 7-42 initialization, 4-1 installing one of, 7-15 on separate hosts, 4-8, 7-17, 7-47 removal of one, 7-13 replacing one of, 7-15 restoring parameters for one, 7-16 service consideration, 5-1 service precautions, 7-13 servicing both of, 7-18 servicing one of, 7-13 tools, 7-13 DUP, 2-10 E ------------------------------------------------------------ EDC, 6-2, 6-3 EIA-423 terminal port, 2-3 Electrostatic discharge See ESD End message display HSJ-, HSD-series DILX, 6-18 TILX, 6-42 Environmental specifications, 1-10 ERF invoking, 5-16 Error codes HSJ-, HSD-series DILX, 6-30 TILX, 6-50 HSZ-series DILX, 6-65 Error detection code See EDC Error information packets HSJ-, HSD-series DILX, 6-18 TILX, 6-42 Error logging, 1-5, 5-16 and controller model, 5-16 and ERF, 5-16 and uerf, 5-16 firmware, 2-10 translations, 5-16 Error messages, 5-11 automatic, 5-11 cache module, 5-12 CLI, automatic, 5-14 CLI, interactive, 5-16 during failover, 5-15 Index-14 Error messages (cont'd) from diagnostics, 5-12 NVPM, 5-12 shelf, 5-15 Errorlog Report Formatter See ERF ESD See also Precautions danger, 1-6 grounding, 1-6 guidelines, 1-6 module guidelines, 1-6 subsystem placement, 1-6 subsystem room, 1-6 Examples HSJ-, HSD-series DILX, 6-22 TILX, 6-45 EXEC, 6-3 Executive functions, firmware, 2-9 Exercisers See DILX, 6-5 See TILX, 6-5 EXIT command, B-15 F 
------------------------------------------------------------ Failover, 2-4, 4-15 and SHUTDOWN, 7-3 copying configuration, 4-7 correcting mismatch, 4-17 error messages, 5-15 exiting, 4-16 firmware, 2-12 initialization, 4-17 of cache, 5-1 reviving failed controller, 4-16 setup for, 4-16 setup mismatch, 4-17 shared commands, 4-15 testing for, 4-17 time required for, 4-16 warm swap, 7-42 Fault management firmware, 2-10 Features summary, 1-3 Field replaceable unit See FRU Field replaceable units, 1-4 Firmware when downloaded, 6-3 Firmware executive See EXEC Firmware, HS controller CLI, 2-10 core functions, 2-9 description, 2-8 Firmware, HS controller (cont'd) device services, 2-11 DUP, 2-10 error logging, 2-10 executive functions, 2-9 failover, 2-12 fault management, 2-10 host protocol, 2-9 HSZUTIL, 2-10, 4-11 local programs, 2-10 operator interface, 2-9 program card, 1-1 read cache, 2-12 self-test, 2-9 upgrading, 1-1 value-added, 2-12 version restriction, 1-3 Flashing codes, OCP, 5-4 FRU controller, A-1 related, A-3 G ------------------------------------------------------------ Green LED, 4-1, 5-3, 6-1, 7-2 H ------------------------------------------------------------ Hardware, HS controller bus exchanger, 2-4 cache module, 2-5 description, 2-1 device ports, 2-5 diagnostic registers, 2-2 dual controller port, 2-4 host interface, 2-5 I/D cache, 2-2 Intel 80960 chip, 2-1 maintenance terminal, 2-3 NVMEM, 2-4 OCP, 2-2, 5-2 policy processor, 2-1 program card, 2-2 shared memory, 2-4 HBVS, 2-12 HELP command, B-16 High-availability See Configuration, dual-redundant Host adapters HSD-series controllers, 3-20 HSJ-series controllers, 3-20 HSZ-series controllers, 3-20 Quiet slot time, 3-19 Host interface, 2-5 HSD-series to DSSI, 2-6, 3-19, 7-27 HSJ-series to CI, 2-5, 7-23, 7-25 HSZ-series to SCSI, 2-7, 3-19, 7-29 testing, 6-3 Index-15 Host port path, 4-6, 4-8, 7-11, 7-17, 7-47 Host protocol firmware, 2-9 Host storage explained, 2-13 Host storage, HSZ-series explained, 2-15 
Host-based volume shadowing See HBVS Hot swap power supply, 7-36 HS controller models and error logging, 5-16 host protocol, 2-9 HS operating firmware See Firmware HSD30 specifications, 1-9 HSJ30 specifications, 1-9 HSJ40 specifications, 1-9 HSZ40 specifications, 1-9 HSZUTIL, 2-10, 4-11, 6-100 I ------------------------------------------------------------ I/D cache, 2-2, 6-3, 6-4 IBR, 6-2 Initial boot record See IBR Initialization BIST, 6-2 causes of, 4-1, 6-1 command, 4-9, 7-12 containers, B-78 described, 6-1 device port, 6-3 dual-redundant controller, 4-1, 4-17 failover, 4-17 host port, 6-3 nontransportable devices, 4-17 subsystem, 4-2 tests performed, 6-1 time required, 6-1 transportable devices, 4-18 Initialization disk, operating system, 4-12 INITIALIZE command, B-17 Installation blower, 7-36 CI cable, external, 7-25 CI cable, internal, 7-26 device port cable, 7-33 DSSI host cable, 7-29 DSSI trilink, 7-29 nonredundant controller, 7-7 Installation (cont'd) one dual-redundant controller, 7-15 power supply, 7-38 program card, 7-22 read cache, 7-19 SCSI cable (device port), 7-33 SCSI host cable, 7-31 SCSI trilink, 7-31 Instruction/Data cache See I/D cache Intel 80960CA chip, 2-1, 6-2 L ------------------------------------------------------------ Lamp test, B-18 Local programs, 2-10 LOCATE CANCEL command, B-18 LOCATE command, B-18 LOCATE DISKS command, B-18 LOCATE entity, B-19 LOCATE PTL SCSI-location command, B-18 LOCATE TAPES command, B-18 LOCATE UNITS command, B-18 Logical Unit Number See LUN Logical units adding, B-78 Low-availability See Configuration, nonredundant LUN controller perspective, 2-13 host perspective, HSZ-series, 2-16 M ------------------------------------------------------------ Maintenance strategy, 1-4 Maintenance terminal, 1-5, 2-3 Mirroring See HBVS MIST, 6-2, 6-3 See also Core MIST See also DAEMON Mixing disk and tape, 3-9 Mixing SBB sizes, 3-14 MMJ, 2-3 Modified modular jack See MMJ Module handling guidelines, 1-6 Module integrity 
self-test See MIST Modules, 1-1 Moving devices between controllers, 4-17 MSCP, 4-5, 4-7, 7-10, 7-17, 7-46 MSCP timeout, 4-14 Index-16 N ------------------------------------------------------------ Nonredundant controller and downtime, 5-1 configuration, 3-15 installing, 7-7 removal, 7-4 replacing, 7-7 restoring parameters, 7-9 service consideration, 5-1 service of, 7-3 service precautions, 7-3 shelf rails, 7-7 tools, 7-3 Nontransportable devices, 4-17 Nonvolatile memory See NVMEM Nonvolatile Parameters in Memory See NVPM NOTRANSPORTABLE qualifier, 4-9, 7-12 NVMEM, 2-4 NVPM, 5-12 error messages, 5-12 O ------------------------------------------------------------ OCP, 1-5, 2-2, 4-2, 5-2 amber LEDs, 5-3 codes, 5-4 fault notification, 5-4, 6-2, 6-3 flashing codes, 5-4 green LED, 5-3 normal operation, 5-3 reset button, 5-3 solid codes, 5-4 OpenVMS AUTOGEN.COM file, 4-13 cluster size, 4-14 initialization disk, 4-12 MSCP timeout, 4-14 polling parameters, 4-15 shadow member timeout, 4-15 shadow sets, 4-15 storage set size, 4-14 support, 4-11 TMSCP timeout, 4-14 write history log, 4-14 Operating system initialization disk, 4-12 support, 4-11 Operator control panel See OCP Operator interface firmware, 2-9 maintenance terminal, 2-3 virtual terminal, 2-3 OSF/1 initialization disk, 4-12 support, 4-11 Output messages HSJ-, HSD-series DILX, 6-14 TILX, 6-37 HSZ-series DILX, 6-58 P ------------------------------------------------------------ Parameters initial, 4-4, 4-6, 7-9, 7-16, 7-46 Path, host port, 4-6, 4-8, 7-11, 7-17, 7-47 PCMCIA, 1-1 Performance configuration, 3-16, 3-17 Performance summary HSJ-, HSD-series DILX, 6-27 TILX, 6-48 HSZ-series DILX, 6-63 Personal Computer Memory Card Industry Association See PCMCIA Policy processor, 2-1, 6-2 Polling parameters, 4-15 Port Target LUN See PTL Power supply, 7-36 cold swap, 7-36 hot swap, 7-36 installing, 7-38 removing, 7-37 replacing, 7-38 service of, 7-36 service precautions, 7-37 tools, 7-37 Precautions, 1-6 cable guidelines, 1-8 
ESD, 1-6 grounding, 1-6 module guidelines, 1-6 program card guidelines, 1-7 subsystem placement, 1-6 subsystem room, 1-6 Program card, 1-1, 2-2, 7-21 contents, 6-2 handling guidelines, 1-7 installing, 7-22 removing, 1-1, 4-1, 4-17, 6-1, 7-22 replacing, 1-1, 7-22 self-test, 6-2 service of, 7-21 service precautions, 7-21 Index-17 Program card (cont'd) tools, 7-21 validation, 6-2 version restriction, 1-3 PTL controller perspective, 2-13 host perspective, HSZ-series, 2-16 Q ------------------------------------------------------------ Quiet slot time, 3-19 R ------------------------------------------------------------ RAID firmware, 2-12 HBVS, 2-12 level 0, 2-12, 4-4, 4-14, B-78 level 1a, 2-12 striping, 2-12 Read cache, 7-19 and power failure, 2-5 firmware, 2-12 hardware, 2-5 installing, 7-19 removing, 7-19 replacing, 7-19 service of, 7-19 service precautions, 7-19 specifications, 1-9 testing, 6-4 tools, 7-19 Read only test HSJ-, HSD-series TILX, 6-33 Related documents, xviii Removal blower, 7-35 both dual-redundant controllers, 7-18 CI cable, external, 7-23 CI cable, internal, 7-26 device port cable, 7-32 DSSI host cable, 7-28 DSSI trilink, 7-28 nonredundant controller, 7-4 of controller using warm swap, 7-42 of devices using warm swap, 7-39 one dual-redundant controller, 7-13 power supply, 7-37 program card, 1-1, 4-1, 4-17, 6-1, 7-22 read cache, 7-19 SCSI cable (device port), 7-32 SCSI host cable, 7-30 SCSI trilink, 7-30 RENAME command, B-20 Replaceable parts See Field replaceable units Replacement blower, 7-36 both dual-redundant controllers, 7-18 CI cable, external, 7-25 CI cable, internal, 7-26 device port cable, 7-33 DSSI host cable, 7-29 DSSI trilink, 7-29 nonredundant controller, 7-7 of controller using warm swap, 7-44 of devices using warm swap, 7-40 one dual-redundant controller, 7-15 power supply, 7-38 program card, 1-1, 7-22 read cache, 7-19 SCSI cable (device port), 7-33 SCSI host cable, 7-31 SCSI trilink, 7-31 Reset button, 4-1, 4-17, 5-3, 6-1, 6-5, 7-2 
RESTART OTHER_CONTROLLER command, B-21 RESTART THIS_CONTROLLER command, B-23 Restoring initial parameters nonredundant controller, 7-9 one dual-redundant controller, 7-16 RUN command, B-25 S ------------------------------------------------------------ Safety See Precautions SCS node name, 4-5, 4-6, 7-10, 7-16, 7-46 restriction, 4-12 SCSI cable service precautions, 1-9 SCSI cable (device port) See Device port cable SCSI host cable, 3-19, 7-29 installing, 7-31 length, 3-19 removing, 7-30 replacing, 7-31 service of, 7-29 service precautions, 7-30 tools, 7-29 SCSI host interconnection supported protocols, 2-9 SCSI hosts and storage, 2-15 SCSI target ID, 4-5, 7-10 SCSI trilink installing, 7-31 removing, 7-30 replacing, 7-31 Self-test, 1-5, 2-9, 6-4 See also DAEMON running, 6-4 Index-18 Self-test (cont'd) stopping, 6-5 SELFTEST OTHER_CONTROLLER command, B-26 SELFTEST THIS_CONTROLLER command, B-28 Sense data display HSZ-series DILX, 6-61 SET disk-container-name command, B-30 SET FAILOVER command, B-31 SET NOFAILOVER command, B-33 SET OTHER_CONTROLLER command, B-34 SET stripeset-container-name command, B-37 SET THIS_CONTROLLER command, B-38 SET unit-number command, B-41 Shadow member timeout, 4-15 Shadow sets, 4-15 Shared memory, 2-4, 6-3 testing, 6-3 Shelf configurations, 3-8 error messages, 5-15 SHOW cdrom-container-name command, B-45 SHOW CDROMS command, B-44 SHOW DEVICES command, B-46 SHOW disk-container-name command, B-48 SHOW DISKS command, B-47 SHOW OTHER_CONTROLLER command, B-49 SHOW STORAGESETS command, B-51 SHOW stripeset-container-name command, B-53 SHOW STRIPESETS command, B-52 SHOW tape-container-name command, B-55 SHOW TAPES command, B-54 SHOW THIS_CONTROLLER command, B-56 SHOW unit-number command, B-59 SHOW UNITS command, B-58 SHUTDOWN OTHER_CONTROLLER command, 7-2, B-60 SHUTDOWN THIS_CONTROLLER command, 7-2, B-62 Shutting down, 7-2 Software, HS controller See Firmware Solid codes, OCP, 5-4 Specifications cache module, 1-9 controller module, 1-9 
environmental, 1-10 HSD30, 1-9 HSJ30, 1-9 HSJ40, 1-9 HSZ40, 1-9 Storage controller perspective, 2-13 controller PTL, 2-13 differences in HSZ-series, 2-16 host perspective, 2-13 Storage (cont'd) host perspective, HSZ-series, 2-15 host PTL, HSZ-series, 2-16 how addressed, 2-13 Storage SBB status, 5-8 Storage set defined, B-51 size, 4-14 Storage sets adding, B-78 initializing, B-78 Stripeset, 2-12, 4-4, 4-14, B-78 Striping, 2-12 Subsystem initialization, 4-2 Summary of features, 1-3 SW500-series cabinets configurations, 3-6 SW800-series cabinets configurations, 3-2 T ------------------------------------------------------------ Tape in-line exerciser See TILX Target HSZ-series as one or two, 2-15, 2-16 Test definition questions HSJ-, HSD-series DILX, 6-8 TILX, 6-33 HSZ-series DILX, 6-53 TILX, 1-5, 2-10 HSJ-, HSD-series abort codes, 6-49 basic function test, 6-32 data test patterns, 6-44 defined, 6-30 end message display, 6-42 error codes, 6-50 error information packets, 6-42 examples, 6-45 interrupting, 6-31 output messages, 6-37 performance summary, 6-48 read only test, 6-33 running from maintenance terminal, 6-31 running from VCS, 6-31 running from virtual terminal, 6-31 test definition questions, 6-33 tests available, 6-32 user-defined test, 6-32 using all defaults, 6-45 using all functions, 6-46 TMSCP, 4-5, 4-7, 7-10, 7-17, 7-46 Index-19 TMSCP timeout, 4-14 Transportable devices, 4-18 TRANSPORTABLE qualifier, 4-9, 7-12 Troubleshooting, 5-2, 5-11, 7-2 and error logs, 5-2 and visual indicators, 5-2 error messages, 5-11 fault notification, 5-2 using OCP, 5-2 U ------------------------------------------------------------ uerf invoking, 5-16 Units adding, B-78 creating from disk, B-79 creating from stripeset, B-79 creating from tape, B-79 deleting, B-80 renumbering, B-80 transportable, B-80 write-protection, B-79 UNIX Errorlog Report Formatter See uerf Upgrade cache memory capacity, 7-20 firmware, 1-1 User-defined test HSJ-, HSD-series DILX, 6-8 TILX, 6-32 HSZ-series 
DILX, 6-52 V ------------------------------------------------------------ Value-added firmware, 2-12 VAXcluster console system See VCS VCS, 2-4, 4-11, 6-5, 6-6, 6-31 Virtual terminal, 1-5, 2-3 HSZ-series controllers, 6-100 VTDPY, 1-5, 2-10, 6-65 help, 6-97 W ------------------------------------------------------------ Warm swap, 7-38 See also Controller warm swap See also Device warm swap controller, 1-5, 2-10, 7-42 controller module, 7-2 defined, 7-38 HSZ-series controller, 7-3 SBB, 7-38 storage device, 7-38 Write history log, 4-14