------------------------------------------------------------
ESE50 SDI Solid State Disk
Service Guide
Order Number: EK-ESE50-SG. B01
------------------------------------------------------------

June 1993

The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this document.

No responsibility is assumed for the use or reliability of software on equipment that is not supplied by Digital Equipment Corporation or its affiliated companies.

© Digital Equipment Corporation 1992, 1993. All Rights Reserved.

Printed in U.S.A.

The following are trademarks of Digital Equipment Corporation: Alpha AXP, AXP, DEC, ESE20, ESE50, HSC, SDI, OpenVMS, VAX, VAX DOCUMENT, the AXP logo, and the DIGITAL logo.

All other trademarks and registered trademarks are the property of their respective holders.

This document was prepared using VAX DOCUMENT, Version 2.1.

------------------------------------------------------------
Contents

Preface

1 Product Description
  1.1 General Information
    1.1.1 Media Type, Model Byte, and Capacity
    1.1.2 Configurations
    1.1.3 Error Correction and Checking (ECC)
    1.1.4 Error Logging
    1.1.5 Data Retention
      1.1.5.1 Data Retention Data Status
      1.1.5.2 Data Retention Battery Status
    1.1.6 Powerup
    1.1.7 AC Power On and Off
    1.1.8 Spin Up and Spin Down
    1.1.9 Data Retention Error Handling
    1.1.10 Related Documents
  1.2 ESE50 System Description
    1.2.1 Operator Control Panel (OCP)
    1.2.2 Controller Module
    1.2.3 Array Module
    1.2.4 RZ35 or RZ27 Disk Drive
    1.2.5 Power Monitor Module
    1.2.6 Power Supply Module
    1.2.7 AC Input Box
    1.2.8 Battery Pack (1 or 2)
  1.3 Powerup Diagnostic
  1.4 Specifications
  1.5 Control Panel Switches and Indicators
    1.5.1 READY
    1.5.2 RUN
    1.5.3 FAULT
    1.5.4 WRITE PROTECT
    1.5.5 Port A and B Switches and LEDs
    1.5.6 TEST
    1.5.7 Display
    1.5.8 Drive Actions on Switch Changes

2 Diagnostics
  2.1 Introduction
  2.2 Troubleshooting Tips
  2.3 System Diagnostics
    2.3.1 OpenVMS Diagnostics
    2.3.2 HSC Controller Diagnostics
      2.3.2.1 ILDISK In-Line Disk Functional Test
      2.3.2.2 ILEXER In-Line Multidrive Exerciser
      2.3.2.3 DKUTIL Off-Line Disk Utility

3 Fault Isolation
  3.1 Service Adapter Facility
    3.1.1 Service Adapter Facility Description (SAF)
    3.1.2 Installation
      3.1.2.1 Requirements
      3.1.2.2 Connection
      3.1.2.3 Basic Operation
    3.1.3 Basic Monitor
      3.1.3.1 Sanity Checks
      3.1.3.2 Stack Memory Test
        3.1.3.2.1 Multifunction Peripheral Chip Test
        3.1.3.2.2 Local RAM Test
        3.1.3.2.3 Processor Traps
        3.1.3.2.4 Firmware Verification
        3.1.3.2.5 Load Firmware
        3.1.3.2.6 Display Configuration/Error Logs
        3.1.3.2.7 Display Identification
        3.1.3.2.8 Continue
        3.1.3.2.9 Local RAM Test
    3.1.4 Advanced Monitor
      3.1.4.1 Advanced Monitor Menu
      3.1.4.2 Display Menu
      3.1.4.3 Correctable Error Log
      3.1.4.4 Internal Error Log
      3.1.4.5 Storage Status
        3.1.4.5.1 Data Retention and Counts
      3.1.4.6 IMB Memory Dump
      3.1.4.7 Configure System
        3.1.4.7.1 Display Hardware Configuration
        3.1.4.7.2 Display Disk Geometry
      3.1.4.8 Display Backup Configuration
        3.1.4.8.1 Display/Set System Serial Number
        3.1.4.8.2 IMB Memory Format
      3.1.4.9 Offline Diagnostics
        3.1.4.9.1 System Verification
        3.1.4.9.2 Controller Tests
        3.1.4.9.3 Backup Disk Tests
        3.1.4.9.4 Display Error Log
        3.1.4.9.5 IMB Memory Exerciser
        3.1.4.9.6 Scan IMB Memory
        3.1.4.9.7 TEST SGL BIT ERR LOGIC
        3.1.4.9.8 TEST DBL BIT ERR LOGIC
        3.1.4.9.9 EDAC TEST
      3.1.4.10 Patrol Diagnostics
      3.1.4.11 Manual Save
      3.1.4.12 Manual Restore
      3.1.4.13 Enable SDI Special Functions
        3.1.4.13.1 Log - Display Logging Information
        3.1.4.13.2 Short - Display Functions in Short Form
        3.1.4.13.3 Long - Display Functions in Long Form
        3.1.4.13.4 Off - Turn Display Off
        3.1.4.13.5 Clear - Clear Functions/Error Log
        3.1.4.13.6 Status - Display System Status
        3.1.4.13.7 Reset - Reset Controller (Power On)
        3.1.4.13.8 DVAR - Display Variables
        3.1.4.13.9 MON - Return to Monitor
      3.1.4.14 Power Monitor
    3.1.5 Power Monitor
      3.1.5.1 Power Monitor Screen

4 Error Handling
  4.1 Introduction
  4.2 FRUs and FRU Numbers
  4.3 ESE50 Drive Faults (Error Type - DE)
  4.4 ESE50 Transmission Error (Error Type - RE)
  4.5 ESE50 Protocol Error (Error Type - PE)
  4.6 ESE50 Initialization Faults (Error Type - DF)
  4.7 ESE50 Write Protected (Error Type - WE)

5 ESE50 Rev A and B -- Removal and Replacement Procedures
  5.1 Introduction
  5.2 Precautions
  5.3 Removing and Replacing the OCP
  5.4 Removing the ESE50
    5.4.1 Shutting Down the ESE50
    5.4.2 Disconnecting the ESE50
    5.4.3 Gaining Access to the ESE50 Internal FRUs
  5.5 Removing and Replacing ESE50 Modules
    5.5.1 Controller and Array Modules
    5.5.2 Power Monitor and Power Supply Modules
  5.6 Removing and Replacing the RZ35 Disk Drive
  5.7 Removing and Replacing the Battery Backup Pack
  5.8 Removing and Replacing the AC Input Box
  5.9 Removing and Replacing the Fan Assembly
  5.10 Placing the ESE50 into External Enclosure

6 ESE50 Rev C -- Removal and Replacement Procedures
  6.1 Introduction
  6.2 ESE50 Shutdown
    6.2.1 Precautions
    6.2.2 Shutdown Procedure
  6.3 ESE50 Assembly
    6.3.1 Disconnecting the ESE50
  6.4 Operator Control Panel Removal
  6.5 Accessing the Internal FRUs
    6.5.1 Front Panel Removal
    6.5.2 ESE50 Chassis Removal
    6.5.3 Side Panel Removal
    6.5.4 Top Shock Mount Assembly Removal
  6.6 ESE50 Modules Removal
    6.6.1 Controller and Array Modules
    6.6.2 Power Monitor and Power Supply Modules
  6.7 RZ27 Disk Drive Removal
  6.8 Battery Backup Pack Removal
  6.9 AC Input Box
  6.10 Fan Assembly -- Removable/Replacement
  6.11 The ESE50 External Enclosure

A ESE50 Error Codes
  A.1 Generic Status
    A.1.1 Request Byte
    A.1.2 Mode Byte
    A.1.3 Error Byte
  A.2 Extended Status Bytes
  A.3 ESE50 Error Definitions
    A.3.1 Drive Detected Errors
    A.3.2 Reporting Mechanisms

B ESE50-CA Upgrade Procedure

Index

Examples
  3-1 Basic Boot Operation
  3-2 Basic Dialog Display
  3-3 Load Firmware
  3-4 Display System Configuration/Error Log
  3-5 Display Identification
  3-6 Advanced Monitor Menu
  3-7 Advanced Monitor Menu
  3-8 Advanced Monitor Menu
  3-9 Advanced Monitor Menu
  3-10 Internal Error Log
  3-11 Error Log Header
  3-12 Display Log Menu
  3-13 Storage Status Display
  3-14 Display Counts
  3-15 IMB Memory Dump Display
  3-16 System Configuration
  3-17 Hardware Configuration Display
  3-18 Default Geometry--Sample Display
  3-19 Backup Display Sample
  3-20 System ID Sample
  3-21 IMB Memory Format Display
  3-22 Offline Diagnostics
  3-23 System Verification Sample
  3-24 Controller Test Submenu
  3-25 Power-On Samples
  3-26 Local RAM Test
  3-27 Dual Port RAM Test
  3-28 SDI Tests Sample
  3-29 Backup Disk Test Sample
  3-30 LBN Tested Sample
  3-31 IMB Memory Exerciser
  3-32 Terminal Dialog
  3-33 Patrol Diagnostic Submenu
  3-34 SDI Special Functions
  3-35 Display Logging Information
  3-36 Power Monitor
  A-1 Longword Example

Figures
  1-1 ESE50 Solid State Disk
  1-2 Powerup Sequence
  1-3 Power On/Off Sequence
  1-4 Spin Up/Spin Down Sequence
  1-5 Simplified Block Diagram of ESE50 System
  1-6 Operator Controls and Indicators
  1-7 Operator Controls and Indicators
  2-1 Power Supply Module LEDs
  2-2 Power Supply Module LEDs
  3-1 SAF Menu Tree
  5-1 Removing/Replacing the OCP
  5-2 Rear View of ESE50
  5-3 Removing/Replacing the Front Panel Assembly
  5-4 Removing/Replacing the Front Panel Assembly
  5-5 Removing/Replacing the OCP Cable
  5-6 Removing/Replacing the ESE50
  5-7 Removing/Replacing the RS-232 Cable
  5-8 Fan Assembly, Top Captive Screws
  5-9 Removing/Replacing the Top Shock Mount Assembly
  5-10 Removing/Replacing the Covers
  5-11 ESE50 Modules
  5-12 Removing/Replacing the Controller Module
  5-13 Removing/Replacing the Power Supply Module
  5-14 Removing/Replacing the Power Monitor Module
  5-15 Removing/Replacing the RZ35 with Module
  5-16 Removing/Replacing RZ35 from Module
  5-17 RZ35 Jumper Settings
  5-18 Removing/Replacing the Battery Pack
  5-19 Battery Pack Cable (J90)
  5-20 AC Input Box Screws
  5-21 Removing/Replacing the AC Input Box
  5-22 Removing/Replacing the Fan Cable
  6-1 ESE50 -- Rear View
  6-2 OCP -- Removable/Replacement
  6-3 Front Panel Assembly -- Removable
  6-4 Front Panel Assembly -- Removable/Replacement
  6-5 OCP Cable Disconnection
  6-6 The ESE50 -- Removable/Replacement
  6-7 Side Panel Removal
  6-8 The RS-232 Cable Disconnection
  6-9 Top Shock Mount Assembly -- Captive Screws
  6-10 ESE50 Modules
  6-11 Controller Module -- Removable/Replacement
  6-12 Power Monitor Module -- Removable/Replacement
  6-13 Power Supply Module -- Removable/Replacement
  6-14 Disk Drive with Slide Module -- Removable/Replacement
  6-15 Disk Drive -- Removable/Replacement
  6-16 Disk Drive -- Jumper Setting
  6-17 Battery Pack -- Cable
  6-18 Battery Pack -- Removable/Replacement
  6-19 AC Input Box Mounting Screws
  6-20 AC Input Box Mounting -- Cables
  6-21 Fan Assembly
  B-1 120 to 600 MB Conversion

Tables
  1-1 Summary of the ESE50 Media Type and Model Byte
  1-2 Summary of Data Retention Times
  1-3 Summary of Data Status Flags
  1-4 Powerup Sequence Summary
  1-5 Power On/Off Sequence Summary
  1-6 Spin Up/Spin Down Sequence Summary
  1-7 Related Documentation
  1-8 ESE50 Solid State Disk Product Specifications
  1-9 Summary of the RUN Switch and Indicator
  1-10 Summary of Drive Actions on Switch Changes
  4-1 FRUs and FRU Numbers
  4-2 Drive Faults
  4-3 Transmission Errors
  4-4 Protocol Errors
  4-5 Initialization Faults
  4-6 Write Protected Errors

------------------------------------------------------------
Preface

Audience

This guide is for Customer Services personnel, engineering, engineering technicians, and repair center personnel. It is intended to aid Customer Services personnel in servicing and maintaining the ESE50 electronic storage element.

Organization

This guide is divided into six chapters and two appendices.

Chapter 1, Product Description, is an introduction. It describes the device configurations and gives a system overview that includes ECC, the storage array, data retention system, and the standby power system.

Chapter 2, Diagnostics, describes the data retention system loading procedure, device-resident diagnostics, diagnostic monitor tests and test sequences, and the SDI loop test.

Chapter 3, Fault Isolation, describes troubleshooting tips, maintenance, and system diagnostics.

Chapter 4, Error Handling, describes error handling procedures and lists the errors with the FRUs suspected of causing the errors.

Chapter 5, Removal and Replacement Procedures, describes the removal and replacement procedures for the various FRUs within a Rev. A or Rev. B ESE50 unit.

Chapter 6, Removal and Replacement Procedures, describes the removal and replacement procedures for the various FRUs within a Rev. C ESE50 unit.

Appendix A, ESE50 Error Codes, lists the error codes and their definitions.

Appendix B, ESE50-CA Upgrade Procedure, describes the procedures for upgrading the ESE50 unit from 120 MB to 600 MB data storage capacity.

Conventions

Notes, Cautions, and Warnings indicate different types of special information.

------------------------------------------------------------
Note      Provides additional general information.
Caution   Provides essential information to prevent damage to equipment and software.
Warning   Provides essential information to prevent personal injury.
------------------------------------------------------------

------------------------------------------------------------
1 Product Description

1.1 General Information

The ESE50 solid state disk (SSD) is a member of the Digital Storage Architecture/Standard Disk Interconnect (DSA/SDI) family, and is plug-compatible with all DSA/SDI controllers.
The ESE50 solid state disk is a random access, low latency mass storage device that uses DRAM semiconductor technology for fast response time. The ESE50 is Digital's fastest disk drive and is used when I/O response time is a key metric.

The ESE50 is contained in a half rack (RA92 style) enclosure with its own power and data retention capabilities. It connects and operates on the standard disk interface (SDI) and may be used with controllers implementing this bus. The operator control panel (OCP) from the RA9x disk drive is also used on the ESE50 SSD. The ESE50 can also be packaged in any SA3xx, SA6xx, SA8xx, or SA9xx storage array cabinet.

The following controllers support the ESE50:

· HSC (40/60/65/70/90/95)
· KDM70

The ESE50 has been tested under OpenVMS Versions 5.4 and 5.5. Under these versions, the ESE50 is identified as a DU device (rather than by the ESE50 name). Identification as a DU device does not impact any standard Digital disk functionality. Shadowing, failover, dual porting, bound volumes, and so forth are supported.

However, when running the ESE50 under OpenVMS Versions 5.4-n, 5.5, or 5.5-1, be aware of the following:

· All shadow sets must consist of ESE50 units of the same capacity.
· Controller-based volume shadow sets require manual verification that all members of the shadow set are on the same controller.

The ESE50 is fully supported under OpenVMS Version 5.5-2 and OpenVMS AXP Version 1.5 (and all subsequent releases).

Figure 1-1 shows an ESE50 solid state disk.

Figure 1-1 ESE50 Solid State Disk

1.1.1 Media Type, Model Byte, and Capacity

The ESE50 has a unique media type and model byte depending on the capacity. This is done for operating system compatibility. Table 1-1 describes the unique media type and model byte for each ESE50 option.
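
The capacities in Table 1-1 are given in logical blocks (LBNs). As a rough cross-check against the nominal 120 MB, 600 MB, and 1.0 GB sizes of these options, the sketch below converts the LBN counts to megabytes. It assumes 512 data bytes per logical block, which this guide does not state, and the function and constant names are illustrative only.

```python
# Assumption: 512 data bytes per logical block (not stated in this guide).
BYTES_PER_LBN = 512

# Option -> capacity in LBNs, copied from Table 1-1.
CAPACITY_LBNS = {
    "ESE50-AA,AB": 238080,
    "ESE50-BA,BB": 1196544,
    "ESE50-DA,DB": 1915392,
}


def capacity_mb(lbns: int, bytes_per_lbn: int = BYTES_PER_LBN) -> float:
    """Return the capacity in decimal megabytes (10**6 bytes)."""
    return lbns * bytes_per_lbn / 1_000_000


for option, lbns in CAPACITY_LBNS.items():
    print(f"{option}: {lbns} LBNs is about {capacity_mb(lbns):.1f} MB")
```

Under the 512-byte assumption, each result lands close to the nominal size of the corresponding option, which suggests the nominal figures are rounded.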

Table 1-1 Summary of the ESE50 Media Type and Model Byte
------------------------------------------------------------
Option        Media Type   Model Byte   Capacity (LBNs)
------------------------------------------------------------
ESE50-AA,AB   ESE52        31           238080
ESE50-BA,BB   ESE56        48           1196544
ESE50-DA,DB   ESE58        49           1915392
------------------------------------------------------------

1.1.2 Configurations

The seven ESE50 device configurations are as follows:

· ESE50-AA -- 120V 60 Hz (120 MB, with data retention)
· ESE50-AB -- 240V 50 Hz (120 MB, with data retention)
· ESE50-BA -- 120V 60 Hz (600 MB, with data retention)
· ESE50-BB -- 240V 50 Hz (600 MB, with data retention)
· ESE50-DA -- 120V 60 Hz (1.0 GB, with or without data retention)
· ESE50-DB -- 240V 50 Hz (1.0 GB, with or without data retention)
· ESE50-CA -- 120 to 600 MB upgrade

1.1.3 Error Correction and Checking (ECC)

For data reliability and integrity, Hamming code ECC and block ECC are used during reading and writing of data.

Hamming code ECC is generated and stored with the data in the storage array. It is checked and used to correct single-bit errors for the DRAM storage array. Most uncorrectable errors detected by the Hamming code ECC are corrected by the block ECC code. The error is reported to the host and logged.

The block ECC is supplied by the controller and stored with the data in the storage array. On a read transfer, the block ECC is returned with the data to the controller for checking. The block ECC code can correct up to eight symbol errors in a block of data.

1.1.4 Error Logging

The ESE50 error log contains the error type, location, time, and other system status information. The log is available through the SDI port or the Field Service terminal port.

When an error occurs, the ESE50 informs the host using the Attention mechanism. Then the controller can issue a GET STATUS command to ascertain drive status.
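
The internal error silo covered in this section holds only the 32 most recent errors, so older entries are displaced as new ones arrive. The following is a minimal sketch of that behavior, not the firmware's actual data layout, which this guide does not describe. The class name, method names, and entry fields are hypothetical, and "DE" is used only as a sample error-type label.

```python
from collections import deque

SILO_DEPTH = 32  # from the guide: the silo records the 32 most recent errors


class ErrorSilo:
    """Illustrative fixed-depth error log (hypothetical, not the firmware layout)."""

    def __init__(self, depth: int = SILO_DEPTH) -> None:
        # A deque with maxlen silently discards the oldest entry when full.
        self._entries = deque(maxlen=depth)

    def log(self, error_type: str, detail: str) -> None:
        """Record one error; the oldest entry is dropped once the silo is full."""
        self._entries.append((error_type, detail))

    def dump(self) -> list:
        """Return the retained entries, oldest first."""
        return list(self._entries)


silo = ErrorSilo()
for n in range(40):
    silo.log("DE", f"sample error {n}")

entries = silo.dump()
print(len(entries))  # number of retained entries
print(entries[0])    # oldest entry still held
```

After 40 errors are logged, only the last 32 remain, which is why a silo dump cannot show the full history of an intermittent fault.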
The ESE50 device also logs the error in an internal error silo. This silo can be read by DKUTIL or by MDM diagnostics. This internal error silo records the 32 most recent errors.

1.1.5 Data Retention

The ESE50 provides data nonvolatility through the use of an integrated data retention system. This data retention system consists of a controller, a SCSI magnetic Winchester disk drive, and batteries.

Data retention is invoked only when ac power is lost to the ESE50 unit during normal operation. In this situation, the batteries power the unit while all data in the memory arrays is transferred to the Winchester disk drive. If ac power is restored at any time during the transfer process, the ESE50 unit is instantly available.

On Rev. A and Rev. B ESE50 units, the ESE50-AA/AB (120 MB) and ESE50-BA/BB (600 MB) options contain the complete data retention system. On Rev. C ESE50 units, the ESE50-AA/AB (120 MB), ESE50-BA/BB (600 MB), and ESE50-DA/DB (1 GB) options contain the complete data retention system.

Table 1-2 lists the data retention times.

Table 1-2 Summary of Data Retention Times
------------------------------------------------------------
ESE50 Option  Revision  Data Retention Time  Backup Time  Start Time   Maximum Number of Complete Backups (Within Two-Hour Period)+
------------------------------------------------------------
ESE50-AA/AB   A or B    120 MB               90 seconds   120 seconds  8
ESE50-BA/BB   A or B    600 MB               360 seconds  390 seconds  2
ESE50-AA/AB   C         120 MB               60 seconds   90 seconds   16
ESE50-BA/BB   C         600 MB               300 seconds  360 seconds  4
ESE50-DA/DB   C         1 GB                 480 seconds  540 seconds  3
------------------------------------------------------------
+A two-hour battery recharge period is required afterward. Power failure exceeding this duty cycle may result in data loss.
------------------------------------------------------------

Although the ESE50 provides nonvolatile features, it is the user's responsibility to protect data by using proper backup procedures similar to backup of magnetic disks. The following backup methods are recommended for an ESE50:

· File duplication
  This method normally involves copying the data onto removable media, such as magnetic tape.
· Journaling
  This method is recommended for files up to the last checkpoint or backup.

------------------------------------------------------------
Caution
------------------------------------------------------------
A data retention operation does not occur when the ESE50 main ac power switch, located on the rear panel, is turned off. A data retention operation is invoked automatically only when ac power is lost to the input power cord. Refer to Section 6.2.2 for the orderly shutdown procedures.
------------------------------------------------------------

1.1.5.1 Data Retention Data Status

The ESE50 maintains data status flags for the SCSI hard drive contents. These flags are used to determine what data is restored to the RAM storage on a powerup. Table 1-3 describes these flags.

Table 1-3 Summary of Data Status Flags
------------------------------------------------------------
Flag / Meaning / Set Conditions
------------------------------------------------------------
Valid
  Meaning: The data retention system contains a good copy of data from the RAM storage to be restored.
  Set conditions: Set when data is successfully moved from the RAM storage to the SCSI hard drive (a successful Save operation).
Invalid
  Meaning: The data retention system contains a copy of data, but it does not reflect the last data within the RAM storage.
  Set conditions: Set on a write to an LBN or by a successful Online command if the data retention system had a valid flag set.
No_Data
  Meaning: The data retention system contains no data.
  Set conditions: Set whenever a save is terminated before all the data has been moved to the SCSI hard drive.
------------------------------------------------------------

1.1.5.2 Data Retention Battery Status

The ESE50 maintains a battery charge status for reporting to the host. The state of the batteries is reported to the host on powerup or on returning from a power failure. If the batteries are not capable of performing a complete save, a low battery error code (BF) is sent and the system is write protected. The write protection can be removed manually using the WRITE PROTECT switch on the OCP. Write protection on low battery is an option selectable by a service port command. If the system is write protected for this reason, the write protection is removed as soon as the batteries are charged.

1.1.6 Powerup

The ESE50 system, upon powerup, performs a self test and then restores data from the SCSI hard drive. If the data is invalid or nonexistent, the RAM storage is written with all-zeros data, good EDC (indicating no forced error), and good ECC. The system reports an E1 error if the data in the SCSI hard drive is invalid, and an E0 error if there is no data. Figure 1-2 and Table 1-4 describe the powerup sequence.

Figure 1-2 Powerup Sequence

Table 1-4 Powerup Sequence Summary
------------------------------------------------------------
DR Status  Powerup Operation
------------------------------------------------------------
No_Data    Write 0's data without FE flags, report an E0 error
Valid      Restore data from SCSI hard drive
Invalid    Write 0's without FE flag, report an E1 error
------------------------------------------------------------

1.1.7 AC Power On and Off

The data retention system, depending on the state of the AC input, starts or stops a save or restore operation. Figure 1-3 and Table 1-5 describe the AC power on/off sequence.

Figure 1-3 Power On/Off Sequence

Table 1-5 Power On/Off Sequence Summary
------------------------------------------------------------
DR Operation  AC Power On/Off                            AC Power Off/On
------------------------------------------------------------
Restoring     Stop restore                               N/A
Saving        Continue                                   Stop save
Idle          Start save, independent of battery state   Start restore, if the data is not restored in arrays
------------------------------------------------------------

1.1.8 Spin Up and Spin Down

On a spin down, the data retention system initiates a save if the storage is valid and the DR status is not valid. On a spin up, the save is stopped if it is running. Spin up and spin down can occur only after the data has been fully restored to the RAM storage. Figure 1-4 and Table 1-6 describe the spin up and spin down sequence.

Figure 1-4 Spin Up/Spin Down Sequence

Table 1-6 Spin Up/Spin Down Sequence Summary
------------------------------------------------------------
DR Operation  Spin Up                               Spin Down
------------------------------------------------------------
Saving        Stop save, set DR status = No_Data    Continue
Idle          N/A                                   Start save, DR status = Invalid or No_Data
------------------------------------------------------------

1.1.9 Data Retention Error Handling

The system reports errors through the use of the SDI attention mechanism. If the system fails during a test command or during an operation, the fault is reported to the controller or host, and the system is declared faulted. The fault is reported to the controller/host three (3) times whenever the device is brought online or, if already online, when it fails. The system remains faulted until the ESE50 is successfully powered up again. The ESE50 can still be used even though the data retention system has failed, but the system is VOLATILE with respect to any power interruption not supported by the internal batteries.
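
The sequence summaries in Tables 1-4 and 1-5 can be read as simple lookups. The following Python sketch is illustrative only and is not ESE50 firmware. The function names and the event names ac_lost and ac_restored are hypothetical, and the sketch assumes that the "AC Power On/Off" column describes ac power going from on to off and that "AC Power Off/On" describes ac power returning.

```python
# Table 1-4: DR status at powerup -> (powerup operation, error code or None).
POWERUP_SEQUENCE = {
    "No_Data": ("write zeros without FE flags", "E0"),
    "Valid": ("restore data from SCSI hard drive", None),
    "Invalid": ("write zeros without FE flags", "E1"),
}

# Table 1-5: (DR operation, AC event) -> action, or None where the table says N/A.
# Assumption: "ac_lost" is the On/Off column and "ac_restored" the Off/On column.
AC_SEQUENCE = {
    ("Restoring", "ac_lost"): "stop restore",
    ("Restoring", "ac_restored"): None,
    ("Saving", "ac_lost"): "continue",
    ("Saving", "ac_restored"): "stop save",
    ("Idle", "ac_lost"): "start save, independent of battery state",
    ("Idle", "ac_restored"): "start restore, if the data is not restored in arrays",
}


def powerup_action(dr_status):
    """Return (operation, error_code) for a DR status flag (hypothetical helper)."""
    return POWERUP_SEQUENCE[dr_status]


def ac_action(dr_operation, ac_event):
    """Return the action for an AC power transition, or None for N/A."""
    return AC_SEQUENCE[(dr_operation, ac_event)]


if __name__ == "__main__":
    print(powerup_action("Invalid"))
    print(ac_action("Saving", "ac_restored"))
```

Keeping each table as data rather than as branching logic makes it easy to compare the sketch against the printed tables row by row.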

1.1.10 Related Documents

Table 1-7 Related Documentation
------------------------------------------------------------
Title                                     Order Number
------------------------------------------------------------
ESE50 User Guide                          EK-ESE50-UG
SAxxx Storage Array Configuration Guide   EK-SAXXX-CG
------------------------------------------------------------

1.2 ESE50 System Description

Figure 1-5 is a simplified block diagram of the ESE50 system. The ESE50 system contains the following devices:

· Operator control panel
· Controller module
· Array modules (up to 16)
· RZ35 or RZ27 disk drive
· Power monitor module
· Power supply module
  - 120 MB version utilizes one
  - 600 MB version utilizes two
  - 1 GB version utilizes two
· AC input box
· Backup battery pack (Rev. C ESE50 units have two battery packs)

Figure 1-5 Simplified Block Diagram of ESE50 System

1.2.1 Operator Control Panel (OCP)

The operator control panel provides an operator interface to the ESE50 system. The panel consists of five switches and several indicators. A microprocessor and support logic communicate with the ESE50 controller module.

------------------------------------------------------------
Note
------------------------------------------------------------
Although very similar to the RA9X OCP, this OCP is not compatible with RA9X units.
------------------------------------------------------------

For a more detailed description of this panel, refer to the ESE50 Electronic Storage Element User Guide (EK-ESE50-UG) and Section 1.5.

Figure 1-6 Operator Controls and Indicators

1.2.2 Controller Module

This module contains the SDI interface, microprocessor, SCSI interface, and storage array interface.
It performs all of the control functions internal to the ESE50 system.

1.2.3 Array Module

This module (up to 16 in an ESE50 system) consists of 156 4 MB DRAMs, drivers, and interface logic. It also contains an ID code to identify the module.

1.2.4 RZ35 or RZ27 Disk Drive

This disk drive resides in the ESE50 system and is used for data retention.

1.2.5 Power Monitor Module

This module consists of a microprocessor, D/A interface, A/D interface, and serial line. It monitors the power supplies, temperature, and battery pack state.

1.2.6 Power Supply Module

This module consists of DC-to-DC converters for the +5, -5, and +12 volt supplies. Up to two power supply modules (for redundancy) may be present in the ESE50 system.

1.2.7 AC Input Box

This box contains the line filter and input rectifiers.

1.2.8 Battery Pack (1 or 2)

The battery pack consists of 21 sub-C cell nicad batteries and an in-line fuse. It is used to power the data retention system in the ESE50. Note that this battery pack should be replaced every three years. One battery pack is utilized in Rev. A and Rev. B units, and two battery packs are utilized in Rev. C units.

1.3 Powerup Diagnostic

During powerup, the ESE50 device performs a powerup diagnostic self test (which takes approximately 30 seconds to complete). This test includes the following diagnostics:

· Local RAM Test--tests the local RAM with exception of the vector and stack spares.
· Dual Port RAM Test--verifies the FIFO area of the dual port RAM.
· MFP (multifunction peripheral) Chip Test--tests this chip and its read/write register.
· SCSI Chip Test--tests the SCSI interface chip and interrupts.
· LCA Tests--tests the programmable gate arrays.
· IMB Tests--tests the IMB bus for each array module installed.

During powerup, data is loaded from the data retention system back to the memory arrays. The transfer time is approximately 120 seconds to 480 seconds, depending on capacity.
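
Because the restore time above depends on capacity, it can help to see the capacities of Table 1-1 expressed in bytes. The following Python sketch converts the LBN counts from Table 1-1; it assumes 512-byte logical blocks, which this guide does not state, and the function names are hypothetical.

```python
BLOCK_BYTES = 512  # assumption: 512-byte logical blocks (not stated in this guide)

# Capacity in LBNs by media type, copied from Table 1-1.
CAPACITY_LBNS = {"ESE52": 238080, "ESE56": 1196544, "ESE58": 1915392}


def capacity_bytes(lbns, block_bytes=BLOCK_BYTES):
    """Return the capacity in bytes for a given number of logical blocks."""
    return lbns * block_bytes


def capacity_mb(lbns, block_bytes=BLOCK_BYTES):
    """Return the capacity in decimal megabytes, rounded to one place."""
    return round(capacity_bytes(lbns, block_bytes) / 1_000_000, 1)


if __name__ == "__main__":
    for media_type, lbns in CAPACITY_LBNS.items():
        print(media_type, capacity_bytes(lbns), capacity_mb(lbns))
```

Under that assumption the three media types come to roughly 121.9, 612.6, and 980.7 million bytes, which is consistent with the nominal 120 MB, 600 MB, and 980 MB capacities given in Table 1-8.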

1.4 Specifications

Table 1-8 lists the performance, nominal electrical, environmental, and physical specifications of the ESE50 electronic storage element.

Table 1-8 ESE50 Solid State Disk Product Specifications
------------------------------------------------------------
Performance
------------------------------------------------------------
Data storage capacity     120 MB, 600 MB, or 980 MB
Transfer rate             20 Mbits/second
Bit width (cell period)   50 nanoseconds
Access Time               0.25 millisecond
------------------------------------------------------------
Power Requirements
------------------------------------------------------------
Standards                 UL listed; CSA certified; FCC Class A verified
Voltage                   120/208 Vac at 60 Hz; Single-phase WYE
                          220/240 Vac at 50 Hz
------------------------------------------------------------
Operating Environment
------------------------------------------------------------
Temperature               15 to 32°C (59 to 90°F)
                          Maximum allowable operating temperature is reduced by a factor of 1.8°C/1000 m (1°F/1000 ft) for operation at high altitude sites.
Relative humidity         20 to 80%
Maximum wet bulb          25°C (77°F)
Minimum dew point         2°C (36°F)
------------------------------------------------------------
Physical Dimensions
------------------------------------------------------------
Height                    10.42 inches
Width                     8.74 inches
Length                    18.50 inches
Shipping weight           63 pounds
Installed weight          55 pounds
Cabinet capacity          Varies
------------------------------------------------------------

1.5 Control Panel Switches and Indicators

The OCP contains six switches, seven LEDs, and a four-character alphanumeric display. All the panel positions contain lights that are turned on and off independent of any switch's position.
The display shows the switch position, unit number, and error codes.

Figure 1-7 shows the arrangement of these controls and LEDs on the OCP located at the front of the cabinet. The sections that follow further describe the switches and indicators.

Figure 1-7 Operator Controls and Indicators

1.5.1 READY

The READY LED is used to indicate that the internal drive is ready for a read or write operation. The LED can be illuminated only if the RUN/STOP switch is set to RUN and the RUN/STOP LED is illuminated. The LED is on if the drive is in the available state and spun up. The LED is also on if the drive is in the online state, spun up, and the READ/WRITE READY signal is asserted.

1.5.2 RUN

The RUN switch is set to the in position to spin up the drive. The RUN switch is set to the out position to spin down the drive. The RUN LED always indicates the physical state of the drive. Table 1-9 describes how the RUN switch operates when the drive is online or offline.

Table 1-9 Summary of the RUN Switch and Indicator
------------------------------------------------------------
Drive Status  RUN Switch                          RUN LED  Action
------------------------------------------------------------
Offline [1]   Out/In (DD status bit is not set)   Off/On   Spin up
Offline [2]   In/Out                              On/Off   Spin down
Online [3]    In/Out                              On/Off   Spin down
Online [4]    Out/In                              Off/On   Spin up
------------------------------------------------------------
[1] If the drive cannot start for any reason, the LED remains off. If the DD status bit is set and the drive is offline, the drive ignores the RUN switch.
[2] Note that the LED turns off when the spin down is completed.
[3] The drive utilizes the Attention Mechanism to notify the controller, but it is running and available for processing additional commands.
When the controller has completed all essential write operations to the drive, it conducts a DISCONNECT with STOP modifier exchange, at which point the drive is spun down and the LED turns off.
[4] The drive utilizes the Attention Mechanism to notify the controller, but remains spun down and waits for additional commands. The controller conducts a RUN exchange when it is appropriate for the drive to spin up. When the drive is spun up as a result of the RUN command, the LED is lit.
------------------------------------------------------------

1.5.3 FAULT

The FAULT LED is used to indicate read/write safety errors and serious physical error conditions in the drive.

If the FAULT LED is not lit and the PORT A and PORT B LEDs are lit, changes in the FAULT switch are ignored. If the FAULT LED is off and the PORT A and PORT B LEDs are also off, the FAULT switch acts as a momentary contact. When pressed, a two-second long lamp test occurs, which lights all the LEDs and the four-character alphanumeric display.

Regardless of its state relative to the controller, when the drive detects an error that it classifies as a fault, it sets the DE bit in its generic status and lights the FAULT LED, but remains in its current state. If and when the FAULT switch is pressed, the drive immediately enters the offline state relative to all controllers, stops all state clock transmissions on the real-time drive state line, and uses the four-character alphanumeric display to show the two-digit error code indicating the nature of the error. The drive remains offline and disconnected for the duration of the time that the code is being displayed. The code is displayed until the FAULT switch is again pressed, at which time the drive attempts to clear the error condition as if a DRIVE CLEAR command for the DE bit had been received, enters the drive available state, and returns the LEDs to normal service.
If the attempt to clear the error condition is unsuccessful, the DE bit remains set and the FAULT LED remains lit. If the attempt is successful, the DE bit is cleared and the FAULT LED turns off. Note that whenever the DE bit is cleared (whether through DRIVE CLEAR, reinitialization, or the FAULT switch), the FAULT LED is turned off.

In addition to drive faults reported by the DE bit, the fault mechanism is also used to display a DD set fault code of 4A (hexadecimal) whenever the DD bit is set. DD set faults are treated by the drive as unclearable faults. Except for the DD case, the error conditions under which the FAULT LED is used and the interpretations of the fault codes themselves are drive-specific.

1.5.4 WRITE PROTECT

The WRITE PROT switch is set to the in position to request that the disk be write protected. It is set to the out position to request that the disk be write enabled. The WRITE PROTECT LED, when on, indicates a logically or physically write protected disk. The WRITE PROTECT LED, when off, indicates a write enabled disk.

If a drive is offline and the WRITE PROTECT switch is set in, the drive will turn on the LED and internally write protect the drive. If the switch is set out, the drive will turn off the LED and internally write enable the drive.

If a drive is online and the WRITE PROTECT switch is set in, the drive will utilize the Attention Mechanism to report the change in the switch setting. The drive will leave the LED off until the change is reported to the controller (using the next GET STATUS response), at which point it will leave the drive internally write enabled, but will turn the LED on to tell the operator that the controller will refuse future write operations to the drive.

If a drive is online and the WRITE PROTECT switch is set out, the drive will utilize the Attention Mechanism to report the change in the switch setting.
The drive will leave the LED on until the change is reported to the controller (using the next GET STATUS response), at which point it will turn the LED off but leave the drive internally write protected until the controller commands the drive to internally write enable itself.

1.5.5 Port A and B Switches and LEDs

A port switch is set in to enable the connection between the drive and the controller attached to that port. It is set out to disconnect the drive from the controller attached to that port. When a port is disconnected, the drive behaves as if the cable to that port were physically removed and avoids any transmissions (including clocks) down either SDB line.

The LED (if on) indicates that the drive is online to the controller connected to that port. If the LED is off, the drive is offline to the controller connected to that port. Thus, when a drive is online to a controller, only that port's LED is lit. When the drive is offline or available to all controllers, no port LEDs are lit.

If a drive is online and the corresponding port switch is set out, the drive cannot physically deactivate the port until it has left the online state. If a drive is offline to the port, the port immediately deactivates.

If a port switch is set in, a drive that is offline immediately activates the port and begins transmission of appropriate information down the appropriate SDB lines. If the drive is online when a port switch is set in, it activates the port as soon as possible, but no later than when it next leaves the online state. If the drive is in the drive available state, it includes the newly activated port as one of the ports to be listened to. The drive does not light a port LED unless the drive is in the online state through that port.

1.5.6 TEST

The TEST switch is set in to enable the Test Mode or display the unit number, depending on the state of the other LEDs and switches.
The TEST switch acts like a momentary contact if the PORT A and PORT B LEDs are on, or if the port switches are set in. This is the case if either controller port is selected or a controller is accessing the drive. In this case, pressing the TEST switch merely causes a temporary two-second display of the unit number.

If the above conditions are not true, the TEST switch acts as a two-position switch; pushing it locks the TEST switch in, puts the OCP into the Test Mode, and turns on the TEST LED.

When in Test Mode, the display shows the four-digit unit number. The unit number is changed by the following actions:

· Push the PORT A switch to select the 1's digit to be changed. The 1's digit will start blinking to indicate that it is selected. Push the PORT A switch again to select the next digit.
· Push the PORT B switch to increment the blinking digit. Only that digit will change. If the digit is at 9, pushing the PORT B switch will roll it over to 0. Digit 3 rolls over from 4 to 0.

The unit number can range from 0000 to 4094. After setting the unit number, the Test Mode is exited by pressing the TEST switch.

Pressing the WRITE PROTECT switch while in the Test Mode will display the letter T in position 3 of the display, a blank in position 2, and a two-digit test number in positions 1 and 0 of the display. The test number always equals 00 when first displayed. The two-digit test number is incremented in the same fashion as the unit number.

Pressing the WRITE PROTECT switch again, while in Test Mode, starts the letter T blinking and starts the selected test running. While the letter T is blinking, the PORT A switch will stop the blinking letter T and select the 1's digit to be incremented. Pressing the TEST switch will stop the blinking and exit the Test Mode. Pressing the WRITE PROTECT switch will stop the blinking letter T and stop the test.

1.5.7 Display

The display is a four-position display for showing the switch state, unit number, or an error code.
While the TEST LED is off, the display shows the state of the RUN, WRITE PROTECT, PORT A, and PORT B switches.

If the RUN switch is set in, position 3 will show the letter R. When the RUN switch is set out, position 3 will be blank.

If the WRITE PROTECT switch is set in, position 2 will show the letter W. When the WRITE PROTECT switch is set out, position 2 will be blank.

If the PORT A switch is set in, position 1 displays the letter A. When the PORT A switch is set out, position 1 will be blank.

If the PORT B switch is set in, position 0 displays the letter B. When the PORT B switch is set out, position 0 will be blank.

If none of the above switches are set in, then the unit number is displayed as described previously in this section.

Pushing the TEST switch causes the display to show the unit number if a PORT switch is set in and the PORT LED is lit.

If the FAULT LED is lit, pressing the FAULT switch in causes the display to show the letter E in position 3, a blank in position 2, and a two-digit error code in positions 1 and 0. Pushing the FAULT switch again returns the display to normal mode.

1.5.8 Drive Actions on Switch Changes

The drive must turn the appropriate LEDs on and off as the conditions warrant. That is, the drive alone manipulates the LEDs to reflect its current status as described in the next table. There is no facility in the controller for altering the OCP LEDs. Refer to Table 1-10.

Table 1-10 Summary of Drive Actions on Switch Changes
------------------------------------------------------------
LED / Turned ON... / Turned OFF...
------------------------------------------------------------
READY
  Turned ON: when drive internally spun up and available, or spun up, online, and read/write ready; requires RUN LED on.
  Turned OFF: when drive is not internally spun up, or is not read/write ready when online.
RUN
  Turned ON: when drive spun up.
  Turned OFF: when drive spun down.
FAULT
  Turned ON: when DE = 1 and fault condition detected.
  Turned OFF: when DE = 0.
WRITE PROTECT
  Turned ON: when drive is physically write protected, or controller should not accept host writes.
  Turned OFF: when drive is physically write enabled, or controller should accept host writes.
PORT
  Turned ON: when drive is online through port.
  Turned OFF: when drive is offline through port.
TEST
  Turned ON: when TEST switch is in, PORT switches are out and PORT LEDs off.
  Turned OFF: when TEST switch is out, a PORT switch is in and PORT LEDs on.
------------------------------------------------------------

2 Diagnostics

2.1 Introduction

This chapter describes the diagnostics available for the ESE50.

2.2 Troubleshooting Tips

This section describes some useful tips:

· Error code from the OpenVMS error log
· OCP error codes
· Use of the service adapter
· DKUTIL

The power supply module contains three green LEDs, which indicate that the three supply voltages are present. They are -5.2, +12, and +5 volts (refer to Figure 2-1).

Figure 2-1 Power Supply Module LEDs

The power monitor module contains three green and four yellow LEDs, which indicate the status of the power system during the powerup cycle. Any error detected causes the sequence to stop with the error code displayed. Refer to Figure 2-2.

Figure 2-2 Power Supply Module LEDs

2.3 System Diagnostics

2.3.1 OpenVMS Diagnostics

The OpenVMS diagnostics used in the ESE50 system are:

· EVRAE
· EVRCJ

These diagnostics run under VAX Diagnostic Supervisor (VDS) and are described briefly in the following sections.

2.3.2 HSC Controller Diagnostics

The controller diagnostics used with the ESE50 system are:

· ILDISK
· ILEXER
· DKUTIL

The controller diagnostics are described briefly in the following sections.

2.3.2.1 ILDISK In-Line Disk Functional Test

ILDISK isolates drive-related problems to one of the following FRUs:

· Disk drive
· SDI cable
· Disk data channel module

The In-Line Disk Functional Test runs in parallel with disk I/O from a host CPU, but the drive being diagnosed cannot be online to any host. This diagnostic is initiated upon demand through the local terminal, or remotely by the pseudo terminal, or may be initiated automatically upon disk drive failure by the HSC control program.

2.3.2.2 ILEXER In-Line Multidrive Exerciser

This program exercises the various disk and tape drives attached to the HSC subsystem. Drives to be tested are selected by the operator. The program issues random READ, WRITE, and COMPARE commands to exercise the drives. The results of the exerciser are displayed on the terminal from which it was initiated. The reports given by ILEXER do not provide any analysis of the errors reported, nor do they explicitly call out a faulty FRU. ILEXER is strictly an exerciser.

2.3.2.3 DKUTIL Off-Line Disk Utility

DKUTIL is a general utility that displays disk structures and disk data. Unlike other utilities, DKUTIL is a command language interpreter. Initially, the user is prompted for the unit number of a disk. The program then goes into a command mode, prompting for a command, executing it, and then prompting for another command. Execution is terminated by ^C, ^Y, ^Z, or the EXIT command. The commands used are:

· DEFAULT
· DISPLAY
· DUMP
· EXIT
· GET
· POP
· PUSH
· SET

DKUTIL is useful for fetching the contents of the drive-resident error log in the ESE50 device, as it does with other DSA disks.

3 Fault Isolation

This chapter describes ESE50 fault isolation using the service adapter facility (SAF).

3.1 Service Adapter Facility

The following provides information on the installation and use of the service adapter facility (SAF) included with the ESE50 Solid State Storage System. The section is arranged in the following manner.

· INTRODUCTION--This section provides an overview of the SAF.
· INSTALLATION--This section describes the requirements, installation, and operation of the SAF.
· BASIC MONITOR--This section describes the use of the ESE50 Basic Monitor program and its capabilities.
· ADVANCED MONITOR--This section describes the capabilities and operation of the ESE50 Advanced Monitor program.
· POWER MONITOR--This section describes the capabilities and operation of the monitor functions of the power monitor assembly.

3.1.1 Service Adapter Facility (SAF) Description

The SAF gives the operator or maintenance personnel the capability of loading new system firmware, performing system software configuration, displaying and clearing error logs, and running diagnostics.

------------------------------------------------------------
Caution
------------------------------------------------------------
The SAF is a powerful tool that should be accessed only when absolutely necessary. Virtually all fault isolation can be done from the OpenVMS error log. It is possible to modify ESE50 parameters to the point of nonfunctionality unless extreme care is exercised.
------------------------------------------------------------

The SAF functionality is provided by the firmware on the controller module. The only additional requirement is a simple RS-232 terminal. The terminal may be permanently attached to the ESE50, in which case it provides real-time status messages in addition to the monitor functions, or it may be connected only when needed to perform one or more of the monitor functions. The monitor functions are divided into three areas of the firmware.

The first function, termed the basic monitor, resides in the controller boot ROM. The basic monitor provides the capability of displaying status messages during the boot procedure and displaying onboard error logs. Downloading of new system firmware into the onboard Flash ROM is also performed through the basic monitor function.

The advanced monitor program resides in the Flash ROM and contains the capabilities to:

· Display detailed logs
· View and change system software configuration
· Perform offline diagnostics
· Enable/disable patrol diagnostics
· Perform a manual save
· Perform a manual restore
· Display real time status
· Switch to the power monitor function

The third area of the monitor function is resident in the power monitor assembly firmware. It provides the capabilities of displaying the power system status, margining voltages, and performing diagnostics on the power system.

3.1.2 Installation

This section contains the information necessary to connect the user-supplied RS-232 terminal to the ESE50 to utilize the SAF functions. Additionally, basic operating information is provided.

3.1.2.1 Requirements

To use the SAF, a simple RS-232 terminal is connected to the ESE50. The only exception to this requirement is when new firmware is to be loaded, which is accomplished by replacing the controller module. This function requires that the controller be connected to a serial port of a personal computer running DOS version 3.1 or higher. Also, communication software such as PROCOMM, capable of downloading the hexadecimal file, is required.

The interface is available at the industry standard DB25 connector located inside the unit behind the front bezel. The interface specifications are as follows:

· 9600 baud
· Eight data bits
· Parity disabled
· Full duplex mode

The protocol requires the active use of the RS-232 CTS and RTS signals.
The ESE50 is equivalent to an RS-232 DCE (modem) and drives the receive data and CTS signals and receives the transmit data and RTS signals. These signals are wired directly to a device equivalent to a terminal. The two data signals and two control signals must be cross connected to the equivalent of another modem. 3-2 Fault Isolation Fault Isolation 3.1 Service Adapter Facility 3.1.2.2 Connection Connection is accomplished in three steps: 1. Verify that the terminal to be used is configured as previously defined. 2. Remove the front bezel from the ESE50 extrusion. 3. Connect the DB25 cable to the connector located on the top support bracket. 3.1.2.3 Basic Operation The ESE50 basic boot operation is shown in Example 3-1. Example 3-1 Basic Boot Operation SANITY CHECK PASSED HIT ESC KEY FOR BOOTUP MENU . . . . . - Waiting for disk - Testing disk \ VOL STATUS: VM Restore in progress... 5.120 128 Kb blocks transfer (640 Mb) - When the ESE50 is powered on, or reset, the microprocessor begins firmware execution with the boot procedure located in the boot ROM. A set of sanity checks are performed. When the checks are completed successfully, a status message is displayed on the terminal. If any of the checks fail, the unit halts. Next, the resident firmware in Flash ROM is checked. If the firmware installed is valid, a status message is displayed on the terminal, and control is transferred to the system firmware in Flash ROM. If the firmware is invalid or contains a checksum error, an error message is displayed on the terminal and the basic monitor is entered. If the operator types Esc at any time during the basic boot procedure, the boot is halted and the basic monitor is entered. Normal firmware execution begins with the power-on diagnostics. Any failure during these tests displays the appropriate error message and the advanced monitor is entered. When the power-on diagnostics have passed, a configuration check is performed. 
The configuration check begins by identifying the type and quantity of modules installed. This data is used to establish the device type. Next, the backup disk is checked. If it contains valid data and auto restore is enabled, the restore process is started. If the backup disk data is In_Valid or No_Data, the memory is first initialized and then formatted. In either case, the unit is then placed in operation relative to its state at the last power down.

3.1.3 Basic Monitor

Example 3-2 shows the basic monitor menu. It allows the operator to select any of the five options described below.

Example 3-2 Basic Dialog Display

SANITY CHECK PASSED
HIT (ESC) KEY FOR BOOTUP MENU . . . . . -
ESE50-SCSI BASIC MONITOR VERSION 1.0
1 -- LOAD FIRMWARE
2 -- DISPLAY CONFIGURATION/ERROR LOG
3 -- DISPLAY INDENTIFICATION
4 -- CONTINUE
5 -- LOCAL RAM TEST
SELECT OPTION 1-5 :

3.1.3.1 Sanity Checks

On power-on or reset, the boot ROM begins by performing a series of tests to ensure the integrity of the system before passing control to the main operational firmware in Flash ROM.

3.1.3.2 Stack Memory Test

The first test performed is a check of the processor's stack area, which is part of the 128K byte local RAM. First, each longword in the stack area is written with its own address, then read back and verified. Next, the memory is filled with 5Ah, read back, and verified. This bit pattern is then inverted (A5h) and the test is repeated. Finally, all ones (FFh) and all zeroes (00h) are written to memory, read back, and verified. If any memory location in this area fails, the processor cannot continue, and the boot procedure is halted.

3.1.3.2.1 Multifunction Peripheral Chip Test

The second test of the boot ROM checks and initializes the multifunction peripheral chip (68901). At this point the stack memory passed message is displayed on the terminal.
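The pattern sequence of the stack memory test in Section 3.1.3.2 can be sketched in a few lines. The following is an illustrative simulation only, not the boot ROM code: the function name, the use of a byte buffer to stand in for RAM, and the big-endian longword layout are assumptions made for the sketch.

```python
import struct

# Fill patterns in the order given in the text: 5Ah, its inverse A5h,
# all ones, all zeroes.
FILL_PATTERNS = (0x5A, 0xA5, 0xFF, 0x00)


def stack_memory_test(mem, base_addr=0):
    """Run the pattern sequence over a simulated RAM region.

    mem is any mutable byte buffer standing in for the stack area
    (hypothetical; the real test runs on hardware). Returns None when
    every check passes, or (address, expected, actual) for the first
    mismatch.
    """
    n_longwords = len(mem) // 4

    # Pass 1: write each longword with its own address, then verify.
    for i in range(n_longwords):
        struct.pack_into(">I", mem, i * 4, base_addr + i * 4)
    for i in range(n_longwords):
        addr = base_addr + i * 4
        (actual,) = struct.unpack_from(">I", mem, i * 4)
        if actual != addr:
            return (addr, addr, actual)

    # Remaining passes: fill with each pattern, read back, and verify.
    for pattern in FILL_PATTERNS:
        for i in range(len(mem)):
            mem[i] = pattern
        for i in range(len(mem)):
            if mem[i] != pattern:
                return (base_addr + i, pattern, mem[i])
    return None
```

Calling `stack_memory_test(bytearray(64))` on a healthy buffer returns `None`. In the unit itself a failure in the stack area halts the boot procedure; the sketch returns the failing address and data instead so that the logic can be exercised on a host machine.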
3.1.3.2.2 Local RAM Test The remainder of the 128K byte local RAM (less the stack area, which is now in use) is tested in the same manner described above. Any failure will cause the address, expected value, and actual value to be displayed, and the processor is then reset. 3.1.3.2.3 Processor Traps Processor traps are tested by attempting a divide by zero operation and verifying that a processor exception did occur. If this test passes, the sanity test passed message is displayed on the terminal. 3.1.3.2.4 Firmware Verification A checksum verification is performed on the operational firmware in Flash ROM. If the checksum is good, control is passed to the Flash ROM firmware. If the checksum does not verify, a message is displayed on the terminal and the system enters the load firmware dialog. 3-4 Fault Isolation Fault Isolation 3.1 Service Adapter Facility 3.1.3.2.5 Load Firmware ------------------------------------------------------------ Note ------------------------------------------------------------ Field service updates to the firmware are accomplished by controller module FRU replacement. ------------------------------------------------------------ The system firmware is loaded into the ESE50 controller through the SAF using this option of the basic monitor. The load firmware dialog is displayed when the routine is entered (see Example 3-3). When the current loaded firmware is valid, the part number and revision number are displayed. If the present firmware is invalid, the program goes directly to the erase and load dialog. If the operator, when prompted (Y/N) to continue the procedure answers no (N), then the display returns to the basic monitor menu. A yes (Y) response will initiate the current firmware erase routine. Enter the new firmware part number and revision number (for example, 5257-001 1.1 [the dash and period are required] at the prompt). At this point, the Flash ROM erasure will begin. 
If the erasure is successful, the erase complete message followed by the cycles required to erase will be displayed. The begin firmware load prompt will be displayed. At this point, the operator must use an appropriate communication package (such as PROCOMM) to transfer the hexadecimal file to the controller. When the firmware load begins, an ever increasing series of periods appears as the load progresses. When the firmware load is completed successfully, the firmware load complete message and the new firmware part number and revision number will be displayed. Pressing any key will return you to the basic monitor menu. If a failure occurs during the firmware load procedure, then the firmware load failure message will appear. Example 3-3 Load Firmware FIRMWARE VERIFICATION CURRENT FIRMWARE IS L 5275-001 REV 1.6 . CHECKSUM IS VALID CONTINUE WILL ERASE CURRENT FIRMWARE. CONTINUE (Y/N) ? ENTER NEW FIRMWARE P/N AND REV = 5275-001 REV 1.6 BEGIN FLASH ROM ERASE ERASE COMPLETED 0068 ERASE CYCLES REQUIRED BEGIN FIRMWARE LOAD ........................................................... ......................... FIRMWARE LOAD COMPLETED NEW FIRMWARE IS L 5275-001 REV 1.6 HIT ANY KEY TO CONTINUE...... Fault Isolation 3-5 Fault Isolation 3.1 Service Adapter Facility 3.1.3.2.6 Display Configuration/Error Logs Configuration and error information are maintained in the EEPROM located on the controller module. This option allows the operator the ability of displaying this information on the terminal. Selection of this option causes a hexadecimal dump of the EEPROM contents as shown in Example 3-4. It should be noted that once the advanced monitor is entered, this information is available in formatted displays. Example 3-4 Display System Configuration/Error Log DISPLAY SYSTEM CONFIGURATION/ERROR LOG (ESC) - RETURN TO MAIN (CR) - NEXT . - PREVIOUS 0010000 FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF ................ 0010010 FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF ................ 
0010020 FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF ................ 0010030 FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF ................ 0010040 FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF ................ 0010050 FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF ................ 0010060 FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF ................ 0010070 FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF ................ 0010080 FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF ................ 0010090 FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF ................ 00100A0 FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF ................ 00100B0 FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF ................ 00100C0 00000000 00000000 00000000 00000000 ................ 00100D0 00000000 00000000 00000000 00000000 ................ 00100E0 00000000 00000000 00000000 00000000 ................ 00100F0 00000000 00000000 00000000 00000000 ................ > 3.1.3.2.7 Display Identification This selection causes the controller identification information to be displayed as shown in Example 3-5. The displayed information includes the controller part number, revision level, the installed firmware part number, and the firmware revision level. The user should ignore the SCSI ID and plug-and-play information displayed, as they are not used by the ESE50. Example 3-5 Display Identification CONTROLLER MODULE A 5122-001 REV E FIRMWARE L 5275-001 REV 1.6 SCSI ID = 7 . PLUG-AND-PLAY NOT INSTALLED HIT ANY KEY TO CONTINUE 3.1.3.2.8 Continue Selection of option 4 causes the system to proceed to the advanced monitor functions. 3.1.3.2.9 Local RAM Test This option causes the local RAM test described previously to be performed. At the conclusion, if no errors are detected, the system is reset. 3-6 Fault Isolation Fault Isolation 3.1 Service Adapter Facility 3.1.4 Advanced Monitor The advanced monitor provides all of the service adapter facility (SAF) functions of the ESE50 system. 
The advanced monitor is entered automatically when the unit completes the normal powerup sequence, or after the boot procedure has completed and any of the following conditions occur:

· Power-on diagnostic failure is detected
· A configuration error is detected
· Auto-restore is enabled and an error occurs

Refer to Figure 3-1 for a complete menu tree.

Figure 3-1 SAF Menu Tree

3.1.4.1 Advanced Monitor Menu

Example 3-6 shows the advanced monitor main menu. All SAF functions are initiated from this menu. Capabilities are restricted by the ESE50 system status at the time an action is requested. When the system is offline (both port switches out), the operator has full access to all the functions. When the system is online (either the Port A or Port B switch in), only functions that will not disturb operation are available.

Example 3-6 Advanced Monitor Menu

ESE50 SOLID-STATE DISK ADVANCED MONITOR VERSION 2.3
1 -- DISPLAY MENU
2 -- CONFIGURE SYSTEM
3 -- OFFLINE DIAGNOSTICS
4 -- PATROL DIAGNOSTICS
5 -- MANUAL SAVE
6 -- MANUAL RESTORE
7 -- ENABLE SDI SPECIAL FUNCTIONS
8 -- POWER MONITOR
SELECT OPTION 1-8 :
*** ESE50 is ALREADY FORMATTED ***

The menu lists the eight available selections and is displayed until a selection is made. If the selection chosen is unavailable due to system status, the operator is notified and the main menu is redisplayed. Some selections cause an immediate action to be performed; others cause submenus to be displayed.

3.1.4.2 Display Menu

Selecting the display option from the main menu causes the display menu (see Example 3-7) to be displayed. From this menu the operator may select one of six items to be displayed.
In addition, various logs may be cleared. Example 3-7 Advanced Monitor Menu DISPLAY MENU 1 -- CORRECTABLE ERROR LOG 2 -- INTERNAL ERROR LOG 3 -- STORAGE STATUS 4 -- DIAGNOSTIC RESULTS 5 -- DATA RETENTION & COUNTS 6 -- IMB MEMORY DUMP 7 -- RETURN TO PREVIOUS MENU SELECT OPTION 1-7 : 3-10 Fault Isolation Fault Isolation 3.1 Service Adapter Facility 3.1.4.3 Correctable Error Log Selection of the correctable error log option on the display logs menu results in the correctable error log submenu being displayed (see Example 3-8). This submenu gives the operator the choice of displaying and/or clearing the correctable error log. Example 3-8 Advanced Monitor Menu SELECT OPTION 1-7 : CORRECTABLE ERROR LOG(S) 1 -- DISPLAY CORRECTABLE ERROR LOG 2 -- CLEAR CORRECTABLE ERROR LOG 3 -- RETURN TO DISPLAY MENU SELECT OPTION 1-3 : If a display selection is made, then the data is displayed on the screen (see Example 3-9). If the amount of data causes a screen overflow, the display pauses until the operator requests more data by pressing any key. This log displays a compilation of single bit error occurrences by storage module identification, the bank of memory affected, the BIT, the number of times the error occurred (maximum of 16), and the identification of the DRAM associated with this location. When the clear selection is made, the operator is asked for confirmation before clearing the log. A positive reply (Y) causes the correctable error log to be set to all zeros. A negative reply (N) will not clear the log, and the display log menu will be redisplayed. Example 3-9 Advanced Monitor Menu CORRECTABLE ERROR LOG MODULE BANK BIT NUMBER DRAM DISPLAY COMPLETE - TYPE ANY KEY TO RETURN TO MAIN MENU 3.1.4.4 Internal Error Log The ESE50 internal error log contains the last 32 errors detected by the unit. The error reporting format consists of a 32-byte header and 32 error descriptors of 32 bytes each. 
This option gives the operator the ability of displaying any or all of the error entries along with the header. In addition, all of the entries and/or the header may be cleared. Example 3-10 shows the internal error log submenu. If the operator selects the display option, a prompt requests the descriptor to be displayed (1 to 32), or "X" for all. The requested descriptor is displayed along with the header (see Example 3-11). If all was selected, then descriptor number 1 is displayed; pressing any key causes number 2 to be displayed until number 32 has been displayed or the operator presses an Esc key. At the end of the display operation, the display returns to the internal error log submenu. Fault Isolation 3-11 Fault Isolation 3.1 Service Adapter Facility Option 2 causes all of the error descriptor entries to be cleared. Option 3 clears the internal error log header. Example 3-10 Internal Error Log 1 -- DISPLAY ERROR LOG 2 -- CLEAR ERROR LOG 3 -- CLEAR ERROR HEADER 4 -- RETURN TO DISPLAY MENU SELECT OPTION 1-4 : Example 3-11 Error Log Header ERROR LOG HEADER Seeks since power up: 0 Cumulative seeks: 0 Elapsed time: 0 Cumulative errors: 0 ERROR LOG ENTRY NUMBER 2 Error type: NONE Error code: 00 FRU number: 00 Number of seeks: 0 Entry number: 1 Minutes: 0 Status byte: 00 DR status: FB Cylinder: 0000 Array and bank: 00 Storage status: 00 HIT ANY KEY TO CONTINUE ....... 3.1.4.5 Storage Status The storage status consists of 16 bytes of status for each of the 16 possible storage module locations. This option provides for the display and/or clearing of each of the 16 entries. Selection of storage status from the display log menu will display the storage status submenu shown in Example 3-12. The display option prompts the operator for the module to be displayed by physical slot number. The operator selects from 2 through 16 or "X" for all. If "X" is typed, the storage status for the first slot is displayed; pressing any key displays the next slot's status in sequence. 
By selecting the clear option, the operator may clear any of the storage status entries individually, or all 16 by selecting the entry number or "X" at the prompt. 3-12 Fault Isolation Fault Isolation 3.1 Service Adapter Facility Example 3-12 Display Log Menu 1 -- DISPLAY STATUS 2 -- CLEAR STATUS 4 -- RETURN TO DISPLAY MENU SELECT OPTION 1-3 : Example 3-13 Storage Status Display 1 -- DISPLAY STATUS 2 -- CLEAR STATUS 4 -- RETURN TO DISPLAY MENU SELECT OPTION 1-3 : Enter module number (2 - 16 or x for all): SLOT 2 STORAGE MODULE A4942-004 REV B 64 MegaBytes GD = 1 MO = 0 RO = 0 CO = 0 FO = 0 F1 = 0 FR = 1 BANK LA GD MRDS RDS CRD 0 00 1 00 00 00 1 01 1 00 00 00 2 02 1 00 00 00 3 03 1 00 00 00 MORE - TYPE ANY KEY TO DISPLAY MORE: ESC TO RETURN TO MAIN MENU 3.1.4.5.1 Data Retention and Counts Selection of this option causes a display of the current status of the data retention system. The status is Valid Data, In_Valid Data, or No_Data. In addition, the save and restore counts are displayed. Example 3-14 is an example of the display. The operator has the option of clearing either the save count, the restore count, or both, by selecting option 1 or 2. Example 3-14 Display Counts Data retention status: Invalid data Save count: 0 Restore Count: 1 1 -- Clear Save Count 2 -- Clear Restore Count 3 -- Return to Display Log Menu SELECT OPTION 1-3 : 3.1.4.6 IMB Memory Dump This feature is used to display the contents of IMB memory on the terminal. When selected, the user is prompted for the start address. This address is on eight longword (32 bytes) boundaries. The default value is "0" for the first access; afterwards it is end address from the last display. After the start address is entered, the operator is prompted for the end address. The default is Start + 11, or one full screen. Fault Isolation 3-13 Fault Isolation 3.1 Service Adapter Facility Once the end address is entered, the memory contents are displayed and the operator is prompted for the next start address. 
Example 3-15 shows a typical IMB memory dump display. Pressing the Esc key causes the IMB memory dump program to exit and control to be returned to the main menu. Example 3-15 IMB Memory Dump Display Start [ 0] : End [ 11] : 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000002 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000004 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000005 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000006 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000007 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000008 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000009 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000A 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000B 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000C 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000D 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000E 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000F 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000010 00000000 00000000 00000000 00000000 37000045 40B223BB 78BEA6D0 119A6A36 00000011 16DE0468 D3654D7D 00000002 00000000 00000001 00000001 00000001 00000001 Start [ 11] : 3.1.4.7 Configure System The configure system selection on the advanced monitor menu initiates the system configuration submenu display (see Example 3-16). Selections from this submenu allow the operator to display system hardware and software configurations and enter new configurations. 
Example 3-16 System Configuration SYSTEM CONFIGURATION 1 -- DISPLAY HARDWARE CONFIGURATION 2 -- DISPLAY DISK GEOMETRY 3 -- DISPLAY BACKUP CONFIGURATION 4 -- DISPLAY/SET SYSTEM S/N 5 -- IMB MEMORY FORMAT 6 -- RETURN TO MAIN MENU SELECT OPTION 1-6 : 3.1.4.7.1 Display Hardware Configuration The hardware configuration display (see Example 3-17) shows the total storage capacity of the ESE50 and the type and quantity of modules installed. The controller module identification information contains part number and revision letter, and the firmware part number and revision number. Each storage module is identified by part number, revision letter, and the storage capacity of the module. 3-14 Fault Isolation Fault Isolation 3.1 Service Adapter Facility Example 3-17 Hardware Configuration Display ESE50 SYSTEM HARDWARE CONFIGURATION TOTAL STORAGE CAPACITY = 640 MEGABYTES SLOT 0 CONTROLLER MODULE A 5122-001 REV E FIRMWARE L 5275-001 REV 1.6 SLOT 1 STORAGE MODULE A4942-004 REV B 64 MEAGBYTES SLOT 2 STORAGE MODULE A4942-004 REV B 64 MEAGBYTES SLOT 3 STORAGE MODULE A4942-004 REV B 64 MEAGBYTES SLOT 4 STORAGE MODULE A4942-004 REV B 64 MEAGBYTES SLOT 5 STORAGE MODULE A4942-004 REV B 64 MEAGBYTES SLOT 6 STORAGE MODULE A4942-004 REV B 64 MEAGBYTES SLOT 7 STORAGE MODULE A4942-004 REV B 64 MEAGBYTES SLOT 8 EMPTY SLOT 9 STORAGE MODULE A4942-004 REV B 64 MEAGBYTES SLOT 10 STORAGE MODULE A4942-004 REV B 64 MEAGBYTES SLOT 11 STORAGE MODULE A4942-004 REV B 64 MEAGBYTES SLOT 12 STORAGE MODULE A4942-004 REV B 64 MEAGBYTES SLOT 13 DISK SLOT 14 DISK SLOT 15 DISK SLOT 16 DISK HIT ANY KEY TO CONTINUE ....... 3.1.4.7.2 Display Disk Geometry The ESE50 is formatted automatically on power-on when either the auto restore feature is disabled or the saved volume is In_Valid or No_Data. The default geometry is shown in Example 3-18 for an ESE56 configuration (600 MB). The disk geometry may be changed using the IMB memory format option in the configure submenu. 
This change is not saved and is lost on a power down. The display disk geometry selection displays the current formatted geometry. Pressing the Ctrl C key returns the advanced monitor to the main menu. Example 3-18 Default Geometry--Sample Display ESE50 DISK GEOMETRY ESE50 Memory Size = 640 MegaBytes 128 Tracks/Cylinders 4 Sectors/Track + 0 Replacement Sectors/Tracks 2340 Total Cylinders 299593 Total Tracks 1198372 Total Sectors 2337 LBN Cylinders 299136 LBN Tracks 1196544 LBN Sectors 4 RCT Copy Size 4 RCT Tracks 4 RCT Sectors 258 FCT Copy Size 65 FCT Tracks 258 FCT Sectors 1 XBN Cylinders 128 XBN Tracks 512 XBN Sectors 2 DBN Cylinders 256 DBN Tracks 1024 DBN Sectors Gap0 = 17 Gap1 = 17 Gap2 = 32 HOST LBNs = 1196540 = 584 MEGABYTES HIT ANY KEY TO CONTINUE ....... 3.1.4.8 Display Backup Configuration The disk backup (data retention) feature of the ESE50 system contains several configurable options. This selection of the configuration menu is used to display the current backup configuration and allows the operator the ability of modifying these parameters. Fault Isolation 3-15 Fault Isolation 3.1 Service Adapter Facility Example 3-19 shows a sample display. The operator enters the option number of the configuration parameter to change. Options having only two values (that is, auto save: enable/disable) toggle the selection when the option number is entered. Options requiring a parameter (that is, auto save delay) will prompt for the value to be used. Example 3-19 Backup Display Sample 1 -- Auto Save: ENABLED 2 -- Auto restore: ENABLED 3 -- Auto save delay: 5 seconds 4 -- Drive SCSI ID: 1 5 -- Verify backup: OFF 6 -- Use linked cmds: OFF 7 -- WP on low patt: OFF 8 -- Save/rest on spin down/up: ENABLED 9 -- Return to config menu SELECT OPTION 1-8 : The following describes the options. · AUTO SAVE: (ENABLE/DISABLE)--Turns on/off the auto save function. When enabled, an auto save will be performed whenever an AC power failure is detected and the battery is ready. 
If disabled, an auto save will not be performed and the system will shut down. · AUTO RESTORE: (ENABLE/DISABLE)--Turns on/off the auto restore function. When enabled, an auto restore will be performed each time power is applied to the unit and the saved volume is valid. When disabled, the auto restore will not be performed. The memory is first initialized and then formatted. · AUTO SAVE DELAY: --Sets the delay between the time a power failure is detected and the beginning of an auto save operation if enabled. The operator may select any value in seconds from 1 to 255. · DRIVE SCSI ID:--Allows the operator to select the SCSI ID of the backup disk. The default is 1 and should only be changed if the backup disk is also changed. · VERIFY BACKUP: --The default setting is OFF, and causes the system to shut down following an auto or manual save operation. If set to ON, the data saved will be verified prior to completion of the save operation. It should be noted that the verify operation requires additional time to perform and drains the battery further during an auto save operation. · USE LINKED CMDS:--Allows the system to use linked commands during a save or restore operation. The default is OFF and cannot be used if the backup disk is an RZ35 or RZ27. · WP ON LOW BATT:--When set to ON, the system will write protect the memory following a powerup sequence until the battery contains sufficient charge to support an auto save operation. If set to OFF, the unit will not be write protected. It should be noted that writes will be allowed and the backup volume will be marked In_Valid. Should a power failure occur before the battery becomes ready, all data will be lost. 3-16 Fault Isolation Fault Isolation 3.1 Service Adapter Facility · SAVE/REST ON SPIN DOWN/UP:--Turns on/off the SAVE on spin down. When enabled, a SAVE will be initiated on a spin down. If disabled, a SAVE will not be performed on a spin down. 
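The option rules described above can be summarized in a small model. This is a hedged sketch, not the controller firmware: the class, field, and function names are hypothetical, the initial values simply mirror the sample display in Example 3-19, and the "enabled but battery not ready" case is treated as a shutdown by assumption, since the text does not spell it out.

```python
from dataclasses import dataclass


@dataclass
class BackupConfig:
    """Hypothetical model of the backup configuration options."""
    auto_save: bool = True          # 1 -- Auto Save
    auto_restore: bool = True       # 2 -- Auto restore
    auto_save_delay_s: int = 5      # 3 -- Auto save delay (seconds)
    drive_scsi_id: int = 1          # 4 -- Drive SCSI ID
    verify_backup: bool = False     # 5 -- Verify backup
    use_linked_cmds: bool = False   # 6 -- Use linked cmds
    wp_on_low_batt: bool = False    # 7 -- WP on low batt
    save_on_spin_down: bool = True  # 8 -- Save/rest on spin down/up

    def set_auto_save_delay(self, seconds):
        # The text allows any value from 1 to 255 seconds.
        if not 1 <= seconds <= 255:
            raise ValueError("auto save delay must be 1 to 255 seconds")
        self.auto_save_delay_s = seconds


def action_on_ac_failure(cfg, battery_ready):
    """Auto save runs only when enabled and the battery is ready."""
    if cfg.auto_save and battery_ready:
        return "auto save after %d seconds" % cfg.auto_save_delay_s
    # Assumption: with no usable battery the unit can only shut down.
    return "shut down"


def action_on_powerup(cfg, volume_status):
    """volume_status is 'Valid', 'In_Valid', or 'No_Data'."""
    if cfg.auto_restore and volume_status == "Valid":
        return "restore"
    return "initialize and format"
```

For example, `action_on_powerup(BackupConfig(), "No_Data")` yields `"initialize and format"`, which matches the behavior described for an In_Valid or No_Data backup volume.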
3.1.4.8.1 Display/Set System Serial Number Each ESE50 system contains a unique system serial number, which is reported as the drive ID in the common characteristics. This number is the four-digit system serial number and is saved in the EEPROM located on the controller module. Whenever the controller module is replaced or moved to a new system, this number must be changed in the controller EEPROM. This selection from the configuration menu (see Example 3-20) is used to facilitate this. When selected, the current serial number is displayed and the operator is prompted for a new number. If no entry is made, the number is unchanged. The new number must consist of four digits. Example 3-20 System ID Sample 1 -- DISPLAY HARDWARE CONFIGURATION 2 -- DISPLAY DISK GEOMETRY 3 -- DISPLAY BACKUP CONFIGURATION 4 -- DISPLAY/SET SYSTEM S/N 5 -- IMB MEMORY FORMAT 6 -- RETURN TO MAIN MENU SELECT OPTION 1-6 : Current system serial number: 0013 Enter new system serial number (0 - 9999) 3.1.4.8.2 IMB Memory Format During normal operation, the SDI format is installed in the IMB memory from the restore operation if a valid volume was saved or written automatically if the backup disk volume is either In_Valid or No_Data during a power on or reset sequence. This option of the SAF allows the operator to reformat the IMB memory if it was changed during diagnostics, or if a different disk geometry is desired. Example 3-21 shows the dialog for the IMB memory format operation. After the sign on message, the operator is warned that continuing with the operation will overwrite all memory. Fault Isolation 3-17 Fault Isolation 3.1 Service Adapter Facility Example 3-21 IMB Memory Format Display * * * * E S E 5 0 I N T E R F A C E * * * * * *** ESE50 Will Be Formatted *** WARNING ---- You are about to write to all of memory. Input ^Z to skip question. ^C to abort. 
Gap0 [ 17]: Gap1 [ 17]: Gap2 [ 32]: RBN/Track [ 0]: Sector/Track [ 4]: Track/Cylinder [ 128]: FCT RCT Copies [ 1]: Data Preamble [ 15]: Headr Preamble [ 12]: This is followed by a series of nine parameters to be used by the format. The operator may use all of the defaults by typing Ctrl Z. At this point, the format operation begins; once completed the terminal displays the disk geometry. The operator may return to the main menu by typing Ctrl C. If the default values are to be changed, each parameter is displayed in order and the new values entered. Typing Ctrl C at any time will abort the format operation and return the SAF to the main menu. The following defines the parameters and their default values: · GAP0--Defines the duration in word clocks (16 SDI bit times each) from the end of sector/index to the sync at the start of the header field. The default value is 17. · GAP1--Defines the duration in word clocks (16 SDI bit times each) from the end of the header field to the sync at the start of the data field. The default value is 17. · GAP2--Defines the duration in word clocks (16 SDI bit times each) from the end of the data field to the start of the next sector/index (reinstruction period). The default value is 32. · RBN/TRACK--Defines the number of replacement blocks allocated to each track. The default value is 0. · SECTOR/TRACK--Defines the number of sectors per track. The default is 4. · TRACKS/CYLINDER--Defines the number of tracks per cylinder. The default is 128. · FCT RCT COPIES--Defines the number of FCT and RCT copies formatted. The default is 1. · DATA PREAMBLE--Defines a value reported during a Get Subunit Characteristics command. The SDI controller uses this value during Format and Write commands. The default value is 15. · HEADER PREAMBLE--Defines a value reported during a Get Subunit Characteristics command. The SDI controller uses this value during Format commands. The default value is 12. 
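The relationship between the format parameters and the figures in the default geometry display (Example 3-18) is simple arithmetic, sketched below. The function names are hypothetical, and the 512-byte sector size is an assumption that is not stated in this section; it is used here because it reproduces the 584 MEGABYTES figure shown for 1196540 host LBNs.

```python
WORD_CLOCK_BIT_TIMES = 16  # each word clock is 16 SDI bit times


def gap_bit_times(word_clocks):
    """Convert a gap length in word clocks to SDI bit times."""
    return word_clocks * WORD_CLOCK_BIT_TIMES


def lbn_totals(lbn_cylinders, tracks_per_cylinder, sectors_per_track):
    """Return (LBN tracks, LBN sectors) for a given geometry."""
    tracks = lbn_cylinders * tracks_per_cylinder
    sectors = tracks * sectors_per_track
    return tracks, sectors


def host_megabytes(host_lbns, bytes_per_sector=512):
    """Whole megabytes (2**20 bytes) available to the host.

    bytes_per_sector=512 is an assumption, not a value from the text.
    """
    return host_lbns * bytes_per_sector // (1024 * 1024)


# Default gaps: 17 word clocks = 272 bit times, 32 word clocks = 512.
print(gap_bit_times(17), gap_bit_times(32))
# Default geometry: 2337 LBN cylinders, 128 tracks/cylinder, 4 sectors/track.
print(lbn_totals(2337, 128, 4))   # (299136, 1196544)
print(host_megabytes(1196540))    # 584
```

The computed LBN track and sector counts agree with the LBN Tracks and LBN Sectors values in the sample display, which is a quick way to sanity-check a nondefault geometry before formatting.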
3-18 Fault Isolation Fault Isolation 3.1 Service Adapter Facility 3.1.4.9 Offline Diagnostics When the offline diagnostics are selected from the advanced monitor menu, the offline diagnostics submenu is displayed (see Example 3-22). However, if the system is online, the operator is notified and the advanced monitor menu is redisplayed. The offline diagnostics menu gives the operator nine types of tests to run. These tests allow complete testing of the ESE50 system for both system verification and fault isolation. During testing, all errors are either displayed on the terminal screen or saved in controller RAM. The EEPROM error logs are not written to. The offline diagnostics are initiated and controlled from menus and dialog displayed on the terminal attached to the SAF. Example 3-22 Offline Diagnostics 1 -- SYSTEM VERIFICATION 2 -- CONTROLLER TEST 3 -- BACKUP DISK TESTS 4 -- STORAGE TESTS 5 -- DISPLAY ERROR LOG 6 -- IMB MEMORY EXERCISER 7 -- SCAN IMB MEMORY 8 -- TEST SGL BIT ERR LOGIC 9 -- TEST DBL BIT ERR LOGIC A -- EDAC TEST B -- RETURN TO MAIN MENU SELECT OPTION 1-B : 3.1.4.9.1 System Verification The system verification option allows the operator to select one mode that will run all the controller tests and a storage test on all installed memory. Example 3-23 shows the dialog on the terminal when the system verification option is selected. The operator enters the number of passes of the verification test to run. Any number between one pass and 256 passes can be selected. A decision to halt the tests on any error encountered is required at the prompt HALT ON ERROR[Y /N]. A positive response (Y) will cause the testing to halt and display the error message when an error is detected. A negative response (N) allows the system to log any detected error and continue the testing. Before the verification tests are started, the operator is given an opportunity to cancel the operation by a negative response N to prompts ANY DATA WILL BE DESTROYED, CONTINUE [Y/N]. 
A positive response (Y) allows the testing to start. As the tests run, the pass count increments as each pass is completed. Any errors encountered during testing are accumulated in the error column.

Example 3-23 System Verification Sample

SYSTEM VERIFICATION TESTING
SELECT NUMBER OF PASSES [ 1 - 256 ] = 3
HALT ON ERROR [Y/N] = Y
ANY DATA WILL BE DESTROYED. CONTINUE [Y/N] = Y
PASS 3 ERROR 0
HIT ANY KEY TO CONTINUE ......

3.1.4.9.2 Controller Tests

Selection of controller tests causes the display of the controller tests submenu (see Example 3-24). This submenu allows the operator to select the tests to be performed. Selections 1 through 7 cause individual tests to be performed; selection 8 causes the SDI test submenu to be displayed.

Example 3-24 Controller Test Submenu

1 -- POWER ON DIAGNOSTIC
2 -- LOCAL RAM TEST
3 -- DUAL PORT RAM TEST
4 -- MFP CHIP TEST
5 -- SCSI CHIP TEST
6 -- LCA TEST
7 -- IMB TEST
8 -- SDI TEST
9 -- RETURN TO MAIN MENU
SELECT OPTION 1-9 :

POWER ON DIAGNOSTIC

This selection allows the user to run all of the controller power-on diagnostics. The tests consist of:

LOCAL RAM TEST
DUAL PORT RAM TEST
MFP CHIP TEST
SCSI CHIP TEST
LCA TESTS
IMB TESTS

Each of these tests may be run individually from other selections in this menu and is described in the following paragraphs. As each test is completed, a message is displayed on the terminal (see Example 3-25).

Example 3-25 Power-On Samples

MFP CHIP TEST ----- TEST COMPLETED
LOCAL RAM TEST ----- TEST COMPLETED
DUAL PORT RAM TEST ----- TEST COMPLETED
SCSI CHIP TEST ----- TEST COMPLETED
LCA1 - LCA4 TEST ----- TEST COMPLETED
INTERNAL MEMORY BUS TEST ----- PASSED
HIT ANY KEY TO CONTINUE ......

LOCAL RAM TEST

This selection tests all of local RAM with the exception of vector and stack space. First, a single 1 bit is shifted through each byte, pattern 01h through FFh.
The data pattern is first written, then read and verified. This is followed by a floating 0's pattern, FEh through 00h. Again the pattern is first written, then read and verified. Progress messages are displayed on the terminal as the test proceeds (see Example 3-26). If an error occurs, its address, expected data, and actual data are displayed.

Example 3-26 Local RAM Test

    LOCAL RAM TEST . STARTING ADDRESS ---- 00300800
    00002000 BYTES MEMORY TESTED
    00004000 BYTES MEMORY TESTED
    00006000 BYTES MEMORY TESTED
    00008000 BYTES MEMORY TESTED
    0000A000 BYTES MEMORY TESTED
    0000C000 BYTES MEMORY TESTED
    0000E000 BYTES MEMORY TESTED
    00010000 BYTES MEMORY TESTED
    00012000 BYTES MEMORY TESTED
    00014000 BYTES MEMORY TESTED
    00016000 BYTES MEMORY TESTED
    00018000 BYTES MEMORY TESTED
    0001A000 BYTES MEMORY TESTED
    0001C000 BYTES MEMORY TESTED
    0001E000 BYTES MEMORY TESTED
    00020000 BYTES MEMORY TESTED
    TESTED COMPLETED
    HIT ANY KEY TO CONTINUE ......

DUAL PORT RAM TEST

The dual port RAM test is used to verify the FIFO area in the dual port RAM. The test pattern and procedure are the same as described above for the local RAM. That is, first a shifting 1's pattern is written, read, and verified for each location. This is followed by a shifting 0's pattern. If an error occurs, its address, expected data, and actual data are displayed. Progress messages are displayed as shown in Example 3-27.

Example 3-27 Dual Port RAM Test

    DUAL PORT RAM TEST . STARTING ADDRESS ---- 00600000
    00000400 BYTES MEMORY TESTED
    00000800 BYTES MEMORY TESTED
    00000C00 BYTES MEMORY TESTED
    00001000 BYTES MEMORY TESTED
    TESTED COMPLETED
    HIT ANY KEY TO CONTINUE ......

MFP CHIP TEST

Proper operation of the multifunction peripheral (MFP) chip is tested by writing, reading, and verifying data in two of its read/write registers. The data patterns consist of shifting 1's, 01h through FFh, followed by shifting 0's, FEh through 00h. If an error is detected, the expected data and actual data are displayed.
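The shifting patterns used by the local RAM, dual port RAM, and MFP chip tests can be sketched as follows. This is an illustration only, not the diagnostic firmware. It assumes that "01h through FFh" means a 1 is shifted in at each step (01h, 03h, 07h, ... FFh) and that "FEh through 00h" means a 0 is shifted in at each step (FEh, FCh, ... 00h); the function names and the simulated stuck bit are hypothetical.

```python
def shifting_ones(width=8):
    """Return 01h, 03h, 07h, ... FFh: a 1 is shifted in on every step."""
    mask = (1 << width) - 1
    value, patterns = 0, []
    for _ in range(width):
        value = ((value << 1) | 1) & mask
        patterns.append(value)
    return patterns


def shifting_zeros(width=8):
    """Return FEh, FCh, F8h, ... 00h: a 0 is shifted in on every step."""
    mask = (1 << width) - 1
    value, patterns = mask, []
    for _ in range(width):
        value = (value << 1) & mask
        patterns.append(value)
    return patterns


def pattern_test(write, read, addresses):
    """Write, read back, and verify each pattern at each address.

    Returns (address, expected, actual) tuples, the three values the
    text says are displayed when an error occurs.
    """
    errors = []
    for address in addresses:
        for pattern in shifting_ones() + shifting_zeros():
            write(address, pattern)
            actual = read(address)
            if actual != pattern:
                errors.append((address, pattern, actual))
    return errors


# Hypothetical RAM model: bit 3 of location 2 is stuck at 0.
ram = {}


def write(address, value):
    ram[address] = value & ~0x08 if address == 2 else value


def read(address):
    return ram[address]


errors = pattern_test(write, read, range(4))
```

Every pattern that has bit 3 set fails at location 2, so a stuck bit is reported several times with different expected values, which helps identify the failing bit.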
SCSI CHIP TEST

Operation of the SCSI interface controller chip is tested by resetting and initializing the chip. The controller chip interrupts are verified.

LCA TEST

Operation of the LCA chips is tested by presetting and initializing each chip.

IMB TEST

This selection tests the IMB bus by reading and verifying identification data from each of the installed storage modules.

SDI TEST

This section of the offline controller diagnostics is used to verify operation of the IMB memory to SDI bus data path, including interrupts and command detection. This selection causes the SDI test submenu (see Example 3-28) to be displayed. The following paragraphs describe the individual tests.

Example 3-28 SDI Tests Sample

    1 -- INTERNAL TESTS
    2 -- A CONTROL EXTERNAL LOOPBACK
    3 -- A DATA EXTERNAL LOOPBACK
    4 -- B CONTROL EXTERNAL LOOPBACK
    5 -- B DATA EXTERNAL LOOPBACK
    6 -- ALL EXTERNAL LOOPBACKS
    7 -- INTERNAL TESTS AND ALL EXTERNAL LOOPBACKS
    8 -- RETURN TO MASTER CONTROLLER TESTS
    SELECT OPTIONS 1-8 :

INTERNAL TESTS

The internal tests are used to verify correct operation of the TTL serial paths and SDI bus access to the IMB memory. These tests are run automatically at powerup; any failure inhibits the system from being placed on line. In addition, they may be run from the SAF SDI test menu. Once started, they run continuously until stopped by the operator typing Ctrl C. Any errors detected are printed on the screen.

There are eight internal tests in the SDI test section. These are run sequentially and are described below.

1. IORTEST
   IORTEST initializes the SDI hardware and then checks the CMD_STATUS and the SDI_STAT read-only registers for proper contents.

2. CLKTEST
   CLKTEST verifies that the diagnostic clock functions properly.

3. SECTEST
   SECTEST uses the diagnostic clock to generate sector and index clocks by means of the format PAL. Both the clocks and the resulting interrupts are checked.
4. ICLTEST
   ICLTEST establishes an internal control loopback path and uses sector clock out to generate SDI initialize in. The SDI initialize request bit and the SDI initialize interrupt are verified.

5. IDLTEST
   IDLTEST establishes internal control and data loopback paths. The response transmission hardware sends valid frames to the receiver hardware using the diagnostic clock. The resulting input commands are verified for proper data patterns.

6. IDFTEST
   IDFTEST establishes internal control and data loopback paths. The response transmission hardware sends invalid frames to the receiver hardware using the diagnostic clock. The resulting inputs are checked for the proper data patterns.

7. IMBTEST
   IMBTEST writes data into the dual-port RAM and then uses DMA to transfer it to the IMB test area. The IMB test area is at the very end of the IMB memory space and is not used by the normal disk format. The data is then transferred by DMA from the IMB back to the dual-port RAM, where it is verified.

8. SMTEST
   SMTEST is divided into two sections: SDI to IMB and IMB to SDI.

   SDI to IMB establishes internal control and data loopback paths. The response transmission hardware sends data to the receiver hardware using the diagnostic clock. A format at index command is transmitted by means of the diagnostic clock, followed by data patterns. After the transfer is complete, the data placed in the dual-port RAM by the DMA hardware is verified.

   IMB to SDI establishes internal control and data loopback paths. A data pattern is written in the dual-port RAM and transferred by DMA to the IMB test area. The response transmission hardware then sends a read command to the receiver hardware using the diagnostic clock. DMA to the SDI is started, and the diagnostic clock is used to shift the input shift register while sync is searched for. Once sync is detected, the data pattern is verified.
A CONTROL EXTERNAL LOOPBACK

This test requires that a loopback plug (that connects the signal A Control Out to A Control In) be installed on the SDI port A connector. The test verifies that a serial path exists from the TTL logic section through the ECL control encoder, back through the ECL control decoder and TTL logic, and that this path functions properly at the normal SDI clock rate. The test checks for loss of the SDI control clock and for the generation of the SDI initialize interrupt. As the test executes, the word BUSY flashes on the display. Any errors detected are displayed. The test runs until terminated by the operator typing Ctrl C.

A DATA EXTERNAL LOOPBACK

This test requires that a loopback plug (that connects the signal A Read/Response Data to A Write/Command Data) be installed on the SDI port A connector. The test verifies the serial path from the TTL logic section through the ECL data encoder, back through the ECL data decoder and TTL logic. It verifies that the loop functions properly at the normal SDI clock rate.

The response transmission hardware sends data to the receiver hardware. The test checks for loss of the SDI data clock, checks that a command is received without a packet error and with the proper bit pattern, and checks that an SDI command interrupt is generated. As the test executes, the word BUSY flashes on the display. All errors are displayed. The test runs until terminated by the operator typing Ctrl C.

B CONTROL EXTERNAL LOOPBACK

This test requires that a loopback plug (that connects the signal B Control Out to B Control In) be installed on the SDI port B connector. The test verifies that a serial path exists from the TTL logic section through the ECL control encoder, back through the ECL control decoder and TTL logic, and that this path functions properly at the normal SDI clock rate.
The test checks for loss of the SDI control clock and for the generation of the SDI initialize interrupt. As the test executes, the word BUSY flashes on the display. Any errors detected are displayed. The test runs until terminated by the operator typing Ctrl C.

B DATA EXTERNAL LOOPBACK

This test requires that a loopback plug (that connects the signal B Read/Response Data to B Write/Command Data) be installed on the SDI port B connector. The test verifies the serial path from the TTL logic section through the ECL data encoder, back through the ECL data decoder and TTL logic. It verifies that the loop functions properly at the normal SDI clock rate.

The response transmission hardware sends data to the receiver hardware. The test checks for loss of the SDI data clock, checks that a command is received without a packet error and with the proper bit pattern, and checks that an SDI command interrupt is generated. As the test executes, the word BUSY flashes on the display. All errors are displayed. The test runs until terminated by the operator typing Ctrl C.

ALL EXTERNAL LOOPBACKS

Selection of option 6 causes each of the four external loopback tests described in the preceding paragraphs to be run in sequence. On each alternate pass, the word BUSY is displayed on the terminal if no errors have occurred. Any errors are displayed. The test runs until terminated by the operator typing Ctrl C.

INTERNAL TESTS AND ALL EXTERNAL LOOPBACKS

Selection of option 7 causes the internal tests to be executed, followed in sequence by the external loopback tests described in the preceding paragraphs. On each alternate pass, the word BUSY is displayed on the terminal if no errors have occurred. Any errors are displayed. The test runs until terminated by the operator typing Ctrl C.
3.1.4.9.3 Backup Disk Tests

Selection of this option causes the backup disk test submenu to be displayed, as shown in Example 3-29.

Selection of the surface scan option causes the disk to be scanned on a block-by-block basis. As each block is read, the transfer is checked for errors. Any errors detected are reported on the display. Once the selection is made, the maximum LBN number is displayed. As the test progresses, the current LBN being tested is displayed (see Example 3-30). The operator may abort the test at any time by typing a character.

Selection of the random reads option causes the disk to be scanned on a block basis in random order. As each block is read, the transfer is checked for errors. The operator may abort the test at any time by typing a character.

Example 3-29 Backup Disk Test Sample

    1 -- SURFACE SCAN
    2 -- RANDOM READS
    3 -- RETURN TO MAIN MENU
    SELECT OPTIONS 1-3 :

Example 3-30 LBN Tested Sample

    1 -- SURFACE SCAN
    2 -- RANDOM READS
    3 -- RETURN TO MAIN MENU
    SELECT OPTIONS 1-3 :
    MAX LBN = 1,310,720
    Press any key to abort scan
    LBN # 2.384

3.1.4.9.4 Display Error Log

This selection causes the display menu to be entered. The operations available to the operator are identical to those described in Section 3.1.4.2.

3.1.4.9.5 IMB Memory Exerciser

The IMB memory exerciser allows all or any portion of the installed memory to be tested. The test consists of first writing a pattern beginning with the starting pattern selected and incrementing by one for each longword (32 bits). A second pass reads each location in sequence and verifies the data. The operator may select both the starting and ending address to be tested. Address selection is on eight-longword (32-byte) boundaries. During the test, ECC generation and checking are disabled.

Example 3-31 shows the dialog on the terminal when the IMB memory exerciser is selected.
· The starting address is selected; the default is 0, the beginning of installed memory.
· The ending address is selected; the default is the end of installed memory.
· Report interval allows the operator to select how often the status line is updated while the test is being performed. The default value is 1000; the line is updated every 1000 addresses (8 longwords). Note that the larger the value, the faster the exerciser runs.
· Starting pattern allows for the selection of the data pattern to be used. During writes, this value is incremented for each successive address.
· Halt flag, when set to 1, causes the exerciser to stop whenever an error is detected. The operator must type a character to continue. When the flag is set to 0, any errors are reported and the exerciser continues to run.
· Report error flag, when set to 1, allows each detected error to be reported on the terminal. When set to 0, any errors detected are not reported.
· Write flag, when set to 1, enables the write pass. When set to 0, the write pass is not performed.
· Read flag, when set to 1, enables the read/verify pass. When set to 0, the read/verify pass is not performed.
· Increment seed flag, when set to 1, causes the seed (starting pattern) to be incremented by one for each pass of the memory exerciser. When set to 0, the seed remains constant for each pass of the exerciser.

Once the final entry in the dialog described above is complete, the exerciser starts running. Once started, the exerciser continues to run until either an error is detected and the halt flag is set, or the operator types Ctrl C. Typing Ctrl C causes the advanced monitor to return to the main menu. The operator is notified that the memory must be formatted before the system may be placed on line. Example 3-31 shows an example of the status line displayed while the exerciser is running.
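The write and read/verify passes described above can be sketched as follows. This is a simplified model for illustration, not the ESE50 firmware: memory is a Python list of longwords, addresses are longword indexes, the pattern is assumed to wrap at 32 bits, and a fixed pass count stands in for running until Ctrl C. The names run_pass and exercise are hypothetical.

```python
LONGWORD_MASK = 0xFFFFFFFF
BOUNDARY = 8  # address selection is on eight-longword (32-byte) boundaries


def run_pass(memory, start, end, seed, write=True, read=True, halt=True):
    """One exerciser pass over memory[start:end]; returns detected errors."""
    if start % BOUNDARY or end % BOUNDARY:
        raise ValueError("start and end must fall on eight-longword boundaries")
    if write:
        # Write pass: the pattern starts at the seed and increments by
        # one for each longword.
        for offset, index in enumerate(range(start, end)):
            memory[index] = (seed + offset) & LONGWORD_MASK
    errors = []
    if read:
        # Read/verify pass: each location is read in sequence and compared.
        for offset, index in enumerate(range(start, end)):
            expected = (seed + offset) & LONGWORD_MASK
            if memory[index] != expected:
                errors.append((index, expected, memory[index]))
                if halt:  # halt flag set: stop at the first error
                    break
    return errors


def exercise(memory, start, end, seed=0, passes=1, increment_seed=True, halt=True):
    """Run several passes, optionally incrementing the seed after each one."""
    errors = []
    for _ in range(passes):
        errors += run_pass(memory, start, end, seed, halt=halt)
        if errors and halt:
            break
        if increment_seed:
            seed = (seed + 1) & LONGWORD_MASK
    return errors


memory = [0] * 32
result = exercise(memory, 0, 32, seed=5, passes=2)
```

Running the read/verify pass alone (write=False) against memory written earlier is a convenient way to see how the write and read flags interact.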
Example 3-31 IMB Memory Exerciser

    * * * * * E S E 5 0 M E M O R Y E X E R C I S O R * * * * *
    ^C to abort
    ESE50 Memory Size = 640 MeagBytes
    Start [ 0]:
    End [1400000]:
    Report Interval [ 1000]:
    Starting Pattern [ 0]:
    Halt Flag [ 1]:
    Report Error Flag [ 1]:
    Write Flag [ 1]:
    Read Flag [ 1]:
    Increment Seed Flag [ 1]:
    WRITE mmp_address = 00010000 Pattern Increments from 00000000

3.1.4.9.6 Scan IMB Memory

The operator may use this selection to test the installed memory for ECC errors. The test is nondestructive. When it is selected, the maximum LBN number is displayed and testing starts. As the test progresses, the LBN being tested is displayed. Any errors are displayed. Example 3-32 shows the terminal dialog when the IMB memory scan is running.

Example 3-32 Terminal Dialog

    1 -- SYSTEM VERIFICATION
    2 -- CONTROLLER TEST
    3 -- BACKUP DISK TESTS
    4 -- DISPLAY ERROR LOG
    5 -- IMB MEMORY EXERCISER
    6 -- SCAN IMB MEMORY
    7 -- RETURN TO MAIN MENU
    SELECT OPTION 1-7 :
    Max LBN = 1,310,720
    Press any key to abort scan
    LBN # 9.472

3.1.4.9.7 TEST SGL BIT ERR LOGIC

This test verifies operation of the single-bit error detection and correction logic. It inserts a known correctable error and verifies that the ECC logic detects and corrects it.

3.1.4.9.8 TEST DBL BIT ERR LOGIC

This test verifies operation of the double-bit error detection logic. It inserts a known uncorrectable error and verifies that it is detected.

3.1.4.9.9 EDAC TEST

The EDAC test verifies the operation of the ECC chip. It checks that the chip properly detects, corrects, and reports data errors.

3.1.4.10 Patrol Diagnostics

When the patrol diagnostics are selected, the patrol diagnostic submenu is displayed (refer to Example 3-33). The submenu allows the patrol diagnostics to be enabled or disabled.

The IMB SCRUB enables soft error scrubbing in the storage arrays. If a single-bit error is detected on a read, the corrected word is written back to the array.
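The behavior checked by the single-bit and double-bit tests, and relied on by the IMB scrub, can be illustrated with a generic single-error-correcting, double-error-detecting code. The manual does not specify the code used by the ESE50 ECC chip, so everything below is an assumption made for illustration: an extended Hamming code over a 4-bit value, with hypothetical function names.

```python
def encode(nibble):
    """Encode a 4-bit value as an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indexes 1-7 are the usual
    Hamming positions (check bits at 1, 2, 4; data at 3, 5, 6, 7).
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the parity of all 8 bits even
    return code


def decode(code):
    """Return (status, value); status is ok, corrected, or uncorrectable."""
    code = list(code)  # work on a copy
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position
    if sum(code) % 2 == 1:
        # Odd overall parity: exactly one bit is wrong, and the syndrome
        # gives its index (0 means the overall parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    elif syndrome:
        # Even parity with a nonzero syndrome: a double-bit error.
        return "uncorrectable", None
    else:
        status = "ok"
    return status, code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3


def scrub(memory):
    """One scrub pass: write the corrected word back where needed."""
    rewritten = 0
    for address, word in enumerate(memory):
        status, value = decode(word)
        if status == "corrected":
            memory[address] = encode(value)
            rewritten += 1
    return rewritten


memory = [encode(n) for n in (3, 9, 12)]
memory[1][5] ^= 1            # inject a soft single-bit error
first_pass = scrub(memory)   # the error is corrected and rewritten
second_pass = scrub(memory)  # nothing left to fix
```

Writing the corrected word back matters because a second soft error in the same word would otherwise turn a correctable error into an uncorrectable one.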
The OCP TEST enables a periodic communications test with the OCP. This test ensures that the OCP is operational in the event of an error or status change.

Example 3-33 Patrol Diagnostic Submenu

    1 -- IMB SCRUB ENABLED
    2 -- OCP TEST ENABLED
    3 -- RETURN TO MAIN MENU
    SELECT OPTION 1-3 :

3.1.4.11 Manual Save

Using this selection, the operator may perform a manual save of the IMB memory contents to the backup disk. The operation performed is identical to the automatic backup during a power failure, except that power remains applied to the system. No check is made of the IMB memory contents; as long as the save operation completes successfully, the saved volume is marked VALID.

3.1.4.12 Manual Restore

This selection allows the operator to restore the contents of the backup disk to the IMB memory. If the volume on the backup disk is VALID, the operation completes automatically once the selection is made. If the volume on the backup disk is marked IN_VALID or NO_DATA, the operator is notified and given the choice of continuing the restore or aborting and returning to the main menu.

3.1.4.13 Enable SDI Special Functions

The ESE50 includes a series of special functions that are used to display real-time system status while the unit is on line to the host. Selection of this option invokes these features, which are called by means of key word entries. Once the special functions are enabled, normal SAF functions are not available until control is returned by disabling the special functions. When this option is selected, the operator is notified to enter the key word for the desired function when the main menu is displayed. Any key entry causes the main menu to be displayed. Example 3-34 shows the features that may be called from the special functions. The following paragraphs describe these features.
Example 3-34 SDI Special Functions

    6 -- MANUAL RESTORE
    7 -- ENABLE SDI SPECIAL FUNCTIONS
    8 -- POWER MONITOR
    SELECT OPTION 1-8 : p
    Illegal Terminal Input -- Valid Commands Are:
    log -- Display Logging Information
    short -- Display Functions in Short Form
    long -- Display Functions in Long Form
    off -- Turn Display Off
    clear -- Clear Function/Error Logs
    status -- Display System Status
    reset -- Reset Controller (Power On)
    dvar -- Display Variables
    mon -- Return to Monitor

3.1.4.13.1 Log - Display Logging Information

The ESE50 controller maintains a semi-permanent log of the SDI functions received and any errors detected. Entering the key word LOG causes this log to be displayed (see Example 3-35). The display remains on the screen until it is redisplayed or another function is entered.

Example 3-35 Display Logging Information

    * * * * * E S E 5 0 L O G G I N G * * * * *
    COMMAND                    COUNT        LOGGING                 COUNT
    SDI-RESET................ :0000 0000    SUCCESSFUL-RESPONSE... :0000 0000
    SET_GROUP................ :0000 0000    UNSUCCESSFUL-RESPONSE. :0000 0000
    CHANGE-MODE.............. :0000 0000    ILLEGAL COMMANDS...... :0000 0000
    CHANGE-FLAGS............. :0000 0000    DRIVE_ERROR........... :0000 0000
    DIAGNOSE................. :0000 0000    TRANSMISSION-ERROR.... :0000 0000
    DISCONNECT............... :0000 0000    LEVEL-2-ERRORE........ :0000 0000
    AUTO-DISCONNECT.......... :0000 0000    DIAGNOSTICS-ERROR..... :0000 0000
    DRIVE-CLEAR.............. :0000 0000    IMB NON-EXISTANT MEM.. :0000 0000
    ERROR-RECOVERY........... :0000 0000    IMB UNCORRECT. ERRORS. :0000 0000
    DRIVE-CHARACTERISTICS.... :0000 0000    IMB SINGLE BIT ERRORS. :0000 0000
    DRIVE-SUBCHARACTERISTICS. :0000 0000    RCT-DATA-ERRORS....... :0000 0000
    STATUS-COMMANDS.......... :0000 0000    RCT-DRIVE-FAULTS...... :0000 0000
    SEEK-DRIVE............... :0000 0000    RCT-CONTROL-FAULTS.... :0000 0000
    DRIVE-ONLINE............. :0000 0000    RCT-PARITY-ERRORS..... :0000 0000
    DRIVE-RUN................ :0000 0000    RCT-OVERRUN-ERRORS.... :0000 0000
    READ-MEMORY.............. :0000 0000    NO-DATA-CLOCKS........ :0000 0000
    RECALIBRATE-DRIVE........ :0000 0000    FRAMING ERRORS........ :0000 0000
    WRITE-MEMORY............. :0000 0000    STARTING-ERRORS....... :0000 0000
    TOPOLOGY................. :0000 0000    CONTINUE-ERRORS....... :0000 0000
    TOTAL-FUNCTIONS.......... :0000 0000    TOO-MANY-PARAMETERS... :0000 0000

3.1.4.13.2 Short - Display Functions in Short Form

SDI functions received by the ESE50 and its responses are displayed on the terminal in real time when invoked by the key word SHORT. The short form remains active until another function is invoked or the display is terminated by the key word OFF. The short form does not interfere with the operation of the ESE50 on-line functions.

3.1.4.13.3 Long - Display Functions in Long Form

The long form of the SDI functions display provides the same function as the short form, with additional information displayed. Note that the long form requires additional processing time and can slow down system operation.

3.1.4.13.4 Off - Turn Display Off

The key word OFF is used to stop either the short or the long functional display once invoked.

3.1.4.13.5 Clear - Clear Functions/Error Log

This selection is used to clear the SDI functions/error log.

3.1.4.13.6 Status - Display System Status

The key word STATUS causes the SDI GET STATUS information to be displayed on the terminal. It shows the state of the ESE50 at the time of the status request.

3.1.4.13.7 Reset - Reset Controller (Power On)

The key word RESET causes an SDI reset to occur.

3.1.4.13.8 DVAR - Display Variables

The key word DVAR displays the GET COMMON and GET SUBUNIT characteristics on the terminal.

3.1.4.13.9 MON - Return to Monitor

Entering the key word MON while in SDI special functions causes the SAF to return to the main menu.
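The key word handling described in Section 3.1.4.13 can be modeled with a small dispatcher. This is a sketch for illustration, not the SAF firmware. It assumes that key words are matched without regard to case and that invoking any function other than SHORT or LONG ends a running functional display; the class and attribute names are hypothetical.

```python
VALID_COMMANDS = {
    "log": "Display Logging Information",
    "short": "Display Functions in Short Form",
    "long": "Display Functions in Long Form",
    "off": "Turn Display Off",
    "clear": "Clear Function/Error Logs",
    "status": "Display System Status",
    "reset": "Reset Controller (Power On)",
    "dvar": "Display Variables",
    "mon": "Return to Monitor",
}


class SpecialFunctions:
    """Tracks the state implied by the special-function key words."""

    def __init__(self):
        self.display = "off"  # "off", "short", or "long"
        self.active = True    # False once MON returns to the main menu

    def enter(self, text):
        keyword = text.strip().lower()
        if keyword not in VALID_COMMANDS:
            # Unknown input: list the valid key words, as in Example 3-34.
            lines = ["Illegal Terminal Input -- Valid Commands Are:"]
            lines += [f"{name} -- {desc}" for name, desc in VALID_COMMANDS.items()]
            return "\n".join(lines)
        if keyword in ("short", "long"):
            self.display = keyword
        else:
            # Assumption: any other function ends the functional display.
            self.display = "off"
        if keyword == "mon":
            self.active = False
        return VALID_COMMANDS[keyword]


saf = SpecialFunctions()
saf.enter("SHORT")        # start the short functional display
reply = saf.enter("p")    # not a key word: the valid commands are listed
```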
3.1.4.14 Power Monitor

The ESE50 power monitor assembly includes its own SAF functions. During normal operation, the SAF functions from the controller RS-232 port are passed through the power monitor. This selection causes the port to be switched, and the power monitor SAF is placed on line to the terminal. The next section of this chapter describes the power monitor screens and functions. The power monitor remains in control of the SAF until a switch to the controller is performed. Refer to Section 3.1.5 for further details.

3.1.5 Power Monitor

The power monitor section of the SAF firmware is contained in a PROM located on the power monitor assembly and is executed by the power monitor microprocessor. In normal operation (controller SAF on line), the RS-232 port of the controller is passed through the power monitor. When power monitor functions are selected from the main menu, the port is switched and the power monitor RS-232 port is connected to the terminal. It remains connected in this configuration until a switch (option C) back to the controller is performed.

3.1.5.1 Power Monitor Screen

Example 3-36 shows the power monitor screen. The top portion displays the status of the ESE50 power system, while the bottom portion provides the menu used to perform power monitor functions.

Example 3-36 Power Monitor

    POWER MONITOR STATUS            FIRMWARE - L5280-001 REV 1C

    MARGIN   PSM1   PSM2   TIME    DATE       TEMPERATURE
    ------   -----  -----  -----   --------   -----------
    LOGIC -  N/A    NORM   12:06   05/28/92   LOGIC - 25 C
    FLOAT -  N/A    NORM   12:06   05/28/92   POWER - 24 C

             BUS    PSM1   PSM2
             ----   -----  -----
    VBB ---  5.10                   BATTERY
    VCC ---  4.96                   -------
    +5V ---         LOW    5.45     STATUS -- READY
    +12V --  12.4   LOW    12.7     CHARGER - OFF
    -5.2V -  5.16   LOW    5.63     TEST ---- OFF
    FLOAT -  35.0   LOW    35.3     VLOTAGE - 30.0

    A. SET TIME OF DAY              I. PSM2 LOGIC MARGIN HIGH
    B. START BATTERY TEST           J. PSM2 LOGIC MARGIN LOW
    C. SWITCH TO CONTROLLER         K. PSM2 LOGIC NOMINAL
    D. PSM1 LOGIC MARGIN HIGH       L. PSM2 FLOAT MARGIN LOW27
    E. PSM1 LOGIC MARGIN LOW        M. PSM2 FLOAT NOMINAL
    F. PSM1 LOGIC NOMINAL           N. MARGIN FLOAT LOW25
    G. PSM1 FLOAT MARGIN LOW27      O. CANCEL FLOAT LOW25
    H. PSM1 FLOAT NOMINAL

4
------------------------------------------------------------
Error Handling

4.1 Introduction

This chapter describes the error handling mechanisms of the ESE50 electronic storage element. It also provides information on controller/drive errors. The following information is provided for each error condition:

· Name of the error
· Brief description of the error
· Error code
· FRU that is the most probable cause of the error

The error bits in the error byte of the generic status are:

    80  DE  Drive fault
    40  RE  Transmission error
    20  PE  Protocol error
    10  DF  Initialization failure (diagnostic)
    08  WE  Drive write protected

4.2 FRUs and FRU Numbers

Table 4-1 lists the field replaceable units (FRUs) with a brief description and designator.
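As an illustration of the error byte described in Section 4.1, the sketch below decodes a generic status error byte into the error types that are set. The bit values, mnemonics, and descriptions are taken from the list in Section 4.1; the function name and the sample value are hypothetical, and bits not listed there are simply ignored.

```python
# (mask, mnemonic, meaning) for each error bit listed in Section 4.1
ERROR_BITS = [
    (0x80, "DE", "Drive fault"),
    (0x40, "RE", "Transmission error"),
    (0x20, "PE", "Protocol error"),
    (0x10, "DF", "Initialization failure (diagnostic)"),
    (0x08, "WE", "Drive write protected"),
]


def decode_error_byte(value):
    """Return the (mnemonic, meaning) pairs whose bits are set in value."""
    return [(name, text) for mask, name, text in ERROR_BITS if value & mask]


# Hypothetical sample: drive fault and write protect both set.
sample = decode_error_byte(0x88)
```

The mnemonic returned (DE, RE, PE, DF, or WE) identifies which of the error tables in this chapter applies to the accompanying error code.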
Table 4-1 FRUs and FRU Numbers
------------------------------------------------------------
FRU     Description                                FRU Number
------------------------------------------------------------
UPROC   Controller Module                          10
BACKP   Backplane                                  20
MEM00   Storage Module, Physical Address 00        30
MEM01   Storage Module, Physical Address 01        31
MEM02   Storage Module, Physical Address 02        32
MEM03   Storage Module, Physical Address 03        33
MEM04   Storage Module, Physical Address 04        34
MEM05   Storage Module, Physical Address 05        35
MEM06   Storage Module, Physical Address 06        36
MEM07   Storage Module, Physical Address 07        37
MEM08   Storage Module, Physical Address 08        38
MEM09   Storage Module, Physical Address 09        39
MEM10   Storage Module, Physical Address 10        3A
MEM11   Storage Module, Physical Address 11        3B
MEM12   Storage Module, Physical Address 12        3C
MEM13   Storage Module, Physical Address 13        3D
MEM14   Storage Module, Physical Address 14        3E
MEM15   Storage Module, Physical Address 15        3F
MEMORY  Storage Modules, multiple                  40
OCP     Operator Control Panel                     50
SDI     SDI Cables/Connectors/Bulkhead Assembly    51
PMM     Power Monitor Module                       60
PSM1    Power Supply Module #1                     61
PSM2    Power Supply Module #2                     62
BATT    Battery                                    65
DISK1   Backup Disk Unit #1                        70
DISK2   Backup Disk Unit #2                        71
HOST    Host Controller                            80
OPER    Operator                                   81
------------------------------------------------------------

4.3 ESE50 Drive Faults (Error Type - DE)

Table 4-2 provides the error code associated with a specific drive fault syndrome and the FRUs most likely to be at fault.
Table 4-2 Drive Faults
------------------------------------------------------------
                                            Error  Suspected FRU
Error Syndrome                              Code   1st     2nd
------------------------------------------------------------
RDS During Transfer                         01     MEMORY  UPROC
Multiple RDS Transfer                       02     MEMORY  UPROC
Nonexistent Memory                          03     MEMORY  UPROC
Memory Controller Abort                     06     MEMORY
CRD Overflow                                1E     MEMORY
Read/Write Safety Interrupt Without Cause   46     UPROC
Drive Disabled by "DD" Bit Set              4A
Battery Failure                             B0     BATT
Low Battery Charge                          BF
OCP Failure                                 C0     OCP
Backup Disk Fault                           D0     DISK1
Data Retention Backup No Data               E0
Data Retention Backup Invalid               E1
Over Temperature, Warning @45C              F0
Power Supply Module 1 Fault                 F1     PSM1    PMM
Power Supply Module 2 Fault                 F2     PSM2    PMM
Over Temperature, Shutdown @50C             F3
Over Temperature, Shutdown @55C             F4
SPS System On                               FF
------------------------------------------------------------

4.4 ESE50 Transmission Error (Error Type - RE)

Table 4-3 provides the error code associated with a specific transmission error syndrome and the FRUs most likely to be at fault.
Table 4-3 Transmission Errors
------------------------------------------------------------
                                            Error  Suspected FRU
Error Syndrome                              Code   1st     2nd
------------------------------------------------------------
Out of Range Enhance Mode                   04     HOST    UPROC
SDI Frame Sequence                          07     HOST    UPROC
SDI Checksum                                08     HOST    UPROC
SDI Framing                                 09     HOST    UPROC
Drive Disconnect Due to Controller          1D     HOST    SDI
Sector Overrun                              1F     UPROC   HOST
SDI Response Timed Out                      41     HOST    UPROC
TCR and Not Read/Write Ready Fault          43     HOST    UPROC
Format Command and Not Enabled              44     HOST    UPROC
SDI Transfer                                4F     SDI     UPROC
------------------------------------------------------------

4.5 ESE50 Protocol Error (Error Type - PE)

Table 4-4 provides the error code associated with a specific protocol error syndrome and the FRUs most likely to be at fault.

Table 4-4 Protocol Errors
------------------------------------------------------------
                                            Error  Suspected FRU
Error Syndrome                              Code   1st     2nd
------------------------------------------------------------
SDI Level 1/2 Opcode Parity                 0A     SDI     UPROC
SDI Invalid Opcode                          0B     SDI     UPROC
SDI Command Length Level 1/2                0C     SDI     UPROC
SDI Invalid Command with Drive Error        0D     SDI     UPROC
SDI Invalid Group Select Level 2            0E     SDI     UPROC
SDI Write Enable/Write Protect              0F     SDI     UPROC
SDI Transfer Command with Drive Error       12     HOST    UPROC
SDI Invalid Format Request                  15     HOST    UPROC
SDI Invalid Variant Request                 16     HOST    UPROC
SDI Invalid Command in Variant Mode         17     HOST    UPROC
SDI Invalid Cylinder Address                1A     HOST    UPROC
SDI Invalid Error Recovery Level            29     HOST    UPROC
SDI Invalid Subunit Specified               2A     HOST    UPROC
SDI Invalid Diagnose Memory Request         2B     HOST    UPROC
SDI Spindle Not Ready-Seek/Recall           2C     HOST    UPROC
Spinup Inhibited by Controller Flags        2E
SDI Run Command/Run in Stop Position        2F     HOST    UPROC
SDI Invalid Read Memory Region              40     HOST    UPROC
Not Online/Seek Command Issued              42     HOST    UPROC
Invalid Disconnect Command/"TT" Bit         47     HOST    UPROC
Invalid Write Memory Control/Offset         48     HOST    UPROC
Invalid Command During Topology Command     49     HOST    UPROC
------------------------------------------------------------

4.6 ESE50 Initialization Faults (Error Type - DF)

Table 4-5 provides the error code associated with a specific initialization fault syndrome and the FRUs most likely to be at fault.

Table 4-5 Initialization Faults
------------------------------------------------------------
                                            Error  Suspected FRU
Error Syndrome                              Code   1st     2nd
------------------------------------------------------------
IMB Access                                  50     UPROC   MEMORY
Data Retention, No Disk Response            51     DISK1
FIFO Data                                   52     UPROC
Single Bit Correction Failure               53     UPROC
Power Up Test Failure                       54     UPROC
Multiple Bad Memory Arrays                  5E     MEMORY  UPROC
Configuration Fault                         5F     MEMORY  UPROC
Bad Array Module, 0                         60     MEM00
Bad Array Module, 1                         61     MEM01
Bad Array Module, 2                         62     MEM02
Bad Array Module, 3                         63     MEM03
Bad Array Module, 4                         64     MEM04
Bad Array Module, 5                         65     MEM05
Bad Array Module, 6                         66     MEM06
Bad Array Module, 7                         67     MEM07
Bad Array Module, 8                         68     MEM08
Bad Array Module, 9                         69     MEM09
Bad Array Module, 10                        6A     MEM10
Bad Array Module, 11                        6B     MEM11
Bad Array Module, 12                        6C     MEM12
Bad Array Module, 13                        6D     MEM13
Bad Array Module, 14                        6E     MEM14
Bad Array Module, 15                        6F     MEM15
IORTEST Fault                               70     UPROC
CLKTEST Fault                               71     UPROC
SECTEST Fault                               72     UPROC
IDLTEST Fault                               73     UPROC
IDFTEST Fault                               74     UPROC
ICLTEST Fault                               76     UPROC
IMBTEST Fault                               77     UPROC
SMTEST Fault, IMB to SDI                    78     UPROC
SMTEST Fault, SDI to IMB                    79     UPROC
Internal SDI Loopback Hung                  7A     UPROC
IMB Header Verify, BAD                      7D     UPROC   MEMORY
IMB Header Verify, Abort                    7E     UPROC   MEMORY
IMB Format, Noack/Abort                     7F     UPROC   MEMORY
IMB Initialize, Noack/Abort                 80     UPROC   MEMORY
IMB ECC Check, Noack/Abort                  81     UPROC   MEMORY
IMB ECC Check, Controller                   82     UPROC   MEMORY
Invalid Test Number                         EE
------------------------------------------------------------

4.7 ESE50 Write Protected (Error Type - WE)

Table 4-6 provides the error code associated with a specific write protection error syndrome and the FRUs most likely to be at fault.

Table 4-6 Write Protected Errors
------------------------------------------------------------
                                            Error  Suspected FRU
Error Syndrome                              Code   1st     2nd
------------------------------------------------------------
Write/Format While Protected                4E     HOST    UPROC
------------------------------------------------------------

5
------------------------------------------------------------
ESE50 Rev A and B -- Removal and Replacement Procedures

5.1 Introduction

This chapter describes removal and replacement procedures for the ESE50 FRUs.
The following FRU does not require the ESE50 to be removed from its external enclosure:

OCP, part number 29-30072-01

The following FRUs require the ESE50 to be removed from its external enclosure:

FRU                                                  Part Number
------------------------------------------------------------
OCP cable                                            29-30076-01
Controller module                                    29-30068-01/02
Array module(s)                                      29-30073-01
Power monitor module (ESE50 Rev A and B)             29-30069-01
Power monitor module (ESE50 Rev A, B, C)             29-31173-01
Power supply module(s)                               29-30070-01
RZ27 disk drive                                      RZ27-E
RZ35 disk drive                                      RZ35-E
RZ35/27 disk backup assembly (ESE50 Rev A and B)     29-30075-01
RZ35/27 disk backup assembly (ESE50 Rev A, B, C)     29-31171-01
Battery pack                                         12-38755-01
AC input box                                         29-30071-01
Fan (ESE50 Rev A and B)                              29-30074-01
Fan (ESE50 Rev C)                                    29-31172-01
------------------------------------------------------------

Follow the steps in Section 5.4 to remove the ESE50 from its external enclosure and gain access to the ESE50 internal FRUs. Then follow the steps in the section pertaining to the FRU you need to remove and replace. Finally, follow the steps in Section 5.10 to place the ESE50 back in its external enclosure.

5.2 Precautions

------------------------------------------------------------
Note
------------------------------------------------------------
Before repairing the ESE50, always back up all data to another disk or to tape, if available.
------------------------------------------------------------

Before attempting any removal or replacement procedure, read the following precautions:

· Only qualified service personnel should remove or install FRUs.
· Before you remove or install FRUs, power down the system. Refer to Section 5.4.1 for the orderly shutdown procedure.
· Static electricity can damage integrated circuits. Always use a grounded antistatic wrist strap (PN 29-11762-00) and a grounded work surface when working with the internal parts of a computer system.
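The FRU list in Section 5.1 ties several part numbers to the unit revision. As an illustration only, the list can be held as a small lookup table so that the part matching a given revision is easy to find. The sketch below is in Python; the names FRU_PARTS and parts_for are hypothetical and are not part of any Digital tool. The part numbers and revision notes are copied from Section 5.1. Entries that the list does not annotate are assumed here to apply to Rev A and B, the revisions this chapter covers.

```python
# Part numbers copied from the FRU list in Section 5.1.
# Each entry: (FRU name, part number, revisions named for it).
# Assumption: entries the list does not annotate apply to Rev A and B,
# the revisions this chapter covers.
FRU_PARTS = [
    ("OCP", "29-30072-01", ("A", "B")),
    ("OCP cable", "29-30076-01", ("A", "B")),
    ("Controller module", "29-30068-01/02", ("A", "B")),
    ("Array module", "29-30073-01", ("A", "B")),
    ("Power monitor module", "29-30069-01", ("A", "B")),
    ("Power monitor module", "29-31173-01", ("A", "B", "C")),
    ("Power supply module", "29-30070-01", ("A", "B")),
    ("RZ27 disk drive", "RZ27-E", ("A", "B")),
    ("RZ35 disk drive", "RZ35-E", ("A", "B")),
    ("RZ35/27 disk backup assembly", "29-30075-01", ("A", "B")),
    ("RZ35/27 disk backup assembly", "29-31171-01", ("A", "B", "C")),
    ("Battery pack", "12-38755-01", ("A", "B")),
    ("AC input box", "29-30071-01", ("A", "B")),
    ("Fan", "29-30074-01", ("A", "B")),
    ("Fan", "29-31172-01", ("C",)),
]


def parts_for(fru, revision):
    """Return every listed part number for a FRU that names the given revision."""
    rev = revision.upper()
    return [
        part
        for name, part, revs in FRU_PARTS
        if name.lower() == fru.lower() and rev in revs
    ]


# Example: list the fan and power monitor parts recorded for a Rev B unit.
print(parts_for("Fan", "B"))
print(parts_for("Power monitor module", "B"))
```

Where the list gives two part numbers for the same revision, the sketch returns both and leaves the choice to the service engineer.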
5.3 Removing and Replacing the OCP

First, perform the following steps:

1. Perform backup to disk or tape.
2. Do an OpenVMS dismount.
3. Power down the ESE50.

Remove the OCP by grasping it and pulling gently forward (Figure 5-1). Once the OCP is free, place it aside. Refer to the system or cabinet service information for opening any doors or panels to gain access to the ESE50.

Replace the OCP and labels.

Figure 5-1 Removing/Replacing the OCP

5.4 Removing the ESE50

This section describes how to remove the ESE50 from its external enclosure. It does not describe the mechanical procedures for removing the ESE50 from the system or cabinet enclosure. Refer to the system or cabinet enclosure service guide or manual for these procedures.

Refer to the system or cabinet service information for opening any doors or panels to gain access to the ESE50. Once the ESE50 is removed from its external enclosure, follow the steps in Section 5.4.3 to gain access to the internal FRUs.

5.4.1 Shutting Down the ESE50

Before performing an FRU procedure, do the following:

1. Perform backup to disk or tape.
2. Do an OpenVMS dismount.
3. Spin down the unit by pressing the RUN/STOP switch (out).

------------------------------------------------------------
Note
------------------------------------------------------------
The unit performs an unload/save operation, which may take up to 15 minutes.
------------------------------------------------------------

4. Once the RUN LED is off, power down the ESE50 (switch at rear of unit).
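The order of the shutdown steps in Section 5.4.1 matters: power must not be removed until the unload/save operation has finished and the RUN LED is off. A minimal sketch of that ordering, written in Python for illustration, follows. The step labels and the function next_step are hypothetical names chosen for this example; the sequence itself is taken from the steps above.

```python
# Steps of Section 5.4.1, in order. The identifiers are hypothetical labels.
SHUTDOWN_STEPS = (
    "backup",       # 1. Back up to disk or tape
    "dismount",     # 2. OpenVMS dismount
    "spin_down",    # 3. Press the RUN/STOP switch (out); unload/save begins
    "run_led_off",  # Wait for the RUN LED to go off (may take up to 15 minutes)
    "power_off",    # 4. Power down with the switch at the rear of the unit
)


def next_step(completed):
    """Return the next step to perform, or None when shutdown is complete.

    Raises ValueError if the completed steps are not a prefix of the
    documented order, for example powering off before the RUN LED is off.
    """
    done = tuple(completed)
    if done != SHUTDOWN_STEPS[:len(done)]:
        raise ValueError(f"steps out of order: {done}")
    if len(done) == len(SHUTDOWN_STEPS):
        return None
    return SHUTDOWN_STEPS[len(done)]


# Example: after spin-down has been requested, the next action is to wait
# for the RUN LED rather than to switch off power.
print(next_step(["backup", "dismount", "spin_down"]))
```

Treating the wait for the RUN LED as its own step makes the dependency explicit: the check rejects any attempt to record power-off before the LED state has been confirmed.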
5.4.2 Disconnecting the ESE50

Do not disconnect the unit until you have performed all the steps in Section 5.4.1.

Open the rear door of the system or cabinet enclosure according to the enclosure service guide or manual. Once you have gained access to the rear of the ESE50, follow these steps. Refer to Figure 5-2.

1. Set the ON/OFF switch to the "0" (off) position.
2. Disconnect the AC power cord at the rear of the ESE50.
3. Loosen the screws on the SDI cables and remove the cables.

Figure 5-2 Rear View of ESE50

5.4.3 Gaining Access to the ESE50 Internal FRUs

Do not disconnect the unit until you have performed all the steps in Section 5.4.1.

This section describes how to remove the ESE50 from its external enclosure and how to remove the top shock mount assembly and the right and left side covers.

Note that the ESE50 can be mounted either right-side up or upside down in its external enclosure. This allows the same external enclosure to be used in various system and cabinet enclosures such as the SA600, SA800, and SA900 series products.

Once you have removed the external enclosure (containing the ESE50) from the system or cabinet enclosure, place the external enclosure on a level and stable work surface, such as a table.

Perform the following steps to remove the ESE50 from the external enclosure:

1. Remove the OCP by grasping it and pulling gently forward (see Section 5.3). Once the OCP is free, place it aside.
2. Use a 5/32-inch Allen wrench to loosen the four Allen head captive screws securing the ESE50 to the front of the external enclosure (Figure 5-3). Do not fully loosen one screw at a time. Back each screw out partway, then the next, until all are loose.
Figure 5-3 Removing/Replacing the Front Panel Assembly (callouts: captive Allen screws, OCP connector)

3. Allow the front bezel to swing down and lie flat on the work surface (Figure 5-4). Ensure that the OCP cable has enough slack to allow the front bezel to lie flat.

------------------------------------------------------------
Caution
------------------------------------------------------------
Ensure that the front panel cable does not become pinched between the fan assembly and the main chassis.
------------------------------------------------------------

Figure 5-4 Removing/Replacing the Front Panel Assembly (callouts: OCP connector, RS-232 connector)

4. Loosen the two screws on either side of the OCP cable and disconnect it (Figure 5-5).
5. Loosen, but do not remove, the four captive screws located on the rear panel of the external enclosure.

Figure 5-5 Removing/Replacing the OCP Cable (callout: OCP cable screws)

6. Note the orientation of the ESE50 in its external enclosure by locating the handle by the fan.
   a. If the RS-232 connector is below the fan, then rotate the entire enclosure 180 degrees before proceeding.
   b. If the RS-232 connector is above the fan, then proceed.
7. Grip the handle and pull forward gently until the ESE50 is out of the external enclosure (Figure 5-6).
Place the external enclosure out of the way.

Figure 5-6 Removing/Replacing the ESE50

8. Remove the top shock mount assembly.
   a. At the rear of the ESE50, disconnect the RS-232 cable on the shock mount assembly at the power monitor module (Figure 5-7).

Figure 5-7 Removing/Replacing the RS-232 Cable (callouts: cover screws, RS-232 connection)

   b. Loosen the top two captive screws on the fan assembly (Figure 5-8).

Figure 5-8 Fan Assembly, Top Captive Screws (callouts: cover screws, handle, captive screws)

   c. Loosen the top four captive screws on the shock mount assembly (Figure 5-9).
   d. Lift the shock mount assembly off and place it aside.

Figure 5-9 Removing/Replacing the Top Shock Mount Assembly (callout: cover screws)

Figure 5-10 Removing/Replacing the Covers (callout: cover screws)

Figure 5-10 (Cont.)
Removing/Replacing the Covers (callout: cover screws)

5.5 Removing and Replacing ESE50 Modules

Figure 5-11 ESE50 Modules

5.5.1 Controller and Array Modules

Do not disconnect the unit until you have performed all the steps in Section 5.4.1.

1. To remove a module from the ESE50 backplane:
   a. Insert a small screwdriver, scribe, or needle-nose pliers into the removal holes on either side of the module.
   b. Pull the corner of the module gently forward until the module is free from the backplane.
   c. Grasp the center of the module with your fingers and pull the module the rest of the way out (Figure 5-12).
2. To replace a module into the ESE50 backplane:
   a. Slide the module into its appropriate slot.
   b. Use your thumbs to push the module into the backplane.
   c. Ensure that the module is properly seated.
3. Proceed to Section 5.10.

Figure 5-12 Removing/Replacing the Controller Module (callout: controller module)

5.5.2 Power Monitor and Power Supply Modules

Do not disconnect the unit until you have performed all the steps in Section 5.4.1.

1. To remove a module from the ESE50 backplane:
   a. When removing a power supply module, you must first disconnect the connector (refer to Figure 5-13) connecting the module to the AC input box.
Figure 5-13 Removing/Replacing the Power Supply Module (callout: AC connector)

   b. Insert a small screwdriver, scribe, or needle-nose pliers into the removal holes on either side of the module.
   c. Pull the corner of the module gently forward until the module is free from the backplane.

------------------------------------------------------------
Caution
------------------------------------------------------------
When removing or replacing either module, ensure that the heat sinks on the power supply module do not come in contact with the underside of the power monitor module.
------------------------------------------------------------

   d. Grasp the center of the module with your fingers and pull the module the rest of the way out (Figure 5-14).
2. To replace a module into the ESE50 backplane:
   a. Slide the module into its appropriate slot.
   b. Use your thumbs to push the module into the backplane.
   c. Ensure that the module is properly seated.
3. Proceed to Section 5.10.

Figure 5-14 Removing/Replacing the Power Monitor Module (callout: power monitor module)

5.6 Removing and Replacing the RZ35 Disk Drive

Do not disconnect the unit until you have performed all the steps in Section 5.4.1.

This section describes how to remove the RZ35 disk drive from the ESE50 and from the slide module it resides on, and how to replace it.

------------------------------------------------------------
Note
------------------------------------------------------------
An RZ35 disk drive can be replaced with an RZ27 disk drive.
------------------------------------------------------------

Once you have removed the ESE50 from its external enclosure (see Section 5.4), use the following steps to remove and replace the RZ35 disk drive:

1. To remove an RZ35 from the ESE50 backplane, use your fingers to pull the module and RZ35 from the backplane (Figure 5-15).

Figure 5-15 Removing/Replacing the RZ35 with Module (callout: RZ35)

2. To remove the disk drive from its module (Figure 5-16):
   a. Remove the four Phillips head screws securing the drive to the module.
   b. Disconnect the signal cable and then the power cable from the rear of the drive.

Figure 5-16 Removing/Replacing RZ35 from Module (callouts: Phillips screws, power connector, signal connector)

   c. Remove the drive and place it aside. You will need to check its option connector jumpers (Figure 5-17).
3. Replace an RZ35 into the ESE50 backplane.
   a. Set the jumpers on the option connector to the setting of the previous drive.
   b. Connect the power cable and then the signal cable.
   c. Place the disk drive on its module and secure it with the four Phillips head screws.
4. Push the module and RZ35 back into the backplane. Be sure the module is properly seated.
5. Proceed to Section 5.10.
Figure 5-17 RZ35 Jumper Settings

5.7 Removing and Replacing the Battery Backup Pack

Do not disconnect the unit until you have performed all the steps in Section 5.4.1.

This section describes how to remove and replace the battery backup pack.

Once you have removed the ESE50 from its external enclosure (see Section 5.4), use the following steps to remove and replace the battery backup pack:

1. Remove the power monitor and power supply modules as described in Section 5.5.2.
2. Using a 5/16-inch nutdriver, remove the nut/standoff (facing the left side of the ESE50) at the bottom of the battery pack bracket near the fan. Be careful not to damage the fan (Figure 5-18).
3. Remove the two Phillips head screws at the other end of the bracket.

Figure 5-18 Removing/Replacing the Battery Pack (callout: battery cable, J90)

7. Replace the nut/standoff (facing the left side of the ESE50) at the bottom of the battery pack bracket near the fan. Be careful not to damage the fan.
8. Replace the power monitor and power supply modules as described in Section 5.5.2.
9. Proceed to Section 5.10.

Figure 5-19 Battery Pack Cable (J90) (callouts: standoff, battery pack bracket, Phillips screws)

5.8 Removing and Replacing the AC Input Box

Do not disconnect the unit until you have performed all the steps in Section 5.4.1.
This section describes how to remove and replace the AC input box.

Once you have removed the ESE50 from its external enclosure (see Section 5.4), use the following steps to remove and replace the AC input box:

1. Remove the power monitor and power supply modules as described in Section 5.5.2.
2. Using a 5/16-inch nutdriver, remove the nut (facing the left side of the ESE50) at the bottom of the battery pack bracket near the fan. Be careful not to damage the fan.
3. Remove the two Phillips head screws at the other end of the bracket.
4. Disconnect the battery pack (J90 on backplane) and remove it.
5. Remove the four Phillips head countersink screws that hold the AC box (Figure 5-20).

Figure 5-20 AC Input Box Screws (callout: Phillips screws)

6. Pull the AC box out enough to gain access to its cables (Figure 5-21).
7. Unplug the cables at P70 and J71.
8. Use needle-nose pliers to remove the two wires from the ON/OFF switch. Remember: blue on top and brown on the bottom.
9. Use needle-nose pliers to reconnect the two wires to the ON/OFF switch. Remember: blue on top and brown on the bottom.
10. Plug the cables into P70 and J71.
11. Set the voltage to the same setting as the previously removed AC box.
12. Place the AC box back into the ESE50.
13. Replace the four Phillips head countersink screws that hold the AC box.
14. Place the battery pack in the ESE50 and connect the cable to J90.
15. Secure the battery pack bracket with two Phillips head screws.
16. Replace the nut (facing the left side of the ESE50) at the bottom of the battery pack bracket near the fan. Be careful not to damage the fan.
17. Replace the power monitor and power supply modules as described in Section 5.5.2.
18. Proceed to Section 5.10.
Figure 5-21 Removing/Replacing the AC Input Box (callout: to AC input box)

5.9 Removing and Replacing the Fan Assembly

Do not disconnect the unit until you have performed all the steps in Section 5.4.1.

This section describes how to remove and replace the fan assembly.

Once you have removed the ESE50 from its external enclosure (see Section 5.4), use the following steps to remove and replace the fan assembly:

1. Loosen all four captive screws and carefully pull the fan assembly away from the unit.
2. Disconnect the fan cable at J35 on the backplane (Figure 5-22).
3. Remove the fan assembly by loosening the bottom two captive screws.
4. Replace with the new fan assembly and connect the fan cable to J35.
5. Secure the fan with all four captive screws only at this time.
6. Proceed to Section 5.10.

Figure 5-22 Removing/Replacing the Fan Cable (callout: fan cable)

5.10 Placing the ESE50 into External Enclosure

Do not disconnect the unit until you have performed all the steps in Section 5.4.1.

Perform the following steps to place the ESE50 into the external enclosure:

1. Replace the left and right side covers (Figure 5-10) and secure them with the Phillips head screws located on the front (4), rear (4), and top (6).
2. Replace the top shock mount assembly (Figure 5-9).
   a. Place the shock mount assembly on the unit.
   b. Tighten the top four captive screws on the shock mount assembly.
   c. Tighten the top two captive screws on the fan assembly.
   d.
At the rear of the ESE50, connect the cable from the shock mount assembly to the power monitor module (Figure 5-7).
3. Remember the proper orientation for the ESE50. The RS-232 connector should be above the fan assembly.
4. Place the ESE50 into the external enclosure; use the handle and push the ESE50 back into the external enclosure (Figure 5-6).
5. Tighten the four captive screws located on the rear panel of the external enclosure.
6. Lay the front bezel down in front of the ESE50 (fan assembly facing you) (Figure 5-4).
7. Connect the OCP cable and tighten the two screws on either side of the OCP cable to secure it (Figure 5-5).

------------------------------------------------------------
Caution
------------------------------------------------------------
Ensure that the front panel cable does not become pinched between the fan assembly and the main chassis.
------------------------------------------------------------

8. Use a 5/32-inch Allen wrench to tighten the four Allen head captive screws securing the ESE50 to the front of the external enclosure. Do not fully tighten one screw at a time. Tighten each screw partway, then the next, until all are tight (Figure 5-3).
9. Replace the operator control panel (Figure 5-1).

6
------------------------------------------------------------
ESE50 Rev C -- Removal and Replacement Procedures

6.1 Introduction

The ESE50 (Rev. C) is maintained by replacing field replaceable units (FRUs). This chapter describes the removal and replacement procedures for the FRUs.
The following FRU does not require the ESE50 to be removed from its external enclosure:

OCP, part number 29-30072-01

The following FRUs require that the ESE50 be removed from its external enclosure:

FRU                                  Part Number
------------------------------------------------------------
OCP cable                            29-30076-01
Controller module                    29-30068-02
Array modules [2]                    29-30073-01
Power monitor module                 29-31173-01
Power supply module                  29-30070-01
RZ27 disk drive [1], [2]             RZ27-E
RZ27 disk backup assembly [2]        29-31171-01
Battery pack                         12-38755-01
AC input box                         29-30071-02
Fan                                  29-31172-01
------------------------------------------------------------
[1] All AA/BB, BA/BB, and DA/DB variants for the ESE50 Rev C will use an RZ27 disk drive.
[2] These FRUs are backward compatible with Rev. A and B ESE50 units.

6.2 ESE50 Shutdown

This section describes the recommended procedures for an orderly ESE50 shutdown whenever maintenance service is to be performed.

------------------------------------------------------------
Caution
------------------------------------------------------------
Before repairing the ESE50, always back up all data to another disk or to tape, if available. All AA/BB, BA/BB, and DA/DB variants for the ESE50 Rev C will use an RZ27 disk drive.
------------------------------------------------------------

6.2.1 Precautions

Before attempting any removal or replacement procedure, read the following precautions:

· Only qualified and authorized service personnel should remove or install FRUs.
· Before you remove or install FRUs, perform a system backup and then power down the system.
· Static electricity can damage integrated circuits. Therefore, always use a grounded antistatic wrist strap (PN 29-11762-00) and a grounded work surface when working with the internal parts of a computer system.
6.2.2 Shutdown Procedure

Before performing an FRU procedure, do the following:

1. Perform a system backup to disk or tape.
2. Initiate an OpenVMS dismount operation.
3. Spin down the ESE50 disk drive by pressing the RUN/STOP switch (out position).

------------------------------------------------------------
Note
------------------------------------------------------------
The ESE50 will perform an unload/save operation, which may take approximately eight minutes to complete.
------------------------------------------------------------

4. Once the RUN LED is off, power down the ESE50 (the ac power switch at the rear of the unit).

6.3 ESE50 Assembly

This section describes how to remove the ESE50 assembly from its external enclosure. It does not describe the mechanical procedure for removing the ESE50 from a system or cabinet enclosure. Refer to the system or cabinet enclosure documentation for these procedures.

Refer to the system or cabinet enclosure documentation for information on opening any doors or panels to gain access to the ESE50. Once the ESE50 is removed from its external enclosure, follow the steps in Section 6.5 to gain access to its internal FRUs.

6.3.1 Disconnecting the ESE50

Open the rear door of the system or cabinet enclosure, according to the procedures detailed in the enclosure documentation.

------------------------------------------------------------
Note
------------------------------------------------------------
Do not disconnect the ESE50 hardware until all the steps in Section 6.2.2 and Section 6.3.1 have been performed.
------------------------------------------------------------

With access to the rear of the ESE50, perform the steps listed below while referring to Figure 6-1:

1. Set the ac power switch to 0 (off).
2. Unplug the power cord from the unit's ac power receptacle.
3.
Disconnect the SDI cable(s): use a screwdriver to loosen the screws on the SDI cable connector(s), and then remove the cable(s).

Figure 6-1 ESE50 -- Rear View

6.4 Operator Control Panel Removal

To remove the operator control panel (OCP) from the ESE50, perform the following steps:

------------------------------------------------------------
Note
------------------------------------------------------------
Do not disconnect the ESE50 until all the steps in Section 6.2.2 and Section 6.3.1 have been performed.
------------------------------------------------------------

1. Refer to the system or cabinet documentation for information on opening any doors or panels to gain access to the ESE50.
2. Remove the operator control panel (OCP) by grasping the panel and gently pulling it forward (Figure 6-2). Once the OCP is free, place it aside.
3. Replace the OCP and its labels.

Figure 6-2 OCP -- Removal/Replacement

6.5 Accessing the Internal FRUs

This section describes the removal of the ESE50 from its external enclosure, its right and left side covers, and its top shock mount assembly.

------------------------------------------------------------
Note
------------------------------------------------------------
Do not disconnect the ESE50 until all the steps in Section 6.2.2 and Section 6.3.1 have been performed.

The ESE50 can be mounted either right-side up or upside down within its external enclosure. This allows the same external enclosure to be used in various system or cabinet enclosures, for example, the SA600, SA800, and SA900 series products.
------------------------------------------------------------

6.5.1 Front Panel Removal

Perform the following steps to remove the front panel assembly:

1. Remove the ESE50 and its external enclosure from either the system or cabinet enclosure.
2. Place the ESE50 on a level and stable work surface, such as a table.
3. Remove the operator control panel (OCP) by grasping the panel and gently pulling it forward (Figure 6-2). Once the OCP is free, place it aside.
4. Using a 5/32-inch Allen wrench, loosen the four Allen head captive screws that secure the ESE50 to the front of the external enclosure (Figure 6-3).
5. Allow the front bezel to swing down and rest flat on the work surface (Figure 6-4).

------------------------------------------------------------
Caution
------------------------------------------------------------
Ensure that the OCP cable has enough slack to allow the front bezel to rest flat and that the cable does not become pinched between the fan assembly and the main chassis.
------------------------------------------------------------

6. Loosen the two screws on either side of the OCP cable (Figure 6-5) and disconnect the cable.

Figure 6-3 Front Panel Assembly -- Removal

Figure 6-4 Front Panel Assembly -- Removal/Replacement

Figure 6-5 OCP Cable Disconnection

6.5.2 ESE50 Chassis Removal

Perform the following steps to remove the ESE50 chassis from its external enclosure:

1. Loosen, but do not remove, the four captive screws located on the rear panel of the external enclosure (Figure 6-1).
2.
Observe and note the orientation of the ESE50 within its external enclosure by locating the handle below the fan.
   a. If the RS-232 connector is below the fan, then rotate the entire enclosure 180 degrees before proceeding.
   b. If the RS-232 connector is above the fan, then proceed.
3. Slide the ESE50 chassis from the external enclosure by gripping the handle and gently pulling it forward (Figure 6-6).
4. Place the external enclosure aside.

Figure 6-6 The ESE50 -- Removal/Replacement

6.5.3 Side Panel Removal

Perform the following steps to remove the right and left side panels from the ESE50 chassis:

1. Remove all the Phillips head screws located on both covers (Figure 6-7): front (4 screws) and rear (4 screws).
2. Lift the right side cover up and off; place it aside.
3. Lift the left side cover up and off; place it aside.

Figure 6-7 Side Panel Removal

6.5.4 Top Shock Mount Assembly Removal

Perform the following steps to remove the top shock mount assembly from the ESE50 chassis:

1. Disconnect the RS-232 cable at the power monitor module (Figure 6-8).
2. Loosen the four captive screws on the shock mount assembly (Figure 6-9):
   a. Loosen the two captive screws accessible from the side near the RS-232 connector.
   b. Loosen the two captive screws accessible from the top rear of the shock mount assembly.
3. Remove the shock mount assembly and place it aside.
Figure 6-8 The RS-232 Cable Disconnection

Figure 6-9 Top Shock Mount Assembly -- Captive Screws

6.6 ESE50 Modules Removal

This section describes the procedures to remove and replace ESE50 modules (Figure 6-10) from the ESE50 backplane.

------------------------------------------------------------
Note
------------------------------------------------------------
Do not disconnect the ESE50 until all the steps in Section 6.2.2 and Section 6.3.1 have been performed. Also, do not proceed any further until all the removable procedures described in Section 6.5 have been performed.
------------------------------------------------------------

------------------------------------------------------------
Caution
------------------------------------------------------------
Static electricity can damage integrated circuits. Therefore, always use a grounded antistatic wrist strap (PN 29-11762-00) and a grounded work surface when working with the internal parts of a computer system.
------------------------------------------------------------

Figure 6-10 ESE50 Modules

6.6.1 Controller and Array Modules

To remove and replace the controller or memory array modules from the ESE50 backplane, perform the following steps.

------------------------------------------------------------
Note
------------------------------------------------------------
Do not disconnect the ESE50 until all the steps in Section 6.2.2 and Section 6.3.1 have been performed.
Also, do not proceed any further until all the removable procedures described in Section 6.5 have been performed. ------------------------------------------------------------ 1. To remove a module from the ESE50 backplane: a. Insert a narrow blade screwdriver, scribe, or needle-nose pliers into the removal hole on either side of the module (Figure 6-11). b. Pull the corner of the module gently forward until the module is free from the backplane. c. Grasp the center of the module with your fingers and pull the module completely free from the backplane (Figure 6-11). 2. To replace a module into the ESE50 backplane: a. Slide the module into its appropriate slot. b. Use your thumbs to push the module into the backplane. c. Ensure that the module is properly seated. 3. Proceed to Section 6.11. ESE50 Rev C -- Removal and Replacement Procedures 6-15 ESE50 Rev C -- Removal and Replacement Procedures 6.6 ESE50 Modules Removal Figure 6-11 Controller Module -- Removable/Replacement 6-16 ESE50 Rev C -- Removal and Replacement Procedures ESE50 Rev C -- Removal and Replacement Procedures 6.6 ESE50 Modules Removal 6.6.2 Power Monitor and Power Supply Modules To remove and replace the power monitor or power supply modules from the ESE50 backplane, perform the following steps. ------------------------------------------------------------ Note ------------------------------------------------------------ Do not disconnect the ESE50 until all the steps in Section 6.2.2 and Section 6.3.1 have been performed. Also, do not proceed any further until all the removable procedures described in Section 6.5 have been performed. ------------------------------------------------------------ 1. To remove a module from the ESE50 backplane: a. When removing a power supply module, you must first disconnect its power connector (Figure 6-13), which connects the module to the ac input box. b. 
Insert a narrow blade screwdriver, scribe, or needle-nose pliers into the removal hole on either side of the module. c. Pull the corner of the module gently forward until the module is free from the backplane. ------------------------------------------------------------ Caution ------------------------------------------------------------ When removing or replacing either module, ensure that the underside of the power monitor module does not come in contact with the heat sinks that reside on the power supply module. ------------------------------------------------------------ d. Grasp the center of the module with your fingers and pull the module completely free from the backplane (Figure 6-12 or Figure 6-13). 2. To replace a module into the ESE50 backplane: a. Slide the module into its appropriate slot. b. Use your thumbs to push the module into the backplane. c. Ensure that the module is properly seated. d. If the power supply module was removed, reconnect appropriate power connectors (Figure 6-13). 3. Proceed to Section 6.11. ESE50 Rev C -- Removal and Replacement Procedures 6-17 ESE50 Rev C -- Removal and Replacement Procedures 6.6 ESE50 Modules Removal Figure 6-12 Power Monitor Module -- Removable/Replacement Figure 6-13 Power Supply Module -- Removable/Replacement 6-18 ESE50 Rev C -- Removal and Replacement Procedures ESE50 Rev C -- Removal and Replacement Procedures 6.7 RZ27 Disk Drive Removal 6.7 RZ27 Disk Drive Removal To remove and replace the disk drive and the slide module that it resides on from the ESE50 chassis, perform the following steps. ------------------------------------------------------------ Note ------------------------------------------------------------ Do not disconnect the ESE50 until all the steps in Section 6.2.2 and Section 6.3.1 have been performed. Also, do not proceed any further until all the removable procedures described in Section 6.5 have been performed. ------------------------------------------------------------ 1. 
Remove the disk drive from the ESE50 chassis by using your fingers to pull the module and drive from the backplane (Figure 6-14). 2. Remove the disk drive from its module (Figure 6-15): a. Remove the four Phillips head screws securing the drive to the module. b. Disconnect the signal cable from the rear of the drive. c. Disconnect the dc power cable from the rear of the drive. d. Remove the drive from the module and place it aside. You will need to compare its option connector jumpers (Figure 6-16) with the new drive. 3. Replace the drive in the ESE50 backplane: a. Set the jumpers on the option connector exactly as the old drive, if the drive is being replaced. b. Reconnect the dc power cable at the rear of the drive. c. Reconnect the signal cable at the rear of the drive. d. Mount the new drive onto the module, or the old drive onto the new module. e. Secure the drive to the module with four Phillips head screws. f. Push the module and disk drive back into the ESE50 backplane. Be sure that the module is properly seated. 4. Proceed to Section 6.11. ESE50 Rev C -- Removal and Replacement Procedures 6-19 ESE50 Rev C -- Removal and Replacement Procedures 6.7 RZ27 Disk Drive Removal Figure 6-14 Disk Drive with Slide Module -- Removable/Replacement Figure 6-15 Disk Drive -- Removable/Replacement 6-20 ESE50 Rev C -- Removal and Replacement Procedures ESE50 Rev C -- Removal and Replacement Procedures 6.7 RZ27 Disk Drive Removal Figure 6-16 Disk Drive -- Jumper Setting ESE50 Rev C -- Removal and Replacement Procedures 6-21 ESE50 Rev C -- Removal and Replacement Procedures 6.8 Battery Backup Pack Removal 6.8 Battery Backup Pack Removal To remove the battery backup packs from the ESE50 chassis, perform the following steps. ------------------------------------------------------------ Note ------------------------------------------------------------ Do not disconnect the ESE50 until all the steps in Section 6.2.2 and Section 6.3.1 have been performed. 
Also, do not proceed any further until all the removable procedures described in Section 6.5 have been performed. ------------------------------------------------------------ 1. Remove the power monitor and power supply modules as described in Section 6.6. 2. Disconnect the battery pack from J90 on the ESE50 backplane (Figure 6-17). 3. Remove the two Phillips head screws located at either end of the mounting bracket (Figure 6-18). 4. Remove the old battery pack. 5. Install the new battery pack. 6. Secure the battery pack mounting bracket with two Phillips head screws (Figure 6-18). 7. Reconnect the battery pack to J90 on the ESE50 backplane (Figure 6-17). 8. Replace the power monitor and power supply modules as described in Section 6.6. 9. Proceed to Section 6.11. 6-22 ESE50 Rev C -- Removal and Replacement Procedures ESE50 Rev C -- Removal and Replacement Procedures 6.8 Battery Backup Pack Removal Figure 6-17 Battery Pack -- Cable Figure 6-18 Battery Pack -- Removable/Replacement ESE50 Rev C -- Removal and Replacement Procedures 6-23 ESE50 Rev C -- Removal and Replacement Procedures 6.9 AC Input Box 6.9 AC Input Box To remove the ac input box from the ESE50 chassis, perform the following steps. ------------------------------------------------------------ Note ------------------------------------------------------------ Do not disconnect the ESE50 until all the steps in Section 6.2.2 and Section 6.3.1 have been performed. Also, do not proceed any further until all the removable procedures described in Section 6.5 have been performed. ------------------------------------------------------------ 1. Remove the power monitor and power supply modules as described in Section 6.6. 2. Remove the four Phillips head countersink screws that secure the ac input box (Figure 6-19). 3. Disconnect the battery pack at J90 on the ESE50 backplane. 4. Pull the ac input box out, just enough to gain access to its cables (Figure 6-20). 5. Disconnect the cables at P70 and J71. 6. 
Remove the two wires from the on/off switch (upper wire is blue and lower wire is brown); use needle-nose pliers. 7. Remove the old ac input box and replace it with a new one. 8. Reconnect the two wires from the on/off switch (upper wire is blue and lower wire is brown); use needle-nose pliers. 9. Reconnect the cables at P70 and J71. 10. Replace the four Phillips head countersink screws that secure the ac input box (Figure 6-19). 11. Reconnect the battery pack at J90 on the ESE50 backplane. 12. Replace the power monitor and power supply modules as described in Section 6.6. 13. Proceed to Section 6.11. 6-24 ESE50 Rev C -- Removal and Replacement Procedures ESE50 Rev C -- Removal and Replacement Procedures 6.9 AC Input Box Figure 6-19 AC Input Box Mounting Screws Figure 6-20 AC Input Box Mounting -- Cables ESE50 Rev C -- Removal and Replacement Procedures 6-25 ESE50 Rev C -- Removal and Replacement Procedures 6.10 Fan Assembly -- Removable/Replacement 6.10 Fan Assembly -- Removable/Replacement To remove the fan assembly from the ESE50 chassis, perform the following steps. ------------------------------------------------------------ Note ------------------------------------------------------------ Do not disconnect the ESE50 until all the steps in Section 6.2.2 and Section 6.3.1 have been performed. ------------------------------------------------------------ 1. Remove the ESE50 chassis from its external enclosure (Section 6.5.2). 2. Remove the four Phillips head screws (Figure 6-21) and carefully pull the fan assembly toward you. 3. Disconnect the fan cable at J35 on the ESE50 backplane (Figure 6-21). 4. Disconnect the OCP cable on the ESE50 backplane (Figure 6-21). 5. Remove the fan assembly. 6. Replace with a new fan assembly. 7. Reconnect the fan cable at J35 on the ESE50 backplane (Figure 6-21). 8. Reconnect the OCP cable on the ESE50 backplane (Figure 6-21). 9. Secure the fan assembly with captive screws, at this time only (Figure 6-21). 10. 
Proceed to Section 6.11.

Figure 6-21 Fan Assembly

6.11 The ESE50 External Enclosure

To place the ESE50 chassis into its external enclosure, perform the following steps:

1. Replace the left and right side covers (Figure 6-7), and secure them with Phillips head screws: front (4 screws), rear (4 screws), and top (6 screws).
2. Replace the top shock mount assembly (refer to Figure 6-9):
   a. Place the shock mount assembly on the ESE50 chassis.
   b. Tighten the top four captive screws on the shock mount assembly.
   c. Tighten the top two captive screws on the fan assembly.
   d. Reconnect the RS-232 cable to the power monitor module.
3. Position the ESE50 chassis so that the RS-232 connector is above the fan assembly.
4. Slide the ESE50 chassis into the external enclosure; use the handle and push gently.
5. Tighten the four captive screws located on the rear panel of the ESE50 external enclosure.
6. Lay the front bezel down in front of the ESE50 external enclosure, with the fan assembly facing you (Figure 6-4).
7. Reconnect the OCP cable and tighten the two screws on either side of the OCP cable to secure it (Figure 6-5).

------------------------------------------------------------
Caution
------------------------------------------------------------
Ensure that the front panel cable does not become pinched between the fan assembly and the ESE50 chassis.
------------------------------------------------------------

8. Tighten the four Allen head captive screws that secure the ESE50 chassis to the front of the ESE50 external enclosure, using a 5/32-inch Allen wrench (Figure 6-3). Do not fully tighten one screw at a time; tighten each screw partway, then move to the next, until all four are tight.
9. Replace the operator control panel (Figure 6-2).
A
------------------------------------------------------------
ESE50 Error Codes

This appendix lists the error codes for the ESE50. Each entry gives the error code, the error type, the error title, and a brief description. This appendix is intended for Field Service, Engineering, Software Development, Depot Repair Centers, Test Engineering, and Manufacturing.

A.1 Generic Status

The generic status consists of three bytes returned as part of the response to an SDI Get Status/Topology command or any unsuccessful response to a level 2 command. The three bytes are the REQUEST byte, the MODE byte, and the ERROR byte, as shown in the following format.

A.1.1 Request Byte

· The OA bit is a 0 when the drive is in the drive available or drive on-line state relative to the controller.
· The RR bit is a 1 when the drive has requested a readjustment.
· The DR bit is a 1 when the drive has requested the controller to downline load diagnostics to the drive.
· The SR bit is a 1 when the drive spindle is ready.
· The EL bit is a 1 when there is loggable information in the extended status area.
· The PB bit is a 0 when the drive is connected to the controller through port A, and a 1 when the drive is connected to the controller through port B.
· The PS bit is a 1 when the port switch is enabled (logically in).
· The RU bit is a 1 when the drive run/stop switch is logically in.

A.1.2 Mode Byte

· The ED bits, when set to 10, disable internal error logging. The bits are normally 00 to enable internal error logging.
· The W1 bit indicates the logical position of the write protect switch for the subunit; the switch is logically in if this bit is a 1.
· The DD bit is a 1 when a controller error routine or diagnostic has disabled the drive. When this bit is set, the fault light on the drive will be on.
· The FO bit is a 1 when formatting is enabled in the drive.
· The DB bit is a 1 when access to the controller diagnostic cylinders is enabled.
· The S7 bit is a 0 when the drive is in 512-byte sector format. This bit should always be 0 for the ESE50.

A.1.3 Error Byte

Most drive-detected errors fall into one of the following five classes and are reported as one of these error types. A controller-detected drive error can be logged without any of these bits being set.

· The DE error bit is used to report any internal drive-detected error that requires explicit controller recovery action other than simple command retransmission or context readjustment. The drive fault light will be on when the DE bit is set.
· The RE error bit is used to report transmission errors detected by the drive (framing errors and checksum errors). The drive fault light will not be on when this bit is set.
· The PE error bit is used to report level 2 protocol errors detected by the drive (illegal cylinder, invalid command, and so forth). The drive fault light will not be on when this bit is set.
· The DF error bit is used to indicate that the drive did not pass its initialization/diagnostics the last time it was initialized or powered on. The drive fault light will be on when this bit is set.
· The WE error bit is used to report that the drive received a Select Track and Write or a Format command while the drive was write protected. The drive fault light will be on when this bit is set.

A.2 Extended Status Bytes

The extended status bytes are part of the response returned as a result of the SDI Get Status/Topology command or any unsuccessful response to a level 2 command. These bytes are passed through the controller to the host for error logging purposes. The extended status bytes consist of 7 bytes as shown in the following format.
Byte 9 - Indicates the last command received from the controller, except the Get Status command, which is not recorded in this byte.

Byte 10 - Indicates the state of the drive.

    CO - The storage has a high CRD error rate
    SV - The storage contains valid data
    EH - The system is operating in variant mode
    UO - The system is operating on internal batteries
    DP - The data retention subsystem is present
    DF - The data retention subsystem is faulted
    DU - The data retention subsystem is unloading
    DL - The data retention subsystem is loading

Byte 11 - Logical array number and logical bank number
Byte 12 - Physical array number and physical bank number
Byte 13 - Not used
Byte 14 - Error code
Byte 15 - FRU (field replaceable unit) code

Refer to Example A-1.

Example A-1 Longword Example

SAMPLE ERRORLOG:

ERROR SEQUENCE 5945.                    LOGGED ON:  SID 018CD890
DATE/TIME 19-MAY-1992 09:45:21.10                   SYS_TYPE 00000000
SYSTEM UPTIME: 7 DAYS 00:37:35
SCS NODE: ESD08                                     VAX/VMS V5.5

ERL$LOGMESSAGE ENTRY KA785  HW REV# 3.A  SERIAL# 2192.  MFG PLANT 5.

I/O SUB-SYSTEM, UNIT _HSC007$DUA105:

MESSAGE TYPE      0001        DISK MSCP MESSAGE
MSLG$L_CMD_REF    00000000
MSLG$W_UNIT       0069        UNIT #105.
MSLG$W_SEQ_NUM    1186        SEQUENCE #4486.
MSLG$B_FORMAT     03          SDI LOG
MSLG$B_FLAGS      40          OPERATION CONTINUING
MSLG$W_EVENT      00EB        DRIVE ERROR
                              DRIVE DETECTED ERROR
MSLG$Q_CNT_ID     0000F807
                  01200000    UNIQUE IDENTIFIER, 00000000F807(X)
                              MASS STORAGE CONTROLLER
                              HSC70
MSLG$B_CNT_SVR    3C          CONTROLLER SOFTWARE VERSION #60.
MSLG$B_CNT_HVR    00          CONTROLLER HARDWARE REVISION #0.
MSLG$W_MULT_UNT   0081
MSLG$Q_UNIT_ID    0000001B
                  02300000    UNIQUE IDENTIFIER, 00000000001B(X)
                              DISK CLASS DEVICE (166)
                              MODEL = 48.
MSLG$B_UNIT_SVR   17          UNIT SOFTWARE VERSION #23.
MSLG$B_UNIT_HVR   07          UNIT HARDWARE REVISION #7.
MSLG$L_VOL_SER    00000000    VOLUME SERIAL #0.
MSLG$L_HDR_CODE   00000000    LOGICAL BLOCK #0.
                              GOOD LOGICAL SECTOR
MSLG$Z_SDI
   REQUEST        1F          RUN/STOP SWITCH IN
                              PORT SWITCH IN
                              LOG INFORMATION IN EXTENDED AREA
                              SPINDLE READY
                              PORT B RECEIVERS ENABLED
   MODE           00          512-BYTE SECTOR FORMAT
   ERROR          80          DRIVE ERROR
   CONTROLLER     00          NORMAL DRIVE OPERATION
   RETRY          00          0. RETRIES

DEVICE DEPENDENT INFORMATION

LONGWORD 1.       0000488E    /.H../
LONGWORD 2.       08316100    /.a1./
LONGWORD 3.       00000001    /..../
LONGWORD 4.       00000000    /..../

Longword Breakdown From Sample Errorlog:

LONGWORD 1.  0000488E
    00  Byte 12: Physical Array and Bank Number
    00  Byte 11: Logical Array and Bank Number
    48  Byte 10: State of drive
    8E  Byte 9:  Previous CMD Op-Code

LONGWORD 2.  08316100
    31  Byte 15: FRU Code
    61  Byte 14: Error Code
    00  Byte 13: Not used

A.3 ESE50 Error Definitions

The following sections describe the drive-detected error bits, the reporting mechanisms, the error reporting format, and each individual error detected.

A.3.1 Drive Detected Errors

· DE BIT--Drive error (DE) represents the majority of error conditions that can exist in the ESE50. When this bit is set, the fault indicator will be on.
· WE BIT--A write attempted while the drive is write-protected is handled internally as a drive fault, with the fault indicator illuminated; however, it is reported to the controller as a write lock error (WE).
· RE & PE BIT--Transmission errors and protocol errors are detected by the drive but are not necessarily drive problems, and therefore are not reported to the controller as drive faults.
· DF BIT--Any error that occurs while the drive is in diagnostic mode (while on line) or initialization mode is reported to the controller as initialization/diagnostic failure (DF).

A.3.2 Reporting Mechanisms

The following paragraphs describe the drive actions as a result of a drive fault.
· DRIVE OFF-LINE: The ESE50 is in the drive off-line state as a result of any of the following conditions: power first applied, the A and B port switches both in the deasserted mode, or a hard failure that prevents communication with the controller. When power is applied, the ESE50 starts its internal integrity diagnostics. If a drive fault occurs while the drive is running its internal integrity diagnostics, the drive will light the front panel fault light, store the appropriate error code in the drive internal errorlog, and update the generic status bits and the extended status bytes.

· DRIVE AVAILABLE: The ESE50 is in this state when either the A, B, or both port switches are in the asserted state and the controller has not placed the drive on line. If a drive fault occurs while the drive is in the drive available state, the drive will light the front panel fault light, store the appropriate error code in the drive internal errorlog, update the generic status bits and the extended status bytes, and wait for controller action.

· DRIVE ON-LINE: The ESE50 enters the drive on-line state only after it has successfully completed its internal integrity diagnostics and has received a valid SDI on-line command (if the error byte is not zero, this will be an invalid command). If a drive fault (DE bit) occurs while the drive is in the drive on-line state, the drive will light the front panel fault light, store the appropriate error code in the drive internal errorlog, update the generic status bits and the extended status bytes, assert ATTENTION in the RDS, deassert read/write ready in the RDS, and wait for the appropriate action from the controller (a valid get status/drive clear or topology/drive clear exchange). If a read data transfer is taking place, the operation will most likely complete; however, the drive will refuse any subsequent data transfer commands until an SDI get status/drive clear sequence has been successfully completed.
  If a write data transfer is taking place, the drive will abort the transfer by deasserting read/write ready, which in turn prevents write gate from being or remaining asserted. Any status information associated with the drive fault will be saved until the error condition has been properly cleared.

------------------------------------------------------------
Code  Type  Title and Description
------------------------------------------------------------
01    DE    RDS During Transfer Error
            This error indicates that an uncorrectable Hamming code error was detected during a Level 1 transfer from the storage array.
02    DE    Multiple RDS Transfer Error
            This error indicates that more than one uncorrectable Hamming code error was detected during a single Level 1 transfer from the storage array.
03    DE    Non Existent Memory Error
            This error indicates that a request has been made to address a nonexistent memory address in the RAM storage.
04    RE    Out of Range Enh Mode Error
            This error indicates that the LBN sent in the Level 1 transfer command was outside the valid LBN space of the drive. This error is valid only in variant mode.
06    DE    Memory Controller Abort
            This error indicates that the memory array has detected an invalid condition and aborted the data transfer.
07    RE    SDI Frame Sequence Error
            This error indicates that the SDI commands decoded at Level 1 were detected in the wrong sequence.
08    RE    SDI Checksum Error
            This error indicates that the calculated checksum did not match the checksum field sent to the drive from the controller (SDI commands decoded at SDI Level 2).
09    RE    SDI Framing Error
            This error is generated when the drive has detected the sync pattern on the SDI WRITE/CMD DATA line, but does not detect any SDI Level 1 Control Message Transmission or Single Frame command.
0A    PE    SDI Level 1/2 Opcode Parity
            This error indicates that wrong parity was detected on the opcode byte of a Level 2 SDI command.
0B    PE    SDI Invalid Opcode
            This error indicates that the Level 2 opcode that was decoded was not one of the 16 valid opcodes listed in the SDI specification.
0C    PE    SDI Command Length Error LVL2
            This error indicates that the number of bytes expected did not equal the number of bytes received for an SDI Level 2 command.
0D    PE    SDI Invalid Cmd With Drive Error
            This error indicates that the controller issued an Initiate Seek command, an Error Recovery command, or a Recalibrate command while there was a drive error.
0E    PE    SDI Invalid Group Select LVL2
            This error indicates that the controller attempted to select a group that was nonexistent.
0F    PE    SDI Write Enable/Write Protect
            This error indicates that the drive write protect switch is logically in and the controller issued a Change Mode command that attempted to write-enable the drive.
12    PE    SDI Xfer Command with Drive Error
            This error indicates that a Level 1 transfer command was sent while the drive was in a faulted state.
15    PE    SDI Invalid Format Request
            This error indicates that the controller directed the drive to place itself into the mode to transfer sector data in 576-byte format.
16    PE    SDI Invalid Variant Request
            This error indicates that the controller directed the drive to place itself into a variant mode not supported by the drive.
17    PE    SDI Invalid Cmd In Variant Mode
            This error indicates that the controller directed the drive to execute a Recalibrate or Seek command while in variant mode.
1A    PE    SDI Invalid Cylinder Address
            This error indicates that the drive detected a nonexistent cylinder address that was decoded from a controller Initiate Seek command.
1D    PE    Drive Discon Due To Cntlr
            This code indicates that the drive has dropped a controller due to nonresponse.
1E    DE    CRD Overflow Error
            This code indicates that a bank of storage has generated more than 256 single bit corrections.
1F    RE    Sector Overrun Error
            When a sector or index pulse occurs with either write gate or read gate asserted, an overrun error will occur. This indicates that a write or read operation was attempted through a sector/index boundary.
29    PE    SDI Invalid Error Recovery Level
            This error indicates that the controller issued an SDI Error Recovery command with an illegal recovery level.
2A    PE    SDI Invalid Subunit Specified
            This error indicates that the controller requested information on a subunit of the ESE50.
2B    PE    SDI Invalid Diagnose Mem Reg
            This error indicates that the controller/operator attempted to execute an internal drive test that was nonexistent or an internal drive test that was unable to execute while on-line to the controller.
2C    PE    SDI Spindle Not Ready - Seek/Recal
            This error indicates that the controller issued an SDI Initiate Seek or Recalibrate command when the drive was not spinning (the run/stop switch is in the out position).
2E    PE    Spinup Inhibited By Controller Flags
            This error indicates that the drive is prohibited from spinning up while in the available or on-line state by means of manual intervention.
2F    PE    SDI Run Cmd/Run In Stop Position
            This error indicates that the controller issued the SDI Run command to the drive requesting the media to spin up while the run/stop switch on the operator control panel is in the logical stop position.
40    PE    SDI Invalid Read Mem Region
            This error indicates that the controller issued an SDI Level 2 Read Memory Region command to a memory area that the drive designates as invalid.
41    PE    SDI Response Timed Out
            This error indicates that the drive was attempting to send a response to the controller and the controller was not accepting the message response data from the drive.
42    PE    Not Online/Seek Command Issued
            This error indicates that the controller issued an SDI Level 2 Initiate Seek command and the drive was not on line to the controller.
43    PE    TCR and Not R/W READY Fault
            This error indicates that a data transfer command, Transfer Command Received (TCR), has been initiated and the drive is not ready to read/write or the drive detected a loss of read/write ready during the data transfer.
44    RE    Format Command and Not Enabled
            This error indicates that the drive decoded a "Select Track and Format on Index" or "Format on Sector or Index" command without the FO bit in the drive being set.
46    RE    R/W Safety Interrupt W/O Cause
            This error indicates that an error interrupt occurred, but no cause was found.
47    PE    Invalid Dis. Cmd. TT Bit
            This error indicates that an SDI Disconnect command was issued to the drive and the TT modifier bit was in an incorrect state.
48    PE    Invalid Wrt Mem Byte Cnt/Offset
            This error indicates that an incorrect number of data bytes were detected that were to be written in the drive's memory, or that the offset into the memory region is incorrect.
49    PE    Invalid Cmd During Topology Command
            This error indicates that the drive received an SDI Level 2 command that was illegal because an SDI Level 2 Topology command was already in process.
4A    *     Drive Disabled by DD Bit Set
            This error indicates that the controller issued an SDI Level 2 Change Mode command with the DD bit asserted.
4E    WE    Write/Format While Protected
            This error indicates that the drive is write-protected and detected the assertion of write gate.
4F    WE    SDI Transfer Error
            This error indicates that a control or data clock dropout occurred during a data transfer.
50    DF    IMB Access Error
            This error indicates that a fault was detected during IMB testing.
51    DF    Data Retention, No Disk Response Error
            This error indicates that the disk did not respond during initial testing.
52    DF    FIFO Data Error
            This error indicates that a fault was detected during testing of the FIFO.
53    DF    Single Bit Correction Failure
            This error indicates that a fault was detected during testing of the ECC chip.
54    DF    Power Up Test Failure
            This error indicates that a fault was detected during the power-up test.
5E    DF    Multiple Bad Memory Arrays
            This error indicates that during the memory array tests, multiple arrays were found bad.
5F    DF    Configuration Fault
            This error indicates that the configuration of the system is not valid.
60-6F DF    Bad Array Module 0 to 15
            This error indicates that a fault was detected during testing of the array module in Slot 0 to 15.
70    DF    IORTEST Fault
            This error indicates that a fault was detected during testing of the SDI hardware initialization.
71    DF    CLKTEST Fault
            This error indicates that a fault was detected during testing of the diagnostic clock.
72    DF    SECTEST Fault
            This error indicates that a fault was detected during testing of the sector and index clocks.
73    DF    IDLTEST Fault
            This error indicates that a fault was detected during testing of the SDI internal control and data loopback paths.
74    DF    IDFTEST Fault
            This error indicates that a fault was detected during testing of the SDI internal control paths using invalid frames.
76       DF    ICLTEST Fault
               This error indicates that a fault was detected during testing of SDI initialize.

77       DF    IMBTEST Fault
               This error indicates that a fault was detected during testing of the dual-port RAM to memory transfers.

78       DF    SMTEST Fault, IMB to SDI
               This error indicates that a fault was detected during testing of the internal SDI to IMB transfer.

79       DF    SMTEST Fault, SDI to IMB
               This error indicates that a fault was detected during testing of the internal IMB to SDI transfer.

7A       DF    Internal SDI Loopback Test Hung
               This error indicates that the SDI loopback test is hung.

7D       DF    IMB Header Verify, BAD
               This error indicates that a bad header was detected during testing of the memory.

7E       DF    IMB Header Verify, Abort
               This error indicates that an abort or noack was detected during testing of the headers.

7F       DF    IMB Format Error, Noack/Abort
               This error indicates that an abort or noack was detected during the formatting of the memory.

80       DF    IMB Initialize Error, Abort/Noack
               This error indicates that an abort or noack was detected during the writing of the SDI header.

81       DF    IMB ECC Check Error, Abort/Noack
               This error indicates that an abort or noack was detected during the testing of the ECC logic.

82       DF    IMB ECC Check Error, Controller
               This error indicates that a controller error was detected during the testing of the ECC logic.

B0       DE    Battery Failure
               This error indicates that the battery pack is faulty and unable to maintain a charge.

BF       DE    Low Battery Charge
               This error indicates that the charge state of the system may not support a complete unload cycle in the event of a power loss.

C0       DE    OCP Failure
               This error indicates that a fault was detected in the OCP operation.
D0       DE    Backup Disk Fault
               This error indicates that a fault was detected in the SCSI disk operation.

E0       DE    Data Retention Backup No Data
               This error indicates that the data retention system was requested to do a load, but contained no valid information.

E1       DE    Data Retention Backup Invalid Data
               This error indicates that the data retention system contains data, but it is invalid. The data in the data retention system is not a copy of the most up-to-date information.

EE       DF    Invalid Test Number
               This error indicates that the test number selected was not valid.

F0       DE    Over Temperature, Warning @45 C
               This error indicates that the temperature within the chassis is greater than 45 degrees Celsius.

F1       DE    Power Supply Module 1 Failure
               This error indicates that the power supply module has a failure. This may be a +12 V regulator or -5.2 V regulator fault. Also, in a redundant supply system, it may be a +5 V regulator or the primary regulator fault.

F2       DE    Power Supply Module 2 Failure
               This error indicates that the power supply module has a failure. This may be a +12 V regulator or -5.2 V regulator fault. Also, in a redundant supply system, it may be a +5 V regulator or the primary regulator fault.

F3       DE    Over Temperature Shutdown, @50 C
               This error indicates that the temperature within the chassis is greater than 50 degrees Celsius. It also indicates that a save operation will be performed and the system will be shut down upon completion.

F4       DE    Over Temperature Shutdown, @55 C
               This error indicates that the temperature within the chassis is greater than 55 degrees Celsius. It also indicates that the system will be shut down immediately.

FF       DE    Standby Power System On
               This error indicates that the AC power has failed and the batteries are supplying power to the drive.
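
When error codes are read from a log or the OCP display, a small lookup can save paging through the table. The following Python fragment is only an illustrative sketch and is not part of any ESE50 software. The names ERROR_TABLE, describe, and temperature_code are hypothetical, the dictionary holds only a handful of entries copied from the table above, and the sketch assumes that codes 60 through 6F map to array module slots 0 through 15 in ascending order.

```python
# Illustrative sketch only. All names here are hypothetical; the code
# values, types, and titles are copied from the error table above.

# A few single-code entries: code -> (type, title)
ERROR_TABLE = {
    0x41: ("PE", "SDI Response Timed Out"),
    0x4E: ("WE", "Write/Format While Protected"),
    0x5F: ("DF", "Configuration Fault"),
    0xB0: ("DE", "Battery Failure"),
    0xF0: ("DE", "Over Temperature, Warning @45 C"),
    0xFF: ("DE", "Standby Power System On"),
}


def describe(code):
    """Return 'TYPE: title' for a one-byte error code."""
    if 0x60 <= code <= 0x6F:
        # Assumption: 60 is slot 0, 61 is slot 1, ... 6F is slot 15.
        return "DF: Bad Array Module {}".format(code - 0x60)
    if code in ERROR_TABLE:
        kind, title = ERROR_TABLE[code]
        return "{}: {}".format(kind, title)
    return "code {:02X} not in this excerpt".format(code)


def temperature_code(temp_c):
    """Return the over-temperature code for a chassis temperature, or None.

    Thresholds follow the table: above 45 C warning (F0), above 50 C
    save then shut down (F3), above 55 C immediate shutdown (F4).
    """
    if temp_c > 55:
        return 0xF4
    if temp_c > 50:
        return 0xF3
    if temp_c > 45:
        return 0xF0
    return None
```

For example, describe(0x6A) reports a bad array module in slot 10 under the ordering assumption stated above, and temperature_code(52) returns 0xF3. Extending ERROR_TABLE with the remaining rows of the table is left to the reader.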
------------------------------------------------------------

B
------------------------------------------------------------
ESE50-CA Upgrade Procedure

This appendix describes the procedure for upgrading the ESE50 from 120 MB to 640 MB storage capacity.

Refer to Chapter 5 for instructions on how to:

· Remove the ESE50 from the cabinet
· Remove the covers of the ESE50

Next, unpack the upgrade kit. It should contain the following items:

· One power supply module
· Eight storage array modules
· One upgrade sticker

While referring to Figure B-1 and Section 5.5:

· Begin the upgrade by installing the new power supply module directly above the existing power supply module. Connect it in the same way as the existing power supply module.
· Install the eight new array modules in slots 02 through 07, 08, and 10.
· To set up and reformat the ESE50, power up the unit and verify, using DKUTIL or other commands, that it is reported as an ESE56 module type 48 and has the correct capacity.

------------------------------------------------------------
Note
------------------------------------------------------------
Array locations are identical for an ESE50 (Rev. C) upgrade.
------------------------------------------------------------

Figure B-1 120 to 600 MB Conversion

Once the previous steps have been performed, replace the covers and return the ESE50 to the cabinet.
------------------------------------------------------------
Index

A
------------------------------------------------------------
AC input box, 1-9
array module(s), 1-9
attention mechanism, 1-12

C
------------------------------------------------------------
connect, 1-13
controller module, 1-9

D
------------------------------------------------------------
description, ESE50 SSD, 1-1
disconnect, 1-13
display, 1-14
DSA/SDI compatibility, 1-1

E
------------------------------------------------------------
errors
  hard, 1-12

F
------------------------------------------------------------
fault codes, 1-15
FAULT switch/indicator, 1-11
field service terminal port, 1-3

H
------------------------------------------------------------
hard errors, 1-12

P
------------------------------------------------------------
port A switch/indicator, 1-11
port B switch/indicator, 1-11
port switches and LEDs, 1-13
power monitor module, 1-9
power supply module, 1-9

R
------------------------------------------------------------
READY indicator, 1-11
RUN/STOP switch/indicator, 1-11
RZ27 disk drive, 1-9
RZ35 disk drive, 1-9

S
------------------------------------------------------------
spin-down, 1-12
spin-up, 1-12
switches and indicators
  A port, 1-11
  B port, 1-11
  fault, 1-11
  ready, 1-11
  run/stop, 1-11
  write protect, 1-11
system diagnostics, 2-2

T
------------------------------------------------------------
troubleshooting tips, 2-1

W
------------------------------------------------------------
write enable, 1-13
write error, 1-13
write protect, 1-13
WRITE PROTECT switch/indicator, 1-11