Digital Equipment Corporation
Maynard, Massachusetts

VAX 7000 Advanced Troubleshooting
Order Number EK-7001A-TS.001

This manual is intended for Digital customer service engineers and self-maintenance customers. It covers system troubleshooting information.

First Printing, November 1992

The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this document.

The software, if any, described in this document is furnished under a license and may be used or copied only in accordance with the terms of such license. No responsibility is assumed for the use or reliability of software or equipment that is not supplied by Digital Equipment Corporation or its affiliated companies.

Copyright © 1992 by Digital Equipment Corporation. All Rights Reserved. Printed in U.S.A.

The following are trademarks of Digital Equipment Corporation: Alpha AXP, DECUS, VAXBI, AXP, DWMVA, VAXELN, DEC, OpenVMS, VMScluster, DECchip, ULTRIX, XMI, DEC LANcontroller, UNIBUS, the AXP logo, DECnet, and VAX.

OSF/1 is a registered trademark of the Open Software Foundation, Inc.

FCC NOTICE: The equipment described in this manual generates, uses, and may emit radio frequency energy. The equipment has been type tested and found to comply with the limits for a Class A computing device pursuant to Subpart J of Part 15 of FCC Rules, which are designed to provide reasonable protection against such radio frequency interference when operated in a commercial environment. Operation of this equipment in a residential area may cause interference, in which case the user at his own expense may be required to take measures to correct the interference.

Contents

Preface

Chapter 1 Troubleshooting During Power-Up
1.1 Power System Overview
1.2 Power-Up Troubleshooting Flowchart
1.3 AC Input Box
1.4 H7263 Power Regulators
1.5 Cabinet Control Logic Module
1.6 Control Panel
1.7 Blower
1.8 XMI Plug-In Unit
1.9 Troubleshooting the XMI Plug-In Unit

Chapter 2 System Self-Test
2.1 System Self-Test Overview
2.2 Power-Up Sequence
2.3 System Self-Test Results
2.4 Checking Self-Test Results: Console Display
  2.4.1 Processor Fails Self-Test in a Uniprocessor System
  2.4.2 Processor Fails ST1 in a Multiprocessor System
  2.4.3 Processor Fails ST2 or ST3 in a Multiprocessor System
  2.4.4 Memory Fails Self-Test
  2.4.5 System Fails Power-Up Exerciser
2.5 Checking Self-Test Results: Status LEDs
  2.5.1 Processor LEDs
  2.5.2 Determining Failing Test Number from LEDs
  2.5.3 IOP, DWLMA, and Clock Card LEDs

Chapter 3 Diagnostics
3.1 Test Command
3.2 Running ROM-Based Diagnostics on XMI Devices
3.3 Running Diagnostics on DUP-Based Devices
  3.3.1 Testing an SI Device
  3.3.2 Testing a DSSI Device

Appendix A Parse Trees
A.1 Reading Parse Trees

Appendix B Power Requirements and Guidelines
B.1 Power System Requirements
B.2 Getting Information on Power Regulator Status
  B.2.1 Brief Data Packet
  B.2.2 Full Data Packet
B.3 Show Power Command
B.4 Checking the IOP Module During Power-Up
B.5 Identifying an LSB Module Power Converter Failure

Examples
Example 2-1 Self-Test Display
Example 2-2 Console Display: Processor Fails in Uniprocessor System
Example 2-3 Console Display: Processor Fails ST1 in Multiprocessor System
Example 2-4 Console Display: Processor Fails ST2 or ST3 in a Multiprocessor System
Example 2-5 Console Display: Memory Fails Self-Test
Example 2-6 Console Display: Sample Unexpected Exception/Interrupt
Example 2-7 Console Display: Sample Diagnostic Error Report
Example 3-1 Test Commands
Example 3-2 Sample RBD Session, Test Passing
Example 3-3 Sample RBD Session, Test Failing
Example 3-4 Testing an SI Device
Example 3-5 Testing a DSSI Device
Example A-1 Sample Machine Check, MCHK Code 06
Example B-1 Sample Output, Show Power Command

Figures
Figure 1-1 Power System
Figure 1-2 Power-Up Sequence
Figure 1-3 AC Input Box
Figure 1-4 AC Input Box Troubleshooting Steps
Figure 1-5 H7263 Power Regulator LEDs
Figure 1-6 H7263 Power Regulator Troubleshooting Steps
Figure 1-7 CCL Module LEDs
Figure 1-8 CCL Module Troubleshooting Steps
Figure 1-9 Control Panel
Figure 1-10 Control Panel Troubleshooting Steps
Figure 1-11 Blower
Figure 1-12 Blower Troubleshooting Steps
Figure 1-13 XMI Plug-In Unit LEDs
Figure 1-14 XMI PIU Troubleshooting Steps - 48V LED Off
Figure 1-15 XMI PIU Power Connector
Figure 1-16 XMI PIU Troubleshooting Steps - MOD OK LED Off
Figure 2-1 KA7AA Power-Up Sequence, Part 1 of 3
Figure 2-2 KA7AA Power-Up Sequence, Part 2 of 3
Figure 2-3 KA7AA Power-Up Sequence, Part 3 of 3
Figure 2-4 Determining Self-Test Results
Figure 2-5 Processor and Memory Status LEDs
Figure 2-6 Processor LEDs After Self-Test
Figure 2-7 IOP, DWLMA, and Clock Card LEDs
Figure A-1 KA7AA Machine Check Parse Tree
Figure A-2 KA7AA Hard Error Interrupts
Figure A-3 KA7AA Soft Error Interrupts
Figure A-4 IOP Interrupts
Figure A-5 DWLMA Interrupts
Figure B-1 Command Packet Structure
Figure B-2 Brief Data Packet Structure
Figure B-3 Full Data Packet Structure
Figure B-4 Full Data Packet: Values for Characters 1-6
Figure B-5 Full Data Packet: Values for Characters 7-34
Figure B-6 Full Data Packet: Values for Characters 35-47
Figure B-7 Full Data Packet: Values for Characters 48-54
Figure B-8 IOP Module
Figure B-9 IOP Oscillator Switch Settings

Tables
Table 1 VAX 7000 Documentation
Table 2 Related Documents
Table 1-1 Power Regulator LED Summary
Table 1-2 Control Panel LEDs During Power-Up
Table 1-3 XMI PIU Power Regulator LEDs
Table 1-4 XMI PIU Power Switches - Regulator B
Table 2-1 System Testing
Table 2-2 Test Numbers Indicated by KA7AA LEDs
Table 2-3 DWLMA LEDs
Table 3-1 Exercisers
Table B-1 Power Worksheet, System Cabinet Options
Table B-2 Power Worksheet, Expander Cabinet Options
Table B-3 Sample Brief Packet Information
Table B-4 Sample Full/History Packet Information
Table B-5 LED Status When a Power Converter Fails

Preface

Intended Audience

This manual is written for Digital customer service engineers and self-maintenance customers.

Document Structure

This manual uses a structured documentation design. Topics are organized into small sections for efficient on-line and printed reference. Each topic begins with an abstract. You can quickly gain a comprehensive overview by reading only the abstracts. Next is an illustration or example, which also provides quick reference. Last in the structure are descriptive text and syntax definitions.
This manual has three chapters and two appendixes, as follows:

· Chapter 1, Troubleshooting During Power-Up, explains what can go wrong during power-up and how to identify the cause of the problem.
· Chapter 2, System Self-Test, tells how to interpret the self-test console display and module LEDs.
· Chapter 3, Diagnostics, describes the various diagnostics used to test the system.
· Appendix A contains the parse trees, and Appendix B gives power requirements and guidelines.

Conventions Used in This Document

Book titles. In text, if a book is cited without a product name, that book is part of the hardware documentation. It is listed in Table 1 along with its order number.

Icons. The icons shown below are used in illustrations for designating part placement in the system described. A shaded area in the icon shows the location of the component or part being discussed.

Documentation Titles

Table 1 lists the books in the VAX 7000 documentation set. Table 2 lists other documents that you may find useful.

Table 1 VAX 7000 Documentation

Title                                       Order Number
Installation Kit                            EK-7000A-DK
Site Preparation Guide                      EK-7000A-SP
Installation Guide                          EK-700EA-IN
Hardware User Information Kit               EK-7001A-DK
Operations Manual                           EK-7000A-OP
Basic Troubleshooting                       EK-7000A-TS
Service Information Kit                     EK-7002A-DK
Pocket Service Guide                        EK-7000A-PG
Advanced Troubleshooting                    EK-7001A-TS
Platform Service Manual                     EK-7000A-SV
System Service Manual                       EK-7002A-SV

Reference Manuals
Console Reference Manual                    EK-70C0A-TM
KA7AA CPU Technical Manual                  EK-KA7AA-TM
MS7AA Memory Technical Manual               EK-MS7AA-TM
I/O System Technical Manual                 EK-70I0A-TM
Platform Technical Manual                   EK-7000A-TM

Upgrade Manuals
KA7AA CPU Installation Guide                EK-KA7AA-IN
MS7AA Memory Installation Guide             EK-MS7AA-IN
DWLMA XMI PIU Installation Guide            EK-DWLMA-IN
H7237 Battery PIU Installation Guide        EK-H7237-IN
BA654 Disk PIU Installation Guide           EK-BA654-IN
DWMBB VAXBI PIU Installation Guide          EK-DWMBB-IN
Removable Media Installation Guide          EK-TFRRD-IN

Table 2 Related Documents

Title                                                        Order Number

General Site Preparation
Site Environmental Preparation Guide                         EK-CSEPG-MA

System I/O Options
CIXCD Interface User Guide                                   EK-CIXCD-UG
DEC FDDIcontroller 400 Installation/Problem Solving          EK-DEMFA-IP
DEC LANcontroller 400 Installation Guide                     EK-DEMNA-IN
DEC LANcontroller 400 Technical Manual                       EK-DEMNA-TM
DSSI VAXcluster Installation and Troubleshooting Manual      EK-410AA-MG
InfoServer 150 Installation and Owner's Guide                EK-INFSV-OM
KFMSA Module Installation and User Manual                    EK-KFMSA-IM
KFMSA Module Service Guide                                   EK-KFMSA-SV
RF Series Integrated Storage Element User Guide              EK-RF72D-UG
TF85 Cartridge Tape Subsystem Owner's Manual                 EK-OTF85-OM

Operating System Manuals
VMS Upgrade and Installation Supplement:
  VAX 7000-600 and VAX 10000-600 Series                      AA-PRAHA-TE
VMS Network Control Program Manual                           AA-LA50A-TE

VAXclusters and Networking
HSC Installation Manual                                      EK-HSCMN-IN
SC008 Star Coupler User's Guide                              EK-SC008-UG
VAX Volume Shadowing Manual                                  AA-PBTVA-TE

Peripherals
Installing and Using the VT420 Video Terminal                EK-VT420-UG
LA75 Companion Printer Installation and User Guide           EK-LA75X-UG

Chapter 1 Troubleshooting During Power-Up

This chapter gives troubleshooting information on the power system. Sections include:

· Power System Overview
· Power-Up Troubleshooting Flowchart
· AC Input Box
· H7263 Power Regulators
· Cabinet Control Logic Module
· Control Panel
· Blower
· XMI Plug-In Unit
· Troubleshooting the XMI Plug-In Unit

1.1 Power System Overview

The power system consists of the AC input box, the DC distribution box, one to three power regulators, and an optional battery plug-in unit. Figure 1-1 shows the power system.

Figure 1-1 Power System

AC Input Box

The AC input box provides the interface to the AC utility power via a three-phase, five-wire connector with attached power cord. The AC input box also contains the main input circuit breaker and fuses, and a power line monitoring port.

DC Distribution Box

The DC distribution box provides the interconnect for the AC input box and power regulators. It also functions as the:

· Distribution point for 48 VDC system power
· Battery pack interface to the power regulators
· Signal interconnect from the CCL module to the power regulators

Power Regulators

The system supports up to three power regulators operating in parallel with either one or two units required for the load. As an option, a third power regulator can be used as a backup unit. Each power regulator provides the following:

· 48 VDC output
· LPS OK L signal
· Power system status via serial data lines
· Non-switched 48 VDC power to the CCL module
· Status indicators for fault isolation
· Battery charging and monitoring circuitry
· Battery backup converter

Optional Battery Plug-In Unit (PIU)

The battery PIU provides uninterrupted power in the event of a power failure. The battery PIU can contain up to three battery packs.
Each battery pack contains four batteries. One battery pack is required for each power regulator.

For more information: Platform Service Manual

1.2 Power-Up Troubleshooting Flowchart

Figure 1-2 shows the power-up sequence.

Figure 1-2 Power-Up Sequence

1.3 AC Input Box

The AC input box with circuit breaker is located in the upper rear of the cabinet. The circuit breaker has four indicators (see Figure 1-3). All four indicators should be red when the circuit breaker is in the On position.

Figure 1-3 AC Input Box

The AC input box accepts three-phase power; the three leftmost indicators on the circuit breaker show the state of each pole (one phase per pole). If an indicator is green, the pole is in the Off position or tripped due to an overload. If an indicator is red, the pole is in the On position and is not tripped.

The fourth (rightmost) indicator reflects the mechanical position of the circuit breaker. This indicator is red when the circuit breaker is in the On position and green when the circuit breaker is in the Off position. Figure 1-4 shows the troubleshooting steps for the AC input box.

Figure 1-4 AC Input Box Troubleshooting Steps

For more information: Platform Service Manual

1.4 H7263 Power Regulators

The H7263 power regulators are located in the upper right front of the cabinet. Each power regulator has a Run LED and a Fault LED (see Figure 1-5).

Figure 1-5 H7263 Power Regulator LEDs

Table 1-1 Power Regulator LED Summary

Run (Green)   Fault (Yellow)   Condition
Off           Off              No AC power present
Off           On               Fatal fault
Fast flash    Off              AC power present. Keyswitch in Disable position.
On            Fast flash       Nonfatal fault
On            Slow flash       Battery discharge mode
On            Off              Normal operation

Figure 1-6 H7263 Power Regulator Troubleshooting Steps

NOTE: Replace the power regulator if the LEDs indicate a fatal fault.

Nonfatal faults include:

· Internal heatsink temperature warning
· Power factor correction stage failed
· Regulator/battery failed battery test (see Appendix B)
· 48V to CCL module exceeds specified limits

1.5 Cabinet Control Logic Module

The cabinet control logic (CCL) module is located in the upper front of the cabinet, behind the control panel. The CCL module controls power sequencing and is wired to the control panel, DC distribution box, LSB backplane, blower, PIUs, optional removable media, and expander cabinets. The module has a power LED and four PIU enable LEDs. You can see the CCL LEDs from the rear of the cabinet when the rear door is open.

Figure 1-7 CCL Module LEDs

During power sequencing, the CCL power LED goes on to indicate that power is present on the module. A PIU LED goes on to indicate that a PIU is present in the quadrant and that its power regulators are enabled. Figure 1-8 shows the troubleshooting steps for the CCL module.

Figure 1-8 CCL Module Troubleshooting Steps

1.6 Control Panel

The control panel has a keyswitch and three indicator LEDs. To power up the system, you turn the keyswitch to Enable.

Figure 1-9 Control Panel

The control panel LEDs are powered by the CCL module. Table 1-2 lists the state of each control panel LED during a normal power-up. Figure 1-10 shows troubleshooting steps for the control panel.

Table 1-2 Control Panel LEDs During Power-Up

Action                      Key On   Run   Fault
Set circuit breaker to On   Off      Off   Off
Set keyswitch to Enable     On       Off   Slow Blink
Self-test starts            On       Off   On
Modules pass self-test      On       Off   Off
Operating system boots      On       On    Off

Figure 1-10 Control Panel Troubleshooting Steps

NOTE: The Fault LED blinks fast for 8 seconds to indicate a failure at power-up. Then the Fault LED blinks slowly until the failure condition is cleared.

1.7 Blower

The blower is located in the center of the cabinet. The blower spins up when you turn the keyswitch to Enable.

Figure 1-11 Blower

Figure 1-12 shows the troubleshooting steps for the blower.

NOTE: If the blower spins up but the control panel Fault LED blinks for more than 30 seconds, check the BLOWER OK signal cable. If the signal cable is properly connected, then replace the CCL module.

Figure 1-12 Blower Troubleshooting Steps

1.8 XMI Plug-In Unit

The XMI plug-in unit has two power regulators with indicator LEDs and switches. You can see the power regulators through the PIU enclosure when the front cabinet door is open.

Figure 1-13 XMI Plug-In Unit LEDs

Table 1-3 XMI PIU Power Regulator LEDs

LED      Color    State   Meaning
MOD OK   Green    On      Regulator is working
                  Off     Regulator is not working or V-OUT/DISABLE switch is set to DISABLE (down).
48V      Green    On      48V is present
OC [1]   Yellow   On      Overcurrent condition
OT [1]   Yellow   On      Overtemperature condition
OV [1]   Yellow   On      Overvoltage condition

[1] The OC, OT, and OV LEDs are latching indicators. Each LED indicates that a fault condition was or is present. The condition may have been cleared, but the LED remains lit until it is reset.

Table 1-4 XMI PIU Power Switches - Regulator B

Switch          Function
RESET           Momentary switch resets all LEDs on both regulators.
                NOTE: If resetting does not clear the OC, OT, or OV LED, shut off the regulators and reapply power. This action should clear the LED.
VOUT DISABLE    Power output for both regulators is enabled when this switch is in the VOUT position (up). Power output is shut off when this switch is in the DISABLE position (down).

1.9 Troubleshooting the XMI Plug-In Unit

Figure 1-14 and Figure 1-15 show the steps to take if the power regulator 48V LED indicates a power problem. If the MOD OK LED indicates a problem, see Figure 1-16.

Figure 1-14 XMI PIU Troubleshooting Steps - 48V LED Off

Figure 1-15 XMI PIU Power Connector

Figure 1-16 XMI PIU Troubleshooting Steps - MOD OK LED Off

Chapter 2 System Self-Test

This chapter describes self-test. Sections include:

· System Self-Test Overview
· Power-Up Sequence
· System Self-Test Results
· Checking Self-Test Results: Console Display
· Processor Fails Self-Test in a Uniprocessor System
· Processor Fails ST1 in a Multiprocessor System
· Processor Fails ST2 or ST3 in a Multiprocessor System
· Memory Fails Self-Test
· System Fails Power-Up Exerciser
· Checking Self-Test Results: Status LEDs
· Overview of Processor LEDs
· Determining Failing Test Number from LEDs
· IOP, DWLMA, and Clock Card LEDs

2.1 System Self-Test Overview

When the system is powered up or reset, a series of tests is run. Table 2-1 lists the tests run during system testing.

Table 2-1 System Testing

Test Level   Test                   Number of Tests
1            SROM tests             11
2            Gbus ROM tests         45
3            CPU/memory tests       10
4            Multiprocessor tests   7
5            IOP tests              17
6            DWLMA tests            18
7            Power-up exerciser     Not applicable

Level 1 - SROM Tests

The first phase of CPU self-test consists of 11 SROM tests. This initial group of diagnostics is loaded from serial ROM into the CPU's primary cache on power-up. The diagnostics are then executed from the primary cache; access to the backup cache is verified, and then the backup cache is tested.

Level 2 - Gbus ROM Tests

The Gbus ROM tests, stored in FEROM, are executed during the second phase of the CPU self-test.
These tests continue CPU testing.

Level 3 - CPU/Memory Tests

These tests verify CPU logic that cannot be tested without memory. The CPU/memory tests also test memory logic that is not tested during the module self-test.

Level 4 - Multiprocessor Tests

Multiprocessor tests are executed by CPUs that have passed both self-test and CPU/memory testing. These tests verify CPU-specific logic that is not tested during previous test levels.

Level 5 - IOP Tests

The boot processor runs tests on the IOP module.

Level 6 - DWLMA Tests

The boot processor runs tests on all DWLMA I/O adapters.

Level 7 - Power-Up Exerciser

All CPUs run the power-up exerciser.

For more information: KA7AA CPU Technical Manual

2.2 Power-Up Sequence

Figure 2-1 shows the power-up sequence for the KA7AA processors. All processors execute three test phases and a boot processor is designated after each test phase. The boot processor tests the IOP module and DWLMA adapters and prints the self-test display.

Figure 2-1 KA7AA Power-Up Sequence, Part 1 of 3

(1) All CPUs and memories execute their on-board self-test at the beginning of the power-up sequence. On line ST1 of the self-test display, a plus sign (+) is shown for every module that passes self-test.

(2) The boot processor is determined. On the first BPD line, the letter B corresponds to the processor selected as boot processor. Because the processors have not yet completed their power-up tests, the designated processor may later be disqualified from being boot processor. For this reason, line BPD appears three times in the self-test display.

(3) The boot processor prints the results of self-test, lines NODE #, TYP, ST1, and BPD on the self-test display. The boot processor then signals all CPUs to start running the CPU/MEM tests.

(4) All CPUs execute the CPU/MEM tests using the memories. On line ST2 of the self-test display, a plus sign (+) is shown for every module that passes the CPU/MEM test. If all CPUs pass the CPU/MEM tests, then the original boot processor selection is still valid.

(5) The boot processor is again determined, for the second time. Results are printed on the BPD line.

Figure 2-2 KA7AA Power-Up Sequence, Part 2 of 3

(6) The boot processor prints line ST2 and the second BPD of the self-test display. If no processor is selected as the boot processor, an error message is displayed and the console hangs (see Section 2.4.1).

(7) All passing CPUs execute the multiprocessor tests. On line ST3 of the self-test display, a plus sign (+) is shown for every module that passes the multiprocessor tests. If all CPUs pass the multiprocessor tests, then the original boot processor selection is still valid.

(8) The boot processor is again determined, for the third time. Results are printed on the BPD line.

(9) The boot processor copies the console to memory and begins executing in multiprocessor mode. Next, the boot processor prints the results of the multiprocessor tests on the ST3 line and then executes the IOP tests.

Figure 2-3 KA7AA Power-Up Sequence, Part 3 of 3

(10) DWLMA adapter test results are indicated on the lines labeled C0 XMI to C3 XMI on the self-test display. A plus sign (+) at the extreme right means that the adapter passed; a minus sign (-) means that the adapter failed. IOP test results are indicated on line ST3.

(11) If the DWLMA adapter passes its self-test, then the boot processor reports the self-test results for each XMI adapter.

(12) Testing continues. All CPUs execute the power-up exercisers. Specific exercisers test the following:

· Cache/memory
· Floating point
· Network
· Disk (internal loopback only)

2.3 System Self-Test Results

The results of self-test can be determined in three ways.

Figure 2-4 Determining Self-Test Results

There are three ways to check the results of self-test:

· Control panel Fault LED.
This LED remains lit if a processor, a memory, an IOP module, or an XMI adapter fails self­test. · Module LEDs. The LEDs on the LSB modules display the results of self­test, as described in Section 2.5. · Console terminal. A summary report of self­test appears on the con­ sole terminal. This summary report is described in Section 2.4. 2­12 System Self­Test 2.4 Checking Self­Test Results: Console Display The console display gives the results of module self­tests and addi­ tional testing. Example 2­1 Self­Test Display F E D C B A 9 8 7 6 5 4 3 2 1 0 NODE # 1 A M . . . . . P P TYP o + . . . . . + + ST1 . . . . . . . E B BPD o + . . . . . + + ST2 . . . . . . . E B BPD + + . . . . . + + ST3 . . . . . . . E B BPD . . . . . . . . . . . . . . C0 XMI ­ + . . . . . + . . . . ­ . . C1 XMI + . . . . . . . . . . . . . . C2 . . . . . . . . . . . . . . C3 10 . A0 . . . . . . . ILV . 128 . . . . . . . 128Mb Firmware Rev = V1.0­1625 SROM Rev = V1.0­0 SYS SN = GAO1234567 11 P00>>> 1 The first line lists the node numbers on the LSB and XMI I/O buses. 2 This line indicates the type of module at each LSB node. Processors are type P, memories are type M, and the IOP module is type A. In this example processors are at nodes 0 and 1, a memory is at node 7, and the IOP module is at node 8. 3 This line shows the results of on­board self­test. Possible values for processors are pass (+) or fail (-). For memories, the pass (+) value indicates successful completion of self­test. (Self­test failure indica­ tions are shown in Example 2­2 and Example 2­3.) The "o" at node 8 (IOP module) indicates no on­board self­test. 2 3 4 5 6 7 8 9 System Self­Test 2­13 4 The BPD line indicates boot processor designation. When the system completes on­board self­test, the processor with the lowest LSB ID number that passes self­test and is eligible is selected as boot proces­ sor. This process occurs again after ST2 and ST3 when the boot proc­ essor designation is reported on the second and third BPD lines. 
5  During the second round of tests (ST2), all processors run CPU/MEM tests. On line ST2, results are reported for each processor and memory; a plus sign (+) indicates that ST2 testing passed and a minus sign (-) that ST2 testing failed. The boot processor is again reported on the BPD line.

6  During the third round of tests (ST3), all processors run multiprocessor tests. Results are reported on line ST3, and the boot processor designation is again reported on the third and final BPD line.

7  A minus sign (-) at the right of the C0 XMI line means that the DWLMA adapter on I/O channel 0 failed self-test. Self-test results for adapters on this I/O channel will not be reported.

8  A plus sign (+) at the right of the C1 XMI line indicates that the DWLMA adapter on I/O channel 1 passed self-test. However, the adapter at XMI node 3 failed its own self-test.

9, 10  I/O channels C2 (9) and C3 (10) are not used in this configuration.

11  The last line of the self-test display shows the console firmware and SROM version numbers and the system serial number.

For more information:
    Basic Troubleshooting

2.4.1 Processor Fails Self-Test in a Uniprocessor System

When the processor in a uniprocessor system fails self-test, the operator is prompted for the slot number of the processor. Where the error message appears in the console display indicates the round of tests the processor failed: ST1, ST2, or ST3. See Example 2-2.

Example 2-2 Console Display: Processor Fails in Uniprocessor System

    >>> init
    CPU00: Test Failure - Select primary CPU   (1)
    F E D C B A 9 8 7 6 5 4 3 2 1 0 NODE #
                  A M . . . . . . P TYP
                  o + . . . . . . - ST1
                  . . . . . . . . B BPD
    CPU00: Test Failure - Select primary CPU   (1)
                  o + . . . . . . - ST2
                  . . . . . . . . B BPD
    CPU00: Test Failure - Select primary CPU   (1)
                  + + . . . . . . - ST3
                  . . . . . . . . B BPD
      . . . . + . + . + . . . + .   C0 XMI +
      . . . . . . . . . . . . . .   C1
      . . . . . . . . . . . . . .   C2
      . . . . . . . . . . . . . .   C3
                 . A0 . . . . . . . ILV
                . 128 . . . . . . . 128Mb
    Firmware Rev = V1.0-1625   SROM Rev = V1.0-0   SYS SN = GAO1234567
    >>>

Example 2-2 shows a processor failure in a uniprocessor system. The error message, CPU00: Test Failure - Select primary CPU, prompts you to enter the node ID of the failing processor. Note that the CPU node ID appears in the error message (CPU00). Type 0 to obtain the full console display. If you do not type the node ID when prompted, the processor continues to hang.

NOTE: The user input in response to the error message is not echoed at the console terminal.

Possible Solutions
· Move the processor to another slot and retry self-test.
· Replace the failing processor with a new processor (see the System Service Manual).

2.4.2 Processor Fails ST1 in a Multiprocessor System

When a processor in a multiprocessor system fails self-test at ST1, no failure information is reported to the console display. Only passing processors show in the console display.

Example 2-3 Console Display: Processor Fails ST1 in Multiprocessor System

                                 (1)
    F E D C B A 9 8 7 6 5 4 3 2 1 0 NODE #
                  A M M . . . . P . TYP
                  o + + . . . . + . ST1
                  . . . . . . . B . BPD
                  o + + . . . . + . ST2
                  . . . . . . . B . BPD
                  + + + . . . . + . ST3
                  . . . . . . . B . BPD
      . . . . + . + . + . . . + .   C0 XMI +
      . . . . . . . . . . . . . .   C1
      . . . . . . . . . . . . . .   C2
      . . . . . . . . . . . . . .   C3
                . A1 A0 . . . . . . ILV
              . 128 128 . . . . . . 256Mb
    Firmware Rev = V1.0-1625   SROM Rev = V1.0-0   SYS SN = GAO1234567
    >>>

When a processor fails ST1 testing in a multiprocessor system, no information is reported, and the failing processor is logically disconnected from the backplane to prevent faulty system operation. Dots are displayed, as though no processor were physically present. In this example the processor in slot 0 fails ST1 (see 1). If the processor in slot 1 failed ST1, then the column for slot 1 would report no information.
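The column-to-node mapping used in these displays can be sketched in code. The following Python fragment is an illustration only, not console firmware behavior. It assumes that each LSB line (TYP, ST1, BPD, and so on) consists of whitespace-separated entries followed by the line label, and that the rightmost entry belongs to node 0, as in Example 2-3. The function names and the set of physically installed processor slots are hypothetical.

```python
def parse_lsb_line(line):
    """Split one LSB display line into (label, {node_number: symbol})."""
    tokens = line.split()
    label, entries = tokens[-1], tokens[:-1]
    # The rightmost entry is node 0; node numbers increase to the left.
    nodes = {node: symbol for node, symbol in enumerate(reversed(entries))}
    return label, nodes


def unreported_processors(typ_line, installed_cpu_slots):
    """Return installed processor slots that the TYP line shows as empty.

    installed_cpu_slots comes from a visual check of the LSB card cage;
    it is an input supplied by the person troubleshooting, not by the console.
    """
    _, nodes = parse_lsb_line(typ_line)
    return sorted(slot for slot in installed_cpu_slots
                  if nodes.get(slot, ".") != "P")


# TYP line from Example 2-3, with processors physically present in slots 0 and 1.
print(unreported_processors("A M M . . . . P . TYP", {0, 1}))
```

A slot returned by this check holds a processor that is not reporting to the self-test display, which is the condition the LEDs on that module can then confirm. The sketch does not handle the XMI or ILV lines, whose layout differs.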
To confirm a processor failure at ST1, open the LSB card cage and check the module positions against the self-test display. If you find a processor occupying a slot that is not reporting to the self-test display, check the processor LEDs for test failure information.

Possible Solutions
· Check module seating in the LSB card cage. Remove the failing module and re-insert it; check that the module case is in the tracks and latched securely.
· Place a passing processor in the failing slot; if the passing processor fails, you may have a bad LSB slot. Next, take the failing module and try it in a slot where a module has passed self-test. If the failing processor now passes self-test, avoid using the slot in which both processors failed testing.
· Replace the failing processor with a new processor (see the System Service Manual).

2.4.3 Processor Fails ST2 or ST3 in a Multiprocessor System

Example 2-4 shows a multiprocessor system with ST2 and ST3 failures. Since ST2 is a CPU/memory test, the example shows a memory failure to illustrate the CPU/memory interaction.

Example 2-4 Console Display: Processor Fails ST2 or ST3 in a Multiprocessor System

    F E D C B A 9 8 7 6 5 4 3 2 1 0 NODE #
                  A M M . . . P P P TYP
                  o + + . . . + + + ST1        (1)
                  . . . . . . E E B BPD
                  o + + . . . + + - ST2        (2)
                  . . . . . . E B E BPD
                  + + + . . . + - - ST3        (3)
                  . . . . . . B E E BPD
      . . . . + . + . + . . . + .   C0 XMI +
      . . . . . . . . . . . . . .   C1
      . . . . . . . . . . . . . .   C2
      . . . . . . . . . . . . . .   C3
                . A1 A0 . . . . . . ILV
              . 128 128 . . . . . . 256Mb
    Firmware Rev = V1.0-1625   SROM Rev = V1.0-0   SYS SN = GAO1234567
    P02>>>

Processors can fail ST1, ST2, or ST3 testing. When a processor fails ST1 or ST2, subsequent ST lines will also indicate failure.

Possible Solutions
· Reseat processors in slots 0 and 1 and repeat testing.
· Place a passing CPU in the failing slot; if the passing CPU fails, you may have a bad LSB slot. Next, take the failing CPU and try it in a slot where a module has passed self-test. If the failing CPU now passes self-test, avoid using the slot in which both processors failed testing.
· Replace the failing processor with a new processor (see the System Service Manual).

1  The ST1 line shows that each of the three CPUs passed the first round of testing. The two memories also completed ST1 successfully.

2  ST2 is the CPU/memory test. The ST2 line shows the CPU in slot 0 failing. Consequently, the failing CPU is no longer designated as the boot processor. The CPUs in slots 1 and 2 conduct the CPU/memory tests. The memories in slots 6 and 7 pass ST2 testing.

3  Only the CPUs in slots 1 and 2 undergo ST3 testing; the processor in slot 0 is not tested because of its previous failure during ST2 testing. In the example, the CPU in slot 1 fails the third round of testing.

2.4.4 Memory Fails Self-Test

A minus sign (-) at ST1 indicates that the on-board self-test was unable to complete. A minus sign at ST2 or ST3 following a plus sign (+) at ST1 indicates errors in the CPU/memory tests.

Example 2-5 Console Display: Memory Fails Self-Test

    F E D C B A 9 8 7 6 5 4 3 2 1 0 NODE #
                  A M M . . . . P P TYP
                  o - + . . . . + + ST1        (1)
                  . . . . . . . E B BPD
                  o - + . . . . + + ST2        (2)
                  . . . . . . . E B BPD
                  + - + . . . . + + ST3        (3)
                  . . . . . . . E B BPD
      . . . . + . + . + . . . + .   C0 XMI +
      . . . . . . . . . . . . . .   C1
      . . . . . . . . . . . . . .   C2
      . . . . . . . . . . . . . .   C3
                 . . A0 . . . . . . ILV        (4)
                . . 128 . . . . . . 128Mb
    Firmware Rev = V1.0-1625   SROM Rev = V1.0-0   SYS SN = GAO1234567
    P00>>>

At power-up or reset, each memory module executes a self-test designed to test and initialize its RAMs. The self-test performs a quick scan of the DRAM array and records sections of the array that contain defective locations. The console later maps these sections out, so that they are no longer included in the console bitmap. The operating system uses this bitmap to determine which memory it can use.

The memory self-test does not provide a pass/fail status. The module LED indicates only that self-test completed. The length of testing depends on the size of the memory array.

In Example 2-5:

1  The failure reported at ST1 indicates that the memory module at node 7 is unable to complete its on-board self-test. Consequently, the self-test LED on the memory module remains unlit.

2  The CPU/memory tests are run on the passing memory at node 6. The failed memory at node 7 is not used during this testing. The ST2 line indicates that both processors and one memory module passed the CPU/memory test.

3  The failing memory is not used during ST3 testing. The minus sign appears only to identify the memory as a failing FRU.

4  The memory at node 6 is configured in the system. The memory at node 7 is not configured because of its failure during ST1.

2.4.5 System Fails Power-Up Exerciser

When the system fails the power-up exerciser, an error message is displayed at the console terminal. The error message is either an unexpected exception/interrupt (Example 2-6) or a diagnostic error report (Example 2-7), depending on the type of error found. See Appendix A for parse trees.
Example 2-6 Console Display: Sample Unexpected Exception/Interrupt

    CPU2: unexpected exception/interrupt, vector 60 (18)                        (1)
    process entry_02, pcb = 000dad60, pc: 0007feed psl: 00000004
    Interrupt/Exception: hard error notification
    LMERR: 00000180 LMODE: 000102a0 LBER: 0004121f LLOCK: 00004430              (2)
    LDEV: 00008002 LCNR: 00000001
    LBESR0:0000000c LBESR1:0000000c LBESR2:0000000c LBESR3:0000000c
    LBECR0:1f555555 LBECR1:00001000
    BIU_CTL: afe09ff8 DIAG_CTL: 00000001 BC_TAG: 00003800                       (3)
    BIU_STAT: f01e10a1 BIU_ADDR:eaaaaaae FILL_SYN: 00000000 FILL_ADDR: 000002a8
    gprs:
    0: 0000001F  3: 0008E478  6: 00000002   9: 00088610  12: 000DBE8C
    1: 0008E478  4: 00000000  7: 00000004  10: 00000000  13: 000DBE74
    2: 0008E478  5: 00000000  8: 00083B10  11: 00000000  14: 0007FEED
    ksp: 000DBE74 esp: 00000000 ssp: 00000000 usp: 00000000

1  A hard error (vector 60) was detected.

2  These are the relevant error registers in the CPU bus interface gate array.

3  These are the internal processor error registers (IPRs).

Example 2-7 Console Display: Sample Diagnostic Error Report

    *** Hard Error - Error #23 on FRU: MS7AA1                      (1)
    Memory compare error

    ID       Program  Device          Pass     Hard/Soft Test Time
    -------- -------- --------------- -------- --------- ---- --------
    8e       mem_ex   mem             6        1    0    1    03:07:01
    (2)      (3)      (4)             (5)      (6)  (7)  (8)  (9)

    Expected value: ffffff71                                       (10)
    Received value: fffffe71
    Failing addr:   010003d0

    ***End of Error***

1  A hard error, error #23, is reported on FRU MS7AA1, a memory module. The three types of errors reported are hard, soft, and fatal. The error number, in this case error #23, corresponds to the location of the actual error report call within the source code for the failing diagnostic.

2  The process identification number (ID) is 8e. This is the process ID of the failing diagnostic.

3  The program running when the error occurred is mem_ex, the memory exerciser.

4  The device being tested at the time of the error is mem. The device name in this field may or may not match the device mnemonic displayed in the FRU field (1).
5  The current pass count, 6, is the number of passes executed when the error was detected.

6  The current hard error count is 1. The hard and soft error counts are the number of errors detected and reported by the failing diagnostic since the testing started.

7  The current soft error count is 0.

8  In this example, the failing test number is 1.

9  The time stamp shows when the error occurred.

10  The expected and received values at failing address 010003d0 are reported.

2.5 Checking Self-Test Results: Status LEDs

You can check self-test results by looking at the status LEDs on the modules. The processor diagnostic LEDs are described in Section 2.5.1 and Section 2.5.2. The LEDs on the IOP module, DWLMA adapter, and clock card are described in Section 2.5.3.

Figure 2-5 Processor and Memory Status LEDs

Processor Status LEDs

The large green LED at the bottom of the processor lights when the module passes self-test. You can see this LED through the peephole on the module enclosure. To view the diagnostic LEDs on a failing processor:

1. Open the front door of the cabinet.
2. Release the plate covering the modules by loosening the two top screws.
3. Remove the opaque plastic window covering the diagnostic LEDs on the processor by pulling it out with your fingers.

Section 2.5.1 describes the diagnostic LEDs on the processor module.

Memory Status LEDs

A memory module has two green LEDs: a self-test completed LED and a power LED. The self-test completed LED lights when the module completes self-test. This LED is visible through the peephole on the module enclosure. The power LED lights to indicate that power is present on the module.

2.5.1 Processor LEDs

The processor LEDs display the results of self-test. You must remove the plate covering the card cage and the plastic window on the processor module to view the diagnostic LEDs.
Figure 2-6 Processor LEDs After Self-Test

When self-test passes, the processor's LEDs are set as shown in Figure 2-6. The two LEDs closest to the self-test LED are on if the KA7AA is the boot processor; the LED closest to the self-test LED is on if the KA7AA is a secondary processor.

If self-test fails for the processor or the memory module fails, the top seven processor LEDs contain an error code that corresponds to the number of the failing test. The test number is represented in binary-coded decimal, with the most significant bit at the top. A bit is ONE if the light is ON.

For example, assume a processor fails its self-test (large green LED is OFF) and shows the following pattern in the top seven LEDs:

    TOP
    (MSB)   off   0
            on    1    = 3
            on    1

            off   0
            off   0    = 2
            on    1
    (LSB)   off   0
    BOTTOM

The failing test number decodes to 011 0010 (binary-coded decimal 32). Section 2.5.2 gives more detail on the failing tests indicated by the processor LEDs.

2.5.2 Determining Failing Test Number from LEDs

When self-test fails, the top seven green LEDs on the processor indicate the test number. A failing test number is in binary-coded decimal.

Table 2-2 Test Numbers Indicated by KA7AA LEDs

    Test Number   Type of Test           Failing Device    Self-Test Line
    1-11          SROM tests             KA7AA             ST1
    12-59         GROM tests             KA7AA             ST1
    60-69         CPU/memory tests       KA7AA or MS7AA    ST2
    70-76         Multiprocessor tests   KA7AA             ST3

You can see the results of self-test from the LEDs on the processor.

KA7AA Self-Test LED Off

If the processor's large green LED is off and the top seven small LEDs show an error code in the range of 1 to 59, then the processor's self-test failed and the processor board is bad.

After the on-board self-test, each processor that passes self-test runs the CPU/memory tests. The LEDs display error codes for failing CPU/memory tests with numbers ranging from 60 to 69. The self-test LED on the failing processor or the failing memory module is off.
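The LED decoding described in Section 2.5.1 and the ranges in Table 2-2 can be sketched as a short Python fragment. This is an illustration, not Digital-supplied software. It assumes, following the worked example, that the top three LEDs form the tens digit and the bottom four LEDs form the ones digit; the function and variable names are hypothetical.

```python
# (low, high, type of test, failing device, self-test line), from Table 2-2.
TEST_RANGES = [
    (1, 11, "SROM tests", "KA7AA", "ST1"),
    (12, 59, "GROM tests", "KA7AA", "ST1"),
    (60, 69, "CPU/memory tests", "KA7AA or MS7AA", "ST2"),
    (70, 76, "Multiprocessor tests", "KA7AA", "ST3"),
]


def decode_leds(leds):
    """Decode seven LED states (top first, True = on) into a test number."""
    if len(leds) != 7:
        raise ValueError("expected the top seven LEDs")
    bits = [1 if on else 0 for on in leds]
    tens = bits[0] * 4 + bits[1] * 2 + bits[2]                 # top three LEDs
    ones = bits[3] * 8 + bits[4] * 4 + bits[5] * 2 + bits[6]   # bottom four LEDs
    if ones > 9:
        raise ValueError("ones digit is not a valid BCD digit")
    return tens * 10 + ones


def classify(test_number):
    """Return the Table 2-2 row that contains test_number, or None."""
    for low, high, test_type, device, line in TEST_RANGES:
        if low <= test_number <= high:
            return test_type, device, line
    return None


# Pattern from the example: off, on, on, off, off, on, off (top to bottom).
number = decode_leds([False, True, True, False, False, True, False])
print(number, classify(number))
```

For the pattern in the example, the sketch yields test 32, which falls in the GROM test range reported on the ST1 line.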
Next, processors that pass both the on-board self-test and CPU/memory testing run multiprocessor tests. For failing multiprocessor tests, the LEDs display numbers ranging from 70 to 76. The self-test LED on the processor is off.

KA7AA Self-Test LED On, IOP LED Off

The IOP module has failed testing if its LED is off. The self-test LED on the KA7AA will be on.

2.5.3 IOP, DWLMA, and Clock Card LEDs

Figure 2-7 shows the LEDs on the IOP module, the DWLMA adapter, and the clock card.

Figure 2-7 IOP, DWLMA, and Clock Card LEDs

IOP Module LED

To view the IOP self-test LED, open the rear door of the cabinet and release the plate covering the card cage by loosening the two top screws. The green LED is on to indicate that the IOP passed self-test.

DWLMA Adapter LEDs

Table 2-3 lists the DWLMA LEDs and their self-test passed status.

NOTE: If the DWLMA adapter fails self-test, check the clock card at node 7 in the XMI card cage. If the clock card fails testing (power LED is off), the DWLMA adapter will also fail.

Table 2-3 DWLMA LEDs

    LED                        Color    Self-Test Passes
    STP (Self-test passed)     Yellow   On
    DBGDIS (Debug disabled)    Green    On
    POK (Power OK)             Green    On
    FTLERR (Fatal error)       Red      Off
    ES (Error Summary)         Red      Off

Clock Card

The clock card, XMI node 7, has a yellow LED that lights to indicate that power is enabled in the XMI card cage. The POWER ENABLE H signal is looped through the clock card so that the XMI power system cannot be enabled unless the clock card is properly installed.

Chapter 3 Diagnostics

This chapter discusses how to test processors, memory, and I/O. Sections include:
· Test Command
· Running ROM-Based Diagnostics on XMI Devices
· Running Diagnostics on DUP-Based Devices
· Testing an SI Device
· Testing a DSSI Device

3.1 Test Command

The test command allows you to test the entire system, an I/O subsystem, a single module, a group of devices, or a single device.
Example 3-1 Test Commands

    >>> test                              # Tests the entire system. Default run
                                          # time is 10 minutes.
    >>> t xmi0 -t 60                      # Tests all devices associated with the
                                          # XMI0 I/O subsystem. Test run time is
                                          # 60 seconds.
    >>> t xmi1 -omit "demna*"             # Tests all devices associated with XMI1
                                          # except for Ethernet devices.
    >>> t -nowrite "dub*" -write -t 120   # Do write/read/compare testing on all
                                          # disks not associated with controller b.
                                          # Test run time is 120 seconds.
    >>> t demna*                          # Tests all DEMNA adapters.
    >>> t du*.0.4.0                       # Tests all MSCP disks associated with
                                          # the adapter in slot 4 of XMI0.
    >>> t -q                              # Status messages will not be displayed
                                          # during system test.

You enter the command test to test the entire system using exercisers. No module self-tests are executed when the test command is issued without a mnemonic.

When you specify a subsystem mnemonic or a device mnemonic with test, such as test xmi0 or test ka7aa1, self-tests are executed on the associated modules first and then the appropriate exercisers are run. Table 3-1 lists the exercisers associated with each module. The same set of tests that run at power-up will run if you enter a test iop0 or a test dwlman command.

NOTE: Testing tape devices is not supported by the test command. Run DUP-based tests to test an MSCP-based tape device. See Section 3.3.

Table 3-1 Exercisers

    Module        Module Self-Test Run?   Exerciser
    KA7AA         Yes                     Floating Point, Multiprocessor, Memory
    MS7AA         No                      Memory
    CIXCD         Yes                     Disk
    DEMFA         Yes                     Network
    DEMNA         Yes                     Network
    KDM70         Yes                     Disk
    KFMSA         Yes                     Disk
    Disk Device   No                      Disk

3.2 Running ROM-Based Diagnostics on XMI Devices

Some XMI devices can be tested from the console terminal with their on-board ROM-based diagnostics (RBDs). The set host command is used to connect to the XMI device. Example 3-2 shows a passing RBD test display, and Example 3-3 shows a test failure display.
Example 3-2 Sample RBD Session, Test Passing

    >>> sh config                                       (1)
           Name     Type      Rev    Mnemonic
    LSB
    0+     KA7AA    (8002)    0000   ka7aa0
    7+     MS7AA    (4000)    0000   ms7aa0
    8+     IOP      (2000)    0001   iop0
    C0 XMI                           xmi0
    8+     DWLMA    (102A)    0104   dwlma0
    C+     KDM70    (0C22)    1E11   kdm700
    E+     DEMNA    (0C03)    0802   demna0             (2)
    >>> set h demna0                                    (3)
    Connecting to remote node, ^Y to disconnect.
    t/r                                                 (4)
    RBDE> ST0/TR                                        (5)
    ;Selftest 3.00
    ; T0001 T0002 T0003 T0004 T0005 T0006 T0007 T0008 T0009 T0010
    ; T0011 T0012 T0013 T0014 T0015 T0016 T0017 T0018
    ; P (6) E 0C03 1
    ;00000000 00000000 00000000 00000000 00000000 00000000 00000000
    RBDE> ^Y                                            (7)
    >>>                                                 (8)

1  The show configuration command shows that this system includes a DEMNA at XMI0 node E.

2  The assigned mnemonic for the DEMNA is demna0.

3  The set host demna0 command is typed at the console prompt. A connection is established to the DEMNA adapter. A message confirms that the connection has been made.

4  After the console message no prompt is displayed. Typing t/r invokes the RBD monitor on the adapter being tested and returns the RBD monitor prompt. Note that the E in the RBD prompt refers to the XMI node.

5  The RBD is started with trace set.

6  This field indicates whether the RBD passed or failed; P for passed, F for failed.

7  Enter Ctrl/Y to exit from the RBD monitor.

8  The console prompt returns.

For more information:
    VAX 6000 Model 600 Service Manual

Example 3-3 Sample RBD Session, Test Failing

    >>> set h demna0                                    (1)
    Connecting to remote node, ^Y to disconnect.
    t/r
    RBDE> ST0/TR                                        (2)
    ;Selftest 3.00
    ; T0001 T0002 T0003 T0004 T0005 T0006 T0007 T0008 T0009 T0010
    ; T0011 T0012 T0013 T0014 T0015 T0016 T0017 T0018
    ; F (3) E 0C03 1
    ; HE (4) XNAGA XX T0018 (5)
    ; 03 00000000 0000A000 00000000 20150004 20051D97 08
         (6)      (7)
    ; F (8) E 0C03 1
    ; HE XNAGA XX T0018
    ; 05 00020000 80020000 00000000 20150204 200524A4 01
    ; F E 0C03 1                                        (9)
    ;00000000 00000002 00000000 00000000 00000000 00000000 00000000   (10)
    RBDE> ^Y
    >>>

1  The set host demna0 command is typed to establish the connection to the DEMNA adapter. A message confirms that the connection has been made.

2  The RBD is started with trace set.

3  F indicates the first failure during T0018, or test 18.

4  The class of error is displayed here. HE indicates that the error was a hard error. SE means that the error was a soft error, and FE indicates a fatal error.

5  This field lists the number of the test that failed; test 18 failed here.

6  The expected data is shown here. 00000000 is the data test 18 expected.

7  The received data is shown here. 0000A000 is the data test 18 received.

8  F indicates the second failure during test 18.

9  This is the summary line, and a repeat of the failure summary. It lists the pass/fail code (P or F), the node number and device type number of the device executing the RBD, and the number of passes of the RBD.

10  This is the number of hard errors detected.

For more information:
    VAX 6000 Model 600 Service Manual

3.3 Running Diagnostics on DUP-Based Devices

To run diagnostics on a DUP-based device, enter the set host command to invoke the DUP server on the selected node. You can test devices associated with the KDM70 (SI) adapter or the KFMSA (DSSI) adapter.

3.3.1 Testing an SI Device

Example 3-4 is a sample test session of an SI device. The device tested is a disk associated with the KDM70 adapter.

Example 3-4 Testing an SI Device

    >>> show device                                     (1)
    polling for units on kdm700, slot 11, xmi0...
    duc1.0.0.11.2      DUC1      RA70
    duc2.0.0.11.2      DUC2      RA70
    duc3.0.0.11.2      DUC3      RA70
    duc213.0.0.11.2    DUC213    RA82
    >>> set host -dup duc1.0.0.11.2                     (2)
    dup: starting DIRECT on kdm70_c.0.0.11.2 ()
    DIRECT 1 D Directory Utility
    ILEXER 1 D InLine Exerciser
    Task? ilexer                                        (3)
    dup: starting ILEXER on kdm70_c.0.0.11.2 ()

1  Type show device to obtain a list of disks and device mnemonics.

2  Enter set host -dup to connect to the disk you want to test. In the example, the disk with the mnemonic duc1.0.0.11.2 is selected. The DUP program prompts you to select Directory Utility or InLine Exerciser.

3  Type ilexer to start the inline exerciser.

For more information:
    KDM70 Controller User Guide
    KDM70 Controller Service Manual

Example 3-4 Testing an SI Device (Continued)

    ***
    *** ILEXER (InLine Exerciser) V 001
    *** 17-NOV-1992 03:10:28
    ***
    ***
    Enable Bad Block Replacement (Y/N) [N] ?            (4)
    Available Disk Drives: D0001 D0002 D0003 D0213
    Available Tape Drives: NONE
    Select next drive to test (Tnnnn/Dnnnn) [] ? d0003  (5)
    Write enable drive (Y/N) [N] ?
    *** Available tests are:
    1. Random I/O
    2. Seek Intensive I/O
    3. Data Intensive I/O
    4. Oscillatory Seek
    Select test number (1:4) [1] ?
    Select start block number (0:547040) [0] ?
    Select end block number (0:547040) [547040] ?
    Select data pattern number 0=ALL (0:15) [0] ?
    Select another drive (Y/N) [] ? n
    Select execution time limit, 0=Infinite, minutes (0:65535) [0] ? 1
    Select report interval, minutes (0:65535) [1] ?
    Select hard error limit (0:32) [0] ?
    Report soft errors (Y/N) [N] ?
    Execution Performance Summary at 17-NOV-1992 03:12:36           (6)
    D0003   193531233   1998   4508   0     0     0     0
    (7)     (8)         (9)    (10)   (11)  (12)  (13)  (14)
    Execution Performance Summary at 17-NOV-1858 00:02:37
    D0003 * 193531233   2003   4513   0     0     0     0
    ***
    *** ILEXER is exiting.
    ***

4  You are prompted to answer a series of questions before testing can begin.

5  Indicate the disk drive to be tested.
6  The execution performance summary line includes the following entries:

    7   Unit number
    8   Unit serial number
    9   Number of requests issued
    10  Kbytes read
    11  Kbytes written
    12  Hard error count
    13  Soft error count
    14  ECC error count

3.3.2 Testing a DSSI Device

Example 3-5 is a sample test session of a DSSI device. The device tested is a disk associated with the KFMSA adapter.

Example 3-5 Testing a DSSI Device

    >>> set host -dup duc1.1.0.13.3                                 (1)
    dup: starting DIRECT on kfmsa_c.1.0.13.3 (R2UJBC)               (2)
    Copyright (C) 1990 Digital Equipment Corporation
    PRFMON V1.0 D 20-FEB-1991 09:49:00                              (3)
    DKCOPY V1.0 D 20-FEB-1991 09:49:00
    DRVEXR V2.0 D 20-FEB-1991 09:49:00
    DRVTST V2.0 D 20-FEB-1991 09:49:00
    HISTRY V1.1 D 20-FEB-1991 09:49:00
    DIRECT V1.0 D 20-FEB-1991 09:49:00
    ERASE  V2.0 D 20-FEB-1991 09:49:00
    VERIFY V1.0 D 20-FEB-1991 09:49:00
    DKUTIL V1.0 D 20-FEB-1991 09:49:00
    PARAMS V2.0 D 20-FEB-1991 09:49:00
    Total of 10 programs.
    Task? drvtst                                                    (4)
    dup: starting DRVTST on kfmsa_c.1.0.13.3 (R2UJBC)
    Copyright (C) 1990 Digital Equipment Corporation
    Write/read anywhere on medium? [1=Yes/(0=No)] 0                 (5)
    5 minutes to complete.                                          (6)
    R2UJBC::MSCP$DUP 5-MAR-1991 11:13:11 DRVTST CPU= 0 00:00:13.72 PI=248
    R2UJBC::MSCP$DUP 5-MAR-1991 11:13:41 DRVTST CPU= 0 00:00:28.00 PI=506
    R2UJBC::MSCP$DUP 5-MAR-1991 11:14:12 DRVTST CPU= 0 00:00:42.48 PI=765
    R2UJBC::MSCP$DUP 5-MAR-1991 11:14:42 DRVTST CPU= 0 00:00:57.03 PI=1024
    R2UJBC::MSCP$DUP 5-MAR-1991 11:15:12 DRVTST CPU= 0 00:01:11.30 PI=1282
    R2UJBC::MSCP$DUP 5-MAR-1991 11:15:43 DRVTST CPU= 0 00:01:25.62 PI=1541
    R2UJBC::MSCP$DUP 5-MAR-1991 11:16:13 DRVTST CPU= 0 00:01:40.13 PI=1800
    R2UJBC::MSCP$DUP 5-MAR-1991 11:16:43 DRVTST CPU= 0 00:01:54.63 PI=2059
    R2UJBC::MSCP$DUP 5-MAR-1991 11:17:14 DRVTST CPU= 0 00:02:08.94 PI=2318
    Test passed.
    Task?                                                           (7)
    >>>

1  Enter set host -dup to connect to the disk you want to test. In the example, the disk with the mnemonic duc1.1.0.13.3 is selected.

2  A message confirms that the connection has been made.
3  The DUP test programs are listed.

4  In response to the user input, the test program drvtst is started.

5  The user types 0 in response to this question.

6  Testing begins.

7  Press RETURN to exit from the DUP program. The console prompt returns.

For more information:
    KFMSA Module Service Guide

Appendix A Parse Trees

This appendix shows parse trees. An example showing how to read the parse trees is provided. This appendix includes:
· Reading Parse Trees
· KA7AA Machine Checks (Figure A-1)
· KA7AA Hard Error Interrupts (Figure A-2)
· KA7AA Soft Error Interrupts (Figure A-3)
· IOP Interrupts (Figure A-4)
· DWLMA Interrupts (Figure A-5)

A.1 Reading Parse Trees

Example A-1 Sample Machine Check, MCHK Code 06

A parse tree represents the way the system "sorts" an error condition. The four types of error conditions are machine check, hard error (INT60), soft error (INT54), and IPL 17 errors for the IOP module and the DWLMA adapter.

In Example A-1, a machine check error occurred. In the error report, the error was identified as a MCHK_SYNC_ERROR (1) with a code number of 06 (2). There are many conditions that can cause a MCHK_SYNC_ERROR. To determine what caused the error, follow this branch of the parse tree and evaluate each condition.

The first condition under MCHK_SYNC_ERROR is ICSR.LOCK (3). If the ICSR.LOCK bit was set, you would then branch off and evaluate each condition under ICSR.LOCK to determine the type of error. In this case, there are three types of errors: VIC data parity error, VIC tag parity error, and inconsistent error.

NOTE: Inconsistent errors are usually fatal errors since the machine state is not understood.

If the ICSR.LOCK bit was not set, you would advance to the next error condition, not PCSRS.PTE_ER (4). If this condition was met, you would branch off here and evaluate the conditions listed on this branch of the parse tree.
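The reading procedure above amounts to an ordered walk: test each condition in turn, descend into the first one that is met, and repeat. The Python fragment below is a minimal sketch of that idea, not a transcription of Figure A-1. The bit names ICSR.DPERR and ICSR.TPERR, the order of the conditions under ICSR.LOCK, and the treatment of "inconsistent error" as the fallback are assumptions made for illustration, and the contents of the second branch are left as a placeholder.

```python
# Each node is (condition, outcome). An outcome is either a string (a leaf)
# or a list of child nodes. A condition of None always matches.
MCHK_SYNC_ERROR = [
    ("ICSR.LOCK", [
        ("ICSR.DPERR", "VIC data parity error"),   # hypothetical bit name
        ("ICSR.TPERR", "VIC tag parity error"),    # hypothetical bit name
        (None, "inconsistent error"),              # assumed fallback
    ]),
    ("not PCSRS.PTE_ER", "conditions on this branch of Figure A-1"),
]


def condition_met(condition, set_bits):
    """Evaluate one condition against the set of register bits that are set."""
    if condition is None:
        return True
    if condition.startswith("not "):
        return condition[4:] not in set_bits
    return condition in set_bits


def walk(nodes, set_bits):
    """Follow the first matching condition at each level; return the path taken."""
    for condition, outcome in nodes:
        if not condition_met(condition, set_bits):
            continue
        path = [] if condition is None else [condition]
        if isinstance(outcome, str):
            return path + [outcome]
        return path + walk(outcome, set_bits)
    return []


print(walk(MCHK_SYNC_ERROR, {"ICSR.LOCK", "ICSR.TPERR"}))
```

Representing the tree as ordered lists keeps the evaluation order explicit, which matters because the first condition met decides the branch.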
Figure A-1 KA7AA Machine Check Parse Tree

Figure A-2 KA7AA Hard Error Interrupts

Figure A-3 KA7AA Soft Error Interrupts

Figure A-4 IOP Interrupts

Figure A-5 DWLMA Interrupts

Appendix B Power Requirements and Guidelines

This appendix discusses system power requirements and guidelines. Sections include:
· Power System Requirements
· Getting Information on Power Regulator Status
· Show Power Command
· Checking the IOP Module During Power-Up
· Identifying an LSB Module Power Converter Failure

B.1 Power System Requirements

A second H7263 power regulator may be required to supply adequate power depending on the system configuration. Table B-1 lists the power requirements for each option in the system cabinet and provides a method for determining the need for a second power regulator.
Table B-2 lists the power requirements for each option in an expander cabinet. Power requirements are measured in equivalent power units (EPUs).

NOTE: If the number of EPUs is greater than 85 in either the system or the expander cabinet, then a second power regulator is required.

Table B-1 Power Worksheet, System Cabinet Options

    Option                 EPUs   Quantity   (EPUs x Quantity)
    Base system            30     1          30
    KA7AA                  7
    MS7AA (64 Mbytes)      10
    MS7AA (128 Mbytes)     10
    MS7AA (256 Mbytes)     10
    DWLMA                  4
    DEMNA                  3
    DEMFA                  6
    CIXCD                  3
    KDM70                  6
    KFMSA                  4
    SF73 storage           8
    Total EPUs in last column

Table B-2 Power Worksheet, Expander Cabinet Options

    Option                 EPUs   Quantity   (EPUs x Quantity)
    DWLMA                  4
    DEMNA                  3
    DEMFA                  6
    CIXCD                  3
    KDM70                  6
    KFMSA                  4
    SF73 storage           8
    Total EPUs in last column

B.2 Getting Information on Power Regulator Status

Typing a command packet at the console terminal when the console is not running provides you with detailed information about the power system. Figure B-1 shows the command packet structure. Each power regulator has a unique address, determined by its location in the DC distribution box (slot A, B, or C).

NOTE: You must type in upper case when entering a command packet.

Figure B-1 Command Packet Structure

Entering a Command Packet

To enter a command packet at the console terminal:

1. Enter the packet header by typing Ctrl/B two times.
2. Type the 1-letter command.
3. Type the power regulator identification letter.
4. Enter the packet terminator by typing Ctrl/M.

B.2.1 Brief Data Packet

Data packets sent from the power regulator in response to a B (brief current status) command are a stream of nine ASCII characters consisting of four parts:

1. Packet header - One ASCII character. The power regulator transmits an A, B, or C, depending on its slot position.
2. Packet data - Two ASCII characters representing the remaining battery capacity in minutes.
3. Packet state - Four ASCII characters which provide the heatsink status, battery pack state, test status, and power supply state.
4. Packet terminator - Two ASCII characters representing the checksum to determine data packet errors.

Figure B-2 shows the brief data packet structure. Table B-3 lists the meaning of each value in the following example of a brief data packet:

    A|23|0|-|P|1|84

The character format is 8 bits, no parity, with one stop bit. The baud rate is 9600.

Table B-3 Sample Brief Packet Information

    Character   Value   Information
    1           A       Data packet from power regulator A
    2-3         23      Battery capacity remaining = 23 minutes
    4           0       Heatsink temperature within range
    5           -       Battery pack discharging
    6           P       Last battery pack test completed successfully
    7           1       BBU mode
    8-9         84      Checksum value

Figure B-2 Brief Data Packet Structure

B.2.2 Full Data Packet

A data packet in response to an S (full current status)/H (history) command is a single stream of 54 ASCII characters consisting of four parts:

1. Packet header - Six ASCII characters
2. Packet data - 42 ASCII characters representing 11 parameters
3. Packet state - Four ASCII characters which provide the heatsink status, battery pack state, test status, and power supply state
4. Packet terminator - Two ASCII characters which represent the checksum to determine data packet errors

The following figures show the full/history data packet structure. The character format is 8 bits, no parity, with one stop bit. The baud rate is 9600.
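The command and data packet layouts described in this section can be sketched in Python. The fragment below is illustrative only: it builds the bytes of a command packet and splits a received brief or full data packet into its four parts by length. It does not verify the checksum, because the checksum algorithm is not given here. The function names are hypothetical, restricting commands to B, S, and H is an assumption based on the commands named in this section, and the battery state character in the sample is assumed to be an ASCII hyphen.

```python
CTRL_B = b"\x02"  # ASCII STX, produced by typing Ctrl/B
CTRL_M = b"\r"    # ASCII CR, produced by typing Ctrl/M

# Lengths of (header, data, state, checksum), keyed by total packet length:
# 9 characters for a brief packet, 54 for a full/history packet.
PART_LENGTHS = {9: (1, 2, 4, 2), 54: (6, 42, 4, 2)}


def build_command(command, regulator):
    """Return the bytes of a command packet for one power regulator."""
    if command not in ("B", "S", "H"):
        raise ValueError("command must be B, S, or H (upper case)")
    if regulator not in ("A", "B", "C"):
        raise ValueError("regulator must be A, B, or C (upper case)")
    return (CTRL_B * 2 + command.encode("ascii")
            + regulator.encode("ascii") + CTRL_M)


def split_packet(packet):
    """Split a 9- or 54-character data packet into its four parts."""
    lengths = PART_LENGTHS.get(len(packet))
    if lengths is None:
        raise ValueError("expected 9 or 54 characters, got %d" % len(packet))
    parts, position = {}, 0
    for name, length in zip(("header", "data", "state", "checksum"), lengths):
        parts[name] = packet[position:position + length]
        position += length
    return parts


# The sample brief packet, with the "|" separators used in the text removed.
brief = "A|23|0|-|P|1|84".replace("|", "")
print(build_command("B", "A"))
print(split_packet(brief))
```

The individual state characters and the fields inside the data part can then be looked up against Table B-3 and Table B-4.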
Figure B-3 Full Data Packet Structure

Figure B-4 Full Data Packet: Values for Characters 1-6

Figure B-5 Full Data Packet: Values for Characters 7-34

Figure B-6 Full Data Packet: Values for Characters 35-47

Figure B-7 Full Data Packet: Values for Characters 48-54

Table B-4 lists the meaning of each value in the following example of a full/history data packet:

    A|L|01|11|0778|0444|0960|0600|0867|0867|0000|0540|0623|23|08|00|0|F|P|0|A8
    1                                    30                                 54

Table B-4 Sample Full/History Packet Information

    Character   Value   Information
    1           A       Data packet from power regulator A
    2           L       30-33798-01
    3-4         01      Primary micro firmware revision = 0.1
    5-6         11      Secondary micro firmware revision = 1.1
    7-10        0778    Peak AC line voltage = 152 volts
    11-14       0444    DC bulk voltage = 159 volts
    15-18       0960    48 VDC bus voltage = 46.8 volts
    19-22       0600    48 VDC bus current = 29.3 amps
    23-26       0867    48V battery pack voltage = 50.8 volts
    27-30       0867    24V battery pack voltage = 25.4 volts
    31-34       0000    Battery pack discharge current = 0.0 amps
    35-38       0540    Ambient temperature = 26.3 degrees Celsius
    39-42       0623    Elapsed run time = 6230 hours
    43-44       23      Remaining battery capacity = 23 minutes
    45-46       08      Battery discharge time = 8 minutes
    47-48       00      Spare
    49          0       Heatsink temperature within range
    50          F       Battery pack fully charged
    51          P       Last battery pack test completed successfully
    52          0       Normal operation
    53-54       A8      Checksum value

B.3 Show Power Command

As shown in Example B-1, the show power command can be used to display the power status of the system. In this example, the cabinet contains three power regulators. If the cabinet has fewer than three regulators, the appropriate column (A, B, or C) is left blank. The bottom three lines of the output, showing PIU power status, are printed for the main cabinet only.
Example B-1 Sample Output, Show Power Command

    >>> show power

    Cabinet: Main              Regulator :  A            B            C
    -----------------------------------------------------------------------------
    Primary Micro Firmware Rev           :  2.0          2.0          2.0
    Secondary Micro Firmware Rev         :  2.0          2.0          2.0
    Power Supply State                   :  NORMAL       NORMAL       BBU MODE
    AC Line Voltage (V RMS)              :  113.71       114.35       115.93
    DC Bulk Voltage (VDC)                :  227.02       227.02       227.02
    48V DC Bus Voltage (VDC)             :  47.57        47.57        47.57
    48V DC Bus Current (ADC)             :  30.17        29.68        29.58
    48V Battery Pack Voltage (VDC)       :  50.85        50.72        47.91
    24V Battery Pack Voltage (VDC)       :  25.56        25.56        23.95
    Battery Pack Charge Current (IDC)    :  2.91         2.90         0
    Ambient Temperature (Degree C)       :  26.22        24.80        24.75
    Elapsed Time (Hours)                 :  290.00       290.00       290.00
    Remaining Battery Capacity (Minutes) :  8.00         8.00         8.00
    Battery Cutoff Counter (Cycles)      :  0            1.00         1.00
    Battery Configuration                :  4 Batteries  4 Batteries  4 Batteries
    Heatsink Status                      :  NORMAL       NORMAL       NORMAL
    Battery Pack Status                  :  CHARGING     CHARGING     DISCHG'G
    Last UPS Test Status                 :  PASSED       PASSED       TESTING

    LDC POWER Status                     :  0
    PIU Primary Status                   :  0
    PIU Secondary Status                 :  0

B.4 Checking the IOP Module During Power-Up

If the console hangs with no error indication during power-up, check the oscillator switch settings on the IOP module. To access the oscillator switch:

1. Open the rear door of the cabinet and release the plate covering the LSB card cage by loosening the two top screws.
2. Slide the IOP module out of the LSB card cage so that you can visually check the oscillator switch settings.

Figure B-8 shows the location of the oscillator switch on the IOP module. Figure B-9 shows the correct settings on the IOP oscillator switch.

Figure B-8 IOP Module

Figure B-9 IOP Oscillator Switch Settings

B.5 Identifying an LSB Module Power Converter Failure

Each LSB module converts 48 volts to 5 volts on the module.
If a module power converter fails, damage to the LSB bus is prevented by disabling the 2V reference voltage at all LSB nodes. The self-test LED on the failing LSB module remains unlit.

If the control panel Fault light remains lit and the console prompt is displayed, then the LSB is good and the failing module is indicated by its self-test LED. If the IOP LED (see Figure B-8) remains off in a uniprocessor system, then the CPU should be presumed bad.

Table B-5 lists the state of the self-test LEDs when a processor, memory, or IOP module power converter fails.

Table B-5 LED Status When a Power Converter Fails

Failing Module  Self-Test LED  Other CPU LEDs are...  Other Memory LED is...  IOP LED is...
CPU             Off            On                     On                      Off
Memory          Off            On                     On                      Off
IOP             Off            On                     On                      Off