Troubleshooting BMC hardware failures

A BMC problem can occur when the BMC hardware fails.

Considerations

When the BMC fails, an EMS event similar to the following is logged:

    [asup.msg.bmc.heartbeat.stops:critical]: Data ONTAP lost communication with the baseboard management controller (BMC).
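
As a rough illustration of spotting this event in exported logs, the following Python sketch matches the bracketed event name and severity shown in the example above. It assumes each log line contains the "[name:severity]: message" portion in that form. The function names parse_ems_event and is_bmc_heartbeat_loss are hypothetical helpers, not part of Data ONTAP, and the layout of any text before the bracket is an assumption.

```python
import re

# Pattern for the "[event.name:severity]: message" portion of the example event.
# Assumption: real log lines may carry extra prefix text (timestamp, node name),
# so the pattern is searched for rather than anchored at the start of the line.
EMS_PATTERN = re.compile(
    r"\[(?P<name>[\w.]+):(?P<severity>\w+)\]:\s*(?P<message>.*)"
)

# Event name taken from the example in this topic.
BMC_HEARTBEAT_EVENT = "asup.msg.bmc.heartbeat.stops"


def parse_ems_event(line):
    """Return (name, severity, message) for an EMS-style line, or None."""
    match = EMS_PATTERN.search(line)
    if match is None:
        return None
    return (
        match.group("name"),
        match.group("severity"),
        match.group("message").strip(),
    )


def is_bmc_heartbeat_loss(line):
    """Return True if the line reports the BMC heartbeat-stop event."""
    event = parse_ems_event(line)
    return event is not None and event[0] == BMC_HEARTBEAT_EVENT


if __name__ == "__main__":
    sample = (
        "[asup.msg.bmc.heartbeat.stops:critical]: Data ONTAP lost "
        "communication with the baseboard management controller (BMC)."
    )
    print(parse_ems_event(sample))
    print(is_bmc_heartbeat_loss(sample))
```

A match only indicates that the heartbeat event was logged. The diagnostic steps that follow are still needed to confirm a hardware failure.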

Steps

  1. Run diagnostics by entering the following command from the boot environment prompt: boot_diags

    The diagnostics main menu appears.

  2. From the main menu, enter the following option: mb

    The motherboard diagnostic menu appears.

    Enter Diag, Command or Option: mb
    Motherboard Diagnostic
    ------------------------------
    1: Comprehensive motherboard diags  71: Show PCI configuration
    2: Misc. board test menu            72: Show detailed PCI info
    3: Cache test menu                  73: Initialize real-time clock
    4: On-board GbE test menu           75: System serial info setup[Mfg]
    5: On-board FCAL test menu
    6: SAS Test Menu                    91: Enable/disable looping
    7: IB Test Menu                     92: Stop/Continue looping on error
    8: BMC Test Menu                    93: Extended/Normal test mode
    9: NVMEM Test Menu                  99: Exit

  3. From the diagnostic prompt, enter test number 8.

    The BMC diagnostic menu appears.

    Select test or feature by number [0]: 8
    BMC Diagnostics
    --------------- 
    1: Comprehensive Test             72: Get Reason for Restart
    2: BMC Self Test                  73: Show Device Info
    3: Environment Test               74: Show SDR Info
    4: SDR Read Test                  75: Show SEL Info
    5: SEL Read Test                  76: Clear SEL [Mfg]
    6: LCD Exercise                   77: Emergency Shutdown [Mfg]
    7: BMC Timer test                 78: BMC Update Menu [Xtnd]
    10: Show BMC SSH Keys             79: Dump SEL Records
                                      80: Dump Raw SEL Records
    41: BMC NMI Test
    42: BMC Front Panel Button Test   91: Enable/disable looping
    43: SEL Write Test [Xtnd]         92: Stop/continue on error
                                      93: Extended/Normal test mode
    71: Show BMC SEL Time             99: Exit

  4. Enter the appropriate test number from the diagnostic prompt. To perform a comprehensive test, enter test number 1.

    Note: The comprehensive test takes several minutes to complete.

    The results of the test are displayed.

  5. Based on the results of Step 4, diagnose the problem. If the problem persists, reseat the BMC and repeat Steps 1 to 5.

    If the problem still persists, contact technical support for assistance.
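
For reference, the console entries from Steps 1 to 4 can be summarized in a small Python sketch. This is only an illustration of the order of entries. The helper bmc_diag_inputs is hypothetical, it does not communicate with the storage system, and the BMC_TESTS table lists only a subset of the test numbers from the BMC diagnostic menu in Step 3.

```python
# Subset of test numbers from the left column of the BMC diagnostic menu (Step 3).
BMC_TESTS = {
    1: "Comprehensive Test",
    2: "BMC Self Test",
    3: "Environment Test",
    4: "SDR Read Test",
    5: "SEL Read Test",
    6: "LCD Exercise",
    7: "BMC Timer test",
}


def bmc_diag_inputs(test_number=1):
    """Return the console entries for Steps 1 to 4, in order.

    Hypothetical helper: it only builds the list of entries and does not
    send anything to a console.
    """
    if test_number not in BMC_TESTS:
        raise ValueError(f"unknown BMC test number: {test_number}")
    return [
        "boot_diags",       # Step 1: start diagnostics from the boot environment prompt
        "mb",               # Step 2: open the motherboard diagnostic menu
        "8",                # Step 3: open the BMC Test Menu
        str(test_number),   # Step 4: run the selected BMC test
    ]


if __name__ == "__main__":
    for entry in bmc_diag_inputs():
        print(entry)
```

With the default argument, the helper lists the entries for the comprehensive test (test number 1), which matches the path described in Step 4.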