Storage

MegaRAID Storage Manager: email alerts are not sent

If MegaRAID Storage Manager (MSM) fails to send alerts even though the SMTP server configuration is correct and network access to the server is permitted, the cause might be what I believe is a bug in MSM.

I've seen this issue on servers with multiple NICs and on servers whose physical NICs (pNICs) were changed after MSM was configured.

In short, the configuration utility does not bind to the correct IP address/NIC when it saves the configuration. This setting is not visible anywhere in MSM, which is why it took me hours to figure out. To check for the issue and, if necessary, fix it, do the following:

1) Open MSM and navigate to Tools > Monitor Configure Alerts

2) Make sure the SMTP server settings are correct

3) Click Save Backup, save the file as monitorconfig.xml on the Desktop and click "OK" to close the Configure Alerts window

4) Edit monitorconfig.xml with Notepad or another text editor

5) Find the <nic> element in the file and set its value to the IP address of the interface that should be used to reach the SMTP server

Example (the <nic> line is the one to change):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<monitor-config>
<actions>
<popup/>
<email>
<nic>10.106.22.40</nic>
...

6) Save the file and return to MSM

7) Navigate to Tools > Monitor Configure Alerts

8) Click Load Backup, then Yes, select monitorconfig.xml and click OK

9) Navigate to Tools > Monitor Configure Alerts and test again
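If you have to do step 5 on more than one server, the edit can be scripted. Below is a minimal Python 3 sketch, assuming the exported monitorconfig.xml contains exactly one <nic> element as in the example above. It replaces only the <nic> value with a regular expression instead of re-serialising the XML, so the rest of the file, including the standalone="no" declaration and the line endings, stays byte-for-byte as MSM wrote it. The script name set_nic.py and the functions replace_nic and set_nic are my own, hypothetical names, and the edited file still has to be loaded back in MSM (steps 6 to 9).

```python
import ipaddress
import re
import sys
from pathlib import Path

# Captures the value between <nic> and </nic>, ignoring surrounding whitespace.
NIC_PATTERN = re.compile(r"<nic>\s*([^<]*?)\s*</nic>")


def replace_nic(xml_text, ip_address):
    """Return (new_text, old_value) with the <nic> value set to ip_address.

    Raises ValueError if ip_address is not a valid IPv4/IPv6 address or if
    the text does not contain exactly one <nic>...</nic> element.
    """
    ipaddress.ip_address(ip_address)  # validation only; raises ValueError
    values = NIC_PATTERN.findall(xml_text)
    if len(values) != 1:
        raise ValueError(
            "expected exactly one <nic> element, found {}".format(len(values)))
    new_text = NIC_PATTERN.sub(
        lambda _match: "<nic>{}</nic>".format(ip_address), xml_text)
    return new_text, values[0]


def set_nic(xml_path, ip_address):
    """Edit the file in place and return the previous <nic> value."""
    path = Path(xml_path)
    # Work on bytes so the original line endings (CRLF on Windows) survive.
    text = path.read_bytes().decode("utf-8")
    new_text, old_value = replace_nic(text, ip_address)
    path.write_bytes(new_text.encode("utf-8"))
    return old_value


if __name__ == "__main__":
    if len(sys.argv) == 3:
        previous = set_nic(sys.argv[1], sys.argv[2])
        print("<nic> changed from {!r} to {!r}".format(previous, sys.argv[2]))
    else:
        print("usage: set_nic.py <monitorconfig.xml> <ip-address>",
              file=sys.stderr)
```

For example, `python set_nic.py monitorconfig.xml 10.106.22.40` run on the Desktop sets the value shown in the example. Refusing to edit a file with zero or several <nic> elements is deliberate: if the backup format differs from what I have seen, it is safer to fall back to editing it by hand.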

Note: The dialog may display "The test email could not be sent. Check the mail server settings and try again." Ignore this and check whether the email is delivered within a few minutes.

I don't know how MSM verifies that the email was sent; more often than not, the error is displayed but the email is delivered anyway.
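Since the MSM test dialog is not a reliable indicator, it can be useful to check outside MSM that the SMTP server answers connections originating from the address you put in <nic>. The following Python 3 sketch uses the standard smtplib module, whose source_address parameter binds the client socket to a given local address; my assumption is that this is roughly what MSM does with the <nic> value, but I have not confirmed it. local_ip_for() asks the operating system which local address it would use to reach the server (connect() on a UDP socket selects a route without sending any packets), which is one way to find the right value for <nic>. The script name smtp_check.py, both function names and the default port 25 are hypothetical choices, and the check stops after the server's greeting, so no mail is sent.

```python
import smtplib
import socket
import sys


def local_ip_for(host, port=25):
    """Return the local IP address the OS would use to reach host:port.

    connect() on a UDP socket only picks a route and source address;
    no packets are sent.
    """
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_DGRAM)[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.connect(sockaddr)
        return sock.getsockname()[0]


def smtp_greets_from(source_ip, host, port=25, timeout=10.0):
    """Return True if host:port sends an SMTP 220 greeting to a connection
    bound to source_ip, False on any socket or SMTP error."""
    try:
        # source_address binds the client socket; port 0 lets the OS choose.
        # The constructor connects and raises unless the greeting code is 220.
        client = smtplib.SMTP(host, port, timeout=timeout,
                              source_address=(source_ip, 0))
    except (OSError, smtplib.SMTPException):
        return False
    try:
        client.quit()
    except (OSError, smtplib.SMTPException):
        client.close()
    return True


if __name__ == "__main__":
    if len(sys.argv) in (2, 3):
        smtp_host = sys.argv[1]
        source = sys.argv[2] if len(sys.argv) == 3 else local_ip_for(smtp_host)
        ok = smtp_greets_from(source, smtp_host)
        print("{} from {}: {}".format(
            smtp_host, source, "greeting received" if ok else "no greeting"))
    else:
        print("usage: smtp_check.py <smtp-host> [source-ip]", file=sys.stderr)
```

For example, `python smtp_check.py smtp.example.com 10.106.22.40` tests the address from the example. If the check fails for the <nic> address but succeeds for the address local_ip_for() picks, the <nic> value is the likely culprit.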

MSM versions tested: 16.05.04.01, 17.05.00.02, 17.05.01.03

Locating failed drives with storcli

First, use the storcli binary to identify failed drives on every controller (sure, the chained grep calls could be replaced with a single regex):

./storcli /cALL/eALL/sALL show all|grep Failure|grep -vi predict

Example output:

Status = Failure
/c0/e1/s5  Failure    46 -
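The chained greps can be replaced with a single pattern. The Python 3 sketch below runs the same storcli command and keeps only lines that start with a drive path (/cX/eY/sZ, or /cX/sZ without an enclosure) followed by Failure, as in the example output. Because the match is anchored to the drive path, neither the summary "Status = Failure" line nor the "Predictive Failure Count" lines (presumably what grep -vi predict is there to filter out) can match. The regex is fitted to the example output only, ./storcli as the binary location is taken from the command above, and parse_failed_drives and failed_drives are hypothetical names.

```python
import re
import subprocess

# A drive path (/c0/e1/s5, or /c0/s5 without an enclosure) at the start of a
# line, followed by the word Failure, as in the example output above.
FAILED_DRIVE = re.compile(r"^[ \t]*(/c\d+(?:/e\d+)?/s\d+)\s+Failure\b",
                          re.MULTILINE)


def parse_failed_drives(storcli_output):
    """Return the failed drive paths in output order, without duplicates."""
    drives = []
    for path in FAILED_DRIVE.findall(storcli_output):
        if path not in drives:
            drives.append(path)
    return drives


def failed_drives(storcli="./storcli"):
    """Run 'storcli /cALL/eALL/sALL show all' and return the failed drive paths."""
    result = subprocess.run(
        [storcli, "/cALL/eALL/sALL", "show", "all"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    # No check=True: the overall Status can be Failure (see the example output),
    # and I don't assume storcli exits with 0 in that case.
    return parse_failed_drives(result.stdout)
```

Run next to the storcli binary, `print(failed_drives())` should list the failed drive paths; for the example output above, parse_failed_drives() returns ['/c0/e1/s5'].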

Start locating the failed drive:

./storcli /c0/e1/s5 start locate

Example output:

CLI Version = 007.1017.0000.0000 May 10, 2019
Operating system = VMkernel 6.7.0
Controller = 0
Status = Success
Description = Start Drive Locate Succeeded.

Stop locating the failed drive:

./storcli /c0/e1/s5 stop locate

Example output:

CLI Version = 007.1017.0000.0000 May 10, 2019
Operating system = VMkernel 6.7.0
Controller = 0
Status = Success
Description = Stop Drive Locate Succeeded.
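Starting and stopping locate for several drives can be wrapped the same way. This Python 3 sketch builds the start locate / stop locate commands shown above for each drive path and only runs them when dry_run is False. Every path is checked against the /cX/eY/sZ (or /cX/sZ) format before anything is run, so wildcards such as /cALL are rejected on purpose. The function name locate, the ./storcli location and the assumption that storcli exits with a non-zero code when a command fails are mine.

```python
import re
import subprocess

# A single drive path such as /c0/e1/s5 (or /c0/s5 without an enclosure).
DRIVE_PATH = re.compile(r"^/c\d+(?:/e\d+)?/s\d+$")


def locate(drives, action="start", storcli="./storcli", dry_run=True):
    """Build (and unless dry_run, run) 'storcli <drive> <action> locate'.

    Returns the list of argument lists, in the order the drives were given.
    """
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    commands = []
    # Validate every path before running anything.
    for drive in drives:
        if not DRIVE_PATH.match(drive):
            raise ValueError("not a single drive path: {!r}".format(drive))
        commands.append([storcli, drive, action, "locate"])
    if not dry_run:
        for argv in commands:
            # check=True stops at the first command that exits non-zero,
            # assuming storcli reports a failed locate that way.
            subprocess.run(argv, check=True)
    return commands
```

For example, `locate(["/c0/e1/s5"], dry_run=False)` starts locating the drive from the example, and `locate(["/c0/e1/s5"], "stop", dry_run=False)` stops it again.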

To stop locating drives on all controllers, run the following command:

./storcli /cALL set activityforlocate=off