Microsoft failover clustering offers Cluster Shared Volumes (CSVs). These allow all nodes of a cluster to communicate with the same block storage LUN simultaneously. “Redirected Access” enhances this feature by redirecting I/O through the CSV’s owning node when any of the other nodes cannot access it directly. Unfortunately, if you don’t have a monitoring system in place, a CSV could go into Redirected Access mode and you’d never know. The best outcome is a minor performance hit. Depending on your available physical pathways, that performance hit might also impact Live Migration. If only a single node has direct access, then all of the contained roles will fail if that node fails. Of course, you’d also probably like to know if a CSV goes offline completely.
This script allows Nagios to watch a single designated CSV. If the CSV fails completely, the script sets a Critical state in Nagios. If the CSV is in Maintenance Mode, it sets a Warning state. My reasoning for that condition is that Maintenance Mode is usually intentional, but you don’t want a CSV left there for an extended period of time. You choose how Nagios treats a Redirected Access state: ignore it, raise a Warning, or raise a Critical. If you’re using a guest cluster in 2012 R2 with a shared VHDX, then the CSV will always be in Redirected Access mode, so that would be a normal condition for you.
This script is useful for any cluster that uses CSVs (for example, SOFS and SQL), not just Hyper-V clusters.
If you’re new to Nagios, then you should probably start with the How To: Monitor Hyper-V with Nagios article first. I did publish a follow-up article with a script with some base functions for Hyper-V, but that script is not required to use this one. The base script for clusters is required. It’s linked below.
NSClient++ Configuration
These changes are to be made to the NSClient++ files on all Windows nodes that are part of the cluster to be monitored. These instructions do not include configuring NSClient++ to operate PowerShell scripts. Please refer to the aforementioned how-to article for that.
C:\Program Files\NSClient++\nsclient.ini
If the indicated INI section does not exist, create it. Otherwise, just add the second line to the existing section.
    [/settings/external scripts/wrapped scripts]
    check_csvstatus=check_csvstatus.ps1 $ARG1$ $ARG2$
C:\Program Files\NSClient++\scripts\check_csvstatus.ps1
The required script clusterbase.ps1 must exist in the same folder. This script was written against version 1.1 of that script and will check for it.
    <#
        check_csvstatus.ps1
        Written by Eric Siron
        (c) Altaro Software 2017

        Version 1.1 November 17, 2017

        Intended for use with the NSClient++ module from http://nsclient.org
        Checks a Cluster Shared Volume and returns the status to Nagios.

        # for $RedirectedAccessHandleMode, specify 0 to ignore, 1 to treat as a warning, 2 to treat as critical
    #>

    param(
        [Parameter(Position=1)][String]$CSVName,
        [Parameter(Position=2)][UInt16]$RedirectedAccessHandleMode = 1
    )

    begin {
        $RequiredClusterBaseVersion = 1.1
    }

    process {
        if([String]::IsNullOrEmpty($CSVName))
        {
            Write-Host -Object 'No CSV was specified'
            Exit 3
        }

        $ClusterBase = Join-Path -Path $PSScriptRoot -ChildPath 'clusterbase.ps1'
        . $ClusterBase
        $ClusterBaseVersion = Get-ANClusterBaseVersion

        if($ClusterBaseVersion -lt $RequiredClusterBaseVersion)
        {
            Write-Host -Object ('clusterbase.ps1 must be at least version {0} to use this script (found version: {1})' -f $RequiredClusterBaseVersion, $ClusterBaseVersion)
            Exit 3
        }

        $CSVPartition = Get-ANCSVFromCSVName -CSVName $CSVName
        switch($CSVPartition.FaultState)
        {
            0 {
                Write-Host -Object 'Normal operation'
                Exit 0
            }
            1 {
                Write-Host -Object 'Redirected Access'
                Exit $RedirectedAccessHandleMode
            }
            2 {
                Write-Host -Object 'No Access'
                Exit 2
            }
            3 {
                Write-Host -Object 'Maintenance Mode'
                Exit 1
            }
            default {
                Write-Host -Object ('Unable to detect the status of CSV "{0}"' -f $CSVName)
                Exit 3
            }
        }
    }
Nagios Configuration
These changes are to be made on the Nagios host. I recommend using WinSCP as outlined in our main Nagios and Ubuntu Server articles.
/usr/local/nagios/etc/objects/commands.cfg
The Hyper-V Host Commands section should already exist if you followed our main Nagios article. Add this command there. If you are not working with a Hyper-V system, then you can create any section heading that makes sense to you, or just insert the command wherever you like.
    define command{
        command_name    check-csvstatus
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -p 5666 -c check_csvstatus -a $ARG1$ $ARG2$
    }
/usr/local/nagios/etc/objects/hypervhost.cfg
This file and section were created in the Hyper-V base scripts article. As long as it appears somewhere in one of the activated .cfg files, it will work.
This is a sample! You must use your own cluster name object and CSV name!
The parts you want to set are:
- For the host_name, enter the cluster name object of the cluster that hosts the CSV. Mine is called “clhv1”.
- For the service_description, use whatever makes sense to you. This is what appears in the Nagios web interface and in any alert e-mails.
- For the check_command, use the format check-csvstatus!csvname!#
The number at the end of the check_command line specifies how you want to treat the CSV if it is in Redirected Access mode. In the following sample, I used a 2. Values are:
- 0: a Redirected Access status will be noted but ignored. Use this for CSVs with guest clusters using shared VHDX on 2012 R2
- 1: a Redirected Access status will set a Warning condition in Nagios; this is the default in the script, although I didn’t test how Nagios/NSClient++ cope with a parameter that isn’t specified
- 2: a Redirected Access status will set a Critical condition in Nagios
    ###############################################################################
    ###############################################################################
    #
    # CLUSTER SERVICE DEFINITIONS
    #
    ###############################################################################
    ###############################################################################

    # check status of CSV1 on CLHV1
    define service {
        use                     generic-service
        host_name               clhv1
        service_description     CSV1 Status
        check_command           check-csvstatus!CSV1!2
    }
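To make the effect of that trailing number concrete, here is a minimal shell sketch that mirrors the switch statement in check_csvstatus.ps1. It is only a model for reasoning about the outcome, not part of the monitoring setup. The function name csv_exit_code is hypothetical, and the fault state numbers are taken from the script above.

```shell
# Hypothetical helper that models the exit-code logic of check_csvstatus.ps1.
# $1 = FaultState value as used in the script's switch statement
# $2 = Redirected Access handling mode (defaults to 1, like the script)
csv_exit_code() {
    fault_state="$1"
    redirected_mode="${2:-1}"
    case "$fault_state" in
        0) echo 0 ;;                   # Normal operation -> OK
        1) echo "$redirected_mode" ;;  # Redirected Access -> whatever you configured
        2) echo 2 ;;                   # No Access -> Critical
        3) echo 1 ;;                   # Maintenance Mode -> Warning
        *) echo 3 ;;                   # anything else -> Unknown
    esac
}

# Show what a CSV in Redirected Access (fault state 1) produces for each mode.
for mode in 0 1 2; do
    echo "mode $mode -> exit code $(csv_exit_code 1 "$mode")"
done
```

Only the Redirected Access branch reads the second argument. A CSV with no access is always Critical and one in Maintenance Mode is always a Warning, whatever number you put in check_command.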
Nagios must be restarted after these files are modified.
    sudo service nagios checkconfig
    sudo service nagios restart
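Before waiting for Nagios to schedule the check, it can help to run it by hand from the Nagios host and see which state the exit code maps to. The following is a rough sketch under a few assumptions. The helper names nagios_state_name and run_check are hypothetical, the mapping uses the standard plugin return codes (0 OK, 1 Warning, 2 Critical, 3 Unknown), and anything outside that range is simply reported as UNKNOWN here.

```shell
# Hypothetical helper: translate a plugin exit code into a Nagios state name.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # 3 and, in this sketch, any other value
    esac
}

# Hypothetical helper: run any plugin command line, print its output,
# then print the state that its exit code represents.
run_check() {
    code=0
    output=$("$@") || code=$?
    echo "$output"
    echo "State: $(nagios_state_name "$code")"
    return "$code"
}

# Example, run from the directory that holds check_nrpe. Replace the
# placeholder address and CSV name with your own:
# run_check ./check_nrpe -H <node-or-cluster-address> -t 30 -p 5666 -c check_csvstatus -a CSV1 2
```

The flags in the commented example are the same ones used in the command definition in commands.cfg, so a manual run should behave like the scheduled check.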
5 thoughts on "Nagios for Hyper-V: Cluster Shared Volume Status"
Hello Eric,
Thank you for the nice article. I have an 8-node Hyper-V cluster and have already installed the NSClient plugin, and I can easily read the SNMP values of the servers, such as RAM, CPU, and uptime.
When I follow your article to monitor the status of CSV volumes, it gives me this error:
CHECK_NRPE: Error – Could not complete SSL handshake.
Can you help me find the cause?
Go through this article: https://www.altaro.com/hyper-v/securely-monitor-hyper-v-nagios-nsclient/
I’m trying to execute the command but I get the following. Any ideas?
[root@BHNAGIOSXI libexec]# ./check_nrpe -H 10.11.110.51 -t 30 -c check_csvstatus -a CSV01 80 90
Unable to detect the status of CSV “CSV01”
On a node, preferably not the owner node, preferably logged in as the same account that runs the script:
    . 'C:\Program Files\NSClient++\scripts\clusterbase.ps1'
    Get-ANCSVFromCSVName -CSVName CSV01