The script in this article scans a Hyper-V host to find its oldest checkpoint. It is an active check tied to the host, not to any particular virtual machine. To use it, you must have a functioning Nagios environment and NSClient++ configured as described in our main Nagios article. It does not directly require any of the base scripts, but it does use the configuration sections set up in that article.
Updated May 2, 2018: Version 2.0
- Uses the CIM cmdlets instead of the WMI cmdlets for speed
- Improves performance by reducing the number of CIM calls
- The checkpoint report properly identifies the owning virtual machine
- Ignores checkpoints created by a pooled VDI collection
NSClient++ Configuration
These changes are to be made to the NSClient++ files on all Hyper-V hosts to be monitored.
C:\Program Files\NSClient++\nsclient.ini
If the indicated INI section does not exist, create it. Otherwise, just add the second line to the existing section.
[/settings/external scripts/wrapped scripts]
check_checkpointage=check_hvcheckpointage.ps1 $ARG1$ $ARG2$
C:\Program Files\NSClient++\scripts\check_hvcheckpointage.ps1
This script scans a Hyper-V host for its oldest existing checkpoint and reports back to Nagios. This file does not exist and must be created.
<#
check_hvcheckpointage.ps1
Written by Eric Siron
(c) Altaro Software 2018
Version 2.0 May 2, 2018

Intended for use with the NSClient++ module from http://nsclient.org
Checks a Hyper-V host for its oldest checkpoint and returns the status to Nagios.
#>
param(
    [Parameter(Position=1)][String]$WarningLevel = '2d',
    [Parameter(Position=2)][String]$CriticalLevel = '3d'
)

Set-Variable -Name OldestCheckpoint

# Split each threshold into a unit letter (m, h, d, w) and a number; either may come first
if($WarningLevel -match '[mhdwMHDW]')
{
    $WarnMeasurement = $Matches[0][0]
    if($WarningLevel -match '\d+') { $WarnLength = [int]$Matches[0] }
}
if($CriticalLevel -match '[mhdwMHDW]')
{
    $CriticalMeasurement = $Matches[0][0]
    if($CriticalLevel -match '\d+') { $CriticalLength = [int]$Matches[0] }
}

$OldestCheckpointCreationTime = [DateTime]::Now
$RawCheckpointIDs = Get-CimInstance -Namespace root/virtualization/v2 -Property Dependent -Class Msvm_SnapshotOfVirtualSystem
foreach ($RawCheckpointID in $RawCheckpointIDs)
{
    # Only realized checkpoints; skip those created by a pooled VDI collection (RDV_ROLLBACK)
    $Checkpoints = Get-CimInstance -Namespace root/virtualization/v2 -Property VirtualSystemIdentifier, CreationTime, ElementName -Class Msvm_VirtualSystemSettingData -Filter ('InstanceID="{0}" AND VirtualSystemType="Microsoft:Hyper-V:Snapshot:Realized" AND NOT ElementName LIKE "%RDV_ROLLBACK%"' -f $RawCheckpointID.Dependent.InstanceID)
    foreach($Checkpoint in $Checkpoints)
    {
        $CheckpointCreationDate = $Checkpoint.CreationTime
        if($CheckpointCreationDate -lt $OldestCheckpointCreationTime)
        {
            # Record the new minimum so later checkpoints are compared against it
            $OldestCheckpointCreationTime = $CheckpointCreationDate
            $VM = Get-CimInstance -Namespace root/virtualization/v2 -Property ElementName -Class Msvm_ComputerSystem -Filter ('Name="{0}"' -f $Checkpoint.VirtualSystemIdentifier)
            $OldestCheckpoint = @($Checkpoint.ElementName, $VM.ElementName, $CheckpointCreationDate)
        }
    }
}

if($OldestCheckpoint)
{
    [TimeSpan]$CheckpointAge = [DateTime]::Now - $OldestCheckpoint[2]
    $AgeString = '{0} minutes' -f $CheckpointAge.Minutes
    if($CheckpointAge.Hours) { $AgeString = '{0} hours, {1}' -f $CheckpointAge.Hours, $AgeString }
    if($CheckpointAge.Days) { $AgeString = '{0} days, {1}' -f $CheckpointAge.Days, $AgeString }
    Write-Host ('Checkpoint "{0}" for VM "{1}" is {2} old. Created: {3}.' -f $OldestCheckpoint[0], $OldestCheckpoint[1], $AgeString, $OldestCheckpoint[2])

    # Compare the total age (TotalMinutes etc.), not a single component of it, against each threshold
    $ComparisonLength = 0
    switch($CriticalMeasurement)
    {
        'm' { $ComparisonLength = $CheckpointAge.TotalMinutes }
        'h' { $ComparisonLength = $CheckpointAge.TotalHours }
        'd' { $ComparisonLength = $CheckpointAge.TotalDays }
        default { $ComparisonLength = $CheckpointAge.TotalDays / 7 }   # weeks
    }
    if($ComparisonLength -gt $CriticalLength) { Exit 2 }

    $ComparisonLength = 0
    switch($WarnMeasurement)
    {
        'm' { $ComparisonLength = $CheckpointAge.TotalMinutes }
        'h' { $ComparisonLength = $CheckpointAge.TotalHours }
        'd' { $ComparisonLength = $CheckpointAge.TotalDays }
        default { $ComparisonLength = $CheckpointAge.TotalDays / 7 }   # weeks
    }
    if($ComparisonLength -gt $WarnLength) { Exit 1 }
    Exit 0
}
else
{
    Write-Host 'No checkpoints'
    Exit 0
}
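The heart of the scan is a running-minimum search: remember the earliest creation time seen so far, and replace the candidate only when an older checkpoint turns up. The sketch below shows that pattern on its own. The checkpoint names, VM names, and dates are made up and stand in for the CIM query results, so it runs on any machine with PowerShell.

```powershell
# Made-up sample data standing in for the Msvm_VirtualSystemSettingData results.
$checkpoints = @(
    [pscustomobject]@{ Name = 'Before patching'; VM = 'web01'; CreationTime = [datetime]'2018-04-28 09:00' }
    [pscustomobject]@{ Name = 'Pre-upgrade';     VM = 'sql01'; CreationTime = [datetime]'2018-04-20 17:30' }
    [pscustomobject]@{ Name = 'Test';            VM = 'web02'; CreationTime = [datetime]'2018-05-01 12:15' }
)

$oldest = $null
foreach ($checkpoint in $checkpoints) {
    # Take the first checkpoint, then replace it only with a strictly older one.
    # Comparing against $oldest.CreationTime keeps the minimum up to date.
    if ($null -eq $oldest -or $checkpoint.CreationTime -lt $oldest.CreationTime) {
        $oldest = $checkpoint
    }
}

'{0} on {1}, created {2:yyyy-MM-dd}' -f $oldest.Name, $oldest.VM, $oldest.CreationTime
# Pre-upgrade on sql01, created 2018-04-20
```

On ordinary objects, `$checkpoints | Sort-Object CreationTime | Select-Object -First 1` gives the same answer. The explicit loop mirrors the script, which only queries Msvm_ComputerSystem for the owning VM when a new oldest checkpoint is found.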
Restart the NSClient++ service.
Nagios Configuration
These changes are to be made on the Nagios host. I recommend using WinSCP as outlined in our main Nagios and Ubuntu Server articles.
/usr/local/nagios/etc/objects/commands.cfg
The Hyper-V Host Commands section should already exist if you followed our main Nagios article. Add this command there.
################################################################################
#
# Hyper-V Host Commands
#
################################################################################

# $ARG1$: age that triggers a warning condition. Use one letter (m = minute, h = hour, d = day, w = week) and one number, e.g. 3d for 3 days. Order does not matter.
# $ARG2$: age that triggers a critical condition, in the same format
define command{
    command_name    check-checkpoint-age
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -p 5666 -c check_checkpointage -a $ARG1$ $ARG2$
}
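These arguments are passed through to the script on the Hyper-V host, which reads each one as a unit letter plus a number. As a standalone illustration of that format, here is a hypothetical helper, `ConvertTo-ThresholdTimeSpan` (not part of check_hvcheckpointage.ps1), that turns a threshold such as 3d or d3 into a .NET TimeSpan and rejects input that lacks either part.

```powershell
# Hypothetical helper, not part of check_hvcheckpointage.ps1: converts a
# threshold such as '2h', '3d', or 'd3' into a TimeSpan.
function ConvertTo-ThresholdTimeSpan {
    param([Parameter(Mandatory)][string]$Threshold)

    # One unit letter (case-insensitive) and one run of digits, in either order.
    $unit = [regex]::Match($Threshold, '(?i)[mhdw]')
    $number = [regex]::Match($Threshold, '\d+')
    if (-not $unit.Success -or -not $number.Success) {
        throw "Invalid threshold '$Threshold': expected a number and one of m, h, d, w."
    }

    $length = [int]$number.Value
    switch ($unit.Value) {   # switch matching is case-insensitive by default
        'm' { [TimeSpan]::FromMinutes($length) }
        'h' { [TimeSpan]::FromHours($length) }
        'd' { [TimeSpan]::FromDays($length) }
        'w' { [TimeSpan]::FromDays($length * 7) }
    }
}

(ConvertTo-ThresholdTimeSpan '2h').TotalMinutes   # 120
(ConvertTo-ThresholdTimeSpan 'd3').TotalHours     # 72
(ConvertTo-ThresholdTimeSpan '1W').TotalDays      # 7
```

Returning a TimeSpan keeps the unit handling in one place: every threshold can then be compared directly with a checkpoint's age, whatever letter was used.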
/usr/local/nagios/etc/objects/hypervhost.cfg
This file and section were created in the required base scripts article.
###############################################################################
###############################################################################
#
# HYPER-V SERVICE DEFINITIONS
#
###############################################################################
###############################################################################

# check hosts individually for oldest checkpoint
define service{
    use                     generic-service
    hostgroup_name          hyper-v-servers
    service_description     All VMs: Max Checkpoint Age
    check_command           check-checkpoint-age!2h!3d
}
As shown, each host in “hyper-v-servers” is checked at the default interval. If its oldest checkpoint is more than 3 days old, the service raises a Critical alert; if it is more than 2 hours old, it raises a Warning. You can change these thresholds as needed. To set different warning and critical levels for particular hosts, duplicate this service and use “host_name” with a specific host instead of “hostgroup_name”.
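Nagios reads a plugin's exit code as the service state: 0 is OK, 1 is WARNING, 2 is CRITICAL (and 3 is UNKNOWN). A minimal sketch of how the 2h/3d thresholds above map a checkpoint's age onto those codes, using a hypothetical `Get-CheckpointAgeState` function that compares whole TimeSpan values with the same "strictly older than" test as the script:

```powershell
# Hypothetical helper: maps a checkpoint's age onto a Nagios plugin exit code.
# Whole TimeSpan values are compared, so a 1-day-1-hour age counts as 25 hours.
function Get-CheckpointAgeState {
    param(
        [Parameter(Mandatory)][TimeSpan]$Age,
        [Parameter(Mandatory)][TimeSpan]$Warning,
        [Parameter(Mandatory)][TimeSpan]$Critical
    )
    # Test critical first so an age past both thresholds reports CRITICAL.
    if ($Age -gt $Critical) { return 2 }   # CRITICAL
    if ($Age -gt $Warning)  { return 1 }   # WARNING
    return 0                               # OK
}

# Thresholds equivalent to check-checkpoint-age!2h!3d
$warning  = New-TimeSpan -Hours 2
$critical = New-TimeSpan -Days 3

Get-CheckpointAgeState -Age (New-TimeSpan -Minutes 90) -Warning $warning -Critical $critical        # 0
Get-CheckpointAgeState -Age (New-TimeSpan -Days 1 -Hours 1) -Warning $warning -Critical $critical   # 1
Get-CheckpointAgeState -Age (New-TimeSpan -Days 4) -Warning $warning -Critical $critical            # 2
```

The middle case is the one to watch: a checkpoint that is 1 day and 1 hour old is past the 2-hour warning, even though its Hours component on its own is only 1.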
You must restart Nagios to apply this configuration.
sudo service nagios checkconfig
sudo service nagios restart