Although VMware hosts are usually highly reliable, things can and sometimes do go wrong. When this happens, it is important to have troubleshooting tools that can help you quickly resolve the issue, and this is where VMware’s Esxtop shines. This article runs down how using Esxtop to collect performance statistics can help you solve production issues.
Esxtop is a command-line tool that is natively included on your VMware hosts. Here we will demonstrate how to troubleshoot with Esxtop. To get started, connect an SSH session to the host server that you wish to examine. PuTTY works well for this purpose, but there are other tools available as well.
Once you have logged in, the first thing that you will need to do is to retrieve the VMIDs for your virtual machines. The Esxtop utility identifies VMs by their VMID, so creating a list of VMIDs ahead of time will help you to better understand the information that is provided to you. The easiest way to retrieve the VMIDs is to use this command:
vim-cmd vmsvc/getallvms
You can see what this looks like here:
The first column displays the VMIDs of the virtual machines on this host.
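If you would rather keep that list in a script than copy it by hand, the command’s output can be parsed into a lookup table. The following is a minimal Python sketch rather than anything built into the host tools: the sample text is a hypothetical, abbreviated example of the output, and parse_vmids is a name invented for illustration. It assumes that each data row starts with the numeric VMID and that the file path begins with a bracketed datastore name.

```python
import re

# Hypothetical sample of `vim-cmd vmsvc/getallvms` output; real names and paths will differ.
SAMPLE = """\
Vmid   Name          File                              Guest OS          Version   Annotation
1      web01         [datastore1] web01/web01.vmx      ubuntu64Guest     vmx-13
12     db server     [datastore1] db server/db.vmx     windows9_64Guest  vmx-13
"""

# A data row starts with the numeric VMID, then the VM name, then the "[datastore]" path.
ROW = re.compile(r"^(\d+)\s+(.+?)\s+\[")


def parse_vmids(text):
    """Return a {vmid: name} dict built from getallvms-style output."""
    vms = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # the header line does not start with a digit, so it is skipped
            vms[int(match.group(1))] = match.group(2)
    return vms


print(parse_vmids(SAMPLE))
```

Anchoring on the bracketed datastore name is what lets the sketch cope with VM names that contain spaces.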
Now, enter the esxtop command to access the Esxtop interface shown here:
This is the information that is displayed when you enter the esxtop command.
As you can see in the screen above, Esxtop provides a wealth of information about the host’s workload. Although this information might initially seem somewhat convoluted, it can be used to help track down performance issues.
CPU Load
The first lines of the Esxtop display summarize the host’s CPU load, as shown above, and the rows beneath them report each VM’s CPU usage with multiple metrics. You will notice that the first line concludes with a statement of CPU load averages, followed by three numbers (0.01, 0.05, and 0.15). These are the load averages over the last one minute, five minutes, and fifteen minutes. The display itself refreshes every five seconds by default.
A load average of 1.00 means that the host’s physical CPUs are fully utilized. Lower values mean that the CPUs have spare capacity, while higher values mean that the CPUs are being overutilized. If the load average reaches 2.00, it means that the CPUs are seriously overloaded and that you need to either upgrade your host’s hardware or move some VMs to another host.
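To make those thresholds concrete, here is a small Python sketch that labels a load average using the cut-off points given above (1.00 and 2.00). The function name and the wording of the labels are illustrative choices, not something Esxtop itself reports.

```python
def describe_cpu_load(load_average):
    """Map one load average to the rough categories used in the text."""
    if load_average >= 2.0:
        return "seriously overloaded"
    if load_average > 1.0:
        return "overutilized"
    return "at or below capacity"


# The three values from the screen described above.
for value in (0.01, 0.05, 0.15):
    print(value, describe_cpu_load(value))
```

All three values from the lab host fall into the lowest category, which matches its very light workload.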
The VMware host displayed above is a lab machine with very low CPU usage, but if the load averages had been excessively high, the next logical question would be which VMs are consuming the most CPU resources.
The easiest way to determine which VMs are currently using the most CPU resources is to look at the %USED column. This column reflects the percentage of the host’s physical CPU resources that a virtual CPU is using. Looking at the %RDY (ready) column can also be telling. This column reflects the percentage of time during which the vCPU was ready to run but had to wait for physical CPU resources to become available. Ideally, the ready value should never exceed 5%.
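The two checks described above (who is using the most CPU, and who is waiting too long for it) are easy to express in code once you have the numbers. The Python sketch below works on a few hypothetical readings typed in by hand; the VM names, the values, and the dictionary keys are all made up for illustration, and only the 5% ready limit comes from the text.

```python
READY_LIMIT_PCT = 5.0  # the ready-time ceiling suggested in the text

# Hypothetical readings copied by hand from the CPU screen.
samples = [
    {"name": "web01", "used_pct": 12.4, "ready_pct": 0.3},
    {"name": "db01", "used_pct": 88.1, "ready_pct": 7.9},
    {"name": "build01", "used_pct": 45.0, "ready_pct": 5.0},
]


def busiest_first(rows):
    """Sort readings so the heaviest CPU consumers come first."""
    return sorted(rows, key=lambda row: row["used_pct"], reverse=True)


def ready_offenders(rows, limit=READY_LIMIT_PCT):
    """Return the names of VMs whose ready time exceeds the limit."""
    return [row["name"] for row in rows if row["ready_pct"] > limit]


print([row["name"] for row in busiest_first(samples)])
print(ready_offenders(samples))
```

A VM sitting exactly at the limit, like build01 here, is not flagged because the comparison is strictly greater than 5%.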
Switching Modes
Esxtop is able to display resource usage data for more than just CPU resources. If you press the h key, you will be taken to a help menu that lists the various commands that Esxtop supports. If you look at the bottom of the next screengrab, you can see a section labeled Switch Displays. The commands shown in this section can be used to look at other types of performance metrics. For example, pressing m displays memory data. Similarly, pressing n displays networking data. Note that the keys are case-sensitive.
The help menu lists the Esxtop tool’s various modes.
Memory
Press m to access the host’s memory display. This screen shows you the current virtual machine memory size (MEMSZ), as well as how much memory has been granted to each VM (GRANT). You can also see how much swap memory is currently being used (SWCUR), as shown below:
The Esxtop tool provides memory usage statistics.
The main things to check for on the memory display are memory depletion and excessive swapping. While some swapping can be expected on a heavily loaded host, excessive swapping indicates that the host doesn’t have enough memory, and can also lead to performance problems. In this type of situation, you should add more memory to the host or migrate some of the VMs to another host.
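What counts as “excessive” swapping is a judgment call, but comparing SWCUR with MEMSZ gives you a quick way to rank VMs. The Python sketch below does that for a few hypothetical readings; the VM names and values are invented, and the 5% cut-off is an arbitrary assumption chosen only for the example, not a VMware recommendation.

```python
# Hypothetical readings in MB, copied by hand from the memory screen.
memory_rows = [
    {"name": "web01", "memsz_mb": 4096, "swcur_mb": 0},
    {"name": "db01", "memsz_mb": 8192, "swcur_mb": 1024},
    {"name": "build01", "memsz_mb": 2048, "swcur_mb": 64},
]


def heavy_swappers(rows, max_swap_fraction=0.05):
    """Return (name, fraction) for VMs whose SWCUR exceeds the chosen share of MEMSZ."""
    flagged = []
    for row in rows:
        if row["memsz_mb"] <= 0:
            continue  # skip rows without a usable memory size
        fraction = row["swcur_mb"] / row["memsz_mb"]
        if fraction > max_swap_fraction:  # the threshold is an assumption, adjust to taste
            flagged.append((row["name"], fraction))
    return flagged


print(heavy_swappers(memory_rows))
```

In this made-up data set only db01 is flagged, with an eighth of its configured memory currently swapped.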
Disks
Storage performance issues have always been hard for vSphere admins to deal with, as they have a high impact on virtual machine performance and can be tricky to track down. High storage latency makes virtual machines sluggish and harms the performance of the applications running in them.
The issue can lie anywhere in the IO path from the virtual machine itself to the disks in the storage array: the server HBA, SAN switch ports, switch load, storage array controllers, RAID type, or disk speed. Add to those design choices such as the sizing and number of LUNs, path selection policies, and the number of virtual machines per datastore, and you end up with an overwhelming number of possibilities when it comes to finding the root cause of a VM storage performance issue.
Each of these potential causes has a specific metric or a specific way to tell whether a value is too high or too low. For instance, you want high bandwidth and IOPS but low latency. Of course, not all of them can be observed in vSphere, as some are only available in the virtual machine itself or on the storage array through its own troubleshooting tools. However, you can already get a good amount of information solely from Esxtop, as we are about to see. We will especially look at latency metrics, as these make up the bulk of storage performance issues in virtualized environments (not only vSphere, mind you).
Note that there are three different types of disk visualizations in Esxtop: d for disk adapters (storage controllers), u for disk devices (volumes), and v for disk VMs.
Starting with v, you will get details about each VM’s disks. You will get insights into the latency a VM is observing on read (LAT/rd) and write (LAT/wr) operations, as well as bandwidth metrics (MBREAD/s and MBWRTN/s) and IOPS (CMDS/s, READS/s and WRITES/s).
Esxtop gives VM performance details with v
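The bandwidth and IOPS counters on this screen can be combined into one more number that is often useful when talking to a storage team: the average size of each IO. The Python sketch below is a simple illustration of that arithmetic. The function name and the sample figures are made up, and it assumes that 1 MB equals 1024 KB.

```python
def average_io_size_kb(mb_per_second, ops_per_second):
    """Average IO size in KB from a bandwidth counter and the matching IOPS counter."""
    if ops_per_second <= 0:
        return 0.0  # no IO during the sample, so there is no meaningful average
    return mb_per_second * 1024 / ops_per_second


# Hypothetical example: MBREAD/s of 50 with READS/s of 800.
print(average_io_size_kb(50, 800))  # 64.0
```

Pair MBREAD/s with READS/s and MBWRTN/s with WRITES/s. CMDS/s is not a good divisor because it also counts commands that move no data.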
Then pressing d will show information for each disk adapter or vmhba. This is particularly useful for identifying bottlenecks on your server and will make it quite obvious if something is wrong with a specific HBA. You get similar metrics as above, with more detailed latency data (the xAVG/cmd columns), which are explained in the table further down.
Esxtop gives disk adapter performance details with d
Finally, pressing u gets you valuable data for each volume or LUN that is presented to the host. This one is probably my go-to display, as performance issues are usually observed at the volume or array level, and this view lets you narrow the problem down to a specific volume. Here you get data about queue length (which belongs to a more advanced level of troubleshooting), but the display also contains the latency metrics in the xAVG/cmd columns.
Esxtop gives volume performance details with u
CMDS/s | This is the total number of commands per second and includes IOPS and other SCSI commands such as SCSI reservations, locks, vendor string requests, unit attention commands etc. being sent to or coming from the device or virtual machine being monitored. |
DAVG/cmd | Average latency in milliseconds per command being sent to the volume (the device side of the IO path). |
KAVG/cmd | Average latency in milliseconds per command caused by the host’s VMkernel. |
GAVG/cmd | Latency as it is observed by the guest OS in the VM. This number is calculated with the formula: DAVG + KAVG = GAVG |
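The relationship between these three latency counters makes it possible to tell which side of the IO path deserves attention. The Python sketch below applies the formula from the table and then compares each component with a limit. The limits used here (25 ms for DAVG and 2 ms for KAVG) are commonly quoted rules of thumb and are an assumption of this example, not values taken from this article, and the function names are illustrative.

```python
def guest_latency_ms(davg_ms, kavg_ms):
    """GAVG as defined in the table: device latency plus VMkernel latency."""
    return davg_ms + kavg_ms


def likely_sources(davg_ms, kavg_ms, davg_limit_ms=25.0, kavg_limit_ms=2.0):
    """List the parts of the IO path whose latency exceeds the assumed limits."""
    sources = []
    if davg_ms > davg_limit_ms:
        sources.append("device or array")
    if kavg_ms > kavg_limit_ms:
        sources.append("VMkernel (host queuing)")
    return sources


# Hypothetical readings in milliseconds.
print(guest_latency_ms(18.0, 0.5))
print(likely_sources(30.0, 0.5))
print(likely_sources(3.0, 4.0))
```

A high DAVG points away from the host toward the HBA, fabric, or array, while a high KAVG suggests that commands are queuing inside the host before they are ever sent.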
Getting More Data
Finally, one of the most important things to know about Esxtop is that the displays are highly customizable. Simply press the f key and you will see a list of columns that Esxtop can display for the current mode. The next screengrab, for example, shows the columns that are available for the tool’s memory mode. The columns with an asterisk next to them are currently enabled, while the others are disabled. Pressing the letter associated with a column toggles it: an enabled column is hidden and a disabled one is displayed.
Esxtop allows you to toggle columns on and off.
Conclusion
In a world filled with dashboards and UI, sometimes all you need is a command-line tool that provides you instantly with the details you need. Esxtop provides those managing VMware hosts with access to valuable and often insightful information that can be used to help make critical decisions.
Duncan Epping wrote a blog post over a decade ago, still relevant to this day, in which he details the metrics in Esxtop. The post has been updated over time to keep pace with changes in vSphere.