Fast. Very fast. That’s what we had in mind when we designed SexiGraf. When you need vSphere metrics, the obvious way is the PerformanceManager, but we needed something faster, so we chose managed object properties and quickstats like ResourcePoolQuickStats. If we have no other choice, we fall back to the PerformanceManager, but we only query the last 15 samples of the RealTime samplingPeriod since we pull vSphere metrics every 5 minutes.
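To see where the 15 comes from, here is a minimal sketch. It assumes the vSphere real-time sampling interval of 20 seconds; the function names are hypothetical and are not SexiGraf code.

```python
# Assumption: vSphere real-time stats are sampled every 20 seconds.
REALTIME_INTERVAL_S = 20


def samples_needed(pull_interval_s, sample_interval_s=REALTIME_INTERVAL_S):
    """Number of most recent samples that cover one pull interval."""
    # Ceiling division: a partially covered slot still needs a sample.
    return -(-pull_interval_s // sample_interval_s)


def average_last_window(samples, pull_interval_s=300):
    """Average only the newest samples covering the pull interval."""
    window = samples[-samples_needed(pull_interval_s):]
    return sum(window) / len(window) if window else None


print(samples_needed(300))  # 15
```

With a 5 minute (300 second) pull, 15 samples of 20 seconds cover exactly the time elapsed since the previous pull, so nothing is fetched twice.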
The Cluster FullStats dashboard offers you a single pane of glass for your cluster performance and usage metrics. You will find simple information like CPU/RAM usage or vmnic traffic, but also complex graphs like memory quickstats (inspired by the vCenter Cluster utilization tab) or the distributed fairness graph, which shows the fairness quickstats of distributed CPU/RAM resource allocation across the cluster hosts.
In the IORM row, you’ll find the average latency and the IOPS sum of the cluster shared datastores. The metrics used here were initially introduced by Storage I/O Control, but since vSphere 5.1 you can choose to only collect the stats without enabling SIOC. We encourage you to enable the stats collection since those counters are really mission critical. Besides, the latency metric has a resolution of 1 microsecond, where the legacy latency metrics have a resolution of 1 millisecond, and it is normalized, meaning it reflects cluster wide latency.
If the stats collection is not enabled, we fall back to the legacy read/write metrics from a random host in the cluster. Unfortunately, in fallback mode you won’t have the aggregated IOPS metric since collecting metrics from all hosts would be very time consuming and we want to keep SexiGraf very fast!
All the metrics you will find here are aggregated to give you a full cluster level experience. For example, the shared datastores utilization graph aggregates all the multipleHostAccess datastores in the cluster, so you won’t see local storage here, as we hope it does not participate in your VM storage.
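The shared datastore aggregation can be sketched as follows. This is an illustration only: the `Datastore` class and its field names are hypothetical stand-ins for the capacity, free space and multipleHostAccess values exposed in the vSphere datastore summary.

```python
from dataclasses import dataclass


@dataclass
class Datastore:
    # Hypothetical container; fields mirror the vSphere datastore summary.
    name: str
    capacity_bytes: int
    free_bytes: int
    multiple_host_access: bool


def shared_utilization_pct(datastores):
    """Utilization of all shared datastores taken together, in percent."""
    # Local datastores (single host access) are left out on purpose.
    shared = [d for d in datastores if d.multiple_host_access]
    capacity = sum(d.capacity_bytes for d in shared)
    if capacity == 0:
        return None  # no shared storage to report on
    used = sum(d.capacity_bytes - d.free_bytes for d in shared)
    return 100.0 * used / capacity
```

Summing used and total space before dividing weights each datastore by its size, which is usually what you want for a cluster level figure (averaging per-datastore percentages would let a tiny datastore skew the result).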
If you need to aggregate clusters, you want to use the Multi Cluster FullStats dashboard and select the clusters you want to participate in the graphs.
The ESX FullStats dashboard is similar to the Cluster FullStats but for standalone ESX servers. Because we focus on ESX resources here, we did not aggregate the datastore and vmnic metrics. You’ll find a graph for every single one of them, but you can select which ones are displayed if you don’t want them all.
As in Cluster FullStats, you’ll be able to track memory overcommit (i.e. TPS) in the memory quickstats graph, but also the impact of CPU power management in the CPU utilization/demand graph when demand exceeds usage.
Multi Cluster Capacity Planning
“How many more VMs can we deploy on those clusters?” Your boss has probably asked you this one a dozen times. Those days are over. This great dashboard shows you how many VMs are running and how many you have left, based on compute and storage consumption for each cluster.
Want to compare only 3 of them? Just select them in the list. Do you have different SLAs based on overcommit ratios or replication on your DR site? Use the scale variable to change the compute fill ratio.
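For intuition only, here is one way such an estimate could be computed. This is an assumption, not the dashboard's actual formula: it supposes the average footprint of the running VMs predicts the footprint of future ones, and that the scale variable is a ratio applied to compute capacity only. All names are hypothetical.

```python
import math


def vms_left_for_resource(vm_count, used, capacity, scale=1.0):
    """Remaining VMs for one resource, based on the average VM footprint."""
    if vm_count <= 0 or used <= 0:
        return None  # no running VMs, so no footprint to extrapolate from
    headroom = capacity * scale - used
    # headroom / (used / vm_count), written to limit rounding error
    return max(0, math.floor(headroom * vm_count / used))


def cluster_vms_left(vm_count, resources, compute_scale=1.0):
    """resources maps a name to (used, capacity); the tightest one wins."""
    estimates = []
    for name, (used, capacity) in resources.items():
        # Assumption: the scale only affects compute (CPU and RAM).
        scale = compute_scale if name in ("cpu", "ram") else 1.0
        estimate = vms_left_for_resource(vm_count, used, capacity, scale)
        if estimate is not None:
            estimates.append(estimate)
    return min(estimates) if estimates else None
```

For example, 100 running VMs using 40% of CPU, 50% of RAM and 25% of storage leave room for about 100 more, because RAM runs out first.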
Multi Cluster CPU/RAM Utilization
The CPU/RAM Utilization dashboard lets you compare the compute metrics between clusters. Why? Because in one second you’ll be able to catch a host you forgot in maintenance mode in a cluster, since the effective metric will be much lower than the total metric. You’ll also be able to see the impact of the power management policy you set on your servers with the demand and usage metrics. You may also notice how low the guest usage (active) is compared to the private metric (consumed). Or simply check how well balanced your clusters are.
Cluster IORM Stats
The IORM (aka SIOC) Stats dashboard lets you compare datastore Storage I/O Control latency and IOPS metrics among clusters. See the Cluster FullStats dashboard for details.
If you need to aggregate clusters, you want to use the Multi Cluster IORM Stats dashboard and select the clusters you want to participate in the graphs.
Multi Cluster Usage
In this dashboard, each cluster is a colored box displaying the most constrained resource among CPU, RAM and storage. For example, when a cluster is using 33% of CPU, 58% of RAM and 40% of the shared storage, 58% will be displayed. If another cluster uses 21% of CPU, 45% of RAM and 76% of the shared storage, 76% will be displayed. The box is green between 1% and 65%, yellow between 65% and 80%, red above 80% and white under 1%, so you can’t miss anything in your infrastructure.
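The rule boils down to a maximum followed by a threshold lookup. Here is a minimal sketch with hypothetical function names; how the exact boundary values (65% and 80%) are colored is an assumption, since the ranges above overlap at those points.

```python
def most_constrained(cpu_pct, ram_pct, storage_pct):
    """The value shown in the box is the highest of the three usages."""
    return max(cpu_pct, ram_pct, storage_pct)


def box_color(pct):
    """Map a usage percentage to the box color."""
    if pct < 1:
        return "white"
    if pct <= 65:   # assumption: exactly 65% is still green
        return "green"
    if pct <= 80:   # assumption: exactly 80% is still yellow
        return "yellow"
    return "red"


# The two examples from the text.
for usage in [(33, 58, 40), (21, 45, 76)]:
    value = most_constrained(*usage)
    print(value, box_color(value))
```

The first cluster shows 58 in a green box and the second shows 76 in a yellow box.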
The FlambX (pronounced Flambi-X) dashboard is not actually for you but rather for managers who love BIG numbers. But you’ll probably like it too. It basically shows your infrastructure horsepower.
Multi Cluster QuickStats
If you need to compare the basic compute and storage metrics between your clusters, this is the dashboard you’re looking for.
If you need to compare standalone ESX, it’s the Multi ESX QuickStats.
Multi Cluster vMotion
The vMotion frequency is a very useful indicator of the cluster compute resources availability for your VMs. If you witness a lot of vMotions in one of your clusters, you may want to evacuate VMs, add resources or change the DRS migration threshold to a more restrictive level.
You got a mail from your boss asking you for a storage report to plan the next storage array migration? Just go to the Multi Datastore dashboard, pick the target clusters and send him a screenshot. It only took you 15 seconds. Or maybe you’re just curious about the overall storage overcommit of your VDI infrastructure. The big graph at the top is the aggregation of all the selected datastores of the selected clusters, which are displayed individually in the small graph at the bottom.
This dashboard allows you to compare the selected vmnics of the selected ESX hosts in your favorite cluster. If this cluster is contained in a blade enclosure, you’re now able to check what is going in and out of your chassis. Notice the Cacti style of the graphs?
Cluster Multi ESX LiteStats
Since we had some spare stats, we used them in a light host centric dashboard so you can have a quick look at the CPU/RAM usage and uptime of ALL your hosts (cluster members and standalone).
Multi Cluster Top N VM Stats
Starting from SexiGraf 0.99b you can monitor the top N VM quickstats* (1 to 20 VMs per graph). Now you can find the bad *sses of your datacenters!
Of course you can also search and pick VMs of your choice to compare their stats. The legend is formatted as follows to help you locate any tango: <cluster_name>.<vm_name>
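As an illustration of a top N selection with that legend format, here is a short sketch. The function and the shape of the input data are hypothetical; only the 1 to 20 range and the `<cluster_name>.<vm_name>` legend come from the dashboard description.

```python
import heapq


def top_n_vms(vm_stats, n=5):
    """Return the n highest values as (legend, value) pairs.

    vm_stats maps (cluster_name, vm_name) to a metric value.
    """
    if not 1 <= n <= 20:
        raise ValueError("n must be between 1 and 20")
    top = heapq.nlargest(n, vm_stats.items(), key=lambda item: item[1])
    # Legend format: <cluster_name>.<vm_name>
    return [(f"{cluster}.{vm}", value) for (cluster, vm), value in top]
```

`heapq.nlargest` avoids sorting the whole inventory when you only need a handful of VMs out of thousands.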
Would you rather have a top 5 VMs per cluster? We did that too 😉
Check out the Multi Cluster Top N VM QuickStats dashboard and simply slide over to reveal the usual suspects:
Multi Cluster Top N VM Overcommit
We could have added those graphs to the Top N VM Stats dashboard but we wanted to keep the bad and the ugly apart from the good. Since SexiGraf 0.99b you can also monitor the top N overcommitted* VMs (1 to 20 VMs per graph).
This dashboard will help you identify situations like memory limits, VMs that were in contention in the past and still have zipped/swapped pages, or idle tax.
* The graph names match the counter names of the Performance Manager to let you dig into the details if you want to.