A few weeks ago I went over an issue I faced when deploying a number of VMs on a Linux OS.  I did find a solution, thanks to the open nature of Linux itself; however, I promised myself to learn more about performance monitoring and to write about it.  Today I feel much more comfortable analysing and dealing with issues that come up, and this list of utilities helped me tune the performance of my systems.
Just like KVM, a virtualisation solution available directly in the Linux kernel, numerous tools exist right out of the box on many Linux distributions that help one monitor and tune performance.  When these are not enough, a simple package installation will make even more powerful tools available.
When dealing with performance issues we would typically look at CPU usage, memory consumption, and disk and network utilisation. 
The CPU is by far the fastest component in your system.  In order to make the most efficient use of the system we would need to have it at a high usage percentage (without saturating it, of course).  If we are running some heavy-load service and it is performing badly while the CPU is sitting happily at 4%, then something else must be going very wrong.  I'll go through some commands and methods that I came across that helped detect and solve some severe performance issues.
[Image: Pedal to the metal]
General overview
First things first - before digging into configuration files and whatnot, we should get a general overview of what is happening on our system.  I would generally start with some tools that provide good context, such as a list of processes and their respective resource utilisation, general memory availability and overall system load.
top
The top utility quickly gives a good indication of which processes are using too many resources.  It is good to note that, by default, top shows each process's CPU usage as a percentage of a single CPU rather than of all processing capacity; if you have 4 CPUs, a process at 99% CPU usage is consuming about 25% of all available processing power.  We'll get into this later on when we see how much each CPU is being utilised.
top can be brought up simply by typing top.  Various other parameters may be used for finer control; these are listed by running man top.  When done, hit q to exit.
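The arithmetic behind that 99% versus 25% remark can be sketched in a few lines of Python. The function name is made up for illustration, and the sketch assumes top is in its default mode, where a process's figure is relative to one CPU.

```python
def share_of_total(process_cpu_percent: float, cpu_count: int) -> float:
    """Convert a per-CPU percentage (top's default view of a process)
    into a percentage of the whole machine's processing power."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return process_cpu_percent / cpu_count


# The example from the text: one process at 99% on a 4-CPU machine.
print(share_of_total(99.0, 4))  # 24.75
```

On a real machine, os.cpu_count() from the standard library can supply the second argument.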
dstat
Similar to top, but system focused rather than process oriented.  It shows general metrics on CPU, disk, network and memory.  This utility is extremely extensible, with plugins enabling many more features, even integrating with MySQL, for example.  Despite these features, simply firing up dstat with zero arguments is enough to provide a good overview.  A favourite config of mine, which I found very useful when analysing my VM problem, highlights the top CPU consuming process as well as the blocking IO (which is typically slo-o-ow).  Additionally, I also get some nice metrics on memory usage, including buffers and caches.
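The exact invocation is not reproduced in the text above, so the following is only a guess at a combination that matches the description. It uses dstat's --top-cpu (most expensive CPU process), --top-bio (most expensive block IO process) and -m (memory, including buffers and cache) options. The helper names are hypothetical, and this is not necessarily the command I actually used.

```python
import shutil
import subprocess
from typing import List


def dstat_command(interval_seconds: int = 1) -> List[str]:
    """Build a dstat command line showing the top CPU process, the top
    block-IO process and memory usage (an assumed combination of flags)."""
    return ["dstat", "--top-cpu", "--top-bio", "-m", str(interval_seconds)]


def run_dstat(samples: int = 5) -> None:
    """Run dstat for a fixed number of samples, if it is installed."""
    if shutil.which("dstat") is None:
        raise RuntimeError("dstat is not installed on this system")
    # dstat accepts "delay count" as its trailing arguments.
    subprocess.run(dstat_command() + [str(samples)], check=True)


print(" ".join(dstat_command()))  # dstat --top-cpu --top-bio -m 1
```

Calling run_dstat() prints five one-second samples and then returns, which is handy when the output is being captured rather than watched.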
Errors
[Image: Gotta catch 'em all]
/var/log/
On Debian or Ubuntu based distributions it is normal to find application and daemon logs in this directory.  Navigating to it and listing all the files (and grepping the result) will probably yield the log files of your underperforming service.  As a simple example, one can list all the files in this log directory and look for the apt directory - a trivial command which may be easily extended.  MySQL, for example, has a slow query log file, which may be useful in case of a slow database.  It is a matter of simply opening up the log file and looking for warnings and errors to find a potential problem.
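The same list-then-filter idea can be sketched in Python, which is handy once the search needs to grow beyond a one-liner. The function names and the warning/error pattern are illustrative assumptions; real services each have their own log formats.

```python
import os
import re
from typing import Iterable, Iterator, List

# Assumed pattern: most logs mark trouble with "warn", "warning" or "error".
PROBLEM = re.compile(r"\b(?:warn(?:ing)?|error)\b", re.IGNORECASE)


def grep_names(names: Iterable[str], pattern: str) -> List[str]:
    """Keep the names containing pattern, like piping ls into grep."""
    return sorted(name for name in names if pattern in name)


def find_logs(log_dir: str, pattern: str) -> List[str]:
    """List the entries of log_dir whose names contain pattern."""
    return grep_names(os.listdir(log_dir), pattern)


def problem_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that look like warnings or errors."""
    for line in lines:
        if PROBLEM.search(line):
            yield line.rstrip("\n")
```

For instance, find_logs("/var/log", "apt") mirrors the apt example above, and passing an open log file to problem_lines() filters it line by line without loading it all into memory.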
dmesg | tail
Another great command is dmesg.  This is much like calling cat on a log file, except that it is a normal command callable from anywhere.  What dmesg does is list the log messages from the kernel (its ring buffer).  Note that daemons generally do not log here: a misconfigured nginx, for example, will report its errors in its own log files under /var/log (or in the systemd journal), whereas dmesg is where kernel-level trouble such as out-of-memory kills, disk IO errors and driver problems shows up.  To bring up the last few lines of the log (which can be very long), simply pass the output to the tail command.  The syntax is simply dmesg | tail.
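What tail does to the dmesg output is easy to mimic, which helps when the kernel log needs to be inspected from a script. This is a minimal sketch: the function names are made up, and on some systems reading the kernel log requires root.

```python
import subprocess
from collections import deque
from typing import Iterable, List


def tail(lines: Iterable[str], count: int = 10) -> List[str]:
    """Return the last count lines, like the tail command."""
    # A bounded deque discards older lines as newer ones arrive.
    return list(deque(lines, maxlen=count))


def kernel_tail(count: int = 10) -> List[str]:
    """Return the last count kernel messages (may need root privileges)."""
    output = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    ).stdout
    return tail(output.splitlines(), count)
```

kernel_tail() is deliberately not called here, since it only works on a machine where dmesg is readable.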
Detailed Analysis
[Image: The root of so many problems]
Once we have a better idea of what's malfunctioning, we can start digging deeper into the metrics. As mentioned earlier on, the general areas are the CPU, memory, and IO.  From this point, it is better to make use of another package not typically available out of the box.  I'll deal with Debian based distributions (Ubuntu, Mint, Elementary, etc.); however, such packages are available on others via their respective package managers.
The sysstat package offers numerous performance monitoring tools - install it using sudo apt-get install sysstat.
pidstat
pidstat is quite similar to top in the sense that it offers an overview of top processes and their related metrics. The main difference is that this command will keep writing to the output rather than refreshing the list - making it easier to keep outputs in a file or try to find patterns.
To invoke pidstat and keep a rolling output, simply run pidstat 1.
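Once the rolling output has been saved to a file, finding patterns can be as simple as averaging the %CPU column per command. The sketch below assumes the usual pidstat layout, where a header line names the columns (including %CPU and Command) and each data line follows the same order. The exact column set varies between sysstat versions, which is why the header is used to locate the fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def average_cpu_by_command(lines: Iterable[str]) -> Dict[str, float]:
    """Average the %CPU column of saved pidstat output, per command."""
    header: Optional[List[str]] = None
    samples = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("Average"):
            continue  # skip blank lines and the closing summary
        if "%CPU" in fields and "Command" in fields:
            header = fields  # remember the column order
            continue
        if header is None or len(fields) < len(header):
            continue  # banner line or something unexpected
        # Split at most len(header) - 1 times so the command stays whole.
        row = dict(zip(header, line.split(None, len(header) - 1)))
        try:
            samples[row["Command"].strip()].append(float(row["%CPU"]))
        except ValueError:
            continue
    return {cmd: sum(vals) / len(vals) for cmd, vals in samples.items()}
```

Feeding it the lines of a file produced with pidstat 1 > pidstat.log gives a quick ranking of which commands were busiest over the whole capture.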
free
This will not set anything free, but only show some numbers on how much memory is free. This is the one exception that is actually available out of the box rather than requiring an extra package. free displays some numbers on memory usage; however, it might be confusing to newcomers or users who are typically accustomed to total = free + used. In the case of Linux, a considerable part of memory is used by the cache when not being used by any applications. This helps the system open up files from disk much more quickly.
As a result, the free command will display another column and show that there is actually very little memory that is free. This can also be seen in the dstat output where the total RAM available is calculated by adding all 4 columns rather than just 2. The cache is quite flexible and will be cleared as soon as more memory is required by processes, meaning that practical free RAM is equal to the free + cached memory. To run this command, simply type free -m. You may pass zero parameters to get the values in kilobytes, -m for megabytes and -g for gigabytes.
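The free + cached rule of thumb can be checked directly against /proc/meminfo, which is where these figures come from. This is a rough sketch: the function names are hypothetical, and adding Buffers to the sum is my own assumption (it matches the buffers and cache columns that dstat shows).

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo style text into {name: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, separator, rest = line.partition(":")
        parts = rest.split()
        if separator and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def practical_free_kb(meminfo: Dict[str, int]) -> int:
    """Free memory plus what the kernel can reclaim from buffers and cache."""
    return meminfo["MemFree"] + meminfo.get("Buffers", 0) + meminfo.get("Cached", 0)
```

On a Linux machine, practical_free_kb(parse_meminfo(open("/proc/meminfo").read())) gives the estimate in kilobytes. Newer kernels also publish a MemAvailable line, which is the kernel's own and more careful version of the same estimate.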
mpstat
On multiprocessor systems, it is vital to monitor each processor's utilisation when things get ugly. Sometimes you may note that one CPU is handling all the work while the others are basking in its heat. This is a bad sign indicating that some process is not handling multiple processors correctly and, worse, hogging one of them to unusability. The great mpstat command will show a breakdown of the utilisation of each CPU on your system.
Similar to pidstat, this is from the sysstat package and may be used to produce a rolling output. An excellent way to relate CPU usage to slow IO is the iowait column. The lower this is the greater the efficiency, since it means that the CPU is actually doing work rather than waiting uselessly.
Using this command is simple: mpstat -P ALL 1
Do use this command when things are getting slow, since it may quickly point to either a problem in IO or simply an improperly configured application.
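To make the busy and iowait figures less magical, here is a sketch of how such per-CPU percentages can be derived from two snapshots of /proc/stat. The field order follows the kernel's per-CPU lines, while the function names and the simplified busy/iowait split are my own. mpstat itself reports more columns than this.

```python
from typing import Dict

# Order of the first eight counters on each cpuN line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_counters(stat_text: str) -> Dict[str, Dict[str, int]]:
    """Extract the per-CPU tick counters, skipping the aggregate cpu line."""
    counters = {}
    for line in stat_text.splitlines():
        parts = line.split()
        if (len(parts) > len(FIELDS) and parts[0].startswith("cpu")
                and parts[0][3:].isdigit()):
            counters[parts[0]] = dict(zip(FIELDS, map(int, parts[1:])))
    return counters


def per_cpu_percentages(before: str, after: str) -> Dict[str, Dict[str, float]]:
    """Compare two snapshots and report busy and iowait percentages per CPU."""
    first, second = read_counters(before), read_counters(after)
    result = {}
    for cpu, new in second.items():
        old = first.get(cpu)
        if old is None:
            continue
        delta = {name: new[name] - old[name] for name in FIELDS}
        total = sum(delta.values())
        if total <= 0:
            continue
        waiting = delta["idle"] + delta["iowait"]
        result[cpu] = {
            "busy": 100.0 * (total - waiting) / total,
            "iowait": 100.0 * delta["iowait"] / total,
        }
    return result
```

Reading /proc/stat twice, a second apart, and passing both texts in gives roughly what one line of mpstat -P ALL 1 reports. One CPU with a high busy figure next to several near zero is the hogging pattern described above.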
iostat
In case of slow IO identified from mpstat, iostat will provide further details on which device is functioning slowly. This utility shows which devices are being used at an instant and their utilisation. Ideal utilisation is below 60%; above that, the device is likely being saturated. This mostly applies to physical block devices - a virtual device that maps to multiple physical ones may simply be used heavily while the physical backend may be capable of handling much more load (i.e. things are working quite efficiently).
On relatively basic systems though, a high utilisation accompanied by a high iowait is very likely a case of very bad IO performance. I noted that (unsurprisingly) SSDs will drop utilisation from 99% down to about 15%. SSDs may not always be available; however, in my case I was able to map a region of memory as a filesystem. Of course, many cases will not be able (or want) to be mapped to RAM, but finding a better physical device will most probably fix issues in this metric (or, if possible, implementing efficient buffers and writing to disk on separate threads).
In order to produce a nice rolling update, just issue: iostat -xz 1
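Utilisation is simply the share of the sampling interval during which the device had IO in flight. The kernel exposes that busy time, in milliseconds, as one of the counters in /proc/diskstats. The sketch below shows the calculation and applies the 60% rule of thumb from the text. The function names are hypothetical, and the counter values would have to be read from the system.

```python
from typing import Dict, List


def utilisation_percent(busy_ms_before: int, busy_ms_after: int,
                        interval_ms: int) -> float:
    """Share of the interval during which the device was busy."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return min(100.0, 100.0 * (busy_ms_after - busy_ms_before) / interval_ms)


def likely_saturated(util_by_device: Dict[str, float],
                     threshold: float = 60.0) -> List[str]:
    """Devices at or above the threshold (60% is the rule of thumb above)."""
    return sorted(dev for dev, util in util_by_device.items() if util >= threshold)


# Using the two figures quoted in the text as sample input.
print(likely_saturated({"hdd": 99.0, "ssd": 15.0}))  # ['hdd']
```

The device names in the last line are placeholders, not real device nodes.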
sar
IO performance may also suffer on the network side. This, however, is less likely, at least in my experience, mostly because networks do not deal with any mechanical devices such as hard disks. It is also much more likely that an application is not correctly managing its network handling than that the TCP stack or network card is slow. Tools exist though that allow monitoring of network performance, one of which is sar, also available from the sysstat package.
The amount of data going through each network interface can be monitored in, again, a rolling output. This may be useful to check if the NIC is being used to its potential or if it is able to handle many more connections before getting saturated.
This can be called using sar -n DEV 1
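Whether a NIC is close to saturation comes down to comparing its throughput with the link speed. The sketch below works from two readings of an interface byte counter (such as the rx_bytes file under /sys/class/net/<interface>/statistics/). The function names are my own, and the link speed is something you have to supply.

```python
def throughput_kb_per_s(bytes_before: int, bytes_after: int,
                        interval_s: float) -> float:
    """Average throughput over the interval, in kilobytes (1024 bytes) per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (bytes_after - bytes_before) / interval_s / 1024


def link_utilisation_percent(bytes_per_s: float, link_mbit: float) -> float:
    """How much of a link of the given speed (in Mbit/s) is being used."""
    if link_mbit <= 0:
        raise ValueError("link_mbit must be positive")
    return 100.0 * bytes_per_s * 8 / (link_mbit * 1_000_000)


# 12.5 million bytes per second completely fills a 100 Mbit/s link.
print(link_utilisation_percent(12_500_000, 100))  # 100.0
```

A figure well below 100 means the interface still has headroom, which is what the per-interface output of sar -n DEV 1 lets you judge at a glance.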
 














