03 December, 2015

Linux Performance Monitoring and Tuning

A few weeks ago I wrote about an issue I faced when deploying a number of VMs on a Linux host.  I did find a solution, thanks largely to the open nature of Linux itself, but I promised myself I would learn more about performance monitoring and write about it.  Today I feel much more comfortable analysing and dealing with issues that come up, and this list of utilities helped me tune the performance of my systems.

Just like KVM, the virtualisation solution built directly into the Linux kernel, numerous tools that help monitor and tune performance exist right out of the box on many Linux distributions.  When these are not enough, a simple package installation will make even more powerful tools available.

When dealing with performance issues we would typically look at CPU usage, memory consumption, and disk and network utilisation. 

The CPU is by far the fastest component in your system.  In order to make the most efficient use of the system we would need to have it at a high usage percentage (without saturating it, of course).  If we are running some heavy-load service and it is performing badly while the CPU is sitting happily at 4%, then something else must be going very wrong.  I'll go through some commands and methods that I came across that helped detect and solve some severe performance issues.

Pedal to the metal

General overview

First things first - before digging into configuration files and whatnot, we should get a general overview of what is happening on our system.  I would generally start with some tools that provide good context: a list of processes and their respective resource utilisation, general memory availability and overall system load.

top

The top utility quickly gives a good indication of which processes are using too many resources.  It is good to note that top shows CPU usage as a percentage of a single CPU rather than of all processing capabilities; if you have 4 CPUs, 99% CPU usage means that the process is consuming about 25% of all available processing power.  We'll get into this later on when we see how much each CPU is being utilised.
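The per-core arithmetic above can be sketched in a couple of lines of Python (a toy helper of my own for illustration, not anything that ships with top):

```python
def system_wide_percent(process_percent: float, n_cpus: int) -> float:
    """Convert top's per-core CPU percentage (where 100% = one core)
    into a share of the machine's total processing power."""
    return process_percent / n_cpus

# A process showing 99% in top on a 4-CPU box uses ~25% of the machine.
print(system_wide_percent(99.0, 4))  # 24.75
```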

Top can be brought up by simply typing top.  Various other parameters may be used for finer control; these can be seen by running man top.  When done, hit q to exit.

dstat

Similar to top, but system-focused rather than process-oriented.  It shows general metrics on CPU, disk, network and memory.  This utility is extremely extensible, with plugins enabling many more features, even integrating with MySQL, for example.  Despite these features, simply firing up dstat with zero arguments is enough to provide a good overview.  A favourite configuration of mine, which I found very useful when analysing my VM problem, highlights the top CPU-consuming process as well as blocking IO (which is typically slo-o-ow).  Additionally, I also get some nice metrics on memory usage, including buffers and caches.


Gotta catch 'em all
Performance may not always suffer because the hardware is unable to keep up.  Application errors may be causing software to underperform while giving no clear indication that something is going wrong.  Applications will typically log any errors or problems they encounter, but where are these logs stored?

/var/log

On Debian or Ubuntu based distributions it is normal to find application and daemon logs in this directory.  Navigating to it and listing all the files (and grepping the result) will probably yield the log files of your underperforming service.  In this example, I simply list all the files in this log directory and look for the apt directory - a trivial command which may be easily extended.  MySQL, for example, has a slow query log file, which may be useful in the case of a slow database.  It is then a matter of simply opening up the log file and looking for warnings and errors to find a potential problem.
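The "look for warnings and errors" step can be sketched in a few lines of Python; the function name, keywords and sample log below are purely illustrative, assuming plain-text logs like those under /var/log:

```python
import re

def find_problems(log_text, keywords=("error", "warning", "fail")):
    """Return the lines of a log mentioning any of the given keywords,
    case-insensitively -- a quick first pass over a suspect log file."""
    pattern = re.compile("|".join(keywords), re.IGNORECASE)
    return [line for line in log_text.splitlines() if pattern.search(line)]

sample = (
    "2015-12-03 10:01:02 starting up\n"
    "2015-12-03 10:01:05 WARNING: slow query took 4.2s\n"
    "2015-12-03 10:01:07 request served\n"
    "2015-12-03 10:01:09 ERROR: connection refused\n"
)
for line in find_problems(sample):
    print(line)  # prints the WARNING and ERROR lines only
```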

dmesg | tail

Another great command is dmesg.  This is just like calling cat on a log file, except that it is callable from anywhere just like a normal command.  What dmesg does is list the log messages from the kernel.  Daemons will typically log messages that are accessible from this command.  For example, if we misconfigure nginx and try to launch the service, any errors will be logged here.  To bring up the last few lines of the log (which can be very long), simply pipe the output to the tail command.  The syntax is simply dmesg | tail.

Detailed Analysis

Once we have a better idea of what's malfunctioning, we can start digging deeper into the metrics.  As mentioned earlier on, the general areas are the CPU, memory, and IO.  From this point, it is better to make use of another package not typically available out of the box.  I'll deal with Debian-based distributions (Ubuntu, Mint, Elementary, etc.), however such packages are available on others via their respective package managers.

The sysstat package offers numerous performance monitoring tools - install it using sudo apt-get install sysstat.

pidstat 1

pidstat is quite similar to top in the sense that it offers an overview of the top processes and their related metrics.  The main difference is that this command keeps appending to the output rather than refreshing the list - making it easier to record the output to a file or to spot patterns.

To invoke pidstat and keep a rolling output, simply run pidstat 1.

free -m

This will not set anything free, but only show some numbers on how much memory is free.  It is the one exception here that is actually available out of the box rather than requiring an extra package.  free displays some numbers on memory usage, however it might be confusing to newcomers or users who are accustomed to total = free + used.  In the case of Linux, a considerable part of memory is used by the cache when it is not needed by any applications.  This helps the system open up files from disk much more quickly.

As a result, the free command will display another column and show that there is actually very little memory that is strictly free.  This can also be seen in the dstat output, where the total RAM available is calculated by adding all 4 columns rather than just 2.  The cache is quite flexible and will be cleared as soon as more memory is required by processes, meaning that the practically free RAM is equal to the free + cached memory.  To run this command, simply type free -m.  With zero parameters the values are in kilobytes; -m gives megabytes and -g gigabytes.
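The free + cached rule can be checked directly against /proc/meminfo, which is where free gets its numbers from.  A minimal sketch (the function names are mine; the field names are those of /proc/meminfo, and the sample values are fabricated):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:   12345 kB' lines into a dict of kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key] = int(rest.split()[0])
    return info

def practically_free_kb(info):
    """Memory the system can hand to applications: truly free pages plus
    buffers and page cache, which the kernel drops under memory pressure."""
    return info["MemFree"] + info["Buffers"] + info["Cached"]

sample = (
    "MemTotal:        4048580 kB\n"
    "MemFree:          145920 kB\n"
    "Buffers:          203004 kB\n"
    "Cached:          2514884 kB\n"
)
info = parse_meminfo(sample)
print(practically_free_kb(info))  # 2863808
```

Here only ~145 MB is strictly free, yet roughly 2.8 GB is practically available.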

mpstat -P ALL 1

On multiprocessor systems, it is vital to monitor each processor's utilisation when things get ugly.  Sometimes you may note that one CPU is handling all the work while the others are basking in its heat.  This is a bad sign, indicating that some process is not making use of multiple processors correctly and, worse, is hogging one of them to the point of unusability.  The great mpstat command will show a breakdown of the utilisation of each CPU on your system.

Similar to pidstat, this is from the sysstat package and may be used to produce a rolling output.  An excellent way to relate CPU usage to slow IO is the iowait column.  The lower this is the greater the efficiency, since it means that the CPU is actually doing work rather than waiting uselessly.

Using this command is simple: mpstat -P ALL 1

Do use this command when things are getting slow, since it may quickly point to either a problem in IO or simply an improperly configured application.
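mpstat derives its per-CPU numbers from the jiffy counters in /proc/stat; the iowait share can be computed from two snapshots of a cpuN line.  A hedged sketch, assuming the field order documented in proc(5) (user, nice, system, idle, iowait, ...) and using fabricated snapshots:

```python
def iowait_percent(before, after):
    """Given two snapshots of the numeric fields of a /proc/stat cpuN line
    (user nice system idle iowait irq softirq ...), return the share of
    elapsed time that CPU spent waiting on I/O, as a percentage."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter

# Two fabricated snapshots one second apart: 40 of 100 jiffies in iowait.
t0 = [1000, 0, 500, 3000, 200, 10, 10]
t1 = [1020, 0, 510, 3030, 240, 10, 10]
print(iowait_percent(t0, t1))  # 40.0
```

A CPU spending 40% of its time in iowait is doing far less useful work than its raw utilisation suggests.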

iostat -xz 1

In the case of slow IO identified from mpstat, iostat will provide further details on which device is performing slowly.  This utility shows which devices are being used at any instant and their utilisation.  Ideal utilisation is below 60%, otherwise it is likely that the device is being saturated.  This mostly applies to physical block devices - a virtual device that maps to multiple physical ones may simply be used heavily while the physical backend may be capable of handling much more load (i.e. things are working quite efficiently).

On relatively basic systems though, a high utilisation accompanied by a high iowait is very likely a case of very bad IO performance.  I noted that (unsurprisingly) SSDs will drop utilisation from 99% down to about 15%.  SSDs may not always be available, however in my case I was able to map a region of memory as a filesystem.  Of course, many workloads cannot (or should not) be mapped to RAM, but finding a better physical device will most probably fix issues in this metric (or, if possible, implementing efficient buffers and writing to disk on separate threads).

In order to produce a nice rolling update, just issue: iostat -xz 1
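The %util column is essentially the "time spent doing I/O" counter from /proc/diskstats sampled over an interval.  A rough sketch of that idea, with made-up counter values (the helper is mine, not part of iostat):

```python
def disk_util_percent(busy_ms_before, busy_ms_after, interval_ms):
    """Fraction of the sampling interval during which the device had I/O
    in flight -- the same idea as iostat's %util column, computed from
    two samples of the per-device 'ms doing I/O' counter."""
    return 100.0 * (busy_ms_after - busy_ms_before) / interval_ms

# Device was busy for 990 ms out of a 1000 ms window: effectively saturated.
print(disk_util_percent(12000, 12990, 1000))  # 99.0
```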

sar -n DEV 1

IO performance may also suffer on the network side.  This is less likely, at least from my experience, mostly because networks do not deal with any mechanical devices such as hard disks.  It is also much more likely that an application is not correctly managing its network handling than that a slow TCP stack or network card is at fault.  Tools exist, though, that allow monitoring of network performance, one of which is sar, also available from the sysstat package.

The amount of data going through each network interface can be monitored in, again, a rolling output.  This may be useful to check if the NIC is being used to its potential or if it is able to handle many more connections before getting saturated.

This can be called using sar -n DEV 1
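Under the hood, sar -n DEV samples byte counters like those exposed in /proc/net/dev; per-interface throughput is just the counter delta over the interval.  A sketch with fabricated counter values (the function is illustrative, not sar's implementation):

```python
def throughput_bytes_per_sec(rx_before, rx_after, interval_s):
    """Receive throughput from two samples of an interface's rx byte
    counter (as found in /proc/net/dev) -- the quantity behind sar's
    rxkB/s column, here left in bytes per second."""
    return (rx_after - rx_before) / interval_s

# 1 MiB received over a 1-second window.
print(throughput_bytes_per_sec(5000000, 6048576, 1.0))  # 1048576.0
```

Comparing this figure against the NIC's rated capacity shows how much headroom is left before the link saturates.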


This post was probably no revelation for many, however it can be a good starting point if you're feeling lost in a world of tools and sometimes weird commands.  Linux offers numerous metrics which many utilities use to provide a good picture of the system's efficiency.  This list of utilities is by no means exhaustive - it is simply a collection of utilities that I used along the way and found useful when performing load testing on various servers.  Feel free to comment on any other tools, suggestions or even corrections.