James Farrugia's Blog: 2015

03 December, 2015

Linux Performance Monitoring and Tuning

A few weeks ago I went over an issue I faced when deploying a number of VMs on a Linux OS. I did find a solution, thanks to the open nature of Linux itself, but I promised myself to learn more about performance monitoring and to write about it. Today I feel much more comfortable analysing and dealing with issues that come up, and this list of utilities helped me tune the performance of my systems.

Just like KVM, a virtualisation solution available directly on the Linux Kernel, numerous tools exist right out of the box on many Linux distributions that help one monitor and tune performance. When these are not enough, a simple package installation will make available even more powerful tools.

When dealing with performance issues we would typically look at CPU usage, memory consumption, and disk and network utilisation.

The CPU is by far the fastest component in your system. In order to make the most efficient use of the system we would need to have it at a high usage percentage (without saturating it, of course). If we are running some heavy load service and it is performing badly while the CPU is sitting happily at 4%, then something else must be going very wrong. I'll go through some commands and methods that I came across that helped detect and solve some severe performance issues.

Pedal to the metal

General overview

First things first - before digging into configuration files and what not, we should get a general overview of what is happening on our system. I would generally start with some tools that provide a good context of the processes, such as a list of processes and respective resource utilisation, general memory availability and overall system load.

top

The top utility quickly gives a good indication of which processes are using too many resources. It is good to note that, by default, top shows each process's CPU usage as a percentage of a single CPU rather than of all processing capabilities; if you have 4 CPUs, 99% CPU usage means that the process is consuming about 25% of all available processing power. We'll get into this later on when we see how much each CPU is being utilised.

Top can be easily brought up by typing in just top. Various other parameters may be used for finer control. These can be seen by running man top. When done, hit q to exit.

dstat

Similar to top, but this is system focused rather than process oriented. It shows general metrics on CPU, disk, network and memory. This utility is extremely extensible, with plugins enabling many more features, even integrating with MySQL, for example. Despite these features, simply firing up dstat with zero arguments is enough to provide a good overview. A favourite config of mine is the one below which I found very useful when analysing my VM problem. It highlights the top CPU consuming process as well as the blocking IO (which is typically slo-o-ow). Additionally, I also get some nice metrics on memory usage, including buffers and caches.

Errors

Gotta catch 'em all

Performance may not always suffer because the hardware is not able to keep up. Application errors may be causing software to underperform while unfortunately not making evident that something is going wrong. Applications will typically log any errors or problems they encounter, but where are they stored?

/var/log/

On Debian or Ubuntu based distributions it is normal to find application and daemon logs in this directory. Navigating to it and listing all the files (and grepping the result) will probably yield the log files of your underperforming service. In this example, I simply list all the files in this log directory and look for the apt directory - a trivial command which may be easily extended. MySQL for example has a slow query log file, which may be useful in case of a slow database. It is a matter of simply opening up the log file and looking for warnings and errors to find a potential problem.

dmesg | tail

Another great command is dmesg. This is much like calling cat on a log file, except that it is callable from anywhere just like a normal command. What dmesg does is list the log messages from the kernel. Problems that reach the kernel, such as a process killed for running out of memory, a segmentation fault or a failing disk or network device, will show up here, whereas a daemon's own errors (a misconfigured nginx failing to start, for example) are normally found in its log files under /var/log/. To bring up the last few lines of the log (which can be very long), simply pipe the output to the tail command. The syntax is simply dmesg | tail.

Detailed Analysis

Once we have a better idea of what's malfunctioning, we can start digging deeper into the metrics. As mentioned earlier on, the general areas are the CPU, memory, and IO. From this point, it is better to make use of another package not typically available out of the box. I'll deal with Debian based distributions (Ubuntu, Mint, Elementary, etc.), however such packages are available on others via their respective package managers.

The sysstat package offers numerous performance monitoring tools - install it using sudo apt-get install sysstat.

pidstat

pidstat is quite similar to top in the sense that it offers an overview of top processes and their related metrics. The main difference is that this command will keep writing to the output rather than refreshing the list - making it easier to keep outputs in a file or try to find out patterns.

To invoke pidstat and keep a rolling output, simply run pidstat 1.

free

This will not set anything free, but only show some numbers on how much memory is free. This is the one exception that is actually available out of the box rather than requiring an extra package. free displays some numbers on memory usage, however it might be confusing to newcomers or users who are typically accustomed to total = free + used. In the case of Linux, a considerable part of memory is used by the cache when not being used by any applications. This helps the system open up files from disk much more quickly.

As a result, the free command will display another column and show that there is actually very little memory that is free. This can also be seen in the dstat output where the total RAM available is calculated by adding all 4 columns rather than just 2. The cache is quite flexible and will be cleared as soon as more memory is required by processes, meaning that practical free RAM is equal to the free + cached memory. To run this command, simply pass free -m. You may pass zero parameters to get the values in kilobytes, -m for megabytes and -g for gigabytes.

mpstat

On multiprocessor systems, it is vital to monitor each processor's utilisation when things get ugly. Sometimes you may note that one CPU is handling all the work while the others are basking in its heat. This is a bad sign indicating that some process is not handling multiple processors correctly and, worse, hogging one of them to unusability. The great mpstat command will show a breakdown of the utilisation of each CPU on your system.

Similar to pidstat, this is from the sysstat package and may be used to produce a rolling output. An excellent way to relate CPU usage to slow IO is the iowait column. The lower this is the greater the efficiency, since it means that the CPU is actually doing work rather than waiting uselessly.

Using this command is simple: mpstat -P ALL 1

Do use this command when things are getting slow since it may quickly lead to either a problem in IO or simply an improperly configured application.

iostat

In case of slow IO identified from mpstat, iostat will provide further details on which device is functioning slowly. This utility shows which devices are being used at an instant and their utilisation. Ideal utilisation is below 60%, otherwise it is likely that the device is being saturated. This mostly applies to physical block devices - a virtual device that maps to multiple physical ones may simply be used heavily while the physical backend may be capable of handling much more load (i.e. things are working quite efficiently).

On relatively basic systems though, a high utilisation which is accompanied with a high iowait is very much likely a case of very bad IO performance. I noted that (unsurprisingly) SSDs will drop utilisation from 99% down to about 15%. SSDs may not always be available, however in my case I was able to map a region of memory as a filesystem. Of course, many cases will not have (or want) to be mapped to RAM, but finding a better physical device will most probably fix issues in this metric (or, if possible, implementing efficient buffers and writing to disk on separate threads).

In order to produce a nice rolling update, just issue: iostat -xz 1

sar

IO performance may suffer also on the network side. This however is less likely, at least from my experience, but also mostly because networks do not deal with any mechanical devices such as hard disks. It is also much more likely that an application is not correctly managing its network handling rather than a slow TCP stack or network card. Tools exist though that allow monitoring of network performance, one of which is sar, also available from the sysstat package.

The amount of data going through each network interface can be monitored in, again, a rolling output. This may be useful to check if the NIC is being used to its potential or if it is able to handle many more connections before getting saturated.

This can be called using sar -n DEV 1

Conclusion

This post was probably no revelation for many, however it can be a good starting point if you're feeling lost in a world of tools and sometimes weird commands. Linux offers numerous metrics which many utilities use to provide a good picture of the system's efficiency. This list of utilities is by no means exhaustive - it is simply a collection of utilities that I used along the way and found useful when performing load testing on various servers. Feel free to comment on any other tools, suggestions or even corrections.

16 October, 2015

Java Programming Tutorial - Unit 2 - Methods and Variables

Let's start from where Unit 1 left off. During the first steps, we instructed our computer to write "Hello World!" to the display. Through that unit, we went through quite some material, despite it being so simple. In this unit, we'll cover more practical points rather than theoretical ones, so get ready to write some more code this time!

Variables

So, what is a variable? As the name clearly indicates, it is something that varies. A variable is just a label which you can use to store something. Let's say we want to store our user's name, we create a new label named username and assign the user's input to this label. A user types in their name and we instruct the computer to store the input somewhere which can be addressed using the word "username".

To better understand and appreciate how useful that little label is, try imagining having hundreds of such variables all without a human-friendly name. You do not need to go far: older languages had no such concept and used memory addresses exclusively.

With this knowledge, you can now think of your computer's memory as being a large room full of P.O. Boxes. Each P.O. Box may be referred to by its number. In modern languages you can give each P.O. Box its own unique name too, so it's easier for you to know what you're working with.

The next bit is theoretical, however it's good to know about variable terminology.
Java is known as being strongly-typed (more precisely, statically typed). This strongly named description simply means that each variable can have one type, and one type only. If we declared our username variable as being of type text, it can only contain text. If we had another one for storing a number, it can only store a number. It's a restriction, but it's convenient. There are languages that have variables whose types can change during runtime, known as dynamically typed (and often loosely called weakly-typed). It's convenient too, but it's easier to shoot yourself in the foot if you're new to it.

Let's make use of a variable in a more practical example. In this task, we want our text to be defined as a variable, rather than passing a direct value to System.out.println.
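
A minimal sketch of this change could look like the following. The class name HelloVariable and the variable name text are only assumptions for illustration, and the class is left without the public keyword so that it compiles whatever the file is called.

```java
class HelloVariable {

    public static void main(String args[]) {
        // The new line: a variable named text, of type String, holding our message
        String text = "Hello World!";

        // We now pass the variable, rather than a direct value, to println
        System.out.println(text);
    }
}
```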

As you can see, the change is minor. In the new line, the only thing which may be new is the String label. A String is simply a series of characters; it is a type of variable which you'll find in the vast majority of programming languages.

Variable types

Now that you have declared your first variable, and hopefully got to understand the relation between the type of the variable and the content it stores, it is safe to introduce the list of primitive types in Java. As you now know, Java is object oriented and everything is defined as a class. This implies that every instance in our program is an object. However, this is not entirely true, since objects need to be made up of something. If we keep going deeper into what constitutes an object, we find that there are only 8 primitive types. Each primitive type is made up of some number of bits. These are as follows; afterwards we'll go through them:

boolean - no specific number of bits, but practically 1
byte - 8 bits
char - 16 bits
short - 16 bits
int - 32 bits
long - 64 bits
float - 32 bits
double - 64 bits

As you can see (assuming you're familiar with bits, the basic units of information), all types are practically increasing sizes of numbers. No letters, no images, nothing but numbers. Later I'll explain how everything can be made from these primitives, but first let's see how we can organise them into roughly three categories.

First we have the boolean type. This can have just two values, which you can think of as 1 or 0. In Java we write these as true or false, and the type is mostly used for setting states and flags.

Next come the whole numbers (integers). All primitives from the byte to the long fall under this category. Values stored by these types cannot have any digits after the decimal point. One thing to note about the char type is that it does store a numeric value, however it is treated as a character. Note also that it holds a 16-bit Unicode value.

Finally we have the real numbers; the float and double. Double, as the name implies, is just double the size of a float. It is usually much more practical to work with a double unless you're working on a high performance system where memory is precious (not all systems have gigabytes of memory to waste).

Primitive types can be easily declared or have a value assigned to them. If you want an integer with a value of 10, simply enter:

int myNumber = 10;
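
The other primitives are declared in the same way. The following is a small sketch with one variable of each type; every name and value in it is made up for illustration, and the class is left without the public keyword so that it compiles whatever the file is called.

```java
class PrimitiveExamples {

    public static void main(String args[]) {
        boolean finished = false;   // true or false
        byte small = 100;           // 8 bits
        short medium = 30000;       // 16 bits
        int myNumber = 10;          // 32 bits
        long large = 3000000000L;   // 64 bits; the L marks the value as a long
        float rough = 1.5f;         // 32 bits; the f marks the value as a float
        double precise = 1.5;       // 64 bits
        char letter = 'A';          // 16 bits, stored as the number 65

        System.out.println("finished is " + finished);
        System.out.println("small, medium and myNumber are " + small + ", " + medium + " and " + myNumber);
        System.out.println("large is " + large);
        System.out.println("rough and precise are " + rough + " and " + precise);
        System.out.println("letter is " + letter);

        // (int) asks Java to treat the char as a plain number
        System.out.println("letter as a number is " + (int) letter);
    }
}
```

The last two lines print the same variable twice, first as the character A and then as the number 65, which shows that a char really is just a number underneath.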

Composites

Now that you have the most granular types, it is possible to mix and match to create more complex types. A composite is basically another name for an object. The String type, for example, is a composite. In order to explain this composite, we need to introduce another programming term: arrays. An array is just a contiguous series of memory cells, each containing a value of the same type. Java has native support for arrays and supports defining new arrays during runtime (older languages did not support this directly). The next snippet shows how we can use an array of characters to emulate a String, albeit in a less practical way.
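
As a rough sketch of the idea, assuming the word "Hello" as our text (the class and variable names are made up for illustration):

```java
class CharArrayExample {

    public static void main(String args[]) {
        // An array with room for five characters, created at runtime
        char[] letters = new char[5];

        // Each cell is filled in one by one; counting starts from 0
        letters[0] = 'H';
        letters[1] = 'e';
        letters[2] = 'l';
        letters[3] = 'l';
        letters[4] = 'o';

        // println accepts an array of characters and prints them in order
        System.out.println(letters);

        // The same text as a proper String, which is far more convenient
        String text = "Hello";
        System.out.println(text);
    }
}
```

Both lines print Hello, but the array version needed a line of code for every letter.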

Unlike primitives, composites (or, to use their proper name, objects) are created using the new keyword. The declaration also follows this convention:
Type myTypeVariable = new Type();

As you can see, there is the type, the name, followed by the assignment to a new instance of the class (or type). Note though, that the String is an exceptional case in Java and can be declared like a primitive. This is only an exception and does not apply to any other class.

Probably the String is not enough, so let's go through some more examples. Let's say we want to show a picture. What constitutes a picture? Pixels, the number of pixels in width, and in height. Width and height are just numbers. The pixels are an array of the Pixel object (so we also have nested composites). And after that, what is in each pixel? Three values for the primary colours Red, Green and Blue; again, three numbers.

Let's define our own Picture type. First, we need a Pixel. Then we'll create a Picture and we'll find its area. Using this area, we'll set the value of the pixels in our Picture, since initially this is null.

Now we'll create the program "body". The main class this time will create the image, the pixel array, and print out the area. Note how we concatenated the text and a variable using the '+' symbol. I'll explain the operators later on in this unit.
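
Putting the Pixel, the Picture and the program body together, a minimal sketch might look like this. The field names, the 4 by 3 size and the class name HelloPicture are assumptions made for illustration, and the classes are left without the public keyword so that all three fit in a single file.

```java
/** A single pixel: three numbers for red, green and blue. */
class Pixel {
    int red;
    int green;
    int blue;
}

/** A picture: a width, a height and an array of pixels. */
class Picture {
    int width;
    int height;
    Pixel[] pixels; // null until we assign an array to it
}

/** The program body: creates a picture and prints its area. */
class HelloPicture {

    public static void main(String args[]) {
        // Create the picture and set its size
        Picture picture = new Picture();
        picture.width = 4;
        picture.height = 3;

        // The area tells us how many pixels the picture needs
        int area = picture.width * picture.height;

        // Create the pixel array; it has one slot for every pixel
        picture.pixels = new Pixel[area];

        // The '+' joins the text and the variable into one piece of text
        System.out.println("The area of the picture is " + area);
    }
}
```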

So you see, pretty much anything can be reduced to a number.

Operators

As I mentioned earlier, I'll give an introduction to operators. These are not so complex so there is not much else to learn about them.

Operators are the symbols used in code, such as the '+', '-', etc. The plus can be used for adding numbers or for concatenating text. For example, let's say we have variables a and b. a + b could mean the following:
If a and b are both numeric primitives, the result is a primitive holding their sum. If either a or b is a String, the other one is converted to its String representation and the result is the two joined together as a String.

The other operators are reserved for primitives:

The minus '-' is used to subtract;
The star '*' is used to multiply;
The slash '/' is used to divide. The fractional part of the result is dropped (not rounded) unless one of the values is a float or double;
The percent '%' is used for obtaining the remainder of a division (the modulo);
The hat '^' is used for bitwise XOR;
The pipe '|' is used for bitwise OR;
The ampersand '&' is used for bitwise AND;
The exclamation mark '!' is used for NOT on boolean values;
The greater than '>' and less than '<', for...well greater or less than;
The double less than and greater than ('<<' and '>>') for bit shifting;
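
To see the operators listed above in action, here is a small sketch that applies each of them to two numbers. The values 7 and 2 and the class name are arbitrary choices for illustration; the comments show the value each expression produces.

```java
class OperatorExamples {

    public static void main(String args[]) {
        int a = 7; // 111 in binary
        int b = 2; // 010 in binary

        System.out.println("a + b = " + (a + b));       // 9
        System.out.println("a - b = " + (a - b));       // 5
        System.out.println("a * b = " + (a * b));       // 14
        System.out.println("a / b = " + (a / b));       // 3, the fraction is dropped
        System.out.println("7.0 / 2 = " + (7.0 / 2));   // 3.5, since 7.0 is a double
        System.out.println("a % b = " + (a % b));       // 1, the remainder
        System.out.println("a ^ b = " + (a ^ b));       // 5, which is 101 in binary
        System.out.println("a | b = " + (a | b));       // 7, which is 111 in binary
        System.out.println("a & b = " + (a & b));       // 2, which is 010 in binary
        System.out.println("!(a > b) = " + !(a > b));   // false
        System.out.println("a << 1 = " + (a << 1));     // 14
        System.out.println("a >> 1 = " + (a >> 1));     // 3

        // Once a String is involved, '+' concatenates from left to right
        System.out.println("Total: " + a + b);          // Total: 72
        System.out.println("Total: " + (a + b));        // Total: 9
    }
}
```

The last two lines are worth a second look: without the brackets, the text is joined to a first and then to b, so no addition takes place.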

You shall not be using many of these in the early days. However you should be familiar with the computing terms used here (such as shifting and bitwise operations).

Methods

We mentioned something about methods during the first unit, mostly trying to relate them to the methods in your recipe books. This time, we shall add more methods to our little picture program. At first it might seem like overkill to have too many methods for a simple task, but as your project grows you'll come to appreciate shorter and more frequent methods.

So, the first task - adding new methods. But why, what are they going to do? Imagine we want our program to accept a user input. For this task, we'll use methods that were written by others - we'll be calling those methods. Afterwards we'll break up our program into smaller methods so that later on we can follow better programming practice. In this case we'll write our own methods too.

The program

Our next task will be to add on to the Hello World Picture program. This time the area will be calculated by the picture, rather than us having to calculate it in our main program. We'll also let the user specify the width and height of the picture. This user input will need some processing, as we shall see next.

First, we'll extend the Picture class so that it can support its own methods. We shall call this PictureExtended to avoid confusion for now.

Next we'll upgrade the main program. As you can see, it has many more methods and the functionality is more granular. If we had two pictures for example, we could still call the same createPicture, thus avoiding duplicate code.
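
A sketch of how the extended picture and the upgraded main program could fit together is shown below. Only PictureExtended, getArea() and createPicture are names taken from this unit; the helper methods readNumber and printArea, the class name HelloPictureExtended and the wording of the questions are assumptions for illustration. The classes are left without the public keyword so that everything fits in a single file.

```java
import java.util.Scanner;

/** A single pixel: three numbers for red, green and blue. */
class Pixel {
    int red;
    int green;
    int blue;
}

/** A picture that can work out its own area. */
class PictureExtended {
    int width;
    int height;
    Pixel[] pixels;

    /** Not static: it works on the width and height of this particular picture. */
    int getArea() {
        return width * height;
    }
}

class HelloPictureExtended {

    /** Prints a question and reads the answer as a whole number. */
    static int readNumber(Scanner scanner, String question) {
        System.out.println(question);
        String line = scanner.nextLine();      // the input arrives as text
        return Integer.parseInt(line.trim());  // so we convert it to a number
    }

    /** Creates a picture of the given size, with its pixel array in place. */
    static PictureExtended createPicture(int width, int height) {
        PictureExtended picture = new PictureExtended();
        picture.width = width;
        picture.height = height;
        picture.pixels = new Pixel[picture.getArea()];
        return picture;
    }

    /** Prints the area, as calculated by the picture itself. */
    static void printArea(PictureExtended picture) {
        System.out.println("The area of the picture is " + picture.getArea());
    }

    public static void main(String args[]) {
        // System.in is the standard input, normally the keyboard
        Scanner scanner = new Scanner(System.in);

        int width = readNumber(scanner, "Enter the width:");
        int height = readNumber(scanner, "Enter the height:");

        PictureExtended picture = createPicture(width, height);
        printArea(picture);
    }
}
```

Note that this sketch does no checking on what the user types; entering something that is not a whole number will stop the program with an error.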

Static vs not static

Note how we put static in front of methods in the main class, while we did not put any in the PictureExtended. Now that we do have some methods and classes, it is safe to explain it.

Static methods are those methods that can be called without having an instance of the enclosing class. The same goes for static variables: we never create a new System, yet we can reach the System.out variable (which is of type PrintStream) and call println on it. This is because out is declared static. However, we cannot call getArea() on PictureExtended by itself. We must create a new PictureExtended and place it in a variable. We are then able to call it from the variable.

This is basically the difference between static and non-static: if it is static, it can be called without an instance of the enclosing class; however, it cannot access the non-static members of the class. Let's say we make getArea() static; in that case, it cannot access the width and height values of a PictureExtended instance.
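
The difference can be sketched with a cut-down picture class. The names PictureSketch and areaOf are made up for this illustration and are not part of the program above.

```java
class PictureSketch {
    int width;
    int height;

    // Not static: needs an instance, and can read that instance's width and height
    int getArea() {
        return width * height;
    }

    // Static: can be called without an instance, so it only knows what it is given
    static int areaOf(int width, int height) {
        return width * height;
    }

    public static void main(String args[]) {
        // Called on the class itself; no picture exists yet
        System.out.println("Static call: " + PictureSketch.areaOf(4, 3));

        // Called on an instance stored in a variable
        PictureSketch picture = new PictureSketch();
        picture.width = 4;
        picture.height = 3;
        System.out.println("Instance call: " + picture.getArea());
    }
}
```

Writing PictureSketch.getArea() would not compile, since there would be no particular picture whose width and height could be used.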

Accepting user input

We are able to accept user input via the Scanner class. Again, this is just like System, a class already provided with Java (although we had to create a new Scanner, unlike System). Note how we passed System.in to it, telling it that we expect to receive input from the standard system input; the keyboard.

The difference from System is that Scanner resides in what is known as a package which is different from ours. We will go through packages in a later Unit, however note how we needed to import the class. The import statement has to be at the very top, outside the class declaration.

Conclusion

This was quite a long unit and covers quite a lot. We created new classes, instances of these classes, or objects, static methods and imported some other ones too. In the next units we shall go over further interesting bits of programming in Java, such as loops, cases and conditionals.

Other tutorials (which are just as good or better) may hold off explaining the details of classes and objects initially. I believe that this might send off the wrong message about Java. It is understandable that it is initially complicated, however it will embed the idea that in Java one should follow an object oriented methodology, otherwise the code will not be up to standard. Not that it is incorrect, but as projects grow, not following conventions will make Java very frustrating.

So, as a precaution, I'm giving out fairly detailed descriptions of why we use classes and objects before going further into the traditional loops and conditionals. Hopefully the descriptions coupled with the actual code will make it more natural.

Thank you!

10 October, 2015

Java Programming Tutorial - Unit 1 - Hello World!

You've probably already seen this little "hello, world" thing somewhere on the Internet. It is the most popular phrase to write to the display when learning a new programming language and has been around since the 70's.

Before we start heading into the development part of this unit, we shall install what is known as the Java Development Kit (JDK). The JDK is an excellent set of tools that includes compilers, runtimes, libraries - a lot of tools and buzzwords. It's enough to know that it is necessary if you plan on programming in Java.

Installing the JDK

As explained in Unit 0, the JDK is compiled for every platform and architecture. As a result, you'll have to select the JDK for your system. I would assume you are running Windows, but I'll also consider Linux. If this is already too much, just follow the steps which I label as for Windows (although I'd suggest you read up a bit on Operating Systems and general computing before proceeding).

Pre-flight checks

Before you go through a 100MB+ download, make sure you do not have a JDK installation already. To check if you do, follow these steps:

On Windows (or if you're unsure) press the Windows key (the flag icon on your keyboard) and R simultaneously. The run dialog will open up; in it type cmd, which will open the console. On Linux or UNIX systems, open the terminal as specified in your distribution (I expect you know this by now).

In your console (whether it is Windows or Linux), enter javac -version. This is the command for the Java Compiler, so no, there are no typos there. If you get some meaningful output (i.e. a version number such as javac 1.8.0_4), then you can skip this installation part. If you get an error along the lines of "not found", then you'll need to install the JDK.

Downloading the JDK

Selecting and downloading the JDK

The JDK setup is trivial; download it, and install it. That's all there is to it. So first of all, head to Oracle's website and select your version. If you're totally unsure, just select Windows x86. In case you're simply not sure if your system is 32 or 64 bit, do the following (if your system was made less than 4 years ago it's probably 64 bit):

Windows

Windows architecture

Right-click on "Computer"
Select "Properties"
Under "System", the "System type" tells you whether it is 64 or 32 bit (refer to image).

Linux/UNIX

Open terminal;
Type uname -i
If it is i386, i586, i686 or similar, then it's 32 bit. If it's 64 bit you'll get x86_64. If the output is simply unknown, try uname -m instead.

A note on Ubuntu and its derivatives

If you're working on Ubuntu (or Mint, elementary or other derivatives), an excellent and very short guide can be found on webupd8. I suggest following that guide for JDK on Ubuntu.

GUI Installation

Once downloaded, it is only a matter of running it and clicking next, however this setup is unfortunately bundled with unwanted software too. So before hitting next, make sure you uncheck any field that tells you to install toolbars or whatever. These are absolutely unnecessary and are included only for marketing.

Installed

Now that the JDK is set up we're ready to do some work. Despite the elaborate system, the development kit is very simple to install. From the installation comment, "The Java Standard Edition Development Kit includes both the runtime environment (Java Virtual Machine, Java platform classes and supporting files) and development tools (compilers, debuggers, tool libraries and other tools)" so as you see, it's a great platform to work with.

Make sure everything is fine by again running javac -version. This should print out the version number of the Java compiler.

Hello, world!

Everything is now in place and all that remains is your first class! First what? Java is a pure object-oriented programming language (with brand new elements of functional features since Java 8). These are a lot of buzzwords for now, so I'll keep it simple and then elaborate on these as we go along in the series.

For now suffice it to say that everything you do in Java is contained in what is known as a class. If you're interested, read up on object oriented programming, but for the first few units we'll keep it low.

Our first program will look like the following. I'll explain each line afterwards.
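
A minimal version of such a program, written to match the explanations in the rest of this unit, might look like the following. The exact wording of the comments is an assumption for illustration; the class name, the main method and the printed text are the ones described in this unit.

```java
/**
 * The HelloWorld class. This block opens with a double star, which makes
 * it a JavaDoc comment describing the class that follows.
 */
public class HelloWorld {

    /**
     * The main method, which is where the program starts.
     *
     * @param args the arguments passed to the program; not used here
     */
    public static void main(String args[]) {
        /* A normal comment, opened with a slash and a single star.
           It may span several lines. */

        // A single-line comment: write our text to the output
        System.out.println("Hello World!");
    }
}
```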

Comments

The first thing to note is the fairly natural language in this snippet. This is the easiest concept to grasp. Comments in code help make your code more understandable and easy to follow. It is of utmost importance to document your code especially when your projects get larger. One day you'll just leave it out and when you look at your code after a month you'll regret it - so it's better to get used to it right now.

Comments can be identified by being wrapped between /* and */ (a single star, I'll explain the double ones in the code too). Alternatively, for a single line of code, it is enough to start it with //. Java is adamant about standard and correct coding and documentation, so much so that a particular category of comments is known as JavaDocs.

JavaDocs are basically the multi-line documentation blocks (those between /* and */) with a very small difference and requirement. JavaDocs have an extra * after the opening /* and are to be written in specific areas.

After defining the other basic parts of the code, I'll go into more details on JavaDoc. For now, it is fine to understand that we can write normal text in our code to help us understand what is happening. Note that the compiler will ignore your comments, so complaining in code is futile :P

Class

As explained earlier, everything in Java is defined as a class. Classes are a blueprint for an object in an object oriented system. For now, it is not that important, however it is wise to keep this in mind.

In our case, we have just one class named HelloWorld. You may have noted the 'public' keyword. This will make more sense later on, so for now think of it as a requirement for your program to compile.

In Java the file must be named after the class, so our program here must be saved to a file named HelloWorld.java. This 'limit' actually makes things much simpler - you don't have to remember a bunch of names for the same program.

Method

Methods are where we define functionality. Think of this as the method in your food recipe; it tells you how to put things together to get something done. In an object oriented system, objects (which are instances of classes, I'll explain this soon) are made up of values and methods. Methods operate on these values to return some other value. This concept will be explored in the next unit, however it is important to note that these values are called variables.

But let's get back to methods. In our case we have just one method, named main, and in it we explain what to do to print our "Hello World!". In Java a method named main is the primary starting point of the program. Think of programs as a water hose - the main method is the point at which the water starts flowing; the origin. This main method, though, has some extra details which we will include but will be explained later on in the series. As you can see, we started it again with public, followed by static and finally void. It also has a String args[] in the brackets. Let's dissect this declaration:

public static are keywords for the VM which will be explained later in the series
void is the return type. Remember from the definition we said that methods work on variables to return a value. In some cases, there is no value returned by the method after it runs. In those cases we say that the method does not return a value, so the declared return type is void. Return values, etc are not important for now, but I'm mentioning the terms so you can get used to such concepts in context.
main defines the name of the method. For now it is best to use unique names, however as we'll see later on, we can use duplicate names with some limits. You'll use the name to call the method.
(String args[]) is defining the method as taking one parameter or argument. Arguments are given to your method when it is called. In the recipe book, think of it as the book instructing you to put 100g of flour in the bowl. The 100g is a parameter to the method "put flour". This defines context and extra information for the method to work on. The method can access the parameter as if it were a variable.

System.out.println

This might seem a bit complex but let's analyse it like we did for the main. It is good to go over this again later on after we cover more topics, so you can better 'get it'. It is OK and expected that you will not understand the specifics right away. However, we shall go through the line:

System is a class, just like our very own HelloWorld. This class is provided with Java, so we did not have to write it. There are various methods and variables in this class.
out is a variable, or member, defined in the class System. It is not important to know the specifics, but the name is indicative enough as pointing to the output of the program. So up till now we have accessed the output of our program from the System class. Note that the type of a variable can also be another class. What you need to recall now is that a variable holds an instance of a class (i.e. an object, whereas the class is the type of the variable).
println is, finally, a method of the out object. This is the one which does the writing, and as you can see, we gave it a parameter, which is the text to write.

As I said, try to get the idea, but for now we'll keep it very simple and it is enough to know that System.out.println("my text"); will print "my text" to the output.

Structure Summary

So let's wrap this up after which we'll compile and run our hello world!

Recall that in Java everything is a class. Each class defines methods and variables (we haven't used these yet). The type of a variable can be another class too. When we create an instance of the class (which we haven't done yet either), it is known as an object.

In our basic case, we have just one class named HelloWorld with a single method named main. Here's the pattern now: the JVM is calling HelloWorld.main(). It does this behind the scenes and as you can see it is identical to the way in which we called System.out.println(). The pattern is <class><dot><method>. It is possible and normal to have a longer chain in one call, such as System.out.println, which goes through the System class and its out variable before reaching the method.

Going back to the JavaDoc, you can now see how the @param args is referring to the parameter passed to main. So what we are doing in JavaDoc is explaining the use of each parameter in a method. Note also how the JavaDoc blocks are explaining the building blocks: the classes and the methods that we define.

Again, it is not vital to know these details yet, however as we go along they will start making a lot more sense.

Compiling

Compilation is fairly straightforward in Java. Let's go back again on this process as explained in Unit 0. Compilation in Java converts our code into byte code. We'll use javac to accomplish this, after which we will run it using the java command which will fire up a JVM to execute the byte code.

So, in order to compile, open up your console or terminal and navigate to the directory which contains your HelloWorld.java. For example if it is at C:\Users\james\mycode, enter cd C:\Users\james\mycode. The same goes for Linux and UNIX systems.

Once inside the directory, enter javac HelloWorld.java

This will not do much other than compile your code silently, unless you have some compilation errors. You should now see a new file called HelloWorld.class. This is not a source file but the byte code which the JVM will execute. That's all there is to compilation in Java, so now onto the best part of this unit - running the program.

Running

Running your program is even simpler than compiling it. All you need to do, in the same directory where you have the HelloWorld.class, is enter java HelloWorld in your console. Note that we do not add the .class extension. We are running the class, not the file per se.

If everything went well, you should see your first code running perfectly on your system, shouting Hello World! at you! Do not underestimate this simple code. It's where almost everyone began. It would be ideal to experiment a bit, that's the key for your success. Note my explanations, but doing extra reading will help you grasp concepts which you may not have correctly understood in your first reading.

Conclusion

This is your first step in a very long and never ending journey. Do not expect that a few years will go by and you'll be done learning. Programming is a very active field and it is best to keep looking for new concepts, languages, methodology, etc. But this is the vital first step.

The code for this unit may be found on the GitHub repository.

Soon I'll be putting up Unit 2, where we shall be going into variables and more methods. Some text in this first hands-on unit may be disorientating for the absolute beginners, but do not give up. In a few weeks you'll be much more proficient in Java!

09 October, 2015

Java Programming Tutorial - Unit 0 - The Basics

Unit 0? If you're new to this world, it might seem odd, but you'll see why we start at 0. If not, well, you can probably skip this post for now.

If you decided to stay and read on, then welcome to the world of computer programming, where media reports are exaggerated and computers are very dumb ;)

A brief intro to programming

Cheesy Java code

Programming is "the act of instructing computers to perform tasks". Computers don't get it when you tell them "write text". What they do understand however is a series of bits, which ultimately lead to the text being written.

I'm not going into great detail about the origins of programming, however the following is a very brief overview to put you in context.

The first computer programs consisted primarily of punch cards. These were the earliest forms of bits: holes to represent 'on' or '1' and solid card for 'off' or '0'. As time went by, valves (vacuum tubes) were used, and nowadays we use semiconductors and integrated chips. The concept has remained the same though. Writing ones and zeros is of course complex, so much so that hardly anyone has ever programmed directly in ones and zeros. What they did do was devise a system to write meaningful text and then convert it to ones and zeros. For example, the earliest code, in Assembly Language, would have looked like this

It is complicated, unless you're an engineer in the 50's (although it's fair to note that it is still used today for very specific reasons). The development of programming languages has followed this exact pattern. A language becomes too unwieldy (projects get larger and reach the practical limitations of the language), so a new higher level language is created to cater for new features.

The next language then uses a single command to represent a group of lower level commands. As you might expect, it is vastly more complex than just wrapping the lower level, but you get the idea. A tool that converts the high level language to the lower level is called a compiler.

Once computers became mainstream, more and more different kinds of architectures were created. An architecture is a CPU design which usually has its own machine language (the way 1's and 0's are organised for it to understand). As each CPU typically understood different instruction sets, different compilers were written to "wrap" architecture-dependent code.

Now the next problem was that it's not as simple as writing the code once and compiling it for each architecture. The code usually had to be modified for each system it was intended to run on, so portability was lost. Having a large project would render this process impractical so a new language came along that tried to solve this problem once and for all.

Java

Java is a language released in 1995 by what was once Sun Microsystems (now part of Oracle). What's interesting about Java is that it is much more than just a language. Java is a whole ecosystem, comprising the language syntax, compilers, environments, SDK and community.

Duke, the Java mascot

But how did it solve the portability issue? As I mentioned, it is a whole ecosystem so there is a more elaborate system at play. What Sun did was create a language that is then compiled to a byte code rather than machine code. This byte code is then executed by the Java Virtual Machine (so, yes, it's still machine code, but a different kind of machine). This JVM is the only part which is differently coded and compiled depending on the architecture.

So we now have a language which can be written and compiled just once, with the confidence that it will run on any kind of CPU as long as a JVM exists for that CPU. The JVM, together with the libraries it ships with, is known collectively as the Java Runtime Environment (JRE), of which the byte code interpreter plays a major role.

Over the years Java was prominent on the web in what are known as applets. Nowadays with the emergence of HTML5 and JavaScript (which has a relation of 0% to Java, so don't get confused), applets have become a thing of the past. Java is also popular on desktop applications, mobile phones, TV set top boxes, DVD players, etc.
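As a rough illustration of "compile once, run anywhere", the sketch below asks the JVM which machine it is running on. The class name WhereAmIRunning and the method describe are made up for this example; java.vm.name, os.name and os.arch are standard system properties. The same compiled .class file should report different values depending on the JVM and CPU it is executed on.

```java
// Hypothetical demo class: the byte code stays the same, only the JVM underneath changes.
class WhereAmIRunning {
    /** Builds a short description of the JVM, operating system and CPU architecture. */
    static String describe() {
        return System.getProperty("java.vm.name")
                + " on " + System.getProperty("os.name")
                + " (" + System.getProperty("os.arch") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```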

Java today

Java has a very strong community and many standards go through what are known as Java Specification Requests (JSRs), much like Request For Comments (RFCs) if you're familiar with network protocols. Basically this is a process for defining ideas, standards, protocols, etc.

Through this process, Java has become arguably one of the top languages for high end websites (technically known as web apps). Twitter for example, runs on Java, so you get the idea of the strength of Java. Throughout this series we shall cover, quite in depth, how to write enterprise web applications in Java.

Java used to, and still does in a revived way, dominate the mobile aspect too. This sheer adaptability, from top range servers to mobile phones, without doubt, makes Java the most versatile language ever. In the early days of smartphones, Symbian was the king of mobile operating systems. It used to run a version of Java known as J2ME (Micro Edition). Today the Operating System with the largest active user base is Android. Surprise surprise, apps written for this OS are also in Java and use almost the exact same tools - it's a bit more complex - but we'll see as we go along how seamless it is to adapt your Java code to it.

Next Steps

So now you have a very basic idea of what programming is and how Java relates to it. Of course, a lot more resources can be found elsewhere if you're interested in more history and details. Wikipedia is one of those sites so you can head over there to further whet your appetite.

Following this post we shall start off with a basic Java environment set up; from the quick installation to your first Hello World!

Tutorials on Java and related subjects

Like many others in the field of IT, specifically software development, my passion for coding started way before I even considered looking for a job. I was probably 13 or so when I wrote down my first few lines of code in Pascal and the moments of glory in class when my mates looked in awe at that spanking grey-on-black "Hello World!".

Could have launched a satellite with this!

Many things have changed since then, but coding has remained a central part of my life...mostly because it pays my bills (more than that too :) ). One thing which got me to this point is the Internet community. I wouldn't have been able to write the second line of code had it not been for that tutorial on some obscure website. Of course, teachers and lecturers played a big part - they are not to be underestimated. But once you're out the door, the tutorials on the Internet are your "only hope". This only hope, though, is a treasure trove full of resources, from zero to hero.

So, after all these years I now feel able, and willing, to contribute back to the awesome community. My experience is mostly in Java and so I hope I shall contribute valuable information to aspiring Java programmers. I shall start publishing a crash course in Java, starting with the venerable "Hello World!" and ending up who knows where. I'm aiming at enterprise Java, but we'll see. The target audience is hobbyists, students, and even professional developers, all of whom should of course have a basic understanding of computers.

Thanks for coming by and I hope to welcome you again for the Java Tutorial Series!

08 October, 2015

Accelerated Mobile Pages

Browsing the web from our phones is nowadays a common thing. In fact it is now likelier to browse from your phone than from a desktop computer. Personally, I find myself using a desktop browser only while I'm at work or while doing some desktopy thing (such as coding or messing with VMs and networks). If I'm just browsing during the evening, for instance, it's 99% from my phone.

My preferred way of browsing is via the forum kind of applications, such as reddit or hacker news, so at that point I'm not really using a browser. However, the majority of the content is delivered from websites, so you see an interesting title, tap on it, and the in-app browser or the main browser is opened. This typically works fine, until the site you're accessing is a megalith and takes tens of seconds to load. After at most 3 seconds, if barely any content has loaded, the link is forgotten and I move on to the next link. That's it.

The problem is that these websites are offering too many features for them to be practical on a smartphone. Sometimes websites take even longer because they need to load the comments section, then come the suggested posts, with ultra big resolution images, followed by the author's biography... It's unnecessary, I just want to see content.

A team of internet companies, including Google, have come up with Accelerated Mobile Pages (AMP). It is primarily a technological development (not exactly unheard of, as we'll see), but through its restrictions it tries to limit the amount of unnecessary crap on pages. As I said, it's a development, however much of this development is in terms of standards and rules rather than faster networks, or something like that.

In fact, the focus is on basically banning a whole bunch of heavy and also some outdated HTML elements. Unsurprisingly, no more <applet>, no more <frame> and no more <embed>. There are also strict limitations on JavaScript, however the most surprising (but great) banned elements are <input> and <form> (with the exception of <button>). It may not directly impact immediate performance of HTML, but it will surely stop developers from adding useless "post a comment" forms.

The focus is primarily on immediate content. If I get a link while chatting and I open it up, I don't have more than 3 seconds to read the title and move back to the chat. Thankfully, on Android, this experience shall now improve with the new Chrome tabs introduced in Marshmallow. It's a technical thing, but basically it avoids having to use either an in-app browser (which is isolated from your standard Chrome) or opening up Chrome (which is slow).

Chrome tabs are much faster, at least in this demo (via Ars Technica)

But let's get back to AMP. As I said, it is content that the majority wants, so in this age of platform webapps, single-page sites and all the rest, simplicity, again, trumps features. Despite the lack of features, static areas of a website are hugely important. If you're interested, here's a short how-to, however it is fair to note that static this time is mostly client side, so no JavaScript - which means you'll probably need server-side processing if you have "dynamic" content.

AMP avoids the common JavaScript the web is used to and realises the idea of Web Components. These do have JavaScript under the hood, but since they are managed differently, it makes the page load faster without synchronous blocks by JavaScript. AMP also restricts inline styling, conditional comments and some CSS attributes (although CSS is not so limited compared to JS).

As yet (it has only been days or hours since it was announced), I personally do not consider this a major technological breakthrough - it's only a set of rules to reduce the bloat on webpages that primarily host content. However, I am very glad with the way things are going and I do hope it gains traction.

The benefits I see are greatly improved user experience with much faster load times and no nonsense web pages along with better development. The more modular the pages, due to web components, the easier it is to develop. There are no messy inline styles or randomly placed JavaScript. Things are put in their place and the rules are strict - otherwise you'll not qualify for AMP and your page won't make it to the top of search results.

Unfortunately, I don't have that much control on this blog, otherwise I would have AMP'd it right away!

For further details, there are quite some resources:

07 October, 2015

The Volatile Security of Volatile Memory

I forgot about yesterday...

It is the black box in every system, even our brain. Volatile memory goes by many names: working memory, temporary memory, RAM, even just memory. Whatever your preference, when you mention it you're most likely referring to the area of a system in which data is stored for a relatively short period of time until it is used and then discarded (or transferred to persistent storage). One cannot possibly imagine a system without some form of memory; even if it is the same area where data is stored permanently, there is still some area used for temporary calculations.

Among the major differences between RAM and persistent storage is that RAM typically contains data about the processes that are currently in execution along with the data we are working on right now and will be discarded soon (yes I hear your screams, persistent storage does that too, but it also has data that we haven't looked at for months). Along with this fact, hard disks enjoy the possibility of being totally encrypted. They cannot be read unless the key is provided. This is not possible in RAM, primarily because the CPU cannot work with encrypted commands.

I do not mean that the CPU is not able to process encrypted data and convert it to plain text. What I am referring to is the inability of the CPU to understand encrypted commands (opcodes) or work on the encrypted data as data rather than a decryption payload. Let's say we have the binary value of 13 = 1101 and we want to add that to 5 = 101. Our simple XOR encrypter will give us the values 0111 and 000 for the keys 1010 and 101 respectively. Adding 0111 and 000 gives 0111 = 7, not the actual result of 18 = 10010. The values have to be in plain text before actual processing. XOR is a basic instruction on every CPU, so decrypting the values with it is about as cheap as it gets. Once decrypted it is then possible to add the values.
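The arithmetic above can be checked with a few lines of Java. This is only a sketch of the toy XOR scheme from the paragraph, not a real encryption method; the class name XorDemo and the method xor are made up for the illustration.

```java
// Hypothetical demo of the toy XOR "encrypter" described above.
class XorDemo {
    /** XOR is its own inverse, so the same method both encrypts and decrypts. */
    static int xor(int value, int key) {
        return value ^ key;
    }

    public static void main(String[] args) {
        int a = 0b1101, keyA = 0b1010; // 13 and its key
        int b = 0b101, keyB = 0b101;   // 5 and its key

        int encA = xor(a, keyA); // 0111
        int encB = xor(b, keyB); // 000

        // Adding the encrypted values gives a meaningless result.
        System.out.println("encrypted sum: " + Integer.toBinaryString(encA + encB));

        // Only after decrypting both operands does the addition make sense.
        int sum = xor(encA, keyA) + xor(encB, keyB);
        System.out.println("decrypted sum: " + Integer.toBinaryString(sum));
    }
}
```

The first line printed is 111 (7) while the second is 10010 (18), which is why the plain values, and the keys to get them, must be sitting in RAM at some point.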

But here is the problem - where is the key stored? Of course, working memory. What is the point of encrypting the data in RAM when the key is in the same RAM? What is the point of encrypting RAM after all?

Boom!

We encrypt disks because they can be removed or because they are portable, yet still contain data, unlike RAM which holds it until we turn off the system (or a bit longer if you're into memory freezing and forensics). So, we think, RAM is inaccessible to would-be hackers. Or so we used to think.

Recent research by various people and organisations (Sophos, Brian Krebs, Volatility Labs, among others) has identified a simple and small piece of malware that looks up processes, maps their memory regions, copies the contents and sends them off to the attacker's server for them to enjoy. And by the way, the kind of data was not your ex's text messages, but the PIN to your credit card, so it's a bit more expensive I would say.

Use only for great dinners.

I did my own research (and eventually BSc. thesis) on this subject, and it is quite scary knowing that the very heart of your system may be so easily compromised. What's worse is that when you enter your PIN into any other system over which you have no control...God knows what's running on it and where your data goes. Antivirus programs barely have an idea how to catch such an attack, and neither do firewalls, internet protection or whatever you have. If they did, they would block your debugger too, because that's how it works - like a debugger. It's like a kitchen knife used for a murder - you cannot ban knives.

Here's a short and sweet step-by-step on how you can scrape your memory. It's not intended to attack anyone, and it wouldn't be easy any way. It's successful only if your target cannot protect their networks and you manage to get in. The sample was done on Linux; Windows would be totally different but still very possible (the Target attacks were in fact on Windows). So here it goes:

A dummy little program was written in C. All it did was store a username and password (entered using getpass() for increased security) along with a series of credit card numbers that are "swiped" into the system.

Swiping cards

We then find the PID of this running process just by ps aux | grep scrape (the program is named scrape, but it may be something like POSSwiper for example)

Getting the PID

Now we can get all the memory regions and maps used by our processes. The /proc directory gives us a hand there.

/proccing to analyse the memory

We are interested in the heap space of our program which shows up nicely in the fourth line, ranging from address 00958000 to 00979000 (both hex). Next thing we do is fire up the actual scraper (which is, in our case, a kitchen knife: the perfectly legitimate gdb debugger).
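For the curious, here is a small sketch of how such a line from /proc/<PID>/maps can be picked apart in code. The sample line is an assumption: it is reconstructed from the heap range used with gdb in this post, with typical values filled in for the other columns, and is not copied from the screenshot. HeapRange and its methods are names made up for this example.

```java
// Hypothetical helper that reads the address range out of one line of /proc/<PID>/maps.
class HeapRange {
    final long start;
    final long end;

    HeapRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /** The heap mapping is labelled [heap] in the last column. */
    static boolean isHeap(String mapsLine) {
        return mapsLine.trim().endsWith("[heap]");
    }

    /** The first column is "start-end", both in hex without a 0x prefix. */
    static HeapRange parse(String mapsLine) {
        String[] fields = mapsLine.trim().split("\\s+");
        String[] bounds = fields[0].split("-");
        return new HeapRange(Long.parseUnsignedLong(bounds[0], 16),
                             Long.parseUnsignedLong(bounds[1], 16));
    }

    long size() {
        return end - start;
    }

    public static void main(String[] args) {
        // Assumed sample line; columns are address range, permissions, offset, device, inode, name.
        String line = "00958000-00979000 rw-p 00000000 00:00 0          [heap]";
        if (isHeap(line)) {
            HeapRange heap = parse(line);
            System.out.println(String.format("heap: 0x%x to 0x%x (%d bytes)",
                    heap.start, heap.end, heap.size()));
        }
    }
}
```

The two numbers it extracts are exactly what a debugger needs in order to know which part of memory to dump.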

Dumping memory in just one line!

GDB shows a bunch of text; we're only interested in how we started it (gdb -pid <PID>) and how we stole the memory (dump memory <to where> 0x958000 0x979000). As you can see, we used the exact heap space memory range we got from /proc. The memory will be dumped to the file we choose. Of course, this requires administrator rights, but as one might expect, tens and hundreds of POS devices will most likely share the same password, and will probably have the default one too (such a typical case of a security breach - I found the password to my ISP's router on a public forum...).

Now, onto the next step - the analysis, if you can call it that. Data dumped into the file is from RAM, so as expected it is binary. Linux simplifies this analysis by providing another tool - strings. All it does is look at what's in a file and spit out all the strings it can find. That's it, so we pass the dump to it and we get a nice list of strings, including the password (you didn't see it in the first screenshot because of getpass()) and all the numbers and everything.

The gold mine
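To show just how little magic is involved, here is a rough Java sketch of the idea behind strings: walk through the bytes and keep every run of printable ASCII characters that is long enough. It is a simplification and not the real tool's implementation; MiniStrings and extract are made-up names, the minimum length of 4 mirrors the usual default of strings, and the sample bytes are invented stand-ins for a memory dump.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified imitation of what the strings utility does.
class MiniStrings {
    /** Returns every run of printable ASCII characters at least minLength long. */
    static List<String> extract(byte[] data, int minLength) {
        List<String> found = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (byte b : data) {
            if (b >= 0x20 && b <= 0x7e) {
                // Printable character: extend the current run.
                run.append((char) b);
            } else {
                // Anything else ends the run; keep it only if it is long enough.
                if (run.length() >= minLength) {
                    found.add(run.toString());
                }
                run.setLength(0);
            }
        }
        if (run.length() >= minLength) {
            found.add(run.toString());
        }
        return found;
    }

    public static void main(String[] args) {
        // Invented sample data: binary noise with two readable values hidden inside.
        byte[] dump = {0, 1, 'h', 'u', 'n', 't', 'e', 'r', '2', 0, (byte) 0xff,
                       'a', 'b', 0, '4', '1', '1', '1', '1', '1', '1', '1', 3};
        for (String s : extract(dump, 4)) {
            System.out.println(s);
        }
    }
}
```

Anything stored as plain text in the dumped region, be it a password or a card number, falls out of such a scan without any further effort.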

That is all. Now go and whitelist the list of processes on your system, before someone gets to scrape the memory off it.

06 October, 2015

Some points on the Android UI

Android is a great OS - there's no doubt about that, even if you measure that statement using the number of active installations. It has an interesting history, starting from plans to create an OS for digital cameras and ending up being Google's core mobile platform running on around a billion devices. It is technically well designed, open source and very adaptable; from CPU architectures to screen sizes, Android can adapt.

Progress of Android over the years (Ars Technica)

As Android progressed to meet the expected standards of the day, the general UI got more minimalist while more colours were introduced (older versions looked darker). Despite the move towards a more modern UI in general, it is still possible for application developers to apply their own style. A typical result of this support was that developers of older applications did not bother updating their styles to the latest version (this is basically an XML file).

What we ended up with is a FIAT 127 in 2015's motorshow.

Not quite in the same league

The problem with this situation is that not only do we sometimes have to use outdated applications, but Google is also pushing a new 'UI language'. There is nothing wrong in having a new UI language...except when few developers are following it, and you're not one of them. If Android is to have a uniform, clean and modern UI, there should be a mechanism which automates the transition of styles to the latest standard in cases where the default file was left lying around. Automation is not uncommon on the Android ecosystem - code is checked for potential errors, style files must be up to standard, even copyright issues are flagged by a bot - so why not a simple style file?

What's the deal with this UI (SIM Tool Kit)?

As I mentioned earlier, there is also another problem which Google does not seem to want to fix - Material Design. Consistency is key in product branding and Google is/was known for their efforts in this regard. The ubiquitous bar in all their web products and their logo in the exact same location made it clear that this is a Google product.

Nowadays, their Android apps are a cacophony of UI element styles and whatnot. Despite their efforts to make Material Design the next standard, it's already been 2 years and I have no idea when this "next" will become "now". It can be seen in some apps, such as the settings, and the major Google apps. However, applications such as Google Analytics still sport the Android 4.3 UI. Even worse is the app for Blogger - with a UI probably designed by the Romans.

Yet again, even though the apps follow the general material design, all of them seem to have a language of their own. One aspect which was recently highlighted was the lack of consistent scrollbars. Now we've got a new scrollbar in the application launcher too, for diversity.

The Google Calendar app has to be one of my favourites. It's fast, visually appealing and above all, useful. I use it regularly to set appointments, reminders, etc. just like all other users. The problem with the whole Calendar ecosystem is the web version. Why has Google introduced Material Design, implemented it correctly in Calendar on Android yet left the web version in the dark, while at the same time it developed the Inbox service with correct material guidelines on both Android and the web (I'm not discussing whether Inbox is practical or not)? I understand the drive towards mobile and I truly appreciate the improved UI on mobile, but I'm not in favour of sheer inconsistency (and then again, there are web versions better than their apps).

Yes this was quite a rant - not really helpful for many. But it gets frustrating when you're working on your services and try to follow as many guidelines as possible to make your users happy. Thankfully apps are not accepted or rejected based on their looks and interaction, although sometimes I do favour such a system as it does improve the users' mobile experience.