14 December, 2011

Caching : The unsung hero of performance

It's not just abstract numbers after all :)
Many people tend to forget, or worse, ignore the fact that caching plays a vital role in many applications.  Why would they place a cache in the CPU for example?

This idea might come from the fact that in an age dominated by the web and fast internet access, some might think that cached pages are unnecessary.  True, at times you end up with a lot of space taken up by pages you visited only once.

But caching is more than just a copy of a page on your hard drive (or SD Card).  In this post I shall demonstrate how a terribly simple implementation of a cache in a Fibonacci algorithm will have a massive impact on performance.  Obviously one might not find many uses for Fibonacci numbers, but the idea can be used in many areas, such as crawlers, searching and environments where some value which is unknown but is somewhat constant is required frequently.

Many know what Fibonacci numbers are, so I won't be going into a lot of details about their algorithm, but don't worry though, it's very easy to implement it Java.  This time we'll be writing a single class, having only 3 methods.


import java.math.BigInteger;
import java.util.HashMap;


public class Fibonacci 
{
private static HashMap<Integer, BigInteger> cache = new HashMap<Integer, BigInteger>();

public static void main(String[] args) 
{
int v = 35;
if(args.length>0)
v = Integer.valueOf(args[0]).intValue();

System.out.println("Fibonacci efficiency tester for Fib("+v+").\n\n");

long n = System.currentTimeMillis();
System.out.println("Cached: "+fib(v, true));
long mst = (System.currentTimeMillis() - n);
System.out.println(getTimeFromMs(mst/1000));

n = System.currentTimeMillis();
System.out.println("Non-Cached: "+fib(v, false));
mst = (System.currentTimeMillis() - n);
System.out.println(getTimeFromMs(mst/1000));
}

Not much here.  Just declaring some variables and a main class.  The main class starts off by declaring the number of iterations we plan on doing.  In this case variable v takes care of that.  Don't set it too high, otherwise you might end up waiting 5 minutes until you get a result!  Next we check if we have any arguments;  if so, we assume that it is the number of iterations to perform, so we set v as that value.

Then we just start displaying the result messages.  As you can see we are measuring the time it takes for cached and non cached calculations.  I know I have created a benchmarking tool, but it's OK to use the normal system time here.

Now it's time to code the real Fibonacci method.


private static BigInteger fib(int f, boolean cached)
{
if(f<2) return new BigInteger(""+f);


if(cached)
{
if(!cache.containsKey(new Integer(f)))
{
BigInteger v  = (fib(f-2, cached)).add(fib(f-1, cached));


cache.put(new Integer(f), v);
}
else
{
return cache.get(new Integer(f));
}
}

BigInteger n1 = fib(f - 2, cached);
BigInteger n2 = fib(f - 1, cached);
return n1.add(n2);
}


What we are doing here is recursively calling the same method.  It is much cleaner than a loop, and anyway, a loop does not always suffice, such as in cases of crawling.  Before performing anything complicated, we check if the number is high enough to be calculated.  So if we have a 1 or 0, there's nothing much to do, so just return it.  Otherwise, we will perform the normal calculation.

We check if the caching is enabled, as this is all about caching after all, an then the calculation is performed.  So if we have caching enabled, we will first check if the cache contains the Fibonacci of the number we are currently analysing, and if it does, we are done, and return it.  Otherwise we will calculate it and cache it.  If caching is not enabled, the value is calculated every time.

We then write the usual method which cleanly shows the value in a more humane way :)


public static String getTimeFromMs(long ms)
{
if (ms < 1000)
return ms + "ms";
else if (ms >= 1000 && ms < 60000)
return (ms/1000) + "s " + getTimeFromMs(ms - ((ms/1000)*1000));
else if (ms >=60000 && ms < 3600000)
return (ms/60000) + "m " + getTimeFromMs(ms - ((ms/60000)*60000));
else return (ms/3600000) + "h " + getTimeFromMs(ms - ((ms/3600000)*3600000));
}


That is all basically.  You can now run this program and see for yourself the improvement which is gained through caching.  I have provided my results below;  naturally yours will have some variations, but there definitely will be a huge difference between the cached and non-cached runs.



Fibonacci efficiency tester for Fib(35).


Cached: 9227465
in 2ms
Non-Cached: 9227465
in 3s


Make no mistake. There is an 'm' in the cached, while there is not in the non-cached. That means we had a whole 3 second difference. Now, hopefully, you shall be writing more sensible code and freeing the CPU from eccessive and unnecessary work :)

The full code is available here.