13 December, 2011

Building Brainf**k Interpreter in Java

Excuse the title, but really, there is a (esoteric) programming language named just like that; "Brainf**k" (without  censorship).

So what is this BF language?  Basically from a coding point of view, it is a simple language made up of 8 commands.  That's right, eight.  Since technically all a CPU does is manipulate a value in a memory location and does some IO with it, the language can to a certain extent, do everything - a.k.a, Turing Complete.  OK theoretically I can fly if I attach a pair of wings to my arms, but it's not practical, just like BF.  Therefore I am in no way saying that it will practically do tasks; just theoretically, it can.


That really is the code.  Now you
know why it has got that name.
This language was designed by Urban Müller in 1993, just for fun.  It was not meant to be utilised and there are no real definitions, as numerous compilers and/or interpreters were built with some additional features.  Anyway, if you are interested in the language itself, rather than the interpreter, head to Wikipedia, you know it helps :P


Just some more details about the structure, and we can start.  Now, as I said, the CPU basically works by manipulating data in memory and performing some sort of IO, and BF does just that.  We have an array of memory cells, and a pointer.  Two commands move the pointer, another two increase or decrease the value of the current cell (the one we are pointing to), the next two print out the value or take in a value while the last pair is used for looping.


(btw, got this table from Wikipedia)
CharacterMeaning
>increment the data pointer (to point to the next cell to the right).
<decrement the data pointer (to point to the next cell to the left).
+increment (increase by one) the byte at the data pointer.
-decrement (decrease by one) the byte at the data pointer.
.output a character, the ASCII value of which being the byte at the data pointer.
,accept one byte of input, storing its value in the byte at the data pointer.
[if the byte at the data pointer is zero, then instead of moving the instruction pointer forward to the next command, jump it forward to the command after the matching] command*.
]if the byte at the data pointer is nonzero, then instead of moving the instruction pointer forward to the next command, jump it back to the command after thematching [ command*.

That sums up the features we need to support.  Next thing is to start with the basics.  We will need about 4 classes or so, not much.  Firstly we shall create the building blocks, and finally joined them using some main class.

So, the "building blocks" consist of a Cell, an Interpreter, a Utilities class, and the main class.
  • Cell : The cell is the core of the system.  We'll have a lot of these, possibly as many as the RAM can hold.  This class will be the simplest thing you'll ever code though;  it should only hold it's value, and have some quick methods such as incrementValue and decrementValue.
  • Interpreter : The language consists of a simple stream of commands, so all we need is to move one character at a time and interpret it's meaning.  Therefore what we'll do is build a class which should preferably take in a stream of characters (not an InputStream) and just work it's way to the end.  We'll be having a switch with some method calls that do the work.
  • Utilities : You might need a utilities class if, like me, you prefer to have static methods in one place.
  • Main : The main class is the, well , main entry point.  Should be called first and will display some nice messages insulting your brain.
Now, to the code.

Cell


package com.jf.bfc.objects;


public class Cell
{
private int value;

public Cell()
{
value = 0;
}

public void inc()
{
value ++;
}

public void dec()
{
value --;
}

public int val()
{
return value;
}

public void set(char c)
{
value = c;
}
}


I hope no one got confused.  You know, this is some über complex code we're dealing with here.  Really, it's self explanatory, but if any newbies are around, offering help is not a problem; just leave a comment or something.  One thing to note is that we are using "int".  This is preferable as it is easier to convert to ASCII and back again.  You might want to use long and UTF maybe, that's not the idea of this post.

Interpreter



package com.jf.bfc;

import java.util.ArrayList;
import java.util.Scanner;

import com.jf.bfc.objects.Cell;

public class Interpreter
{
private String program;
private ArrayList<Cell> memory;
private int memoryPointer;
private int progPointer;

public Interpreter(String program)
{
this.program = program;
this.memoryPointer = 0;
this.memory = new ArrayList<Cell>();
this.memory.add(new Cell()); //create an initial cell.
this.progPointer = 0;
}

public void interpret()
{
try
{
char [] progArr = program.toCharArray();
for (progPointer = 0; progPointer < progArr.length; progPointer ++)
{
switch(progArr[progPointer])
{
case '>' : pointerUp(); break;
case '<' : pointerDown(); break;
case '+' : cellValueUp(); break;
case '-' : cellValueDown(); break;
case '.' : out(); break;
case ',' : in(); break;
case '[' : subroutine(); break;
default : break;
}
}
}
catch (Exception e)
{
System.err.println("Interpretation failed at command " + progPointer);
System.err.println(e);
}
}


Now this is a bit more important.  Here we are setting up some variables and declaring the Interpreter class, along with the main "interpret" method.
  • Program is the String which contains any number of the 8 commands, i.e. the program stream.
  • Memory is an ArrayList of Cell objects which technically make up our memory.
  • MemoryPointer is pointing to the cell we are currently working with.
  • ProgPointer points to the last command we were interpreting before jumping in a loop.  This will be explained in more detail later on.
The Constructor is simply initialising the variables.  It is better this way rather than initialising them at the same time you are declaring them.  Firstly because the code will look cleaner with declarations in one place and initialisation in a method specifically made for initialisation, and secondly, performance is increased since the JVM will simply move down the constructor.  Declaring and initialising the variables at the same time makes the JVM move out of the method just to initialise the variable, wasting time and memory.  Now this is not vital, but it's interesting to know it, especially for limited memory or processing power environments.

Let's move on.  The interpret method, as described earlier, simply moves one character at at time, reading it, parsing and executing the required command.  As you can see, it is quite straight forward; just consider the String as a character array (which it actually is) and loop through, checking it using a switch statement, and calling the corresponding method.  If the code has some problems, we will simply tell the user that he or she has actually got an f'd up program.

pointerUp


The pointer up, as it is called, moves the pointer up (or left, depending on how you imagine the memory).


private void pointerUp()
{
if (memory.size() - 2 < memoryPointer)
memory.add(new Cell());

memoryPointer ++;
}


All we do is ensure we are not going beyond our memory limit, and then adding a new empty cell in our memory.  After that, we increase the pointer.  Basically the first two lines ensure we have enough space in memory, and adding a new cell is only required if there is no cell yet initialised in that area.

pointerDown


What goes up, must come down, and that is what we do to the pointer here.   The same idea as the pointerUp, but this time we are checking the bottom (or right side) of the memory.


private void pointerDown()
{
if (memoryPointer == 0)
return;
memoryPointer --;
}


cellValueUp & cellValueDown


Wondering around in memory space won't be getting us anywhere.  So our next step is to do something with the memory we have.  Actually, with the cell we have.  We can only manage a cell at a time.  Increasing the cell value will essentially change the sense and meaning of the memory as a whole.  We could either replace a single letter and change 'hello' to 'hellp' or even changing the value of an operator code and change from 1+1 to 1-1.  So anyway, this is the code we should have.



private void cellValueUp()
{
memory.get(memoryPointer).inc();
}


There you go.  Simply ask the cell, which is in a memory location referred to by pointer, to increase its value.  The same can be done for reducing the value.  This time, we'll be asking it to decrease its value.


private void cellValueDown()
{
memory.get(memoryPointer).dec();
}


In & Out


The next two methods are quite simple too.  They either output the value of the current cell, or just take in a value and set that as the value as the current cell.

The simpler function is the output.


private void out()
{
System.out.print((char)memory.get(memoryPointer).val());
}


Again, nothing much;  simply printing out the character whose code is the numerical value in the current cell.


private void in()
{
Scanner sc = new Scanner(System.in);
memory.get(memoryPointer).set(sc.next().charAt(0));
}

The in method is just a slightly bit more complicated.  What we do is initialise a Scanner, a simple class which can be easily used to read user input.  Then we put the value in the current cell.  Keep in mind though, we are dealing with a single cell, so we cannot store a string.  Therefore we shall get the first character and save only its value in the cell.

Subroutines


Basically we have covered the essential functions of BF.  You can build something out of those commands.  Loops however are essential if you plan to write a program and don't actually intend to follow the language's name.  So here we come to possibly the most difficult part of the interpreter.  Subroutines provide a practical way to code repetitive tasks but also present a challenge when you try to code some sort of loop handling system, just like what we shall be doing next.

Let's start off with the code.  It might look intimidating, but I'll go through it, so hold on tight.


private void subroutine() throws Exception
{
int stopPoint = memoryPointer;
StringBuffer subroutine = new StringBuffer();
int brakcetsTillEnd = 0;
progPointer ++; //skip the bracket


for (int x = progPointer; x < program.length(); x++)
{
char chr = program.charAt(x);
subroutine.append(chr);
if (chr == ']')
brakcetsTillEnd --;

if (brakcetsTillEnd == -1)
break;
progPointer ++;
}

if (brakcetsTillEnd > -1)
throw new Exception("Unclosed bracket!");


String originalCode = program;
program = subroutine.toString();
int stopCommandPoint = progPointer;
while (memory.get(memoryPointer).val() != 0)
{
interpret();
memoryPointer = stopPoint;
}
progPointer = stopCommandPoint;
program = originalCode;
}


Hmm, there you go, one chunk of seemingly kernel level code.
So, we start off by keeping track of where we are.  We declare a stopPoint which holds the location of the last command before the loop (the '[' command).  Next, the subroutine code is grouped into a single StringBuffer so that we sort of have a new program.  This is done by keeping track of all opening brackets, and ensuring that we stop when the closing bracket associated with the one we started with is encountered.  We then move the program pointer by one so that we skip the square bracket and start recording the subroutine code.  We also ensure that the bracket is closed by detecting if not enough closing brackets were encountered before the end of the whole code.

An important thing to do is store the position in the main code at which the subroutine ends.  We will need that to continue execution of the program after the subroutine is done with it's job.

After that, we back up the original code and set the program code as the one in the subroutine.  Basically what we have done is switch the code we are executing with the one in the subroutine while keeping the same memory and pointer.

If you read the documentation of BF, you would know that looping stops when the value of cell selected at the end of the subroutine is equal to zero.  What we do here is constantly check the current cell, and while it is not equal to zero, we call interpret which will automatically interpret and execute the subroutine code.  This is because, as you know, we have replaced the main code with the one in the subroutine.  Also we reset the memory pointer to where it was before entering the subroutine, every time we execute it.  Once execution of the subroutine is done, we can simply restore the original code and send the pointer to the end of the subroutine.

The main interpreter can simply continue working, calling functions as they come up.  This is practically the BF interpreter.  All we need now is the Utilities and Main class which do not much work directly related to BF, but will nonetheless be listed here.

Utils



package com.jf.bfc.objects;


import java.io.FileInputStream;
import java.io.IOException;
import java.util.Scanner;


public class Utils
{
/**
* Fetch the entire contents of a text file, and return it in a String.
* @param aFile is a file which already exists and can be read.
* @throws IOException 
*/
static public String readStringFromDisk(String str_file_name) throws IOException
{
StringBuilder ds_text = new StringBuilder();
String str_sep = System.getProperty("line.separator");
Scanner ds_scan = new Scanner(new FileInputStream(str_file_name), "UTF-8");
try
{
while (ds_scan.hasNextLine())
ds_text.append(ds_scan.nextLine() + str_sep);
}
finally
{
ds_scan.close();
}


return ds_text.toString();
}
}


As you can see, all we do is read a simple file from disk and get the String value stored in it.  We use this only to read a saved BF program.

Main


The main class is just one main method which displays a simple message and asks the user if he either prefers to open a pre-written BF program, or if he should write one in the console and just execute it immediately.  There are no features such as saving or debugging.  I might one day do something like that, but I'd rather have my own or at least a more humane language hehe.

So here comes Mr.Main.


package com.jf.bfc;


import java.io.IOException;
import java.util.Scanner;


import com.jf.bfc.objects.Utils;


public class BFC
{
public static void main(String[] args) throws IOException
{
System.out.println("Brainf**k Interpreter v0.1\n");


Scanner s = new Scanner(System.in);
String path = "about";

while (path.toLowerCase().equals("about"))
{
s = new Scanner(System.in);
s.useDelimiter("\n");
System.out.println("Created by James Farrugia, within an hour, while wondering how to solve other problems.");
System.out.println("Ended up thinking about Brainf*ck Plus, which might have some sort of String and function support;");
System.out.println("but I don't know about that.  You can contact me on jamsinux _at_ gmail.com, if you like.\n");
System.out.println("If you need help about Brainf*ck as a language, I know a guy, we call him Google, very helpful guy...\n\n");
System.out.println("Enter path to your program or type 'new' to write one now.  Type 'About' for some unnecessary info...");
path = s.next().trim();
}
String bfProgram = "";

if ("new".equals(path.toLowerCase()))
{
System.out.println("The console is your playing field, you imagination is the limit.");
System.out.println("You can starting f**king you brain:\n");
s = new Scanner(System.in);
s.useDelimiter("\n");
bfProgram = s.next();
}
else
{
try
{
bfProgram = Utils.readStringFromDisk(path);
}
catch (IOException e) {
System.err.println("That file does not exist, or it might have kind of some problems...");
}
}

System.out.println(bfProgram);

Interpreter intr = new Interpreter(bfProgram);

System.out.println("Interpreting...\n-------");

try
{
intr.interpret();
}
catch (Exception e) 
{
System.err.println("You seem to have blown up the VM...");
}

System.out.println("\n-------\nInterpreting complete.");
}
}


Obviously you can have your own main method, since as you can see we simply present the interpreter to the user using this method, and the interpreter is only initialised once, and asked to interpret some BF code.

Conclusion

There you go then.  A nice little post about writing an interpreter for a useless language, but just like this same language, it's a way to do some interesting stuff and yet surprisingly challenging.  you should try coding using BF yourself to see what I'm talking about.  Also, basing on this concept, one can move on and develop some sort of basic script.  That would be a good idea.

As usual the full code is available here.