In the last article, we discussed brainfuck and implemented a compiler + VM that can run brainfuck code.
Continuing the journey to run brainfuck on billions of devices, I will now explore the Java Virtual Machine (JVM).
To run brainfuck on the JVM, we must translate brainfuck commands into JVM bytecode. This can be achieved by generating Java source code and compiling it, or by using a tool that generates assembly-like code, which is then assembled into bytecode.
I chose a more direct route: generating JVM bytecode directly. Generating Java source code felt like cheating, and using an assembler would conceal details I wanted to examine.
To accomplish such a task, I need to understand the JVM bytecode format. If this were the first compiler ever written for the JVM, I would have to read through the JVM specification, implement it, and discover things along the way. That would be a huge, painstaking task, but luckily, compilers for the JVM already exist.
To get the JVM building blocks, I used the Java compiler (javac) to compile a couple of Java programs and examined the generated bytecode.
To examine the bytecode, I wrote a simple JavaScript (Node.js) script that parses .class files and disassembles them. After doing this, understanding the javap tool output was much easier.
I will not implement a full disassembler here. Instead, I will discuss the main parts of the class file format and the JVM execution model: only what's necessary to understand how to generate bytecode for brainfuck.
The Java Virtual Machine
A Virtual Machine (VM) provides an abstraction layer between the program and the underlying hardware. It allows programs to run in a platform-independent manner.
It’s supposed to emulate a real chip, and to do so it has to provide a set of instructions that can be executed, similar to what we did with our brainfuck VM (remember the brainfuckCPU function).
The JVM is a stack-based machine, meaning that it uses a stack to hold intermediate values during computation. Instructions operate on the stack, pushing and popping values as needed.
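To make the idea concrete, here is a minimal sketch of a stack machine in JavaScript. The opcode names bipush, iadd, and imul are borrowed from the JVM, but runStack and the [opcode, operand] program format are hypothetical and exist only for illustration; the real JVM decodes bytes, not arrays.

```javascript
// Toy stack machine. Opcode names mirror JVM mnemonics; the program
// format ([opcode, operand] pairs) is an assumption made for this sketch.
function runStack(program) {
  const stack = [];
  for (const [op, operand] of program) {
    switch (op) {
      case 'bipush': // push a constant onto the stack
        stack.push(operand);
        break;
      case 'iadd': { // pop two values, push their sum
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a + b);
        break;
      }
      case 'imul': { // pop two values, push their product
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a * b);
        break;
      }
      default:
        throw new Error(`unknown opcode: ${op}`);
    }
  }
  return stack;
}

// (2 + 3) * 4
const stackResult = runStack([
  ['bipush', 2],
  ['bipush', 3],
  ['iadd'],
  ['bipush', 4],
  ['imul'],
]);
console.log(stackResult); // [ 20 ]
```

Notice that iadd and imul carry no operands of their own: everything they need is already on the stack.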
But before discussing code execution, let’s discuss the structure of a class file and its patterns.
Class File Structure
To start probing, let’s compile a simple Java program:
// file: Hello.java
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello world!");
    }
}
Compiling it with javac Hello.java generates a Hello.class file. A hex editor shows its binary structure. Using the xxd command, we can see the hexadecimal representation of the file:
xxd Hello.class
00000000: cafe babe 0000 0044 001d 0a00 0200 0307 .......D........
00000010: 0004 0c00 0500 0601 0010 6a61 7661 2f6c ..........java/l
00000020: 616e 672f 4f62 6a65 6374 0100 063c 696e ang/Object...<in
00000030: 6974 3e01 0003 2829 5609 0008 0009 0700 it>...()V.......
00000040: 0a0c 000b 000c 0100 106a 6176 612f 6c61 .........java/la
00000050: 6e67 2f53 7973 7465 6d01 0003 6f75 7401 ng/System...out.
00000060: 0015 4c6a 6176 612f 696f 2f50 7269 6e74 ..Ljava/io/Print
00000070: 5374 7265 616d 3b08 000e 0100 0c48 656c Stream;......Hel
00000080: 6c6f 2077 6f72 6c64 210a 0010 0011 0700 lo world!.......
00000090: 120c 0013 0014 0100 136a 6176 612f 696f .........java/io
000000a0: 2f50 7269 6e74 5374 7265 616d 0100 0770 /PrintStream...p
000000b0: 7269 6e74 6c6e 0100 1528 4c6a 6176 612f rintln...(Ljava/
000000c0: 6c61 6e67 2f53 7472 696e 673b 2956 0700 lang/String;)V..
000000d0: 1601 0005 4865 6c6c 6f01 0004 436f 6465 ....Hello...Code
000000e0: 0100 0f4c 696e 654e 756d 6265 7254 6162 ...LineNumberTab
000000f0: 6c65 0100 046d 6169 6e01 0016 285b 4c6a le...main...([Lj
00000100: 6176 612f 6c61 6e67 2f53 7472 696e 673b ava/lang/String;
00000110: 2956 0100 0a53 6f75 7263 6546 696c 6501 )V...SourceFile.
00000120: 000a 4865 6c6c 6f2e 6a61 7661 0021 0015 ..Hello.java.!..
00000130: 0002 0000 0000 0002 0001 0005 0006 0001 ................
00000140: 0017 0000 001d 0001 0001 0000 0005 2ab7 ..............*.
00000150: 0001 b100 0000 0100 1800 0000 0600 0100 ................
00000160: 0000 0100 0900 1900 1a00 0100 1700 0000 ................
00000170: 2500 0200 0100 0000 09b2 0007 120d b600 %...............
00000180: 0fb1 0000 0001 0018 0000 000a 0002 0000 ................
00000190: 0003 0008 0004 0001 001b 0000 0002 001c ................
The dump shows the binary data in hexadecimal, which is not human-readable except for some ASCII strings like Hello world! (the string being printed), java/lang/Object, java/io/PrintStream, etc. You’ll understand these later.
This file follows the class file format defined in the JVM specification:
ClassFile {
u4 magic;
u2 minor_version;
u2 major_version;
u2 constant_pool_count;
cp_info constant_pool[constant_pool_count-1];
u2 access_flags;
u2 this_class;
u2 super_class;
u2 interfaces_count;
u2 interfaces[interfaces_count];
u2 fields_count;
field_info fields[fields_count];
u2 methods_count;
method_info methods[methods_count];
u2 attributes_count;
attribute_info attributes[attributes_count];
}
To parse this structure, you walk through the bytes, reading each field according to its type and size. The u1, u2, and u4 types represent unsigned integers of 1, 2, and 4 bytes, respectively, stored in big-endian order. For example, the first four fields can be read with xxd:
xxd -s0 -l4 Hello.class
00000000: cafe babe ....
xxd -s4 -l2 Hello.class
00000004: 0000 ..
xxd -s6 -l2 Hello.class
00000006: 0044 .D
xxd -s8 -l2 Hello.class
00000008: 001d ..
From the simple commands above, we can extract some important information:
- Magic Number (0xcafebabe): identifies the file as a Java class file.
- Minor Version (0x0000): minor version of the class file format.
- Major Version (0x0044): 68 in decimal, which corresponds to Java 24.
- Constant Pool Count (0x001d): number of entries in the constant pool plus one (29 in decimal, so there are 28 entries).
The constant pool count is what lets us parse the constant pool. This is common in binary formats: values read earlier tell you how to parse what follows.
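The same four fields can be read programmatically. Below is a minimal sketch in Node.js; parseHeader is a hypothetical helper name, and the bytes are hard-coded from the dump above instead of being read from disk.

```javascript
// First 10 bytes of Hello.class, copied from the xxd dump above.
const headerBytes = Buffer.from('cafebabe00000044001d', 'hex');

// Class files are big-endian, so the *BE readers match u2 and u4.
function parseHeader(buf) {
  const magic = buf.readUInt32BE(0); // u4 magic
  if (magic !== 0xcafebabe) {
    throw new Error('not a class file');
  }
  const minor = buf.readUInt16BE(4); // u2 minor_version
  const major = buf.readUInt16BE(6); // u2 major_version
  const constantPoolCount = buf.readUInt16BE(8); // u2 constant_pool_count
  return {
    magic: magic.toString(16),
    minor,
    major,
    // Holds from Java 5 (major 49) onward: 52 is Java 8, 68 is Java 24.
    javaVersion: major - 44,
    // The count is one more than the number of entries.
    constantPoolEntries: constantPoolCount - 1,
  };
}

console.log(parseHeader(headerBytes));
```

For the bytes above, this reports major version 68, Java 24, and 28 constant pool entries.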
With this information, you could write a parser that reads the class file byte by byte, extracting each field according to its type and size. I did that, and recommend that you do the same exercise. But it’s not necessary. To get details of a class, you can use javap.
Running javap -v Hello.class disassembles the class file and provides a human-readable representation of its contents. It shows a bunch of information; I will focus on the most relevant parts for our understanding.
Constant Pool
The constant pool is a table of constants that are referenced by the bytecode instructions. It includes literals (like strings and numbers), class and method references, and other constants used in the class.
javap -v Hello.class
Constant pool:
#1 = Methodref #2.#3 // java/lang/Object."<init>":()V
#2 = Class #4 // java/lang/Object
#3 = NameAndType #5:#6 // "<init>":()V
#4 = Utf8 java/lang/Object
#5 = Utf8 <init>
#6 = Utf8 ()V
#7 = Fieldref #8.#9 // java/lang/System.out:Ljava/io/PrintStream;
#8 = Class #10 // java/lang/System
#9 = NameAndType #11:#12 // out:Ljava/io/PrintStream;
#10 = Utf8 java/lang/System
#11 = Utf8 out
#12 = Utf8 Ljava/io/PrintStream;
#13 = String #14 // Hello world!
#14 = Utf8 Hello world!
#15 = Methodref #16.#17 // java/io/PrintStream.println:(Ljava/lang/String;)V
#16 = Class #18 // java/io/PrintStream
#17 = NameAndType #19:#20 // println:(Ljava/lang/String;)V
#18 = Utf8 java/io/PrintStream
#19 = Utf8 println
#20 = Utf8 (Ljava/lang/String;)V
#21 = Class #22 // Hello
#22 = Utf8 Hello
#23 = Utf8 Code
#24 = Utf8 LineNumberTable
#25 = Utf8 main
#26 = Utf8 ([Ljava/lang/String;)V
#27 = Utf8 SourceFile
#28 = Utf8 Hello.java
The string that is printed, Hello world!, is stored in the constant pool, in entry #13.
#13 = String #14
This entry is of type String and references another entry, #14, which is of type Utf8 and contains the actual string value. All entries in the constant pool are constants (who would guess that?) and can be one of several types:
- CONSTANT_Class: represents a class or interface.
- CONSTANT_Fieldref: represents a field in a class or interface.
- CONSTANT_Methodref: represents a method in a class.
- CONSTANT_InterfaceMethodref: represents a method in an interface.
- CONSTANT_String: represents a string literal.
- CONSTANT_NameAndType: represents a field or method by its name and descriptor.
- CONSTANT_Utf8: represents a string in (modified) UTF-8 encoding.
The specification lists more types: https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.4
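To see how these entries chain together, here is a rough sketch that parses the first six entries of our constant pool and resolves #1 the way javap does in its comments. The names parseConstantPool and resolve are hypothetical, the bytes are copied from the dump, and only the tags listed above are handled. A complete parser would also need the numeric types, keeping in mind that long and double take two slots.

```javascript
// Constant pool entries #1 to #6 of Hello.class, copied from the xxd dump
// (they start at offset 10, right after constant_pool_count).
const poolBytes = Buffer.from(
  '0a00020003' +                              // #1 Methodref #2.#3
  '070004' +                                  // #2 Class #4
  '0c00050006' +                              // #3 NameAndType #5:#6
  '0100106a6176612f6c616e672f4f626a656374' +  // #4 Utf8 java/lang/Object
  '0100063c696e69743e' +                      // #5 Utf8 <init>
  '010003282956',                             // #6 Utf8 ()V
  'hex'
);

// Reads `count - 1` entries starting at `offset`; index 0 stays unused.
function parseConstantPool(buf, offset, count) {
  const pool = [null];
  for (let i = 1; i < count; i++) {
    const tag = buf.readUInt8(offset);
    offset += 1;
    switch (tag) {
      case 1: { // CONSTANT_Utf8: u2 length, then the bytes
        const length = buf.readUInt16BE(offset);
        // Assumption: plain ASCII, so ordinary UTF-8 decoding is enough.
        const value = buf.toString('utf8', offset + 2, offset + 2 + length);
        pool.push({ tag, value });
        offset += 2 + length;
        break;
      }
      case 7: // CONSTANT_Class: u2 name_index
      case 8: // CONSTANT_String: u2 string_index
        pool.push({ tag, index: buf.readUInt16BE(offset) });
        offset += 2;
        break;
      case 9:  // CONSTANT_Fieldref
      case 10: // CONSTANT_Methodref
      case 11: // CONSTANT_InterfaceMethodref
      case 12: // CONSTANT_NameAndType
        pool.push({
          tag,
          first: buf.readUInt16BE(offset),
          second: buf.readUInt16BE(offset + 2),
        });
        offset += 4;
        break;
      default:
        throw new Error(`tag ${tag} is not handled by this sketch`);
    }
  }
  return pool;
}

// Follows the indices until it reaches Utf8 entries.
function resolve(pool, index) {
  const entry = pool[index];
  switch (entry.tag) {
    case 1:
      return entry.value;
    case 7:
    case 8:
      return resolve(pool, entry.index);
    case 12:
      return `${resolve(pool, entry.first)}:${resolve(pool, entry.second)}`;
    default: // Fieldref, Methodref, InterfaceMethodref
      return `${resolve(pool, entry.first)}.${resolve(pool, entry.second)}`;
  }
}

const helloPool = parseConstantPool(poolBytes, 0, 7);
console.log(resolve(helloPool, 1)); // java/lang/Object.<init>:()V
```

javap additionally wraps <init> in quotes; the sketch leaves that cosmetic detail out.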
Methods
Moving further through the file structure, we find the methods section. The structure of a method is:
method_info {
u2 access_flags;
u2 name_index;
u2 descriptor_index;
u2 attributes_count;
attribute_info attributes[attributes_count];
}
javap prints it like this:
javap -v Hello.class
{
public Hello();
descriptor: ()V
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
Code:
0: getstatic #7 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #13 // String Hello world!
5: invokevirtual #15 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
}
Our class has two methods, even though we only defined one (main). The other method is the constructor (<init>), which the compiler creates automatically.
In some classes, you might also see a method called <clinit>, the class initialization method. It is executed once, when the class is initialized, and it’s used to initialize static fields.
Ref: https://docs.oracle.com/javase/specs/jvms/se14/html/jvms-2.html#jvms-2.9
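The method_info layout can be walked the same way as the header. The sketch below reads both methods of Hello.class from bytes copied out of the dump. parseMethods is a hypothetical helper, the flag table is only a subset of the access flags in the specification, and attribute bodies are kept as raw bytes rather than decoded.

```javascript
// Both method_info entries of Hello.class, copied from the xxd dump
// (in the file they start at offset 0x138).
const methodBytes = Buffer.from(
  '0001000500060001' +                        // <init>: flags, name #5, descriptor #6, 1 attribute
  '00170000001d' +                            // attribute name #23 (Code), length 29
  '00010001000000052ab70001b1' +              // max_stack, max_locals, code_length, code
  '00000001001800000006000100000001' +        // exception table, LineNumberTable
  '00090019001a0001' +                        // main: flags, name #25, descriptor #26, 1 attribute
  '001700000025' +                            // attribute name #23 (Code), length 37
  '0002000100000009b20007120db6000fb1' +      // max_stack, max_locals, code_length, code
  '0000000100180000000a00020000000300080004', // exception table, LineNumberTable
  'hex'
);

// Subset of the method access flags defined by the JVM specification.
const ACCESS_FLAGS = {
  ACC_PUBLIC: 0x0001,
  ACC_PRIVATE: 0x0002,
  ACC_PROTECTED: 0x0004,
  ACC_STATIC: 0x0008,
  ACC_FINAL: 0x0010,
};

function parseMethods(buf, offset, count) {
  const methods = [];
  for (let m = 0; m < count; m++) {
    const accessFlags = buf.readUInt16BE(offset);
    const nameIndex = buf.readUInt16BE(offset + 2);
    const descriptorIndex = buf.readUInt16BE(offset + 4);
    const attributesCount = buf.readUInt16BE(offset + 6);
    offset += 8;

    const attributes = [];
    for (let a = 0; a < attributesCount; a++) {
      // attribute_info: u2 attribute_name_index, u4 attribute_length, body
      const attrNameIndex = buf.readUInt16BE(offset);
      const length = buf.readUInt32BE(offset + 2);
      const body = buf.subarray(offset + 6, offset + 6 + length);
      attributes.push({ nameIndex: attrNameIndex, body });
      offset += 6 + length; // the length lets us skip bodies we don't decode
    }

    const flags = Object.keys(ACCESS_FLAGS).filter(
      (name) => (accessFlags & ACCESS_FLAGS[name]) !== 0
    );
    methods.push({ flags, nameIndex, descriptorIndex, attributes });
  }
  return { methods, offset };
}

const parsedMethods = parseMethods(methodBytes, 0, 2);
for (const method of parsedMethods.methods) {
  console.log(method.flags.join(' '), `name=#${method.nameIndex}`, `descriptor=#${method.descriptorIndex}`);
}
// ACC_PUBLIC name=#5 descriptor=#6
// ACC_PUBLIC ACC_STATIC name=#25 descriptor=#26
```

The name and descriptor indices point back into the constant pool: #5 and #6 are <init> and ()V, while #25 and #26 are main and ([Ljava/lang/String;)V.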
Each method has a descriptor, which specifies its parameter types and return type. For example, the main method has the descriptor ([Ljava/lang/String;)V and the constructor has the descriptor ()V.
Java method descriptors have the following format:
( parameter_descriptor* ) return_descriptor
Some of the possible parameter and return descriptors are:
| FieldType term | Type |
|---|---|
| B | byte |
| C | char |
| I | int |
| Z | boolean |
| V | void (only for return type) |
| LClassName; | reference to an object of the given class (fully qualified, with / as separator) |
| [type | array of the given type |
See more at https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.3
The ([Ljava/lang/String;)V descriptor means the method takes a single parameter, which is an array ([) of java.lang.String objects, and returns void.

This makes sense: the constructor is public Hello() and main is public static void main(String[] args).
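Reading descriptors by eye gets tedious, so here is a small sketch that turns a method descriptor back into Java-like type names. parseMethodDescriptor and parseType are hypothetical helpers. The primitive table includes the terms from the specification that the table above leaves out, and the sketch does not reject V in parameter position as a strict parser should.

```javascript
const PRIMITIVES = {
  B: 'byte', C: 'char', D: 'double', F: 'float',
  I: 'int', J: 'long', S: 'short', Z: 'boolean', V: 'void',
};

// Parses one type starting at `pos`; returns the readable name and the
// position right after it.
function parseType(desc, pos) {
  let dims = 0;
  while (desc[pos] === '[') { // each '[' adds one array dimension
    dims++;
    pos++;
  }
  let name;
  if (desc[pos] === 'L') { // LClassName; with '/' as package separator
    const end = desc.indexOf(';', pos);
    if (end === -1) throw new Error(`unterminated class name in ${desc}`);
    name = desc.slice(pos + 1, end).replace(/\//g, '.');
    pos = end + 1;
  } else {
    name = PRIMITIVES[desc[pos]];
    if (!name) throw new Error(`unknown type at position ${pos} in ${desc}`);
    pos++;
  }
  return { type: name + '[]'.repeat(dims), next: pos };
}

function parseMethodDescriptor(desc) {
  if (desc[0] !== '(') throw new Error(`not a method descriptor: ${desc}`);
  const params = [];
  let pos = 1;
  while (desc[pos] !== ')') {
    if (pos >= desc.length) throw new Error(`missing ')' in ${desc}`);
    const parsed = parseType(desc, pos);
    params.push(parsed.type);
    pos = parsed.next;
  }
  const returnType = parseType(desc, pos + 1).type;
  return { params, returnType };
}

const mainDescriptor = parseMethodDescriptor('([Ljava/lang/String;)V');
console.log(mainDescriptor.params, mainDescriptor.returnType);
// [ 'java.lang.String[]' ] void
```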
Code
The Code_attribute is the most important attribute of a method, as it contains the actual bytecode instructions that the JVM executes. In the specification, it’s defined as shown below.
Code_attribute {
u2 attribute_name_index;
u4 attribute_length;
u2 max_stack;
u2 max_locals;
u4 code_length;
u1 code[code_length];
u2 exception_table_length;
{ u2 start_pc;
u2 end_pc;
u2 handler_pc;
u2 catch_type;
} exception_table[exception_table_length];
u2 attributes_count;
attribute_info attributes[attributes_count];
}
In this article, we won’t go through the execution of the JVM instructions. For now, the main observation is that the code relies heavily on references to the constant pool.
The code in the main method is:
0: getstatic #7 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #13 // String Hello world!
5: invokevirtual #15 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
Notice how every instruction except return references an entry in the constant pool. getstatic, for example, references entry #7, which is a Fieldref. This entry points to #8.#9, which resolve to java/lang/System and out:Ljava/io/PrintStream;, respectively.
This means that the getstatic instruction loads the static field out from the System class, a reference to the standard output stream.
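The listing can be reproduced from the raw code bytes with very little work. This is a deliberately tiny sketch, not a full disassembler: disassemble is a hypothetical helper, and the opcode table only covers the six instructions that appear in Hello.class.

```javascript
// code[] of main, copied from the xxd dump (9 bytes starting at offset 0x179).
const mainCode = Buffer.from('b20007120db6000fb1', 'hex');

// Only the opcodes that appear in Hello.class; `operandBytes` is the
// number of bytes that follow the opcode.
const OPCODES = {
  0x12: { name: 'ldc', operandBytes: 1 },
  0x2a: { name: 'aload_0', operandBytes: 0 },
  0xb1: { name: 'return', operandBytes: 0 },
  0xb2: { name: 'getstatic', operandBytes: 2 },
  0xb6: { name: 'invokevirtual', operandBytes: 2 },
  0xb7: { name: 'invokespecial', operandBytes: 2 },
};

function disassemble(code) {
  const lines = [];
  let pc = 0;
  while (pc < code.length) {
    const op = OPCODES[code[pc]];
    if (!op) {
      throw new Error(`opcode 0x${code[pc].toString(16)} is not handled by this sketch`);
    }
    let text = `${pc}: ${op.name}`;
    if (op.operandBytes > 0) {
      // For these opcodes the operand is an unsigned, big-endian
      // constant pool index.
      text += ` #${code.readUIntBE(pc + 1, op.operandBytes)}`;
    }
    lines.push(text);
    pc += 1 + op.operandBytes;
  }
  return lines;
}

console.log(disassemble(mainCode).join('\n'));
// 0: getstatic #7
// 3: ldc #13
// 5: invokevirtual #15
// 8: return
```

The offsets on the left (0, 3, 5, 8) fall out of the instruction sizes: getstatic and invokevirtual take a two-byte index, ldc takes a one-byte index, and return takes none.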
The Fieldref agrees with how the out field is declared in the System class:
public final static PrintStream out = null;
Conclusion
In this article, you scratched the surface of the Java class file format and the JVM execution model. You now understand the constant pool, the foundation of a class file. This is an important step toward generating JVM bytecode directly.
In the next article, I will discuss the JVM code execution model, stack-based VMs, and how to generate bytecode for brainfuck.
Stay tuned!