Java Virtual Machine: Behind the Code

In the first part of this two-part explainer, Dmytro Vezhnin, CEO and co-founder at CodeGym, discussed the details of virtual machines’ workings. Here in part two, he takes us through the inner systems of Java Virtual Machine (JVM) and what makes it convenient to use.

The JVM, the core of the Java ecosystem, creates a layer that lets you run your code, once written, on any platform that has a Java virtual machine. It’s very convenient, I have to admit.

The fact is that in many programming languages, for example, C and C++, the code written by the programmer is first compiled into machine code for a specific platform. These languages are called compiled languages.

On the other hand, in languages such as JavaScript and Python, the computer executes instructions directly without the need for compilation. These languages are called interpreted languages.

Java Virtual Machine or JVM is Different

Java uses a combination of both methods. Java code is first compiled to bytecode, which is stored in a .class file. After this, the Java Virtual Machine interprets the class file for the underlying platform. As a result, the same class file can be executed on any platform and operating system, by any JVM that supports that class file version.

JVM creates an isolated space on the host machine like regular virtual machines. This space can be used to run Java programs regardless of the computer’s platform or operating system.

Thus, the main purpose of the JVM is to run a Java application on any device or operating system. The second purpose of the JVM is to manage memory and optimize its use by the application. Now, let’s take a closer look at how the JVM works in terms of running programs.

.class File Structure

If we write an application and compile it, the compiler creates a .class file containing all the information about the application that the JVM needs. What will we see inside? The file is divided into ten sections whose order is strictly defined and determines the entire structure of the class file. Since the content is binary, you can’t simply read it as text, but you can open it with any hex editor. To make the contents easier to read, use javap, a command-line utility for reading .class files that ships with the JDK.

If you’re familiar with the command line, you can run a command like this:

javap -v java.lang.Object

By default, javap shows only public, protected, and package-private member declarations.

-p adds private methods and fields;

-v prints additional metadata, such as the constant pool;

-c displays the bytecode itself, that is, the compiled implementation of the methods.

If you’re writing code in an IDE like IntelliJ IDEA, you can look at the bytecode there. To do this, go to View → Show Bytecode.

Now let’s write a simple application and compile it.

Here is our program to add two integers:

public class TestCode {
  private static int sum(int a, int b) {
    return a + b;
  }

  public static void main(String[] args) {
    String mySum = "My sum = ";
    int myInt = sum(2, 3);
    System.out.print(mySum);
    System.out.println(myInt);
  }
}

Now let’s see what its .class file looks like in hexadecimal:

[Figure: hex dump of TestCode.class]

Probably not very clear, right? Here is the same file opened in the IDE:

// class version 57.0 (57)
// access flags 0x21
public class TestCode {

  // compiled from: TestCode.java

  // access flags 0x1
  public <init>()V
  L0
    LINENUMBER 1 L0
    ALOAD 0
    INVOKESPECIAL java/lang/Object.<init> ()V
    RETURN
  L1
    LOCALVARIABLE this LTestCode; L0 L1 0
    MAXSTACK = 1
    MAXLOCALS = 1

  // access flags 0xA
  private static sum(II)I
  L0
    LINENUMBER 3 L0
    ILOAD 0
    ILOAD 1
    IADD
    IRETURN
  L1
    LOCALVARIABLE a I L0 L1 0
    LOCALVARIABLE b I L0 L1 1
    MAXSTACK = 2
    MAXLOCALS = 2

  // access flags 0x9
  public static main([Ljava/lang/String;)V
  L0
    LINENUMBER 7 L0
    LDC "My sum = "
    ASTORE 1
  L1
    LINENUMBER 8 L1
    ICONST_2
    ICONST_3
    INVOKESTATIC TestCode.sum (II)I
    ISTORE 2
  L2
    LINENUMBER 9 L2
    GETSTATIC java/lang/System.out : Ljava/io/PrintStream;
    ALOAD 1
    INVOKEVIRTUAL java/io/PrintStream.print (Ljava/lang/String;)V
  L3
    LINENUMBER 10 L3
    GETSTATIC java/lang/System.out : Ljava/io/PrintStream;
    ILOAD 2
    INVOKEVIRTUAL java/io/PrintStream.println (I)V
  L4
    LINENUMBER 11 L4
    RETURN
  L5
    LOCALVARIABLE args [Ljava/lang/String; L0 L5 0
    LOCALVARIABLE mySum Ljava/lang/String; L1 L5 1
    LOCALVARIABLE myInt I L2 L5 2
    MAXSTACK = 2
    MAXLOCALS = 3
}

Of course, this is much easier to read. It looks somewhat like assembly language, and an experienced programmer can roughly understand what is written here: aload is clearly a load instruction, and astore stores something somewhere. But it will still not be easy for those who see it for the first time.

We are not going to delve into the structure of such files, but it is quite possible to learn how to read them. To do this, you first need to study the general structure of the .class file. Below is an excerpt from the documentation.
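To make the stack-based style of the listing more tangible, here is a minimal sketch of a toy interpreter that runs the four instructions of sum(II)I. The class MiniStackMachine and its run method are hypothetical names made up for this illustration; the real JVM works on binary opcodes, not on strings, so treat this only as a mental model.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreter for a tiny subset of JVM-like instructions.
final class MiniStackMachine {

    /** Executes the program against the given local variables and returns the IRETURN value. */
    static int run(String[] program, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>(); // the operand stack
        for (String instruction : program) {
            String[] parts = instruction.trim().split("\\s+");
            switch (parts[0]) {
                case "ILOAD":   // push local variable #n onto the operand stack
                    stack.push(locals[Integer.parseInt(parts[1])]);
                    break;
                case "IADD": {  // pop two ints and push their sum
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left + right);
                    break;
                }
                case "IRETURN": // pop the result and hand it back to the caller
                    return stack.pop();
                default:
                    throw new IllegalArgumentException("unsupported: " + instruction);
            }
        }
        throw new IllegalStateException("program ended without IRETURN");
    }

    public static void main(String[] args) {
        String[] sum = {"ILOAD 0", "ILOAD 1", "IADD", "IRETURN"};
        System.out.println(run(sum, new int[] {2, 3})); // prints 5
    }
}
```

The stack never holds more than two values here, which matches MAXSTACK = 2 in the listing for sum.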

ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

Here u1, u2, and u4 denote unsigned values that are one, two, and four bytes long.

Magic is a magic constant: if you look at the first 4 bytes of our file, you will see 0xCAFEBABE. This number is present in every class file and is mandatory for the JVM: the system sees it and recognizes the file as a .class file.

Minor_version and major_version define the version of the .class file format. It depends on the JDK (and the target release) used for compilation. For Java 5 and above, the formula major version - 44 = Java version works. In my case, 57 - 44 = 13. In the hex file you see 0x39, which is decimal 57.
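The first eight bytes can be read with a few lines of Java. The sketch below is only an illustration: ClassFileHeader, read, and javaVersion are hypothetical names, and it assumes that the class file of java.lang.String can be obtained as a resource from the system class loader, which is the case on common JDKs. The major version you see depends on the JDK you run it with.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper that reads the fixed-size header of a class file.
final class ClassFileHeader {

    /** Returns {magic, minor_version, major_version} for a class given by its internal name. */
    static long[] read(String internalName) {
        try (InputStream raw = ClassLoader.getSystemResourceAsStream(internalName + ".class")) {
            if (raw == null) {
                throw new IllegalArgumentException("no class file found for " + internalName);
            }
            DataInputStream in = new DataInputStream(raw);
            long magic = in.readInt() & 0xFFFFFFFFL; // u4, kept as an unsigned value
            int minor = in.readUnsignedShort();       // u2
            int major = in.readUnsignedShort();       // u2
            return new long[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The rule of thumb from the text; it holds for Java 5 and later. */
    static int javaVersion(int majorVersion) {
        return majorVersion - 44;
    }

    public static void main(String[] args) {
        long[] header = read("java/lang/String");
        System.out.printf("magic=0x%X minor=%d major=%d -> Java %d%n",
                header[0], header[1], header[2], javaVersion((int) header[2]));
        // The hex value 0x39 from the article's file is decimal 57, that is, Java 13.
        System.out.println(javaVersion(Integer.parseInt("39", 16)));
    }
}
```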

Constant_pool_count and constant_pool: the constant pool count starts at the ninth byte and is followed by the constant pool itself, which contains all the constants of our class. Since each class can have a different number of them, the constant_pool_count value precedes the array and determines its length: the array holds constant_pool_count - 1 entries. That is, the constant pool is an array of variable length. Each constant occupies one element in the array, except long and double constants, which take up two. Throughout the class file, constants are specified by an integer index indicating their position in the array. The first constant has index 1, the second has index 2, and so on. Each element of the constant pool begins with a one-byte tag that specifies its type. This lets the JVM figure out how to handle the bytes that follow.

Access_flags: a set of flags (public, abstract, enum, etc.). After reading the constant pool, the JVM moves on to the next two bytes, the access flags, which determine whether this file describes a class or an interface, whether it is public or abstract, and whether the class is final.
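As an illustration, the value 0x21 from the listing above can be decoded bit by bit. The sketch below uses the class-level flag values defined by the class file format; the class ClassAccessFlags and its decode method are hypothetical names. Method and field flags (such as 0xA and 0x9 in the listing) use their own tables, which this sketch does not cover.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical decoder for the class-level access_flags bit mask.
final class ClassAccessFlags {

    // Bit values for classes and interfaces, in ascending order.
    private static final Map<Integer, String> FLAGS = new LinkedHashMap<>();

    static {
        FLAGS.put(0x0001, "ACC_PUBLIC");
        FLAGS.put(0x0010, "ACC_FINAL");
        FLAGS.put(0x0020, "ACC_SUPER");
        FLAGS.put(0x0200, "ACC_INTERFACE");
        FLAGS.put(0x0400, "ACC_ABSTRACT");
        FLAGS.put(0x1000, "ACC_SYNTHETIC");
        FLAGS.put(0x2000, "ACC_ANNOTATION");
        FLAGS.put(0x4000, "ACC_ENUM");
        FLAGS.put(0x8000, "ACC_MODULE");
    }

    /** Returns the names of all flags set in the mask, separated by spaces. */
    static String decode(int accessFlags) {
        StringJoiner names = new StringJoiner(" ");
        for (Map.Entry<Integer, String> flag : FLAGS.entrySet()) {
            if ((accessFlags & flag.getKey()) != 0) {
                names.add(flag.getValue());
            }
        }
        return names.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(0x21));   // prints ACC_PUBLIC ACC_SUPER
        System.out.println(decode(0x0601)); // prints ACC_PUBLIC ACC_INTERFACE ACC_ABSTRACT
    }
}
```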

This_class: an index into the constant pool that identifies this class.

Super_class: an index into the constant pool that identifies the parent class.
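The way this_class and super_class point into the constant pool can be traced with a small parser. This is a rough sketch under a few assumptions: ClassNameReader and read are hypothetical names, the class file of java.lang.String is assumed to be available through the system class loader, and only the tag sizes needed to skip over entries are handled. It does not validate the file the way a real JVM does.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical reader that walks the constant pool to resolve this_class and super_class.
final class ClassNameReader {

    /** Returns {this_class name, super_class name}; the second is null if super_class is 0. */
    static String[] read(String internalName) {
        try (InputStream raw = ClassLoader.getSystemResourceAsStream(internalName + ".class")) {
            if (raw == null) {
                throw new IllegalArgumentException("no class file found for " + internalName);
            }
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw.readAllBytes()));
            in.skipBytes(8);                      // magic, minor_version, major_version
            int count = in.readUnsignedShort();   // constant_pool_count
            String[] utf8 = new String[count];    // text of CONSTANT_Utf8 entries, by index
            int[] nameIndex = new int[count];     // for CONSTANT_Class entries: index of the name
            for (int i = 1; i < count; i++) {     // index 0 is not used
                int tag = in.readUnsignedByte();  // one-byte tag that tells the entry type
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                    // Utf8
                    case 7: nameIndex[i] = in.readUnsignedShort(); break;     // Class
                    case 8: case 16: case 19: case 20: in.skipBytes(2); break;
                    case 15: in.skipBytes(3); break;                          // MethodHandle
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.skipBytes(4); break;
                    case 5: case 6: in.skipBytes(8); i++; break;              // long/double take two slots
                    default: throw new IOException("unknown constant pool tag " + tag);
                }
            }
            in.readUnsignedShort();               // access_flags
            int thisClass = in.readUnsignedShort();
            int superClass = in.readUnsignedShort();
            return new String[] {
                utf8[nameIndex[thisClass]],
                superClass == 0 ? null : utf8[nameIndex[superClass]]
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] names = read("java/lang/String");
        System.out.println(names[0] + " extends " + names[1]);
    }
}
```

Both fields hold an index of a Class entry, and that entry in turn holds the index of a Utf8 entry with the name, which is why two lookups are needed.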

Interfaces_count and interfaces: the number of interfaces the class implements, followed by references to the constant pool entries for those interfaces. Since a class can implement many interfaces at once, an array of references to the constant pool has to be stored. That is, the definition of the class and its parent class is followed by a number giving the size of the interface array, and then the array itself.

Fields_count and fields: information about fields. This block starts with a two-byte count of the fields in this class or interface, followed by an array of variable-length structures. Each structure describes one field: its name, its type, and, for a constant such as a static final variable, its value. The list contains only the fields declared by the class or interface defined in this file. Fields of parent classes and implemented interfaces are not here; they are described in their own class files.

Methods_count and methods: information about methods. Next, we move on to the most important place in any class: its methods. All the logic of a program is concentrated in them, all the executable bytecode.

A variable-length array contains structures that fully describe each method: access modifiers, the method name, its descriptor (parameter and return types), and its attributes. The attributes are structures in their own right, since there can be many of them and each can be of a different type. The method’s bytecode, for example, is stored in its Code attribute.

Attributes_count and attributes: the last block contains additional meta-information, such as the name of the source file the class was compiled from. It may or may not be present, and the JVM simply ignores attributes it does not recognize.
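The compact notation seen in the listing, such as sum(II)I or main([Ljava/lang/String;)V, is the method descriptor stored for each method. As a small sketch, the standard java.lang.invoke.MethodType class can produce these strings; DescriptorDemo and descriptor are hypothetical names used only for this example.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper that prints JVM method descriptors for given Java types.
final class DescriptorDemo {

    /** Builds the descriptor string for a method with the given return and parameter types. */
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // int sum(int a, int b)
        System.out.println(descriptor(int.class, int.class, int.class)); // prints (II)I
        // void main(String[] args)
        System.out.println(descriptor(void.class, String[].class));      // prints ([Ljava/lang/String;)V
    }
}
```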

Java Virtual Machine Architecture

The JVM consists of three separate components:

class loader;
runtime memory/data area;
execution engine.

Class loader

Since we’ve already looked at the structure of class files, now let’s see how they are loaded into the JVM and then executed. Let’s say you compiled a program and got a class file with bytecode. Now the class file should be loaded into the JVM (main memory), and then the bytecode from this class should be executed.

Usually, the class that contains the main() method is loaded into memory first. A part of the JVM, the class loader, is responsible for loading classes. Sounds logical, doesn’t it? The class loading process consists of three steps: loading, linking, and initialization.

There are special loader classes for loading (because in Java everything is a class, isn’t it):

Bootstrap is the base loader that loads the platform classes. It is the parent of all other class loaders and is part of the platform itself.
Extension ClassLoader is the extension loader, a child of the Bootstrap loader. It loads extension classes, which by default are located in the jre/lib/ext directory (in Java 8 and earlier; since Java 9, the platform class loader has taken over this role).
AppClassLoader is the system class loader, a direct child of Extension ClassLoader. It loads classes from directories and JARs specified by the CLASSPATH environment variable, the java.class.path system property, or the -classpath command-line option.
User-defined loaders: an application can also have its own class loaders.

The system loader always loads the main application class, while user-defined loaders can load other classes. The Java Virtual Machine uses the loadClass() method of the ClassLoader class to load a class into memory by its fully qualified name. A loader first hands the request to its parent; if the parent can’t find the class, the child loader tries, and so on down the delegation chain. If the last loader in the chain can’t load the class either, a ClassNotFoundException or NoClassDefFoundError is thrown.

By the way, each loader creates its own namespace. That is, a program can contain several classes with the same fully qualified name if different loaders loaded them.

That is why each loader delegates to its parent: before trying to load a class itself, it checks whether the required class has already been loaded and lets its parent try first.
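The loader hierarchy described above can be inspected from a running program. The following sketch walks the parent chain starting at the system class loader and shows what happens when no loader can find a class. LoaderChain, chainOf, and canLoad are hypothetical names, and the exact loader class names printed depend on the Java version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that walks the class loader parent chain.
final class LoaderChain {

    /** Lists the class names of the given loader and all of its parents. */
    static List<String> chainOf(ClassLoader loader) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader current = loader; current != null; current = current.getParent()) {
            chain.add(current.getClass().getName());
        }
        chain.add("bootstrap (represented as null)"); // the bootstrap loader has no Java object
        return chain;
    }

    /** Returns true if the class can be loaded, false if every loader in the chain fails. */
    static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        chainOf(ClassLoader.getSystemClassLoader()).forEach(System.out::println);
        // Platform classes such as String come from the bootstrap loader, reported as null.
        System.out.println(String.class.getClassLoader());
        System.out.println(canLoad("java.lang.String"));
        System.out.println(canLoad("no.such.MissingClass"));
    }
}
```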

Linking

After loading the class, the linking stage begins, which is divided into three parts.

Bytecode verification: this is static code analysis, performed once per class. The system checks the bytecode for errors, for example, that instructions are valid, that the operand stack cannot overflow or underflow, and that variable types are compatible.

Preparation: memory is allocated for static fields, and they are set to their default values.

Symbolic reference resolution: the JVM replaces symbolic references with direct references into the method area. In most cases, this happens lazily, that is, the first time the reference is used.

Initialization

This involves executing the initialization method of a class or interface (known as <clinit>). It assigns the declared initial values to static variables and executes static initializer blocks. This is the final stage of class loading.
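A short experiment can show when initialization actually runs. In the sketch below, the nested class Lazy is not initialized until its static field is first read, and its static field initializer runs before its static block because that is their order in the source. InitOrderDemo, Lazy, and note are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo that records the order of class initialization events.
final class InitOrderDemo {

    static final List<String> LOG = new ArrayList<>();

    static final class Lazy {
        // Static initializers run in source order as part of the class initialization method.
        static int value = note("static field initializer");

        static {
            note("static block");
        }

        static int note(String event) {
            LOG.add(event);
            return 42;
        }
    }

    /** Touches Lazy and returns the log of events recorded so far. */
    static List<String> run() {
        LOG.add("before first use");
        int value = Lazy.value; // the first read of a static field triggers initialization
        LOG.add("after first use, value=" + value);
        return LOG;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Calling run() a second time adds no new initializer entries, since a class is initialized only once.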

Runtime Data Area

The JVM runtime data area consists of 5 parts:

Method area

This area stores class-level structures such as the run-time constant pool, field and method data, and the code of methods and constructors.

The method area is created when the virtual machine starts, and there is only one method area per virtual machine. Java provides two main types of methods: instance methods and class (static) methods. Instance methods use dynamic (late) binding, while class methods use static (early) binding.

When the Java Virtual Machine invokes a class method, it chooses the method based on the declared type of the reference, which is always known at compile time. When it invokes an instance method, it chooses the method based on the actual class of the object, which may only be known at run time. That’s why different instructions are used to call methods: invokestatic and invokevirtual. These instructions refer to a constant pool entry that holds the full symbolic reference to the required method. The virtual machine pops the required number of arguments off the operand stack and passes them to the method.
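The difference between the two kinds of binding is easy to observe in plain Java. In this sketch, the instance method is chosen by the actual class of the object, while the static method is chosen by the declared type of the variable. DispatchDemo, Animal, and Dog are hypothetical names, and calling a static method through an instance reference is done here only to make the point; it is normally discouraged.

```java
// Hypothetical demo contrasting dynamic and static binding.
final class DispatchDemo {

    static class Animal {
        static String kind() { return "animal"; } // class method
        String name() { return "animal"; }        // instance method
    }

    static class Dog extends Animal {
        static String kind() { return "dog"; }    // hides Animal.kind(), does not override it
        @Override
        String name() { return "dog"; }           // overrides Animal.name()
    }

    /** Returns {result of the instance call, result of the static call}. */
    static String[] call() {
        Animal pet = new Dog();
        String viaInstance = pet.name(); // compiled to invokevirtual, resolved by the object's class
        String viaStatic = pet.kind();   // compiled to invokestatic on Animal, the declared type
        return new String[] {viaInstance, viaStatic};
    }

    public static void main(String[] args) {
        String[] result = call();
        System.out.println(result[0]); // prints dog
        System.out.println(result[1]); // prints animal
    }
}
```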

Heap area

The heap stores all objects and their instance variables. Memory for all class instances and arrays is allocated from it: if we create a new class instance, it is placed in the heap. The heap is created when the virtual machine starts, and there is only one per virtual machine. Because all threads share it, data on the heap is not thread-safe by default.
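What "not thread-safe" means in practice can be sketched with two threads updating the same heap object. The plain int field may lose updates because incrementing it is a read-modify-write sequence, while the AtomicInteger never does. SharedHeapDemo and Counters are hypothetical names, and how many updates the plain counter loses (if any) varies from run to run.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: two threads share one object that lives on the heap.
final class SharedHeapDemo {

    static final class Counters {
        int plain;                                         // unsynchronized field
        final AtomicInteger atomic = new AtomicInteger();  // thread-safe counter
    }

    /** Runs two threads that each increment both counters perThread times. */
    static Counters run(int perThread) {
        Counters shared = new Counters(); // a single instance visible to both threads
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                shared.plain++;                  // can lose updates under contention
                shared.atomic.incrementAndGet(); // atomic, never loses updates
            }
        };
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared;
    }

    public static void main(String[] args) {
        Counters result = run(100_000);
        System.out.println("atomic = " + result.atomic.get());
        System.out.println("plain  = " + result.plain + " (may be lower than atomic)");
    }
}
```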

Stack area

As we said, the heap is created once. A runtime stack, by contrast, is created for every new thread. It holds local variables, method calls, and partial results. You’ve probably come across StackOverflowError. It occurs when a thread needs a deeper stack than it is allowed, for example, because of runaway recursion.

Each time a method is called, a new entry, the stack frame, is pushed onto the stack; it is removed when the method returns.
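The relationship between method calls, stack frames, and StackOverflowError can be seen with deliberately unbounded recursion. This is a sketch for demonstration only: StackDepthDemo and measure are hypothetical names, catching StackOverflowError is not something production code should rely on, and the depth reached depends on the JVM and its stack size settings.

```java
// Hypothetical demo that counts how many frames fit on the current thread's stack.
final class StackDepthDemo {

    private static int depth;

    private static void recurse() {
        depth++;
        recurse(); // every call pushes another frame onto this thread's stack
    }

    /** Recurses until the stack is exhausted and returns the number of calls made. */
    static int measure() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError expected) {
            // The stack is exhausted; frames are popped as the error propagates up to here.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before StackOverflowError: " + measure());
    }
}
```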

PC registers

Java supports multithreading. Each thread has its own program counter (PC) register that stores the address of the JVM instruction currently being executed. As soon as the instruction has been executed, the register is updated with the address of the next instruction.

Native method stacks

Native methods are methods written in languages other than Java, such as C or C++ (the languages the core JVM itself is written in). The JVM contains stacks that support such methods, with a separate native method stack allocated for each thread.

Execution engine

So, the bytecode has been created and loaded into memory. Now it’s time to run the program. To execute it, the interpreter and the JIT compiler convert the bytecode into machine instructions. The interpreter does this one instruction at a time. To speed up the process, a JIT (just-in-time) compiler works alongside it. The JIT kicks in when the execution engine detects code that is executed repeatedly: it compiles that bytecode into native machine code, which is then used directly for subsequent calls of the method. This improves performance.

The garbage collector is also part of the execution engine. In some other languages, such as C++, it is up to the programmer to free the memory of objects that are no longer needed. In Java, this is handled by the garbage collector (GC): it destroys unneeded objects in heap memory and makes room for new ones, making memory use more efficient.

Garbage collection happens inside the running JVM. The GC first identifies unused objects in memory, then removes the objects identified in the previous step.

It is important to note that garbage collection is performed automatically by the JVM whenever it decides it is needed and does not require separate handling. You can also request it by calling System.gc(), but a collection is not guaranteed to start.

The JVM supports various garbage collectors, and JVM vendors sometimes provide their own. By the way, JVMs themselves, like makes of cars, also differ from one another.
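To see which collectors a particular JVM is running with, the standard java.lang.management API can be queried. The sketch below also requests a collection with System.gc() and prints the used heap before and after; as noted above, the request may be ignored, so the numbers are not guaranteed to change. GcInfo, collectorNames, and usedHeapBytes are hypothetical names.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that reports the garbage collectors of the running JVM.
final class GcInfo {

    /** Returns the names of the collectors registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Returns the number of heap bytes currently in use, as reported by the runtime. */
    static long usedHeapBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        System.out.println("collectors: " + collectorNames());
        System.out.println("used before: " + usedHeapBytes());
        System.gc(); // only a request; the JVM is free to ignore it
        System.out.println("used after:  " + usedHeapBytes());
    }
}
```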

Our Future with Virtual Machines

Of course, here, we have taken a very superficial look at how the JVM works. Many programmers may ask why this is needed at all. However, I can confidently say that a developer who knows how his system works is much better at navigating internal processes and understanding why the code works one way or another. And if he understands, then he can optimize if he wants. In addition, the Java machine today has a whole pool of languages, such as Kotlin or Groovy, so knowing how it works will be useful not only for Java developers.
