An Era of Virtual Threads: Java

Pradeesh Kumar
13 min read · Apr 11, 2023


Image from inside.java

Introduction

The concept of “Virtual threads” has gained considerable attention in recent times, and many programming languages are updating their thread libraries to support it. Java introduced virtual threads as a preview feature in the Java 19 release. This article introduces threads from the basics to their internals in order to show the benefits of virtual threads and the advantages they offer over conventional thread creation.

A short introduction to Thread

A computer program is a set of instructions to achieve a specific task. When you launch a program, the OS loads it into main memory and allocates a designated space (known as the address space) to store and execute its instructions. At this point, it is known as a ‘Process’. In other words, a process is an instance of a program running on a computer.

A thread is a set of instructions within a process that can be executed independently by a CPU core. Multiple threads can be created within a process, allowing simultaneous execution of multiple tasks and better utilization of CPU resources, which increases throughput. For example, when you launch Google Chrome, the OS creates a process for it. You can do many things simultaneously, like look at a web page while downloading a file, because these functionalities run on separate threads. A thread is also called a lightweight process, as it lives within a process and shares the process’s address space.

Image by Wikipedia

Parallel vs Concurrent Execution

Parallel Execution: A computer executes multiple tasks at the same time. For example, suppose you have a 4-core CPU and run four different tasks, one on each core. The tasks run simultaneously.

Concurrent Execution: A computer creates an illusion of executing multiple tasks at the same time when there are more tasks than CPU cores. For example, suppose you have a 4-core CPU and are executing 8 different tasks. Since you have only 4 cores, the OS has to context-switch to execute these 8 tasks. The OS creates an illusion of executing 8 tasks simultaneously; in reality, only 4 can run at any instant, as we have only 4 cores.

Execution of 4 threads on 4 cores
Execution of 12 threads on 4 cores
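To see how many tasks can truly run in parallel on your machine, you can query the core count at runtime. A minimal sketch (the printed count varies by machine):

```java
public class CoreCount {
    public static void main(String[] args) {
        // Logical cores available to the JVM; at most this many threads
        // can execute truly in parallel, the rest are time-sliced.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Available cores: " + cores);
    }
}
```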

Why Threads?

Let’s first understand how threads really improve the system's efficiency.

Suppose you have a computer with a 4-core CPU. You’re writing a program that computes the sum of two numbers and running it on 12 threads, as shown below.

class SumOfNums {

    static void sum() {
        int a = 1, b = 2;
        int sum = a + b;
        System.out.println(sum);
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 12; i++) {
            Thread t = new Thread(SumOfNums::sum);
            t.start();
        }
    }
}

How many threads can run in parallel? Do all 12 threads run in parallel just because we created 12? The answer is no. We have only 4 CPU cores, so a maximum of 4 executions can happen in parallel. Each thread must be scheduled onto a core for execution, and since we have only 4 cores, only 4 threads can run in parallel at a time. The CPU context-switches among the 12 threads to run them concurrently.

Then why do applications have hundreds of threads? What’s the use of that? Why don’t we simply create as many threads as there are CPU cores? Let’s understand the reason below.

A task can be classified into two types: CPU-bound and IO bound.

CPU-Bound: When the execution of a task depends mostly on the CPU, such as arithmetic, logical, or relational operations, it is known as a CPU-bound task.

IO-Bound: When the execution of a task depends mostly on input/output operations, such as communicating over a network or reading/writing a file in a file system, it is known as an IO-bound task.

Which type does our sum task above fall into? We’re initializing two variables a and b, adding them, assigning the result to a new variable sum, and printing it to the console. There are no input/output operations here, so we can classify the sum task as CPU-bound.

Typically, the tasks will be a mixture of both CPU and IO operations. For instance, consider a task that involves reading text files, computing the count of distinct words within it, and writing the result to another text file. In this case, reading the file is an IO operation, calculating the unique words is a CPU operation and writing the result to another file is an IO operation.
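The unique-words task described above can be sketched as follows; the temporary file paths and the `\W+` word-splitting rule are illustrative assumptions, not a definitive implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class UniqueWords {
    // IO step: read the file; CPU step: count distinct words; IO step: write the result.
    static void countUniqueWords(Path input, Path output) throws IOException {
        String text = Files.readString(input);                  // IO-bound
        long unique = Arrays.stream(text.toLowerCase().split("\\W+"))
                .filter(w -> !w.isEmpty())
                .distinct()
                .count();                                       // CPU-bound
        Files.writeString(output, "Unique words: " + unique);   // IO-bound
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("input", ".txt");
        Files.writeString(in, "to be or not to be");
        Path out = Files.createTempFile("output", ".txt");
        countUniqueWords(in, out);
        System.out.println(Files.readString(out)); // prints "Unique words: 4"
    }
}
```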

How does the number of threads affect the system's efficiency?

Consider our sum task, which relies heavily on the CPU. We execute it on 12 threads, allowing for concurrent execution. Internally, the CPU switches back and forth between these threads using its 4 cores: before completing one thread, the CPU switches to another to provide concurrency.

Is it truly beneficial to run them on 12 threads? The answer is no. In this scenario, we’re wasting the CPU resource on frequent context-switching. For CPU-bound tasks, it is optimal to select the number of threads that aligns closely with the available cores, in order to maximize efficiency.
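A common way to apply this rule is a fixed-size pool matched to the core count. A minimal sketch using the standard Executors API for the sum task:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CpuBoundPool {
    public static void main(String[] args) throws InterruptedException {
        // For CPU-bound work, a pool sized to the core count avoids
        // the context-switching overhead of oversubscription.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int i = 0; i < 12; i++) {
            pool.submit(() -> System.out.println(1 + 2)); // the CPU-bound sum task
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

The 12 tasks are queued and drained by only `cores` worker threads, so no more threads than cores ever compete for the CPU.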

What about the Unique-words-calculation task?

Consider there are 20 input text files to read. The CPU sits idle while a file is being read, because the reading happens on dedicated hardware (the disk drive). Once the file content is available, the CPU performs the calculation and writes the output to another file. During the write, the CPU is again idle, as the operation is performed by the disk drive.

Let’s say that, we’re running this on a single thread as illustrated below.

CPU Time

As illustrated, the CPU goes idle when the file reading and file writing operation takes place. This results in less than optimal utilization of its full processing capacity, i.e., not achieving 100% CPU usage.

Now, let’s consider running this on 4 threads, since, as before, we have a 4-core CPU. Let’s visualize this scenario in the figure below.

Each thread is assigned to a dedicated core and operates concurrently. As a result, throughput improves compared to the single-threaded approach, since all 4 cores are utilized. However, it’s important to note that during I/O operations, the cores may still experience idle periods.

Will increasing the number of threads improve efficiency further?

Let’s see what happens if we run this task on 8 threads.

Now, focusing on the first core, as Thread 1 executes, it may encounter an I/O operation causing the CPU to go idle. However, with multiple threads at hand, instead of remaining idle, the CPU swiftly switches to another thread and continues executing it until Thread 1 completes the I/O operation. This way, the full capacity of the CPU can be effectively utilized, optimizing performance with multiple threads.

So, in our first task, the sum calculation, using more than 4 threads reduced performance due to needless context switching, because the task was highly CPU-bound. Here, however, 8 threads lead to improved efficiency. The lesson is that the number of threads for a task must be chosen based on how long and how frequently its IO operations occur: the more IO-heavy the workload, the more additional threads improve efficiency.
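This sizing intuition is often captured by a rule of thumb popularized by Java Concurrency in Practice: threads ≈ cores × (1 + wait time / compute time). The even 50/50 wait-to-compute split below is a hypothetical value chosen to reproduce the 8-thread scenario above:

```java
public class ThreadSizing {
    public static void main(String[] args) {
        int cores = 4;                        // the 4-core CPU from the example
        double waitMs = 50, computeMs = 50;   // hypothetical IO-wait vs CPU-work split
        // threads ~= cores * (1 + wait/compute); an even split doubles the core count
        long threads = Math.round(cores * (1 + waitMs / computeMs));
        System.out.println(threads); // prints 8
    }
}
```

A purely CPU-bound task has wait ≈ 0, and the formula collapses back to "one thread per core".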

How does Thread work internally?

Before we explore the Virtual threads, it’s important to know the types of threads and how they work internally.

There are two types of threads in modern operating systems: Kernel level Thread and User level Thread.

1. Kernel Thread

This is also known as OS Thread. Kernel Threads are managed and scheduled by the operating system kernel. Each thread is represented by a thread control block (TCB) in the kernel (similar to PCB for processes), which contains information about the thread’s state, priority, and other properties. Kernel threads are relatively heavyweight and require system calls to create, schedule, and synchronize.

2. User Thread

User-level threads are managed and scheduled by user-level thread libraries and do not require intervention from the operating system kernel; the kernel is not aware of user threads. Each user-level thread is represented by a separate data structure in the application, which contains information about the thread’s state and properties. User threads are lightweight and faster to create and destroy than OS threads, but they suffer from certain limitations, such as the inability to take advantage of multiple processors or cores.

In simpler terms, when a process starts, it creates a default thread to execute the application’s entry point (the main method). The process can then create additional threads as needed. A user thread cannot be executed directly; it must be mapped to a kernel thread, so ultimately the kernel is what executes every instruction in the computer system. The mapping between user threads and kernel threads can be one of the following three types.

  • M:1 model: all user threads are mapped to one kernel thread. The mapping is handled by a library scheduler.
  • 1:1 model: each user thread is mapped to one kernel thread
  • M:N model: all user threads are mapped to a pool of kernel threads
Threading model

Internal Threading model used in Java

Green Thread: Java employed the green-thread model in its earliest versions. In this model, threads are managed and scheduled directly by the JVM, using M:1 thread mapping. Green threads are significantly faster to create and switch than native threads. One problem Java faced with this model was that it could not scale over multiple processors, leaving Java unable to utilise multiple cores. Green threads are also challenging to implement in libraries, because they need very low-level support to perform well. Java later removed green threading and switched to native threading, which made Java threads slower than green threads.

Native Thread: Since Java 1.2, Java has dropped support for green threads and switched to the native thread model. Native threads are managed by the JVM with the help of the underlying OS. Native threads are very efficient to run, but starting and stopping them is costly, which is why we use thread pooling today. This model follows 1:1 thread mapping, where each Java thread is mapped to a distinct kernel thread. When a Java thread is created, a corresponding native thread is created by the operating system to execute the thread’s code. Java has followed the native thread model ever since.

What’s wrong with the current Thread Model in Java?

In the earlier section, we understood that Java has been using the Native-thread model. Let’s see what’s wrong with this model.

  • The Java thread library was written in a very early version of Java.
  • It’s a thin wrapper over the platform thread (aka native thread).
  • Native threads are very expensive to create and maintain.
  • Native threads need to store their call stack in memory, for which 2MB to 20MB (depending on the JVM and platform) is reserved upfront. If you have 4GB of RAM, you can create only about 200 threads, assuming each thread takes 20MB.
  • Since a native thread is a system resource, it takes about 1 millisecond to launch a new one.
  • Context switching between native threads is also expensive, as it requires a system call to the kernel.
  • These constraints limit the number of threads that can be created and may result in degraded performance and increased memory usage; hence we cannot create many threads.
  • We cannot scale an application just by adding more threads: due to context switching and their memory footprint, the cost of maintaining those threads is significant and affects performance.
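You can get a rough feel for the startup cost by timing a single platform-thread launch on your machine. A minimal, unscientific sketch (absolute numbers vary widely by OS and JVM, so no expected output is claimed):

```java
public class ThreadStartCost {
    public static void main(String[] args) throws InterruptedException {
        long begin = System.nanoTime();
        // start() goes through a system call that creates a kernel thread
        Thread t = new Thread(() -> { });
        t.start();
        t.join();
        long micros = (System.nanoTime() - begin) / 1_000;
        System.out.println("Started and joined one platform thread in ~" + micros + " us");
    }
}
```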

A Real-world example

Consider a web server deployed on a machine with 16GB of RAM. Like a typical web server, it uses the thread-per-request style, where each user request is handled by a separate thread. With 16GB of RAM available, and assuming each thread requires 20MB, the system can accommodate up to 800 threads. In today’s world, backend APIs often involve tasks such as database operations or calling other APIs via REST/SOAP. Hence, these systems are mostly IO-bound rather than CPU-bound.

Let’s assume that, for a single request, IO operations take 100 milliseconds, request processing takes 0.1 milliseconds and response processing also takes 0.1 milliseconds as shown below.

Assuming this web server gets 800 requests per second, with each request handled by a separate thread, the thread count reaches its maximum capacity.

Let’s calculate the Total CPU time for a single request as below.

Total CPU Time = Request preparation time + Response preparation time
= 0.1ms + 0.1ms
= 0.2 ms

For a single request, it takes 0.2 milliseconds of CPU time. How about 800 requests?

Total CPU time for 800 requests = 800 * 0.2 milliseconds
= 160 milliseconds

Recapping our capacity: our server can handle only 800 requests per second, because we can create a maximum of 800 threads. 1 second = 1000 milliseconds. Let’s calculate the CPU utilization over 1 second:

CPU Utilization = 160 milliseconds / 1000 milliseconds
= 16%

In a second, only 16% of the CPU was utilized. We are far from using the CPU at its full capacity.

How many threads are required to utilize at least 90% of the CPU?

16% = 800 threads
90% = ? threads
number of threads required = (800 * 90) / 16
= 4500 threads

This shows that our backend server is actually capable of handling 4500 requests per second by utilizing 90% of the CPU. Due to the memory constraint, we are able to create only 800 threads.

How much RAM is required to create 4500 threads? Considering each thread requires 20MB of RAM: 4500 × 20MB = 90GB.

This number is crazy, isn’t it? Hence, with native threads, we cannot utilize the hardware at its full capacity: the overhead of creating threads restricts our ability to fully use it.
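The arithmetic above can be double-checked in a few lines; the figures (0.1 ms of CPU per request and per response, 800 threads, 20MB per thread) are the assumptions stated earlier:

```java
public class CapacityMath {
    public static void main(String[] args) {
        double cpuPerRequestMs = 0.1 + 0.1;            // request + response preparation
        double busyMs = 800 * cpuPerRequestMs;         // CPU work for 800 requests: 160 ms
        double utilization = busyMs / 1000.0;          // fraction of one second: 16%
        long threadsFor90 = Math.round(800 * 0.90 / utilization); // 4500 threads
        long ramGB = threadsFor90 * 20 / 1000;         // at 20MB per thread: 90 GB
        System.out.println(busyMs + " ms, " + (utilization * 100) + "% CPU, "
                + threadsFor90 + " threads, " + ramGB + " GB");
    }
}
```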

Virtual Thread

Virtual threads are a lightweight implementation of Java threads, available as a preview feature in Java 19 as part of Project Loom. Virtual threads remove the overhead of creating and maintaining native threads and allow us to write high-throughput concurrent applications with near-optimal hardware utilization.

Unlike platform threads, which are managed by the OS, virtual threads are user-level threads managed by the JVM. Virtual threads are lightweight and have a much smaller memory footprint than platform threads; each virtual thread initially takes only a few hundred bytes of memory. This makes them well suited for tasks that handle a large number of client connections or perform I/O-bound operations.

There is no hard limit on the number of virtual threads you can create; you can create even millions of them, as creating a virtual thread doesn’t require a system call to the kernel.

Thread pooling is not required: since virtual threads are so lightweight (a few bytes each) and take very little time to create, you can simply create and destroy them whenever needed.
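To illustrate, here is a sketch that starts 100,000 virtual threads directly, with no pool; it requires Java 21, or Java 19/20 with --enable-preview:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyVirtualThreads {
    public static void main(String[] args) throws InterruptedException {
        // 100,000 platform threads would reserve ~2TB of stack space at
        // 20MB each; with virtual threads this count is unremarkable.
        AtomicInteger done = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            threads.add(Thread.ofVirtual().start(done::incrementAndGet));
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(done.get()); // prints 100000
    }
}
```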

Virtual threads are compatible with all the operations that are performed by classic threads such as thread-local variables, synchronization blocks, thread interruption, etc.

How does the Virtual thread work?

The JVM manages a pool of native (carrier) threads. A virtual thread is mounted on an available native thread when it needs to perform CPU work. When the virtual thread blocks on an IO operation, the JVM automatically suspends it and detaches it from the native thread until the IO completes; during this time, another virtual thread can perform CPU work on that OS thread. This is why we can create so many virtual threads: they are significantly lightweight.

JVM uses M:N thread mapping model to map the virtual threads to the native threads.

M:N mapping of virtual thread to OS thread

Example programs using Virtual threads in Java

  1. Using the new ofVirtual() factory method in the existing Thread class.

for (int i = 0; i < 5; i++) {
    Thread vThread = Thread.ofVirtual().start(() -> System.out.println("Hello World"));
}

2. Using the new newVirtualThreadPerTaskExecutor() factory method from Executors. The returned ExecutorService is AutoCloseable, and close() waits for all submitted tasks to complete.

public static void main(String[] args) {
    try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
        for (int i = 0; i < 5; i++) {
            executor.submit(() -> System.out.println("Hello World"));
        }
    } // close() blocks until all tasks finish

    System.out.println("All virtual threads are finished");
}

Green thread vs Virtual thread

Does the virtual thread sound like Java’s old green thread?

Green threads had an M:1 mapping with OS threads: all green threads ran on a single OS thread. With virtual threads, multiple virtual threads run on multiple native threads (M:N mapping). More details can be found in JEP 425.

Conclusion

In conclusion, virtual threads are a promising new feature that offers several advantages over traditional threads. By providing a lightweight concurrency model that is managed entirely in user space, virtual threads can make it easier to write concurrent code that scales well with large numbers of threads.
