When I first started exploring Virtual Threads in Java, I wanted to understand everything about them—their yielding behavior,
performance characteristics, and limitations. This journey led me to an interesting challenge: file I/O operations cause
Virtual Threads to become “pinned” to platform threads, limiting their effectiveness for I/O-heavy applications.
This pinning issue has been widely acknowledged across the Java community. It’s mentioned in the Virtual Threads JEP,
discussed on Reddit threads, debated on mailing lists,
and highlighted in the “State of Loom”. The consistent message: file I/O and Virtual Threads don’t play nicely together.
The consensus is that this shouldn’t be a problem because disk access is “fast enough.” But is it? When
accessing remote filesystems or handling thousands of concurrent I/O operations, those milliseconds add up quickly. Java’s
inability to differentiate between local and remote filesystems creates an opportunity for improvement.
I’ll share my expedition into solving this challenge by combining two powerful technologies: Project
Panama for native interoperability and Linux’s io_uring for high-performance asynchronous I/O. I’ll walk through how I
built a bridge between Java’s Virtual Threads and io_uring, transforming blocking I/O operations into yielding ones.
If you’re building high-throughput web servers, data processing pipelines, or any application with intensive I/O
requirements, this approach might be the missing piece in fully realizing the potential of Virtual Threads.
What is Panama
Project Panama is Java’s modern approach to native interoperability, released as a standard feature in Java 22. It offers an alternative to JNI with a more elegant and safer API for interacting with native code and memory. Panama gives developers the performance benefits of native code while maintaining Java’s safety and usability characteristics.
At its core, Panama solves a fundamental problem that Java developers have faced for decades: how to efficiently
interact with code and data outside the Java runtime while maintaining Java’s safety guarantees. Before Panama,
developers relied on JNI, which required writing C/C++ code, compiling shared libraries, and writing verbose Java code
to load and call these libraries. JNI development was error-prone and difficult to debug, creating a significant
cognitive disconnect from Java’s usual programming model.
For applications that need to leverage native libraries or system capabilities, Panama transforms what was a
dreaded necessity into a natural extension of Java development. This transformation is especially relevant for domains
like high-performance computing, machine learning, and, the focus of our exploration, high-performance I/O. Let’s break Panama’s capabilities down into its two core components, memory management and function calling, to understand how they enable our io_uring integration.
Managing memory
Memory management represents one of the most significant challenges when interacting with native code. Panama addresses this through an approach centered around Arenas.
For Java applications, you rarely have to think about memory management thanks to the garbage collector. When working with native code, however, memory must be explicitly allocated and freed. This manual management creates numerous opportunities for errors like memory leaks or accessing freed memory.
Panama’s Arena API provides deterministic resource management for off-heap memory. When an Arena closes, all memory segments allocated within it are automatically deallocated. This approach aligns with Java’s resource management patterns, making it intuitive for Java developers. All with a simple try-with-resources statement.
Different Arena implementations accommodate various usage patterns. The confined Arena ensures single-threaded access. The shared Arena allows multiple threads to access its memory segments. The global Arena addresses cases
where memory segments need to outlive their creating threads. The automatic Arena relies on the garbage collector to deallocate segments. Creating an Arena looks as follows. The Arena is active inside the try-with-resources statement, which means the MemorySegments created inside it are only available within that scope and are freed when the Arena is closed.
try (Arena arena = Arena.ofConfined()) {
    MemorySegment memorySegment = arena.allocate(1024);
}
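The other kinds are created through similar factory methods; a minimal sketch of the standard factories:
Arena shared = Arena.ofShared();  // segments accessible from multiple threads, explicit close()
Arena auto   = Arena.ofAuto();    // segments freed by the garbage collector, cannot be closed
Arena global = Arena.global();    // segments live as long as the JVM, never freed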
MemorySegment serves as Panama’s primary abstraction for working with memory, representing a contiguous region with well-defined boundaries and lifecycle. Unlike direct ByteBuffers, MemorySegments are explicitly bound to an Arena, creating a clear ownership model.
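Reading and writing primitive values goes through typed accessors on the segment; a minimal sketch:
try (Arena arena = Arena.ofConfined()) {
    MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT);
    segment.set(ValueLayout.JAVA_INT, 0, 7);           // write an int at offset 0
    int value = segment.get(ValueLayout.JAVA_INT, 0);  // read it back
}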
For structured data, Panama provides the MemoryLayout API to describe complex memory organizations including
structs, arrays, and unions. This enables safe manipulation of complex data structures without resorting to
error-prone pointer arithmetic, as the following example shows. A StructLayout is created with different kinds of fields, which can then be accessed using a VarHandle.
AddressLayout C_POINTER = ValueLayout.ADDRESS
        .withTargetLayout(MemoryLayout.sequenceLayout(Long.MAX_VALUE, JAVA_BYTE));
StructLayout requestLayout = MemoryLayout.structLayout(
        ValueLayout.JAVA_LONG.withName("id"),
        C_POINTER.withName("buffer"),
        ValueLayout.JAVA_INT.withName("fd"),
        ValueLayout.JAVA_BOOLEAN.withName("read"),
        // trailing padding: the struct size must be a multiple of its 8-byte alignment
        MemoryLayout.paddingLayout(3))
        .withName("request");
VarHandle idHandle = requestLayout.varHandle(MemoryLayout.PathElement.groupElement("id"));
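The idHandle can then access the id field of any segment allocated with this layout. A minimal sketch (note that since Java 22 a layout VarHandle takes a base offset as an extra coordinate, 0L here):
try (Arena arena = Arena.ofConfined()) {
    MemorySegment request = arena.allocate(requestLayout);
    idHandle.set(request, 0L, 42L);              // write the "id" field
    long id = (long) idHandle.get(request, 0L);  // read it back
}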
The integration of these concepts creates an approach to memory management that maintains Java’s safety guarantees while providing the flexibility needed for native interoperability.
Making calls up and down
Panama’s approach to function calls builds upon Java’s existing metaprogramming features, like method handles, to create a bridge between Java methods and native functions. The API distinguishes between downcalls (Java calling native functions) and upcalls (native code calling back into Java methods).
For downcalls, Panama’s Linker API creates method handles that invoke native functions. This process begins with obtaining a SymbolLookup to locate native symbols within libraries. Once found, the Linker transforms these symbols into method handles using FunctionDescriptors that specify parameter and return types.
Linker linker = Linker.nativeLinker();
MethodHandle malloc = linker.downcallHandle(
        linker.defaultLookup().find("malloc").orElseThrow(),
        FunctionDescriptor.of(ADDRESS, JAVA_LONG)
);
// invokeExact demands exact types: a long argument and a MemorySegment return value
MemorySegment pointer = (MemorySegment) malloc.invokeExact(1024L);
This description-based approach eliminates the verbose boilerplate associated with JNI. Instead of writing C code to
bridge between Java and native functions, developers describe the function signature in Java code. This not only
reduces the potential for errors but also enables better tooling support: if you call the handle with arguments that don’t match the descriptor, you get an exception rather than undefined behavior.
Upcalls follow a similar pattern but in reverse, with Java methods wrapped as function pointers that native code can
invoke. This capability is particularly valuable when working with callback-based APIs, where native code needs to
notify Java about events or completion status. In the following example, a downcall and an upcall are made for the
native signal function.
public static void main(String[] args) throws Throwable {
    final var linker = Linker.nativeLinker();
    // up call: describe the Java method that native code will invoke
    MethodHandle handleSignal = MethodHandles.lookup()
            .findStatic(UpCallExample.class,
                    "handleSignal",
                    MethodType.methodType(void.class, int.class));
    // signature of the Java method, in native terms
    FunctionDescriptor signalDescriptor = FunctionDescriptor.ofVoid(ValueLayout.JAVA_INT);
    // the stub to be passed to the native code
    MemorySegment handlerFunc = linker.upcallStub(handleSignal,
            signalDescriptor,
            Arena.ofAuto());
    // down call: C's signal() returns the previous handler, so the descriptor declares an ADDRESS return
    MethodHandle signal = linker.downcallHandle(
            linker.defaultLookup().find("signal").orElseThrow(),
            FunctionDescriptor.of(ADDRESS, JAVA_INT, ADDRESS)
    );
    // install the stub as the handler for signal 2 (SIGINT)
    MemorySegment previousHandler = (MemorySegment) signal.invokeExact(2, handlerFunc);
}

static void handleSignal(int signal) {
    System.out.println("Received signal: " + signal);
}
Running this program and then sending it signal 2 (SIGINT, for example with kill -2 <pid>) invokes handleSignal and prints the message. Both upcalls and downcalls benefit from Panama’s unified type mapping system, which automatically handles conversion between Java and native types. When passing complex data structures to native functions, developers can use MemoryLayout to describe the structure and MemorySegment to allocate appropriate memory. What is great about this approach is that it is a complete shift from JNI. Rather than treating native code as a separate environment with its own rules, Panama extends Java’s programming model to interact more naturally with native code. Now that we understand how Panama enables safe and efficient native interoperability, let’s examine how it can be used with io_uring.
What is io_uring
io_uring is a Linux kernel interface that improves how applications perform input/output operations. At its core,
it establishes a shared-memory communication channel between an application and the kernel using two ring buffers: one for submitting requests and another for receiving completions.
Applications submit multiple I/O operations (in batches) to the submission ring without making individual system
calls for each operation. Applications can continue executing other tasks while I/O operations complete asynchronously, with results delivered through the completion queue.
It addresses the performance limitations of traditional I/O methods by eliminating per-operation system calls and context switches, enabling true asynchronous operation for all types of I/O, and dramatically reducing overhead under high concurrency. This makes it particularly valuable for applications that need to handle thousands of concurrent I/O operations efficiently.
The relationship between io_uring and Virtual Threads is particularly interesting. Virtual Threads excel at managing
concurrent tasks but can become pinned during blocking I/O operations. io_uring’s asynchronous nature potentially
addresses this limitation, allowing Virtual Threads to yield during I/O operations rather than remaining pinned to
their carrier threads. To make these concepts concrete, let’s implement a basic file read operation using io_uring through Panama. This example demonstrates the essential pattern that supports more complex I/O operations.
Single read with Java and io_uring
Implementing a basic file read operation with io_uring through Panama illustrates how these technologies work together to provide an alternative to Java’s built-in I/O capabilities. The following application creates a bridge between Java and the io_uring native library.
The implementation involves initializing an io_uring instance, preparing a read request, submitting it to the kernel,
and checking for completion. Throughout this process, Panama provides memory management for buffers, function calling, and type mapping between Java and native structures.
The following pattern demonstrates how we bridge Java and io_uring through Panama to perform a read operation. First, we initialize the io_uring instance and allocate memory for our buffer within a confined Arena to ensure proper resource management. We then prepare a Submission Queue Entry (SQE) that describes our read operation, including the file descriptor, buffer location, size, and offset. After submitting this request to io_uring with io_uring_submit, we wait for the operation to complete using io_uring_wait_cqe.
Once completed, we can extract the result from the Completion Queue Entry (CQE) and process the data that was read into our buffer. Finally, we mark the completion as seen and let the Arena’s automatic cleanup handle resource deallocation. This basic pattern forms the foundation for more sophisticated I/O operations while maintaining both Java’s safety guarantees and io_uring’s performance benefits.
try (Arena arena = Arena.ofConfined()) {
    // Create and initialize the ring with room for 10 entries
    MemorySegment ring = arena.allocate(ring_layout);
    int ret = (int) io_uring_queue_init.invokeExact(10, ring, 0);
    // Prepare file descriptor and buffer; allocateFrom copies the path
    // into native memory and null-terminates it, as open() expects a C string
    int fd = (int) open.invokeExact(arena.allocateFrom("red_file"), 0, 0);
    MemorySegment buffer = arena.allocate(1024);
    // Prepare read operation
    MemorySegment sqe = (MemorySegment) io_uring_get_sqe.invokeExact(ring);
    // nbytes is an unsigned int in liburing, hence the cast
    io_uring_prep_read.invokeExact(sqe, fd, buffer, (int) buffer.byteSize(), 0L);
    // Submit to the ring
    ret = (int) io_uring_submit.invokeExact(ring);
    // Wait for and process completion
    MemorySegment cqePtr = arena.allocate(ValueLayout.ADDRESS);
    ret = (int) io_uring_wait_cqe.invokeExact(ring, cqePtr);
    // Print the result
    System.out.println(buffer.getString(0));
    // Mark the completion as seen; cqePtr holds a pointer to the CQE, so dereference it first
    MemorySegment cqe = cqePtr.get(ValueLayout.ADDRESS, 0);
    io_uring_cqe_seen.invokeExact(ring, cqe);
}
For applications that perform numerous read operations, such as web servers, databases, or file-processing utilities, this integration delivers significant performance improvements by reducing system call overhead and enabling true asynchronous I/O. Another benefit is that io_uring supports sockets, streamlining the process of reading data and sending it to clients without unnecessary data copying. While this basic implementation works, we can further optimize it by examining some of the performance characteristics and challenges when bridging Java and native code.
Performance improvements
The combination of Panama, io_uring, and Virtual Threads creates new opportunities for Java applications with
I/O-intensive workloads. Using a native library on its own can already be beneficial and provide performance improvements. To get more speed out of the bindings, we need to take a closer look at three patterns for memory allocation. The first speed-up is for applications that do a lot of allocations.
When an Arena allocates memory with its allocate method, you get a contiguous region of memory filled with zeroes. This is a safe segment of memory the application can use. However, creating a segment this way is quite expensive compared to using malloc or calloc. These two native functions are a lot quicker when you need a contiguous region of memory. The following output shows how much faster the native functions are when you need to allocate some memory.
A higher score is better.
Benchmark               (memory size)   Mode    Score      Units
MemoryBenchmark.arena              64   thrpt   23759.714  ops/ms
MemoryBenchmark.arena             128   thrpt   24430.636  ops/ms
MemoryBenchmark.arena             512   thrpt   18927.759  ops/ms
MemoryBenchmark.arena            1024   thrpt   15068.157  ops/ms
MemoryBenchmark.arena            2048   thrpt    6657.966  ops/ms
MemoryBenchmark.calloc             64   thrpt   44291.364  ops/ms
MemoryBenchmark.calloc            128   thrpt   25280.973  ops/ms
MemoryBenchmark.calloc            512   thrpt   20667.819  ops/ms
MemoryBenchmark.calloc           1024   thrpt   22050.309  ops/ms
MemoryBenchmark.calloc           2048   thrpt   19291.683  ops/ms
MemoryBenchmark.malloc             64   thrpt   82383.964  ops/ms
MemoryBenchmark.malloc            128   thrpt   82172.938  ops/ms
MemoryBenchmark.malloc            512   thrpt   86222.730  ops/ms
MemoryBenchmark.malloc           1024   thrpt   85490.634  ops/ms
MemoryBenchmark.malloc           2048   thrpt   29076.375  ops/ms
As the JMH output shows, the native functions can allocate memory faster than an Arena can. If your use case requires the application to allocate lots of memory segments, it can be beneficial to switch to the native functions.
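The benchmark harness itself isn’t shown in this article, but a minimal, hypothetical JMH version of it might look like the following sketch (the calloc variant is omitted for brevity, and the class and parameter names are assumptions):
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
public class MemoryBenchmark {

    private static final MethodHandle MALLOC;
    private static final MethodHandle FREE;

    static {
        Linker linker = Linker.nativeLinker();
        MALLOC = linker.downcallHandle(
                linker.defaultLookup().find("malloc").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.ADDRESS, ValueLayout.JAVA_LONG));
        FREE = linker.downcallHandle(
                linker.defaultLookup().find("free").orElseThrow(),
                FunctionDescriptor.ofVoid(ValueLayout.ADDRESS));
    }

    @Param({"64", "128", "512", "1024", "2048"})
    public long size;

    @Benchmark
    public MemorySegment arena() {
        // a fresh confined Arena per allocation; the zeroed segment is freed on close
        try (Arena arena = Arena.ofConfined()) {
            return arena.allocate(size);
        }
    }

    @Benchmark
    public MemorySegment malloc() throws Throwable {
        MemorySegment segment = (MemorySegment) MALLOC.invokeExact(size);
        FREE.invokeExact(segment); // free immediately so the benchmark doesn't leak
        return segment;
    }
}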
There are two ways to use this native memory: tie its lifetime to an Arena, or manage the lifetime of the segments yourself and free them when necessary. The first option gives you more safety, while the latter gives you more control over resources. Calling free() yourself is easy enough, so let’s explore how to use malloc and calloc in a safe way using the Arenas that Project Panama provides.
public class AllocArena implements Arena {
    private static final MethodHandle malloc;
    private static final MethodHandle calloc;
    private static final MethodHandle free;

    static {
        Linker linker = Linker.nativeLinker();
        malloc = linker.downcallHandle(linker.defaultLookup().find("malloc").orElseThrow(),
                FunctionDescriptor.of(ADDRESS, JAVA_LONG));
        calloc = linker.downcallHandle(linker.defaultLookup().find("calloc").orElseThrow(),
                FunctionDescriptor.of(ADDRESS, JAVA_LONG, JAVA_LONG));
        free = linker.downcallHandle(linker.defaultLookup().find("free").orElseThrow(),
                FunctionDescriptor.ofVoid(ADDRESS));
    }

    final Arena arena = Arena.ofConfined();

    @Override
    public MemorySegment allocate(long byteSize, long byteAlignment) {
        // byteAlignment is ignored here: malloc already aligns for any standard type
        try {
            MemorySegment segment = (MemorySegment) malloc.invokeExact(byteSize);
            // malloc returns a zero-length segment; reinterpret attaches the real size,
            // ties the segment to this arena, and registers free() as the cleanup action
            return segment.reinterpret(byteSize, arena, AllocArena::freeSegment);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    private static void freeSegment(MemorySegment segment) {
        try {
            free.invokeExact(segment);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    @Override
    public MemorySegment.Scope scope() {
        return arena.scope();
    }

    @Override
    public void close() {
        arena.close();
    }
}
The Arena overrides the allocation method, so we can use our own code that calls malloc or calloc to allocate a memory segment. Both return a pointer to a contiguous region of memory. A MemorySegment that wraps a raw pointer has a length of zero, so we can’t use it directly. To make it usable we call reinterpret, which takes the new size and, optionally, an arena and a cleanup action. From the client’s perspective, this Arena doesn’t look or behave differently; the allocation just happens through malloc.
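From the caller’s side, using it looks the same as any other Arena; a short sketch:
try (Arena arena = new AllocArena()) {
    // allocated via malloc; freed via free() when the arena closes
    MemorySegment segment = arena.allocate(1024);
    segment.set(ValueLayout.JAVA_INT, 0, 42);
}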
Using native memory is not the only way to pass values to native code. There is also the option to pass the address of heap-backed memory. The benefit of doing so is that it saves you a memory allocation as well as having to manage that segment’s lifetime; all of that responsibility stays with Java. To be able to pass heap-backed memory, you need to mark the downcall as critical using the Linker. This option is only meant for calls with a very short lifetime, and it is not meant to be used for calls that create callbacks into Java. Behind the scenes, Java creates a temporary address for the value and passes it to the native code. The following snippet shows how to mark a handle as critical and use it.
MethodHandle open = linker.downcallHandle(
        linker.defaultLookup().find("open").orElseThrow(),
        FunctionDescriptor.of(JAVA_INT, ADDRESS, JAVA_INT, JAVA_INT),
        Linker.Option.critical(true));
// the bytes are null-terminated by hand, since open() expects a C string
int fd = (int) open.invokeExact(MemorySegment.ofArray((filePath + "\0").getBytes()), flags, mode);
Using Linker.Option.critical(true) allows the call to receive heap memory. MemorySegment.ofArray wraps the String’s bytes in a heap-backed MemorySegment that is passed straight to open. This should really only be used for calls where copying the data off-heap first would cost more than the call itself.
There are several ways to speed up memory segment allocation, but the best one is to reuse memory that you have already allocated. The risk is reading stale values from a previous use if you are not careful; the benefit is an almost free performance improvement. Allocating memory will always have a cost, and so will turning that memory into a MemorySegment. Both can be avoided by reusing segments. The easiest and safest place to start is memory that holds pointers, as the following sketch shows. In the end, it looks almost the same as plain Java.
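As an illustration, here is a hypothetical sketch of that pointer-reuse pattern, based on the read example from earlier: the segment that holds the CQE pointer is allocated once and reused for every completion, since the kernel simply overwrites its contents on each io_uring_wait_cqe call.
// allocated once in a long-lived Arena instead of once per operation
MemorySegment cqePtr = arena.allocate(ValueLayout.ADDRESS);

while (running) {
    // each wait overwrites the address stored in cqePtr, so no stale data survives
    int ret = (int) io_uring_wait_cqe.invokeExact(ring, cqePtr);
    MemorySegment cqe = cqePtr.get(ValueLayout.ADDRESS, 0);
    // ... process the completion ...
    io_uring_cqe_seen.invokeExact(ring, cqe);
}
Beyond memory optimization, our primary goal was addressing the Virtual Thread pinning issue. Let’s explore how we can transform blocking operations into yielding ones and preserve the way Virtual Threads work.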
Turning pinning into yielding
Virtual Threads face a significant limitation with file I/O operations: pinning. When a Virtual Thread performs
blocking operations, including file I/O, it becomes “pinned” to its carrier thread, preventing that carrier
thread from executing other Virtual Threads. This pinning effectively negates one of the primary benefits of the Virtual Thread model. When this happens, the runtime compensates by creating more carrier threads to prevent a degradation in performance. Making native calls is not great either, as those pin the Virtual Thread as well. So let’s see how we can prevent pinning in the first place.
The trick to prevent pinning is to yield the Virtual Thread before making a pinning call. The mechanism we need looks similar to what happens inside the JDK. Ask yourself: how does a Virtual Thread know when to wake up? The answer is that another thread wakes it up. We can recreate that behavior with a polling thread and futures. The future blocks the Virtual Thread until the polling thread wakes it up with a result. The next example shows the essence of what is needed.
public final class BlockingReadResult extends BlockingResult {
    private final CompletableFuture<Void> future = new CompletableFuture<>();
    private MemorySegment buffer;

    BlockingReadResult(long id) {
        // BlockingResult is the parent class that stores the request id
        super(id);
    }

    public MemorySegment getBuffer() {
        try {
            // blocks this Virtual Thread, which yields its carrier until a result arrives
            future.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
        return buffer;
    }
    // code continues
The trick is to have it wait until a result is available. One way of doing so is CompletableFuture.get(): the Virtual
Thread will yield until a result is available. The polling thread that checks whether values are available looks like the following example. It’s a while loop that sets the result on the CompletableFuture, allowing the Virtual Thread to continue.
while (running) {
    final List<Result> results = jUring.peekForBatchResult(100);
    results.forEach(result -> {
        BlockingResult request = requests.remove(result.getId());
        request.setResult(result);
    });
}
The approach is pretty straightforward and allows you to yield Virtual Threads around operations that would not otherwise yield them. The cost is a dedicated polling thread; whether that trade-off is beneficial to your application depends on the workload.
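To tie the pieces together, here is a hypothetical usage sketch. prepareRead is an invented method name standing in for whatever submits the read to io_uring and registers the BlockingReadResult in the requests map; jUring is the binding object from the polling example above.
try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
    executor.submit(() -> {
        // submit the read and register the future for the polling thread
        BlockingReadResult result = jUring.prepareRead("some_file", 1024, 0);
        // the Virtual Thread yields its carrier here instead of pinning it
        MemorySegment buffer = result.getBuffer();
        System.out.println(buffer.getString(0));
    });
}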
Bringing It All Together
We explored how Panama, io_uring, and Virtual Threads can fundamentally improve Java’s approach to I/O operations.
By integrating io_uring using Panama, Java applications can leverage Linux’s I/O capabilities while maintaining Java’s safety and productivity benefits.
This implementation exercise also reveals two distinct approaches to native integration. You can create direct bindings
where Java methods map directly to native calls, resulting in a less Java-like but more straightforward API. Alternatively,
you can build a facade around the native library, creating a more idiomatic Java API at the cost of additional abstraction.
For most production applications, I recommend the facade approach. It gives you more control over the API
and creates a more familiar experience for Java developers. The slight performance overhead is usually worth the improved maintainability and safety.
While this solution focuses on file I/O, the same approach can be applied to other blocking operations that cause Virtual
Threads to pin. Network I/O, database access, and inter-process communication are all candidates for similar optimization.
As Panama and Virtual Threads continue to mature in future Java releases, we can expect even tighter integration between
these technologies, potentially making solutions like this part of the standard library. Until then, this approach offers
a powerful way to unlock the full potential of Virtual Threads for I/O-intensive applications.