JVM Iceberg – Modern Performance Edition

Artur Skowronski

The “Iceberg” meme is an internet phenomenon that humorously, and sometimes unsettlingly, illustrates levels of knowledge or initiation into a given topic – from simple, widely known facts at the tip of the iceberg to the dark, esoteric depths comprehensible only to the most battle-hardened veterans. Picture an iceberg floating on water: what’s visible on the surface is just the beginning, while the real magic (or nightmare) lurks beneath, in increasingly inaccessible layers.

Personally, I love it. So I decided to create Java-themed ones. I’ve already published one covering the JVM as a whole, but this time I decided to focus on a particular topic – performance! I hope you’ll like it!

Level 1: The Tip

Project Loom (Virtual Threads)

TLDR: Project Loom is Project Loom – you know it, you love it

Project Loom is a new concurrency model in Java (previewed in JDK 19, finalized in JDK 21) offering lightweight virtual threads managed by the JVM. Virtual threads are significantly cheaper than traditional platform threads – thousands (even millions) can be launched with minimal overhead, as they consume minimal resources and are parked/resumed by the runtime (e.g., during blocking I/O) instead of tying up OS threads; this improves scalability and throughput in highly concurrent applications while maintaining the simple, thread-per-request programming model.
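As a minimal sketch (JDK 21+), the thread-per-task executor below launches thousands of virtual threads that each block briefly; class and method names are illustrative:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadsDemo {
    // Launches `n` virtual threads that each block briefly,
    // then returns how many completed.
    static int runTasks(int n) {
        AtomicInteger done = new AtomicInteger();
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < n; i++) {
                pool.submit(() -> {
                    Thread.sleep(5);             // parks the virtual thread, not an OS thread
                    return done.incrementAndGet();
                });
            }
        } // close() implicitly waits for all submitted tasks
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10_000)); // prints 10000
    }
}
```

Launching 10,000 platform threads this way would be painful; with virtual threads it is routine.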

JDK Flight Recorder (JFR)

TLDR: JVM profiling with minimal overhead. A must-have for diagnosing bottlenecks. Like a black box for your Java app (minus the crash)

JDK Flight Recorder (JFR) is the JVM’s built-in profiler and event recorder—think of it as a super lightweight flight data recorder for your Java application. It tracks all the good stuff: CPU usage, memory allocations, thread activity—you name it—with minimal performance overhead. Seriously, it’s so efficient you can (and should) run it in production.

It’s been baked into the JDK since version 11, and you can kick it off at startup or even attach it mid-flight. Perfect for continuous monitoring without dragging down your app.
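A couple of typical ways to start a recording, assuming a JDK 11+ `java` and a running process with PID 1234 as an example:

```shell
# Start recording at launch: a 60 s recording dumped to a file
java -XX:StartFlightRecorder=duration=60s,filename=app.jfr -jar app.jar

# Or attach to a running JVM (PID 1234) mid-flight
jcmd 1234 JFR.start name=live settings=profile
jcmd 1234 JFR.dump name=live filename=live.jfr
jcmd 1234 JFR.stop name=live
```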

And the best part? JFR helps you catch bottlenecks, memory leaks, and other sneaky performance gremlins before they start tanking your uptime. Quietly powerful, just like we like our tooling.

Java Mission Control (JMC)

TLDR: Graphical analysis of JFR data. Perfect for your morning coffee while tracking memory leaks.

Java Mission Control (JMC) is your go-to visual toolkit for making sense of what Java Flight Recorder (JFR) captures while your app’s doing its thing. CPU spikes? Thread pileups? Suspicious memory allocations? JMC puts it all on a pretty timeline so you don’t have to piece it together from log spaghetti.

It’s fast, it’s surprisingly user-friendly, and yes — even if you’re not a JVM tuning ninja, you can spot memory leaks or GC overloads without needing a PhD in diagnostic tooling. Just fire it up and start connecting the dots.

Because honestly, if you’re already collecting JFR data, not using JMC is like owning a telescope and only using it to look at clouds. ☁️👀

Foreign Function & Memory API (Panama)

TLDR: JNI without JNI. Java gets closer to C performance without losing comfort.

The Foreign Function & Memory API—aka Project Panama—is Java’s shiny new way of calling native code and poking around in off-heap memory without diving into the dark depths of JNI. It’s all about letting you talk to C (or C++) libraries directly, but with clean, modern Java APIs that won’t make your eyes bleed.

No more boilerplate glue code or wrestling with native headers. Panama makes it way easier (and faster) to integrate with performance-critical native libs—think things like image processing, numerical computing, or custom hardware interfaces.

For anyone building apps that need to crunch serious data or do things the JVM wasn’t exactly born for, Panama opens up some exciting new doors—and it holds the door open for you too.
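A minimal sketch of a downcall to C’s `strlen`, assuming JDK 22+ where the FFM API is final (the class name is illustrative):

```java
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

public class StrlenDemo {
    // Calls C's strlen on a copy of `s` placed in off-heap memory.
    static long strlen(String s) {
        try {
            Linker linker = Linker.nativeLinker();
            // strlen is part of the default lookup (the C standard library)
            MemorySegment addr = linker.defaultLookup().find("strlen").orElseThrow();
            MethodHandle handle = linker.downcallHandle(
                    addr, FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
            try (Arena arena = Arena.ofConfined()) {
                MemorySegment cString = arena.allocateFrom(s); // NUL-terminated UTF-8 copy
                return (long) handle.invoke(cString);
            } // arena close frees the off-heap memory deterministically
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(strlen("panama")); // prints 6
    }
}
```

No generated headers, no `native` methods, no JNI glue – just method handles and memory segments.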

GraalVM

TLDR: Turbo for JVM with a JIT compiler written in Java. Aggressive optimization, plus polyglot runtime support.

GraalVM is like HotSpot’s cooler, more experimental cousin. It’s built on OpenJDK but swaps in the Graal compiler—a next-gen JIT written in Java itself—for smarter, leaner, and more aggressive optimizations. Think: fewer CPU cycles wasted, snappier performance, and an all-around tighter runtime.

But wait, there’s more—it’s not just about squeezing more out of Java. GraalVM is also a full-blown polyglot platform, meaning you can run Java, JavaScript, Python, Ruby, R, and more in a single VM. Yes, really.

In practice? It lets you build flexible apps where components written in different languages play nice together in one runtime. Faster builds, cleaner code, and the freedom to pick the right tool for the job—all without leaving the GraalVM playground.

Level 2: Just Below the Surface

Vector API

TLDR: SIMD in pure Java. Numerical computing and multimedia in JVM have never been faster

The Vector API—introduced via JEP 338 and still incubating after several rounds of refinement—is Java’s ticket to the SIMD party. Instead of processing data one sad element at a time, this API lets you crunch multiple values in parallel using Single Instruction, Multiple Data magic. Think IntVector, FloatVector, DoubleVector—entire arrays of data, sliced and diced in a single CPU operation.

This isn’t just cool for the sake of it—it’s a big win for number-heavy tasks like image and signal processing, scientific computing, or anything that screams “data crunch me harder.” Add, multiply, compare—all way faster than your regular loops.

Even better? It’s built with portability in mind. The API figures out what SIMD instructions your CPU supports and uses them behind the scenes, so your code stays clean and runs fast on multiple architectures. Write once, vectorize everywhere.
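A sketch of vectorized element-wise addition; since the API is still incubating, compiling and running it requires `--add-modules jdk.incubator.vector` (the class name is illustrative):

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class VectorAdd {
    // The widest vector shape the current CPU supports
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static void add(float[] a, float[] b, float[] out) {
        int i = 0;
        int upper = SPECIES.loopBound(a.length);
        for (; i < upper; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            va.add(vb).intoArray(out, i);  // one SIMD add per lane group
        }
        for (; i < a.length; i++) {        // scalar tail for leftover elements
            out[i] = a[i] + b[i];
        }
    }
}
```

The same source runs on AVX2, AVX-512, or NEON hardware; the species picks the lane count at runtime.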

VisualVM

TLDR: Classic monitoring tool. Intuitive UI, solid performance, and evergreen status among developers.

VisualVM has been around forever—and there’s a reason it’s still in the toolbox of many devs. This classic GUI-based tool lets you peek inside your running Java apps in real time: CPU and memory usage, thread activity, object allocations, heap dumps, stack traces—you name it.

Sure, it might not have the shiny new branding of some newer tools, but when you need to track down a rogue memory leak or figure out why your app’s suddenly cooking CPUs like breakfast, VisualVM gets the job done. Fast.

It’s lightweight, easy to use, and perfect for both local dev and staging environments. Sometimes, you don’t need fancy—you just need something that works. VisualVM is exactly that.

Java Microbenchmark Harness (JMH)

TLDR: The standard benchmarking tool. Helps avoid pitfalls with JVM and JIT that can skew results.

JMH (Java Microbenchmark Harness) is the official go-to tool from the OpenJDK crew when you want to actually measure how fast your Java code runs. It’s built specifically for benchmarking on the JVM, which means it handles all the tricky stuff—JIT warm-ups, dead-code elimination, and those sneaky measurement traps that make naive stopwatch timings lie convincingly. In short: if you want numbers you can trust, you use JMH.

Whether you’re optimizing a hot loop or validating that a refactor didn’t nuke performance, JMH lets you write clean, focused microbenchmarks that produce reliable data, not vibes.
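A sketch of what a benchmark looks like, assuming the `org.openjdk.jmh` artifacts are on the classpath (the benchmark names and the measured snippet are illustrative):

```java
import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3)      // let the JIT warm up before measuring
@Measurement(iterations = 5)
@Fork(1)
public class ConcatBenchmark {
    // Non-final fields prevent the JIT from constant-folding the inputs
    String a = "hello", b = "world";

    @Benchmark
    public String plusConcat() {
        return a + b;        // returning the result defeats dead-code elimination
    }

    @Benchmark
    public String builderConcat() {
        return new StringBuilder(a).append(b).toString();
    }
}
```

Run it via the JMH runner (or the Maven archetype’s generated `benchmarks.jar`) and you get statistically sound averages instead of one-off timings.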

async-profiler

TLDR: Native sampling profiler with flame graphs. A master of low overhead and accurate diagnostics.

Forget the old-school safepoint-skewed profilers — async-profiler is a modern, native-level (yep, written in C++) sampling profiler that knows how to keep it real. It skips the safepoint bias that messes with your stack traces and gives you an honest look at what your app’s actually doing.

It covers CPU, memory allocations, I/O, and even lock contention, with crazy low overhead, which means you can run it in production without sweating bullets. And when you’re done? Boom—flame graphs. 🔥 One glance and you’ll know exactly where the bottlenecks are hiding.

If you’re serious about performance tuning on the JVM, async-profiler is basically your X-ray vision.
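Typical invocations, assuming async-profiler 3.x (the `asprof` launcher) and a target PID; paths are illustrative:

```shell
# Profile CPU for 30 s and write an interactive flame graph
./asprof -e cpu -d 30 -f cpu-flame.html <pid>

# Sample allocations instead of CPU
./asprof -e alloc -d 30 -f alloc-flame.html <pid>

# Or attach as an agent at startup
java -agentpath:/path/to/libasyncProfiler.so=start,event=cpu,file=profile.html -jar app.jar
```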

Level 3: Deeper Level

YourKit and JProfiler

TLDR: Commercial profiling powerhouses, convenient for both local and production environments. Comfort for the price of a license? Why not!

If async-profiler is your lean command-line ninja, YourKit and JProfiler are the luxury sedans of the Java profiling world—fully loaded, smooth UI, and packed with features. Both are commercial tools, but they earn their keep with powerful diagnostics: deep CPU and memory profiling, heap analysis, leak detection, and top-notch thread monitoring.

They shine especially in big, complex projects where you need that extra level of insight, and can afford the tooling. Bonus points for IDE integrations, support for remote sessions, and features tailored for different stages of development and ops.

Not everyone needs them, but when you do, they’re rock solid. 🧰🚀

Ahead-of-Time Compilation (AOT)

TLDR: Fast start at the expense of peak performance. Ideal for microservices and serverless scenarios.

Ahead-of-Time (AOT) compilation lets you skip the whole JIT warm-up ritual and go straight to a native binary. Instead of compiling code during runtime, you compile it before—resulting in lightning-fast startup and reduced memory usage. The trade-off? You might lose a bit of peak performance, but for many workloads, that’s totally worth it.

The poster child here is GraalVM Native Image. Apps built this way can launch in milliseconds, which makes it a perfect fit for microservices, serverless functions, or anything where “cold start” sounds like a curse word.
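Assuming a GraalVM distribution with the `native-image` tool installed, the basic flow looks like:

```shell
# Compile a jar ahead of time into a native executable
native-image -jar app.jar app

# The resulting binary starts in milliseconds — no JVM warm-up required
./app
```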

But! That’s not the only game in town — Project Leyden is the OpenJDK community’s long-term effort (led by Oracle) to bring similar startup and footprint improvements to the JDK itself, aiming for a more standardized, JVM-integrated solution that doesn’t need a separate toolchain.

Z Garbage Collector (ZGC)

TLDR: The garbage collector of the future? Pauses under 10 ms, maximum concurrency, and since JDK 21, even with generations.

ZGC (Z Garbage Collector) is HotSpot’s answer to the age-old question: “Can I GC without killing my app’s vibe?” Spoiler: yes. Designed for massive heaps and ultra-low pause times (under 10 ms by design, typically sub-millisecond in recent releases), ZGC does most of its work concurrently — so your app keeps humming while GC quietly tidies up in the background.

And starting with JDK 21, ZGC got even better with generational support. That means it now reclaims memory more efficiently without sacrificing its signature low-latency magic.

In real life? ZGC is what you reach for when you’re building data-heavy, user-hungry systems that need to stay snappy under pressure. Whether you’re scaling up or handling real-time workloads, ZGC helps keep things smooth, fast, and garbage-free(ish).
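Enabling it is one flag; generational mode (JDK 21) is a second:

```shell
# JDK 21: ZGC with the generational mode
java -XX:+UseZGC -XX:+ZGenerational -Xmx16g -jar app.jar
# (From JDK 23 the generational mode is the default, so the second flag goes away.)
```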

AWS Lambda SnapStart

TLDR: Cold start up to 10 times faster with no additional costs. The magic of AWS!

AWS Lambda SnapStart is Amazon’s magic trick for making Java functions start way faster. Instead of going through the whole cold start dance every time, SnapStart takes a snapshot of your function after it’s fully warmed up—class loading, dependency wiring, the whole shebang.

Once you publish a new version, Lambda does that initialization once, saves the memory state, and next time your function runs… boom 💥 it just resumes from that snapshot. No repeated setup, no waiting around. You get up to 10x faster startup, with no extra cost.
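Turning it on is purely a configuration change, sketched here with the AWS CLI (the function name is illustrative):

```shell
# Enable SnapStart on a Java Lambda function (applies to published versions)
aws lambda update-function-configuration \
  --function-name my-java-fn \
  --snap-start ApplyOn=PublishedVersions

# Publish a version; Lambda initializes it once and snapshots the warmed state
aws lambda publish-version --function-name my-java-fn
```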

In practice? SnapStart makes Java way more viable in serverless scenarios, where every millisecond counts. Suddenly, Java’s not the “slow starter” in the room—it’s sprinting off the blocks with the best of them.

Level 4: Even deeper

Shenandoah GC

TLDR: Red Hat made a GC for fans of ultra-short pauses. Worth trying on large heaps.

Designed by the fine folks at Red Hat, Shenandoah GC is all about keeping pause times tiny—even when your heap isn’t. It does most of its garbage collection work concurrently with the application, which means your 2GB dev box and your 200GB production beast get the same short pauses. No kidding.

The secret sauce? Concurrent memory compaction and region-based collection that keeps latency low and throughput solid.

In real-world terms: Shenandoah shines in systems where responsiveness is non-negotiable—financial platforms, real-time dashboards, anything that panics when the GC hiccups. It’s one of those tools that just works quietly in the background… and your users never know how close they came to a full-GC freeze.
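Enabling it, assuming an OpenJDK build that ships Shenandoah (JDK 12+; not every vendor build includes it):

```shell
java -XX:+UseShenandoahGC -Xlog:gc -jar app.jar

# Optional: pick a heuristic (adaptive is the default; compact trades
# throughput for a smaller footprint)
java -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=compact -jar app.jar
```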

Class Data Sharing (CDS)

TLDR: Faster start, lower memory usage, ideal for microservices in containers.

Class Data Sharing (CDS) is one of those JVM features that quietly pulls serious weight. Instead of reloading and recompiling the same classes every time your app starts, CDS lets you create a shared class archive—basically a pre-baked bundle of commonly used classes.

The JVM can then load these directly from disk into shared memory, skipping all the warm-up overhead. Result? Faster startup and lower memory usage—especially handy in containerized or microservices setups where you’re spinning up JVMs like there’s no tomorrow.

In practice, CDS helps your apps boot quicker, run leaner, and scale better. Less RAM, less CPU, and less time wasted on things you’ve already loaded a hundred times.
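A sketch of the dynamic-archive flow available since JDK 13 (the archive name is illustrative):

```shell
# Record the classes your app actually loads into an archive when it exits
java -XX:ArchiveClassesAtExit=app-cds.jsa -jar app.jar

# Subsequent starts map the archive instead of re-parsing classes
java -XX:SharedArchiveFile=app-cds.jsa -jar app.jar
```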

Eclipse OpenJ9

TLDR: JVM focused on fast start and minimal memory footprint. Cloud-native JVM at its best!

Eclipse OpenJ9 is an alternative JVM implementation, originating from IBM’s J9, designed with a focus on fast startup and a small memory footprint. OpenJ9 offers unique features such as a shared classes cache that persists between runs and a JITServer mode, which offloads the JIT compilation cost from the application process.

In practice, OpenJ9 excels in cloud environments where resources are limited, and fast application startup is critical. With lower memory usage and quicker startup, OpenJ9 allows for more efficient resource usage and better scalability of applications.
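A sketch of both features, assuming an OpenJ9 build (the `-Xshareclasses` and JITServer options are OpenJ9-specific; names and paths are illustrative):

```shell
# Persist the shared classes cache between runs
java -Xshareclasses:name=myapp,cacheDir=/tmp/j9cache -jar app.jar

# Offload JIT compilation to a remote JITServer
jitserver &                                              # start the compilation server
java -XX:+UseJITServer -XX:JITServerAddress=localhost -jar app.jar
```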

Coordinated Restore at Checkpoint (CRaC)

TLDR: Checkpoint and restore JVM. Applications start instantly with full warm-up.

Coordinated Restore at Checkpoint (CRaC) is one of the coolest (pun fully intended) things happening in OpenJDK right now. It introduces an API for creating snapshots of a running Java application—including its JIT-warmed code, populated caches, and full runtime state—and then restoring from that snapshot like nothing ever happened.

The result? Your app wakes up from its cryo-nap with zero warm-up and hits full performance instantly. No cold starts, no ramp-up, just go. A bit like loading a save state in an emulator.

In practice, CRaC is a game-changer for use cases where startup time and latency matter—like autoscaling microservices, serverless workloads, or anything that doesn’t have time to wait for the JVM to “get ready.” Java, but instant on.
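A sketch of the flow, assuming a CRaC-enabled JDK build (e.g. one of Azul’s Zulu builds with CRaC); the paths and PID are illustrative:

```shell
# Run and warm up the app, telling the JVM where to store the snapshot
java -XX:CRaCCheckpointTo=./cr -jar app.jar

# From another terminal: snapshot the warmed-up process
jcmd <pid> JDK.checkpoint

# Later: restore from the snapshot — near-instant, fully warmed start
java -XX:CRaCRestoreFrom=./cr
```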

Azul Platform Prime

TLDR: JVM on steroids with pause-free GC C4 and Falcon JIT based on LLVM. Real-time and transactions? Absolutely.

Azul Platform Prime is like the high-performance luxury edition of the JVM. Under the hood, it packs the C4 garbage collector (pause-free, even under pressure), the Falcon JIT compiler (LLVM-based and laser-focused on peak performance), and ReadyNow!, which nukes warm-up time so your app can hit full speed right out of the gate.

In practice, Azul Prime is built for the kind of workloads where latency kills and throughput pays — think financial systems, trading platforms, or real-time apps that can’t afford a hiccup. It’s not your everyday JVM — but if you need max performance with zero compromise, this one earns its “Prime” label.

Level 5: The Bottom

Falcon JIT

TLDR: LLVM meets JVM. A compiler with sharp optimization edge.

We already gave Falcon JIT a nod earlier, but let’s be honest—it deserves its own spotlight. Developed by Azul, Falcon is a just-in-time compiler that swaps out the classic HotSpot C2 for something far spicier: LLVM. That’s right—the same backend used to power compilers in C, Rust, Swift, and more is now optimizing your Java code.

The result? Lean, mean, machine code that’s more aggressively optimized than what C2 typically offers. In real-world terms: better throughput, faster execution, and lower resource usage—all without changing your application code.

If your workload is CPU-hungry, latency-sensitive, or just plain performance-obsessed, Falcon is like strapping a turbocharger to your JIT.

Open Liberty InstantOn

TLDR: Running Java containers in milliseconds. IBM dusted off CRIU, and Java got a turbo boost.

Open Liberty InstantOn, brought to you by IBM, is all about skipping the slow boot sequence and getting straight to business. It uses CRIU (Checkpoint/Restore In Userspace) to take a snapshot of your Java app’s fully-initialized state—then brings it back to life almost instantly.

InstantOn is a big win in cloud-native and serverless environments where startup time can make or break scalability. Apps scale faster, respond sooner, and your platform stops feeling like it’s running in molasses.

Thread-Local Allocation Buffers (TLAB)

TLDR: Faster object allocation in multithreaded applications. Each thread gets its own piece of Eden!

Thread-Local Allocation Buffers (TLABs) are a JVM mechanism that gives each thread its own chunk of the young generation of the heap (Eden). A thread can then allocate objects inside its buffer with a simple pointer bump, without synchronizing with other threads, which makes allocation very fast.

TLAB increases the performance of multithreaded applications by reducing synchronization costs during object allocation. As a result, applications can scale better on multi-core systems.
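TLABs are on by default; the HotSpot flags below are mainly for experiments and diagnostics:

```shell
# Log TLAB sizing and refill decisions
java -Xlog:gc+tlab=debug -jar app.jar

# Disable TLABs entirely — useful only to measure how much they help
java -XX:-UseTLAB -jar app.jar

# Pin an initial size and turn off adaptive resizing
java -XX:TLABSize=512k -XX:-ResizeTLAB -jar app.jar
```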

Epsilon GC

TLDR: A GC that doesn’t collect garbage. Sounds strange? Perfect for performance testing!

Epsilon GC is the “do nothing” garbage collector in OpenJDK—literally. It doesn’t reclaim memory. At all. It just allocates until the heap is full and then… well, shuts the JVM down with an OutOfMemoryError.

Why would anyone want that? Because it’s perfect for performance testing. With Epsilon, there’s zero GC overhead, so you can benchmark your app without any collector skewing the results. It’s also handy for short-lived processes where memory management is irrelevant.

It’s not meant for production (unless you really like living on the edge), but for benchmarking, tuning, or chaos experiments — Epsilon keeps things simple, brutal… and honest.
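Since Epsilon is experimental, it has to be unlocked explicitly:

```shell
java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC \
     -Xmx4g -Xlog:gc -jar benchmark.jar
# When the heap fills up, the JVM exits with OutOfMemoryError — by design
```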

Arthas

TLDR: Arthas – live JVM diagnostics. A must-have in production when debugging without modifying code.

Arthas is an open-source diagnostic tool from Alibaba’s middleware team that lets you troubleshoot live Java applications—no code changes, no restarts, no redeploys. By enhancing bytecode on the fly, it gives you powerful tools for real-time monitoring: think class decompilation, method tracing, thread analysis, and resource inspection—all from a slick interactive CLI.

In practice, Arthas is a go-to for DevOps teams and developers working on high-availability systems. It helps you catch memory leaks, deadlocks, or rogue SQL calls before they snowball—without taking your app offline.
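A quick-start sketch; the class and method names in the console commands are illustrative:

```shell
# Download and attach to a running JVM — no restart, no redeploy
curl -O https://arthas.aliyun.com/arthas-boot.jar
java -jar arthas-boot.jar          # pick the target process from the list

# Inside the Arthas console:
#   dashboard                            — live threads / memory / GC overview
#   thread -b                            — find threads blocking others
#   trace com.demo.Svc find              — time each call in a method's call tree
#   watch com.demo.Svc find returnObj    — inspect live return values
```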

Just like his namesake in Warcraft III, Arthas isn’t afraid to step into the battlefield mid-fight and take control (sorry, I’m a nerd, I couldn’t resist this one).
