The common narrative says Java is unsuitable for AI, relegating it to backend services while Python dominates research via TensorFlow and PyTorch – and for a long time I said the same thing myself. However, the technical reality is shifting. The integration of AI into Java goes beyond high-level API wrappers; it is driving fundamental changes within the JVM. Python is the research lab – but Java is the software factory.
But here’s the twist: under that calm surface, under those simple API calls, lies a nightmare. A Lovecraftian journey into the depths of the virtual machine itself.
It turns out the battle for Java in AI isn’t happening at the framework level. It’s being fought in the guts of the JVM – in memory allocation, in new data types, and in projects with names that sound like ancient deities.
So today, I’m using the “Iceberg” meme as our map. We are leaving the sunny waters of Spring AI and diving into the freezing dark, where the “forgotten names of ancient gods” reside – Valhalla, Panama, and Babylon.
Buckle up! 🚀

Level 1: The Tip of the Iceberg (The “It Just Works” Zone) 🏖️

Welcome to the surface! The sun is shining, the water is warm, and your “biggest drama is a missing semicolon.”
This is the world 90% of Java devs see—a world of blissful ignorance where AI is just another annotation to inject.
The Modern LLM Orchestrators (The New Wave)
The LLM revolution spawned a new class of tools. Their goal isn’t just running models locally, but orchestrating external ones. We have two big players in that space:
- Spring AI: This is the ultimate form of “Spring Magic.” You don’t build; you just inject: @Autowired ChatClient. Relying on the “convention over configuration” philosophy, it integrates seamlessly with your existing Spring Boot apps. Perfect for whipping up REST endpoints, chatbots, and simple RAG chains.
- LangChain4j: An alternative universe for those not married to Spring. It bets on “explicit composition” and a “Java-first” approach. People see it as more “LLM-native,” with a cleaner API based on builder patterns. Historically, it’s had broader support for various LLM providers and vector DBs, often showing slightly better performance in benchmarks.
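To see just how warm the water is up here, a minimal Spring AI sketch – assuming the Spring AI 1.0 `ChatClient` API with a provider starter (e.g. OpenAI) on the classpath; the endpoint path and parameter name are my own inventions for the example:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
class ChatController {

    private final ChatClient chatClient;

    // Spring Boot auto-configures a ChatClient.Builder for whichever
    // model provider starter is on the classpath – you just inject it.
    ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping("/ask")
    String ask(@RequestParam String question) {
        // Fluent API: build the prompt, call the model, extract the text.
        return chatClient.prompt()
                .user(question)
                .call()
                .content();
    }
}
```

That’s the whole “AI integration” at this level: one injected bean, one fluent call chain, zero tensors.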
For most folks, the story ends here. It’s a world of abstraction – calling APIs from OpenAI or HuggingFace. You don’t care about tensors, linear algebra, or “how the sausage is made”.
The Classic Titans (The Old Guard)
Before LLMs were cool, there were titans proving Java could actually count.
- Weka: The “Grandpa” of ML in Java. Still the absolute king in academia (I used it at my university as well). Its GUI is friendly, making it perfect for learning the basics – classification, regression, clustering – on small datasets.
- Deeplearning4j (DL4J): The real “pioneer” and the first serious contender for deep learning on the JVM. While Weka was an academic toy, DL4J was built for the enterprise – and it was my first ever introduction to running models through Java. Scalable, native Spark/Hadoop integration, and full GPU acceleration via CUDA. This was the first signal that we could do serious math on the JVM.
It started easy but… we need to go deeper now!
Level 2: Just Below the Surface (The Classpath Battle Zone) 🦈

Now you start asking questions. “But how does Spring AI actually do that?” “What happens when the data science team hands me nothing but a .onnx file?”
The water is getting colder 🥶. Welcome to the classpath battlefield, where you realize everything you knew was just a facade.
The False Prophet: GraalPython 🐍❌
At this point, you might think: “Instead of messing with exporting models, why not just run the Python code itself?” We have GraalVM and its powerful Truffle framework, promising to JIT-compile Python with JVM performance!
It’s a trap. 🚨
The brutal truth? GraalPython has limited compatibility. Worse, while that compatibility keeps improving, it struggles with native C extensions – the libraries the entire Python ecosystem relies on for performance (NumPy, TensorFlow’s C core, etc.). Without them, GraalPython is hard to set up for serious AI.
The Java ecosystem stopped trying to be Python. Instead, it focused on being a better production environment for artifacts (models) created in Python. The bridge is the artifact (ONNX), not the code (GraalPython).
A Doll Inside a Doll: Deep Java Library (DJL)
You peek into the pom.xml dependencies for Spring AI and find something unexpected: ai.djl. Spoiler alert: Spring AI is really just DJL in a trench coat.
DJL, an open-source project from Amazon, is the true workhorse for inference in Java. It’s an “agnostic engine” – an abstraction layer allowing us to write one code to rule them all (PyTorch, TensorFlow, MXNet, and crucially, ONNX). Designed specifically for us Java folks, it has an intuitive API and its own Spring Boot Starter. Even LangChain4j uses it under the hood for certain tasks!
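Here’s a rough sketch of what that “agnostic engine” looks like in practice, assuming DJL’s standard `Criteria`/`ZooModel` API – note that the model-zoo URL and the image URL are illustrative placeholders, not guaranteed artifacts:

```java
import ai.djl.inference.Predictor;
import ai.djl.modality.Classifications;
import ai.djl.modality.cv.Image;
import ai.djl.modality.cv.ImageFactory;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

public class DjlDemo {
    public static void main(String[] args) throws Exception {
        // Describe WHAT you want (image in, classifications out);
        // the engine behind it (PyTorch, TensorFlow, ONNX Runtime…)
        // is an implementation detail you can swap.
        Criteria<Image, Classifications> criteria = Criteria.builder()
                .setTypes(Image.class, Classifications.class)
                .optEngine("PyTorch")                        // swap engines here
                .optModelUrls("djl://ai.djl.pytorch/resnet") // illustrative zoo URL
                .build();

        try (ZooModel<Image, Classifications> model = criteria.loadModel();
             Predictor<Image, Classifications> predictor = model.newPredictor()) {
            Image img = ImageFactory.getInstance()
                    .fromUrl("https://example.com/cat.jpg"); // placeholder image
            Classifications result = predictor.predict(img);
            System.out.println(result.best());
        }
    }
}
```

One `Criteria` description, one `Predictor` – and the same code runs whether the artifact underneath is PyTorch, TensorFlow, or ONNX.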
The Rosetta Stone: ONNX Runtime
DJL is a nice abstraction. But what if you want raw power? 💪 What if your data science team trains in PyTorch and exports to the Open Neural Network Exchange (ONNX) standard?
ONNX is the “Esperanto” of machine learning. It’s a universal container allowing training in any framework (Python) and running inference natively in any other language (Java). This is the real bridge between worlds.
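A minimal sketch of crossing that bridge with the `ai.onnxruntime` Java bindings – be warned that the file name, the input name `"input"`, and the tensor shape here are all assumptions; they depend entirely on what your data-science team actually exported:

```java
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtSession;
import java.util.Map;

public class OnnxDemo {
    public static void main(String[] args) throws Exception {
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        // "model.onnx" is whatever the data-science team handed you.
        try (OrtSession session = env.createSession("model.onnx",
                new OrtSession.SessionOptions())) {

            float[][] batch = {{0.5f, 1.5f, -2.0f}}; // shape [1, 3], made up for this sketch
            try (OnnxTensor input = OnnxTensor.createTensor(env, batch);
                 // "input" must match the graph's real input name – inspect your model!
                 OrtSession.Result result = session.run(Map.of("input", input))) {
                float[][] output = (float[][]) result.get(0).getValue();
                System.out.println(output[0][0]);
            }
        }
    }
}
```

Trained in Python, executed natively on the JVM – no Python interpreter anywhere in sight.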
Ghost of the Past: Apache Mahout
Sometimes, you stumble upon old tutorials mentioning Apache Mahout. What happened to it? It didn’t die. It transformed. In 2025, Mahout isn’t a general ML framework anymore. It’s a “distributed linear algebra platform” and a “mathematically expressive Scala DSL.” It lives on, but as a specialized tool for data scientists working on Apache Spark.
Level 3: The Depths (Kingdom of the “Mechanical Sympathizers”)

The descent is complete. You are now in the domain of the “Mechanical Sympathizers.” Sunlight is gone. You start caring about “memory bandwidth” and “compute time.” You understand why float32 is a waste of resources when AI models run perfectly fine on float16.
Here, you stop asking “Which API should I use?” and start asking: “How do I get access to the metal?”
The answer comes in three forms.
1. Project Panama: The New Era (Goodbye JNI)
The Old Evil, JNI (Java Native Interface), was slow, unsafe, and forced us to write glue code in C. It was the bane of anyone trying to link Java with native libs.
Project Panama is our New Savior. It replaces JNI with two key components:
- Foreign Function & Memory (FFM) API: This is the heart. It lets you allocate memory off-heap (MemorySegment, Arena) and safely call native functions. Why is this critical for AI? Because huge models and weight matrices cannot live on the GC-managed heap. GC pauses are unacceptable. FFM gives us full control over off-heap memory, crucial for sharing data with CUDA or ONNX Runtime.
- jextract: A tool that reads C header files (like cuda.h) and auto-generates all the Java bindings for you.
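Here’s a tiny taste of the FFM side, assuming the finalized API from Java 22+: a float “tensor” that lives entirely off-heap, invisible to the GC, with a deterministic lifetime bound to its `Arena`:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class OffHeapDemo {
    // Sum a float "tensor" stored entirely outside the GC-managed heap.
    static float sumOffHeap(float[] data) {
        try (Arena arena = Arena.ofConfined()) { // explicit lifetime, no GC involvement
            MemorySegment segment = arena.allocate(ValueLayout.JAVA_FLOAT, data.length);
            // Bulk-copy from the on-heap array into native memory –
            // this is the kind of segment you could hand to CUDA or ONNX Runtime.
            MemorySegment.copy(data, 0, segment, ValueLayout.JAVA_FLOAT, 0, data.length);
            float sum = 0f;
            for (int i = 0; i < data.length; i++) {
                sum += segment.getAtIndex(ValueLayout.JAVA_FLOAT, i);
            }
            return sum;
        } // memory freed HERE, deterministically – no waiting for a GC cycle
    }

    public static void main(String[] args) {
        System.out.println(sumOffHeap(new float[] {1f, 2f, 3f, 4f})); // 10.0
    }
}
```

Calling actual native functions adds a `Linker` and a downcall `MethodHandle` on top of this – and jextract generates exactly that plumbing for you from a header file.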
2. Vector API: Unleashing the Inner SIMD Monster
AI math is 99% matrix operations, and plain Java loops leave a lot of that performance on the table. The JIT compiler tries to “auto-vectorize” them, but it often fails or succeeds unpredictably.
The Vector API (currently in its 10th incubator as JEP 508!) is your manual override. It lets you explicitly tell the JVM: “Take this 512-bit array chunk and do sixteen float operations at once using a single CPU instruction (SIMD).” This is the optimization that makes C++ libraries so fast.
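A minimal sketch of that manual override – element-wise multiplication written against the incubating API, so you need `--add-modules jdk.incubator.vector` to compile and run it:

```java
// Compile/run with: --add-modules jdk.incubator.vector
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class SimdDemo {
    // Picks the widest SIMD shape the current CPU supports (128/256/512 bits).
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // c[i] = a[i] * b[i], with the main loop issued as explicit SIMD instructions.
    static void multiply(float[] a, float[] b, float[] c) {
        int i = 0;
        int upperBound = SPECIES.loopBound(a.length);
        for (; i < upperBound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            va.mul(vb).intoArray(c, i); // one lane-wise multiply per chunk
        }
        for (; i < a.length; i++) { // scalar tail for the leftover elements
            c[i] = a[i] * b[i];
        }
    }
}
```

On an AVX-512 machine `SPECIES.length()` is 16, so each iteration of the main loop chews through sixteen floats at once.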
3. TornadoVM: The Pure Java Alternative
Panama and Vector API are powerful. But what if I told you there is a third way? A way where you don’t have to link to C code or manually manage SIMD?
TornadoVM is an approach from another dimension. It’s a Graal compiler extension that takes standard Java code (marked with annotations) and compiles it Ahead-of-Time (AOT) directly to accelerator code – OpenCL or Nvidia PTX. It runs tasks on multi-core CPUs, GPUs, and even FPGAs, managing complex data flows via Task-Graphs.
It’s “write once, accelerate everywhere.” You write AI algorithms in pure Java and run them on a GPU without ever touching JNI.
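A sketch of what that looks like, based on TornadoVM’s published Task-Graph API – the class and package names follow recent TornadoVM releases and may shift between versions, and actually running this requires the TornadoVM SDK plus a supported device:

```java
import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;

public class TornadoDemo {
    // Plain Java; @Parallel marks the loop as safe to run as a GPU kernel.
    static void multiply(float[] a, float[] b, float[] c) {
        for (@Parallel int i = 0; i < c.length; i++) {
            c[i] = a[i] * b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = new float[1024], b = new float[1024], c = new float[1024];
        // Describe the data flow: what moves to the device, what runs, what comes back.
        TaskGraph graph = new TaskGraph("s0")
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, b)
                .task("t0", TornadoDemo::multiply, a, b, c)
                .transferToHost(DataTransferMode.EVERY_EXECUTION, c);
        ImmutableTaskGraph snapshot = graph.snapshot();
        new TornadoExecutionPlan(snapshot).execute(); // compiled to OpenCL/PTX at runtime
    }
}
```

Not a single line of C, not a single JNI call – the kernel is just a Java method.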
——–
At this level, paths diverge. We have three roads to the “metal,” each solving native performance differently.
| Project | Main Goal | Key Technology | Abstraction Level | AI Use Case |
| --- | --- | --- | --- | --- |
| Project Panama | Interop with native code | FFM API, jextract | Low (memory/pointer mgmt) | Calling CUDA/cuDNN; sharing off-heap memory for tensors |
| Vector API | Explicit CPU vectorization (SIMD) | JVM instructions for SIMD | Low (vector ops) | Accelerating math (inference) on CPU; writing numeric libs in pure Java |
| TornadoVM | Auto-acceleration of Java on GPU/FPGA | Graal compilation to OpenCL/PTX | High (annotations, Task-Graph) | Writing ML/LLM algos in pure Java and running on GPU without JNI |
Level 4: The Abyss – Where “Forgotten Gods” Reside

This is the bottom. No light reaches here. You are “treading the edge of madness.” Project names like Valhalla and Babylon sound like ancient myths. This is the heart of OpenJDK, where decisions are made that will define the platform for decades.
It is time for the Holy Trinity of Java’s AI Future 🕍. You discover that the ultimate future of AI on the JVM depends on the completion and convergence of three fundamental projects.
Project Valhalla: The Foundation of Reality with Float16
As we established on Level 3, AI loves float16. It’s faster and uses half the memory of float32. Java doesn’t have native float16. The current “solution” (JEP 489 & 508) wraps a short, which is a huge “footgun” without type safety.
The Holy Grail is Valhalla and its “primitive classes.” It will allow a true Float16 value class that wraps a short with zero performance overhead and full type safety. This is mandatory for the Vector API to hit max speed.
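You can already feel the footgun today with the plain-JDK conversion helpers (`Float.floatToFloat16` / `Float.float16ToFloat`, added in Java 20) – a minimal round-trip sketch showing both the `short` disguise and the precision loss:

```java
public class Float16Demo {
    // Round-trip a float through the 16-bit "half" format.
    // Note the return value of floatToFloat16 is a plain short:
    // nothing stops you from accidentally adding it to a port number.
    static float roundTrip(float value) {
        short half = Float.floatToFloat16(value); // just bits in a short!
        return Float.float16ToFloat(half);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1.0f)); // exactly representable: 1.0
        System.out.println(roundTrip(0.1f)); // lossy: ~0.0999756
    }
}
```

Valhalla’s value classes would let `Float16` be a real, type-safe type with the same flat, zero-overhead layout as that raw `short`.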
Project Babylon: The Universal Translator (through Code Reflection)
This is the new, mythical OpenJDK project. Babylon gives Java something called “Code Reflection.” Unlike old reflection (which just looked at classes/methods), Code Reflection allows tools to analyze, parse, and transform Java code at compile or runtime.
Why do we care? The AI use case is mind-blowing: You write your model (e.g., a neural net) in pure Java. Babylon analyzes that code and automatically translates it into optimized GPU kernels. No more Python. No more external model files.
Heterogeneous Accelerator Toolkit (HAT): The Chariot of Babylon
HAT shouldn’t be considered a separate project – it’s the chariot Babylon rides into battle. It’s the concrete, battle-ready implementation of the Babylon idea for heterogeneous hardware like GPUs and FPGAs.
How does it work? It uses Code Reflection from Babylon to get a structured, high-level understanding of your Java code – not just bytecode tricks, but a real model of what your methods are doing. Then it hands that over to the Foreign Function & Memory (FFM) API from Project Panama, which becomes the low-level transport layer to talk to native drivers and runtimes.
No JNI, no ad-hoc glue: Babylon describes the computation, HAT turns it into kernels, and Panama wires the whole thing into whatever accelerator you have underneath. It’s the ultimate synergy: three once-separate efforts (Babylon, HAT, Panama) snapping together into a single pipeline that lets plain Java reach all the weird and wonderful silicon in your data center.
Joining everything Together: GPULlama3.java 🦙
While OpenJDK committees slowly build their “Holy Trinity,” the TornadoVM team decided not to wait. They built GPULlama3.java. It is exactly what it sounds like: the “first native implementation of Llama3 in Java” that automatically compiles and runs on the GPU. It is the ultimate proof that the TornadoVM heresy (automagical Java-to-GPU compilation) isn’t just theory. It works.
Bonus: The Graveyard of Projects

No god rises without sacrifice. At this depth, you find the ghosts of projects that tried and failed so today’s could succeed.
Project Sumatra
Before HAT, Oracle had already taken a serious shot at the same idea with Project Sumatra – offloading Java workloads to GPUs to get transparent acceleration without forcing developers to leave the Java ecosystem.
On paper it looked great: let the JVM decide which parts of your code are worth sending to the GPU and handle all the ugly details under the hood. In practice, Sumatra hit a wall. The GPU world was a mess of competing APIs and vendor-specific extensions, so the JVM team had no stable, long-term target to optimize for, and every integration choice risked locking Java into one ecosystem.
On top of that came classic “organizational” problems: coordinating compiler, JVM, and GPU vendors across different roadmaps and priorities turned out to be painfully hard. Officially, the post-mortem boiled down to two bullets – “lack of GPU standardization” and “communication difficulties” – but behind them was a pretty simple truth: the ecosystem and the org chart just weren’t ready yet.
The ghost of Sumatra couldn’t succeed because it lacked the tools. Today, Panama has solved the “communication” problem, and Babylon is solving the “code translation” problem. HAT is the reincarnation of Sumatra, and this time, it has a chance to win.
Aparapi & Rootbeer
The true ancestors. Ten years ago, these projects were already doing the impossible: compiling Java bytecode to OpenCL/CUDA. Names like Aparapi, Rootbeer and a couple of research compilers quietly proved that you could walk JVM bytecode, carve out a data-parallel kernel and spit out GPU code – all while pretending the developer was just writing “normal” Java.
The price was brutal: a tiny, heavily restricted subset of the language, no allocation, no fancy polymorphism, plus a minefield of driver quirks and performance cliffs waiting behind every release of the GPU stack. They mostly survived as research toys and niche production deployments, because neither the hardware landscape nor the server/cloud story around GPUs was ready for mainstream Java yet.
In hindsight, they were a decade too early – but they sketched most of the ideas that Babylon and HAT are only now turning into something that has a real shot at going mainstream.
Conclusion – Staring into the Abyss 👁️
Our journey is over. Just one final thought before we surface!
The “pragmatic” developer will stay on Level 1, happy that “it just works.” But we… we have been to the depths. We’ve seen the truth.
The real story of AI on the JVM isn’t about frameworks. It’s about the transformation of the platform. Java isn’t just trying to integrate with AI anymore. It is transforming to become a high-performance computing platform.
The AI abyss on the JVM isn’t just staring back at you. It’s compiling in pure Java 🐙
Please, do not blink.