The modern tech industry is experiencing a renaissance of hardware fascination, driven by the artificial intelligence revolution and the ever-growing demand for cloud computing power. This phenomenon – often dubbed the new “silicon rush” – is best illustrated by NVIDIA’s astronomical rise in market value, which has soared by more than 20,000% over the past decade. In this dynamic landscape, the Arm architecture, long associated primarily with mobile devices, is emerging as a key player, redefining performance and energy efficiency.
The breakthrough moment that showcased Arm’s potential to the broader developer community (beyond just a handful of indie hackers) came in 2020 with the release of Apple’s M1-powered computers. These devices revolutionized expectations for developer hardware, delivering an unprecedented blend of high performance, smartphone-like responsiveness, and extremely low energy consumption – enabling all-day battery life. What had once been the domain of x86 processors (yes, Intel Atom came first, but have you ever seen an “Intel Atom Developer Station”?) suddenly gained a powerful and more efficient alternative.

Currently, Arm’s expansion goes far beyond laptops and mobile devices, establishing itself as a dominant force in data centers. Market forecasts indicate that by the end of 2025, processors based on Arm architecture will power nearly 50% of the new computing capacity deployed by the largest cloud service providers.
This transformation poses a fundamental question for the Java ecosystem: is it ready to fully harness the potential of this new architecture? Let’s try to answer that question! But we will start with a bit of theory 😊. Let’s talk about ISA.
ISA?
In the world of processor design, two main philosophies dominate the ISA (Instruction Set Architecture) – the set of instructions that forms the fundamental interface between software and hardware, defining the machine language the processor understands. These are RISC (Reduced Instruction Set Computing) and CISC (Complex Instruction Set Computing). ARM architecture is a leading representative of the RISC approach, characterized by a simple, highly constrained set of fixed-length instructions. Complex operations, such as multiplying two values held in memory, are executed as a sequence of simpler, atomic commands: load value A, load value B, perform the multiplication, store the result.
This approach can be aptly compared to the “Unix philosophy for hardware,” where simple, specialized tools are composed to perform more complex tasks.
In contrast stands the CISC architecture, whose main representative is the x86 standard, originating from Intel’s 8086 processor introduced in 1978. CISC offers a rich and extensive set of variable-length instructions capable of executing complex operations in a single step. While this may seem more efficient at first glance, in practice it leads to significant complexity. The processor must have advanced logic to decode instructions of varying lengths, and even the complex instructions are internally broken down into micro-operations. This complexity overhead not only increases energy consumption but also results in an accumulation of historical legacy.
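The contrast between the two philosophies can be sketched for that same multiplication of two values from memory. The mnemonics below are real x86 and AArch64 instructions, but the snippet is purely illustrative, not production assembly:

```
; CISC (x86): one variable-length instruction may read memory and multiply
imul eax, DWORD PTR [rbx]   ; eax = eax * value at address in rbx

; RISC (AArch64): fixed-length instructions, strict load/store separation
ldr x1, [x2]                ; load value A
ldr x3, [x4]                ; load value B
mul x5, x1, x3              ; multiply registers
str x5, [x6]                ; store the result
```

Note that a modern x86 core will internally crack `imul` with a memory operand into micro-operations very much like the RISC sequence – the complexity just moves into the decoder.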
The x86 standard, developed over more than four decades, now contains over 1,000 instructions – many of which are rarely used but must be supported to maintain backward compatibility. Attempts to simplify this standard, such as Intel’s x86S initiative, have failed, cementing its inherent complexity.
The Evolution of ARM – From Acorn to Neoverse
The history of ARM is a textbook example of how a niche technology can evolve into a global standard. Its roots trace back to the British company Acorn Computers and the Acorn RISC Machine (ARM) processor, which debuted commercially in 1987 in the Acorn Archimedes. A few years later, the architecture gained wider visibility through the innovative – yet commercially unsuccessful – Apple Newton, released in 1993.
However, over the years, thanks to its energy efficiency, this architecture gained a dominant position in the mobile market. The next stage of evolution came with the Cortex family of ready-made processor designs, which became the heart of countless consumer devices – from smartphones to single-board computers like the Raspberry Pi, and even gaming consoles such as the Nintendo Switch. The key design principle behind the Cortex line was to maximize performance within a predefined, low power budget, making it an ideal solution for battery-powered devices.
The real breakthrough for the server world, however, was the introduction of the Neoverse processor family. This marked Arm’s strategic pivot toward data centers and high-performance computing. With Neoverse, the design paradigm shifted: instead of optimizing performance within a strict energy budget, the focus moved to maximizing the performance-per-watt ratio. Neoverse processors were developed in three main variants to address diverse cloud computing needs: the N-series for general-purpose computing, the V-series for HPC (High-Performance Computing) and AI workloads, and the E-series for edge computing.
The architecture was also enriched with key extensions, such as SVE/SVE2 (Scalable Vector Extension) for advanced vector processing and LSE (Large System Extensions), which introduced efficient atomic instructions – a necessity in highly multithreaded systems.
Arm’s success stems not only from its technological advantages but also from a unique business model that has fundamentally reshaped the dynamics of the semiconductor market. Unlike Intel, which designs, manufactures, and sells its own chips, Arm operates on an intellectual property (IP) licensing model. The company designs processor architectures and cores, then sells licenses to other entities, which can manufacture them or modify and integrate them into their own systems.
This flexible business model has lowered the barrier to entry in chip design and acted as a catalyst for the wave of “custom silicon” in the cloud industry. The largest providers, operating at massive scale, sought hardware solutions perfectly tailored to their specific workloads – something they could not achieve with off-the-shelf x86 processors. Thanks to Arm licenses, companies like AWS have designed their own highly optimized AWS Graviton processors, Google has developed the Axion series dedicated in part to AI workloads, and Ampere has emerged as a key ARM chip supplier for Oracle, Microsoft Azure, and other cloud players (and was bought by Softbank in 2025).
Maturity of the Java Ecosystem on ARM
The evolution of Java support for the ARM architecture is a prime example of how a software ecosystem can rapidly mature in response to breakthroughs in hardware. In just a few years, Java has gone from a technology that could merely “run” on ARM to a fully optimized platform leveraging the architecture’s unique capabilities – becoming a true first-class citizen.
The Road to Support: Key JEPs and Milestones
The Java community’s first signs of interest in ARM date back to 2008, with early attempts to compile OpenJDK Zero – a version without any platform-specific optimizations. These were little more than initial experiments. The real breakthrough came in 2014 when a Red Hat team led by Andrew Haley launched work on JEP 237: Linux/AArch64 Port. This initiative aimed to officially port JDK 9 to 64-bit ARM and became a catalyst for the entire industry.
In response, Oracle decided to open-source its own previously closed port, resulting in JEP 297: Unified arm32/arm64 Port, which integrated support for both variants of the architecture into the OpenJDK mainline.
Another major leap came in 2020 with the market release of Apple’s M1 computers. The Java ecosystem reacted quickly, delivering ports for Microsoft and Apple operating systems: JEP 388 introduced support for Windows/AArch64, while JEP 391 brought macOS/AArch64 support. Notably, the latter was the result of an unusual collaboration between Microsoft and Azul Systems—showing how strategically important this new hardware platform had become.
The Role of Intrinsics and ARM-Specific Optimizations
Since JDK 8, Java performance on ARM has improved roughly 3.5×. A key driver of this jump was JEP 315: Improve AArch64 Intrinsics, introduced in JDK 11 – widely considered the minimum recommended production version for ARM.
Intrinsics are low-level implementations of Java methods that bypass standard bytecode execution, mapping directly to highly optimized machine instructions for a given CPU. JEP 315 focused on accelerating operations fundamental to most Java applications—such as String methods (indexOf, compareTo) and mathematical functions—using ARM NEON-specific instructions. This brought dramatic speedups. Later JDK releases added further optimizations, such as hardware acceleration for cryptographic algorithms like SHA-3. The process wasn’t without challenges: some optimizations proved ineffective, and in extreme cases, such as with Math.log, faulty hardware implementations forced temporary disabling of the intrinsic.
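Intrinsified methods look like ordinary Java – the substitution happens inside the JIT compiler. A minimal sketch of what such a hot path looks like (the class and method names are my own; the diagnostic flags in the comment are real HotSpot options):

```java
public class IntrinsicDemo {
    // String.indexOf is a HotSpot intrinsic: on AArch64, once this loop is hot,
    // C2 replaces the bytecode with a NEON-accelerated machine-code stub.
    static long countMatches(String haystack, String needle, int iterations) {
        long found = 0;
        for (int i = 0; i < iterations; i++) {
            if (haystack.indexOf(needle) >= 0) found++;
        }
        return found;
    }

    public static void main(String[] args) {
        long found = countMatches(
                "The quick brown fox jumps over the lazy dog", "lazy", 1_000_000);
        System.out.println("matches: " + found);
        // To watch intrinsics being applied, run with:
        //   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics IntrinsicDemo
        // and look for _indexOf entries in the output.
    }
}
```

The key point: the application code stays portable – the same bytecode is intrinsified differently on x86 (SSE/AVX) and on ARM (NEON).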
Modern ARM processors, especially in the Neoverse family, offer advanced SIMD (Single Instruction, Multiple Data) capabilities – executing the same operation on multiple data points within a single clock cycle. While the JVM has long used this implicitly via auto-vectorization in JIT compilation, the Vector API (currently JEP 508, in tenth incubation…) unlocks its full potential by giving Java developers explicit, programmatic access to ARM vector instructions like NEON and SVE/SVE2.
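A minimal sketch of explicit vectorization with the (still incubating) Vector API – it requires `--add-modules jdk.incubator.vector` at compile and run time, and `SPECIES_PREFERRED` selects the widest vector shape the CPU supports (128-bit NEON on most ARM cores, potentially wider with SVE):

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class VectorAdd {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // c[i] = a[i] + b[i], processed SPECIES.length() lanes per iteration
    static void add(float[] a, float[] b, float[] c) {
        int i = 0;
        int upper = SPECIES.loopBound(a.length);
        for (; i < upper; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            va.add(vb).intoArray(c, i);
        }
        for (; i < a.length; i++) { // scalar tail for the remainder
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f};
        float[] b = {10f, 20f, 30f, 40f, 50f};
        float[] c = new float[a.length];
        add(a, b, c);
        System.out.println(java.util.Arrays.toString(c));
    }
}
```

Unlike auto-vectorization, which the JIT may silently skip, this code is guaranteed to compile down to vector instructions where the hardware supports them, falling back to scalar code where it doesn’t.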
Platform Readiness: GraalVM, Garbage Collectors, and Native Dependencies (JNI)
ARM readiness in the Java ecosystem extends beyond the OpenJDK itself:
- GraalVM: This popular alternative JVM has offered full ARM64 support on Linux and macOS since version 21, including native images for Apple M1 processors. A notable limitation is the lack of plans for Windows on ARM support, which could be a blocker in some scenarios.
- Garbage Collectors: Standard GCs like G1, Shenandoah, and ZGC work well on ARM. Subtle differences in the memory model compared to x86 have been accounted for in their implementations. Cloud providers like AWS publish detailed JVM flag recommendations to optimize GC performance on ARM-based processors such as Graviton.
- JNI Challenges: The largest potential migration blocker is the Java Native Interface. Applications relying on native libraries (e.g., written in C/C++) require those libraries to be compiled specifically for AArch64. While active modern projects usually ship multi-architecture builds, older, unmaintained dependencies can make migration difficult or impossible without significant rework – however, there is always some kind of workaround available.
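Before migrating, it is worth checking what the JVM itself reports about the platform. A small helper like the following (my own sketch, not a standard API) can drive the decision of which native library variant to load:

```java
public class ArchProbe {
    // Normalizes the JVM's os.arch value: "aarch64" on Linux/Windows ARM JDKs,
    // "aarch64" (or historically "arm64") on Apple Silicon builds.
    static boolean isArm64(String osArch) {
        String a = osArch.toLowerCase();
        return a.equals("aarch64") || a.equals("arm64");
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        System.out.println("os.arch = " + arch
                + (isArm64(arch)
                    ? " -> load the AArch64 native build"
                    : " -> load the x86-64 native build"));
    }
}
```

Libraries that bundle natives for multiple architectures (e.g., via resource paths keyed by `os.arch`) use essentially this check under the hood.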
“Write Once, Run Anywhere” in a Multi-Architecture World
Java’s fundamental promise – write once, run anywhere – remains largely true in the context of ARM, provided the application is written entirely in Java. The same compiled JAR or WAR file can run without modification on a Java Virtual Machine (JVM) deployed on both traditional x86 hardware and modern ARM processors.
Containerization plays a critical role in reinforcing this portability. Docker, the de facto standard, abstracts away OS-level differences and dependencies, making a containerized Java application even less dependent on the target environment. This shifts the challenge from application portability to building and distributing container images that are compatible with the target architecture.
The tooling ecosystem has matured to meet the challenges of building applications for multiple architectures simultaneously. Key technologies include Docker Buildx and QEMU, though best practice remains to use native ARM64 build environments.
- Docker Buildx – An advanced Docker CLI plugin that extends `docker build` with multi-architecture support. It can produce a single image manifest referencing separate architecture-specific layers (e.g., `linux/amd64` and `linux/arm64`). When pulling the image, the container runtime automatically selects the correct architecture variant, simplifying distribution and deployment.
- QEMU – To build ARM images on x86 developer machines or CI/CD runners, Docker can leverage QEMU (Quick Emulator), a CPU emulator and instruction translator. This allows running binaries compiled for one architecture on another. For example, a developer on an Intel laptop can build and test a Docker image for AWS Graviton. The trade-off: emulation is significantly slower – CPU-intensive build steps (compilation, test execution) can be 5–15× slower compared to native execution.
- Native ARM64 Runners – To avoid emulation overhead, the most efficient approach is to build ARM images on native ARM64 machines. Leading CI/CD providers now offer native ARM64 runners, enabling builds and tests in production-like environments, reducing compatibility risks and speeding up pipelines.
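In practice, the whole flow boils down to a handful of commands – a sketch assuming a registry and image name that are placeholders, not real endpoints:

```shell
# One-time: create a builder instance capable of multi-platform builds
docker buildx create --name multiarch --use

# Register QEMU handlers so foreign-architecture binaries can run during the build
docker run --privileged --rm tonistiigi/binfmt --install all

# Build for both architectures and push a single multi-arch manifest
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t registry.example.com/my-java-app:1.0 \
  --push .
```

On a native ARM64 CI runner, the same `buildx build` command works without the QEMU step for the `linux/arm64` target.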
The Art of Benchmarking: How to Accurately Measure ARM vs. x86 Performance
Comparing the performance of processors based on ARM and x86 architectures is a highly complex task. Simply stating that one architecture is “faster” than the other is misleading and fails to capture the full picture. On AWS alone, there are over 600 different instance types, and the performance of each depends not only on CPU architecture but also on processor generation, memory size, network bandwidth, and — most importantly — the characteristics of the workload itself.
Synthetic benchmarks, such as CoreMark, can offer some general insight into raw compute power, but their results don’t always translate into real-world performance. This is why it’s crucial to conduct tests on applications that reflect actual production workloads. In the context of Java, a good example is the Spring PetClinic application, which simulates a typical web app with database access, allowing for a more holistic evaluation of system performance.
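Even a quick sanity check deserves a bit of discipline: the JIT must be warmed up, and results must be consumed so dead-code elimination does not erase the work. A deliberately simplistic harness (class and method names are my own; for trustworthy numbers use JMH instead):

```java
public class NaiveBench {
    // Stand-in for real work: a math-heavy loop whose result we must consume
    static double workload(int n) {
        double acc = 0.0;
        for (int i = 1; i <= n; i++) {
            acc += Math.log(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        workload(50_000); // warm-up pass: let the JIT compile the hot path
        long start = System.nanoTime();
        double result = workload(1_000_000);
        long ms = (System.nanoTime() - start) / 1_000_000;
        // Printing the result prevents the JIT from eliminating the loop
        System.out.println("result=" + result + " in " + ms + " ms");
    }
}
```

Numbers from such a harness only say something about one hot loop on one machine – which is exactly why application-level tests like Spring PetClinic matter more than synthetic scores.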
Results Analysis: Review of Performance and Cost Benchmarks
Available data and tests indicate a clear advantage for modern ARM processors in cloud environments — especially when looking at the performance-to-price ratio.
Spring PetClinic on AWS: Tests I ran myself (available here) on the standard Spring PetClinic application showed that ARM-based instances (AWS Graviton) deliver a 10–15% better cost-to-performance ratio compared to similarly priced x86 instances. Notably, this result was achieved without any additional JVM flag tuning or code optimizations.
Google Axion vs. Intel Xeon: Particularly compelling insights come from comparing “sibling” VM series in Google Cloud: C4 (based on Intel Emerald Rapids x86 processors) and C4A (based on Google’s custom ARM-based Axion processors). Both series run on the same modern Google Titanium hardware platform, meaning identical memory, disks, and networking. This setup isolates the impact of CPU architecture alone on performance. Benchmarks conducted by Phoronix showed a significant advantage for Axion instances, both in raw performance and energy efficiency.
Non-definitive, heuristic-based performance/cost comparison on Google Cloud
| Instance Type | Arch | Hourly Price (us-central1) | CoreMark (4 vCPU) | CoreMark/$ | Index (vs N2) |
|---|---|---|---|---|---|
| N2-standard-4 | x86 | $0.1942 | 66 884 | 344k | 1.0× |
| T2A-standard-4 | ARM | $0.1540 | 94 096 | 612k | 1.78× |
| C4-standard-4 | x86 | $0.1977 | ≈131 000 | 663k | 1.92× |
| C4A-standard-4 | ARM | $0.1796 | ≈183 000 | 1 021k | 2.96× |
ARM instances (T2A, C4A) are cheaper and deliver significantly higher performance per dollar – up to nearly 3× better for the Axion-based C4A.
Wrap Up
The takeaway from all this? By 2025, running Java on ARM isn’t some bold experiment – it’s a smart, battle-tested choice that delivers real value.
Technical Maturity: The Java Virtual Machine (JVM) has fully caught up with hardware advancements. From the first ports, which merely allowed code to run, HotSpot has evolved to include advanced optimizations such as intrinsics, support for vector instructions (the Vector API with SVE/SVE2), and modern garbage collection algorithms (Generational ZGC). As a result, Java not only runs on ARM but fully leverages its potential, making it a true first-class citizen on this platform.
Tooling Ecosystem: The process of building and deploying multi-architecture applications has become largely automated and simplified. Tools such as Docker Buildx and the availability of native ARM64 runners in leading CI/CD systems mean that the “build once, run anywhere” paradigm now works in practice, eliminating the need for complex cross-compilation processes (except for specific JNI cases).
A Compelling Business Case: The combination of higher performance per cost, significant cloud bill savings, and tangible benefits for sustainability (ESG) strategies creates a strong and coherent business argument for migrating workloads to the ARM architecture.
Recommendations for Development Teams and Architects
The decision to migrate should be based on the characteristics of the specific workload and the organization’s business priorities. The following guidelines provide a decision-making framework for technical teams:
YES, if:
- Your services are heavily I/O-bound (e.g., web services, API gateways) or highly parallelized (e.g., data processing, queue-based systems).
- Cost optimization (FinOps) and carbon footprint reduction (ESG) are key performance indicators (KPIs) in your organization.
- You use commercial software licensed per virtual CPU (vCPU), where ARM’s physical cores can offer a cost advantage.
- You are running a modern JDK (11.0.9 at minimum, newer recommended) – and, if you rely on GraalVM, you target a platform other than Windows on ARM.
MAYBE (requires further analysis), if:
- Your project makes heavy use of native libraries via JNI. In this case, you must verify whether the library providers offer AArch64-compiled versions. If not, assess the cost and risk of recompiling them yourself or finding an alternative solution.
NOT ALWAYS, if:
- Your application critically depends on maximum single-thread performance, where some high-clocked x86 processors may still have an edge.
- Your application relies on specific, exotic instruction set extensions available only on x86, such as AVX-512. That said, modern ARM processors in the Neoverse family offer very high performance and may still be competitive even in these scenarios.
Running Java on the ARM architecture in 2025 is no longer a technological experiment – it has become a mature, highly efficient, and strategically profitable option for a wide range of cloud workloads. So do your own benchmarks and try it if you like the numbers 😊.