Cloud-native development has become the cornerstone of modern software delivery, with containerization leading the charge in deployment strategies. For Java developers and enterprises with significant Java investments, the journey to cloud-native architecture presents both opportunities and challenges. This article explores how Amazon EKS, combined with cutting-edge technologies like Coordinated Restore at Checkpoint (CRaC) and Quarkus, is revolutionizing Java application deployment in the cloud.
Introduction to Amazon Elastic Kubernetes Service
Amazon Elastic Kubernetes Service (Amazon EKS) is a fully managed Kubernetes service that enables customers to run Kubernetes seamlessly in both AWS Cloud and on-premises data centers. In the cloud, Amazon EKS automates Kubernetes cluster infrastructure management. This is essential for scheduling containers, managing application availability, dynamically scaling resources, optimizing compute, storing cluster data, and performing other critical functions. With Amazon EKS, customers can leverage the robust performance, scalability, reliability, and availability of AWS infrastructure, as well as natively integrate with AWS networking, security, and storage services. To simplify running Kubernetes in on-premises environments, customers can use the same Amazon EKS clusters, features, and tools to run nodes on AWS Outposts or their own infrastructure, or customers can use Amazon EKS Anywhere for self-contained, air-gapped environments.
EKS Auto Mode: The Next Evolution
With Amazon EKS Auto Mode, customers can automate cluster management without deep Kubernetes expertise: it selects optimal compute instances, dynamically scales resources, continuously optimizes costs, manages core add-ons, patches operating systems, and integrates with AWS security services. Compared to customer-managed infrastructure in EKS clusters, AWS expands its operational responsibility in EKS Auto Mode. In addition to the EKS control plane, AWS configures, manages, and secures the AWS infrastructure in EKS clusters that customers' applications need to run.

Customers can now get started quickly, improve performance, and reduce overhead, enabling them to focus on building applications that drive innovation instead of on cluster management tasks. EKS Auto Mode also reduces the work required to acquire and run cost-efficient GPU-accelerated instances so that generative AI workloads have the capacity they need when they need it. It automatically launches EC2 instances based on Bottlerocket OS, configures AWS Elastic Load Balancing (ELB), and provisions Amazon Elastic Block Store (Amazon EBS) volumes inside user AWS accounts and user-provided VPCs when customers deploy their applications. EKS Auto Mode launches and manages the lifecycle of these EC2 instances, scaling and optimizing the data plane as application requirements change at runtime, and automatically replacing any unhealthy nodes.
Challenges of Java in Containers
Running Java applications in containers presents a complex web of challenges that demand careful consideration and configuration. Memory management stands as perhaps the most critical concern – prior to Java 10, the JVM couldn’t accurately detect container memory limits, leading to potential out-of-memory errors as it based calculations on host resources rather than container constraints. This complexity is compounded by native memory overhead from thread stacks, direct buffers, and JVM internal structures, which must be accounted for alongside heap allocation.
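To see which limits the JVM has actually detected, a minimal diagnostic class can print the heap ceiling and CPU count the runtime will work with. This is an illustrative sketch (the class name is ours, not part of any application discussed here); run inside a container with a memory limit and container support enabled (the default since JDK 10), the reported heap reflects that limit scaled by -XX:MaxRAMPercentage rather than the host's physical RAM:

```java
// Prints the resource limits the JVM has detected. Inside a container,
// maxMemory() reflects the container memory limit (scaled by
// -XX:MaxRAMPercentage), and availableProcessors() reflects the CPU quota.
public class ContainerLimits {
    public static void main(String[] args) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("Max heap (MB): " + maxHeapMb);
        System.out.println("Available CPUs: " + cpus);
    }
}
```

For example, launching this with a 512 MB container memory limit and -XX:MaxRAMPercentage=75.0 should report a maximum heap of roughly 384 MB, confirming that the JVM is sizing itself against the container rather than the host.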
Off-heap memory management adds another layer of complexity, as tools like Netty or memory-mapped files can bypass heap limits but still count against container memory limits. Startup time poses another significant challenge, particularly in dynamic container environments that demand rapid scaling. The time-consuming process of class loading, especially in large applications with numerous dependencies, can lead to slow container initialization. This is exacerbated by initial heap sizing decisions and JIT compilation overhead, where the tradeoff between startup speed and runtime performance becomes crucial. We will specifically address this challenge in the course of the article.
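The class-loading cost described above can be made visible with the standard management beans. The following sketch (class name ours, for illustration) prints how many classes the JVM has loaded and how long it has been running, which gives a quick way to compare cold-start behavior across container configurations:

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

public class StartupStats {
    public static void main(String[] args) {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        // Uptime since JVM start; for a freshly started application this
        // approximates how long initialization has taken so far.
        System.out.println("JVM uptime (ms): " + runtime.getUptime());
        // Even a trivial program loads hundreds of JDK classes; large
        // frameworks load tens of thousands, which dominates cold-start time.
        System.out.println("Classes loaded: " + classes.getLoadedClassCount());
    }
}
```

Comparing the loaded-class count of a bare JVM with that of a fully started framework application makes it obvious why techniques like CDS, CRaC, and native compilation (all covered below) pay off.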
Resource utilization presents its own set of challenges – CPU throttling in containerized environments can affect Java’s ability to optimize code execution, while memory swapping can severely impact performance if not properly managed or disabled. I/O constraints can become particularly problematic for Java applications that rely heavily on file operations or network communication, as container limitations may not be immediately apparent to the application.

The traditional “fat JAR” approach commonly used in Java applications results in larger container images, increasing deployment times and resource consumption. Container images are created using layers. Layered JAR files separate the application and its dependencies so that each part can be stored in a dedicated container image layer. This has the advantage that the cached layers can be reused during the build of the application, which significantly speeds up the rebuild of the container image. This can also have an impact on startup time when using technologies like Seekable OCI (SOCI). SOCI is a technology open sourced by AWS that enables containers to launch faster by lazily loading the container image. SOCI works by creating an index (SOCI Index) of the files within an existing container image. This index is a key enabler to launching containers faster, providing the capability to extract an individual file from a container image before downloading the entire archive.
Furthermore, Java’s garbage collection behavior can cause unexpected pauses, potentially violating container orchestrators’ health checks and leading to unnecessary pod restarts. These challenges become even more pronounced in microservices architectures, where multiple Java containers may compete for resources on the same host, requiring careful tuning of both JVM parameters and container resource limits to achieve optimal performance and reliability.
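One way to quantify the pause behavior described above is to read the garbage-collector MXBeans after some allocation pressure. The sketch below is illustrative only (it is not part of the UnicornStore application discussed later); it reports cumulative GC time, the same signal you would export as a metric when tuning liveness-probe timeouts:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcPressure {
    public static void main(String[] args) {
        // Allocate aggressively to trigger collections, retaining a sliding
        // window of buffers so some objects survive into older generations.
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < 5_000; i++) {
            retained.add(new byte[64 * 1024]);
            if (retained.size() > 1_000) {
                retained.subList(0, 500).clear();
            }
        }
        long totalGcTimeMs = 0;
        long collections = 0;
        // Sum time and cycle counts across all collectors (young and old).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            totalGcTimeMs += Math.max(gc.getCollectionTime(), 0);
            collections += Math.max(gc.getCollectionCount(), 0);
        }
        System.out.println("GC cycles: " + collections
                + ", cumulative GC time (ms): " + totalGcTimeMs);
    }
}
```

If cumulative GC time grows in bursts that approach your liveness-probe timeout, the fix is usually to raise the probe's timeoutSeconds or to reduce pause times via heap sizing and collector choice, rather than to let the orchestrator restart healthy pods.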
Better Performance with CRaC and Warp
Coordinated Restore at Checkpoint (CRaC), an innovative OpenJDK project spearheaded by Azul, represents a significant breakthrough in addressing Java’s notorious startup time challenges. By capturing the state of a warmed-up Java application and JVM at any given moment (“checkpoint”), CRaC enables applications to restart from this preserved state, effectively bypassing the traditional time-consuming initialization process. This checkpoint capability, which can be integrated as an additional step in container image builds, dramatically improves startup performance by restoring the application to its optimized state immediately. However, this powerful feature comes with important considerations – checkpoint files may contain sensitive data, and stateful elements like file handles and network connections require careful handling during restoration. While Spring Boot provides native support for CRaC, external libraries may need additional implementation work. For instance, applications using the AWS SDK for Java V2 must implement custom logic to rebuild connections after restore. CRaC addresses these challenges through its Resource interface, which provides beforeCheckpoint() and afterRestore() callbacks, allowing developers to manage state preservation and restoration effectively across their application components.
AWS has a demo application that demonstrates how to use CRaC in combination with the AWS SDK for Java. The application, called UnicornStore, interacts with Amazon EventBridge through the AWS SDK. First, a client is created, then this client is used for performing operations on EventBridge. Each client maintains its own HTTP connection pool. To capture the checkpoint, the network connections in the pool need to be closed — this is achieved by closing the client in the beforeCheckpoint() method and re-creating it in afterRestore().
The code snippet below shows how the UnicornPublisher class is altered to handle CRaC requirements for network connections through the org.crac.Resource interface:
public class UnicornPublisher implements Resource {
    ...

    @PostConstruct
    public void init() {
        createClient();
        // Register this instance with the global CRaC context so the
        // callbacks below are invoked around checkpoint and restore.
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        logger.info("Executing beforeCheckpoint...");
        closeClient();
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        logger.info("Executing afterRestore ...");
        createClient();
    }

    private void createClient() {
        logger.info("Creating EventBridgeAsyncClient");
        eventBridgeClient = EventBridgeAsyncClient
                .builder()
                .credentialsProvider(DefaultCredentialsProvider.create())
                .build();
    }

    public void closeClient() {
        logger.info("Closing EventBridgeAsyncClient");
        eventBridgeClient.close();
    }

    ...
}
Spring Framework has built-in CRaC support since version 6.1 (and Spring Boot since version 3.2), which means, among other things, that CRaC is integrated into the Spring lifecycle (more information can be found in the Spring Framework documentation).
With this approach, only the framework startup is captured in the snapshot, not the application's warmed-up state. The upside is that no application changes are required at all if you rely solely on Spring Boot functionality. With automatic checkpointing, however, the JVM is not fully warmed up when the snapshot is taken, which means startup is somewhat slower than with the manual checkpointing approach. The following Dockerfile shows a compact multi-stage build approach to create a container image that uses a CRaC snapshot.
FROM azul/zulu-openjdk:21-jdk-crac-latest AS builder
RUN apt-get -qq update && apt-get -qq install -y curl maven
ARG SPRING_DATASOURCE_URL
ENV SPRING_DATASOURCE_URL=$SPRING_DATASOURCE_URL
ARG SPRING_DATASOURCE_PASSWORD
ENV SPRING_DATASOURCE_PASSWORD=$SPRING_DATASOURCE_PASSWORD
COPY ./pom.xml ./pom.xml
COPY src ./src/
# Build the application
RUN mvn clean package -ntp && mv target/store-spring-1.0.0-exec.jar store-spring.jar
# Run the application and take a checkpoint
RUN <<END_OF_SCRIPT
#!/bin/bash
java -Dspring.context.checkpoint=onRefresh -Djdk.crac.collect-fd-stacktraces=true \
-XX:CRaCEngine=warp -XX:CPUFeatures=generic -XX:CRaCCheckpointTo=/opt/crac-files -jar /store-spring.jar & PID=$!
wait $PID || true
END_OF_SCRIPT
FROM azul/zulu-openjdk:21-jdk-crac-latest AS runner
RUN apt-get -qq update && apt-get -qq install -y adduser
RUN addgroup --system --gid 1000 spring
RUN adduser --system --disabled-password --gecos "" --uid 1000 --gid 1000 spring
COPY --from=builder --chown=1000:1000 /opt/crac-files /opt/crac-files
COPY --from=builder --chown=1000:1000 /store-spring.jar /store-spring.jar
USER 1000:1000
EXPOSE 8080
# Restore the application from the checkpoint
CMD ["java", "-XX:CRaCEngine=warp", "-XX:CRaCRestoreFrom=/opt/crac-files"]
The containerization process for CRaC-enabled Java applications follows a multi-stage build approach utilizing Azul’s zulu-openjdk with CRaC support as the foundation for both builder and runner stages. In the initial builder stage, the process compiles and packages the Unicorn Store application into a JAR file, then executes the application to generate a checkpoint of its warmed-up state. This crucial step captures the optimized runtime state of the JVM and application. The second stage establishes a clean runtime environment, where both the checkpoint files and the application JAR are copied from the builder stage and configured with appropriate permissions for the unprivileged 'spring' user, ensuring security best practices. Finally, the container is configured to restore the application directly from the checkpoint files at startup, enabling rapid initialization by bypassing the traditional JVM warm-up phase and achieving near-instant application readiness.
The shell script starts the application with the JVM options -Dspring.context.checkpoint=onRefresh and -XX:CRaCCheckpointTo=/opt/crac-files. As already indicated, the checkpoint is created automatically at startup during the LifecycleProcessor.onRefresh phase.
The parameter -XX:CRaCEngine=warp selects an engine called Warp. Warp is a new engine available in Azul Zulu builds that can fully replace CRIU (Checkpoint/Restore In Userspace) and does not require any extra privileges. This means it’s not necessary to add additional permissions to the Kubernetes deployment; a second benefit is that Warp is faster than the CRIU-based engine for the demo application. If you want to learn more about Warp, a third-party blog post covers it in more depth.
Kubernetes-native Java with Quarkus
Quarkus, designed as a Kubernetes-native Java framework, provides a performant foundation for containerized applications on Amazon EKS. Its container-first philosophy results in significantly faster startup times and lower memory footprint compared to traditional Java applications. When combined with Mandrel, Red Hat’s downstream distribution of GraalVM, developers can compile their Quarkus applications to native executables that start in milliseconds and consume minimal memory – characteristics particularly valuable in containerized environments. On Amazon EKS, these native executables enable more efficient resource utilization, faster scaling operations, and reduced costs as more containers can be packed onto each node. Mandrel’s compatibility with Quarkus ensures a stable and supported path to native compilation while maintaining access to key AWS services through Quarkus extensions. This combination delivers a production-ready stack for modern, cloud-native Java applications that fully leverage the orchestration capabilities of Amazon EKS while minimizing the traditional overhead associated with Java in containers.
The following command creates a Quarkus project with the quarkus-maven-plugin:
mvn io.quarkus.platform:quarkus-maven-plugin:create \
-DprojectGroupId=com.amazon \
-DprojectArtifactId=quarkus-eks \
-DclassName="com.example.GreetingResource" \
-Dextensions="quarkus-container-image-jib,quarkus-kubernetes"
The quarkus-kubernetes extension helps generate Kubernetes manifests automatically. In the next step, we can package the application as a JVM-based container image using Jib:
mvn package -Dquarkus.container-image.build=true \
-Dquarkus.container-image.name=quarkus-eks \
-Dquarkus.container-image.tag=latest \
-Dquarkus.container-image.registry=<your-ecr-repo> \
-Dquarkus.container-image.push=true \
-Dquarkus.container-image.group=javapro
Since the Maven command above already builds the image and pushes it to Amazon Elastic Container Registry (ECR), we only need to change the image path in the generated Kubernetes YAML file and deploy to EKS by applying the Kubernetes YAML files:
kubectl apply -f target/kubernetes/kubernetes.yml
kubectl rollout status deployment quarkus-eks
By compiling the application to a native executable, we achieve faster cold starts and lower memory usage, ideal for autoscaling in Kubernetes. First we have to build our application using the native profile defined in pom.xml.
mvn package -Dnative
Now we can build and push the native image to ECR:
docker build -f src/main/docker/Dockerfile.native -t <your-container-image>:latest .
docker push <your-container-image>:latest
Finally, update the image reference in the Kubernetes deployment manifest and apply it again.
Performance results
At AWS, we’ve developed a comprehensive “Java on AWS” Immersion Day, designed to help developers navigate the diverse landscape of Java deployments in cloud environments. This hands-on experience focuses particularly on demonstrating various approaches to running Java workloads on AWS infrastructure, addressing common challenges and modern solutions. One significant component of the program delves into container optimization techniques. This section explores various strategies to enhance Java applications in containerized environments, helping developers achieve better resource utilization, faster startup times, and more efficient deployments while maintaining application performance and reliability.
Our optimization journey began with an unmodified container image as our baseline. From there, we gradually implemented various optimization techniques to enhance performance. We utilized jdeps and jlink to create a custom runtime environment, incorporated Jib for improved container builds, implemented Class Data Sharing (CDS) through archive creation, employed CRaC checkpoints, and ultimately integrated GraalVM native compilation. Each of these steps was thoroughly tested on Amazon EKS, with measurements focusing on two critical metrics: the resulting image size and application startup time.
| Version | Image Size | Start time (p99) |
| --- | --- | --- |
| No optimization | 351MB | 6.459s |
| Custom JRE | 221MB | 6.019s |
| Jib | 231MB | 5.71s |
| CDS | 633MB | 3.194s |
| GraalVM | 460MB | 0.581s |
| CRaC | 412MB | 0.085s |
We can see very clearly from the table that different optimizations lead to different results. The custom JRE and Jib show a significant reduction in the size of the container image. In terms of startup time, GraalVM and CRaC clearly stand out at well under one second: GraalVM because the application is compiled ahead of time to a native executable, and CRaC because the application is restored from an already-initialized checkpoint.
Summary
As containerization becomes increasingly central to modern application development, Java developers face unique challenges in optimizing their applications for cloud environments. This comprehensive guide explores how Amazon EKS, combined with cutting-edge technologies like CRaC, Quarkus, and GraalVM, is transforming Java deployment in the cloud. To leverage these advances in your own applications, start by evaluating your current containerization strategy against our benchmark results, explore EKS Auto Mode for simplified cluster management, and consider implementing CRaC or Quarkus in combination with GraalVM for applications requiring rapid startup times.