Java Performance Optimization with Agentic AI: Autonomous Diagnostics and Actionable Recommendations

Java performance diagnosis in production is manual, slow, and requires deep expertise. You get a Grafana alert, grab a thread dump, download a JFR recording, open it in JDK Mission Control, stare at flamegraphs, correlate with metrics — hours of work per incident. Most teams don’t have a dedicated performance engineer, so alerts get ignored or result in generic “add more replicas” responses.

We built a system that does this autonomously. When a monitoring alert fires, it collects profiling data and thread dumps, extracts runtime metrics, generates flamegraphs, and sends everything to an AI model that produces a structured performance report — including root cause analysis and concrete code fixes with file paths and line numbers from the actual source repository. The pipeline runs on Kubernetes, triggered by Grafana webhooks, with results stored in Amazon S3.

The sample application is available on GitHub [1], and the Java on AWS Workshop [2] walks through the full setup.

The four-phase approach

This isn’t a chatbot you ask questions to. It’s an event-driven autonomous system with four phases:

Phase 1 — Continuous profiling. async-profiler runs as a JVM agent, recording wall-clock profiles in JFR format every 30 seconds. Completed recordings are moved to Amazon S3 via the Mountpoint for Amazon S3 CSI driver.

Phase 2 — Metric collection and alerting. Spring Boot Actuator exposes Prometheus metrics. Grafana scrapes request rates and fires a webhook when thresholds are exceeded.

Phase 3 — Deterministic data extraction. Java code parses the binary JFR file into runtime metrics (CPU load, GC heap, JVM info), generates collapsed stacks using async-profiler’s own jfr-converter library, produces an HTML flamegraph, and collects a thread dump from the target pod.

Phase 4 — AI-powered root cause analysis. The extracted data goes to Amazon Bedrock via Spring AI. The model correlates collapsed stacks with thread states, optionally looks up source code from GitHub using tool calling, and produces a structured report with health assessment, findings, and prioritized recommendations.

The key architectural decision: deterministic code handles structured, repeatable work (parsing binaries, generating flamegraphs); the AI handles interpretation and correlation. You don’t want an LLM parsing binary files.

Continuous profiling with async-profiler

We chose wall-clock profiling over CPU profiling because wall-clock captures total execution time including waiting states — I/O waits, lock contention, thread parking. CPU profiling only shows active CPU execution. In production, the most impactful issues are often blocking calls invisible in CPU profiles but visible as wide bars in wall-clock profiles.

The container image bundles async-profiler:

# Builder stage: download and unpack async-profiler
RUN cd /tmp && \
    wget -q https://github.com/async-profiler/async-profiler/releases/download/v4.3/async-profiler-4.3-linux-x64.tar.gz && \
    mkdir /async-profiler && \
    tar -xzf ./async-profiler-4.3-linux-x64.tar.gz -C /async-profiler --strip-components=1

# Runtime stage: copy the unpacked profiler into the final image
COPY --from=builder /async-profiler/ /async-profiler/

At runtime, the JVM starts with the profiler agent attached:

java -agentpath:/async-profiler/lib/libasyncProfiler.so=start,event=wall,file=/tmp/profile-%t.jfr,loop=30s \
     -jar -Dserver.port=8080 /store-spring.jar

The event=wall option enables wall-clock sampling, and loop=30s rotates the output file every 30 seconds. The resulting JFR files are compact (50 to 270 KB per snapshot) and contain stack traces, CPU load, GC heap state, and JVM information.

A background loop moves completed JFR files to S3, skipping the latest file to avoid moving an in-progress recording (Mountpoint for Amazon S3 doesn’t support sequential writes):

while true; do sleep 10;
  newest=$(ls /tmp/profile-*.jfr 2>/dev/null | sort -r | head -1);
  for f in /tmp/profile-*.jfr; do
    [ -f "$f" ] || continue;
    [ "$f" = "$newest" ] && continue;
    mv "$f" /s3/profiling/$HOSTNAME/;
  done;
done &

The Mountpoint for Amazon S3 CSI driver presents the Amazon S3 bucket as a local filesystem mount, so the mv command writes directly to S3 without any SDK calls from the application.

Deterministic data extraction pipeline

Before the AI model sees anything, Java code extracts structured data from the binary JFR recording. JfrParser uses the built-in jdk.jfr.consumer.RecordingFile API to iterate through events:

try (var recording = new RecordingFile(jfrFile)) {
    while (recording.hasMoreEvents()) {
        RecordedEvent event = recording.readEvent();
        switch (event.getEventType().getName()) {
            case "profiler.WallClockSample", "jdk.ExecutionSample" -> totalSamples++;
            case "jdk.CPULoad" -> cpuLoads.add(new CpuLoad(
                getDouble(event, "jvmUser"), getDouble(event, "jvmSystem"),
                getDouble(event, "machineTotal")));
            case "jdk.GCHeapSummary" -> gcHeaps.add(new GcHeap(
                getLong(event, "heapUsed"), committed));
            case "jdk.JVMInformation" -> jvmInfo = getString(event, "jvmVersion")
                + " | args: " + getString(event, "jvmArguments");
        }
    }
}

For stack trace analysis, we delegate to async-profiler’s own jfr-converter library (tools.profiler:jfr-converter:4.3). It produces two outputs: collapsed stacks (text format for the AI model) and an HTML flamegraph (interactive visualization for humans):

// Collapsed stacks: one line per unique stack with sample count
var collapsedArgs = new Arguments("--wall", jfrFile.toString(), tempCollapsed.toString());
collapsedArgs.output = "collapsed";
JfrToFlame.convert(jfrFile.toString(), tempCollapsed.toString(), collapsedArgs);

// HTML flamegraph with optional include filter for application frames
var flameArgs = new Arguments("--wall", "--inverted", "--include", includePattern,
    jfrFile.toString(), tempHtml.toString());
JfrToFlame.convert(jfrFile.toString(), tempHtml.toString(), flameArgs);

Collapsed stacks are better model input than raw JFR or flamegraph images: they’re text (no vision model needed), have correct frame attribution from async-profiler’s own logic, and include sample counts for quantifying relative impact. The HTML flamegraph gets an optional --include filter (e.g., .*unicorn.*) to focus on application frames for human viewing, while collapsed stacks sent to the model remain unfiltered.
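For a sense of what the model actually sees, a collapsed stack line is a semicolon-separated frame chain from root to leaf followed by its sample count. The frames and counts below are hypothetical, purely for illustration:

com/example/UnicornController.create;com/example/UnicornService.createUnicorn;java/util/concurrent/CompletableFuture.get 412
com/example/UnicornController.create;com/example/UnicornRepository.save;org/postgresql/jdbc/PgStatement.execute 35

A blocking call shows up directly as a large count on the line whose leaf frame is the blocking method.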

Thread dumps come from Spring Boot Actuator’s /actuator/threaddump endpoint via HTTP. If the pod is unreachable, analysis continues without it — JFR data alone is enough for useful results.
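A minimal sketch of that lookup and its fallback, assuming a Spring RestClient and that the pod IP comes straight from the alert labels:

// Sketch: fetch the thread dump as plain text; fall back to empty if the pod is gone
Optional<String> fetchThreadDump(String podIp) {
    try {
        var dump = RestClient.create("http://" + podIp + ":8080")
            .get()
            .uri("/actuator/threaddump")
            .accept(MediaType.TEXT_PLAIN)   // plain-text variant of the Actuator endpoint
            .retrieve()
            .body(String.class);
        return Optional.ofNullable(dump);
    } catch (RestClientException e) {
        return Optional.empty();            // pod unreachable: continue with JFR data only
    }
}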

The AI analysis engine

AiService uses Spring AI’s ChatClient to send the extracted data to Amazon Bedrock. The system prompt establishes the AI’s role:

private static final String SYSTEM_PROMPT = """
    You are a Java performance engineer. You receive two data sources \
    from a production Spring Boot application running on Amazon EKS: \
    a JFR profiling summary (collapsed stacks, CPU load, GC heap, JVM info) \
    and a thread dump snapshot. Both were captured around the time \
    a monitoring alert fired. Analyze the data and report what you find.""";

The user prompt is constructed from three sections: JFR runtime metrics (CPU load averages, GC heap ranges, JVM version and arguments), collapsed stacks with sample counts, and the thread dump. The prompt requests a structured output format — Health Assessment, Findings, and Recommendations — with instructions to cite specific methods, thread names, and numbers.
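A condensed sketch of that assembly; the section labels, helper names, and JfrSummary type are illustrative, not the exact ones from the repository:

// Sketch: combine the three extracted data sections into one user prompt
private String buildUserPrompt(JfrSummary jfr, String collapsedStacks, String threadDump) {
    return """
        ## JFR runtime metrics
        %s

        ## Collapsed stacks (frame;frame;...;frame sampleCount)
        %s

        ## Thread dump
        %s

        Report format: Health Assessment, Findings, Recommendations.
        Cite specific methods, thread names, and numbers from the data above.
        """.formatted(jfr.describe(), collapsedStacks, threadDump);
}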

What makes this more than a simple prompt-and-response is tool calling. When a GitHub repository URL is configured, AiService registers a GitHubSourceCodeTool with the ChatClient:

if (!repoUrl.isBlank()) {
    var tool = new GitHubSourceCodeTool(repoUrl, token, repoPath);
    builder.defaultTools(tool);
}

The tool uses Spring AI’s @Tool annotation:

@Tool(description = "Fetch a source code file from the application GitHub repository. " +
      "Provide the path relative to the application root, e.g. " +
      "src/main/java/com/example/MyClass.java")
public String fetchSourceCode(String filePath) {
    var fullPath = repoPath.isEmpty() ? filePath : repoPath + "/" + filePath;
    try {
        var json = restClient.get()
            .uri("/contents/{path}", fullPath)
            .retrieve()
            .body(String.class);
        // GitHub returns file content base64-encoded with line breaks, hence the MIME decoder
        var node = MAPPER.readTree(json);
        var encoded = node.get("content").asText();
        return new String(Base64.getMimeDecoder().decode(encoded));
    } catch (Exception e) {
        // Report the failure to the model instead of aborting the tool call
        return "Could not fetch " + fullPath + ": " + e.getMessage();
    }
}

During analysis, the model autonomously decides which source files to look up based on methods it finds in the collapsed stacks and thread dump. Spring AI handles the tool calling protocol with Amazon Bedrock automatically. The result: instead of generic “consider making this async,” you get recommendations that reference UnicornService.java:87 with the current blocking code and a concrete replacement.

The generated report includes a health assessment, findings that correlate collapsed stacks with thread states, and prioritized recommendations with concrete code fixes referencing specific source files and line numbers.

Event-driven orchestration

The pieces connect through Grafana webhooks on Kubernetes. When the HTTP POST request rate exceeds the configured threshold, Grafana sends a webhook to the analyzer’s /webhook endpoint with the pod name and IP address extracted from Prometheus labels.
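The payload maps onto small records on the analyzer side; a sketch of the expected shape, with label names assumed rather than taken from the actual alert rule:

// Sketch: Grafana sends a list of alerts; Prometheus labels carry the pod details
record WebhookPayload(List<Alert> alerts) {}
record Alert(String status, Map<String, String> labels) {
    String podName() { return labels.get("pod"); }
    String podIp()   { return labels.get("instance"); }
}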

WebhookController validates the alert and hands it to AnalyzerService, which runs the analysis pipeline asynchronously using virtual threads — the webhook returns immediately with an acknowledgment:

private final ExecutorService asyncExecutor = Executors.newVirtualThreadPerTaskExecutor();

public WebhookResponse processAlerts(List<Alert> alerts) {
    for (var alert : alerts) {
        asyncExecutor.submit(() -> processAlert(alert));
    }
    return new WebhookResponse("Accepted alerts for processing", alerts.size());
}

The pipeline includes retry logic for in-progress JFR files — up to 10 retries with 30-second delays, checking that the JFR has actual samples (a file with 0 samples means the profiler hasn’t finished writing). Each analysis produces 5 correlated artifacts stored in S3: the raw JFR binary (for re-analysis), profiling summary with collapsed stacks (model input), thread dump snapshot, interactive HTML flamegraph, and the AI-generated performance report. All files share a datetime prefix extracted from the JFR filename for easy correlation.
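A sketch of the wait-and-retry logic described above, assuming a parser helper and a JfrSummary type with a sample counter (names illustrative):

// Sketch: wait until the JFR file contains samples before analyzing it.
// A freshly rotated file can land on S3 with zero samples while the
// profiler is still writing.
JfrSummary awaitCompleteRecording(Path jfrFile) throws InterruptedException {
    for (int attempt = 1; attempt <= 10; attempt++) {
        var summary = jfrParser.parse(jfrFile);
        if (summary.totalSamples() > 0) {
            return summary;
        }
        Thread.sleep(Duration.ofSeconds(30));
    }
    throw new IllegalStateException("JFR recording never completed: " + jfrFile);
}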

Deployment and security

The analyzer runs as a Spring Boot 4 application on Java 25 with virtual threads enabled. The container image is built with Jib — no Docker daemon needed, builds directly from Maven:

mvn compile jib:build -Dimage=${ECR_URI}:latest

On Amazon EKS, the service runs in a monitoring namespace with EKS Pod Identity providing Amazon S3 and Amazon Bedrock access — no static credentials, no IAM access keys. With Amazon EKS Auto Mode, compute, storage, and networking are fully managed, so you focus on the workload rather than cluster infrastructure. The deployment is a standard Kubernetes manifest with a ClusterIP service (internal only), health probes via Actuator, and resource limits of 1 CPU / 2 GB memory.

env:
- name: SPRING_AI_BEDROCK_CONVERSE_CHAT_OPTIONS_MODEL
  value: "global.anthropic.claude-sonnet-4-5-20250929-v1:0"
- name: GITHUB_REPO_URL
  value: "https://api.github.com/repos/aws-samples/java-on-aws"
- name: GITHUB_REPO_PATH
  value: "apps/unicorn-store-spring"
- name: FLAMEGRAPH_INCLUDE
  value: ".*unicorn.*"

The global. prefix on the model ID enables Amazon Bedrock cross-region inference — requests route to the nearest available region automatically, improving availability and reducing latency.

What we learned

The deterministic/AI boundary matters. Don’t ask an LLM to parse binary data or generate flamegraphs. jdk.jfr.consumer and jfr-converter are deterministic, fast, and correct. The AI’s job is interpretation — connecting patterns across data sources that would take a human hours to correlate.

Collapsed stacks are the right model input. We considered flamegraph images (multimodal), raw JFR events, and thread dumps alone. Collapsed stacks won: text format (no vision model needed), correct frame attribution from async-profiler, sample counts for quantifying impact, compact enough to fit alongside thread dumps in a single prompt.

Source code tool calling transforms output quality. Without it: “consider making this async.” With it: “In UnicornService.java:87, replace publisher.publish().get() with a non-blocking whenComplete callback.” The difference between a suggestion and a pull request.

Wall-clock profiling overhead is minimal. ~1-2% in production, acceptable for continuous profiling. The 30-second rotation balances granularity against storage cost.

Virtual threads simplify orchestration. Webhook returns immediately, analysis runs in background. No thread pool sizing, no reactive complexity. Executors.newVirtualThreadPerTaskExecutor() and you’re done.

What’s next

The current system analyzes each alert independently. Two improvements would make it significantly more useful.

Knowledge base with internal optimization practices. Every organization has internal performance guidelines — connection pooling settings, virtual thread best practices, database tuning parameters. Integrating a RAG (Retrieval-Augmented Generation) knowledge base would ground the AI’s recommendations in your team’s actual standards rather than generic advice. Amazon Bedrock Knowledge Bases provide fully managed RAG — connect an S3 bucket with your performance runbooks, and the model retrieves relevant context during analysis. Spring AI’s QuestionAnswerAdvisor makes this a configuration change, not a rewrite. We documented the integration pattern in RAG Made Serverless: Amazon Bedrock Knowledge Base with Spring AI.
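Under those assumptions, the Spring AI side would look roughly like this sketch; the vectorStore bean and its Knowledge Base backing are assumed, not part of the current code:

// Sketch: a RAG advisor retrieves matching runbook passages and injects them
// into the prompt before the model writes its recommendations
var ragChatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore).build())
    .build();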

Memory for cross-analysis correlation. Currently each analysis is stateless — the model doesn’t know that the same blocking call showed up in three analyses last week, or that GC pressure has been trending upward since the last deployment. Adding memory would let the agent correlate findings across analyses, detect recurring patterns, and track whether previous recommendations were effective. Amazon Bedrock AgentCore provides managed memory with event memory for analysis history and semantic memory for long-term pattern extraction. We covered the memory integration approach in AI Agent Memory Made Easy: Amazon Bedrock AgentCore Memory with Spring AI.

Together, these additions would transform the system from a per-incident diagnostic tool into a continuous performance advisor that learns your application’s behavior over time.


References:

[1] GitHub: https://github.com/aws-samples/java-on-aws

[2] Workshop: https://catalog.workshops.aws/java-on-aws

[3] Spring AI: https://docs.spring.io/spring-ai/reference/

[4] Amazon Bedrock: https://aws.amazon.com/bedrock/

[5] async-profiler: https://github.com/async-profiler/async-profiler

Want to Dive Deeper?
Yuriy Bezsonov and Sascha Möllering are speakers at JCON!
This article covers the topic of their JCON talk. If you can’t attend live, the session video will be available after the conference – it’s worth checking out!
