Java developers have been building enterprise applications for decades, but when it comes to AI, the conversation has been dominated by Python. Spring AI changes this situation. It brings the same patterns we know from Spring – dependency injection, auto-configuration, portable abstractions – to AI development.
In January 2026 we built a sample agent application, and we want to share what we learned. The sample AI agent handles conversations with memory, answers questions from internal knowledge bases, uses external APIs, and integrates with existing microservices – all running on Amazon Bedrock. If you want to try it yourself, the Building AI Agents with Java and Spring AI workshop [1] walks you through typical challenges of generative AI models and provides step-by-step solutions, with code available on GitHub [2].

Choosing the right model
Before writing any code, you need to pick a model. Amazon Bedrock gives you access to foundation models from Anthropic, Amazon, Meta, and others through a unified API. The choice matters – it affects how your agent behaves, how much you pay, and how fast it responds.
Some models only process text, while multimodal models like Anthropic Claude and Amazon Nova 2 can analyze images and PDFs and understand visual context. If your agent needs to process expense receipts or read documents, you’ll want multimodal capabilities.
Beyond modalities, think about the trade-off between intelligence, speed, and cost:
- Anthropic Claude Opus works best when intelligence is your top priority, such as complex tasks that require strong reasoning capabilities. You’re choosing quality over speed and cost.
- Anthropic Claude Haiku shines when speed matters most, such as real-time user interactions or high-volume processing where you need the fastest possible responses.
- Anthropic Claude Sonnet hits the sweet spot for most applications – good intelligence, reasonable speed, manageable cost.
- Amazon Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads. It offers industry-leading price performance and helps enterprises and developers build capable, reliable, and efficient agentic-AI applications.
- Amazon Nova 2 Pro is the most intelligent Nova model, designed for highly complex, multistep tasks like multi-document analysis and software migrations.
We started development with Anthropic Claude Sonnet 4 and found it worked well for our agent use case. For production, we’d profile the actual workload and consider Nova 2 Lite if cost optimization becomes important.
```properties
spring.ai.bedrock.converse.chat.options.model=global.anthropic.claude-sonnet-4-20250514-v1:0
```
ChatClient: the foundation
The ChatClient interface is where everything comes together in Spring AI. It’s a unified API for talking to AI models – one interface that works with Amazon Bedrock, OpenAI, Azure OpenAI, Google Vertex AI, or local Ollama. You can start development with Ollama locally (free, private, no API costs), validate against OpenAI, and deploy to Amazon Bedrock for production – same code, different configuration.
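To make the "same code, different configuration" idea concrete, here is a minimal plain-Java sketch of the pattern behind it. It does not use Spring AI: `ChatBackend`, the `"local"`/`"cloud"` keys and the canned replies are hypothetical stand-ins. In a real Spring AI application, Boot auto-configuration selects the model implementation from your dependencies and properties instead of a hand-written registry.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a provider-neutral chat interface (not Spring AI's API).
interface ChatBackend {
    String call(String prompt);
}

// Fake providers that only tag which backend handled the prompt.
class LocalBackend implements ChatBackend {
    public String call(String prompt) { return "[local] " + prompt; }
}

class CloudBackend implements ChatBackend {
    public String call(String prompt) { return "[cloud] " + prompt; }
}

class PortableAssistant {
    private final ChatBackend backend;

    // The backend is picked from a configuration value; ask() never changes.
    PortableAssistant(String provider) {
        Map<String, Supplier<ChatBackend>> registry = Map.of(
                "local", LocalBackend::new,
                "cloud", CloudBackend::new);
        Supplier<ChatBackend> factory = registry.get(provider);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown provider: " + provider);
        }
        this.backend = factory.get();
    }

    String ask(String question) {
        return backend.call(question);
    }
}
```

The business logic in `ask()` depends only on the interface, so switching providers is a configuration change rather than a code change.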
What makes ChatClient powerful is the Advisor pattern. Advisors handle cross-cutting concerns like memory, Retrieval-Augmented Generation (RAG), and logging without cluttering your business logic. Think of them as middleware for AI conversations – each advisor intercepts requests and responses, modifying prompts or processing outputs.
Here’s what our ChatClient configuration looks like after adding all the capabilities:
```java
this.chatClient = chatClientBuilder
        .defaultSystem(DEFAULT_SYSTEM_PROMPT)
        .defaultAdvisors(
                MessageChatMemoryAdvisor.builder(chatMemory).build(),
                QuestionAnswerAdvisor.builder(vectorStore).build()
        )
        .defaultTools(new DateTimeTools(), new WeatherTools())
        .defaultToolCallbacks(mcpTools)
        .build();
```
One configuration gives you conversation memory, RAG-powered knowledge retrieval, custom tools, and Model Context Protocol (MCP) integrations. The advisor chain runs in order – memory loads history, RAG retrieves relevant documents, then the enriched prompt hits the model with all tools available.
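The ordering can be illustrated with a small chain-of-responsibility sketch in plain Java. `PromptAdvisor`, `AdvisorChain` and the string-based prompt are simplified, hypothetical stand-ins, not Spring AI's actual advisor interfaces (which operate on request and response objects). The point is only that each advisor can enrich the request before passing it on and can see the response on the way back.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical, simplified advisor: may rewrite the prompt before the model call
// and observe the model's response after it returns.
interface PromptAdvisor {
    String advise(String prompt, UnaryOperator<String> next);
}

class AdvisorChain {
    private final List<PromptAdvisor> advisors;
    private final UnaryOperator<String> model;

    AdvisorChain(List<PromptAdvisor> advisors, UnaryOperator<String> model) {
        this.advisors = advisors;
        this.model = model;
    }

    String call(String prompt) {
        return invoke(0, prompt);
    }

    // Each advisor gets a "next" function that runs the rest of the chain.
    private String invoke(int index, String prompt) {
        if (index == advisors.size()) {
            return model.apply(prompt); // end of the chain: call the model
        }
        return advisors.get(index).advise(prompt, p -> invoke(index + 1, p));
    }
}
```

For example, a memory advisor that prepends history followed by a RAG advisor that appends retrieved context means the model sees both; swapping their positions in the list changes the order in which they run.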
Conversation memory
When we first tested our agent, users would introduce themselves, ask a few questions, then say “What’s my name?” – and the agent had no idea. Every request was a blank slate. That’s how AI models work: they’re stateless. Your application needs to manage conversation history and feed it back with each request.
Spring AI handles this with MessageChatMemoryAdvisor. You configure a memory store, wire it into the ChatClient, and the advisor automatically loads conversation history before each request and saves new messages after.
For storage, you have options: in-memory for development, JDBC for production with existing databases, Redis for distributed caching, Cassandra for scale. We went with JDBC and PostgreSQL since most Spring applications already have a relational database – no new infrastructure to manage.
```java
var chatMemory = MessageWindowChatMemory.builder()
        .chatMemoryRepository(JdbcChatMemoryRepository.builder()
                .dataSource(dataSource)
                .dialect(new PostgresChatMemoryRepositoryDialect())
                .build())
        .maxMessages(20)
        .build();
```
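`maxMessages(20)` means only the most recent messages of each conversation are kept and replayed to the stateless model. The plain-Java sketch below mimics that sliding-window behavior with an in-memory map keyed by conversation ID. `WindowMemory` and `Message` are hypothetical illustrations, not Spring AI's `MessageWindowChatMemory`, which has its own storage and eviction details.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A single chat message; role is e.g. "user" or "assistant".
record Message(String role, String text) {}

// Hypothetical sliding-window memory: keeps at most maxMessages per conversation.
class WindowMemory {
    private final int maxMessages;
    private final Map<String, Deque<Message>> store = new HashMap<>();

    WindowMemory(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    // After each exchange: append, then evict the oldest messages beyond the window.
    void add(String conversationId, Message message) {
        Deque<Message> window = store.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        window.addLast(message);
        while (window.size() > maxMessages) {
            window.removeFirst();
        }
    }

    // Before each request: the history that is sent back along with the new question.
    List<Message> get(String conversationId) {
        return new ArrayList<>(store.getOrDefault(conversationId, new ArrayDeque<>()));
    }
}
```

With a window of 2, a user's introduction drops out after two further messages, which is the kind of gap the tiered approach in the next paragraph is meant to close.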
For production applications with longer conversations, consider a three-tier memory architecture: session memory for recent messages (around 20), context memory for conversation summaries, and preferences memory for user profile data. This gives users continuity without burning tokens on full history. We documented this pattern in detail [8] if you want to implement it.
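One way to picture the three tiers is as three sources that are stitched into the prompt context on every request. The sketch below is a hypothetical illustration of that assembly step only: how summaries are produced (typically by asking a model to summarize older turns) and where each tier is stored are left out, and `TieredContext` and its field names are invented for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical three-tier context: user preferences, summary of older turns, recent messages.
record TieredContext(Map<String, String> preferences, String summary, List<String> recentMessages) {

    // Builds the text placed ahead of the user's new question.
    String render() {
        StringJoiner out = new StringJoiner("\n");
        // Preferences tier: long-lived profile data (sorted for a stable order).
        new TreeMap<>(preferences).forEach((key, value) -> out.add("Preference " + key + ": " + value));
        // Context tier: a compact summary instead of the full older history.
        if (!summary.isBlank()) {
            out.add("Summary of earlier conversation: " + summary);
        }
        // Session tier: the most recent messages, verbatim.
        recentMessages.forEach(out::add);
        return out.toString();
    }
}
```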
There’s also an easier path if you’re deploying to Amazon Bedrock AgentCore [5]. AgentCore provides fully managed memory – event memory for conversation history with automatic retention, and semantic memory for long-term knowledge extraction. Add the spring-ai-bedrock-agentcore-starter [6] and memory works out of the box, no custom code required.
Knowledge with RAG
Ask a generic AI model about your company’s travel policy limits and you’ll get a confident, plausible, completely wrong answer. The model hallucinates because it’s working from general training data, not your actual documents.
RAG (Retrieval-Augmented Generation) solves this by grounding responses in real content. During ingestion, documents get chunked, converted to vector embeddings, and stored. At query time, the user’s question becomes an embedding, similar chunks get retrieved through semantic search, and those chunks go into the prompt as context. The model generates responses based on your actual documents instead of guessing.
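The ingestion and query steps can be traced end to end in plain Java. To keep the sketch self-contained, the "embedding" is a toy bag-of-words hash vector rather than a real embedding model such as Titan, and retrieval is a brute-force cosine-similarity scan rather than a vector database index. `ToyRag`, the chunking rule and the prompt template are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ToyRag {
    private static final int DIMENSIONS = 64;

    private record Chunk(String text, double[] vector) {}

    private final List<Chunk> index = new ArrayList<>();

    // Ingestion: split into fixed-size word chunks, embed each chunk, store it.
    void ingest(String document, int wordsPerChunk) {
        String[] words = document.split("\\s+");
        for (int start = 0; start < words.length; start += wordsPerChunk) {
            int end = Math.min(words.length, start + wordsPerChunk);
            String chunk = String.join(" ", Arrays.copyOfRange(words, start, end));
            index.add(new Chunk(chunk, embed(chunk)));
        }
    }

    // Query time: embed the question and return the k most similar chunks.
    List<String> retrieve(String question, int k) {
        double[] q = embed(question);
        return index.stream()
                .sorted(Comparator.comparingDouble((Chunk c) -> cosine(q, c.vector())).reversed())
                .limit(k)
                .map(Chunk::text)
                .toList();
    }

    // Augmentation: put the retrieved chunks into the prompt as context.
    String augment(String question, int k) {
        return "Answer using only this context:\n"
                + String.join("\n", retrieve(question, k))
                + "\n\nQuestion: " + question;
    }

    // Toy embedding: hash lower-cased words into a fixed-size count vector.
    static double[] embed(String text) {
        double[] v = new double[DIMENSIONS];
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                v[Math.floorMod(word.hashCode(), DIMENSIONS)] += 1;
            }
        }
        return v;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }
}
```

A real embedding model captures meaning rather than shared words, so "hotel" and "accommodation" would land close together; the toy version only shows where each step sits in the pipeline.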
Spring AI’s VectorStore abstraction supports PGVector, OpenSearch, Pinecone, Weaviate, Milvus, and more. We used PGVector since we already had PostgreSQL for memory – same database, no extra services. The QuestionAnswerAdvisor handles retrieval automatically once configured:
```properties
spring.ai.bedrock.titan.embedding.model=amazon.titan-embed-text-v2:0
spring.ai.vectorstore.pgvector.initialize-schema=true
spring.ai.vectorstore.pgvector.dimensions=1024
```
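The `dimensions` value has to match what the embedding model produces: Titan Text Embeddings V2 returns 1024-dimensional vectors by default, and pgvector stores them in a fixed-size `vector` column, rejecting inserts of any other length. The hypothetical guard below mirrors that constraint in plain Java, so a mismatch fails with a clear message instead of deep inside a database call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical guard mirroring a fixed-size vector column such as vector(1024).
class FixedDimensionStore {
    private final int dimensions;
    private final List<float[]> rows = new ArrayList<>();

    FixedDimensionStore(int dimensions) {
        this.dimensions = dimensions;
    }

    void insert(float[] embedding) {
        if (embedding.length != dimensions) {
            // pgvector rejects this as well; failing early gives a clearer error.
            throw new IllegalArgumentException(
                    "Expected " + dimensions + " dimensions but got " + embedding.length);
        }
        rows.add(embedding);
    }

    int size() {
        return rows.size();
    }
}
```

If you later switch to an embedding model with a different output size, the `dimensions` setting, the table, and every stored embedding have to change with it.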
Spring AI 2.0 brings support for Amazon Bedrock Knowledge Bases – fully managed RAG where AWS handles document chunking, embeddings, and vector storage. You connect S3, Confluence, or SharePoint as data sources, and get advanced features like hybrid search (combining semantic and keyword matching), reranking, and metadata filtering. The same `QuestionAnswerAdvisor` pattern works, just with a different VectorStore implementation. We wrote a hands-on guide “RAG Made Serverless – Amazon Bedrock Knowledge Base with Spring AI” [9] if you want to try it.
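Hybrid search merges a semantic ranking with a keyword ranking. The fusion method Bedrock Knowledge Bases uses internally is not described here, so the sketch below uses reciprocal rank fusion (RRF), one common and simple technique, purely to show the idea. The document IDs and the constant `k = 60` are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReciprocalRankFusion {
    // Combines ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank(d)).
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1; // ranks are 1-based
                scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

A document that ranks reasonably well in both lists can overtake one that ranks first in only one list, which is what makes hybrid search robust to exact product names or codes that pure semantic search tends to miss.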
Tools and MCP
Our agent could remember conversations and answer questions from documents, but it couldn’t tell users the weather for their trip or even what day it was. AI models have knowledge cutoffs – they don’t know today’s date, current weather, or what’s in your database.
Tool calling bridges this gap. You define functions the AI can call, and the model decides when to use them based on the conversation. Spring AI’s @Tool annotation makes this straightforward:
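```java
import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;

@Tool(description = """
        Get weather forecast for a city on a specific date.
        Use for answering questions about weather forecasts.
        """)
public String getWeather(
        @ToolParam(description = "City name, such as Paris, London, New York") String city,
        @ToolParam(description = "Date in YYYY-MM-DD format") String date) {
    // Call a weather API here and return a formatted forecast.
    // Placeholder return so the method compiles; replace it with the real API call.
    return "Forecast for " + city + " on " + date + " is not available yet";
}
```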
The description matters more than you might expect. The AI reads it to decide which tool to call, with what parameters, in what order. Vague descriptions lead to wrong tool selection; precise ones enable accurate autonomous behavior.
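Under the hood, tool calling is a loop: the application sends tool names and descriptions to the model, the model replies with the name and arguments of the tool it wants, and the application runs that tool and sends the result back. The plain-Java sketch below shows only the registry-and-dispatch half of that loop. `ToolRegistry`, `ToolCall` and the simulated model reply are hypothetical; with Spring AI, the `@Tool` annotation and the framework handle these steps for you.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// What the model returns when it decides to use a tool (simplified, hypothetical).
record ToolCall(String name, Map<String, String> arguments) {}

class ToolRegistry {
    private record Tool(String description, Function<Map<String, String>, String> handler) {}

    private final Map<String, Tool> tools = new LinkedHashMap<>();

    void register(String name, String description, Function<Map<String, String>, String> handler) {
        tools.put(name, new Tool(description, handler));
    }

    // This catalogue of names and descriptions is all the model sees when choosing a tool.
    String describeTools() {
        StringBuilder sb = new StringBuilder();
        tools.forEach((name, tool) -> sb.append(name).append(": ").append(tool.description()).append('\n'));
        return sb.toString();
    }

    // Runs the tool the model asked for; the result goes back to the model in the next request.
    String execute(ToolCall call) {
        Tool tool = tools.get(call.name());
        if (tool == null) {
            return "Error: unknown tool " + call.name();
        }
        return tool.handler().apply(call.arguments());
    }
}
```

Returning an error message instead of throwing is a deliberate choice in this sketch: it gives the model a chance to correct its request rather than failing the whole conversation.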
For larger systems, hardcoding every API into your agent doesn’t scale well. Different teams own different services with their own release cycles. You don’t want to redeploy the agent every time the booking team adds an endpoint.
Model Context Protocol (MCP) [7] standardizes how agents discover and use tools. MCP servers expose tools through a standard protocol; clients discover them at runtime. Spring AI’s MCP server starter turns existing microservices into AI-accessible tools – add the dependency, annotate methods with @Tool, and your service becomes an MCP server. The booking team maintains their tools, HR maintains theirs, and your agent discovers everything at startup without code changes.
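On the wire, MCP uses JSON-RPC 2.0: after an initialization handshake (omitted here), a client discovers tools with a `tools/list` request and invokes one with `tools/call`, passing the tool `name` and an `arguments` object. Spring AI's MCP client and server starters produce and handle these messages for you; the sketch below only builds the two request payloads by hand, with naive string concatenation and no JSON escaping, so you can see their shape. The argument names for `findFlightsByRoute` are assumed to match its Java parameter names.

```java
// Builds MCP JSON-RPC 2.0 request payloads by hand (illustration only: no JSON escaping).
class McpRequests {
    // Discovery: ask the server which tools it exposes.
    static String listTools(int id) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id + ",\"method\":\"tools/list\"}";
    }

    // Invocation: call one tool by name with a JSON object of arguments.
    static String callTool(int id, String toolName, String argumentsJson) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":\"tools/call\",\"params\":{\"name\":\"" + toolName
                + "\",\"arguments\":" + argumentsJson + "}}";
    }
}
```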
```java
import java.util.List;
import org.springframework.ai.tool.annotation.Tool;

@Tool(description = """
        Find flights between two cities.
        Requires: departureCity - Name of the departure city,
        arrivalCity - Name of the arrival city.
        Returns: List of available flights sorted by price from lowest to highest.
        """)
public List<Flight> findFlightsByRoute(String departureCity, String arrivalCity) {
    return flightService.findFlightsByRoute(departureCity, arrivalCity);
}
```
One thing we learned: agents with tools often have large system prompts and tool definitions that are sent with every request. Amazon Bedrock prompt caching (available for Anthropic models) reduces costs by caching this repeated content:
```properties
spring.ai.bedrock.converse.chat.options.cache-options.strategy=SYSTEM_AND_TOOLS
```
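Structure your prompts with stable content first (system instructions, tool definitions) and user-specific content last. The cache matches on the prompt prefix, so anything that changes early in the prompt prevents reuse of everything after it. Cache hits can significantly reduce cost and latency on the cached portions, which matters most for agents that resend large tool definitions with every request.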
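Ordering matters because only an identical prefix can be reused. As a rough illustration, using characters as a stand-in for tokens and hypothetical prompt strings, the sketch below measures how much of two consecutive requests is shared when the stable content comes first versus last.

```java
// Rough illustration of prefix reuse (characters stand in for tokens).
class PrefixCacheDemo {
    // Length of the identical leading part of two prompts.
    static int sharedPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Recommended layout: system instructions and tool definitions first.
    static String stableFirst(String stable, String user) {
        return stable + "\n" + user;
    }

    // Anti-pattern: user content first, so the prefix differs immediately.
    static String userFirst(String stable, String user) {
        return user + "\n" + stable;
    }
}
```

With the stable part first, two different questions still share the entire system-and-tools prefix; with the user message first, they typically diverge at the very first character and nothing can be reused.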
Deploying to AWS
Spring AI agents are standard Spring Boot applications packaged as container images. They run anywhere containers run. On AWS, you have several options depending on how much infrastructure you want to manage.
Amazon Elastic Kubernetes Service (Amazon EKS) gives you Kubernetes flexibility, and Amazon EKS Auto Mode fully automates Kubernetes cluster management for compute, storage, and networking on AWS with a single click. It’s a good fit if your team already knows Kubernetes and needs fine-grained control over networking and scaling.
Amazon Elastic Container Service (Amazon ECS) offers serverless containers with AWS Fargate. Amazon ECS Express Mode automates infrastructure setup including domains, networking, and load balancing – turnkey deployment when you want to get running quickly. Amazon ECS Managed Instances provide more control over instance types and pricing (Spot, Reserved) while ECS handles provisioning.
AWS Lambda works well for event-driven workloads with sporadic traffic patterns where you want to scale to zero.
Amazon Bedrock AgentCore is a fully managed agent runtime with built-in memory, code interpreter, browser tools, gateway, and observability. The Spring AI AgentCore starter [6] provides adapters for running Spring AI applications on Amazon Bedrock AgentCore with a single annotation, @AgentCoreInvocation:
```java
@Service
public class InvocationService {

    @AgentCoreInvocation
    public Flux<String> handleInvocation(InvocationRequest request, AgentCoreContext context) {
        // Parse JWT, extract user identity for memory isolation
        // ...
        return chatService.chat(request.prompt(), sessionId);
    }
}
```
AgentCore integrates with Amazon Cognito for user authentication (OAuth 2.0). JWT tokens are validated automatically, and user identity flows through to your application for memory isolation and audit logging. VPC mode enables access to databases and internal services. IAM roles provide least-privilege access to Bedrock models.
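The "Parse JWT, extract user identity for memory isolation" step in `InvocationService` could look roughly like the following. This is a hypothetical sketch: it only base64url-decodes the token payload and reads the standard `sub` claim with a regular expression, it does not verify the signature (AgentCore validates the token before your code runs), and the `userId:sessionId` conversation-ID format is an assumption chosen so that one user's memory is never loaded for another.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JwtIdentity {
    private static final Pattern SUB = Pattern.compile("\"sub\"\\s*:\\s*\"([^\"]+)\"");

    // Reads the "sub" (subject) claim from an already-validated JWT.
    // Does NOT verify the signature; never use this on untrusted tokens.
    static String subject(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a JWT");
        }
        String payload = new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
        Matcher m = SUB.matcher(payload);
        if (!m.find()) {
            throw new IllegalArgumentException("JWT has no sub claim");
        }
        return m.group(1);
    }

    // Scopes chat memory per user and session (hypothetical ID format).
    static String conversationId(String jwt, String sessionId) {
        return subject(jwt) + ":" + sessionId;
    }
}
```

In production you would use a proper JSON parser or your framework's JWT support instead of a regular expression; the sketch only shows where the user identity comes from and how it can scope memory.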
The same Spring AI application code works across all these deployment targets. The difference is how much infrastructure you want to manage versus how much you want AWS to handle for you. For the fastest path to production with built-in agent capabilities, Amazon Bedrock AgentCore is hard to beat.
Wrapping up
Spring AI gives Java developers vendor-neutral abstractions for building AI agents – ChatClient for model access, ChatMemory for stateful conversations, VectorStore for RAG, Tools and MCP for real-time data and integrations. The patterns feel familiar if you’ve worked with Spring before.
Model selection matters more than we initially expected. We started with Anthropic Claude Sonnet and it worked well, but profiling your actual workload helps you optimize – Amazon Nova 2 Lite for cost-effective reasoning, Anthropic Claude Haiku for speed, Anthropic Claude Opus when you need maximum intelligence.
For deployment, Amazon Bedrock AgentCore provides the fastest path to production with built-in memory, security, and observability. When you need more control, the same code runs on Amazon EKS, Amazon ECS, or AWS Lambda.
We hope this gives you a good starting point. The workshop [1] walks through everything in detail if you want to build it yourself.
References:
[1] Workshop: https://catalog.workshops.aws/java-spring-ai-agents
[2] GitHub: https://github.com/aws-samples/java-on-aws
[3] Spring AI: https://docs.spring.io/spring-ai/reference/
[4] Amazon Bedrock: https://aws.amazon.com/bedrock/
[5] Amazon Bedrock AgentCore: https://aws.amazon.com/bedrock/agentcore/
[6] Spring AI AgentCore Starter: https://github.com/spring-ai-community/spring-ai-bedrock-agentcore
[7] Model Context Protocol: https://modelcontextprotocol.io/
[8] Memory Implementation: https://dev.to/yuriybezsonov/a-practical-guide-to-building-ai-agents-with-java-and-spring-ai-part-2-add-memory-odn
[9] Bedrock Knowledge Bases: https://dev.to/yuriybezsonov/rag-made-serverless-amazon-bedrock-knowledge-base-with-spring-ai-2dn9
Interested in AI Agents with Java?
Yuriy Bezsonov & Sascha Möllering are speakers at JCON.
This article shows how to build production-ready AI agents with Java and Spring AI – and their JCON session demonstrates how AI can analyze and optimize Java application performance.

This article is part of the JAVAPRO special magazine issue:
Java in the Age of AI
Explore how AI is transforming the way we build, secure, and operate software with Java.
From AI agents and new architectural patterns to security, data, and team dynamics—this edition brings together real-world insights for building intelligent, production-ready systems.