
Artificial intelligence (AI) is becoming increasingly essential to modern applications. While AI encompasses many different techniques, the current industry focuses on Generative AI (GenAI) due to the latest advancements in large language models (LLMs).
Traditionally, Python has been the dominant language for integrating AI capabilities into applications. However, for Java developers adopting Generative AI, the Spring AI project offers an attractive solution that enables the seamless development of enterprise-grade applications while keeping pace with the rapidly evolving AI landscape.
Spring AI abstracts away the complex interactions with various AI providers that expose REST APIs, such as OpenAI, Anthropic, Microsoft, Google, Amazon, and even local LLMs. Its model-agnostic nature allows for easy switching between models, and, as usual in Spring, you still have access to functionalities and configurations unique to a particular model.
The framework automatically converts AI model output into Java objects, ensuring type safety across your application, and provides other fundamental features like multimodality, AI-related observability, and model response evaluation testing.
Additionally, Spring AI supports more advanced AI patterns, such as Tool Calling, Retrieval-Augmented Generation (RAG), and the Model Context Protocol (MCP), to provide context to LLMs.
Spring AI is built upon the core building blocks of the Spring Framework and other Spring projects like Spring Data for integrating vector databases. Spring Boot simplifies and speeds up the development of AI-powered features through autoconfiguration.
Getting Started
Now that you understand what Spring AI offers, let’s dive into some hands-on coding!
To use Spring AI in your Spring Boot application, it’s first necessary to add the appropriate dependency for a selected AI provider. In this example, Ollama is used, allowing you to run various LLMs locally.
It is also recommended to add the Spring AI Bill of Materials (BOM) until Spring Boot begins managing the versions of Spring AI dependencies. This BOM specifies compatible versions of all the required dependencies.
implementation platform("org.springframework.ai:spring-ai-bom:${springAiVersion}")
implementation 'org.springframework.ai:spring-ai-ollama-spring-boot-starter'
Spring AI provides Spring Boot auto-configuration for all its features and supported AI providers, allowing you to achieve results quickly.
The default configuration for Ollama assumes that the LLM model is Mistral and that the API for interacting with the model is hosted on the same machine listening on Ollama’s default port, 11434.
ollama pull mistral
You can easily override Spring AI’s auto-configuration defaults with configuration properties or code, for example by swapping Mistral for the Llama 3.2 model.
spring.ai.ollama.chat.model=llama3.2
The ChatClient API
The ChatClient API, which is idiomatically similar to the WebClient and RestClient APIs, makes integrating AI-powered functionality into a Spring Boot application seamless.
Its abstraction layer enables developers to switch AI providers with minimal configuration and no code changes, aligning with Spring AI’s key design principle of portability.
A ChatClient instance can be created using the Builder pattern with a ChatClient.Builder object, which auto-configuration makes available for injection into your Spring beans.
@Component
class MyComponent {

    private final ChatClient chatClient;

    // Constructor-based dependency injection
    public MyComponent(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    ...
}
The ChatClient API supports synchronous and streaming communication. Sending a basic prompt in synchronous mode to an AI model requires just four lines of code.
String answer = this.chatClient.prompt()
        .user("What is the capital of Germany?")
        .call()
        .content();
The prompt() method initializes the interaction, allowing you to construct the prompt by adding messages, in this case a user message. The call() method sends the constructed prompt to the AI provider and retrieves the response. Finally, the content() method extracts and returns the model’s output as a String.
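In streaming mode, the stream() method takes the place of call() and returns a reactive type from Project Reactor, so the response can be consumed chunk by chunk as it is generated. A minimal sketch, reusing the chatClient from above:

```java
// stream() replaces call(); content() then returns a Flux<String>
// (from Project Reactor) that emits the response in chunks.
Flux<String> tokens = this.chatClient.prompt()
        .user("What is the capital of Germany?")
        .stream()
        .content();

// Print each chunk as it arrives, e.g. for a typewriter-style UI.
tokens.subscribe(System.out::print);
```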
Messages can include placeholders enclosed in curly braces to represent dynamic values. Instead of providing a plain text message, you pass a lambda of type Consumer&lt;PromptUserSpec&gt; to the user() method to specify the values for these placeholders. A Consumer in Java is a functional interface that takes a single input, in this case of type PromptUserSpec, and performs an operation on it without returning a result.
String country = "Germany";
String answer = this.chatClient.prompt()
        .user(promptUserSpec -> promptUserSpec
                .text("What is the capital of {country}?")
                .param("country", country))
        .call()
        .content();
For AI models that support multimodality, meaning they can process different input types such as text, images, and audio, the PromptUserSpec also provides a media() method to include pictures and audio alongside text prompts.
Resource imageResource = new ClassPathResource("berlin.png");
String answer = this.chatClient.prompt()
        .user(promptUserSpec -> promptUserSpec
                .text("What is the name of this city?")
                .media(MimeTypeUtils.IMAGE_PNG, imageResource))
        .call()
        .content();
Structured Output
The ChatClient API provides multiple ways to handle AI-generated output. In the previous example, we used the content() method to retrieve the model’s response as a String. However, the API also supports mapping the output directly into Java objects.
This is achieved using system messages, special instructions in the prompt that guide the model’s behavior. The ChatClient API’s system() method allows you to incorporate system messages into a prompt, similar to user messages.
String answer = this.chatClient.prompt()
        .user("What is the capital of Germany?")
        .system("You are an expert in history. Provide detailed explanations.")
        .call()
        .content();
Additionally, you can set a default system message for ChatClient instances using the defaultSystem() method of the ChatClient.Builder.
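A sketch of how such a default could be configured, assuming a dedicated configuration class (the class and bean names here are illustrative):

```java
@Configuration
class ChatClientConfig {

    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                // Applied to every prompt sent through this ChatClient,
                // unless a request overrides it with system().
                .defaultSystem("You are an expert in history. Provide detailed explanations.")
                .build();
    }
}
```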
To enable the automatic mapping of AI-generated output into Java objects, the ChatClient uses a StructuredOutputConverter that appends a system message to the prompt behind the scenes. This message instructs the model to return its response in a specific format, such as JSON, which is then mapped to a Java object.
record City(String name, String zipcode) {}

City capitalOfGermany = this.chatClient.prompt()
        .user("What is the capital of Germany?")
        .call()
        .entity(City.class);
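The entity() method can also map the response to a collection. Because of type erasure, the element type must be carried by a ParameterizedTypeReference (from org.springframework.core); the prompt below is illustrative:

```java
// Map the model output to a List of City records; the anonymous
// ParameterizedTypeReference subclass preserves the generic type.
List<City> largestCities = this.chatClient.prompt()
        .user("List the three largest cities in Germany.")
        .call()
        .entity(new ParameterizedTypeReference<List<City>>() {});
```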
Advanced AI Patterns
While prompt engineering helps guide an AI model’s responses, it has limitations. No matter how well a prompt is crafted, an LLM can only generate answers based on its pre-trained knowledge and the context provided within the prompt itself. To overcome these constraints, AI systems increasingly adopt agentic patterns, where agents – autonomous intelligent systems capable of performing specific tasks without human intervention – enhance an LLM’s capabilities by planning, executing actions, and dynamically retrieving or generating new information.
Spring AI provides the essential building blocks for implementing and using AI agents.
Tool Calling
One such building block is Tool Calling, also known as Function Calling, which enables LLMs to leverage external APIs to perform specific tasks and incorporate up-to-date information. In this approach, when an LLM receives a user prompt, it evaluates whether invoking a defined external function is necessary to fulfill the request. If so, the model generates a structured request detailing the function to be called and the required parameters. The client system then processes this request, executes the specified function, and returns the result to the LLM.
Spring AI provides several ways to define callable tools. One of them is the declarative approach using the @Tool annotation, which offers a description field to explain the tool’s purpose and functionality to the LLM. Similarly, the @ToolParam annotation is used to describe individual parameters of these tools.
@Service
class WeatherService {

    @Tool(description = "Fetches the current temperature in a city")
    Double fetchCurrentTemperature(
            @ToolParam(description = "Name of the city") String city) {
        ...
    }
}
To use a defined tool in a specific chat request, pass an instance of the tool class to the tools() method in the ChatClient prompt. To make tools available across all chat requests, register them with the defaultTools() method of the ChatClient.Builder.
String answer = this.chatClient.prompt()
        .user("What is the current temperature in the capital of Germany?")
        .tools(weatherService)
        .call()
        .content();
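For comparison, registering the same tool for every request is a one-time builder call. A minimal sketch, assuming the WeatherService bean from above is injected alongside the builder:

```java
// defaultTools() makes the tool available to all chat requests,
// so individual prompts no longer need to pass it via tools().
ChatClient chatClient = chatClientBuilder
        .defaultTools(weatherService)
        .build();
```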
Under the hood, Spring AI sends tool definitions along with the prompt to the LLM, allowing it to decide whether a function call is needed. If so, the model responds with the tool’s name and required parameters. Spring AI then executes the tool with the provided inputs, retrieves the result, and returns it to the LLM. The model incorporates this data to generate a more accurate and contextually relevant response.
The Tool Calling capability of LLMs is also fundamental to the Model Context Protocol (MCP). This open protocol standardizes the connection between AI models and various data sources and tools. It allows developers to build agents on top of LLMs. MCP follows a client-server architecture, where an AI application (MCP host) communicates through embedded MCP clients to MCP servers to access specific resources or functionalities.
Spring AI has recently introduced support for implementing MCP clients and servers based on the official MCP Java SDK. While this is fantastic, it deserves a more detailed discussion on its own, which is beyond the scope of this article.
Retrieval-Augmented Generation
Another advanced technique for enhancing the output of large language models by integrating external data is called Retrieval-Augmented Generation (RAG). With RAG, external data is transformed into vector embeddings stored in a vector database. A conversion of a user query into a vector embedding is also necessary to retrieve semantically relevant data from the database, which is then added to the LLM’s context to improve responses.
To implement a basic RAG flow, you first need to load your data into a vector store. Spring AI facilitates this process through an Extract, Transform, Load (ETL) pipeline. Initially, a DocumentReader extracts content from various sources, such as PDFs, and converts it into structured Document objects. Next, these documents are split into chunks using a DocumentTransformer to ensure they fit within the context window limitations of AI models. Finally, the chunks are stored in a vector database using a DocumentWriter, an interface that VectorStore extends.
DocumentReader documentReader = new PagePdfDocumentReader(pdfResource);
List<Document> documents = new TokenTextSplitter().apply(documentReader.get());
vectorStore.accept(documents);
With the VectorStore interface, Spring AI provides an abstraction layer that enables switching between various vector databases with minimal code changes.
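Outside of a chat flow, the VectorStore can also be queried directly. A minimal sketch using its similaritySearch() method, with an illustrative query:

```java
// Returns the stored Document chunks that are semantically closest
// to the query, based on vector similarity.
List<Document> results = vectorStore.similaritySearch("best cities to visit");
results.forEach(System.out::println);
```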
Each supported vector database has its own individual Spring Boot starter dependency. In this example, PGvector, an open-source vector extension for PostgreSQL, is used.
implementation 'org.springframework.ai:spring-ai-pgvector-store-spring-boot-starter'
A VectorStore instance in Spring AI requires an embedding model, which is available through the EmbeddingModel API. Since most AI providers offer embedding models, additional dependencies are usually unnecessary. Thanks to Spring Boot’s auto-configuration, a VectorStore and an EmbeddingModel instance are available with minimal setup.
spring.ai.ollama.embedding.model=nomic-embed-text
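With both beans in place, generating an embedding is a single call on the EmbeddingModel. A minimal sketch (note that the exact return type of embed() has varied across Spring AI versions; recent ones return a float array):

```java
// Convert a text into its vector representation; the array length
// equals the dimensionality of the configured embedding model.
float[] embedding = embeddingModel.embed("What is the capital of Germany?");
System.out.println("Dimensions: " + embedding.length);
```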
Spring AI’s support for Retrieval-Augmented Generation is built upon its Advisor API, which enables developers to intercept and transform data exchanges with LLMs. For common RAG workflows, Spring AI offers two out-of-the-box advisors. The QuestionAnswerAdvisor is designed for a basic RAG flow, while the RetrievalAugmentationAdvisor provides a modular architecture for custom retrieval and augmentation strategies to support more advanced RAG flows, such as routing between multiple vector databases.
Advisors can be configured for a specific chat request using the advisors() method within the ChatClient prompt, or across all chat requests via the ChatClient.Builder.
String answer = this.chatClient.prompt()
        .user("What are the best cities to visit based on my travel guides?")
        .advisors(new QuestionAnswerAdvisor(vectorStore))
        .call()
        .content();
In this example, a QuestionAnswerAdvisor is configured with an auto-configured VectorStore instance. The retrieval behavior can be fine-tuned using an optional SearchRequest parameter, which defines how relevant data is searched within the vector database. Another parameter allows developers to provide additional guidance to the LLM on how to interpret and utilize the retrieved data in the prompt.
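As a sketch of such fine-tuning, the advisor below limits retrieval to the three most similar chunks and drops weakly related ones. The builder-style API shown here follows recent Spring AI versions and may differ in older releases:

```java
// topK caps the number of retrieved chunks; similarityThreshold
// filters out chunks whose similarity score is too low.
var advisor = QuestionAnswerAdvisor.builder(vectorStore)
        .searchRequest(SearchRequest.builder()
                .topK(3)
                .similarityThreshold(0.7)
                .build())
        .build();
```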
Image and Audio AI Models
Underlying the ChatClient is the Model API, an interface that supports all types of AI models, including embedding, image, and audio models. While the ChatClient is needed to simplify the complex prompt creation required by chat models, image and audio models typically involve less intricate prompts, so direct interaction with the Model API is sufficient. As an additional convenience, Spring Boot’s auto-configuration pre-configures Model API instances for all types of models, allowing for easy injection into your Spring beans.
String prompt = new PromptTemplate("Generate a picture of {city}")
        .render(Map.of("city", "Berlin"));
ImageGeneration imageGeneration = imageModel.call(new ImagePrompt(prompt)).getResult();
Image image = imageGeneration.getOutput();
String imageUrl = image.getUrl();
Summary
To sum up, Spring AI enables you to integrate GenAI capabilities into Spring Boot applications as seamlessly as you integrate databases today. It supports a wide range of AI models with a focus on portability and provides fundamental features like type-safe responses and multimodality. Beyond that, Spring AI offers essential building blocks for agentic systems and supports the Model Context Protocol, enabling integration with third-party tools and resources. With these features, Spring AI helps you stay at the forefront of evolving AI technologies.