Talk to Your Data: Natural Language Data Access in Java

For decades, Java developers have relied on robust frameworks to manage the complexity of enterprise applications. At the heart of this ecosystem lies Hibernate ORM, the popular library for bridging the gap between object-oriented Java applications and relational databases. It allows developers to map complex domain models to database tables, and write type-safe, performant queries using the Hibernate Query Language (HQL).

Quarkus, meanwhile, has emerged as a modern Java framework for building cloud-native applications and services, with a focus on incredibly fast startup times, efficient resource consumption, and developer joy.

Enter the new frontier: Artificial Intelligence. As Large Language Models transform how we interact with software, Java developers need tools to integrate these capabilities natively. This is where LangChain4j comes in. It is quickly emerging as a powerful library for AI integration in Java, offering a unified API that simplifies the complexity of LLM orchestration.

By combining these three powerhouses – Hibernate ORM, Quarkus, and LangChain4j – we can now achieve something that was previously incredibly difficult: allowing every user to “talk” to their data. This article explores the new Hibernate Assistant module, and how it can be used to make your structured relational data available in the unstructured world of AI.

Bridging the Gap: The Hibernate Assistant

The diagram below shows the traditional data flow for a Java application using Hibernate to retrieve data from a relational database:

Traditional Hibernate data flow

Large Language Models are trained on vast amounts of public data. They already know about Hibernate ORM, its persistence mapping logic, and the syntax of the Hibernate Query Language. However, they lack one critical piece of information: the specific context of your application. An LLM doesn’t know how your tables look, nor your entities and their relationships.

The hibernate-assistant module is designed to solve this problem. It acts as a translation layer, using Hibernate’s existing knowledge of the application’s mapping domain model to inform the AI model. Since Hibernate already knows which database table maps to a specific Java entity class, and which columns correspond to persistent attributes, it can programmatically describe this structure to an LLM.

The module offers tools to initialize the context by reporting on mapped entity classes and their properties. For example, it can instruct the model that a “Company” entity corresponds to the Java class org.hibernate.assistant.domain.Company, has an identifier “id”, and includes attributes like its name and an @Embedded address component, as well as a one-to-many collection association of related Employee entities.
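For reference, a minimal sketch of the domain model described above might look like the following. The class and member names are assumptions reconstructed from the examples later in this article (the generated SQL references a company_table with id, name, street, and city columns):

```java
import java.util.List;
import jakarta.persistence.*;

@Entity
@Table(name = "company_table")
public class Company {
    @Id
    @GeneratedValue
    Long id;

    String name;

    // Embedded value type: its columns live in company_table
    @Embedded
    Address address;

    // One-to-many association to related Employee entities
    @OneToMany(mappedBy = "company")
    List<Employee> employees;
}

@Embeddable
class Address {
    String street;
    String city;
}

@Entity
class Employee {
    @Id
    @GeneratedValue
    Long id;

    String name;

    @ManyToOne
    Company company;
}
```

From mappings like these, the assistant can describe to the LLM not just the entity names, but also the embedded Address component and the Company-to-Employee association.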

Furthermore, the module handles the complex task of transforming query results to feed to the AI model. While serializing simple SQL result sets might be easy, translating the results of an ORM query producing Java objects – which may include circular or lazy associations, collections, and embedded types – is not. Hibernate Assistant, once again, leverages its knowledge of your domain model to transform HQL query results into a textual format that the LLM can effortlessly consume and interpret.
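As an illustration, the serialized form of a Company result might look roughly like this. The exact format produced by the module may differ; this is only an assumption meant to show the idea of flattening associations and embedded values into plain text:

```json
{
  "id": 1,
  "name": "Acme",
  "address": { "street": "Via Emilia", "city": "Modena" },
  "employees": [ { "id": 7, "name": "Mario Rossi" } ]
}
```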

AI-enhanced Hibernate data flow

Why Hibernate?

You might wonder, “Why not just let the LLM write SQL?” While possible, using Hibernate ORM as the intermediary offers significant advantages in reliability and ease of use:

  • Constrained Access: When using Hibernate, data access is naturally constrained to the mapped domain model. The LLM can only access tables that have a corresponding entity class, and even then, it can only select columns listed as fields in your objects.
  • Fail-Early Validation: Thanks to Hibernate’s advanced type-safety and query validation, the system can “fail early.” LLMs are imperfect, and when one generates an invalid HQL statement, Hibernate detects it before it ever reaches the database, increasing overall reliability and performance.
  • Self-Correction: If a query fails, the error message from Hibernate is informative and specific, regardless of the underlying database platform. The user can intercede to act on that error, or even feed it back to the LLM so it can correct its mistake in a subsequent prompt.
  • Portability: Hibernate ORM supports a wide variety of databases, ensuring your application operates consistently regardless of the underlying infrastructure. It provides a single, unified query language (HQL) that abstracts away database-specific SQL variations.
  • Handling Complexity: HQL makes it significantly easier to write complex queries involving multiple entities, associations, embeddable values, and inheritance hierarchies compared to raw SQL. LLMs are great at natural language processing, and HQL being a higher-level abstraction closer to language helps AI agents achieve better results in these complex scenarios.
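To illustrate the last point, a query that in HQL is a single path expression over the mapped model would require explicit joins or subqueries in raw SQL. A hedged sketch, assuming the Company/Address/Employee model used throughout this article (table and column names are assumptions):

```sql
-- HQL: navigate the embedded address and the employee collection directly
select c.name from Company c
where c.address.city = 'Modena' and size(c.employees) > 100

-- Roughly equivalent raw SQL
select c1.name from company_table c1
where c1.city = 'Modena'
  and (select count(*) from employee e1 where e1.company_id = c1.id) > 100
```

The HQL version stays close to how a person would describe the request in natural language, which is precisely what makes it a good target for an LLM.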

Building the Application

Let’s now dive into the practical implementation by building a Quarkus application. This application will leverage the concepts we’ve just discussed to enable natural language interaction with your relational data.

Project Dependencies

You will need the standard Quarkus Hibernate ORM extension, the Quarkus LangChain4j extension (for your AI service provider of choice), and the new hibernate-assistant module.

<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-hibernate-orm</artifactId>
</dependency>
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ollama</artifactId>
</dependency>
<dependency>
    <groupId>org.hibernate.orm</groupId>
    <artifactId>hibernate-assistant</artifactId>
</dependency>

You will also need a JDBC driver for your database and some basic configuration for the extensions.
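For example, a minimal application.properties might look like the following. The datasource values and the Ollama model name are assumptions; adjust them for your environment:

```properties
# Datasource used by Hibernate ORM
quarkus.datasource.db-kind=postgresql
quarkus.datasource.username=hibernate
quarkus.datasource.password=hibernate
quarkus.datasource.jdbc.url=jdbc:postgresql://localhost:5432/assistant

# Local Ollama model served through the Quarkus LangChain4j extension
quarkus.langchain4j.ollama.chat-model.model-id=llama3.1
quarkus.langchain4j.ollama.timeout=60s
```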

Implementing the Assistant

The goal is to create a service that takes natural language prompts, determines whether the relevant information can be derived from your Hibernate mappings, generates a query to extract that information, and finally executes it. Optionally, the service can feed the results back to the LLM to provide an informed plain-language reply directly to our users.

We can take advantage of the existing ChatModel and ChatMemoryProvider instances from the Quarkus LangChain4j extension, and the Metamodel instance from Hibernate ORM, all of which are available in the CDI context. With these, we can create our own @ApplicationScoped bean, HibernateAssistantLC4J:

import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.memory.chat.ChatMemoryProvider;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.input.PromptTemplate;
import jakarta.annotation.PostConstruct;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import jakarta.persistence.metamodel.Metamodel;

@ApplicationScoped
public class HibernateAssistantLC4J {

   private static final PromptTemplate METAMODEL_PROMPT_TEMPLATE = PromptTemplate.from(
               """
               You are an expert in writing Hibernate Query Language (HQL) queries.
               You have access to an entity model with the following structure:
               {{it}}
               """ );

   @Inject
   ChatModel chatModel;
   @Inject
   ChatMemoryProvider memoryProvider;
   @Inject
   Metamodel metamodel;

   private ChatMemory chatMemory;

   @PostConstruct
   void init() {
      // Injected fields are only available after construction, so we initialize here
      String metamodelJson = MetamodelJsonSerializerImpl.INSTANCE.toString(metamodel);
      SystemMessage systemMessage = METAMODEL_PROMPT_TEMPLATE.apply(metamodelJson).toSystemMessage();
      // We can now initialize the ChatMemory using the provider with the SystemMessage
      chatMemory = memoryProvider.get("hibernate-assistant");
      chatMemory.add(systemMessage);
   }
}

We’ve completed the crucial first step for getting our assistant up and running: creating a SystemMessage, a context initialization directive that instructs the LLM on how to respond to subsequent prompts. This was easily done thanks to the MetamodelSerializer implementation built into the hibernate-assistant module, which renders all the available mapping information from your object/relational Hibernate metamodel into a JSON format that large language models can easily consume.

Now, we can use the assistant to create a Hibernate query from natural language:

import java.util.List;

import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import org.hibernate.SharedSessionContract;
import org.hibernate.query.SelectionQuery;

public String executeAiQuery(String message, SharedSessionContract session) {
   // Add the user's natural language request to the chat memory and ask the model for HQL
   UserMessage userMessage = UserMessage.from(message);
   chatMemory.add(userMessage);
   ChatRequest chatRequest = ChatRequest.builder().messages(chatMemory.messages()).build();
   ChatResponse chatResponse = chatModel.chat(chatRequest);

   // The model's reply is expected to be a plain HQL statement
   String hql = chatResponse.aiMessage().text();
   SelectionQuery<Object> query = session.createSelectionQuery(hql, Object.class);
   List<Object> results = query.getResultList();
   // Serialize the Java objects into an LLM-friendly JSON representation
   return new ResultsJsonSerializerImpl(session.getFactory()).toString(results, query);
}

Under the hood, the assistant generates a valid HQL query tailored to your schema, relevant to the natural language request that was provided. For example, the user might ask “Extract all companies with a name starting with the letter A, regardless of case”. And the assistant will produce the following HQL query:

SELECT c 
FROM Company c 
WHERE LOWER(c.name) LIKE 'a%'

The query is executed using the provided Hibernate ORM Session, which in turn results in the following SQL statement:

select
    c1_0.id,
    c1_0.city,
    c1_0.street,
    c1_0.name 
from
    company_table c1_0 
where
    lower(c1_0.name) like 'a%' escape ''

The query results are extracted, and the retrieved data is available for use in your application’s logic.

In the code snippet shown earlier, we already went a step further, converting the query results from Java objects into a textual format, specifically JSON, using the hibernate-assistant module’s ResultsSerializer.

Example of plain text data extraction to tabular format

This seamlessly transforms a potentially complex data structure into a format that can easily be consumed by other application components or even fed back to the model itself for further reasoning and insights on the original prompt.

Retrieval-Augmented Generation (RAG)

For more advanced use cases, we might want to combine domain model and underlying database knowledge with the LLM’s conversational abilities in a retrieval-augmented generation pipeline. Thanks to the Hibernate Assistant module, we can easily design an implementation of LangChain4j’s designated RAG interface ContentRetriever.

This content retriever could use the instance of the Hibernate Assistant we’ve created earlier that will be available in the CDI context, as well as once again the current Hibernate ORM session. Using the assistant’s methods we’ve shown in the previous example, it can interpret the user request and extract relevant information from the database through Hibernate. This information is then serialized in a textual format compatible with LLMs, and fed back to the LangChain4J RAG pipeline to provide an informed reply based on your data.
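A sketch of such a retriever might look like the following. The HibernateContentRetriever name and the way the current session is obtained are assumptions; the essential part is implementing LangChain4j’s ContentRetriever interface and delegating to the assistant:

```java
import java.util.List;

import dev.langchain4j.rag.content.Content;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.query.Query;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import org.hibernate.SharedSessionContract;

@ApplicationScoped
public class HibernateContentRetriever implements ContentRetriever {

   @Inject
   HibernateAssistantLC4J assistant;
   @Inject
   SharedSessionContract session; // hypothetical: obtain the current Hibernate session

   @Override
   public List<Content> retrieve(Query query) {
      // Let the assistant turn the natural language query into HQL,
      // execute it, and serialize the results to JSON
      String json = assistant.executeAiQuery(query.text(), session);
      // Feed the serialized data back into the RAG pipeline
      return List.of(Content.from(json));
   }
}
```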

@Inject
HibernateContentRetriever contentRetriever;

interface HibernateAssistantRag {
   String chat(String userMessage);
}

public String ragResponse(String message, SharedSessionContract session) {
   RetrievalAugmentor rag = DefaultRetrievalAugmentor.builder()
         .contentRetriever(contentRetriever)
         .contentInjector(DefaultContentInjector.builder().promptTemplate(INJECTOR_PROMPT_TEMPLATE).build())
         .build();
   final HibernateAssistantRag assistant = AiServices.builder(HibernateAssistantRag.class)
         .chatModel(chatModel)
         .chatMemory(chatMemory)
         .retrievalAugmentor(rag)
         .build();
   return assistant.chat(message);
}

With this setup, when a user asks, “How many companies with at least 5000 employees can be found in the city of Modena?”, the content retriever understands it can derive this information from your database and invokes the assistant, which generates the relevant HQL query:

SELECT COUNT(c) 
FROM Company c 
WHERE c.address.city = 'Modena'
AND SIZE(c.employees) >= 5000

The query might look trivial, but it involves embedded properties and to-many associations. The model makes quick work of it thanks to its knowledge of HQL syntax. The query itself is executed completely transparently within the RAG pipeline, and the extracted data is serialized and provided back to the model through the RetrievalAugmentor to generate an insightful natural language response:

Example of plain text data extraction and informed LLM reply

The Results

Now that we have access to our database through natural language, how can we make use of it?

Hibernate-powered Chatbot

The standard chatbot experience involves a user conversing interactively with an LLM through a simple chat-like interface. By integrating Hibernate Assistant’s data retrieval capabilities into this interface, we can enrich the AI model’s responses with relevant information or simply present the data in an easily understandable format.

Imagine an HTTP REST endpoint /assistant that accepts a user message. The service uses the assistant functionalities to interact with the database, through the Hibernate ORM extension, and retrieve relevant information. Here’s a simplified look at how that endpoint might be implemented with the Quarkus REST extension:

import org.hibernate.StatelessSession;
import org.hibernate.query.SelectionQuery;

import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.QueryParam;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;

@Path("/assistant")
public class AssistantResource {
   @Inject
   StatelessSession session;
   @Inject
   HibernateAssistantLC4J assistant;

   @GET
   @Path("/json")
   @Produces(MediaType.TEXT_PLAIN)
   public Response queryToJson(@QueryParam("query") String query) {
      final String json = assistant.executeAiQuery(query, session);
      return Response.ok(json).build();
   }

   @GET
   @Path("/ask")
   @Produces(MediaType.TEXT_PLAIN)
   public Response naturalLanguage(@QueryParam("query") String query) {
      final String response = assistant.ragResponse(query, session);
      return Response.ok(response).build();
   }
}

The endpoint offers two methods. Both of them take a simple natural language request through the query parameter as input. The first simply interprets the request and returns the serialized results back in JSON format, which can easily be consumed by another service or the front-end itself. The second method uses the RAG pipeline we explored earlier to provide an informed natural language response back to the user.

Example of chatbot interaction, with RAG infused back and forth

And here is the result: the user asked a simple question through the first method; the assistant generated the query, fetched the Employee entities, and displayed them in a simple table. The user then asks follow-up questions, and the assistant, thanks to its contextual memory and the Hibernate-infused RAG pipeline, provides informed replies.

This chatbot empowers any user, regardless of their familiarity with Hibernate or relational databases, to effortlessly retrieve the information they need and gain valuable insights. By combining the language processing power of LLMs with Hibernate’s deep understanding of your business domain model, it makes talking to your data possible for everyone.

MCP Server 

Looking forward, a more advanced application of this technology is the Model Context Protocol (MCP). MCP is an open standard that lets AI models connect with tools and data providers, acting like a universal adapter to give AI information and actions beyond its initial training.

By exposing your Hibernate Assistant as an MCP tool through the Quarkus MCP Server extension, you can allow any compatible AI agent to talk with your database. The agent discovers the “tools” provided by Hibernate Assistant, such as “hibernateGetMetamodel” or “hibernateExecuteQuery”, and can invoke them autonomously to answer user questions:

import io.quarkiverse.mcp.server.Tool;
import io.quarkiverse.mcp.server.ToolArg;

@Tool(name = "hibernateGetMetamodel", description = "Retrieve a textual (JSON) representation of the Hibernate Metamodel, i.e. the entities, " +
      "properties and relationships defined in the persistence layer, that can be used to access the database.")
public String getMetamodel() {
   return MetamodelJsonSerializerImpl.INSTANCE.toString(session.getFactory().getMetamodel());
}

@Tool(name = "hibernateExecuteQuery", description = "Execute a query against the database using the Hibernate Assistant. " +
      "Pass a natural language message and get a JSON representation of the results extracted from the database.")
public String executeQuery(@ToolArg(description = "Natural language query to execute") String query) {
   return assistant.executeAiQuery(query, session);
}

This turns your Java application into a highly intelligent data node that can be plugged into the broader ecosystem of AI agents, allowing them to perform relational data access and manipulation powered by the logic you have already defined in your Hibernate domain model.

Conclusion

The teamwork of Hibernate ORM, Quarkus, and LangChain4j marks a significant step forward in making relational data accessible to AI. By abstracting the complexity of SQL generation and relational data extraction, and leveraging the rich metadata of the domain model, Hibernate Assistant allows developers to build applications where users can truly talk to their data. Whether through a custom chatbot or a standardized MCP interface, the barrier between natural language and structured data is rapidly disappearing.

Want to Dive Deeper?
Marco Belladelli is a speaker at JCON!
This article covers the topic of his JCON talk. If you can’t attend live, the session video will be available after the conference – it’s worth checking out!
