Artificial Intelligence is evolving at breathtaking speed. AI agents are fast becoming a standard way to build applications, much as Web 2.0 once reshaped how we build for the web. With that shift comes a critical need for robust security measures and checks. Retrieval-Augmented Generation (RAG) systems have become a cornerstone of AI agents and applications: agents that can access and use domain-specific knowledge are more accurate and more useful. However, as these systems become more prevalent, ensuring they only access and expose authorized information has become a critical challenge.
Sensitive Information Disclosure is a common issue that plagues RAG-based systems. We don’t want the Large Language Model (LLM) to accidentally access or expose sensitive data from a database. Traditional Role-Based Access Control (RBAC) systems are not enough to secure RAG applications and agents. This is where Fine-Grained Authorization (FGA), or Relationship-Based Access Control (ReBAC), shines as an authorization solution for RAG.
The Security Challenge in RAG Systems
RAG systems present some unique security challenges that traditional Role-Based Access Control (RBAC) systems struggle to address:
- Dynamic Context: RAG systems need to make real-time decisions about which documents and tools can be used for context enhancement.
- Complex Relationships: Documents often have intricate relationships with users, teams, and projects, forming a complex graph that simple role-based systems can’t model effectively.
- Granular Control: Different parts of documents may have different sensitivity levels requiring fine-grained control.
- Performance Requirements: Authorization checks must be fast to maintain the AI system’s responsiveness.
This is where FGA shines. It allows us to:
- Model complex relationships between users, groups, and documents
- Handle hierarchical and transitive permissions naturally
- Scale authorization rules efficiently
Let’s look at some real-world scenarios where sensitive data could be at risk:
- A knowledge base assistant exposing confidential documents to employees who are not authorized to view them
- A support AI revealing sensitive user information
- An AI assistant leaking proprietary information about a company
- A chatbot exposing private data to unauthorized users
Why OpenFGA?
OpenFGA is an open-source Fine-Grained Authorization system, created by Auth0, that allows you to manage access control in your applications. It’s currently a CNCF Sandbox Project. OpenFGA lets you define complex access control policies and enforce them in your application, and it addresses the RAG security challenges outlined above, making it a great fit for securing RAG applications and agents. If you don’t want to self-host, you can use Auth0 FGA, the hosted version of OpenFGA from Auth0.
If you’re new to RAG and access control, I recommend checking out this introductory post on RAG and Access Control: Where Do You Start?.
Why Java and LangChain4j?
Java isn’t far behind when it comes to building AI applications. It has a large developer community and a rich ecosystem of open-source frameworks, and enterprises are adopting these frameworks.
LangChain4j is one of the popular Java AI frameworks. It is a powerful open-source framework that provides capabilities similar to LangChain and LlamaIndex used in the Python/JS ecosystem. It is becoming the Spring Boot of the Java AI ecosystem. The framework offers a comprehensive set of features for building AI applications. Some of the key features include:
- Easy integration with major commercial and open-source LLMs
- Unified API for different model providers
- Tools such as prompt templating, chat memory management, and function calling
- High-level patterns, abstractions, and implementations for building Agents and RAG applications
Building a Secure RAG System
LangChain4j makes it simple to get started with the “Easy RAG” feature while providing advanced capabilities for complex use cases. In this guide, we will build a secure Java RAG system using LangChain4j and OpenFGA. We’ll focus on Relationship-Based Access Control (ReBAC), which prevents unauthorized access to sensitive information while maintaining the flexibility and power of RAG systems.
Prerequisites
This guide was created with the following tools and services:
- Java 21
- Gradle 8.12
- An FGA instance (self-hosted OpenFGA or an Auth0 FGA store)
- A local Ollama instance or OpenAI API key
Setting up the project
To get started, clone the auth0-ai-samples repository from GitHub:
git clone https://github.com/oktadev/auth0-ai-samples.git
cd auth0-ai-samples/authorization-for-rag/langchain4j-java
The application uses Gradle as the build system and is structured as follows:
- src/main/java/rag/RagApplication.java: The main entry point of the application that defines the RAG pipeline.
- src/main/java/rag/FGARetriever.java: The FGA-aware retriever that retrieves documents from the vector store based on the user’s permissions.
- src/main/java/rag/FGAInit.java: Script to create the FGA store and models.
- src/main/resources/docs/*.md: Sample markdown documents that will be used as context for the LLM. We have public and private documents for demonstration purposes.
- build.gradle: The Gradle build file that defines the dependencies and tasks.
Let us look at the important bits and pieces before we run the application.
RAG pipeline
The RAG pipeline configures the underlying LLM and defines the retrievers for data. The pipeline is defined as follows:
/** src/main/java/rag/RagApplication.java **/
final ChatLanguageModel CHAT_MODEL_OPENAI = OpenAiChatModel
.builder()
.apiKey(System.getProperty("OPENAI_API_KEY"))
.modelName(GPT_4_O_MINI)
.build();
var user = "user2";
// 1. Read and load documents from the assets folder
var documents = FileSystemDocumentLoader
.loadDocuments("src/main/resources/docs");
// 2. Create an in-memory vector store from the documents.
InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
EmbeddingStoreIngestor.ingest(documents, embeddingStore);
// 3. Create a base retriever
ContentRetriever baseRetriever = EmbeddingStoreContentRetriever.from(embeddingStore);
// 4. Create the FGA retriever that wraps the base retriever
ContentRetriever fgaRetriever = FGARetriever.create(
baseRetriever,
// FGA tuple to query for the user's permissions
content -> new ClientBatchCheckItem()
.user("user:" + user)
.relation("viewer")
._object(
"doc:" +
content.textSegment().metadata()
.getString("file_name").split("\\.")[0]
)
);
// 5. Create an assistant with the FGA retriever
var assistant = AiServices.builder(Assistant.class)
.chatLanguageModel(CHAT_MODEL_OPENAI)
.chatMemory(MessageWindowChatMemory.withMaxMessages(10))
.contentRetriever(fgaRetriever)
.build();
// 6. Query the retrieval chain with a prompt
var query = "Show me forecast for ZEKO?";
var answer = assistant.chat(query);
If you want to use Ollama running locally, uncomment the CHAT_MODEL_OLLAMA variable and comment out the CHAT_MODEL_OPENAI line. Be sure to update the base URL and model name to match your local Ollama instance.
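The Assistant type passed to AiServices.builder in step 5 is a plain Java interface that LangChain4j implements at runtime; every call to chat is augmented with content from the configured retriever before it reaches the LLM. A minimal sketch (the interface in the sample may differ slightly) looks like this:
/** Minimal Assistant interface for AiServices (sketch) **/
interface Assistant {
    // LangChain4j generates an implementation that enriches the prompt
    // with retrieved, permission-filtered content and calls the chat model.
    String chat(String userMessage);
}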
Here is a visual representation of the RAG architecture:

FGA retriever
The FGARetriever class implements ContentRetriever and filters documents based on the authorization model defined in the FGA store. This retriever is ideal for scenarios where you already have documents in a vector store and want to filter the vector store results based on the user’s permissions. Assuming the vector store already narrows down the documents to a few, the FGA retriever will further narrow down the documents to only the ones to which the user has access.
Here is the important part of the FGARetriever class, simplified for readability:
/** src/main/java/rag/FGARetriever.java **/
// Create a client to interact with the FGA store
new OpenFgaClient(new ClientConfiguration()
.apiUrl(/*...*/)
.storeId(/*...*/)
// credentials only required if using Auth0 FGA,
// comment out for local OpenFGA
.credentials(/*...*/)
)
// Check permissions for the user and object
private Map<String, Boolean> checkPermissions(List<ClientBatchCheckItem> checks) {
var options = //...
var request = //...
var response = fgaClient.batchCheck(request, options).get();
Map<String, Boolean> permissionMap = new HashMap<>();
for (var result : response.getResult()) {
var checkKey = getCheckKey(result.getRequest());
permissionMap.put(checkKey, result.isAllowed());
}
return permissionMap;
}
// retrieve the documents and filter them using checkPermissions function
public List<Content> retrieve(Query query) {
// First, get relevant documents from the base retriever
List<Content> relevantContent = baseRetriever.retrieve(query);
// Create data structures to track checks and document mappings
// var checks, documentToObject, seenChecks = ...
// Process each document to build checks
for (Content doc : relevantContent) {
ClientBatchCheckItem check = buildQuery.apply(doc);
var checkKey = getCheckKey(check);
documentToObject.put(doc, checkKey);
// Skip duplicate checks for same user, object, and relation
if (!seenChecks.contains(checkKey)) {
/*...*/
}
}
var permissionsMap = checkPermissions(checks);
// filter based on permission
return relevantContent.stream()
.filter(doc -> permissionsMap.get(documentToObject.get(doc)))
.collect(Collectors.toList());
}
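The getCheckKey helper used above only needs to produce a stable key that uniquely identifies a user/relation/object combination, so duplicate checks can be skipped and results mapped back to documents. A possible implementation (hypothetical, and assuming the SDK’s getUser/getRelation/getObject accessors; the sample’s version may differ) is:
// Hypothetical helper: build a unique key from the check’s user, relation, and object
private String getCheckKey(ClientBatchCheckItem check) {
    return check.getUser() + "|" + check.getRelation() + "|" + check.getObject();
}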
Set up an FGA instance
Run OpenFGA locally with Docker using the following command:
docker pull openfga/openfga && \
docker run -p 8080:8080 -p 8081:8081 -p 3000:3000 openfga/openfga run
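With the container running, the OpenFGA HTTP API is available at http://localhost:8080 (8081 is the gRPC API and 3000 the playground). For a local instance, the client configuration shown earlier in FGARetriever.java only needs the API URL and store ID; credentials can be omitted. A minimal sketch:
// Sketch: point the OpenFGA client at the local container (no credentials required)
var fgaClient = new OpenFgaClient(new ClientConfiguration()
        .apiUrl("http://localhost:8080")
        .storeId("<your-fga-store-id>"));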
If you are using Auth0 FGA, visit the dashboard, navigate to Settings, and in the Authorized Clients section, click + Create Client. Give your client a name, mark all three client permissions, and then click Create. Once your client is created, you’ll see a modal containing Store ID, Client ID, and Client Secret. Click Continue to see the FGA_API_URL and FGA_API_AUDIENCE.
Add a .env file with the following content to the root of the project:
# OpenAI key. Not required if using Ollama
OPENAI_API_KEY=<your-openai-api-key>
# Open FGA
FGA_STORE_ID=<your-fga-store-id>
# Required only for Auth0 FGA
FGA_CLIENT_ID=<your-fga-store-client-id>
FGA_CLIENT_SECRET=<your-fga-store-client-secret>
FGA_API_URL=https://api.xxx.fga.dev
FGA_API_AUDIENCE=https://api.xxx.fga.dev/
Create the FGA models and tuples
We will use a simple model that defines just an owner and viewer relation for docs:
model
  schema 1.1
type user
type doc
  relations
    define owner: [user]
    define viewer: [user, user:*]
Check out this documentation to learn more about creating an authorization model in FGA.
Now, to grant access to the public information, we need the following tuple in FGA:
- User: user:*
- Object: select doc as the type and enter public-doc in the ID field
- Relation: viewer
A tuple signifies a user’s relation to a given object. For example, the above tuple implies that all users can view the public-doc object.
Now, to grant access to the private information, we need the following tuple in FGA:
- User: user:user1
- Object: select doc as the type and enter private-doc in the ID field
- Relation: viewer
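With these two tuples in place, FGA can answer questions such as “can user2 view private-doc?”. As a quick illustration (a sketch using the OpenFGA Java SDK’s single check call, not part of the sample code):
// Sketch: ask FGA whether user2 can view the private document
var allowed = fgaClient.check(new ClientCheckRequest()
        .user("user:user2")
        .relation("viewer")
        ._object("doc:private-doc"))
        .get()
        .getAllowed(); // false: no tuple grants user2 the viewer relation on private-doc
The FGARetriever performs the same kind of check, just in batch via batchCheck, with one item per retrieved document.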
For OpenFGA, the store, models, and tuples can be created programmatically using the OpenFGA Java SDK. This is defined in src/main/java/rag/FGAInit.java and can be run using the following command:
gradle runFGAInit
Once done, copy the store ID from the console and update the .env file with the store ID.
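Under the hood, the tuple-writing part of such an initialization script boils down to a single write request. A simplified sketch (the actual FGAInit.java also creates the store and writes the authorization model, and may differ in detail):
// Sketch: write the two tuples described above using the OpenFGA Java SDK
var tuples = List.of(
        new ClientTupleKey().user("user:*").relation("viewer")._object("doc:public-doc"),
        new ClientTupleKey().user("user:user1").relation("viewer")._object("doc:private-doc"));
fgaClient.write(new ClientWriteRequest().writes(tuples)).get();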
If you are using Auth0 FGA, navigate to Model Explorer. You’ll need to update the model information with the above schema. Next, navigate to the Tuple Management section, click + Add Tuple, and fill in the details for both tuples described above.
Test the application
Now that you have set up the application and the FGA store, you can run the application using the following command:
./gradlew run
The application will start with the query Show me forecast for ZEKO? Since this information is in a private document and user2 does not have access to that document, the application will not be able to retrieve it. The FGA retriever will filter the private document out of the vector store results and hence print output similar to the following:
The provided context does not include specific forecasts or projections for Zeko Advanced Systems Inc. ...
If you change the query to something that is available in the public document, the application will be able to retrieve the information.
Now change the user to user1 and run the application again:
./gradlew run
This time, you should see a response containing the forecast information, since we have a tuple that defines the viewer relation for user1 on the private-doc object.
Congratulations! You have run a simple RAG application using LangChain4j and secured it using OpenFGA.
Implementation Patterns for Securing RAG Systems
Here are some common patterns for securing RAG systems using FGA:
- Pre-filtering Pattern: Check permissions before searching the vector database. This can be done using the OpenFGA list objects request and the metadata filtering feature of supported vector databases, as shown in the sketch after this list.
- Post-filtering Pattern: Perform a vector database search first and then filter the results by permissions. This can be done using the OpenFGA batch check request and custom retrievers or re-rankers. This is what we just did in the example above.
- Hybrid Pattern: Combine pre-filtering and post-filtering patterns to achieve the best of both worlds.
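As an illustration of the pre-filtering pattern, here is a sketch that is not part of the sample code: it asks OpenFGA for every doc object the user can view and then restricts the vector search to those documents using LangChain4j metadata filtering. It assumes the OpenFGA list objects API, the filter support of EmbeddingStoreContentRetriever (and an embedding store that honors metadata filters), and that FGA object IDs map back to the file_name metadata as in the example above:
// Sketch of the pre-filtering pattern
// (uses dev.langchain4j.store.embedding.filter.MetadataFilterBuilder)
// 1. Ask FGA for all documents the user is allowed to view
List<String> allowedObjects = fgaClient.listObjects(new ClientListObjectsRequest()
        .user("user:" + user)
        .relation("viewer")
        .type("doc"))
        .get()
        .getObjects(); // e.g. ["doc:public-doc"]
// 2. Map FGA object IDs back to the file names stored in the document metadata
List<String> allowedFileNames = allowedObjects.stream()
        .map(obj -> obj.substring("doc:".length()) + ".md")
        .toList();
// 3. Restrict retrieval to authorized documents via a metadata filter
ContentRetriever preFilteredRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        // add .embeddingModel(...) explicitly if your setup does not provide a default
        .filter(MetadataFilterBuilder.metadataKey("file_name").isIn(allowedFileNames))
        .build();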
Learn more about OpenFGA
In this post, you learned how to secure a LangChain4j-based RAG application using OpenFGA. We invite you to check out the OpenFGA code on GitHub. Visit Auth for GenAI to learn more about how Auth0 can help you secure your GenAI applications.