Discover how Java 25 and GenAI enable explainable microservices for finance using CrewAI and vector search, open-source and production-ready.

Why This Problem Matters
Financial institutions live and breathe decisions. Every transaction, payout, or compliance check is a decision point:
- Is this transaction fraudulent?
- Does this payout meet policy thresholds?
- Should this loan request be auto-approved or escalated?
The Scale of Decisions in Finance
In theory, these questions sound straightforward. However, in reality, they must be answered millions of times per day, under strict regulations, with milliseconds of latency and a clear explanation for every decision.
Why Current Tools Fall Short
Traditional rule engines are predictable but rigid. By contrast, machine learning models are fast but often opaque, functioning as a black box. As a result, regulators and auditors do not accept “the model said so.”
The Case for a Hybrid Approach
What’s missing is a hybrid approach, one that combines scalable automation with the ability to explain itself.
Therefore, the key question becomes:
How can we use Java 25 GenAI Microservices to create a solution that both scales and explains its reasoning?
The Architecture at a Glance
The design brings together proven Java reliability with emerging AI reasoning:
- Java 25 + Spring Boot microservice: API gateway, orchestration, security, and logging.
- CrewAI agent layer (Python + FastAPI): reasoning and justification generation.
- Optional vector search (Milvus Lite): retrieves similar historical cases for context.
- Client: sends request JSON, receives { decision, justification }.

This layered design keeps Java in the critical path, where it is fast, robust, and production-friendly, while delegating “explainability” to a flexible agent layer.
Java 25 Features in Action
Every few years, Java gets a release that feels like a paradigm shift, and Java 25 is one of those. The headline feature for this project is virtual threads, which became a final feature in Java 21 and are mature by Java 25. For decades, Java developers relied on a thread-per-request model backed by operating-system threads, which limited concurrency and wasted memory. Virtual threads solve this by making threads lightweight and plentiful.
Analogy: Imagine a highway where every car previously needed a dedicated lane. With virtual threads, you can pack thousands of cars into far fewer lanes without collisions. In practice, a service can keep tens of thousands of concurrent requests in flight, because a blocked virtual thread does not tie up an operating-system thread.
The other breakthrough is structured concurrency, which is still a preview API in Java 25 and has to be switched on with --enable-preview. It lets us treat multiple concurrent tasks as a single unit of work: if one fails, the rest are cancelled, and all results are collected in one place. This is tailor-made for calling AI services, which often involve parallel I/O and retries.
In short, Java 25 offers concurrency without the pain, which is exactly what you need when integrating AI reasoning into microservices.
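To make the "lightweight and plentiful" claim concrete, here is a minimal sketch that runs many blocking calls, one virtual thread each. The class name VirtualThreadDemo and the simulated blockingCall are illustrative assumptions, not part of the project. Because structured concurrency is a preview API, the sketch sticks to the stable executor API only.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; not taken from the project repository.
final class VirtualThreadDemo {

    // Stands in for a blocking I/O call, such as an HTTP request to an agent.
    static int blockingCall(int id) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(20));
        return id * 2;
    }

    // Submits `count` blocking calls, one virtual thread per call, and sums the results.
    static long runAll(int count) throws Exception {
        List<Future<Integer>> futures = new ArrayList<>();
        try (var exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                final int id = i;
                futures.add(exec.submit(() -> blockingCall(id)));
            }
        } // close() waits until every submitted task has finished
        long sum = 0;
        for (Future<Integer> f : futures) {
            sum += f.get();
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(10_000));
    }
}
```

Each task sleeps for 20 ms, yet ten thousand of them finish in well under the time a small fixed pool of platform threads would need, because a sleeping virtual thread releases its carrier thread.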
Step 1: The Java API
We start simple. A Spring Boot controller exposes an endpoint, /api/decision, where clients can post transaction requests.
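import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api")
public class DecisionController {
    private final DecisionService service;

    // Constructor injection keeps the controller easy to unit-test.
    public DecisionController(DecisionService service) { this.service = service; }

    // POST /api/decision: hand the request to the service and return its result.
    @PostMapping("/decision")
    public ResponseEntity<DecisionResponse> decide(@RequestBody RequestDto req) {
        var result = service.decide(req);
        return ResponseEntity.ok(result);
    }
}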
🔍 Explanation:
- DecisionController exposes the entry point.
- Requests are passed directly to DecisionService.
- The controller stays thin, focused only on routing and response.
This separation is intentional. By isolating the decision logic, we make the system easier to test, extend, and scale.
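The controller refers to RequestDto and DecisionResponse, and the service later uses CrewReply, but none of them is shown here. A minimal sketch using Java records might look like the following. The field names are assumptions based on the curl example in Step 6, and the non-negative amount check is an added illustration, not a rule from the project. Mapping the snake_case JSON key transaction_id onto transactionId would also need a Jackson naming setting, which is left out to keep the sketch dependency-free.

```java
import java.math.BigDecimal;

// Hypothetical request shape, mirroring the fields of the sample curl payload.
record RequestDto(String transactionId, BigDecimal amount, String description) {
    RequestDto {
        // Illustrative validation only: reject missing or negative amounts early.
        if (amount == null || amount.signum() < 0) {
            throw new IllegalArgumentException("amount must be non-negative");
        }
    }
}

// What the Java service returns to the client.
record DecisionResponse(String decision, String justification) {}

// What the agent layer returns to the Java service.
record CrewReply(String decision, String justification) {}
```

Records generate the decision() and justification() accessors that DecisionService calls, so no getters need to be written by hand.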
Step 2: Talking to the Agent
Now for the interesting part. The Java service delegates reasoning to the CrewAI agent. Here we demonstrate virtual threads in action.
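import java.util.concurrent.Executors;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class DecisionService {
    private final RestTemplate restTemplate = new RestTemplate();

    public DecisionResponse decide(RequestDto req) {
        // Run the blocking HTTP call on a virtual thread; close() waits for it to finish.
        try (var exec = Executors.newVirtualThreadPerTaskExecutor()) {
            var crewCall = exec.submit(() -> callCrew(req));
            var reply = crewCall.get();
            return new DecisionResponse(reply.decision(), reply.justification());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
            return new DecisionResponse("review", "Fallback: interrupted");
        } catch (Exception e) {
            // Fail safe: any agent error (including an empty reply) routes to manual review.
            return new DecisionResponse("review", "Fallback: " + e.getMessage());
        }
    }

    // Blocking POST to the agent layer; the URL matches the local setup in Step 6.
    private CrewReply callCrew(RequestDto req) {
        String url = "http://localhost:8001/crew/decision";
        return restTemplate.postForObject(url, req, CrewReply.class);
    }
}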
🔍 Explanation:
- We create a virtual-thread executor so the blocking agent call runs on its own lightweight thread.
- The request is delegated to callCrew, which calls the CrewAI agent.
- If the agent fails, we fail safe: return "review".
This pattern combines innovation (AI) with enterprise safety (fallbacks).
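A slow agent is as much a failure mode as a crashed one. As a hedged extension of the fallback idea, the sketch below adds a timeout to the same fail-safe pattern. The FailSafeCall class, its nested Decision record, and the timeout parameter are all assumptions made for illustration; the project may handle timeouts differently, for example in the HTTP client configuration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; the agent call is passed in as a Callable so the sketch has no HTTP dependency.
final class FailSafeCall {

    record Decision(String decision, String justification) {}

    static Decision decideWithTimeout(Callable<Decision> agentCall, long timeoutMillis) {
        try (var exec = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<Decision> future = exec.submit(agentCall);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Interrupt the task; otherwise close() would wait for it to finish.
                future.cancel(true);
                return new Decision("review", "Fallback: agent timed out");
            } catch (ExecutionException e) {
                return new Decision("review", "Fallback: agent call failed");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag set
                future.cancel(true);
                return new Decision("review", "Fallback: interrupted");
            }
        }
    }
}
```

One caveat: cancel(true) only helps if the underlying call responds to interruption. A blocking call that ignores interrupts would still hold up close(), so a client-level read timeout remains a sensible second line of defence.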
Step 3: CrewAI Reasoning Layer
While Java handles orchestration, CrewAI handles reasoning. Think of CrewAI as a specialized co-pilot. It doesn’t just output “approve” or “reject,” it adds why.
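Here’s a minimal FastAPI endpoint that stands in for the CrewAI agent:
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class RequestData(BaseModel):
    amount: float  # other request fields are ignored by this stub

@app.post("/crew/decision")
def decision(data: RequestData):
    # Stub: always approves and echoes the amount in the justification.
    return {
        "decision": "approve",
        "justification": f"Amount {data.amount} is consistent with prior approvals."
    }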
🔍 Explanation:
- This stub returns a fixed response today.
- Tomorrow, it can connect to LangChain or an LLM via OpenRouter.
- The key is the contract: the agent must return both the decision and the justification.
By structuring output this way, every decision ships with a stated justification, even when it is machine-generated.
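A contract is only useful if the caller enforces it. The sketch below shows one way the Java side could check an agent reply before trusting it. The ContractCheck class and its nested CrewReply record are hypothetical, and the allowed values approve, review, and block are taken from the prompt example in Step 4 rather than from the project code.

```java
import java.util.Set;

// Hypothetical guard that rejects agent replies which break the decision/justification contract.
final class ContractCheck {

    record CrewReply(String decision, String justification) {}

    // Assumed vocabulary, based on the approve/review/block wording in the prompt example.
    static final Set<String> ALLOWED = Set.of("approve", "review", "block");

    static CrewReply enforce(CrewReply reply) {
        // Null checks come first because Set.of(...).contains(null) throws.
        if (reply == null || reply.decision() == null || !ALLOWED.contains(reply.decision())) {
            return new CrewReply("review", "Fallback: agent returned an unrecognised decision");
        }
        if (reply.justification() == null || reply.justification().isBlank()) {
            return new CrewReply("review", "Fallback: agent returned no justification");
        }
        return reply;
    }
}
```

Downgrading a malformed reply to "review" keeps the same fail-safe behaviour as the error path in Step 2: when in doubt, a human looks at it.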
Step 4: Adding Context with Vector Search
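A decision is only as good as its context. For this reason, we add a vector database. Milvus Lite runs in-process through the pymilvus Python package and needs no container. If you prefer a standalone Milvus server, a Docker command along these lines is a starting point; depending on the image version it may need extra startup arguments, so check the Milvus documentation: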
docker run -p 19530:19530 milvusdb/milvus:latest
This allows us to:
- Store embeddings of past approved/rejected cases.
- Retrieve the top-3 most similar cases for a new request.
- Pass those into the agent’s prompt.
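To show what "retrieve the most similar cases" means underneath, here is a small in-memory sketch of top-k retrieval by cosine similarity. It is a stand-in for the idea, not the Milvus API: a vector database does the same ranking with indexes built for scale. The SimilarCases class, the Case record, and the two-dimensional embeddings in the usage note are assumptions for illustration.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory stand-in for a vector search; not the Milvus client API.
final class SimilarCases {

    record Case(String id, String outcome, double[] embedding) {}

    // Cosine similarity of two non-zero vectors of equal length.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("embedding dimensions differ");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the k cases most similar to the query, best match first.
    static List<Case> topK(double[] query, List<Case> cases, int k) {
        return cases.stream()
                .sorted(Comparator.comparingDouble((Case c) -> cosine(query, c.embedding())).reversed())
                .limit(k)
                .toList();
    }
}
```

With stored cases TX102 at [1, 0], TX207 at [0, 1], and TX301 at [0.9, 0.1], a query of [1, 0] with k = 2 ranks TX102 first and TX301 second, while TX207 is left out because it points in an unrelated direction.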
Prompt Example:
System: You are a compliance-aware assistant.
Context: {retrieved_cases}
User: Given the request {request_json}, recommend approve/review/block and explain briefly.
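Filling that template is plain string assembly. The sketch below shows the logic in Java for consistency with the rest of the walkthrough, although in this architecture the prompt would more likely be built inside the Python agent. PromptBuilder and its placeholder text for an empty result are hypothetical.

```java
import java.util.List;

// Hypothetical helper that fills the prompt template with retrieved cases and the request.
final class PromptBuilder {

    static String build(List<String> retrievedCases, String requestJson) {
        // Say so explicitly when the search found nothing, rather than leaving the context blank.
        String context = retrievedCases.isEmpty()
                ? "(no similar cases found)"
                : String.join("\n", retrievedCases);
        return """
                System: You are a compliance-aware assistant.
                Context: %s
                User: Given the request %s, recommend approve/review/block and explain briefly.
                """.formatted(context, requestJson);
    }
}
```

Because the request JSON and the retrieved cases are inserted as text, they should be treated as untrusted input by whoever reviews the agent's output.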
🔍 Benefit:
Instead of “Approve because the model says so,” the response becomes:
“Approve. This matches three past approvals: TX102, TX207, TX301.”
Consequently, decisions are easier to audit and easier to defend to regulators.
Step 5: Observability and Performance
Enterprise developers know: if you can’t measure it, you can’t trust it.
Therefore, we log:
- CrewAI RTT (round-trip time)
- Total latency
- Decision + justification (excluding sensitive fields)
Sample log:
INFO DecisionService - CrewAI RTT: 212ms | Total: 245ms | Decision: approve
As a result, the service is not a black box. It’s measurable, testable, and auditable.
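A log line like the sample can be produced with two timers and a formatter. The sketch below is one way to do it, assuming a hypothetical LatencyLog helper. It returns the formatted string rather than calling a specific logging framework, since the project's logger setup is not shown here.

```java
import java.util.concurrent.Callable;

// Hypothetical timing helper; the caller passes the formatted line to its own logger.
final class LatencyLog {

    record Timed<T>(T value, long elapsedMillis) {}

    // Runs the call and measures wall-clock time with the monotonic nanoTime clock.
    static <T> Timed<T> time(Callable<T> call) throws Exception {
        long start = System.nanoTime();
        T value = call.call();
        return new Timed<>(value, (System.nanoTime() - start) / 1_000_000);
    }

    // Only timings and the decision are logged; request fields stay out of the line.
    static String format(long crewRttMs, long totalMs, String decision) {
        return "CrewAI RTT: %dms | Total: %dms | Decision: %s".formatted(crewRttMs, totalMs, decision);
    }
}
```

Wrapping only the agent call in time(...) gives the RTT, and wrapping the whole decide method gives the total, so the gap between the two numbers is the overhead of the Java service itself.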
Step 6: Running It Yourself
Finally, let’s run the system locally.
1. Start CrewAI agent:
uvicorn main:app --reload --port 8001
2. Start Java service:
mvn spring-boot:run
3. Test a request:
curl -X POST http://localhost:8080/api/decision \
-H "Content-Type: application/json" \
-d '{"transaction_id":"TX123","amount":420.00,"description":"POS purchase"}'
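If you would rather test from Java than from the shell, the same request can be sent with the JDK's built-in HTTP client. This is a sketch under the same assumptions as the curl command (service on localhost:8080, endpoint /api/decision); DecisionClient is a hypothetical name.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical test client mirroring the curl command.
final class DecisionClient {

    // Builds the POST request without sending it, which makes it easy to inspect.
    static HttpRequest buildRequest(String baseUrl, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/decision"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Sends the request and returns the raw response body; needs the service to be running.
    static String send(String baseUrl, String json) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(baseUrl, json), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String json = "{\"transaction_id\":\"TX123\",\"amount\":420.00,\"description\":\"POS purchase\"}";
        System.out.println(send("http://localhost:8080", json));
    }
}
```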
Lessons Learned
- Java 25 is a turning point. Virtual threads, together with structured concurrency (still a preview API), simplify concurrency for AI calls.
- Explainability is non-negotiable. An agentic layer ensures decisions come with reasoning.
- Open-source tools are enough. Spring Boot, CrewAI, and Milvus Lite provide everything needed.
- Balance matters. AI speeds decisions, but enterprise trust requires fallbacks and logs.
What’s Next
- Replace the CrewAI stub with a real LLM gateway (OpenRouter, Hugging Face).
- Deploy to a free-tier cloud (Render, Fly.io).
- Add Helm charts for Kubernetes deploys.
- Invite community contributions via GitHub.
Full Source Code
All code is open-source under :
👉 https://github.com/sibasispadhi/agentic-fintech-java25