Stop Writing YAML: How to Define, Test, and Deploy Your Cloud in Pure Java

1. Introduction

As Java developers, we apply rigorous engineering practices to application code: type safety, unit testing, refactoring, code review. Then we switch to deploying that code and write hundreds of lines of YAML, accepting the lack of expressiveness, missing abstractions, and missing flexibility as the cost of doing business. Or worse, we log into the cloud console and click our way through to the deployment.

This article explores an alternative: defining cloud infrastructure in Java, with the same tools and practices we use for application code. We’ll build a working example, a Java service deployed on AWS, covering the full lifecycle from resource definition through testing to CI/CD. While this example uses Quarkus, the infrastructure patterns demonstrated here apply equally to Spring Boot, Jakarta EE, or any other containerized Java framework. The companion code is available at github.com/wlami/stop-writing-yaml-javapro.

Figure: When your Java code builds the cloud: infrastructure as code turns resource definitions into real architecture.

2. How We Got Here

Infrastructure management has evolved through distinct stages:

Manual provisioning through cloud consoles and dashboards introduced the “snowflake” problem: each environment was unique, unrepeatable, and prone to human error.

Scripted automation via shell scripts or Python improved repeatability but often lacked idempotency: running the same script twice could produce different results or fail on the second run.

Declarative infrastructure-as-code solved idempotency. Tools like CloudFormation, Terraform, and Kubernetes popularized YAML and domain-specific languages like HCL, the language Terraform uses, for defining a “desired state”. The tool reconciles actual infrastructure with the declared state.

This third stage is where most organizations are today, and where the problems start.

Where Declarative DSLs Break Down

YAML and HCL are configuration formats, not programming languages. Need to create resources across multiple availability zones with alternating configurations? In Java, that is a loop with conditional logic. In configuration languages like HCL, however, expressing logic like this can be awkward or even impossible.
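As a rough illustration of the "loop with conditional logic" mentioned above, the following plain-Java sketch plans one subnet per availability zone and alternates between public and private subnets. SubnetSpec and SubnetPlanner are hypothetical names invented for this example, and the zone names and CIDR scheme are assumptions. No Pulumi classes are involved; in a real program each spec would be turned into a subnet resource.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value type for illustration; not a Pulumi class.
record SubnetSpec(String name, String availabilityZone, String cidrBlock, boolean isPublic) {}

final class SubnetPlanner {

    // Plans one subnet per availability zone, alternating public and private.
    static List<SubnetSpec> plan(List<String> zones) {
        List<SubnetSpec> specs = new ArrayList<>();
        for (int i = 0; i < zones.size(); i++) {
            boolean isPublic = (i % 2 == 0);          // even index: public, odd index: private
            String kind = isPublic ? "public" : "private";
            specs.add(new SubnetSpec(
                "app-subnet-" + kind + "-" + i,
                zones.get(i),
                "10.0." + i + ".0/24",                // assumed scheme: one /24 block per zone
                isPublic));
        }
        return specs;
    }

    public static void main(String[] args) {
        plan(List.of("eu-central-1a", "eu-central-1b", "eu-central-1c"))
            .forEach(System.out::println);
    }
}
```

Because the plan is ordinary data, it can be inspected or unit tested before any resource is created.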

The absence of proper abstraction mechanisms leads to copy-paste reuse: the same anti-pattern we would reject in application code. CloudFormation templates routinely exceed 1000 lines for moderately complex setups.

The developer experience gap is stark. In your Java IDE, you have autocomplete, type checking, safe refactoring, and instant test feedback. In YAML, a misplaced indentation silently breaks your deployment. Refactoring means find-and-replace across files. Testing means deploying to the cloud and waiting.

A concrete example of what this gap looks like. An SQS queue in CloudFormation YAML:

AWSTemplateFormatVersion: '2010-09-09'
Resources:
  ProcessorQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: processor-queue
      VisibilityTimeout: 300
      MessageRetentionPeriod: 86400

Newer infrastructure-as-code tools go a step further and let you define resources in a general-purpose language. Here is the same queue in Java, using Pulumi:

var queue = new Queue("processor-queue", QueueArgs.builder()
    .visibilityTimeoutSeconds(300)
    .messageRetentionSeconds(86400)
    .build());

Both define the same resource. But in the Java version, your IDE autocompletes visibilityTimeoutSeconds, the compiler rejects visibilityTimeoutSecond (note the missing ‘s’), and you can CMD+click into the QueueArgs source to read every available property. In the YAML version, you learn about typos when the deployment fails.

When Infrastructure Becomes Invisible

A story from a previous employer: months after I left, a colleague deleted what appeared to be a “demo bucket” I had created via ClickOps. With no code, no documentation, no tracked dependencies, nobody knew what that bucket did… until internal developer tooling broke.

This is not unique to ClickOps. Any infrastructure that cannot be discovered, refactored, or tested using standard developer tools is at risk. When there is no compile-time checking, no CMD+click navigation, no “find all usages”, critical infrastructure becomes invisible until something breaks.

3. Infrastructure in Real Languages

The idea is straightforward: treat infrastructure the same way we treat application code. Use real programming languages with real tools.

The Landscape

Several tools allow you to write your infrastructure code this way. AWS Cloud Development Kit (CDK) lets developers define AWS infrastructure using TypeScript, Python, Java, and other languages, but is limited to AWS. HashiCorp’s CDKTF (CDK for Terraform) brought programming-language support to Terraform’s multi-cloud model but was deprecated in late 2025.

Pulumi is an open-source, multi-cloud infrastructure-as-code tool supporting general-purpose languages, including Java. The core engine is Apache 2.0 licensed. This article uses Pulumi for its examples, but the underlying idea, infrastructure as testable, refactorable code, applies to any infrastructure-as-code tool that uses general-purpose programming languages.

How Pulumi Works

With Pulumi, you write a standard Java application that describes infrastructure. The Pulumi CLI executes your program and provisions resources across cloud providers.

At runtime, your Java program communicates with the Pulumi engine, which dispatches resource operations to provider plugins. Each provider translates resource declarations into cloud API calls. State is persisted between runs, so the engine can compute the minimal set of changes needed.

The key concepts:

Resources are strongly-typed Java objects that represent cloud resources (S3 buckets, RDS databases, Lambda functions). You instantiate them, set properties, and Pulumi manages creation order and dependencies.

Outputs wrap values that are not known until a resource has been created, such as its ID or endpoint. Output<T> is conceptually similar to CompletableFuture<T>: you compose outputs using .applyValue() and Output.format() instead of accessing the values directly.

Providers are plugins that communicate with cloud APIs: AWS, Azure, GCP, Kubernetes, and 120+ others. They provide type-safe access to every resource property.

Stacks are isolated instances of your program. The same Java code deploys to different environments (dev, staging, production), each with its own configuration and separate state.

This also enables per-developer stacks, so engineers work in isolation without sharing a dev environment. Stack configuration includes built-in secrets management: values marked as secret are encrypted before they are written to the config file and remain encrypted in state and masked in CLI output. No external secrets manager is required for this to work.

Components bundle multiple resources into reusable abstractions: the infrastructure equivalent of extracting a class from duplicated code.
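To make the comparison between Output<T> and CompletableFuture<T> from the concepts above more tangible, here is a small sketch that uses only the JDK. It is an analogy, not Pulumi code: thenApply plays the role that applyValue plays for outputs, and the queue URL is a made-up placeholder. A real Output also tracks resource dependencies and whether the value is secret, which a plain future does not.

```java
import java.util.concurrent.CompletableFuture;

final class DeferredValueSketch {

    // Stand-in for a value that only exists after a resource has been created.
    static CompletableFuture<String> queueUrl() {
        return CompletableFuture.supplyAsync(
            () -> "https://sqs.example.invalid/123456789012/processor-queue");
    }

    // Derives a new deferred value without blocking on the original one.
    static CompletableFuture<String> envEntry(CompletableFuture<String> url) {
        return url.thenApply(u -> "QUEUE_URL=" + u);
    }

    public static void main(String[] args) {
        // join() is only used here to print the result; a Pulumi program
        // would keep passing the deferred value on to other resources.
        System.out.println(envEntry(queueUrl()).join());
    }
}
```

The point of the analogy is the style of composition: you describe what should happen to the value once it exists, rather than reading it directly.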

4. Building a Real Application

Let’s build something concrete: a containerized Java service that reads messages from a queue, processes them, and writes results to a PostgreSQL database, deployed on AWS. In this example, we’ll use Quarkus, a Java framework optimized for containers and an alternative to Spring Boot. The infrastructure patterns shown here work with any containerized Java application.

Project Structure

Two separate Gradle projects:

  • quarkus-processor/: the application with its Dockerfile
  • infrastructure/: Pulumi infrastructure code

The infrastructure project pulls in the Pulumi SDK and provider packages:

dependencies {
    implementation 'com.pulumi:pulumi:[1.3,2.0)'
    implementation 'com.pulumi:aws:6.+'
    implementation 'com.pulumi:docker-build:0.+'
}

The entry point is a standard main method. Inside Pulumi.run(), you define resources:

public static void main(String[] args) {
    Pulumi.run(ctx -> {
        // all resource definitions go here
    });
}

Defining Resources

For the complete application, we need a Virtual Private Cloud with subnets and security groups, an SQS message queue, an RDS PostgreSQL database, a container registry (ECR) repository, a Docker image build, and an ECS Fargate service. The full implementation is roughly 200 lines of Java, compared with over 1,000 lines of equivalent CloudFormation YAML.

Here’s the Docker image build, which compiles the Quarkus application and pushes it to ECR as part of the infrastructure deployment:

var image = new Image("app-image", ImageArgs.builder()
    .context(BuildContextArgs.builder()
        .location("../quarkus-processor")
        .build())
    .push(true)
    .tags(ecrRepository.repositoryUrl()
        .applyValue(url -> List.of(url + ":latest")))
    .registries(RegistryArgs.builder()
        .address(ecrRepository.repositoryUrl())
        .username(authToken.applyValue(token -> token.userName()))
        .password(authToken.applyValue(token -> token.password()))
        .build())
    .build());

When you run the pulumi CLI with the up operation, Pulumi invokes Docker to build the Quarkus application from its Dockerfile, pushes the image to ECR, and then provisions the ECS task referencing that image. Application code and infrastructure are deployed together.

Infrastructure Logic in Java

200 lines of raw resource definitions get the job done, but real infrastructure has business requirements that vary. Different environments need different database configurations. Production needs replicas and extended backup retention. Dev does not. In a DSL like HCL or YAML, expressing this kind of conditional logic means reaching for workarounds: count, for_each, ternary expressions, and nested conditionals that become difficult to read and impossible to unit test.

In Java, it is a function.

Consider a method that maps Service Level Agreement (SLA) tiers to database configurations. SLATier is an enum with three values: BRONZE, SILVER, GOLD. DatabaseConfig is a record that captures the infrastructure decisions: whether to use Aurora or plain RDS, how many days to retain backups, whether to enable monitoring, and how many read replicas to create.

public static DatabaseConfig getConfigForSLA(SLATier tier) {
    return switch (tier) {
        case BRONZE -> DatabaseConfig.builder()
            .useAurora(false)  // Single RDS instance
            .backupDays(1)
            .monitoring(false)
            .replicaCount(0)
            .build();
        case SILVER -> DatabaseConfig.builder()
            .useAurora(true)   // Aurora cluster
            .backupDays(7)
            .monitoring(true)
            .replicaCount(1)   // 1 read replica
            .build();
        case GOLD -> DatabaseConfig.builder()
            .useAurora(true)   // Aurora cluster
            .backupDays(30)
            .monitoring(true)
            .replicaCount(2)   // 2 read replicas
            .build();
    };
}

This is a pure function. It takes an enum and returns a record. There are no cloud calls, no Pulumi engine, and no side effects. We will test it with JUnit in the next section.
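For readers who want to see the two types the method relies on, the following is one possible shape for SLATier and DatabaseConfig. It is a sketch under assumptions: the companion repository may define them differently, the hand-written builder is only one way to get the builder() call used above, and the validation rules in the compact constructor are additions made for this example.

```java
// Hypothetical shapes; the companion repository may define these differently.
enum SLATier { BRONZE, SILVER, GOLD }

record DatabaseConfig(boolean useAurora, int backupDays, boolean monitoring, int replicaCount) {

    // Compact constructor: reject invalid combinations before anything is deployed.
    // These rules are assumptions made for this sketch.
    DatabaseConfig {
        if (backupDays < 0) throw new IllegalArgumentException("backupDays must be >= 0");
        if (replicaCount < 0) throw new IllegalArgumentException("replicaCount must be >= 0");
        if (!useAurora && replicaCount > 0) {
            throw new IllegalArgumentException("read replicas require Aurora in this sketch");
        }
    }

    static Builder builder() {
        return new Builder();
    }

    // Minimal hand-written builder so call sites read like the switch expression above.
    static final class Builder {
        private boolean useAurora;
        private int backupDays;
        private boolean monitoring;
        private int replicaCount;

        Builder useAurora(boolean value) { this.useAurora = value; return this; }
        Builder backupDays(int value) { this.backupDays = value; return this; }
        Builder monitoring(boolean value) { this.monitoring = value; return this; }
        Builder replicaCount(int value) { this.replicaCount = value; return this; }

        DatabaseConfig build() {
            return new DatabaseConfig(useAurora, backupDays, monitoring, replicaCount);
        }
    }
}
```

Putting validation into the record means an inconsistent configuration fails when the object is built, long before a deployment starts.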

The returned config drives resource creation with ordinary control flow. An if statement picks the database engine. A for loop creates replicas:

var config = getConfigForSLA(SLATier.SILVER);

if (config.useAurora()) {
    //Define an Aurora Cluster
    var cluster = new Cluster("app-db", ClusterArgs.builder()
        .engine("aurora-postgresql")
        .databaseName("processor")
        .backupRetentionPeriod(config.backupDays())
        .storageEncrypted(true)
        .build());
    //Add replicas which increase availability
    for (int i = 0; i < config.replicaCount(); i++) {
        new ClusterInstance("app-db-replica-" + i,
            ClusterInstanceArgs.builder()
                .clusterIdentifier(cluster.id())
                .instanceClass("db.t3.medium")
                .engine("aurora-postgresql")
                .build());
    }
} else {
    new Instance("app-db", InstanceArgs.builder()
        .engine("postgres")
        .instanceClass("db.t3.medium")
        .allocatedStorage(20)
        .backupRetentionPeriod(config.backupDays())
        .build());
}

BRONZE creates a single RDS instance. SILVER creates an Aurora cluster with one read replica. GOLD creates Aurora with two replicas and 30-day backups. Change the enum value, and the infrastructure changes with it.

This is the kind of logic that is natural in Java and awkward in a configuration language: an if/else that chooses between two different resource types, and a loop that creates a variable number of resources based on a computed value. The equivalent in HCL requires restructuring the configuration around count and conditional expressions that obscure the intent.

Grouping Resources into Components

As this kind of logic grows, you want to package it for reuse. Pulumi provides a mechanism for this: component resources. A component is a Java class that extends Pulumi’s base class ComponentResource and creates child resources in its constructor. From the outside, it looks like a single resource with its own inputs and outputs. Inside, it can generate as many real cloud resources as necessary.

In the companion repository, the database logic above is wrapped into a ManagedDatabaseComponent, and the ECS Fargate setup (cluster, task definition, IAM roles, logging) is wrapped into a ContainerServiceComponent. The main program uses them like this:

var database = new ManagedDatabaseComponent("app-db",
    DatabaseArgs.builder()
        .slaTier(SLATier.SILVER)
        .username("appuser")
        .password(config.requireSecret("dbPassword"))
        .databaseName("processor")
        .build());

var service = new ContainerServiceComponent("processor-service",
    ContainerServiceArgs.builder()
        .image(imageRef)
        .environment(Map.of(
            "DATABASE_URL", database.connectionString(),
            "QUEUE_URL", queue.url()))
        .slaTier(SLATier.SILVER)
        .port(8080)
        .build(),
    subnetIds, sgIds);

The caller does not need to know whether ManagedDatabaseComponent creates a single RDS instance or an Aurora cluster with replicas. That decision is internal, driven by the SLA tier.

A note on the password: config.requireSecret("dbPassword") reads a value from the stack’s configuration file where it is stored as ciphertext:

infrastructure:dbPassword:
  secure: v1:LoMrsybSW3y3T+YY:Zw1vs1Ey2U8s8+Qf2CzC2p7vds0R2NalQP6LVA==

Pulumi decrypts it at deployment time and returns an Output<String>. The secret propagates through the resource graph with the same protection: it is masked in logs and encrypted in the state file. In Terraform, values stored in state are plaintext by default. Here, encryption is the default for any value marked as secret.

Running pulumi up evaluates the program, builds the dependency graph, and shows a preview:

Previewing update (dev)

     Type                                    Name                         Plan
 +   pulumi:pulumi:Stack                     infrastructure-dev            create
 +   ├─ custom:database:ManagedDatabase      app-db                       create
 +   │  ├─ aws:rds:Cluster                   app-db                       create
 +   │  ├─ aws:rds:ClusterInstance           app-db-primary               create
 +   │  └─ aws:rds:ClusterInstance           app-db-replica-0             create
 +   ├─ custom:container:ContainerService    processor-service            create
 +   │  ├─ aws:cloudwatch:LogGroup           processor-service-logs       create
 +   │  ├─ aws:ecs:Cluster                   processor-service-cluster    create
 +   │  ├─ aws:iam:Role                      processor-service-exec-role  create
 +   │  ├─ aws:iam:Role                      processor-service-task-role  create
 +   │  ├─ [...]                             [...]
 +   │  ├─ aws:ecs:TaskDefinition            processor-service-task       create
 +   │  └─ aws:ecs:Service                   processor-service-service    create
 +   ├─ aws:ec2:Vpc                          app-vpc                      create
 +   ├─ aws:ec2:Subnet                       app-subnet-a                 create
 +   ├─ aws:ec2:Subnet                       app-subnet-b                 create
 +   ├─ aws:ec2:SecurityGroup                app-sg                       create
 +   ├─ aws:sqs:Queue                        processor-queue              create
 +   ├─ aws:ecr:Repository                   app-repo                     create
 +   └─ docker-build:Image                   app-image                    create

Resources: +21 to create

The tree shows how components organize resources. app-db and processor-service appear as parent nodes that group their child resources. After confirmation, Pulumi builds the container, pushes it to ECR, and provisions all 21 resources. If you run it again, Pulumi compares its state with your program, finds no differences, and changes nothing: the operation is idempotent.

5. Testing Infrastructure with JUnit

Because the previously introduced getConfigForSLA method is a pure function, it’s easily testable with plain JUnit. You don’t need a Pulumi engine, cloud APIs, or mocks:

@Test
void bronzeSLAUsesSingleRDS() {
    var config = ManagedDatabaseComponent.getConfigForSLA(SLATier.BRONZE);
    assertFalse(config.useAurora(),
        "Bronze tier should use simple RDS, not Aurora");
    assertEquals(0, config.replicaCount());
    assertEquals(1, config.backupDays());
}

The Silver and Gold tests follow the same structure and verify Aurora with 1 or 2 replicas, respectively. The pattern applies to other components too. The ContainerServiceComponent extracts resource allocation into static methods:

@Test
void silverTierAllocatesCorrectResources() {
    assertEquals(512, ContainerServiceComponent.getContainerCpu(SLATier.SILVER, 256));
    assertEquals(1024, ContainerServiceComponent.getContainerMemory(SLATier.SILVER, 512));
    assertEquals(2, ContainerServiceComponent.getDesiredCount(SLATier.SILVER));
    assertEquals(30, ContainerServiceComponent.getLogRetentionDays(SLATier.SILVER));
}

These tests run in milliseconds as part of a normal ./gradlew test. They need neither the Pulumi CLI nor cloud credentials, and they incur no deployment costs. The design principle: extract infrastructure decisions into pure methods, then test those methods with standard JUnit.
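The static methods exercised by the test above are not shown in the article, so here is a guess at what such pure methods could look like. Only the SILVER results (512 CPU units from a base of 256, 1024 MB from a base of 512, 2 tasks, 30 days of log retention) come from the tests; the class name and the BRONZE and GOLD values are assumptions chosen for illustration.

```java
final class ContainerSizingSketch {

    enum SLATier { BRONZE, SILVER, GOLD }

    // Assumed scaling factors; only the SILVER factor is implied by the tests above.
    static int multiplier(SLATier tier) {
        return switch (tier) {
            case BRONZE -> 1;
            case SILVER -> 2;
            case GOLD -> 4;
        };
    }

    // CPU units for the task, scaled from a base value.
    static int getContainerCpu(SLATier tier, int baseCpu) {
        return baseCpu * multiplier(tier);
    }

    // Memory in MB for the task, scaled from a base value.
    static int getContainerMemory(SLATier tier, int baseMemory) {
        return baseMemory * multiplier(tier);
    }

    // Number of running tasks; BRONZE and GOLD values are assumptions.
    static int getDesiredCount(SLATier tier) {
        return switch (tier) {
            case BRONZE -> 1;
            case SILVER -> 2;
            case GOLD -> 3;
        };
    }

    // Log retention in days; BRONZE and GOLD values are assumptions.
    static int getLogRetentionDays(SLATier tier) {
        return switch (tier) {
            case BRONZE -> 7;
            case SILVER -> 30;
            case GOLD -> 90;
        };
    }
}
```

Because the switch expressions cover every enum constant, adding a fourth tier later produces a compile error in each method until a value is supplied for it.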

What to Test and What Not To

Infrastructure tests should verify your decisions, not your cloud provider’s API. Good tests cover SLA-to-configuration mappings, naming conventions, tagging logic, and custom validation. Don’t test whether AWS actually creates an RDS instance when you call its API; that’s Amazon’s job.
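Naming conventions and tagging logic are good candidates for the same treatment. The sketch below is hypothetical throughout: the class name, the naming pattern, and the three tag keys are assumptions, not conventions taken from the companion repository. It shows how such rules can live in pure methods that fail fast.

```java
import java.util.Map;
import java.util.regex.Pattern;

final class ResourceConventions {

    // Assumed convention: starts with a lowercase letter, then lowercase letters,
    // digits, or hyphens, 3 to 40 characters in total.
    private static final Pattern NAME = Pattern.compile("[a-z][a-z0-9-]{2,39}");

    // Builds a resource name and rejects anything that violates the convention.
    static String resourceName(String project, String environment, String component) {
        String name = project + "-" + environment + "-" + component;
        if (!NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("invalid resource name: " + name);
        }
        return name;
    }

    // Returns the tags every resource should carry; the keys are assumptions.
    static Map<String, String> standardTags(String team, String environment) {
        if (team == null || team.isBlank()) {
            throw new IllegalArgumentException("team tag is required");
        }
        return Map.of(
            "team", team,
            "environment", environment,
            "managed-by", "pulumi");
    }
}
```

A JUnit test for these methods would assert on the returned name and tags and on the rejected inputs, in the same way as the SLA tests earlier in this section.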

The testing pyramid applies to infrastructure the same way it applies to application code:

Unit tests catch configuration errors before any cloud resources are created. Integration tests (deploying to ephemeral stacks) verify that your infrastructure actually works end-to-end.

A Note on Policies

Pulumi’s Policy-as-Code feature (CrossGuard) enables organization-wide guardrails: blocking unencrypted storage, enforcing tagging, and limiting instance sizes. Currently, policies can only be authored in TypeScript or Python, but they apply to resources created by Pulumi programs in any language.

Policies and tests serve different purposes. Tests verify your component logic during development. Policies enforce organizational standards during deployment, regardless of which team wrote the code or what language they used.

6. CI/CD: One Pipeline for Application and Infrastructure

Since the infrastructure is written in Java, it fits into the same CI/CD pipeline as the application.

Our GitHub Actions workflow has two jobs:

For pull requests, the pipeline runs JUnit tests, then executes pulumi preview. This computes a diff between the current infrastructure and the desired state, without modifying anything. The diff is posted as a PR comment, so reviewers can see exactly which resources would be created, updated, or deleted before the code is merged.

On merge to main, the pipeline runs the same tests, then executes pulumi up. Pulumi builds the application container, pushes it to ECR, and provisions or updates all resources. If any step fails, the update stops. Resources already created remain in place, and the state file records partial progress, so the next pulumi up picks up where it left off.

Infrastructure changes become pull requests. When a developer needs to add an environment variable to the ECS task or resize the database, they change the infrastructure code in the same PR. Tests run, reviewers see the diff, approve, and the change rolls out to production.

Environments

Different environments use the same code with different configurations. Dev gets a single small RDS instance and one ECS task. Production gets the Aurora cluster with read replicas. Same Java code, different config files, completely separate state.
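One way to connect a per-stack setting to the SLA tier is a small parsing method that fails fast on unknown values. This is a sketch under assumptions: the configuration key slaTier and the class name are hypothetical, and in a Pulumi program the raw string would typically come from the stack configuration, for example via ctx.config().require("slaTier"). The parsing itself is plain Java and can be unit tested.

```java
import java.util.Locale;

final class StackSettings {

    enum SLATier { BRONZE, SILVER, GOLD }

    // Parses the tier from a configuration string, failing fast on unknown values.
    static SLATier parseTier(String raw) {
        if (raw == null || raw.isBlank()) {
            throw new IllegalArgumentException("slaTier is not set");
        }
        try {
            return SLATier.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "unknown slaTier '" + raw + "', expected bronze, silver, or gold", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseTier("bronze"));   // a dev stack might use this value
        System.out.println(parseTier("gold"));     // a production stack might use this one
    }
}
```

With this in place, a typo in a stack's configuration surfaces as a clear error during preview instead of a silently wrong environment.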

Secrets like database passwords work the same way across different stacks: each environment has its own encrypted values in its config file. In CI, the decryption passphrase is usually injected as a pipeline secret. You can also inject secrets from AWS Secrets Manager, GitHub Actions secrets, or other sources your organization uses.

The same pattern works in any CI/CD system. GitLab CI, Jenkins, Azure DevOps: they all just run pulumi preview on PRs and pulumi up on merges.

7. Reusable Components as Shared Libraries

Components can be packaged as JARs and published to your internal Maven repository. Teams consume them as Maven/Gradle dependencies:

dependencies {
    implementation 'com.yourcompany.platform:container-service:2.1.0'
    implementation 'com.yourcompany.platform:managed-database:1.5.0'
}

This changes the platform team’s role. Instead of being gatekeepers who manually provision infrastructure, they become library authors who encode organizational standards into versioned components. Application teams consume these components and operate independently, within guardrails enforced by the component code itself.

When the platform team needs to roll out a change, enabling Performance Insights on all Gold-tier databases, for instance, they update the component and publish a new version. Teams adopt it on their own schedule, just as they’d adopt any library update.

Pulumi also supports multi-language components: write a component in Java, and Pulumi generates typed SDKs for TypeScript, Python, and Go. This matters in polyglot organizations where standardizing on one language isn’t realistic.

8. Working with Existing Infrastructure

The question always comes up: “We have hundreds of resources already running in production. We can’t start over.”

You don’t have to.

Importing Resources

Pulumi can adopt existing cloud resources into its state, either with the pulumi import CLI command or with the import resource option in code. With the resource option, you write the Java code describing a resource, then tell Pulumi to import it instead of creating a new one:

var existingDb = new Instance("legacy-db", InstanceArgs.builder()
    .identifier("production-user-db")
    .engine("postgres")
    .instanceClass("db.t3.medium")
    .allocatedStorage(100)
    // ... other properties matching current state
    .build(), CustomResourceOptions.builder()
        .importId("production-user-db")  // identifier of the existing RDS instance to adopt
        .build());

Pulumi reads the current state from AWS, verifies it matches your code, and adopts the resource. From that point forward, drift between code and actual infrastructure is detected on every pulumi preview.

Coexisting with Terraform

Pulumi and Terraform can manage different resources in the same environment without conflict: they maintain separate state. Start new services with Pulumi while existing infrastructure stays in Terraform. Using the terraform-state provider, Pulumi can also read Terraform state files to reference outputs, allowing your Pulumi program to look up a database connection string from Terraform-managed resources without migrating them.

Converting YAML to Code

The pulumi convert command converts CloudFormation, Terraform HCL, and Kubernetes YAML to Pulumi code in any supported language:

# Convert Kubernetes YAML to Java
pulumi convert --from kubernetes --language java --out ./pulumi-app

# Convert CloudFormation to Java
pulumi convert --from cloudformation --language java --out ./pulumi-infra

The generated code is a starting point, not a finished product. The value comes after conversion: five copy-pasted database definitions become a loop, repeated configuration becomes a method, and hardcoded values become parameters.

When to Migrate

Not everything needs to be migrated. Stable Terraform modules that manage infrastructure that rarely changes can remain unchanged. Focus on infrastructure that changes frequently, is hard to manage in its current form, or is tied to services under active development.

The pragmatic question isn’t “should we migrate everything?” but “where will migration actually reduce errors or speed up the team?” Infrastructure nobody touches can stay as-is until there’s a real reason to change it.

9. Trade-offs

Using a general-purpose language for infrastructure has costs. They are worth understanding before committing to the approach.

Verbosity for simple cases. A domain-specific language optimized for resource declarations can be more concise. A three-line HCL block expands to six or seven lines of Java with builder and .build() calls. If your infrastructure consists of only a few static resources with no conditional logic, a DSL might be the better choice. The crossover point comes when infrastructure involves real logic (conditionals, loops, and environment-specific behavior); that is where DSLs begin to strain.

Deferred values. Any tool that lets you write imperative code against resources that don’t exist yet needs a way to represent values that are not yet known. Pulumi uses Output<T>, CDK uses Token. This is not a quirk of a specific tool; it is a consequence of the model. Composing these deferred values adds friction: string concatenation becomes Output.format(), and code that threads outputs through multiple resources reads more like reactive programming than straightforward Java.

Team readiness. Writing infrastructure in Java does not turn infrastructure engineers into Java developers, and writing Java does not make application developers infrastructure-literate. The approach works best when teams already have some overlap between these skills or are willing to deliberately develop it. Without that investment, a shared language can become a shared source of confusion.

These costs are front-loaded. The return increases with complexity: the more logic, environments, and cross-team reuse your infrastructure involves, the more the investment pays off.

10. Conclusion

We covered a working example: a Quarkus service with its entire infrastructure defined in Java. The infrastructure code uses standard language features: switch expressions for SLA tier mappings, loops for read replicas, and records for configuration data. Tests are run with JUnit, deployment is handled by GitHub Actions, and the components are reusable Java classes.

The central argument is not about Pulumi specifically. It is that infrastructure code deserves the same engineering practices we apply to application code: type safety, testing, refactoring, and code review. Tools like Pulumi and AWS CDK make this possible. Whether the approach fits your team depends on the complexity of your infrastructure and the extent to which it involves real logic rather than static resource declarations.

The complete implementation (components, tests, CI/CD workflow) is at github.com/wlami/stop-writing-yaml-javapro.

This article is part of the JAVAPRO magazine issue:

From Coder To System Designer
