The first time I wrote an Ansible role with tests, I thought I was wasting time. Writing a failing test, then code to make it pass, for something as deterministic as “install this package.” The indirection felt pointless.

The third time I caught a regression before it hit a live host, I stopped complaining about it.

TDD for infrastructure is the same discipline as TDD for application code. Same reasons it matters, same objections people have, same payoff when you stick with it.

Why infrastructure needs tests more than application code

Application code fails loudly. Stack trace. Broken build. Fast feedback.

Infrastructure fails quietly. A misconfigured service starts fine but behaves wrong. A missing package doesn’t matter until something tries to use it three months later. A task that’s idempotent on Fedora silently breaks on Ubuntu. You find out when you’re running a playbook on a production host at midnight and something’s not where it should be.

Tests close that loop. Running Molecule against a role before pushing is what gives you the same confidence in your infrastructure code that a test suite gives you in application code.

The 95% threshold

The Mosburn Lab holds 95% coverage for both tests and documentation:

  • Every task in every role has a Molecule test verifying the intended end state
  • Every variable in defaults/main.yml has documentation in the role README
  • Every playbook has a comment header explaining what it does and how to run it

95% isn’t 100%. The gap is for tasks whose outcome is genuinely hard to assert inside an isolated Molecule run: waiting on an external API after a service starts, or verifying that a database migration ran correctly. Those get integration tests rather than unit tests. They still get tests.
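The comment-header rule from the list above is the easiest one to picture. A minimal sketch, assuming a playbook called keycloak.yml, an inventory at inventory/hosts.yml, and a host group named keycloak (all three are placeholders, not names from the lab):

```yaml
---
# Playbook: keycloak.yml (hypothetical file name)
# Purpose:  Deploy Keycloak to the hosts in the "keycloak" group
#           using the mosburn.keycloak role.
# Usage:    ansible-playbook -i inventory/hosts.yml keycloak.yml
# Notes:    Role variables are described in the role README.
- name: Deploy Keycloak
  hosts: keycloak        # assumed inventory group
  become: true
  roles:
    - role: mosburn.keycloak
```

The exact fields matter less than the habit: what it does and how to run it, readable without opening the role.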

The workflow

For every new role or task:

  1. Write a Molecule scenario asserting the desired end state
  2. Run Molecule — confirm the test fails for the right reason
  3. Write the task
  4. Run Molecule — confirm it passes
  5. Run molecule idempotence to confirm a second converge reports no changes

Step 2 is not optional. A test that passes before you write the code isn’t testing anything. It’s wrong documentation waiting to burn you.

# Start a new role
ansible-galaxy role init roles/mosburn.newrole
cd roles/mosburn.newrole
molecule init scenario

# Write the test first
vim molecule/default/verify.yml

# Confirm it fails
molecule test

# Write the role tasks
vim tasks/main.yml

# Confirm it passes
molecule test

# Confirm idempotence
molecule converge
molecule idempotence
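For the “install this package” case, the verify.yml written in the “test first” step could look roughly like this. It is a sketch under assumptions: chrony is a placeholder package name, and package_facts needs the distribution’s Python package bindings available on the test instance.

```yaml
---
# molecule/default/verify.yml (sketch; "chrony" is a placeholder package)
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Gather the list of installed packages
      ansible.builtin.package_facts:
        manager: auto

    - name: Assert the package is installed
      ansible.builtin.assert:
        that:
          - "'chrony' in ansible_facts.packages"
        fail_msg: "chrony is not installed"
        success_msg: "chrony is installed"
```

Before tasks/main.yml exists, the run should stop on the “chrony is not installed” message. If it stops on a syntax error or an unreachable instance instead, the test is failing for the wrong reason and proves nothing yet.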

What a Molecule scenario looks like

For mosburn.keycloak, the verify step checks that the keycloak systemd unit is enabled and active, that Keycloak responds over HTTP on port 8080, and that the docker-compose.yml file is in place:

---
- name: Verify Keycloak deployment
  hosts: all
  tasks:
    - name: Assert keycloak systemd unit is enabled and active
      ansible.builtin.systemd:
        name: keycloak
      register: keycloak_service
      failed_when: >
        keycloak_service.status.ActiveState != 'active' or
        keycloak_service.status.UnitFileState != 'enabled'

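    - name: Assert Keycloak HTTP endpoint responds
      ansible.builtin.uri:
        url: "http://localhost:8080/realms/master"
        status_code: 200
      register: keycloak_http
      # An explicit until makes the retries apply on every ansible-core version
      until: keycloak_http.status == 200
      retries: 10
      delay: 10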

    - name: Assert docker-compose.yml is present
      ansible.builtin.stat:
        path: /opt/keycloak/docker-compose.yml
      register: compose_file
      failed_when: not compose_file.stat.exists

The native installation path gets its own Molecule scenario — different platform image, verify step checking the Keycloak binary and PostgreSQL instead of Docker.
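A verify step for that native scenario might look something like the following. The scenario name, the install path under /opt/keycloak, and PostgreSQL listening on its default port 5432 are assumptions for illustration, not details taken from the role.

```yaml
---
# molecule/native/verify.yml (sketch; scenario name and paths are assumed)
- name: Verify native Keycloak installation
  hosts: all
  gather_facts: false
  tasks:
    - name: Look for the Keycloak launcher script
      ansible.builtin.stat:
        path: /opt/keycloak/bin/kc.sh   # assumed install location
      register: keycloak_binary

    - name: Assert the launcher exists and is executable
      ansible.builtin.assert:
        that:
          - keycloak_binary.stat.exists
          - keycloak_binary.stat.executable | default(false)
        fail_msg: "Keycloak launcher is missing or not executable"

    - name: Assert PostgreSQL accepts connections on its default port
      ansible.builtin.wait_for:
        port: 5432
        timeout: 30
```

Checking the port rather than a unit name keeps the PostgreSQL assertion the same on both distributions, where the service units are laid out differently.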

Multi-distro testing

Every role supporting Fedora and Ubuntu gets a matrix that tests both:

# molecule/default/molecule.yml
platforms:
  - name: fedora
    image: docker.io/fedora:latest
    pre_build_image: true
  - name: ubuntu
    image: docker.io/ubuntu:24.04
    pre_build_image: true

This is where the per-distribution task file pattern earns its keep. tasks/Fedora.yml and tasks/Ubuntu.yml test independently in the matrix. A breakage on one platform doesn’t hide behind a passing test on the other.
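One common way to wire up that pattern is a tasks/main.yml that dispatches on the gathered distribution fact. This is a sketch of the general technique, not necessarily how the mosburn roles do it, and the supported list is illustrative.

```yaml
---
# tasks/main.yml (sketch; requires fact gathering to be enabled)
- name: Fail early on distributions without a task file
  ansible.builtin.assert:
    that:
      # Illustrative list; extend it as task files are added
      - ansible_facts['distribution'] in ['Fedora', 'Ubuntu']
    fail_msg: "Unsupported distribution: {{ ansible_facts['distribution'] }}"

- name: Run the distribution-specific tasks
  # Resolves to tasks/Fedora.yml or tasks/Ubuntu.yml inside the role
  ansible.builtin.include_tasks: "{{ ansible_facts['distribution'] }}.yml"
```

Because each platform in the matrix only ever includes its own file, a failure points straight at the file that caused it.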

Documentation coverage

The 95% doc threshold applies to role variables. Every variable in defaults/main.yml needs a README entry covering what it controls, the default value, and any gotchas.

mosburn.keycloak has 12 variables. All 12 documented. When I need to pass mosburn_keycloak_native_version to a playbook run six months from now, I won’t be grepping the role source to figure out what it does.
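Documentation coverage can be checked mechanically too. A rough sketch, assuming the role lives at roles/mosburn.keycloak and its README is named README.md (both are assumptions): load the defaults, then assert that every variable name appears somewhere in the README text.

```yaml
---
# check-docs.yml (hypothetical helper playbook, run from the repository root)
- name: Check that every default variable is mentioned in the README
  hosts: localhost
  gather_facts: false
  vars:
    role_dir: roles/mosburn.keycloak   # assumed role location
  tasks:
    - name: Load the role defaults and the README text
      ansible.builtin.set_fact:
        role_defaults: "{{ lookup('ansible.builtin.file', role_dir ~ '/defaults/main.yml') | from_yaml }}"
        role_readme: "{{ lookup('ansible.builtin.file', role_dir ~ '/README.md') }}"

    - name: Assert each variable name appears in the README
      ansible.builtin.assert:
        that:
          - item in role_readme
        fail_msg: "{{ item }} is not documented in README.md"
      loop: "{{ role_defaults.keys() | list }}"
```

A substring match is coarse. It proves the name is mentioned, not that the entry explains the default or the gotchas, so it works as a floor for the threshold rather than a replacement for reading the README.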

The discipline pays off

These roles deploy across Fedora, Ubuntu, and Gentoo. Some have been running for months. When I add a new distribution target or change a package name, Molecule tells me before a live host does.

That’s the whole point. Tests aren’t fun to write. But a lab that fails silently isn’t something you can trust, and infrastructure you can’t trust is a liability you happen to own.

Write the test first. Always.