Introduction

Imagine your CI pipeline not just running tests, but also investigating failures, triaging issues, writing release notes, and fixing bugs—automatically. That's exactly what the Coding Agent Sandboxes team at Docker achieved with their Fleet: seven AI agent roles that work together autonomously inside microVM sandboxes. In this step-by-step guide, you'll learn how to replicate this approach using Claude Code skills and a local-first development mindset. Whether you're a solo developer or part of a larger team, these principles will help you build a virtual team that ships faster and reduces manual toil.

How to Create a Fleet of AI Coding Agents for CI Automation — Source: www.docker.com

What You Need

Claude Code (or a compatible AI agent framework that supports skill files)
A sandboxing tool for isolation (e.g., Docker with microVM support, sbx from Docker, or any secure execution environment)
A CI platform (GitHub Actions, GitLab CI, etc.) where agents can run
A project with a CLI tool or service you want to test and maintain
Basic familiarity with YAML/JSON configuration and shell scripting
Version control (Git) for managing skill files

Step-by-Step Guide

Step 1: Define Your Agent Roles

Start by listing the repetitive tasks you want to automate. The Docker team created roles like /cli-tester (exploratory testing), /build-engineer (handling builds and releases), /triage-agent (managing issue backlog), and /release-notes-writer. Each role should have a clear persona, set of responsibilities, and allowed tools. Don't think of these as scripts—think of them as autonomous decision-makers. For example, a build engineer's persona might be: "You are a meticulous build engineer who verifies cross-platform compatibility and detects resource leaks."
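Before writing any prompts, it can help to capture each role as plain data so that gaps (a missing persona, no tools listed) show up early. The sketch below is only an illustration: the AgentRole class, its field names, the validate helper, and the /cli-tester persona wording are assumptions, not part of Claude Code or of Docker's setup. The build-engineer persona is the one quoted above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    """One role in the fleet. Field names are illustrative, not a Claude Code schema."""
    name: str                            # slash-style name, e.g. "/cli-tester"
    persona: str                         # who the agent is and what it cares about
    responsibilities: tuple[str, ...]    # what the role is accountable for
    allowed_tools: tuple[str, ...] = ()  # e.g. "shell", "filesystem", "network"


ROLES = [
    AgentRole(
        name="/cli-tester",
        # Hypothetical persona text, written for this example only.
        persona="You are a curious exploratory tester of the project's CLI.",
        responsibilities=("Build the binaries", "Exercise CLI commands", "Report real issues"),
        allowed_tools=("shell", "filesystem"),
    ),
    AgentRole(
        name="/build-engineer",
        persona=("You are a meticulous build engineer who verifies cross-platform "
                 "compatibility and detects resource leaks."),
        responsibilities=("Handle builds and releases",),
        allowed_tools=("shell", "filesystem", "network"),
    ),
]


def validate(role: AgentRole) -> list[str]:
    """Return a list of problems; an empty list means the role is usable."""
    problems = []
    if not role.name.startswith("/"):
        problems.append("name should start with '/'")
    if not role.persona.strip():
        problems.append("persona is empty")
    if not role.responsibilities:
        problems.append("no responsibilities listed")
    if not role.allowed_tools:
        problems.append("no allowed tools listed")
    return problems
```

Calling validate on every entry in ROLES before turning the roles into skill files is a cheap way to notice a role that has a name but no clear job.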

Step 2: Create Skill Files as Role Descriptions

In Claude Code, a skill is a markdown file that describes the agent's role. It’s not a list of commands to run, but a guide for judgment. For each role, write a skill file that includes:

System prompt with the persona and context.
Available tools (e.g., shell access, file system, network calls).
Decision-making rules: what to do when tests fail unexpectedly, how to prioritize issues, etc.
Success criteria so the agent knows when it's done.

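Save these files in a skills/ directory in your repo, for instance skills/cli-tester.md. Check where your agent framework expects to discover skills: Claude Code, for example, looks for project skills under .claude/skills/, with one directory per skill containing a SKILL.md file.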
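The four parts listed above can be assembled into one markdown file per role. The following is a minimal sketch under stated assumptions: render_skill and write_skill are hypothetical helpers, and the section headings are illustrative rather than a format that Claude Code requires. The example rule and success criterion are invented for the demonstration.

```python
from pathlib import Path


def render_skill(persona, tools, rules, success_criteria):
    """Render persona, tools, decision rules, and success criteria as one markdown document."""
    def bullets(items):
        return "\n".join(f"- {item}" for item in items)

    return (
        f"# Persona\n\n{persona}\n\n"
        f"# Available tools\n\n{bullets(tools)}\n\n"
        f"# Decision-making rules\n\n{bullets(rules)}\n\n"
        f"# Success criteria\n\n{bullets(success_criteria)}\n"
    )


def write_skill(directory, role_name, text):
    """Write the skill to <directory>/<role_name>.md and return the resulting path."""
    path = Path(directory) / f"{role_name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    text = render_skill(
        persona="You are a curious exploratory tester of the project's CLI.",
        tools=["shell", "filesystem"],
        rules=["If a test fails unexpectedly, rerun it once before reporting it."],
        success_criteria=["Every documented CLI command has been exercised."],
    )
    print(text)  # inspect the result before saving it with write_skill(...)
```

Generating the file is optional; writing it by hand works just as well. The point of the sketch is that every skill carries the same four sections, which makes skills easier to compare and review.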

Step 3: Test Locally First

This is the most important principle from Docker's experience. Always iterate on your agent locally before wiring it into CI. Run the skill in your terminal with Claude Code and watch the agent work: does it build binaries? Does it exercise CLI commands? Does it find real issues? Adjust the skill file until the behavior matches your expectations. Each local iteration is just an edit and a rerun, so you avoid the painful commit-push-wait-read-logs loop.
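A small wrapper can make the local loop repeatable. This sketch assumes the Claude Code CLI is installed as claude and uses its print mode (claude -p) for a one-shot, non-interactive run. The prompt wording, the build_command and run_locally names, and the dry-run default are assumptions made for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(skill_path):
    """Build argv for a one-shot run that points the agent at a skill file."""
    # The prompt wording is an assumption; adapt it to how your agent loads skills.
    prompt = f"Read {Path(skill_path).as_posix()} and carry out the role it describes."
    return ["claude", "-p", prompt]


def run_locally(skill_path, dry_run=True):
    """Print the command; execute it only when dry_run is False and the CLI is on PATH."""
    cmd = build_command(skill_path)
    print("would run:" if dry_run else "running:", cmd)
    if dry_run:
        return None
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not on PATH")
    # check=False: a non-zero exit is information to inspect, not a crash.
    return subprocess.run(cmd, check=False).returncode
```

Calling run_locally("skills/cli-tester.md") only prints the command; pass dry_run=False once you are ready to let the agent act. For closer observation, an interactive claude session works too.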

Step 4: Isolate Agents with Sandboxes

Agents need full autonomy without touching your host system. Docker's Coding Agent Sandboxes (sbx) provides secure microVM-based isolation: each agent gets its own Docker daemon, network, and filesystem. You can use another tool with comparable isolation (e.g., Firecracker microVMs or a managed sandbox service). Docker-in-Docker is also an option, but containers share the host kernel, so they isolate less strongly than microVMs. Ensure the sandbox is configured to mount your workspace and grant the permissions the agent needs (like running sbx commands or installing dependencies).
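To show the shape of "mount the workspace, nothing else", here is a sketch that wraps an agent command in a plain docker run invocation. It is a stand-in only: an ordinary container does not give the microVM isolation described above, and it does not reproduce how sbx works. The sandbox_command helper and the image name in the usage note are hypothetical.

```python
from pathlib import Path


def sandbox_command(workspace, image, agent_cmd):
    """Wrap agent_cmd in `docker run` with only the workspace mounted from the host."""
    host_dir = Path(workspace).resolve()
    return [
        "docker", "run",
        "--rm",                                # discard the container when the agent exits
        "--volume", f"{host_dir}:/workspace",  # the only host path the agent can see
        "--workdir", "/workspace",             # start the agent inside the mounted repo
        image,
        *agent_cmd,
    ]
```

For example, sandbox_command(".", "example/agent-image:latest", ["claude", "-p", "..."]) returns an argv list you can pass to subprocess.run. Building the list separately from running it keeps the sandbox boundary easy to review.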

Step 5: Wire Agents into CI with a Thin Workflow

Once a skill is polished locally, create a CI workflow that calls the same skill file. The Docker team uses GitHub Actions with matrix builds for macOS, Linux, and Windows. The workflow is minimal: set up the environment, check out the code, and invoke the skill with the same command you use locally (with Claude Code, for example, a non-interactive claude -p run whose prompt points at skills/cli-tester.md). There is no separate "CI version" of the skill: just one file, two runtimes. This consistency ensures that the skill you test locally is exactly the skill that runs in CI.
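The "one file, two runtimes" idea can be sketched as a thin entrypoint that treats the skill path as fixed and lets only the surroundings vary. The example assumes the CI platform sets the CI environment variable to true, as GitHub Actions and GitLab CI do. The plan_run and matrix_jobs helpers, the platform labels, and the agent-logs directory name are hypothetical.

```python
import os
from itertools import product

PLATFORMS = ["macos", "linux", "windows"]  # labels only; map them to your CI's runner names


def plan_run(skill_path, env=None):
    """Describe a run; only the runtime and log location differ between laptop and CI."""
    env = os.environ if env is None else env
    in_ci = env.get("CI", "").lower() == "true"
    return {
        "skill": skill_path,                        # identical in both runtimes
        "runtime": "ci" if in_ci else "local",
        "log_dir": "agent-logs" if in_ci else None,  # hypothetical artifact directory
    }


def matrix_jobs(platforms, skills):
    """Expand platforms x skills into one job description per combination."""
    return [{"os": os_name, "skill": skill} for os_name, skill in product(platforms, skills)]
```

A CI workflow step would then do little more than call this entrypoint, which keeps the workflow file thin and leaves all judgment in the skill.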

Step 6: Orchestrate a Fleet of Agents

To create a virtual team, define multiple skill files and schedule them in your CI. For example:

Nightly runs: /cli-tester runs cross-platform tests; /triage-agent scans new issues and adds labels.
On release: /build-engineer builds and tests upgrade paths; /release-notes-writer generates release notes from commit history.
On demand: /bug-fixer attempts to reproduce and patch reported bugs.

Each agent runs in its own sandbox, independent and isolated. They can even communicate by posting results to a shared issue tracker or Slack channel.
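The schedule above can be kept as a small lookup table that a CI entrypoint consults. This is a sketch, not Docker's implementation: the trigger keys, the agents_for and skill_path helpers, and the skills/ default directory are assumptions chosen to match the examples in this guide.

```python
SCHEDULE = {
    "nightly": ["/cli-tester", "/triage-agent"],
    "release": ["/build-engineer", "/release-notes-writer"],
    "on_demand": ["/bug-fixer"],
}


def agents_for(trigger):
    """Return the roles to launch for a trigger, or raise ValueError for an unknown one."""
    try:
        return list(SCHEDULE[trigger])  # copy so callers cannot mutate the schedule
    except KeyError:
        known = ", ".join(sorted(SCHEDULE))
        raise ValueError(f"unknown trigger {trigger!r}; expected one of: {known}") from None


def skill_path(role, skills_dir="skills"):
    """Map a slash-style role name to its skill file, e.g. /cli-tester -> skills/cli-tester.md."""
    return f"{skills_dir}/{role.lstrip('/')}.md"
```

Each role returned by agents_for would be launched in its own sandbox. Keeping the table in one place makes it obvious which agents run when.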

Step 7: Monitor, Iterate, and Expand

Treat your agent fleet as living software. Review their decisions—did they miss an issue? Did they misunderstand a failing test? Update the skill files to improve judgment. Add new roles as your project grows. The Docker team emphasizes that a good skill file is like a good employee manual: it's clear, contextual, and gives the agent the freedom to act intelligently.

Tips for Success

Start small: Begin with one or two roles (e.g., tester and triage) before scaling to a full fleet.
Invest in skill quality: Spend time refining the persona and decision rules. A vague skill leads to unpredictable agents.
Use version control: Keep skill files under Git; this allows you to roll back if an agent's behavior changes unexpectedly.
Log agent actions: In CI, capture the agent's terminal output and any artifacts so you can audit what happened.
Combine human and AI: Let agents handle the routine; humans can focus on strategic decisions and rare edge cases.
Revisit monthly: As your project evolves, update role definitions to reflect new features or workflows.
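For the logging tip, one possible approach is to capture everything the agent prints into a timestamped file that CI can upload as an artifact. The run_and_log helper and the file naming scheme are assumptions for illustration.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_and_log(cmd, log_dir, role):
    """Run cmd, save combined stdout and stderr to a timestamped log, return (exit_code, log_path)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    log_path = Path(log_dir) / f"{role}-{stamp}.log"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # stderr is merged into stdout so the log preserves the order of events.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    log_path.write_text(result.stdout, encoding="utf-8")
    return result.returncode, log_path
```

Because the log name includes the role and a UTC timestamp, runs from different agents and different nights can sit side by side in the same artifact directory.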

By following these steps, you can build a virtual agent team that ships faster and reduces developer burnout. The key is turning your CI pipeline from a passive verification step into an autonomous problem-solving workforce—all starting with a simple skill file on your laptop.
