From a Jira ticket to a pull request: the end-to-end autonomous system
How I turned the agent into an autonomous system: a Jira ticket triggers it (webhook or polling), a queue with persistent state processes it exactly once, and it returns a PR and a Jira comment.
Every previous post built the worker: an agent loop with tools that runs isolated in a real repo, reproduces the bug with a test, and ships a pull request. What was missing is what starts it and what reports back when it finishes. This post is that wrapper: making a Jira ticket trigger the whole system without anyone pressing a button, processing it exactly once even if the trigger repeats, running it in an ephemeral workspace per ticket, and returning the result as a PR and a comment on the ticket itself. It’s where the series lands: the agent stops being something you run by hand and becomes a service that reacts to tickets.
TL;DR
- The trigger doesn't run the agent: it only enqueues. A Jira webhook gives low latency but drops events; JQL polling is slow but loses nothing. Use the webhook to react fast and polling as the net that recovers what the webhook missed.
- The property that holds everything together is idempotency: process each ticket exactly once. You get it with persistent state in a database and an atomic claim, not with an in-memory flag that's lost when the process restarts.
- Each ticket runs in its own ephemeral workspace (the worktree from the previous post) and the output is a PR plus a Jira comment with the link. A person still decides the merge.
In this article:
- Fundamentals — What changes when you make it autonomous · The trigger: polling or webhook
- Implementation — The queue and persistent state · Idempotency · The ephemeral workspace per ticket
- Operation — Reporting the result back to Jira · The end-to-end diagram · Where it breaks
What changes when you make it autonomous
Up to the previous post, the agent was a program you invoked: you handed it a goal, it ran in its worktree, opened the PR, and finished. Someone had to start it. Making it autonomous means removing that manual start: the system watches a source of work (a queue of Jira tickets) and, when one shows up marked as ready, it processes it on its own.
The change isn’t in the agent, which is still the same loop with the same tools. It’s in the three pieces around it: something that learns there’s work to do (the trigger), something that guarantees that work is done once and isn’t lost if the process crashes (the stateful queue), and something that returns the result to where the request was born (the Jira comment). The agent does the work; this post builds what connects it to a real input and a real output.
Before (manual)              Now (autonomous)

you ──► runAgent(goal)       Jira ticket ──► trigger ──► queue
              │                                            │
              ▼                                            ▼
              PR                                        worker ──► agent ──► PR
                                                                             │
                                                                             ▼
                                                                      comment on Jira
The word “autonomous” is scarier than it should be, and it also overpromises. It’s not that the agent decides what to do about the business; it’s that nobody has to copy the ticket text and launch the process by hand. The criterion of what gets done is still set by a person when they mark the ticket, and the criterion of what gets integrated is still set by whoever reviews the PR. Autonomous here means “no intervention in the middle,” not “no control.”
It’s worth being honest about what this system is not: it doesn’t plan, doesn’t prioritize, doesn’t coordinate some tasks with others, doesn’t remember anything from one ticket to the next, and doesn’t orchestrate dependencies between them. It’s a worker that reacts to a queue, one ticket at a time. Everything that follows is the infrastructure that makes that reaction reliable—so the work isn’t lost or duplicated—not an intelligence that decides on its own. If you expect planning or coordination, this isn’t that system; it’s the step before, and it’s the one almost everyone needs first.
The trigger: polling or webhook
The first piece is how the system learns there’s a ticket to work on. There are two ways, and it’s worth understanding the trade-off before choosing.
The webhook is Jira pushing: you configure a URL and Jira sends it a POST every time a ticket changes. It’s low latency (you react in seconds) and you don’t spend calls asking in vain. The cost is that you need a public endpoint and that webhooks get dropped: if your server is down or slow when Jira fires, Jira retries a limited number of times and then discards the event, and nothing brings it back.
Polling is your system asking: every so often you run a JQL query that pulls the tickets marked as ready and enqueue the new ones. It’s simple, needs no public endpoint, and loses nothing, because the next round sees the ticket again if it’s still pending. The cost is latency—you wait for the next cycle—and calls that often bring back nothing.
| | Webhook | Polling |
|---|---|---|
| Latency | Low (seconds) | High (up to one cycle) |
| Public endpoint | Yes | No |
| Drops events | Yes (if you’re down) | No |
| Wasted calls | No | Yes |
| Complexity | Medium | Low |
The polling query is a JQL query that filters by project, by a label that marks the ticket as fit for the agent, and by status:
project = ENG
AND labels = agent-ready
AND status = "To Do"
ORDER BY created ASC
The agent-ready label matters more than it looks: it’s what keeps just any ticket from triggering an agent. Without that filter, every new ticket would spend an agent run, and the LLM calls that implies, without anyone asking for it. The trigger only looks at the tickets someone marked on purpose.
The poller is a loop that runs the JQL and enqueues every ticket it doesn’t already know:
// poller.ts - asks Jira every so often and enqueues what's new.
import { enqueueTicket } from "./queue";

const JQL = 'project = ENG AND labels = agent-ready AND status = "To Do" ORDER BY created ASC';

async function poll() {
  const res = await fetch(
    `https://your-org.atlassian.net/rest/api/3/search?jql=${encodeURIComponent(JQL)}`,
    { headers: { Authorization: `Basic ${process.env.JIRA_AUTH}`, Accept: "application/json" } },
  );
  // fetch does not throw on 4xx/5xx, so check the status explicitly.
  if (!res.ok) throw new Error(`Jira search failed: ${res.status}`);
  const { issues } = await res.json();
  // Enqueuing doesn't process: it only records that the ticket exists.
  // De-dup is handled by the queue, so re-enqueuing the same ticket is harmless.
  for (const issue of issues) {
    await enqueueTicket(issue.key, issue.fields.summary);
  }
}

// One cycle every 30s: the max polling latency is that interval.
// A failed cycle is logged and the next one tries again.
setInterval(() => poll().catch(console.error), 30_000);

# poller.py - asks Jira every so often and enqueues what's new.
import os
import time

import requests

# Caution: a local file named queue.py shadows the standard-library queue
# module, which the HTTP stack under requests imports; rename it in a real project.
from queue import enqueue_ticket

JQL = 'project = ENG AND labels = agent-ready AND status = "To Do" ORDER BY created ASC'


def poll() -> None:
    res = requests.get(
        "https://your-org.atlassian.net/rest/api/3/search",
        params={"jql": JQL},
        headers={"Authorization": f"Basic {os.environ['JIRA_AUTH']}", "Accept": "application/json"},
        timeout=30,
    )
    res.raise_for_status()
    # Enqueuing doesn't process: it only records that the ticket exists.
    # De-dup is handled by the queue, so re-enqueuing the same ticket is harmless.
    for issue in res.json()["issues"]:
        enqueue_ticket(issue["key"], issue["fields"]["summary"])


# One cycle every 30s: the max polling latency is that interval.
while True:
    try:
        poll()
    except requests.RequestException as exc:
        print(f"poll failed, retrying next cycle: {exc}")
    time.sleep(30)

<?php
// poller.php - asks Jira every so often and enqueues what's new.
require "queue.php"; // enqueue_ticket()

const JQL = 'project = ENG AND labels = agent-ready AND status = "To Do" ORDER BY created ASC';

function poll(): void {
    $url = "https://your-org.atlassian.net/rest/api/3/search?jql=" . urlencode(JQL);
    $ctx = stream_context_create(["http" => ["header" =>
        "Authorization: Basic " . getenv("JIRA_AUTH") . "\r\nAccept: application/json"]]);
    $body = file_get_contents($url, false, $ctx);
    if ($body === false) {
        return; // request failed: the next cycle tries again
    }
    $data = json_decode($body, true);
    // Enqueuing doesn't process: it only records that the ticket exists.
    // De-dup is handled by the queue, so re-enqueuing the same ticket is harmless.
    foreach ($data["issues"] ?? [] as $issue) {
        enqueue_ticket($issue["key"], $issue["fields"]["summary"]);
    }
}

// One cycle every 30s: the max polling latency is that interval.
while (true) {
    poll();
    sleep(30);
}

The webhook uses the exact same enqueueTicket: an HTTP handler that validates the event signature, pulls the ticket key out of the payload, and enqueues. That’s why they don’t compete with each other. Whether it’s worth having both depends on your operational context: if your endpoint can be down when Jira fires, or you don’t trust the provider to retry until it delivers, combining webhook and polling lowers the risk of losing events, with the webhook reacting in seconds and polling running every few minutes as the net that recovers what the webhook missed. It’s not a universal rule: there are systems where a webhook with retries, plus persisting the event the moment it arrives, is more than enough. But if you have to pick one for simplicity, polling is the one that doesn’t lose work, and losing work is usually worse than reacting late.
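The webhook side described above can be sketched as a pure function, independent of any web framework. This is a minimal sketch under assumptions, not Jira's documented contract: the signature format (a `sha256=<hex>` HMAC of the raw body), the shared secret, and the names `handle_webhook` and `enqueue` are hypothetical, so check how your Jira instance signs webhooks before relying on it. The payload fields it reads (`issue.key`, `issue.fields.summary`, `issue.fields.labels`) follow the usual shape of Jira issue events.

```python
import hashlib
import hmac
import json
from typing import Callable


def signature_is_valid(raw_body: bytes, header: str, secret: bytes) -> bool:
    """Compare the received signature with an HMAC-SHA256 of the raw body."""
    expected = "sha256=" + hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking, through timing, where the values differ.
    return hmac.compare_digest(expected.encode(), header.encode())


def handle_webhook(
    raw_body: bytes,
    signature_header: str,
    secret: bytes,
    enqueue: Callable[[str, str], None],
) -> int:
    """Return the HTTP status the endpoint should answer with."""
    if not signature_is_valid(raw_body, signature_header, secret):
        return 401  # unsigned or tampered event: do not enqueue
    try:
        issue = json.loads(raw_body)["issue"]
        key = issue["key"]
        labels = issue["fields"].get("labels", [])
        summary = issue["fields"].get("summary", "")
    except (ValueError, KeyError, TypeError, AttributeError):
        return 400  # not an issue event this handler understands
    # Same filter as the polling JQL: only tickets marked on purpose.
    if "agent-ready" not in labels:
        return 204
    # Only enqueue. The worker runs the agent later, so the reply is immediate.
    enqueue(key, summary)
    return 202
```

Because the handler only enqueues, a retried delivery of the same event is harmless: the second call hits the same primary key in the queue and changes nothing.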
The queue and persistent state
Here’s the decision almost everyone gets wrong the first time: the trigger does not run the agent. It only enqueues. The temptation is for the webhook handler to run the agent right there and respond when it’s done, but that breaks on two fronts. First, an agent task takes minutes, and a webhook has to respond in seconds or Jira marks it failed and retries—firing the work again. Second, if the process restarts in the middle of a task that only lived in memory, that work is lost without a trace.
The separation is the fix: the trigger writes a row in a table, and a separate worker picks it up and processes it. The table is the queue, and each ticket’s state lives there, not in a process’s memory. If everything restarts, the state is still in the database and the worker resumes where it left off.
-- The queue is a table. Each ticket's state lives here, not in memory.
CREATE TABLE tickets (
jira_key TEXT PRIMARY KEY, -- ENG-1234: one per ticket, no duplicates
summary TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'queued', -- queued | running | pr_open | failed
attempts INT NOT NULL DEFAULT 0, -- to cut off tickets that always fail
pr_url TEXT, -- filled in when the agent opens the PR
claimed_at TIMESTAMPTZ, -- when a worker took it (to detect hangs)
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
jira_key as the primary key is half of the de-dup: enqueuing the same ticket twice doesn’t create two rows, the second insert collides with the key and does nothing. Enqueuing, then, is an INSERT that ignores the conflict:
// queue.ts - enqueuing is idempotent: the primary key absorbs the duplicates.
import { db } from "./db";

export async function enqueueTicket(jiraKey: string, summary: string) {
  // If the ticket is already in the queue, ON CONFLICT leaves it as is.
  // Re-enqueuing from polling or a repeated webhook creates no duplicate work.
  await db.query(
    `INSERT INTO tickets (jira_key, summary)
     VALUES ($1, $2)
     ON CONFLICT (jira_key) DO NOTHING`,
    [jiraKey, summary],
  );
}

# queue.py - enqueuing is idempotent: the primary key absorbs the duplicates.
from db import db


def enqueue_ticket(jira_key: str, summary: str) -> None:
    # If the ticket is already in the queue, ON CONFLICT leaves it as is.
    # Re-enqueuing from polling or a repeated webhook creates no duplicate work.
    db.execute(
        """INSERT INTO tickets (jira_key, summary)
           VALUES (%s, %s)
           ON CONFLICT (jira_key) DO NOTHING""",
        (jira_key, summary),
    )

<?php
// queue.php - enqueuing is idempotent: the primary key absorbs the duplicates.
require "db.php"; // exposes $db (PDO)

function enqueue_ticket(string $jiraKey, string $summary): void {
    // If the ticket is already in the queue, ON CONFLICT leaves it as is.
    // Re-enqueuing from polling or a repeated webhook creates no duplicate work.
    global $db;
    $stmt = $db->prepare(
        "INSERT INTO tickets (jira_key, summary)
         VALUES (?, ?)
         ON CONFLICT (jira_key) DO NOTHING"
    );
    $stmt->execute([$jiraKey, $summary]);
}

With this, the trigger can re-enqueue the same ticket as many times as it wants (polling will do it every 30 seconds until the status changes) without dirtying the queue. On the normal path the state moves in one direction only: queued → running → pr_open, or failed if the attempts run out.
Idempotency: don’t process the same ticket twice
The primary key prevents enqueuing twice, but the other half is missing: keeping two workers from taking the same ticket at once. With polling and webhook feeding the queue, and maybe several workers running in parallel, this isn’t a rare case, it’s the normal one. If two workers read the queue at the same instant, both see the ticket in queued and both process it: two worktrees, two PRs, two Jira comments for the same bug.
The fix isn’t a flag or an in-memory lock—that’s lost on restart and doesn’t cross between processes. It’s an atomic claim: a single query that moves the ticket from queued to running and, in the same step, hands it back to you. The database guarantees that a single worker wins that UPDATE; the other finds no matching row in queued and moves on.
-- Atomic claim: move from queued to running and return the ticket, in one step.
-- The DB guarantees only ONE worker wins this UPDATE; the others don't see the row.
UPDATE tickets
SET status = 'running',
attempts = attempts + 1,
claimed_at = now()
WHERE jira_key = (
SELECT jira_key FROM tickets
WHERE status = 'queued'
ORDER BY created_at ASC
FOR UPDATE SKIP LOCKED -- don't wait on the row another worker is already claiming
LIMIT 1
)
RETURNING jira_key, summary;
The FOR UPDATE SKIP LOCKED is the fine part: it locks the row this worker is about to claim and makes any other worker arriving in parallel skip it instead of waiting on it. So N workers claim N different tickets without stepping on each other and without waiting on one another. The worker wraps that claim in its loop: take a ticket, process it, and if there’s nothing, wait a moment and try again.
// worker.ts - claim a ticket, process it, repeat. One at a time per worker.
import { db } from "./db";
import { processTicket } from "./process";

const CLAIM = `
  UPDATE tickets SET status='running', attempts=attempts+1, claimed_at=now()
  WHERE jira_key = (
    SELECT jira_key FROM tickets WHERE status='queued'
    ORDER BY created_at ASC FOR UPDATE SKIP LOCKED LIMIT 1)
  RETURNING jira_key, summary`;

async function workerLoop() {
  for (;;) {
    const { rows } = await db.query(CLAIM);
    if (rows.length === 0) {
      await new Promise((r) => setTimeout(r, 5_000)); // empty queue: wait and retry
      continue;
    }
    // We won the claim: this ticket is ours and no one else's.
    await processTicket(rows[0].jira_key, rows[0].summary);
  }
}

workerLoop();

# worker.py - claim a ticket, process it, repeat. One at a time per worker.
import time

from db import db
from process import process_ticket

CLAIM = """
  UPDATE tickets SET status='running', attempts=attempts+1, claimed_at=now()
  WHERE jira_key = (
    SELECT jira_key FROM tickets WHERE status='queued'
    ORDER BY created_at ASC FOR UPDATE SKIP LOCKED LIMIT 1)
  RETURNING jira_key, summary"""


def worker_loop() -> None:
    while True:
        rows = db.query(CLAIM)
        if not rows:
            time.sleep(5)  # empty queue: wait and retry
            continue
        # We won the claim: this ticket is ours and no one else's.
        process_ticket(rows[0]["jira_key"], rows[0]["summary"])


worker_loop()

<?php
// worker.php - claim a ticket, process it, repeat. One at a time per worker.
require "db.php";      // $db (PDO)
require "process.php"; // process_ticket()

const CLAIM = "
  UPDATE tickets SET status='running', attempts=attempts+1, claimed_at=now()
  WHERE jira_key = (
    SELECT jira_key FROM tickets WHERE status='queued'
    ORDER BY created_at ASC FOR UPDATE SKIP LOCKED LIMIT 1)
  RETURNING jira_key, summary";

function worker_loop(): void {
    global $db;
    while (true) {
        $row = $db->query(CLAIM)->fetch(PDO::FETCH_ASSOC);
        if (!$row) {
            sleep(5); // empty queue: wait and retry
            continue;
        }
        // We won the claim: this ticket is ours and no one else's.
        process_ticket($row["jira_key"], $row["summary"]);
    }
}

worker_loop();

It’s worth being precise about what this guarantees and what it doesn’t. The trigger delivers at least once: the webhook retries and polling sees the ticket again, so the same work can arrive several times. What this design achieves isn’t exactly-once delivery, which in practice doesn’t exist, but at-least-once delivery plus an idempotent effect: the ticket can be enqueued and claimed many times, but it’s processed only once. The difference matters when the effect leaves the system (opening a PR, commenting on Jira), because those steps also have to be idempotent: “only once” rests on the effect being applied once, not on the message arriving once.
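One way to make the outward effect idempotent is to look before acting: ask whether a PR for the ticket's branch already exists and reuse it instead of opening a second one. The following is a minimal sketch, assuming the agent/<key> branch naming used in this series; `ensure_pr`, `find_open_pr`, and `create_pr` are hypothetical names, with the two callables standing in for whatever client you use against your Git host.

```python
from typing import Callable, Optional


def ensure_pr(
    jira_key: str,
    find_open_pr: Callable[[str], Optional[str]],
    create_pr: Callable[[str, str], str],
) -> str:
    """Return the PR URL for this ticket, creating the PR only if none exists."""
    # The branch name is derived from the ticket key, so every retry of the
    # same ticket looks for (and finds) the same PR.
    branch = f"agent/{jira_key}"
    existing = find_open_pr(branch)
    if existing is not None:
        return existing
    return create_pr(branch, f"{jira_key}: automated fix")
```

The deterministic branch name is what does the work here: a second run for the same ticket has something stable to look up. The same check-then-act shape applies to the Jira comment, for example by searching the ticket's comments for the PR URL before posting.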
The claimed_at field covers the ugly case: a worker claims a ticket, moves it to running, and crashes before finishing. That ticket stays in running forever and no one takes it again, because the claim only looks at queued rows. The net is a periodic sweep that returns to queued the tickets that have been in running too long—an expired claim—so another worker can pick them up. It’s the equivalent of a lease with an expiry: claiming a ticket isn’t forever, it’s for a while.
That sweep has an edge worth seeing: if you return to queued a ticket whose worker was actually still alive—just slow—you end up with two workers on the same ticket. That’s why the lease shouldn’t be a fixed, blind timeout. The worker processing a long task renews its claim while it works—a heartbeat that pushes claimed_at forward every so often—and the sweep only recovers tickets whose claim truly expired because the worker stopped beating. Even so, recovery can overlap with a lagging worker, and it’s another reason the final effect—the PR, the comment—has to be idempotent: it’s the last net for when the lease gets it wrong.
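The heartbeat and the sweep can be sketched as two small functions over the tickets table. This is a sketch under assumptions: it uses SQLite from the standard library so it runs on its own (the statements port to Postgres with `%s` placeholders and `now()`), it stores claimed_at as epoch seconds, and `LEASE_SECONDS`, `MAX_ATTEMPTS`, `heartbeat`, and `sweep_expired` are hypothetical names and values, not part of the design above.

```python
import sqlite3
import time
from typing import Optional

LEASE_SECONDS = 15 * 60  # assumed lease length
MAX_ATTEMPTS = 3         # assumed cap before a ticket is parked as failed


def heartbeat(db: sqlite3.Connection, jira_key: str, now: Optional[float] = None) -> None:
    """Called periodically by the worker that owns the ticket to renew its lease."""
    now = time.time() if now is None else now
    db.execute(
        "UPDATE tickets SET claimed_at = ? WHERE jira_key = ? AND status = 'running'",
        (now, jira_key),
    )
    db.commit()


def sweep_expired(db: sqlite3.Connection, now: Optional[float] = None) -> None:
    """Recover tickets whose worker stopped renewing the lease."""
    now = time.time() if now is None else now
    cutoff = now - LEASE_SECONDS
    # Out of attempts: park as failed instead of retrying forever.
    db.execute(
        "UPDATE tickets SET status = 'failed' "
        "WHERE status = 'running' AND claimed_at < ? AND attempts >= ?",
        (cutoff, MAX_ATTEMPTS),
    )
    # Whatever is still running and expired has attempts left: back to the queue.
    db.execute(
        "UPDATE tickets SET status = 'queued', claimed_at = NULL "
        "WHERE status = 'running' AND claimed_at < ?",
        (cutoff,),
    )
    db.commit()
```

The lease length should comfortably exceed the heartbeat interval, so a single missed beat does not hand a live ticket to a second worker.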
The ephemeral workspace per ticket
With the ticket claimed, the processing itself is almost everything the previous post already built: each ticket gets its own isolated workspace—a git worktree on a new branch—the agent works inside it, and when it’s done the branch has the commits ready for the PR. The jira_key is the identifier that keeps everything separate: the branch is agent/ENG-1234, the worktree is a folder with that name, and the PR carries the ticket key in its title.
“Ephemeral” is the word that matters. The workspace is created when the worker claims the ticket and destroyed once the PR is open; nothing lingers between one ticket and the next. That gives you two things: two tickets running at once don’t see each other—each in its worktree—and a ticket that goes wrong leaves no garbage behind, because its workspace is deleted whole. It’s the same isolation from the previous post, now triggered by the queue instead of your hand.
What I show is the skeleton of the workspace. A real system adds around it what every real run needs and that I omit here to keep the thread on the trigger and the queue: dependency caches between tickets, logs of each run, build artifacts, coverage reports, and the cleanup of all that when the ticket finishes.
worker claims ENG-1234
        │
        ▼
create worktree + branch agent/ENG-1234       ◄── isolation from post 6
        │
        ▼
run the agent (write-test-fix, whole suite)   ◄── the series' loop
        │
        ▼
open the PR ──► delete the local worktree ──► the branch stays, the workspace doesn't
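The create-and-destroy lifecycle in the diagram maps naturally onto a context manager. A minimal sketch, assuming git is on the PATH; the `ephemeral_worktree` and `branch_for` helpers, the `wt-<key>` folder name next to the repo, and the default base branch `main` are hypothetical choices for illustration.

```python
import subprocess
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


def branch_for(jira_key: str) -> str:
    """Branch name derived from the ticket key, e.g. agent/ENG-1234."""
    return f"agent/{jira_key}"


@contextmanager
def ephemeral_worktree(repo: Path, jira_key: str, base: str = "main") -> Iterator[Path]:
    """Create a worktree on a new branch and always remove it on exit."""
    path = repo.parent / f"wt-{jira_key}"
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add", "-b", branch_for(jira_key), str(path), base],
        check=True,
    )
    try:
        yield path
    finally:
        # --force also removes a worktree left dirty by a failed run.
        # The branch itself survives, which is what the PR points at.
        subprocess.run(
            ["git", "-C", str(repo), "worktree", "remove", "--force", str(path)],
            check=True,
        )
```

Usage is `with ephemeral_worktree(repo, "ENG-1234") as workdir:` around the agent run. Note that `-b` fails if the branch already exists, so a retried ticket needs a decision: reuse the branch, or recreate it.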
One layer that in a real system isn’t optional: that worktree should run inside an ephemeral container, not directly on the worker’s machine. The agent runs commands and code that came out of a model based on the text of a ticket anyone could write; the sandbox is what keeps a malicious ticket—or simply a wrong command—from touching anything outside its box. The queue brings work from a source you don’t fully control, so process isolation stops being a luxury.
Opening the PR and reporting the result back to Jira
The agent opens the PR the same way as in the previous post: it pushes the branch and creates the pull request with the ticket key in the title. What’s new in the autonomous system is the last step—the one that makes it a system and not a loose process: the result returns to Jira. The agent writes a comment on the ticket with the link to the PR and moves the ticket to “In Review,” so the person who opened it sees the result where they asked for it, without having to go hunt for it on GitHub.
It’s two calls to the Jira API: one posts the comment, the other transitions the ticket’s status. The transition uses a numeric id that depends on how your workflow is configured in Jira, not the status name:
// report.ts - the result returns to Jira: comment with the PR + status change.
const JIRA = "https://your-org.atlassian.net/rest/api/3";
const headers = {
  Authorization: `Basic ${process.env.JIRA_AUTH}`,
  "Content-Type": "application/json",
};

export async function reportToJira(jiraKey: string, prUrl: string) {
  // 1. Comment with the link to the PR, where the person who opened the ticket sees it.
  await fetch(`${JIRA}/issue/${jiraKey}/comment`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      body: { type: "doc", version: 1, content: [{ type: "paragraph",
        content: [{ type: "text", text: `PR ready for review: ${prUrl}` }] }] },
    }),
  });
  // 2. Move the ticket to "In Review". The transition id comes from your Jira workflow.
  await fetch(`${JIRA}/issue/${jiraKey}/transitions`, {
    method: "POST",
    headers,
    body: JSON.stringify({ transition: { id: "31" } }),
  });
}

# report.py - the result returns to Jira: comment with the PR + status change.
import os

import requests

JIRA = "https://your-org.atlassian.net/rest/api/3"
headers = {
    "Authorization": f"Basic {os.environ['JIRA_AUTH']}",
    "Content-Type": "application/json",
}


def report_to_jira(jira_key: str, pr_url: str) -> None:
    # 1. Comment with the link to the PR, where the person who opened the ticket sees it.
    requests.post(
        f"{JIRA}/issue/{jira_key}/comment",
        headers=headers,
        json={"body": {"type": "doc", "version": 1, "content": [{"type": "paragraph",
            "content": [{"type": "text", "text": f"PR ready for review: {pr_url}"}]}]}},
    )
    # 2. Move the ticket to "In Review". The transition id comes from your Jira workflow.
    requests.post(
        f"{JIRA}/issue/{jira_key}/transitions",
        headers=headers,
        json={"transition": {"id": "31"}},
    )

<?php
// report.php - the result returns to Jira: comment with the PR + status change.
const JIRA = "https://your-org.atlassian.net/rest/api/3";

function jira_post(string $path, array $body): void {
    $ctx = stream_context_create(["http" => [
        "method" => "POST",
        "header" => "Authorization: Basic " . getenv("JIRA_AUTH") .
            "\r\nContent-Type: application/json",
        "content" => json_encode($body),
    ]]);
    file_get_contents(JIRA . $path, false, $ctx);
}

function report_to_jira(string $jiraKey, string $prUrl): void {
    // 1. Comment with the link to the PR, where the person who opened the ticket sees it.
    jira_post("/issue/$jiraKey/comment", ["body" => ["type" => "doc", "version" => 1,
        "content" => [["type" => "paragraph",
            "content" => [["type" => "text", "text" => "PR ready for review: $prUrl"]]]]]]);
    // 2. Move the ticket to "In Review". The transition id comes from your Jira workflow.
    jira_post("/issue/$jiraKey/transitions", ["transition" => ["id" => "31"]]);
}

With the comment posted and the ticket in “In Review,” the worker marks the ticket as pr_open in its own table and saves the pr_url. That’s the terminal state on the agent’s side: it did its part and returned the result. What follows (reviewing the PR, requesting changes, merging) is human work, and that’s on purpose. The previous post already argued why the output is a PR and not a merge: green tests don’t guarantee the change is correct, only that it didn’t break what you know how to verify. The Jira comment doesn’t change that; it just puts the human checkpoint where people already look.
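Since the transition id depends on each workflow, a hardcoded "31" will not travel between projects. Jira lists the transitions available for an issue at GET /issue/{key}/transitions, and the id can be picked from that response by destination status name. A minimal sketch of that selection step; `find_transition_id` is a hypothetical helper, and the payload in the usage note is an illustrative sample, not output from a real instance.

```python
from typing import Optional


def find_transition_id(transitions_payload: dict, target_status: str) -> Optional[str]:
    """Return the id of the transition that leads to target_status, if any."""
    wanted = target_status.lower()
    for transition in transitions_payload.get("transitions", []):
        # Each transition carries the status it leads to under "to".
        if transition.get("to", {}).get("name", "").lower() == wanted:
            return transition["id"]
    return None  # the workflow offers no such transition from the current status
```

Given a response like `{"transitions": [{"id": "31", "name": "Start review", "to": {"name": "In Review"}}]}`, the helper returns `"31"`. A `None` result is worth treating as a reporting failure rather than skipping silently, because it usually means the ticket is not in the status the workflow expects.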
The end-to-end diagram
With every piece in place, this is the complete system, from a ticket to a PR, with each stage resting on a post in the series:
┌─────────────┐
│ Jira ticket │   someone tags it with the agent-ready label
└──────┬──────┘
       │
 ┌─────┴─────┐    TRIGGER
 │  webhook  │    (low latency, drops events)
 │     +     │
 │  polling  │    (net that recovers what was lost)
 └─────┬─────┘
       │  enqueueTicket() → INSERT ON CONFLICT DO NOTHING
       ▼
┌─────────────┐   QUEUE + STATE (tickets table)
│   queued    │   state lives in the DB, not in memory
└──────┬──────┘
       │  atomic claim (UPDATE ... FOR UPDATE SKIP LOCKED)
       ▼
┌─────────────┐   a single worker wins the ticket → idempotency
│   running   │
└──────┬──────┘
       │  worktree + branch agent/ENG-1234   ◄── post 6: isolation
       │  ephemeral container                ◄── post 3: sandbox
       │  write-test-fix + whole suite       ◄── posts 2 and 4
       ▼
┌─────────────┐   opens the PR (not merge)   ◄── post 6: PR as output
│   pr_open   │   comment on Jira + "In Review"
└──────┬──────┘
       │
       ▼
  human review ──► merge ──► cleanup
Read top to bottom, each band is a decision the series built up: the loop and its done criterion, the sandbox, the evaluator that defines what’s correct, per-task isolation, and the PR as output. This post added the three bands at the ends—the trigger, the stateful queue, and the return to Jira—which are the ones that turn an agent you run by hand into a system that reacts to tickets on its own.
What changes most when you complete the system isn’t that the agent is smarter, it’s that it stops needing you to start. A ticket tagged in the morning can have a PR opened and commented before anyone looks at it, and if the process restarted three times in between, the state in the table made the work happen exactly once.
Where it breaks
The system is solid, but it has edges worth knowing before you turn it loose on a real queue:
- Not every ticket should trigger an agent. Without the label filter, every new ticket spends an agent run and its LLM calls. The agent-ready label is what keeps the cost under control; combine it with model routing so that each run also uses the cheapest model that solves the task.
- Poison tickets block or spend endlessly. A ticket the agent never manages to solve would retry forever if you don’t cut it off. The attempts counter is the brake: past a limit, the ticket goes to failed instead of back to the queue, and someone gets a heads-up. It’s a dead-letter queue, not an infinite retry.
- Not all errors deserve the same retry. This design’s attempts counter treats a 429 from the LLM, a network timeout, and a ticket whose task is impossible or badly written all the same. The first two are transient and resolve by retrying, ideally with a backoff; the third doesn’t get fixed by repeating it and should go straight to failed. A retry that doesn’t distinguish the error type spends agent runs on tasks that will never pass. Classifying the failure before deciding whether to retry is the next step past the simple counter.
- The webhook drops events and polling arrives late. Neither is complete on its own. That’s why they go together: the webhook gives the latency and polling gives the guarantee. If you set up only the webhook, sooner or later a ticket goes unprocessed and no one finds out.
- Concurrency is limited by resources, not the queue. The atomic claim lets N workers run without stepping on each other, but each one reinstalls dependencies and runs the whole suite, just like in the previous post. The ceiling on how many tickets you process at once is the machine, not SKIP LOCKED.
- The rate limit that matters is almost never Jira’s. An aggressive poller or many workers commenting at once can hit the Jira API, yes, but the real bottleneck usually sits in the LLM, the GitHub API, and CI: that’s where many parallel workers run into rate limits and build queues. Space out the polling and batch the writes to Jira, but above all control concurrency against the model and CI, which is where the cost and the waiting pile up.
- The agent has no state of its own. The ticket has state and the queue has state, but the agent loop doesn’t: if it crashes mid-run, after several tools and several model calls, there’s no checkpoint or journal to resume from. The ticket goes back to queued and the task starts from scratch. For short tasks that’s acceptable; for long, expensive tasks, a resume point is the next thing you’ll miss.
- Without observability, the system is a black box. This design includes no metrics, structured logs, or tracing, and the moment there’s more than one worker you’ll want to answer questions it can’t today: which ticket is running and for how long, which worker has it, how many failed and why, how many tokens each run cost, which tool failed. It’s the first addition a system like this asks for the moment it goes from an example to something that operates on its own.
- Secrets travel to every workspace. Just like with the worktree from the previous post, credentials and tokens in each copy widen the exposure surface. Inject the minimum and never leave them in a branch commit.
- Human review is still the checkpoint. The system opens the PR and notifies; it doesn’t judge whether the change is correct. If the PR is merged unread because “the tests pass,” you lost exactly the piece the design reserved for a person.
None of these invalidates the model; they bound it. The system proposes end-to-end work with no intervention in the middle, but the decision of what gets asked and what gets integrated stays human.
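The point about error types in the list above ("Not all errors deserve the same retry") can be sketched as a small decision function that runs when a ticket fails. It is a sketch under assumptions: the `Failure` categories, the `MAX_ATTEMPTS` and `BASE_DELAY_S` values, and the `decide` name are hypothetical, and mapping a real exception to one of the categories is left to the caller.

```python
from dataclasses import dataclass
from enum import Enum


class Failure(Enum):
    RATE_LIMITED = "rate_limited"  # e.g. a 429 from the LLM
    TIMEOUT = "timeout"            # network hiccup
    IMPOSSIBLE = "impossible"      # the task itself cannot pass


TRANSIENT = {Failure.RATE_LIMITED, Failure.TIMEOUT}
MAX_ATTEMPTS = 3    # assumed cap
BASE_DELAY_S = 30   # assumed first backoff step


@dataclass(frozen=True)
class Decision:
    next_status: str    # "queued" or "failed"
    delay_seconds: int  # wait before the retry; 0 when failing


def decide(failure: Failure, attempts: int) -> Decision:
    """Choose between a delayed retry and the dead-letter state."""
    # Permanent failures and exhausted tickets go straight to failed.
    if failure not in TRANSIENT or attempts >= MAX_ATTEMPTS:
        return Decision("failed", 0)
    # Exponential backoff: the delay doubles with each attempt already made.
    return Decision("queued", BASE_DELAY_S * 2 ** max(attempts - 1, 0))
```

With these values, a timeout on the first attempt is retried after 30 seconds and on the second after 60, while an impossible task fails immediately without spending another agent run. Honoring the delay needs one more column in the table (a "not before" timestamp the claim query filters on), which the schema in this post does not have.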
Frequently asked questions
Webhook or polling to trigger the agent?
Both, if you can. The webhook gives you low latency—you react in seconds—but drops events if your server is down when Jira fires. JQL polling is slow but loses nothing, because the next round sees the pending ticket again. The robust combination is webhook to react fast and polling every few minutes as the net that recovers what was lost. If you set up only one, make it polling: losing work is worse than reacting late.
What happens if the worker crashes mid-ticket?
Nothing is lost, because the state lives in the database, not the process. The ticket stays in running with its claimed_at, and a periodic sweep returns to queued the expired claims—the ones stuck in running too long—so another worker can pick them up. It’s a lease with an expiry: claiming a ticket isn’t forever. So a slow worker isn’t mistaken for a crashed one, the claim is renewed with a heartbeat while the task progresses; and as a last safeguard in case the sweep jumps the gun, the final effect—the PR and the comment—is idempotent. The combination of persistent state and atomic claim is what makes the work get processed exactly once even if the process restarts.
Does it guarantee processing each ticket exactly-once?
No, and it’s best not to sell it that way. The trigger delivers at least once—the webhook retries, polling sees the ticket again—and that can’t be avoided. What the design achieves is an idempotent effect over that delivery: the primary key absorbs the duplicate enqueues and the atomic claim lets a single worker process each ticket. The practical result is “it’s processed once,” but the mechanism is at-least-once delivery plus idempotency, not exactly-once delivery. The distinction matters when the effect leaves the system: opening the PR and commenting on Jira should also be idempotent so a retry doesn’t create a second PR.
How do I keep any ticket from triggering an agent?
With an explicit filter in the trigger: a label like agent-ready in the polling JQL and in the webhook condition. Only the tickets someone marked on purpose enter the queue. Without that filter, every new ticket would spend an agent run without anyone asking for it, and that shows up on the LLM bill.
Does the agent merge the PR on its own?
No. Its terminal state is pr_open: it opens the pull request, comments the link on Jira, and moves the ticket to “In Review.” It stops there. Reviewing, requesting changes, and merging is human work, for the same reason I argued in the previous post: green tests prove you didn’t break the known, not that the change is correct. The system automates everything up to the checkpoint, not the checkpoint.
Is it only useful for Jira?
No. The queue, the worker with the atomic claim, and the ephemeral workspace are the same for any source of tickets. The only Jira-specific parts are the trigger—the JQL or the webhook—and the return—the comment and the transition. Swapping Jira for GitHub Issues, Linear, or Asana is rewriting those two ends; the core of the system stays untouched.
Do I need a real queue like Redis or SQS?
To start, no. A Postgres table with an INSERT ON CONFLICT to enqueue and an UPDATE ... FOR UPDATE SKIP LOCKED to claim gives you de-dup, persistent state, and an atomic claim without one more piece of infrastructure. A dedicated queue wins when volume grows or you need finer retries and priorities, but as long as the table is enough, it’s one less dependency to operate.
Conclusion
The whole series built the worker: a loop with tools, a sandbox, an evaluator that defines what’s correct, per-task isolation, and a PR as output. This post didn’t make it smarter; it put around it what turns it into an autonomous system. A trigger that learns about the work without anyone starting it—webhook for the latency, polling for the guarantee. A queue with persistent state and an atomic claim, which is what makes each ticket get processed exactly once even if everything restarts. An ephemeral workspace per ticket, the same old isolation but triggered by the queue. And a return to Jira, so the result shows up where the request was born.
If you’re going to build it, start with the smallest thing that’s already correct: a table as the queue, polling with a label filter, a single worker with an atomic claim, and as output a PR plus a Jira comment. With that you have the end-to-end system working and just one thing a human does—mark the ticket—and another at the end—review the PR. Then you add the webhook for latency, more workers for volume, and the sweep of expired claims for the hangs. Autonomy wasn’t a smarter agent; it was the infrastructure that connected it to a ticket on one end and a PR on the other.