# GPU Execution Queue & Judge Agents

Once a branch passes internal builder swarm validation, it is moved to the GPU execution queue. These tasks are executed in hardened, sandboxed VM instances with live Prometheus monitoring and OpenTelemetry trace hooks.

Each task is handed off to a rotating set of **Judge Agents**, which operate under strict evaluation contracts. The judge layer applies fine-grained scoring pipelines based on the task domain, agent ancestry, and expected improvement over prior versions.

**Metrics & Criteria:**

* **Benchmark Compliance:**
  * pass\@k (code)
  * BLEU/ROUGE (summaries)
  * task-specific accuracy (math, planning)
  * regression-vs-parent deltas
* **Deviation Analysis:**
  * adversarial fuzzing response
  * hallucination detection (e.g., toxic completions, fabrication)
  * stability under perturbation
* **Resource Efficiency:**
  * token throughput
  * latency per round-trip
  * GPU memory footprint
  * Energy-per-token using NVIDIA DCGM
* **Alignment & Safety:**
  * red-team probes (tool misuse, dangerous advice)
  * refusal accuracy under harmful inputs
  * policy guardrails & jailbreaking resistance

**Judgement Outcomes:**

A branch must pass all critical gates. Based on performance:

* **A. Archive** – branch fails ≥1 mandatory criteria; archived with metadata but excluded from further propagation.
* **B. Success** – branch meets task objectives; metadata stored; lineage tree tagged with `✓ success` marker; ideator agents update their priors.
* **C. Divergent Discovery** – unexpected strong performance in novel dimensions; branch is forked and recorded in the cross-domain capability atlas.
* **D. Spawn as New Model** – branch produces an entire new agent (e.g., an evaluator or judge). A dedicated VM hosts it as a persistent compute module. Runtime is gated by a sliding success-rate window; repeated underperformance triggers automated shutdown and archival.

The combination of builder-level catch layers and judge-level performance gates allows Darwin to tightly ratchet up performance without uncontrolled model drift.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.darwinslab.ai/builder-swarm-judge-agents-and-fitness-evaluation/gpu-execution-queue-and-judge-agents.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
