# Evaluator Mechanism

Evaluator agents benchmark each new agent along four axes of performance:

* **Accuracy & Coherence** – Logical validity of outputs, correctness of reasoning, and factual alignment
* **Novelty** – Behavioral diversity relative to prior agents and the archive
* **Resource Efficiency** – Runtime metrics including latency, memory usage, token throughput, and energy-per-token
* **Task-Specific Performance** – Custom metrics such as pass@k for code, ROUGE for summaries, success rate for planning, or symbolic correctness for math problems
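
One way to picture how these axes might combine into a single fitness value is a weighted average. The sketch below is illustrative only: the `AxisScores` type, the `composite_score` function, the assumption that every axis is normalized to `[0, 1]`, and the example weights are all hypothetical and not taken from Darwin's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisScores:
    """Hypothetical per-agent scores, each assumed normalized to [0, 1]."""
    accuracy: float    # accuracy & coherence
    novelty: float     # behavioral diversity vs. the archive
    efficiency: float  # resource efficiency (higher = cheaper)
    task: float        # task-specific metric, e.g. pass@k


def composite_score(scores: AxisScores, weights: dict[str, float]) -> float:
    """Weighted average of the axis scores; weights are normalized here."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(getattr(scores, axis) * w for axis, w in weights.items()) / total


# Example weights are an assumption chosen for illustration.
weights = {"accuracy": 0.4, "novelty": 0.2, "efficiency": 0.1, "task": 0.3}
agent = AxisScores(accuracy=0.9, novelty=0.5, efficiency=0.7, task=0.8)
print(round(composite_score(agent, weights), 3))
```

A weighted average is only one possible composite; a product or a minimum over the axes would penalize agents that fail badly on any single axis.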

Evaluators are not static; they evolve as well. Each generation includes a meta-evolutionary cycle that mutates the following components of the evaluator population:

* Scoring logic (e.g., benchmark weightings, composite functions)
* Adversarial probes (e.g., fuzz inputs, tool misuse patterns)
* Refusal thresholds and safety sanity checks
* Response diversity filters and alignment gates
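
As a rough illustration of what one mutation step over these components could look like, the sketch below perturbs an evaluator's benchmark weightings, toggles one adversarial probe, and jitters its refusal threshold. The `EvaluatorConfig` fields, the `PROBE_POOL` names, and the Gaussian step size are hypothetical assumptions; diversity filters and alignment gates are omitted for brevity.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EvaluatorConfig:
    """Hypothetical evaluator genome; field names are assumptions."""
    weights: tuple[float, ...]  # benchmark weightings, summing to 1
    probes: tuple[str, ...]     # active adversarial probes
    refusal_threshold: float    # in [0, 1]


# Hypothetical probe identifiers, for illustration only.
PROBE_POOL = ("fuzz_input", "tool_misuse", "prompt_injection")


def mutate(cfg: EvaluatorConfig, rng: random.Random, sigma: float = 0.05) -> EvaluatorConfig:
    """Return a mutated copy of cfg; the parent is left unchanged."""
    # Jitter each weight, keep it positive, then renormalize to sum to 1.
    raw = [max(1e-6, w + rng.gauss(0, sigma)) for w in cfg.weights]
    total = sum(raw)
    weights = tuple(w / total for w in raw)

    # Toggle exactly one probe: add it if absent, drop it if present.
    probes = set(cfg.probes)
    probes.symmetric_difference_update({rng.choice(PROBE_POOL)})

    # Jitter the refusal threshold and clamp it to [0, 1].
    threshold = min(1.0, max(0.0, cfg.refusal_threshold + rng.gauss(0, sigma)))

    return replace(cfg, weights=weights, probes=tuple(sorted(probes)),
                   refusal_threshold=threshold)


parent = EvaluatorConfig(weights=(0.5, 0.3, 0.2), probes=("fuzz_input",),
                         refusal_threshold=0.6)
child = mutate(parent, random.Random(0))
print(child)
```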

This **co-evolutionary approach** creates an adaptive fitness landscape that changes over time, preventing agents from overfitting to fixed benchmarks or exploiting brittle reward heuristics. Evaluators themselves are subject to selection pressure: the most informative, discriminative evaluators are retained, while redundant or overly permissive ones are pruned.
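
The selection pressure on evaluators can be sketched as follows, under stated assumptions: "discriminative" is approximated by the variance of the scores an evaluator assigns across a shared set of agents, "overly permissive" by a high mean score, and "redundant" by near-identical scores to an evaluator already kept. The function name, thresholds, and example data are hypothetical, not Darwin's actual criteria.

```python
from statistics import mean, pvariance


def select_evaluators(score_table: dict[str, list[float]], keep: int,
                      max_mean: float = 0.95, min_gap: float = 0.05) -> list[str]:
    """Pick up to `keep` evaluators from name -> scores over the same agents."""
    kept: list[str] = []
    # Rank by score variance: higher variance separates agents more sharply.
    ranked = sorted(score_table, key=lambda n: pvariance(score_table[n]), reverse=True)
    for name in ranked:
        scores = score_table[name]
        if mean(scores) > max_mean:
            continue  # overly permissive: passes nearly everything
        redundant = any(
            mean([abs(a - b) for a, b in zip(scores, score_table[k])]) < min_gap
            for k in kept
        )
        if redundant:
            continue  # adds little beyond an evaluator already retained
        kept.append(name)
        if len(kept) == keep:
            break
    return kept


# Illustrative scores given by four evaluators to the same four agents.
table = {
    "strict":      [0.10, 0.90, 0.40, 0.70],
    "strict_copy": [0.10, 0.90, 0.40, 0.68],
    "probe":       [0.80, 0.20, 0.60, 0.30],
    "lenient":     [0.99, 1.00, 0.98, 1.00],
}
print(select_evaluators(table, keep=2))  # ['strict', 'probe']
```

Here `strict_copy` is pruned as redundant with `strict`, and `lenient` is pruned as overly permissive.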

As a result, Darwin’s evaluation pipeline functions like a living immune system, continually adapting its criteria so that it stays aligned, robust, and adversarially hardened.


