System Architecture

Synthetic Darwin’s architecture is designed to support real-time evolutionary loops, immutable model provenance, and fine-grained observability across all layers.

Components

  • Agent Manager: Orchestrates agent lifecycles. Responsible for spawning, tracking, and persisting agents across sessions.

  • Evaluator Engine: Executes benchmark suites, calculates fitness scores, and feeds results back into evolutionary selection.

  • Tree Registry: Maintains lineage data across generations. Stores the parent-child mappings and metadata required for auditability and multi-objective evolution (a lineage sketch follows this list).

  • User Terminal: Interface for submitting prompts, receiving outputs, and visualizing agent performance in real time.

  • Session Manager: Persists per-user agent trees and ephemeral context across sessions and restarts.
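
As an illustration of the Tree Registry's parent/child mappings, here is a minimal in-memory sketch; the AgentRecord fields and TreeRegistry methods are assumptions for this example, not the production schema:

```python
from dataclasses import dataclass

# Hypothetical lineage record; field names are illustrative assumptions.
@dataclass(frozen=True)
class AgentRecord:
    agent_id: str
    parent_id: str | None  # None for generation-zero agents
    generation: int
    fitness: float          # latest score from the Evaluator Engine

class TreeRegistry:
    """In-memory stand-in for the Tree Registry's parent/child index."""

    def __init__(self) -> None:
        self._records: dict[str, AgentRecord] = {}
        self._children: dict[str, list[str]] = {}

    def register(self, record: AgentRecord) -> None:
        self._records[record.agent_id] = record
        if record.parent_id is not None:
            self._children.setdefault(record.parent_id, []).append(record.agent_id)

    def lineage(self, agent_id: str) -> list[AgentRecord]:
        """Walk parent pointers back to the root for audit queries."""
        chain: list[AgentRecord] = []
        current: str | None = agent_id
        while current is not None:
            record = self._records[current]
            chain.append(record)
            current = record.parent_id
        return chain

registry = TreeRegistry()
registry.register(AgentRecord("a0", None, 0, 0.41))
registry.register(AgentRecord("a1", "a0", 1, 0.58))
assert [r.agent_id for r in registry.lineage("a1")] == ["a1", "a0"]
```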


Communication

All services communicate through a unified message-passing layer that auto-selects transport based on latency requirements (a selection sketch follows this list):

  • Interactive tier (< 50 ms): Uses WebSocket with binary Protobuf frames for full-duplex agent-to-agent streaming.

  • Batch / Compute tier: Uses gRPC over HTTP/2 with LZ4 compression for throughput-optimized tasks.
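
A minimal sketch of tier selection under these requirements; the pick_transport helper and its threshold constant are assumptions drawn from the interactive tier's 50 ms budget above:

```python
from enum import Enum

class Transport(Enum):
    WEBSOCKET_PROTOBUF = "websocket+protobuf"  # interactive tier
    GRPC_HTTP2_LZ4 = "grpc+http2+lz4"          # batch / compute tier

# Hypothetical threshold; the 50 ms budget comes from the interactive tier above.
INTERACTIVE_BUDGET_MS = 50

def pick_transport(latency_budget_ms: float) -> Transport:
    """Route latency-sensitive traffic to WebSocket, everything else to gRPC."""
    if latency_budget_ms < INTERACTIVE_BUDGET_MS:
        return Transport.WEBSOCKET_PROTOBUF
    return Transport.GRPC_HTTP2_LZ4

assert pick_transport(10) is Transport.WEBSOCKET_PROTOBUF
assert pick_transport(500) is Transport.GRPC_HTTP2_LZ4
```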

Additional protocols and guarantees:

  • All messages are schema-versioned (proto3) and cryptographically signed (a signing sketch follows this list).

  • Transport runs over TLS 1.3, with mandatory mTLS inside the cluster for all write operations.

  • Built-in back-pressure, circuit-breakers, and retry logic guard against cascading failures.

  • All traffic emits OpenTelemetry spans and Prometheus metrics (e.g., latency, payload size, error rate) for real-time observability.
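
The signature scheme is not specified above, so the following sketch assumes Ed25519 via the cryptography package; the frame bytes stand in for a serialized proto3 message:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Key generation would normally happen once per service, not per message.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

frame = b"\x08\x01\x12\x05hello"  # stand-in for serialized proto3 bytes
signature = private_key.sign(frame)

try:
    public_key.verify(signature, frame)  # raises InvalidSignature on tampering
except InvalidSignature:
    raise SystemExit("rejecting tampered frame")
```

Signing the serialized frame bytes keeps the signature independent of whichever transport tier carries the message.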


Persistence

The platform uses a three-tiered persistence architecture, with each tier targeting a specific latency/durability trade-off. All tiers are accessed via a shared interface, sketched after the table below.

Storage Tiers

  • Cold / Immutable
    • Technology: IPFS (CID-pinned clusters) or Arweave
    • Primary Payload: Full agent journals (prompt lineage, model weights, eval artefacts, telemetry)
    • Durability & Retention: Write-once, content-addressed; 5+ year retention with periodic redundancy audits
    • Access Pattern: Append-only; CID lookup or gateway access

  • Warm / Relational-Graph
    • Technology: PostgreSQL + pgRouting, optional TigerGraph / Neo4j
    • Primary Payload: Metadata (agent IDs ↔ journal CIDs, fitness scores, parent/child trees, configs)
    • Durability & Retention: PITR WAL backups to S3 every 5 min; daily VACUUM / ANALYZE cycle
    • Access Pattern: OLTP + OLAP; ACID writes (<10 ms); graph traversal for lineage & audit queries

  • Hot / Volatile
    • Technology: Redis Cluster (replicated, in-memory)
    • Primary Payload: In-flight populations, generation counters, mutex locks, rate-limit tokens
    • Durability & Retention: Data expires or is checkpointed to Postgres every 60–120 sec; AOF with fsync=everysec
    • Access Pattern: Sub-ms reads/writes for evolutionary loops; pub/sub signals between microservices
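
A minimal sketch of the shared interface, assuming a hypothetical StorageTier base class; real backends would wrap Redis, Postgres, and IPFS clients behind the same methods, and the checkpoint function mirrors the hot tier's 60–120 sec cycle:

```python
from abc import ABC, abstractmethod

class StorageTier(ABC):
    """Common interface over the hot, warm, and cold backends (illustrative)."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> str:
        """Store a value and return its key (a CID for content-addressed tiers)."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Fetch a value by key or CID."""

class InMemoryTier(StorageTier):
    """Dict-backed stand-in used here in place of real Redis/Postgres/IPFS clients."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> str:
        self._data[key] = value
        return key

    def get(self, key: str) -> bytes:
        return self._data[key]

    def items(self):
        return self._data.items()

def checkpoint(hot: InMemoryTier, warm: StorageTier) -> None:
    """Copy volatile state into the warm tier (the 60-120 sec cycle above)."""
    for key, value in hot.items():
        warm.put(key, value)

hot, warm = InMemoryTier(), InMemoryTier()
hot.put("gen:42:population", b"...")
checkpoint(hot, warm)
assert warm.get("gen:42:population") == b"..."
```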


Encryption & Compliance

  • Cold and Warm storage encrypt data at rest using AES-256-GCM (see the sketches after this list). Private Arweave bundles use native "bundle encryption."

  • All data in transit is encrypted via TLS 1.3.

  • mTLS is enforced for all write paths.

  • Schema versioning is stored in a dedicated meta.schema_version table to prevent incompatible reads by agents.
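
A minimal sketch of AES-256-GCM at rest using the cryptography package; key management (e.g., via a KMS) is assumed and omitted:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # in production, fetched from a KMS
aesgcm = AESGCM(key)

nonce = os.urandom(12)  # 96-bit nonce; must never be reused with the same key
ciphertext = aesgcm.encrypt(nonce, b"agent journal bytes", None)
assert aesgcm.decrypt(nonce, ciphertext, None) == b"agent journal bytes"
```

And a sketch of gating reads on meta.schema_version; the column name and expected version constant are assumptions about a table whose DDL is not shown here:

```python
import psycopg2  # assumes the warm tier's Postgres is reachable

EXPECTED_SCHEMA = 7  # hypothetical version this agent was built against

def assert_compatible(dsn: str) -> None:
    """Refuse to read if the warm tier's schema doesn't match what we understand."""
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            # Column name 'version' is an assumption about meta.schema_version.
            cur.execute("SELECT max(version) FROM meta.schema_version")
            (current,) = cur.fetchone()
    if current != EXPECTED_SCHEMA:
        raise RuntimeError(
            f"schema_version {current} != expected {EXPECTED_SCHEMA}; refusing to read"
        )
```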


High Availability & Failover

  • PostgreSQL: Deployed as a 3-node Patroni cluster with synchronous replication.

  • Redis: Uses Redis Sentinel for failover and availability (see the client sketch after this list).

  • Cold Storage:

    • IPFS: Pins replicated across 3+ geolocations

    • Arweave: Bundles are dual-posted to 2 independent miners

  • Disaster Recovery:

    • Nightly WAL + CID replays in a staging VPC to confirm:

      • RPO ≤ 5 minutes

      • RTO ≤ 30 minutes
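
On the client side, services discover the current Redis master through Sentinel rather than a fixed address. A minimal redis-py sketch; the Sentinel endpoints and the "mymaster" service name are deployment-specific assumptions:

```python
from redis.sentinel import Sentinel  # redis-py

# Sentinel endpoints and the monitored service name are assumptions
# for this sketch; they come from the deployment, not this document.
sentinel = Sentinel(
    [("sentinel-0.internal", 26379), ("sentinel-1.internal", 26379)],
    socket_timeout=0.5,
)

# master_for() re-resolves the master after a Sentinel-driven failover,
# so callers keep a stable handle across promotions.
master = sentinel.master_for("mymaster", socket_timeout=0.5)
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)

master.set("generation:counter", 42)       # writes go to the master
print(replica.get("generation:counter"))   # reads can use a replica
```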


Observability

All persistence and runtime systems emit Prometheus metrics and OpenTelemetry traces.

  • Sample metrics (registered in the sketch after this list):

    • db_replication_lag_seconds

    • redis_evicted_keys_total

    • ipfs_pin_failures_total

  • Combined with service telemetry, the ops dashboard provides a full picture of store health vs. agent loop performance.
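
A minimal sketch of registering these metrics with prometheus_client; the gauge/counter split is inferred from each metric's name, and the port is arbitrary:

```python
from prometheus_client import Counter, Gauge, start_http_server

# Metric names come from the list above; help strings are illustrative.
DB_REPLICATION_LAG = Gauge(
    "db_replication_lag_seconds", "Warm-tier replication lag in seconds"
)
REDIS_EVICTED_KEYS = Counter(
    "redis_evicted_keys_total", "Keys evicted from the hot tier"
)
IPFS_PIN_FAILURES = Counter(
    "ipfs_pin_failures_total", "Failed pin operations against cold storage"
)

if __name__ == "__main__":
    start_http_server(9100)  # expose /metrics for Prometheus scraping
    DB_REPLICATION_LAG.set(0.25)
    REDIS_EVICTED_KEYS.inc()
```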


Design Philosophy

This architecture ensures:

  • Immutable artefacts are tamper-evident and permanently preserved.

  • Metadata is strongly consistent and queryable via graph or SQL.

  • Volatile state is ultra-fast and always synchronized to warm layers for recovery.

Together, these layers prevent any single system from becoming a bottleneck — whether for performance, durability, or scale.
