⚡ Performance: Optimize Agent Spawning & Execution Pipeline
Migrated from llm/agent-buildkit#73 on 2025-10-12T20:14:28.985Z Original author: @thomas.scola | Created: 2025-10-12T02:15:10.477Z
Problem
Agent spawning currently takes 2-5 seconds per agent, limiting scalability:
Current Bottlenecks:
- Sequential agent initialization
- No spawn caching or warm pools
- Redundant validation on every spawn
- Cold start penalties for each agent
Evidence from Codebase:
```typescript
// agent-mesh/backend/src/services/domain/agent-manager.service.ts:58
async spawnAgent(type: string, options: Partial<AgentConfig> = {}): Promise<string> {
  // Full initialization on every spawn - SLOW!
  const agentProcess = spawn('node', [...], { stdio: ['ignore', 'pipe', 'pipe'] });
}
```

```typescript
// parallel-agent-executor.service.ts:265
private async spawnAgentProcess(agentId: string, tasks: string[], executionId: string) {
  // No pooling, no warm agents
  const childProcess = spawn('node', [this.agentScriptPath, ...]);
}
```
Impact
Current State:
- Spawning 10 agents: 20-50 seconds
- Spawning 100 agents: 3-8 minutes
- High CPU spikes during spawn waves
Target State (90% improvement):
- Spawning 10 agents: < 2 seconds
- Spawning 100 agents: < 20 seconds
- Smooth resource utilization
Solution: Multi-Tier Performance Strategy
1. Agent Pool Management
Warm Agent Pool:
```typescript
// New: src/services/agent-pool.service.ts
class AgentPoolService {
  private warmPools: Map<AgentType, Agent[]>;

  async initialize() {
    // Pre-spawn 5 agents of each common type
    await this.prewarmAgents(['code-reviewer', 'documentation', 'testing']);
  }

  async getAgent(type: AgentType): Promise<Agent> {
    // Return warm agent instantly (< 100ms); fall back to a cold spawn
    return this.warmPools.get(type)?.pop() || await this.spawnFresh(type);
  }
}
```
Benefits:
- 10-50x faster for cached types
- Predictable latency
- Resource pre-allocation
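The pool sketch above omits replenishment. A minimal, self-contained version of the pop-and-replenish flow might look like the following; the `WarmPool` class, the `spawnFresh` stub, and the size thresholds are illustrative stand-ins, not the real `AgentPoolService`:

```typescript
// Minimal warm-pool sketch with auto-replenish (all names illustrative).
type Agent = { id: number; type: string };

class WarmPool {
  private pool: Agent[] = [];
  private nextId = 0;

  constructor(
    private type: string,
    private minSize = 2,   // replenish when the pool drops below this
    private targetSize = 5 // pre-spawn up to this many agents
  ) {}

  // Stand-in for an expensive cold spawn.
  private async spawnFresh(): Promise<Agent> {
    return { id: this.nextId++, type: this.type };
  }

  async prewarm(): Promise<void> {
    while (this.pool.length < this.targetSize) {
      this.pool.push(await this.spawnFresh());
    }
  }

  async getAgent(): Promise<Agent> {
    const agent = this.pool.pop() ?? (await this.spawnFresh());
    // Replenish in the background instead of blocking the caller.
    if (this.pool.length < this.minSize) void this.replenish();
    return agent;
  }

  private async replenish(): Promise<void> {
    while (this.pool.length < this.targetSize) {
      this.pool.push(await this.spawnFresh());
    }
  }

  size(): number {
    return this.pool.length;
  }
}
```

Replenishing in the background is what keeps `getAgent` on the fast path: the caller pays only a `pop()`, never the spawn cost, as long as the pool stays warm.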
2. Lazy Initialization
Current: Load everything on spawn
New: Load on-demand
```typescript
class LazyAgent {
  private _llmClient?: LLMClient;

  get llmClient() {
    // Initialize only when actually needed
    if (!this._llmClient) {
      this._llmClient = createLLMClient(this.config);
    }
    return this._llmClient;
  }
}
```
3. Parallel Spawn Optimization
Current: Sequential spawn
New: Batched parallel spawn with coordination
```typescript
// New: Batch spawn with progress tracking
async spawnAgentBatch(requests: SpawnRequest[]): Promise<Agent[]> {
  const agents: Agent[] = [];
  // Await each batch of 10 before starting the next, so concurrency
  // stays bounded instead of every request launching at once
  for (const batch of chunk(requests, 10)) {
    agents.push(...(await Promise.all(batch.map(req => this.spawnAgent(req)))));
  }
  return agents;
}
```
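The snippet assumes a `chunk` helper (lodash-style). Here is a minimal version, plus a `Promise.allSettled` variant that gives the failure isolation called for in the Tasks below, so one rejected spawn no longer sinks the whole batch; `spawnBatchSettled` is a hypothetical name, not existing API:

```typescript
// Minimal lodash-style chunk helper.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Failure-isolated batch spawn: collect successes and failures separately.
async function spawnBatchSettled<R, A>(
  requests: R[],
  spawn: (req: R) => Promise<A>,
  batchSize = 10
): Promise<{ agents: A[]; failures: unknown[] }> {
  const agents: A[] = [];
  const failures: unknown[] = [];
  for (const batch of chunk(requests, batchSize)) {
    // allSettled never rejects; each result carries its own status.
    const results = await Promise.allSettled(batch.map(spawn));
    for (const r of results) {
      if (r.status === 'fulfilled') agents.push(r.value);
      else failures.push(r.reason);
    }
  }
  return { agents, failures };
}
```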
4. Spawn Caching
Cache validated agent configurations:
```typescript
// Cache expensive validation results
private configCache = new LRUCache<string, ValidatedConfig>({ max: 100 });

async validateConfig(config: AgentConfig): Promise<ValidatedConfig> {
  const key = hashConfig(config);
  const cached = this.configCache.get(key);
  if (cached) return cached;
  // Store the result so later spawns with the same config skip validation
  const validated = await this.performValidation(config);
  this.configCache.set(key, validated);
  return validated;
}
```
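`hashConfig` is referenced but not shown. A plausible implementation hashes a key-sorted serialization, so that property order alone doesn't fragment the cache; this exact function is an assumption for illustration, not the project's code:

```typescript
import { createHash } from 'node:crypto';

// Stable serialization: sort object keys so {a, b} and {b, a} hash identically.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  const entries = Object.entries(value as Record<string, unknown>)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`);
  return `{${entries.join(',')}}`;
}

function hashConfig(config: object): string {
  return createHash('sha256').update(stableStringify(config)).digest('hex');
}
```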
5. Resource Prediction
Use Phoenix KG to predict resource needs:
```typescript
// Predict optimal agent count based on workload
const prediction = await phoenixKG.predictOptimalAgentCount({
  taskQueue: currentTasks,
  historicalData: last7Days,
  constraints: { maxCost: 0, maxLatency: 5000 }
});
```
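Phoenix KG's actual prediction API isn't shown in this issue. Purely for illustration, a naive heuristic that sizes the fleet from queue depth and a latency budget could look like this; every name and the formula itself are hypothetical:

```typescript
// Hypothetical heuristic, NOT the Phoenix KG API: size the fleet so the
// queued work drains within the latency budget.
interface Workload {
  queuedTasks: number;   // tasks currently waiting
  avgTaskMs: number;     // historical mean task duration
  maxLatencyMs: number;  // budget from constraints
}

function predictOptimalAgentCount(w: Workload, maxAgents = 100): number {
  // Total work divided by the time budget gives the parallelism needed.
  const needed = Math.ceil((w.queuedTasks * w.avgTaskMs) / w.maxLatencyMs);
  // Clamp to at least one agent and at most the fleet ceiling.
  return Math.min(Math.max(needed, 1), maxAgents);
}
```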
Tasks
- [ ] Implement AgentPoolService with warm pools
  - Pre-spawn 5 agents per common type
  - Auto-replenish when pool drops below 2
  - Configurable pool sizes per agent type
- [ ] Add lazy initialization to all agents
  - Defer LLM client creation
  - Defer tool loading
  - Load on first use, not on spawn
- [ ] Optimize parallel spawning
  - Batch spawn API (10 agents at once)
  - Progress tracking per batch
  - Failure isolation (one fails, others continue)
- [ ] Add spawn caching
  - Cache validated configs (LRU 100)
  - Cache tool definitions
  - Invalidate on config changes only
- [ ] Integrate Phoenix KG predictions
  - Query historical spawn patterns
  - Predict optimal pool sizes
  - Auto-adjust based on workload
- [ ] Add performance telemetry
  - Track spawn duration (p50, p95, p99)
  - Track pool hit rates
  - Alert on performance degradation
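For the telemetry task, tracking p50/p95/p99 spawn duration can be sketched with a nearest-rank percentile over recorded samples; the class and method names below are illustrative, not existing code:

```typescript
// Sketch of spawn-duration telemetry: record samples, report percentiles.
class SpawnTelemetry {
  private samples: number[] = [];

  record(durationMs: number): void {
    this.samples.push(durationMs);
  }

  percentile(p: number): number {
    if (this.samples.length === 0) return 0;
    const sorted = [...this.samples].sort((a, b) => a - b);
    // Nearest-rank method: take the ceil(p/100 * N)-th sample (1-indexed).
    const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
    return sorted[rank - 1];
  }
}
```

A production version would likely use a streaming estimator (e.g. a t-digest or HDR histogram) rather than sorting all samples, but the interface stays the same.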
Acceptance Criteria
Metrics Dashboard
Add Grafana panels:
- Agent spawn latency (p50, p95, p99)
- Pool utilization per agent type
- Cache hit rates
- Resource consumption trends
Related
- Epic: #55
- Agent Spawning Code
- Phoenix KG Predictions
Priority: P1 | Labels: performance, spawning, optimization, scalability