Performance Benchmarking and Scalability Validation
Objective
Establish performance benchmarks and validate the scalability characteristics of the OSSA specification, validators, and tooling to ensure production-grade performance for enterprise adoption.
Scope
Performance testing covering:
- Schema Validation Performance - Validation speed for various manifest sizes
- Parser Performance - YAML/JSON parsing benchmarks
- Tooling Performance - CLI command execution times
- Registry Performance - API throughput and latency
- Scalability Testing - Large-scale agent deployments (1000+ agents)
Performance Targets
Schema Validation
- Small manifests (<1KB): <1ms validation time
- Medium manifests (1-10KB): <10ms validation time
- Large manifests (10-100KB): <100ms validation time
- Memory usage: <10MB per validation
- Throughput: >1000 validations/second
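These schema-validation targets can be spot-checked without the full benchmark suite described below; a minimal sketch using Node's perf_hooks and Ajv (the example manifest path is an assumption, any small manifest works):

// Hypothetical spot-check against the throughput and latency targets above.
import { performance } from 'perf_hooks';
import * as fs from 'fs';
import Ajv from 'ajv';

const schema = JSON.parse(fs.readFileSync('spec/ossa-1.0.schema.json', 'utf-8'));
const validate = new Ajv().compile(schema);
// Assumed example manifest; swap in any manifest from spec/examples/.
const manifest = JSON.parse(fs.readFileSync('spec/examples/compliance-agent.json', 'utf-8'));

const N = 10_000;
const start = performance.now();
for (let i = 0; i < N; i++) validate(manifest);
const elapsedMs = performance.now() - start;
console.log(`${(N / (elapsedMs / 1000)).toFixed(0)} validations/second`);
console.log(`${(elapsedMs / N).toFixed(3)} ms per validation (mean)`);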
CLI Tools
- Startup time: <100ms cold start
- Validation command: <500ms for typical manifest
- Code generation: <2s for TypeScript/Python clients
- Documentation generation: <5s for full docs
Registry API
- List agents: <100ms p95 latency
- Get agent details: <50ms p95 latency
- Register agent: <200ms p95 latency
- Search agents: <150ms p95 latency
- Throughput: >100 req/sec per instance
Benchmarking Approach
Validation Performance
// benchmarks/validation-performance.bench.ts
import { describe, bench } from 'vitest';
import Ajv from 'ajv';
import * as fs from 'fs';
import YAML from 'yaml';

const schema = JSON.parse(fs.readFileSync('spec/ossa-1.0.schema.json', 'utf-8'));
const ajv = new Ajv();
const validate = ajv.compile(schema);

// Pre-generate manifests and pre-read example files so the benchmarks
// measure validation/parsing only, not manifest construction or disk IO.
const smallManifest = generateManifest('small');
const mediumManifest = generateManifest('medium');
const largeManifest = generateManifest('large');
const yamlSource = fs.readFileSync('spec/examples/compliance-agent.yml', 'utf-8');
const jsonSource = fs.readFileSync('spec/examples/compliance-agent.json', 'utf-8');

describe('Validation Performance', () => {
  bench('validate small manifest (1KB)', () => {
    validate(smallManifest);
  });

  bench('validate medium manifest (10KB)', () => {
    validate(mediumManifest);
  });

  bench('validate large manifest (100KB)', () => {
    validate(largeManifest);
  });

  bench('validate 100 manifests in sequence', () => {
    for (let i = 0; i < 100; i++) {
      validate(smallManifest);
    }
  });
});

describe('Parser Performance', () => {
  bench('parse YAML manifest', () => {
    YAML.parse(yamlSource);
  });

  bench('parse JSON manifest', () => {
    JSON.parse(jsonSource);
  });
});

interface Capability {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
  output_schema: Record<string, unknown>;
}

// Builds a synthetic manifest whose size is driven by the capability count.
function generateManifest(size: 'small' | 'medium' | 'large') {
  const capabilityCounts = { small: 5, medium: 50, large: 500 };
  const capabilities: Capability[] = [];
  for (let i = 0; i < capabilityCounts[size]; i++) {
    capabilities.push({
      name: `capability_${i}`,
      description: `Test capability ${i}`,
      input_schema: {
        type: 'object',
        properties: {
          param1: { type: 'string' },
          param2: { type: 'number' },
        },
      },
      output_schema: {
        type: 'object',
        properties: {
          result: { type: 'string' },
        },
      },
    });
  }
  return {
    ossaVersion: '1.0',
    agent: {
      id: 'perf-test',
      name: 'Performance Test Agent',
      version: '1.0.0',
      role: 'custom',
      runtime: { type: 'docker', image: 'test:1.0' },
      capabilities,
    },
  };
}
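vitest bench reports timing but not the <10MB memory target. A rough heap-delta sketch (a hypothetical companion script; run with node --expose-gc, and treat the result as indicative only since V8 heap accounting is coarse):

// benchmarks/validation-memory.ts — hypothetical companion script, not part of the bench suite above.
import * as fs from 'fs';
import Ajv from 'ajv';

const schema = JSON.parse(fs.readFileSync('spec/ossa-1.0.schema.json', 'utf-8'));
const validate = new Ajv().compile(schema);
const manifest = JSON.parse(fs.readFileSync('spec/examples/compliance-agent.json', 'utf-8'));

// global.gc is only available when Node is started with --expose-gc.
const gc = (globalThis as { gc?: () => void }).gc;

gc?.(); // settle the heap before sampling
const before = process.memoryUsage().heapUsed;
for (let i = 0; i < 1_000; i++) validate(manifest);
gc?.(); // collect garbage so only retained memory is counted
const after = process.memoryUsage().heapUsed;
console.log(`retained heap over 1000 validations: ${((after - before) / 1024 / 1024).toFixed(2)} MB`);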
CLI Performance
#!/bin/bash
# benchmarks/cli-performance.sh
set -euo pipefail

echo "=== CLI Performance Benchmarks ==="

# Startup time
printf '\nCold start time:\n'
time ossa --version

# Validation performance
printf '\nValidation performance:\n'
hyperfine --warmup 3 'ossa validate spec/examples/compliance-agent.yml'

# Code generation performance
printf '\nCode generation performance:\n'
hyperfine --warmup 3 'ossa generate typescript spec/examples/compliance-agent.yml'

# Documentation generation
printf '\nDocumentation generation:\n'
hyperfine --warmup 3 'ossa generate docs spec/examples/'
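hyperfine can also emit machine-readable results via --export-json, which makes the CLI targets enforceable in CI. A sketch of a checker (the export path, metric format, and command-to-target mapping are assumptions):

// Parse hyperfine's JSON export and compare means against the CLI targets.
// Produce the input with e.g.: hyperfine --warmup 3 --export-json cli-bench.json 'ossa validate ...'
import * as fs from 'fs';

interface HyperfineResult { command: string; mean: number } // mean is in seconds
const { results } = JSON.parse(fs.readFileSync('cli-bench.json', 'utf-8')) as {
  results: HyperfineResult[];
};

// Assumed mapping from command substring to target, in milliseconds.
const targetsMs: Array<[string, number]> = [
  ['validate', 500],
  ['generate typescript', 2000],
  ['generate docs', 5000],
];

for (const r of results) {
  const target = targetsMs.find(([needle]) => r.command.includes(needle));
  if (!target) continue;
  const meanMs = r.mean * 1000;
  const ok = meanMs <= target[1];
  console.log(`${ok ? 'PASS' : 'FAIL'} ${r.command}: ${meanMs.toFixed(0)} ms (target ${target[1]} ms)`);
  if (!ok) process.exitCode = 1;
}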
Load Testing (Registry API)
# Use k6 for load testing; the staged profile defined in the script drives the run.
# (Pass --vus/--duration on the command line to override it for a quick smoke test.)
k6 run benchmarks/registry-load-test.js
// benchmarks/registry-load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 }, // Ramp up to 100 users
    { duration: '5m', target: 100 }, // Stay at 100 users
    { duration: '2m', target: 200 }, // Ramp up to 200 users
    { duration: '5m', target: 200 }, // Stay at 200 users
    { duration: '2m', target: 0 },   // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<200'], // 95% of requests under 200ms
    http_req_failed: ['rate<0.01'],   // Less than 1% error rate
  },
};

export default function () {
  // List agents
  const listRes = http.get('http://registry.ossa.ai/api/v1/agents');
  check(listRes, {
    'list agents status 200': (r) => r.status === 200,
    'list agents duration < 100ms': (r) => r.timings.duration < 100,
  });

  // Get agent details
  const getRes = http.get('http://registry.ossa.ai/api/v1/agents/compliance-scanner');
  check(getRes, {
    'get agent status 200': (r) => r.status === 200,
    'get agent duration < 50ms': (r) => r.timings.duration < 50,
  });

  // Search agents
  const searchRes = http.get('http://registry.ossa.ai/api/v1/agents?role=compliance');
  check(searchRes, {
    'search agents status 200': (r) => r.status === 200,
    'search agents duration < 150ms': (r) => r.timings.duration < 150,
  });
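
  // Register agent — covers the <200ms p95 target above. The request body
  // here is an assumption; adjust the fields to the registry's actual
  // POST /agents contract.
  const registerRes = http.post(
    'http://registry.ossa.ai/api/v1/agents',
    JSON.stringify({ id: `load-test-agent-${__VU}-${__ITER}`, version: '1.0.0' }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  check(registerRes, {
    'register agent status 2xx': (r) => r.status >= 200 && r.status < 300,
    'register agent duration < 200ms': (r) => r.timings.duration < 200,
  });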
sleep(1);
}
Scalability Testing (Kubernetes)
# benchmarks/k8s-scalability-test.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ossa-scalability-test
spec:
  backoffLimit: 0
  template:
    spec:
      # Assumed service account; it needs RBAC to create Agent resources
      # and to get/list pods in the target namespace.
      serviceAccountName: ossa-scale-tester
      restartPolicy: Never  # required for Jobs; the pod default (Always) is invalid here
      containers:
        - name: agent-deployer
          image: ossa/scalability-tester:1.0  # assumed to bundle kubectl
          env:
            - name: AGENT_COUNT
              value: "1000"
            - name: NAMESPACE
              value: "ossa-scale-test"
          command:
            - /bin/sh
            - -c
            - |
              for i in $(seq 1 $AGENT_COUNT); do
                cat <<EOF | kubectl apply -f -
              apiVersion: ossa.ai/v1
              kind: Agent
              metadata:
                name: test-agent-$i
                namespace: $NAMESPACE
              spec:
                ossaVersion: "1.0"
                agent:
                  id: test-agent-$i
                  name: "Test Agent $i"
                  version: "1.0.0"
                  role: custom
                  runtime:
                    type: k8s
                    image: nginx:alpine
                    resources:
                      cpu: 100m
                      memory: 128Mi
                  capabilities:
                    - name: test
                      description: Test capability
              EOF
              done
              echo "Deployed $AGENT_COUNT agents"
              echo "Measuring stabilization time..."
              time kubectl wait --for=condition=ready pod -l ossa.ai/agent -n $NAMESPACE --timeout=10m
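The kubectl wait above yields a single stabilization time. To chart readiness over time (as in the results below), a small polling script can sample ready-pod counts; a sketch assuming kubectl access and the ossa.ai/agent label used above:

// Poll ready-pod counts during the scale test to chart stabilization over time.
import { execSync } from 'child_process';

const NAMESPACE = 'ossa-scale-test';
const INTERVAL_MS = 15_000; // arbitrary sampling interval
const TARGET = 1000;

function readyCount(): number {
  const out = execSync(`kubectl get pods -n ${NAMESPACE} -l ossa.ai/agent -o json`, {
    encoding: 'utf-8',
    maxBuffer: 64 * 1024 * 1024, // JSON for 1000 pods is large
  });
  const pods = JSON.parse(out).items as Array<{
    status?: { conditions?: Array<{ type: string; status: string }> };
  }>;
  // Count pods whose Ready condition is True.
  return pods.filter((p) =>
    p.status?.conditions?.some((c) => c.type === 'Ready' && c.status === 'True')
  ).length;
}

const start = Date.now();
const timer = setInterval(() => {
  const n = readyCount();
  console.log(`${((Date.now() - start) / 1000).toFixed(0)}s\t${n}/${TARGET} ready`);
  if (n >= TARGET) clearInterval(timer);
}, INTERVAL_MS);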
Benchmark Results Format
# OSSA Performance Benchmarks
## Test Environment
- **Date**: 2026-01-10
- **Hardware**: AWS c5.4xlarge (16 vCPU, 32GB RAM)
- **OS**: Ubuntu 22.04 LTS
- **Node.js**: 20.11.0
## Validation Performance
| Manifest Size | Validation Time (p50) | Validation Time (p95) | Memory Usage |
|---------------|----------------------|----------------------|--------------|
| 1 KB | 0.8 ms | 1.2 ms | 2 MB |
| 10 KB | 7.5 ms | 9.8 ms | 8 MB |
| 100 KB | 85 ms | 98 ms | 45 MB |
**Throughput**: 1,250 validations/second
## CLI Performance
| Operation | Time (p50) | Time (p95) |
|---------------------|------------|------------|
| Cold start | 95 ms | 120 ms |
| Validate manifest | 450 ms | 580 ms |
| Generate TypeScript | 1.8 s | 2.2 s |
| Generate docs | 4.2 s | 5.1 s |
## Registry API Performance
| Endpoint | p50 Latency | p95 Latency | p99 Latency | Throughput |
|---------------|-------------|-------------|-------------|------------|
| GET /agents | 45 ms | 85 ms | 120 ms | 150 req/s |
| GET /agent/id | 25 ms | 42 ms | 65 ms | 200 req/s |
| POST /agents | 120 ms | 180 ms | 250 ms | 80 req/s |
| GET /search | 75 ms | 135 ms | 200 ms | 120 req/s |
## Scalability Results
- **1000 agents deployed**: 8m 30s to all ready
- **Operator CPU usage**: 450m (peak)
- **Operator memory usage**: 1.2 GB (peak)
- **etcd size**: 125 MB
Acceptance Criteria
- Comprehensive benchmark suite covering all components
- Performance targets met, or deviations documented
- Load testing with 100+ concurrent users
- Scalability testing with 1000+ agents
- Memory profiling for leaks
- CPU profiling for hot paths
- Benchmark results published in docs
- Performance regression CI (track over time; see the sketch after this list)
- Comparison with similar tools (OpenAPI validators, etc.)
- Optimization recommendations documented
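For the regression-tracking item, one lightweight approach is to commit a baseline JSON and fail CI on slowdowns beyond a tolerance; a sketch (the file names, flat metric format, and 10% threshold are all assumptions):

// benchmarks/check-regression.ts — hypothetical CI helper.
// Compares a fresh benchmark run against a committed baseline and fails
// the job when any metric regresses by more than the tolerance.
import * as fs from 'fs';

type Results = Record<string, number>; // metric name -> value in ms

const TOLERANCE = 0.10; // fail on >10% slowdown; tune per project

const baseline: Results = JSON.parse(fs.readFileSync('benchmarks/baseline.json', 'utf-8'));
const current: Results = JSON.parse(fs.readFileSync('benchmarks/current.json', 'utf-8'));

let failed = false;
for (const [metric, base] of Object.entries(baseline)) {
  const now = current[metric];
  if (now === undefined) continue; // metric removed or renamed
  const delta = (now - base) / base;
  if (delta > TOLERANCE) {
    console.error(`REGRESSION ${metric}: ${base}ms -> ${now}ms (+${(delta * 100).toFixed(1)}%)`);
    failed = true;
  }
}
process.exit(failed ? 1 : 0);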
Files to Create
- benchmarks/validation-performance.bench.ts
- benchmarks/cli-performance.sh
- benchmarks/registry-load-test.js
- benchmarks/k8s-scalability-test.yaml
- docs/performance/benchmarks.md
- docs/performance/optimization-guide.md
Tools
- Benchmarking: Vitest bench, hyperfine
- Load testing: k6, Apache Bench
- Profiling: Node.js profiler, pprof
- Monitoring: Prometheus, Grafana