Optimize OpenClaw: Performance Tuning Guide

Optimize memory usage, tune connection pools, configure caching layers, and benchmark throughput.

11 min read · Last updated Feb 18, 2026

Overview

This guide covers optimization techniques for running OpenClaw efficiently. Learn how to reduce memory usage, optimize response times, and handle more concurrent users.

When to tune
Start with defaults. Tune when you experience slow responses, high memory usage, or need to handle more concurrent sessions.

Memory Optimization

OpenClaw can use significant memory with long conversations. Here's how to reduce it:

1. Set Memory Limits in Docker

```yaml
# docker-compose.yml
services:
  openclaw:
    deploy:
      resources:
        limits:
          memory: 2G  # Adjust based on your needs
```

2. Enable Compaction

OpenClaw automatically summarizes old conversation history when context fills up.

```yaml
agents:
  defaults:
    compaction:
      mode: "safeguard"
      reserveTokensFloor: 24000
```

3. Reduce the Context Window

Lower the context token budget if you don't need long conversations; a smaller window means less history is kept in memory and sent with each request.

```yaml
agents:
  defaults:
    model:
      contextTokens: 50000  # Instead of 200000
```

Context Pruning

Prune old tool results from memory to free up context space:

```yaml
agents:
  defaults:
    contextPruning:
      mode: "cache-ttl"
      ttl: "1h"  # Prune after 1 hour of inactivity
      keepLastAssistants: 3
      softTrimRatio: 0.3
      hardClearRatio: 0.5
```
  • Soft trim — keeps beginning/end of tool results, removes middle
  • Hard clear — removes entire old tool results
  • Image blocks are never pruned
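
To get a feel for when pruning would kick in, here is a small sketch that derives the trim thresholds from the settings above. It assumes (the source doesn't spell this out) that `softTrimRatio` and `hardClearRatio` are fractions of the configured context window; the function name is ours, not part of OpenClaw.

```python
# Hypothetical sketch: deriving soft-trim / hard-clear points from the
# config above. ASSUMPTION: the ratios are fractions of contextTokens.

def pruning_thresholds(context_tokens: int,
                       soft_trim_ratio: float,
                       hard_clear_ratio: float) -> tuple[int, int]:
    """Return (soft_trim_at, hard_clear_at) in tokens."""
    return (
        int(context_tokens * soft_trim_ratio),
        int(context_tokens * hard_clear_ratio),
    )

# With a 50k context window and the ratios shown above:
soft, hard = pruning_thresholds(50_000, 0.3, 0.5)
print(soft, hard)  # 15000 25000
```

Under that assumption, old tool results start getting soft-trimmed around 15k tokens and hard-cleared around 25k.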

Concurrency Settings

Control how many parallel sessions OpenClaw can handle:

```yaml
agents:
  defaults:
    model:
      maxConcurrent: 3  # Parallel agent runs (default: 1)
```

Trade-off
Higher concurrency = more parallel conversations but more memory and API calls.

Caching & Retries

Reduce wasted API calls and handle transient failures gracefully:

Retry with Exponential Backoff

When requests are rate-limited or fail transiently, retry with exponential backoff instead of retrying immediately.

```yaml
models:
  providers:
    openai:
      retry:
        attempts: 3
        minDelayMs: 1000
        maxDelayMs: 30000
```
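
The delay schedule those settings describe looks like this: start at `minDelayMs`, double on each attempt, and cap at `maxDelayMs`. The sketch below mirrors the config names; whether OpenClaw adds jitter is an assumption, so it's shown as an option.

```python
import random

# Sketch of an exponential-backoff schedule matching the config above:
# base delay doubles per attempt, capped at max_delay_ms. Full jitter
# (random delay in [0, base]) is optional — whether OpenClaw jitters
# is an assumption, not confirmed by the source.

def backoff_delays(attempts: int, min_delay_ms: int, max_delay_ms: int,
                   *, jitter: bool = False) -> list[float]:
    delays = []
    for attempt in range(attempts):
        base = min(min_delay_ms * (2 ** attempt), max_delay_ms)
        delays.append(random.uniform(0, base) if jitter else float(base))
    return delays

print(backoff_delays(3, 1000, 30000))  # [1000.0, 2000.0, 4000.0]
```

With `attempts: 3`, the worst case adds about seven seconds of waiting before the request finally fails.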

Streaming & Response Time

Improve perceived responsiveness with streaming:

```yaml
agents:
  defaults:
    blockStreamingDefault: "on"
    blockStreamingChunk:
      minChars: 800
      maxChars: 1200
    humanDelay:
      mode: "natural"  # 800-2500ms random delay
```
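
To see what `minChars`/`maxChars` chunking means in practice, here is a simplified chunker: buffer text until it reaches `minChars`, then cut at a whitespace boundary before `maxChars` when one exists. This is illustrative, not OpenClaw's actual chunking code.

```python
# Illustrative min/max chunker in the spirit of blockStreamingChunk
# (not OpenClaw's actual implementation): emit blocks of 800-1200
# chars, preferring to break on whitespace.

def chunk_stream(text: str, min_chars: int = 800,
                 max_chars: int = 1200) -> list[str]:
    chunks = []
    while len(text) > max_chars:
        # Prefer breaking at the last space in [min_chars, max_chars).
        cut = text.rfind(" ", min_chars, max_chars)
        if cut == -1:
            cut = max_chars  # no space found: hard cut at the limit
        chunks.append(text[:cut])
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks

parts = chunk_stream("word " * 1000)
print(all(len(p) <= 1200 for p in parts))  # True
```

Larger chunks mean fewer message edits on the channel side; smaller chunks feel more "live" but cost more API round-trips.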

For Telegram, also enable streaming mode:

```yaml
channels:
  telegram:
    streamMode: "partial"  # Live preview while generating
```

Monitoring & Benchmarks

Track performance with health checks:

```bash
# Check gateway health
openclaw gateway status

# Check system resources
openclaw status --deep

# View detailed metrics
openclaw doctor
```

Key metrics to watch
  • Memory usage — should stay below your limit
  • Response time — target under 10 seconds for most queries
  • Context usage — watch for approaching limits
  • Error rate — should be near zero