Context Management¶
The context engine manages conversation context to keep requests within LLM token limits while preserving important conversation history.
Overview¶
Long conversations can exceed LLM context windows, causing errors or degraded responses. The context engine provides:
- Message Windowing - Keep only the most recent N messages
- Token Budgeting - Stay within token limits
- System Message Preservation - Always keep the system prompt
- Windowing Strategies - Different approaches for different use cases
Quick Start¶
```go
import (
	"github.com/plexusone/omniagent/agent"
	"github.com/plexusone/omniagent/context"
)

// Simple: limit to 50 messages
a, _ := agent.New(config,
	agent.WithMaxMessages(50),
)

// Advanced: full configuration
a, _ := agent.New(config,
	agent.WithContextConfig(context.Config{
		MaxMessages:   100,
		MaxTokens:     8000,
		ReserveTokens: 4096,
	}),
)
```
Configuration¶
Config Options¶
| Field | Type | Default | Description |
|---|---|---|---|
| `MaxMessages` | `int` | 100 | Maximum messages to keep |
| `MaxTokens` | `int` | 0 | Maximum token budget (0 = unlimited) |
| `ReserveTokens` | `int` | 4096 | Tokens reserved for the response |
| `TokenCounter` | `TokenCounter` | `SimpleTokenCounter` | Token estimation strategy |
Example Configurations¶
Chat Application (Memory Efficient)¶
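One way to set this up, reusing the options from Quick Start (the values below are illustrative choices, not defaults from this page):

```go
// Memory-efficient chat: tight message window, modest token budget.
a, _ := agent.New(config,
	agent.WithContextConfig(context.Config{
		MaxMessages:   50,
		MaxTokens:     4000,
		ReserveTokens: 1024,
	}),
)
```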
Research Assistant (Deep Context)¶
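A possible configuration for long, context-heavy sessions (values illustrative):

```go
// Deep context: large window and a generous token budget.
a, _ := agent.New(config,
	agent.WithContextConfig(context.Config{
		MaxMessages:   200,
		MaxTokens:     100000,
		ReserveTokens: 4096,
	}),
)
```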
Quick Q&A (Minimal Context)¶
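For short one-off tasks, the simple option is usually enough (the limit shown is an illustrative choice):

```go
// Minimal context: only the last few turns are needed.
a, _ := agent.New(config,
	agent.WithMaxMessages(10),
)
```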
How It Works¶
Message Windowing¶
When messages exceed MaxMessages, the engine keeps:
- The system message (always preserved)
- The most recent messages up to the limit
Before (8 messages, limit 5):

```
[System] [User1] [Asst1] [User2] [Asst2] [User3] [Asst3] [User4]
```

After:

```
[System] [Asst2] [User3] [Asst3] [User4]
```
Token Budgeting¶
When token count exceeds MaxTokens - ReserveTokens:
- System message tokens are counted first
- Messages are added from most recent backward
- Older messages are dropped when budget is exceeded
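The steps above can be sketched in plain Go. This is a simplified model of the algorithm, not the library's actual implementation; the `Message` type and `estimateTokens` helper are stand-ins:

```go
package main

import "fmt"

type Message struct {
	Role    string
	Content string
}

// estimateTokens approximates token count at ~4 characters per token.
func estimateTokens(s string) int {
	return (len(s) + 3) / 4
}

// applyBudget counts the system message first, then adds messages from
// most recent backward until the budget (maxTokens - reserveTokens) is
// spent; anything older is dropped.
func applyBudget(msgs []Message, maxTokens, reserveTokens int) []Message {
	budget := maxTokens - reserveTokens
	var system []Message
	rest := msgs
	if len(msgs) > 0 && msgs[0].Role == "system" {
		system = msgs[:1]
		rest = msgs[1:]
		budget -= estimateTokens(msgs[0].Content)
	}
	kept := []Message{}
	for i := len(rest) - 1; i >= 0; i-- {
		cost := estimateTokens(rest[i].Content)
		if cost > budget {
			break
		}
		budget -= cost
		kept = append([]Message{rest[i]}, kept...)
	}
	return append(system, kept...)
}

func main() {
	msgs := []Message{
		{"system", "You are a helpful assistant."},
		{"user", "first question, now stale"},
		{"assistant", "a long stale answer that will be dropped"},
		{"user", "latest question"},
	}
	for _, m := range applyBudget(msgs, 20, 5) {
		fmt.Println(m.Role)
	}
}
```

With a budget this small, only the system message and the latest user message survive; everything older is trimmed.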
Token Counting¶
SimpleTokenCounter (Default)¶
Estimates tokens using character count (~4 chars/token for English):
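A minimal sketch of this heuristic (the library's actual counter may round or weight differently):

```go
package main

import "fmt"

// estimateTokens approximates token count as ceil(len/4), the
// ~4 characters-per-token heuristic for English text. It is cheap
// but inaccurate for code, CJK text, or unusual tokenizers.
func estimateTokens(text string) int {
	return (len(text) + 3) / 4
}

func main() {
	fmt.Println(estimateTokens("Hello, how are you today?")) // 25 chars → 7
}
```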
ModelTokenCounter¶
For more accurate counting with specific models:
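A hypothetical setup; the constructor name and its argument below are assumptions for illustration, not an API confirmed by this page:

```go
// Hypothetical: a counter tuned to a specific model's tokenizer,
// plugged into the TokenCounter field from the Config table above.
counter := context.NewModelTokenCounter("gpt-4") // assumed constructor
a, _ := agent.New(config,
	agent.WithContextConfig(context.Config{
		MaxTokens:    8000,
		TokenCounter: counter,
	}),
)
```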
Windowing Strategies¶
Recent (Default)¶
Keeps the most recent messages:
```go
window := context.NewWindow(context.WindowConfig{
	Strategy:    context.WindowStrategyRecent,
	MaxMessages: 50,
})
```
Important¶
Keeps user questions and tool calls, plus recent messages:
```go
window := context.NewWindow(context.WindowConfig{
	Strategy:    context.WindowStrategyImportant,
	MaxMessages: 50,
})
```
Summarize (Future)¶
Will summarize older messages using an LLM:
```go
window := context.NewWindow(context.WindowConfig{
	Strategy:    context.WindowStrategySummarize,
	MaxMessages: 50,
})
```
Agent Integration¶
The context engine is automatically applied during message processing:
```go
// In agent.processInternal():
if a.contextEngine != nil {
	messages = a.contextEngine.Apply(messages)
}
```
Manual Usage¶
You can also use the engine directly:
```go
engine := context.New(context.DefaultConfig())

// Estimate tokens
tokens := engine.EstimateTokens(messages)

// Apply windowing
windowed := engine.Apply(messages)

// Check available budget
available := engine.AvailableTokens(messages)
```
Token Budget Tracking¶
Track token usage over time:
```go
budget := &context.TokenBudget{
	Total:    8000,
	Reserved: 2000,
}

// Check available
fmt.Println(budget.Available()) // 6000

// Consume tokens
budget.Consume(1500)
fmt.Println(budget.Available()) // 4500

// Check if over budget
if budget.OverBudget() {
	// Handle overflow
}

// Reset for new conversation
budget.Reset()
```
Best Practices¶
1. Set Appropriate Limits¶
Match limits to your LLM's context window:
| Model | Context Window | Suggested MaxTokens |
|---|---|---|
| GPT-4 | 8K | 6000 |
| GPT-4-32K | 32K | 28000 |
| Claude 3 | 200K | 150000 |
| Gemini 1.5 | 1M | 500000 |
2. Reserve Response Tokens¶
Always reserve tokens for the response:
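For example, using the `Config` fields from above (values illustrative):

```go
// Leave headroom so the model can still generate a full reply:
// input is trimmed to at most MaxTokens - ReserveTokens = 6000 tokens.
agent.WithContextConfig(context.Config{
	MaxTokens:     8000,
	ReserveTokens: 2000,
})
```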
3. Consider Use Case¶
- Chat: Moderate limits, recent strategy
- Research: High limits, important strategy
- Quick tasks: Low limits, minimal context
4. Monitor Token Usage¶
Log token usage to understand patterns:
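A simple approach, using the engine methods shown under Manual Usage (the log format is illustrative):

```go
// Log estimated usage before each request.
tokens := engine.EstimateTokens(messages)
log.Printf("context: %d messages, ~%d tokens estimated, %d available",
	len(messages), tokens, engine.AvailableTokens(messages))
```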
Architecture¶
```
┌─────────────────────────────────────────────────────┐
│ processInternal()                                   │
│                                                     │
│ messages = buildMessages(session, content)          │
│      │                                              │
│      ▼                                              │
│ ┌─────────────────┐                                 │
│ │ Context Engine  │                                 │
│ │                 │                                 │
│ │ ├─ Apply()      │◄── MaxMessages, MaxTokens       │
│ │ ├─ Window       │                                 │
│ │ └─ TokenCounter │                                 │
│ └─────────────────┘                                 │
│      │                                              │
│      ▼                                              │
│ windowed messages → LLM request                     │
└─────────────────────────────────────────────────────┘
```
See Also¶
- Sessions & Storage - Persistent conversation history
- Configuration Reference - Full config options