# Voice Feature Parity Implementation Plan
Full voice calling feature parity for omniagent+omnivoice with openclaw.
## Status
| Phase | Module | Package | Status |
|---|---|---|---|
| 1 | omnivoice-core | storage/ | Completed |
| 1 | omnivoice-core | bargein/ | Completed |
| 2 | omni-openai | omnivoice/realtime/ | Completed |
| 2 | omni-google | omnivoice/ | Completed |
| 3 | omniskill | voicetools/ | Completed |
## Summary
| Feature | Module | Location |
|---|---|---|
| OpenAI Realtime API | omni-openai | omnivoice/realtime/ |
| Gemini Live API | omni-google | omnivoice/ |
| Barge-in Detection | omnivoice-core | bargein/ |
| Call State Persistence | omnivoice-core | storage/ |
| Agent Consult Tools | omniskill | voicetools/ |
## Phase 1: Core Infrastructure - Completed

### omnivoice-core/storage/
Session state persistence with pluggable backends:
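```go
// SessionStore persists voice-call session state behind a pluggable backend.
type SessionStore interface {
	// Save persists the given session state.
	Save(ctx context.Context, session *SessionState) error
	// Load retrieves a session by ID.
	Load(ctx context.Context, sessionID string) (*SessionState, error)
	// Delete removes a session.
	Delete(ctx context.Context, sessionID string) error
	// ListActive returns the IDs of active sessions.
	ListActive(ctx context.Context) ([]string, error)
	// UpdateHeartbeat refreshes a session's liveness timestamp.
	UpdateHeartbeat(ctx context.Context, sessionID string) error
	// Close releases backend resources.
	Close() error
}
```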
Implementations:

- `MemoryStore` - in-memory, for development and testing
- `RedisStore` - Redis-backed, for production multi-instance deployments

Files:

- `store.go` - interface definitions
- `types.go` - `SessionState`, `Turn`, `SessionMetrics`
- `memory.go` - in-memory implementation
- `redis.go` - Redis implementation
### omnivoice-core/bargein/
Barge-in detection for natural interruption handling:
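```go
// Detector monitors user speech while the agent is speaking and
// interrupts playback according to the configured mode.
type Detector struct {
	config      Config
	ttsPipeline interface{ IsActive() bool; Stop() } // playback being monitored
	onInterrupt func(Event)                          // callback fired on barge-in
}

// Public methods (bodies omitted):

// AttachTTS registers the TTS pipeline to stop on interruption.
func (d *Detector) AttachTTS(p interface{ IsActive() bool; Stop() })

// AttachSTTEvents subscribes to the speech-to-text event stream.
func (d *Detector) AttachSTTEvents(events <-chan STTEvent)

// OnInterrupt sets the handler invoked when an interruption occurs.
func (d *Detector) OnInterrupt(handler func(Event))
```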
Interruption Modes:

- `ModeImmediate` - interrupt as soon as user speech is detected
- `ModeAfterSentence` - wait for a natural pause
- `ModeDisabled` - ignore user speech during agent output
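The mode semantics above reduce to a small decision function. The sketch below is hypothetical: the `Action` type, the `decide` helper, and the numeric values of the mode constants are assumptions for illustration, and the deferred stop for `ModeAfterSentence` is only signalled, not scheduled. `fakeTTS` shows a value satisfying the anonymous interface that `AttachTTS` accepts.

```go
package main

import "fmt"

// Mode mirrors the three interruption modes; the numeric values are assumed.
type Mode int

const (
	ModeImmediate Mode = iota
	ModeAfterSentence
	ModeDisabled
)

// Action is a hypothetical result type used only in this sketch.
type Action int

const (
	Ignore Action = iota
	StopNow
	StopAtSentenceEnd
)

// decide returns what to do when the STT stream reports user speech.
func decide(mode Mode, ttsActive, userSpeaking bool) Action {
	if !ttsActive || !userSpeaking {
		return Ignore // nothing is playing, or the user is silent
	}
	switch mode {
	case ModeImmediate:
		return StopNow
	case ModeAfterSentence:
		return StopAtSentenceEnd // caller must wait for the next pause
	default:
		return Ignore // ModeDisabled: the agent keeps talking
	}
}

// fakeTTS satisfies interface{ IsActive() bool; Stop() }.
type fakeTTS struct{ active bool }

func (t *fakeTTS) IsActive() bool { return t.active }
func (t *fakeTTS) Stop()          { t.active = false }

func main() {
	tts := &fakeTTS{active: true}
	var p interface{ IsActive() bool; Stop() } = tts
	if decide(ModeImmediate, p.IsActive(), true) == StopNow {
		p.Stop()
	}
	fmt.Println(tts.IsActive()) // false
}
```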
## Phase 2: Realtime APIs - Completed

### omni-openai/omnivoice/realtime/
OpenAI Realtime API WebSocket client for native voice-to-voice (~100ms latency):
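```go
type RealtimeProvider struct {
	client *Client
	config Config
}

// ProcessAudioStream sends PCM16 audio from audioIn over the WebSocket session
// and returns channels of synthesized audio and transcripts.
func (p *RealtimeProvider) ProcessAudioStream(
	ctx context.Context,
	audioIn <-chan []byte,
	config ProcessConfig,
) (<-chan AudioChunk, <-chan Transcript, error)
```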
Audio Format:
- Input: PCM16 24kHz mono
- Output: PCM16 24kHz mono
Features:
- Session management via WebSocket
- Function calling support
- Voice Activity Detection (VAD)
- 11 voice options
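Because the input is raw PCM16 mono, buffer sizes follow directly from the sample rate: bytes = rate × duration × 2. The sketch below splits captured audio into fixed frames suitable for the `audioIn` channel; the 20 ms frame length and the `frameBytes`/`chunk` helpers are assumptions, not requirements stated by the API.

```go
package main

import "fmt"

const bytesPerSample = 2 // PCM16: one 16-bit sample per (mono) frame

// frameBytes returns the byte length of one mono PCM16 frame of frameMs milliseconds.
func frameBytes(sampleRate, frameMs int) int {
	return sampleRate * frameMs / 1000 * bytesPerSample
}

// chunk splits pcm into frames of size bytes; the final frame may be shorter.
// Frames alias pcm, so copy them if the source buffer will be reused.
func chunk(pcm []byte, size int) [][]byte {
	var frames [][]byte
	for len(pcm) > 0 {
		n := size
		if len(pcm) < n {
			n = len(pcm)
		}
		frames = append(frames, pcm[:n])
		pcm = pcm[n:]
	}
	return frames
}

func main() {
	size := frameBytes(24000, 20) // 480 samples * 2 bytes
	oneSecond := make([]byte, 24000*bytesPerSample)
	fmt.Println(size, len(chunk(oneSecond, size))) // 960 50
}
```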
### omni-google/omnivoice/
Gemini Live API WebSocket client for real-time voice-to-voice:
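```go
type RealtimeProvider struct {
	apiKey string
	config Config
}

// ProcessAudioStream streams PCM16 audio from audioIn to the Gemini Live
// session and returns channels of synthesized audio and transcripts.
func (p *RealtimeProvider) ProcessAudioStream(
	ctx context.Context,
	audioIn <-chan []byte,
	config ProcessConfig,
) (<-chan AudioChunk, <-chan Transcript, error)
```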
Audio Format:
- Input: PCM16 16kHz mono
- Output: PCM16 24kHz mono
Features:
- Bidirectional WebSocket streaming
- Function calling support
- Google Search grounding
- Code execution
- 5 voice options
## Phase 3: Voice Tools - Completed

### omniskill/voicetools/
AI agent tools for voice call control:
```go
// NewVoiceSkill bundles the call-control tools into a single "voice_control"
// skill bound to the given call context.
func NewVoiceSkill(callCtx CallContext) *skill.Skill {
	return skill.New("voice_control",
		skill.WithTool(NewTransferCallTool(callCtx)),
		skill.WithTool(NewHoldCallTool(callCtx)),
		skill.WithTool(NewUnholdCallTool(callCtx)),
		skill.WithTool(NewConsultAgentTool(callCtx)),
		skill.WithTool(NewConferenceTool(callCtx)),
	)
}
```
Tools:
| Tool | Description |
|---|---|
| `transfer_call` | Transfer to a number or queue |
| `hold_call` | Place the caller on hold |
| `unhold_call` | Resume from hold |
| `consult_agent` | Query a specialist without transferring |
| `add_to_conference` | Add participants to the call |
## Verification
All packages pass tests:
```bash
cd ~/go/src/github.com/plexusone/omnivoice-core && go test ./storage/... ./bargein/...
cd ~/go/src/github.com/plexusone/omni-openai && go test ./omnivoice/realtime/...
cd ~/go/src/github.com/plexusone/omni-google && go test ./omnivoice/...
cd ~/go/src/github.com/plexusone/omniskill && go test ./voicetools/...
```
All packages pass linting.

## Dependencies Added
| Module | Dependency | Purpose |
|---|---|---|
| omnivoice-core | `github.com/redis/go-redis/v9` | Redis session store |
| omni-openai | `github.com/gorilla/websocket` | WebSocket client |
| omni-google | `github.com/gorilla/websocket` | WebSocket client |
## Environment Variables
```bash
# OpenAI Realtime
OPENAI_API_KEY=sk-...

# Gemini Live
GOOGLE_API_KEY=...
# or GEMINI_API_KEY=...

# Redis (optional, for persistence)
REDIS_URL=redis://localhost:6379
```
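One way an application might resolve these variables is sketched below. The `Config` struct and `loadConfig` are hypothetical, and the precedence of `GOOGLE_API_KEY` over `GEMINI_API_KEY` is an assumption (the plan only says either may be set). Injecting `getenv` instead of calling `os.Getenv` directly keeps the function testable.

```go
package main

import (
	"fmt"
	"os"
)

// Config is a hypothetical aggregate of the environment variables above.
type Config struct {
	OpenAIKey    string
	GoogleKey    string
	RedisURL     string
	SessionStore string // "redis" when REDIS_URL is set, otherwise "memory"
}

// loadConfig prefers GOOGLE_API_KEY and falls back to GEMINI_API_KEY (assumed order).
func loadConfig(getenv func(string) string) Config {
	c := Config{
		OpenAIKey: getenv("OPENAI_API_KEY"),
		GoogleKey: getenv("GOOGLE_API_KEY"),
		RedisURL:  getenv("REDIS_URL"),
	}
	if c.GoogleKey == "" {
		c.GoogleKey = getenv("GEMINI_API_KEY")
	}
	c.SessionStore = "memory" // Redis is optional
	if c.RedisURL != "" {
		c.SessionStore = "redis"
	}
	return c
}

func main() {
	c := loadConfig(os.Getenv)
	fmt.Println("session store:", c.SessionStore)
}
```

When `REDIS_URL` is set, go-redis v9's `redis.ParseURL` can turn it into client options for a Redis-backed store.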
## Documentation
Updated documentation for each module:
| Module | README | MkDocs Pages |
|---|---|---|
| omnivoice-core | Updated with storage/, bargein/ | storage.md, bargein.md |
| omni-openai | Updated with realtime/ | providers/realtime.md |
| omni-google | Updated with omnivoice/ | omnivoice/index.md |
| omniskill | Updated with voicetools/ | voicetools/index.md |