Case Study: Voice Note Support Implementation
Overview
This case study covers the implementation of voice note support for OmniAgent, which enables inbound transcription and outbound voice synthesis through the OmniVoice interfaces.
Planning Phase Metrics
| Metric | Value |
|---|---|
| Planning date | 2026-02-22 |
| Repositories explored | 5 (omniagent, omnichat, omnivoice, omnivoice-deepgram, go-elevenlabs) |
| Files analyzed | ~50+ |
| Lines of code read | ~3,500+ |
Implementation Phase Metrics
| Step | Status | Lines Written | Lines Modified |
|---|---|---|---|
| 1. WhatsApp audio download | Completed | 20 | 0 |
| 2. WhatsApp audio upload | Completed | 45 | 0 |
| 3. VoiceProcessor interface | Completed | 85 | 0 |
| 4. voice/ package | Completed | 175 | 0 |
| 5. Config updates | Completed | 35 | 5 |
| 6. Gateway integration | Completed | 35 | 5 |
| 7. go.mod dependencies | Completed | 5 | 0 |
| Total | Completed | ~400 | ~10 |
Files Created
| File | Lines | Purpose |
|---|---|---|
| omniagent/voice/config.go | 40 | Voice configuration types |
| omniagent/voice/processor.go | 135 | Voice processor implementation |
Files Modified
| File | Lines Added | Lines Changed | Purpose |
|---|---|---|---|
| omnichat/providers/whatsapp/adapter.go | 65 | 0 | Audio download/upload |
| omnichat/provider/router.go | 95 | 0 | VoiceProcessor interface |
| omniagent/config/config.go | 25 | 1 | VoiceConfig types |
| omniagent/config/defaults.go | 15 | 0 | Voice defaults |
| omniagent/cmd/.../gateway.go | 35 | 2 | Voice processor wiring |
| omniagent/go.mod | 5 | 0 | Dependencies |
What Worked Well
- OmniVoice STT/TTS interfaces were well-designed and ready to use
- Deepgram provider implementations already exist and support the required formats
- omnichat already had Media types defined (MediaTypeVoice, MediaTypeAudio)
- Clear integration points in router and gateway
- whatsmeow library has good audio message support
Challenges Encountered
- A local replace directive in the omniagent go.mod was needed during development, because the changes spanned both omnichat and omniagent
- The gosec linter flagged the APIKey config fields, which required nolint comments
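As an illustration of the second point, a golangci-lint suppression on an APIKey field might look like the sketch below. The `ProviderConfig` struct, its JSON tags, and the `Redacted` helper are hypothetical and are not taken from the omniagent source; only the `//nolint:gosec` directive syntax is standard golangci-lint usage.

```go
package main

import "fmt"

// ProviderConfig is a hypothetical voice provider config. The nolint
// directive silences gosec on the field and records why that is acceptable.
type ProviderConfig struct {
	Provider string `json:"provider"`
	APIKey   string `json:"api_key"` //nolint:gosec // populated at runtime, not a hardcoded credential
}

// Redacted returns a copy that is safe to log (illustrative helper).
func (c ProviderConfig) Redacted() ProviderConfig {
	if c.APIKey != "" {
		c.APIKey = "***"
	}
	return c
}

func main() {
	cfg := ProviderConfig{Provider: "deepgram", APIKey: "secret-value"}
	fmt.Printf("%+v\n", cfg.Redacted())
}
```

Attaching a short reason after the directive keeps the suppression reviewable, which matters more for credential-related warnings than for most lint rules.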
Architecture Decisions
- Provider abstraction via OmniVoice: Enables easy switching between Deepgram, ElevenLabs, and future providers
- Response mode configuration: "auto" mode responds with voice only when user sends voice
- MP3 output format: Chose MP3 for TTS output for broad compatibility
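The "auto" response mode reduces to a small decision function. The sketch below is an assumption about how such a rule could be expressed: only the "auto" behaviour comes from this case study, while the `ResponseMode` type, the "always" and "never" values, and `ShouldReplyWithVoice` are hypothetical names added to make the example complete.

```go
package main

import "fmt"

// ResponseMode controls when replies are sent as voice.
type ResponseMode string

const (
	ModeAuto   ResponseMode = "auto"   // from the case study: mirror the user's medium
	ModeAlways ResponseMode = "always" // hypothetical value
	ModeNever  ResponseMode = "never"  // hypothetical value
)

// ShouldReplyWithVoice reports whether the reply should be synthesized.
// In auto mode the reply is voice only when the inbound message was voice.
func ShouldReplyWithVoice(mode ResponseMode, inboundWasVoice bool) bool {
	switch mode {
	case ModeAlways:
		return true
	case ModeAuto:
		return inboundWasVoice
	default:
		// Unknown or "never": fall back to text.
		return false
	}
}

func main() {
	fmt.Println(ShouldReplyWithVoice(ModeAuto, true))
	fmt.Println(ShouldReplyWithVoice(ModeAuto, false))
}
```

Defaulting unknown modes to text keeps a misconfigured deployment from generating TTS traffic unexpectedly.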
Verification
- [x] go build ./... passes
- [x] go test -v ./... passes
- [x] golangci-lint run passes (0 issues)
- [ ] Integration test with WhatsApp voice notes (requires API keys)
Dependencies Added
github.com/plexusone/omnivoice v0.4.3
github.com/plexusone/omnivoice-deepgram v0.3.0
Next Steps
- Tag and release omnichat with voice support
- Remove local replace directive from omniagent go.mod
- Tag and release omniagent
- Manual testing with WhatsApp voice notes