Case Study: Voice Note Support Implementation

Overview

This case study covers the implementation of voice note support for OmniAgent: inbound voice notes are transcribed to text, and outbound replies can be synthesized as voice, both through the OmniVoice STT/TTS interfaces.

Planning Phase Metrics

| Metric | Value |
|---|---|
| Planning date | 2026-02-22 |
| Repositories explored | 5 (omniagent, omnichat, omnivoice, omnivoice-deepgram, go-elevenlabs) |
| Files analyzed | ~50+ |
| Lines of code read | ~3,500+ |

Implementation Phase Metrics

| Step | Status | Lines Written | Lines Modified |
|---|---|---|---|
| 1. WhatsApp audio download | Completed | 20 | 0 |
| 2. WhatsApp audio upload | Completed | 45 | 0 |
| 3. VoiceProcessor interface | Completed | 85 | 0 |
| 4. voice/ package | Completed | 175 | 0 |
| 5. Config updates | Completed | 35 | 5 |
| 6. Gateway integration | Completed | 35 | 5 |
| 7. go.mod dependencies | Completed | 5 | 0 |
| Total | Completed | ~400 | ~10 |

Files Created

| File | Lines | Purpose |
|---|---|---|
| omniagent/voice/config.go | 40 | Voice configuration types |
| omniagent/voice/processor.go | 135 | Voice processor implementation |

Files Modified

| File | Lines Added | Lines Changed | Purpose |
|---|---|---|---|
| omnichat/providers/whatsapp/adapter.go | 65 | 0 | Audio download/upload |
| omnichat/provider/router.go | 95 | 0 | VoiceProcessor interface |
| omniagent/config/config.go | 25 | 1 | VoiceConfig types |
| omniagent/config/defaults.go | 15 | 0 | Voice defaults |
| omniagent/cmd/.../gateway.go | 35 | 2 | Voice processor wiring |
| omniagent/go.mod | 5 | 0 | Dependencies |

What Worked Well

  • OmniVoice STT/TTS interfaces were well-designed and ready to use
  • Deepgram provider implementations already exist and support the required formats
  • omnichat already had Media types defined (MediaTypeVoice, MediaTypeAudio)
  • Clear integration points in router and gateway
  • whatsmeow library has good audio message support

Challenges Encountered

  • A local replace directive in the omniagent go.mod was needed during development, because the changes spanned both omnichat and omniagent
  • gosec raised warnings on the APIKey fields, which were suppressed with nolint comments
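
The nolint workaround for the APIKey fields can be illustrated with a minimal sketch. The `ProviderConfig` type, its fields, and the `maskKey` helper are hypothetical and are not taken from the omniagent source. The source does not state the gosec rule ID, so the directive below suppresses the whole linter for that line.

```go
package main

import "fmt"

// ProviderConfig is a hypothetical stand-in for the voice configuration types;
// field names in the actual omniagent config may differ.
type ProviderConfig struct {
	Provider string
	// APIKey is supplied at runtime (config file or environment), never hardcoded.
	APIKey string //nolint:gosec // credential-like field name; value is loaded at runtime
}

// String redacts the key so the config can be logged safely.
func (c ProviderConfig) String() string {
	return fmt.Sprintf("provider=%s apiKey=%s", c.Provider, maskKey(c.APIKey))
}

// maskKey hides all but the last four characters of a key.
func maskKey(key string) string {
	if len(key) <= 4 {
		return "****"
	}
	return "****" + key[len(key)-4:]
}

func main() {
	cfg := ProviderConfig{Provider: "deepgram", APIKey: "example-key-1234"}
	fmt.Println(cfg)
}
```

Keeping the explanation on the same line as the directive documents why the suppression is safe, which helps reviewers who later audit the nolint comments.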

Architecture Decisions

  1. Provider abstraction via OmniVoice: Enables easy switching between Deepgram, ElevenLabs, and future providers
  2. Response mode configuration: "auto" mode responds with voice only when the user sends voice
  3. MP3 output format: Chose MP3 for TTS output for broad compatibility

Verification

  • [x] go build ./... passes
  • [x] go test -v ./... passes
  • [x] golangci-lint run passes (0 issues)
  • [ ] Integration test with WhatsApp voice notes (requires API keys)

Dependencies Added

github.com/plexusone/omnivoice v0.4.3
github.com/plexusone/omnivoice-deepgram v0.3.0

Next Steps

  1. Tag and release omnichat with voice support
  2. Remove local replace directive from omniagent go.mod
  3. Tag and release omniagent
  4. Manual testing with WhatsApp voice notes