- Implement real-time streaming preview using Parakeet EOU (160ms chunks)
- Add batch transcription on completion for accurate final result
- Prefer Whisper large-v3-turbo (2.7% WER) over Parakeet (6.05% WER) when available
- Remove audio preprocessing that hurts ASR accuracy (gain control, noise reduction)
- Add streaming audio callback support in Recorder and CoreAudioRecorder
- Raw audio passthrough - SDK handles resampling internally
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update FluidAudio to f47209a which includes ESpeakNG framework fix
- Fix VadConfig API compatibility: threshold -> defaultThreshold
- Add WorkspaceSettings to allow FluidAudio's unsafe build flags
This resolves the dyld crash: "Library not loaded: ESpeakNG.framework/Versions/A/ESpeakNG"
Fixed upstream in FluidInference/FluidAudio#159 and FluidInference/FluidAudio#160
Tested: All 4 UI tests pass
This change introduces a standalone Voice Activity Detection (VAD) service and integrates it into the ParakeetTranscriptionService.
The VAD preprocesses the audio to remove silent segments, aiming to improve transcription accuracy.
This is considered experimental due to a discovered anomaly in the Swift/C bridge where timestamps were being multiplied by 100. A workaround has been implemented to correct this.