28 Commits

Author SHA1 Message Date
Jake Shore
de1c1e51aa Add hybrid streaming transcription for improved accuracy
- Implement real-time streaming preview using Parakeet EOU (160ms chunks)
- Add batch transcription on completion for accurate final result
- Prefer Whisper large-v3-turbo (2.7% WER) over Parakeet (6.05% WER) when available
- Remove audio preprocessing that hurts ASR accuracy (gain control, noise reduction)
- Add streaming audio callback support in Recorder and CoreAudioRecorder
- Raw audio passthrough - SDK handles resampling internally

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 07:35:53 -05:00
Beingpax
078e02c503 Parakeet model validation & Intel Mac local model warning 2025-12-17 09:41:57 +05:45
Anton Lvovych
cc086c1d92 Update FluidAudio to latest with ESpeakNG framework fix
- Update FluidAudio to f47209a which includes ESpeakNG framework fix
- Fix VadConfig API compatibility: threshold -> defaultThreshold
- Add WorkspaceSettings to allow FluidAudio's unsafe build flags

This resolves the dyld crash: "Library not loaded: ESpeakNG.framework/Versions/A/ESpeakNG"
Fixed upstream in FluidInference/FluidAudio#159 and FluidInference/FluidAudio#160

Tested: All 4 UI tests pass
2025-10-27 15:16:40 +07:00
Beingpax
3f01f49f56 Clean up 2025-10-19 16:59:23 +05:45
Beingpax
a0e4dd1367 Added support for V2 and V3 models 2025-10-19 14:01:31 +05:45
Beingpax
febd75cc39 Unified logging + fluidAudio's logging 2025-10-08 11:30:50 +05:45
Beingpax
7f729a9021 Add ability to sort dictionary items for text replacement 2025-10-04 16:21:16 +05:45
Beingpax
eb364416ea Improved cleanup and model loading for parakeet 2025-10-04 13:54:10 +05:45
Beingpax
4ca877c66b Fix Parakeet CoreML crash 2025-09-28 08:51:47 +05:45
Beingpax
97c6234fb3 Respect VAD flag, downloading & updated to latest version 2025-09-20 17:00:28 +05:45
Beingpax
91734bda45 Native Fluid Audio VAD 2025-09-19 19:24:02 +05:45
Beingpax
6e6773068f Simplified Parakeet transcription without Whisper VAD 2025-09-19 14:36:16 +05:45
Beingpax
afd6e91207 Centralize text formatting in main flows 2025-09-19 09:09:05 +05:45
Beingpax
2b787e8e64 Centralize hallucination filter 2025-09-16 17:30:46 +05:45
Beingpax
7161bc3f71 Improved the comma-separated replacement values to be consolidated 2025-09-12 11:58:26 +05:45
Beingpax
53d1507a53 improve hallucination filter and integrate with parakeet transcription service 2025-09-06 16:56:28 +05:45
Beingpax
5eacee467a Feat: Respect VAD user setting in ParakeetTranscriptionService 2025-09-06 08:57:32 +05:45
Beingpax
c0ed2dc78a Improved VAD for Parakeet model 2025-09-06 07:13:06 +05:45
Beingpax
106fd653ea feat: Integrate experimental VAD for Parakeet
This change introduces a standalone Voice Activity Detection (VAD) service and integrates it into the ParakeetTranscriptionService.

The VAD preprocesses the audio to remove silent segments, aiming to improve transcription accuracy.

This is considered experimental due to a discovered anomaly in the Swift/C bridge where timestamps were being multiplied by 100. A workaround has been implemented to correct this.
2025-09-05 18:37:16 +05:45
Brandon Weng
95061cda40 spacing 2025-08-27 14:51:36 -04:00
Brandon Weng
620b3a8d3b Remove cleanup state 2025-08-27 14:50:09 -04:00
Brandon Weng
2ea220dfed use default configs from upstream for parakeet 2025-08-27 14:48:07 -04:00
Beingpax
9e29b34db1 Fix decoder state cleanup blocking transcription start with Parakeet model 2025-08-25 13:50:07 +05:45
Beingpax
6a308b81bf Update app to support Parakeet B3 model 2025-08-25 13:00:35 +05:45
Beingpax
3eebbc4e3b Better Parakeet error handling 2025-08-03 12:44:13 +05:45
Beingpax
29722d0a31 more logging in parakeettranscription service 2025-08-03 09:35:49 +05:45
Beingpax
b5eaf647db 🦜 Add Parakeet logging 2025-08-02 21:26:37 +05:45
Beingpax
d09a9fba7f Experimental new models 2025-08-01 17:26:08 +05:45