This feature was too tightly coupled to the old closed captions' pads
(e.g. the old closed captions feature had to be enabled for this
to work properly).
Some things were hardcoded and others didn't make sense from the
user experience perspective.
Reverts #876d8aa.
Partially reverts #802964f: removes the changes that made closed captions'
pads compatible with live-transcription, but keeps the provider settings.
The current Vosk CC provider does not support stereo mic streams
(pending investigation as to why).
This commit makes sure stereo is forcefully disabled via SDP munging
only when transcription is active and using Vosk. Having it disabled
on the server side (FreeSWITCH) is not enough because the stereo parameter
is client mandated and replicated by FS in its answer. So we need to
make sure it's always disabled for the time being.
SFU audio does munging server side (and stereo is always off), so no changes
needed there.
The rest of the providers (except WebSpeech) need to be validated against
stereo audio as well.
This is also intended to be temporary - ideally this needs to be fixed in
mod_audio_fork/Vosk/wherever this is breaking.
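
A minimal sketch of what client-side stereo munging could look like, not the
code in this commit. The names `TranscriptionState`, `forceStereoOff` and
`mungeOffer` are hypothetical. It only rewrites existing `stereo=1` and
`sprop-stereo=1` parameters on `a=fmtp` lines and leaves everything else alone.

```typescript
// Hypothetical shape of the state that decides whether munging applies.
interface TranscriptionState {
  active: boolean;
  provider: string; // e.g. 'vosk' or 'webspeech' (assumed identifiers)
}

// Rewrites stereo=1 / sprop-stereo=1 to 0 on every a=fmtp line.
function forceStereoOff(sdp: string): string {
  return sdp
    .split(/\r?\n/)
    .map((line) => {
      if (!line.startsWith('a=fmtp:')) return line;
      const sep = line.indexOf(' ');
      if (sep < 0) return line;
      const params = line
        .slice(sep + 1)
        .split(';')
        .map((param) => param.trim())
        .map((param) => (
          param === 'stereo=1' || param === 'sprop-stereo=1'
            ? param.replace('=1', '=0')
            : param
        ));
      return `${line.slice(0, sep)} ${params.join(';')}`;
    })
    .join('\r\n'); // SDP lines are CRLF-terminated
}

// Only munge when transcription is active and the provider is Vosk.
function mungeOffer(sdp: string, state: TranscriptionState): string {
  const needsMono = state.active && state.provider.toLowerCase() === 'vosk';
  return needsMono ? forceStereoOff(sdp) : sdp;
}
```

An SDP that carries no stereo parameter at all is returned with its
parameters untouched, since there is nothing to turn off.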
Mobile endpoints are flaky with the WebSpeechAPI:
- iOS versions that support it are borking our outbound audio when it's
enabled
- Android speech recognition has flaky locale detection and speech
transcription
Additionally, the support check does not verify WebSpeechAPI availability
properly, so older devices (e.g. iOS 12) are flagged as
supported even though they aren't.
This commit adds a configuration flag (public.audioCaptions.mobile) to
control transcription availability on mobile. False by default.
Also extends the setSpeechVoices support check and
hasSpeechRecognitionSupport method to prevent false positives.
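
For illustration, a stricter support check could look like the sketch below.
This is an assumption about the shape of such a check, not the actual body of
hasSpeechRecognitionSupport. The `SpeechGlobals` and `SupportOptions` types
are hypothetical, and `mobileEnabled` stands in for the value of
public.audioCaptions.mobile.

```typescript
// Minimal view of the browser globals the check needs. Passing them in
// (instead of reading window directly) keeps the function testable.
interface SpeechGlobals {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
  speechSynthesis?: { getVoices?: unknown };
}

interface SupportOptions {
  isMobile: boolean;
  mobileEnabled: boolean; // assumed to mirror public.audioCaptions.mobile
}

function hasSpeechRecognitionSupport(
  globals: SpeechGlobals,
  options: SupportOptions,
): boolean {
  // Mobile endpoints are opt-in via the configuration flag.
  if (options.isMobile && !options.mobileEnabled) return false;

  // The recognition constructor must actually exist (prefixed or not).
  const ctor = globals.SpeechRecognition ?? globals.webkitSpeechRecognition;
  if (typeof ctor !== 'function') return false;

  // The voices list is used to detect languages, so getVoices must exist.
  const synthesis = globals.speechSynthesis;
  return synthesis != null && typeof synthesis.getVoices === 'function';
}
```

In a browser the first argument would be `window` (cast to `SpeechGlobals`).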
Adds two new flags to the settings file which change the way the locale
flag is used:
- forceLocale: (true/false) => If true, enforces the transcription
language to be the locale content field and skips the language
selector in the audio modal.
- defaultSelectLocale: (true/false) => If true, the default selected
value in the dropdown language selector in the audio modal will be
defined by the locale content field.
In any case, if the locale flag holds an invalid value, it defaults to
disabled.
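
One possible reading of how the two flags interact, as a sketch only. The
`LocaleSettings` shape, the `resolveLocale` helper and the `'disabled'`
sentinel are assumptions made for illustration, and so is the behaviour when
neither flag is set (the selector starts on `'disabled'`).

```typescript
const DISABLED = 'disabled'; // assumed sentinel for "no transcription"

// Hypothetical view of the relevant settings.
interface LocaleSettings {
  locale: string;
  forceLocale: boolean;
  defaultSelectLocale: boolean;
}

interface LocaleChoice {
  selected: string;      // value the transcription starts with
  showSelector: boolean; // whether the audio modal shows the dropdown
}

function resolveLocale(
  settings: LocaleSettings,
  availableLocales: string[],
): LocaleChoice {
  // An invalid locale value falls back to disabled in every case.
  const locale = availableLocales.includes(settings.locale)
    ? settings.locale
    : DISABLED;

  if (settings.forceLocale) {
    // Language is enforced, so the selector is skipped.
    return { selected: locale, showSelector: false };
  }
  if (settings.defaultSelectLocale) {
    // Selector is shown, preselected with the configured locale.
    return { selected: locale, showSelector: true };
  }
  // Assumption: with neither flag set, nothing is preselected.
  return { selected: DISABLED, showSelector: true };
}
```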
Move the language collection to the HTML settings file. This data defines
the languages available for the speech API.
These language tags are used to filter the result of the SpeechSynthesis
API's `getVoices`. Tags must use BCP 47 format.
https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesisVoice/lang
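
As a rough sketch of that filtering step: the helper name
`filterVoicesByLanguage` and the `VoiceLike` type are hypothetical, and the
tag list would come from the settings file. The comparison is
case-insensitive because BCP 47 tags are.

```typescript
// Only the field the filter needs from a SpeechSynthesisVoice.
interface VoiceLike {
  lang: string; // BCP 47 tag, e.g. 'en-US'
}

function filterVoicesByLanguage<T extends VoiceLike>(
  voices: T[],
  allowedTags: string[],
): T[] {
  const allowed = new Set(allowedTags.map((tag) => tag.toLowerCase()));
  return voices.filter((voice) => allowed.has(voice.lang.toLowerCase()));
}
```

In a browser the first argument would be `speechSynthesis.getVoices()`.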
Avoid enabling audio transcription if the browser's vendor does not provide
voices data.
This should prevent false positives for browsers such as Chromium and
Brave.
Parse the audio transcript before broadcasting its content back to the
client and the recording actor. Limit it to 8 words per line and a maximum
of 2 lines to avoid CPU intensive operations over this recurring event.
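
A small sketch of the word and line limit, under the assumption that the
most recent words are the ones kept (the commit does not say which end is
trimmed). `formatTranscript` and the two constants are hypothetical names.

```typescript
const MAX_WORDS_PER_LINE = 8;
const MAX_LINES = 2;

function formatTranscript(transcript: string): string {
  const words = transcript
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0);

  // Assumption: keep only the newest words that fit in the caption box.
  const recent = words.slice(-MAX_WORDS_PER_LINE * MAX_LINES);

  const lines: string[] = [];
  for (let i = 0; i < recent.length; i += MAX_WORDS_PER_LINE) {
    lines.push(recent.slice(i, i + MAX_WORDS_PER_LINE).join(' '));
  }
  return lines.join('\n');
}
```

The work per event is bounded by the transcript length and the output never
exceeds 16 words, whatever the input size.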
Replace Calibri font family with Verdana to improve character spacing,
add relative sizing to the text content and a background padding.
Add a server-side app for the audio captions feature and record proto-events
for this data.
As it is, it only behaves as a pass-through module. The idea is to include
all the business intelligence in this app.