We should be able to capture WebRTC stats in some form for post-processing
to help with debugging support requests (and other use cases, e.g.
improving field trial analysis on test servers).
Although much of the WebRTC stats information can be gathered via server-side
components, none of them produce logs as well structured for post-processing as
the client logs - so we're going the client route for now.
Capture WebRTC stats information for audio and screen sharing via:
- Audio logCodes: new `stats` extraInfo field
  - `audio_joined`
  - `audio_failure`
  - `sfuaudio_error_retry_through_relay`
  - `sfuaudio_error_try_to_reconnect`
- Screen share logCodes: new `stats` extraInfo field
  - `screenshare_presenter_start_success`
  - `screenshare_viewer_start_success`
  - `screenshare_broker_failure`
Additionally, add an option to periodically capture WebRTC stats information
for all relevant peers. This is disabled by default since the log can be
verbose (and, consequently, taxing on the network when using external
logging targets). It can be enabled via `public.stats.logMediaStats` in
settings.yml. The default interval is 30s. The periodic log format is as
follows:
- logCode: `mediaStats`
- extraInfo.stats: an aggregated stats object of all peers (equivalent
to the `Copy` function in the Connection Status modal).
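For illustration, a rough sketch of what the periodic aggregation could look like; `activePeers` and `logger` are hypothetical stand-ins for the client's peer registry and logging helper, not the actual implementation:

```typescript
// Periodically aggregate getStats() from all active peers and emit a single
// `mediaStats` log entry. Names here are illustrative only.
const INTERVAL_MS = 30000; // matches the default 30s interval

async function collectAggregatedStats(
  activePeers: Map<string, RTCPeerConnection>,
): Promise<Record<string, Record<string, unknown>>> {
  const aggregated: Record<string, Record<string, unknown>> = {};

  for (const [peerId, peer] of activePeers) {
    const report = await peer.getStats();
    const peerStats: Record<string, unknown> = {};
    report.forEach((entry) => { peerStats[entry.id] = entry; });
    aggregated[peerId] = peerStats;
  }

  return aggregated;
}

function startMediaStatsLogging(
  activePeers: Map<string, RTCPeerConnection>,
  logger: { info: (data: object, msg: string) => void },
): number {
  return window.setInterval(async () => {
    const stats = await collectAggregatedStats(activePeers);
    logger.info({ logCode: 'mediaStats', extraInfo: { stats } }, 'Periodic WebRTC media stats');
  }, INTERVAL_MS);
}
```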
We currently use full renegotiation for audio, video, and screen sharing
reconnections, which involves re-creating transports and signaling channels
from scratch. While effective in some scenarios, this approach is slow and,
especially with outbound cameras and screen sharing, prone to failures.
To counter that, WebRTC provides a mechanism to restart ICE without needing
to re-create the peer connection. This allows us to avoid full renegotiation
and bypass some server-side signaling limitations. Implementing ICE restart
should make outbound camera/screen sharing reconnections more reliable and
faster.
This commit implements the ICE restart procedure for all WebRTC components,
based on bbb-webrtc-sfu >= v2.15.0-beta.0, which added support for ICE restart
requests. This feature is off by default. To enable it, adjust the following
flags:
- `/etc/bigbluebutton/bbb-webrtc-sfu/production.yml`: `allowIceRestart: true`
- `/etc/bigbluebutton/bbb-html5.yml`: `public.kurento.restartIce`
* Refer to the inline documentation; this can be enabled on the client side
per media type.
* Note: The default max retries for audio is lower than for cameras/screen
sharing (1 vs 3). This is because the full renegotiation process for audio
is more reliable, so ICE restart is attempted first, followed by full
renegotiation if necessary. This approach is less suitable for cameras/screen
sharing, where longer retry periods for ICE restart make sense since full
renegotiation there is considerably less reliable.
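For reference, a minimal sketch of what an ICE restart cycle could look like on an existing peer connection; `signalRestartOffer` is a hypothetical helper that forwards the offer to bbb-webrtc-sfu and resolves with the answer:

```typescript
// Restart ICE on an existing peer connection without a full renegotiation:
// only fresh ICE credentials/candidates are negotiated, the existing tracks,
// transceivers and DTLS session are kept.
async function attemptIceRestart(
  peer: RTCPeerConnection,
  signalRestartOffer: (sdp: string) => Promise<string>,
): Promise<void> {
  const offer = await peer.createOffer({ iceRestart: true });
  await peer.setLocalDescription(offer);

  const answerSdp = await signalRestartOffer(offer.sdp ?? '');
  await peer.setRemoteDescription({ type: 'answer', sdp: answerSdp });
}
```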
Turns the screenshare component into a generic component so that it can be
used both for the screen share and camera-as-content features.
Also changes specific locales and icons for the camera-as-content feature.
There's an edge case on finicky networks where ALG-like firewalls
tamper with USE-CANDIDATE STUN packets and, consequently, break ICE-lite
connectivity establishment. The odd part is that client-side gathering
seems to complete if intermediate STUN bindings work (before the final
USE-CANDIDATE), which may cause the peer not to generate relay
candidates - so connectivity fails.
This adds the `public.kurento.gatheringTimeout` option to forcefully extend
the candidate gathering window in peers that act as offerers. The
behavior is as follows: if the flag is set (in ms), the peer will wait
either for the gathering-complete stage or, _at most_,
`public.kurento.gatheringTimeout` ms before proceeding with the calls chained
to setLocalDescription.
This option is disabled by default and intentionally omitted from the
base settings.yml file so as not to encourage its use. Don't use it unless
you know what you're doing :).
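For illustration only, a sketch of the wait described above (names are illustrative, not the actual implementation):

```typescript
// Wait for ICE gathering to complete or, at most, `gatheringTimeout` ms,
// whichever comes first, before moving on to whatever is chained to
// setLocalDescription.
function waitForIceGathering(
  peer: RTCPeerConnection,
  gatheringTimeout: number,
): Promise<void> {
  if (peer.iceGatheringState === 'complete') return Promise.resolve();

  return new Promise((resolve) => {
    const timer = window.setTimeout(resolve, gatheringTimeout);

    peer.addEventListener('icegatheringstatechange', () => {
      if (peer.iceGatheringState === 'complete') {
        window.clearTimeout(timer);
        resolve();
      }
    });
  });
}
```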
Reconnection timers are far too long for abrupt failures because we
wait for the original timeouts to elapse (30-60s) before trying
again - even if a connection worked earlier in that session's
history. The ideal setup has another intermediate, smaller and
fixed reconnection timer for sessions that had a working screen share
at least once.
The UI is also not being updated to the reconnecting state on negotiation
failures.
* Add an intermediate reconnection timer for abrupt failures set to 8s.
This should improve reconnection times.
* Lower the default connection timer values (base 20s, down from 30s; max
25s, down from 60s)
* Set the screen share UI to reconnecting on abrupt failures as well - we
were only tracking ICE states prior to this, not negotiation errors
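For illustration, a sketch of how the reconnect delay could be picked under these rules; `hadWorkingSession` is a hypothetical flag for sessions that had a working screen share at least once, and the numbers mirror the values above:

```typescript
// Intermediate timer for abrupt failures on previously working sessions,
// regular (lowered) timers otherwise. Illustrative only.
const INTERMEDIATE_RECONNECT_MS = 8000;
const BASE_RECONNECT_MS = 20000;
const MAX_RECONNECT_MS = 25000;

function getReconnectDelay(hadWorkingSession: boolean, attempt: number): number {
  // Abrupt failures on a session that already worked retry quickly.
  if (hadWorkingSession) return INTERMEDIATE_RECONNECT_MS;

  // Otherwise fall back to the regular timers, capped at the max value.
  return Math.min(BASE_RECONNECT_MS * (attempt + 1), MAX_RECONNECT_MS);
}
```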
The reconnect routine stops for viewers if a broker cannot
re-connect on the first try. That is wrong: viewers should keep trying to
reconnect as long as there's signaling data that mandates so.
The reconnect trigger is changed from the broker's started attribute to the
presence of a scheduled reconnection timeout - if there isn't one (none
scheduled), always re-schedule it.
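A rough sketch of that scheduling rule, with hypothetical names:

```typescript
// Reconnection is keyed off the presence of a scheduled timeout rather than
// the broker's `started` attribute. Illustrative only.
let reconnectionTimeout: number | null = null;

function scheduleViewerReconnect(reconnect: () => void, delayMs: number): void {
  if (reconnectionTimeout !== null) return; // one pending attempt at a time

  reconnectionTimeout = window.setTimeout(() => {
    reconnectionTimeout = null; // allow the next failure to re-schedule
    reconnect();
  }, delayMs);
}
```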
RTCRtpSender exposes DSCP marking via `networkPriority` in the encodings
configuration dictionaries. That should allow us to control
QoS priorities for different media streams, e.g. audio with a higher network
priority than video. The only browser that implements this right
now is Chromium.
To use this, set the `public.app.media.networkPriorities` configuration in
settings.yml. Audio, camera and screenshare priorities can be controlled
separately. For further info on the possible values, see:
- https://www.w3.org/TR/webrtc-priority/
- https://datatracker.ietf.org/doc/html/rfc8837#section-5
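A sketch of how such a priority could be applied to an outbound RTCRtpSender (Chromium only for now); the priority value would come from the settings above:

```typescript
// Apply a DSCP-backed network priority to every encoding of a sender.
async function applyNetworkPriority(
  sender: RTCRtpSender,
  priority: RTCPriorityType, // 'very-low' | 'low' | 'medium' | 'high'
): Promise<void> {
  const parameters = sender.getParameters();

  if (!parameters.encodings || parameters.encodings.length === 0) {
    parameters.encodings = [{}];
  }

  parameters.encodings.forEach((encoding) => {
    // networkPriority maps to DSCP markings as per RFC 8837; older TS DOM
    // typings may not know this field yet.
    encoding.networkPriority = priority;
  });

  await sender.setParameters(parameters);
}
```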
If a viewer session failed mid-call, it was being scheduled for a reconnect via
the min-max connection timers (30s-60s), which is terrible UX.
This commit makes screen sharing viewers try to reconnect immediately when
appropriate.
Outbound/presenter screen sharing reconnect was broken from inception, so it's
being removed until it's properly re-implemented.
This also fixes an issue where presenter disconnections would be silent for the
end user - now an error toast is shown and the error properly logged.
Fixes an issue where subsequent failures might lead to wrong error codes being
reported.
Splits the screen sharing bridge stop method into a reconnect-safe version and
a public one - should also address some quirks with inbound stream reconnection.
There could be a race condition where the local getDisplayMedia stream ends
(e.g. via Chrome's stop sharing toast) while the broker hasn't finished starting.
That would lead to a scenario where the broker wouldn't emit an end event,
causing screen sharing to be flagged as started with a blank/invalid stream.
There could also be a scenario where the local gDM stream wasn't cleaned up,
e.g. when the SFU is offline.
This commit guarantees all tracks from the local stream are stopped.
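A minimal sketch of that guarantee, assuming the bridge keeps a reference to the local getDisplayMedia stream:

```typescript
// Stop every track of the local display stream regardless of broker state
// (e.g. SFU offline, broker never finished starting).
function stopLocalDisplayStream(stream: MediaStream | null): void {
  if (!stream) return;

  stream.getTracks().forEach((track) => {
    track.stop();
    stream.removeTrack(track);
  });
}
```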
I have growing concerns about the gain node's effect on audio quality the way it
was implemented, so I opted to fall back to HTMLMediaElement's volume control
for the time being until we can gauge the quality impact properly later on.
Add a new configuration flag, `enableVolumeControl`, false by default while the
feature undergoes a field trial.
- forceRelayOnFirefox: whether TURN/relay usage should be forced to work
around Firefox's lack of support for regular nomination when dealing with
ICE-lite peers (e.g. mediasoup).
* See: https://bugzilla.mozilla.org/show_bug.cgi?id=1034964
- iOS endpoints are ignored from the trigger because _all_ iOS browsers
are either native WebKit or WKWebView based (so they shouldn't be affected)
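For illustration, a sketch of how the flag could feed into the peer connection configuration; `isFirefox` and `isIOS` stand in for the client's user-agent detection helpers:

```typescript
// Force TURN/relay only when the workaround applies: Firefox, not on iOS
// (where every browser is WebKit-based and unaffected).
function getIceTransportPolicy(
  forceRelayOnFirefox: boolean,
  isFirefox: boolean,
  isIOS: boolean,
): RTCIceTransportPolicy {
  if (forceRelayOnFirefox && isFirefox && !isIOS) return 'relay';
  return 'all';
}

// Usage sketch:
// const pc = new RTCPeerConnection({
//   iceServers,
//   iceTransportPolicy: getIceTransportPolicy(forceRelayOnFirefox, isFirefox, isIOS),
// });
```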
Splits screenshare stream into video and audio and adds gain node to audio
stream in order to permit volume control by the user. Volume is normalized
to the [0, 2] range (0 is muted, 2 is a 2x boost).
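A rough sketch of that split, assuming the captured stream carries an audio track; names are illustrative:

```typescript
// Split a captured screenshare stream into separate video and audio streams,
// routing the audio through a GainNode so its volume can be adjusted in the
// [0, 2] range (0 = muted, 2 = 2x boost).
function buildScreenshareStreams(source: MediaStream): {
  videoStream: MediaStream;
  audioStream: MediaStream;
  setVolume: (volume: number) => void;
} {
  const videoStream = new MediaStream(source.getVideoTracks());

  const audioContext = new AudioContext();
  const audioSource = audioContext.createMediaStreamSource(
    new MediaStream(source.getAudioTracks()),
  );
  const gainNode = audioContext.createGain();
  const destination = audioContext.createMediaStreamDestination();

  audioSource.connect(gainNode);
  gainNode.connect(destination);

  return {
    videoStream,
    audioStream: destination.stream,
    setVolume: (volume: number) => { gainNode.gain.value = volume; },
  };
}
```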
ICE-lite servers (e.g. mediasoup) don't need candidates signaled out-of-band; neither does KMS in certain scenarios.
Disabling their signaling saves us some ticks in bbb-webrtc-sfu and some bandwidth all around.
Applies to video, listen only and screen sharing.
New metadata values: media-server-video, media-server-listenonly, media-server-screenshare; the parameter is a String.
Added support for getStats in screenshare's service. This works similarly
to the getStats for the video provider, and the information retrieved from
screenshare is added to the video information for cameras.
Scenario: the presenter's client could crash when the presenter changed while they were sharing their screen.
That is due to a race condition in the stop procedure in the bridge: two stops can be triggered (one from the server-side websocket teardown and another from the client itself detecting the presenter change).
That could create a scenario where the broker was cleaned up in one stop procedure after the second had already checked its availability, causing an attribute access on a null member.
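A minimal sketch of the kind of guard that addresses this, with illustrative names (not the actual bridge code):

```typescript
// Tolerate concurrent stop calls: the broker reference is claimed before it
// is touched, so a second stop becomes a no-op instead of dereferencing a
// cleared member.
let broker: { stop: () => void } | null = null;

function stopScreenshareBridge(): void {
  const currentBroker = broker;
  broker = null; // claim the broker before using it

  if (currentBroker === null) return; // another stop already handled it

  currentBroker.stop();
}
```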