Commit Graph

8 Commits

Author SHA1 Message Date
Ghazi Triki
b578aeb243 Move captions.rb to utils directory. 2019-05-10 16:56:29 +01:00
Calvin Walton
b2f8c80202 Handle out of order inserts at the start of a caption stream.
If you're inserting at position 0 (and there was no previous deleted text
from that position), you can't use the timestamp from the previous character
position, since there's no previous character. Use the timestamp of the
following character instead.
2017-10-12 14:16:34 -04:00
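
The timestamp fallback described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from captions.rb: the function name `inherited_timestamp`, the flat list of per-character timestamps, and the `fallback` argument are all assumptions made for the example, and the "previously deleted text" case from the message is left out.

```python
from typing import List


def inherited_timestamp(timestamps: List[int], pos: int, fallback: int) -> int:
    """Pick a timestamp for text inserted out of order at index `pos`.

    `timestamps` holds one (hypothetical) timestamp per existing character.
    """
    if pos > 0 and timestamps:
        # Normal case: reuse the timestamp of the preceding character.
        return timestamps[min(pos, len(timestamps)) - 1]
    if timestamps:
        # Insert at position 0: there is no preceding character,
        # so use the timestamp of the following character instead.
        return timestamps[0]
    # Empty stream: nothing to inherit from (assumed behaviour).
    return fallback


print(inherited_timestamp([100, 200, 300], 2, 0))
print(inherited_timestamp([100, 200, 300], 0, 0))
```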
Calvin Walton
a28efe68d0 Don't crash on IME-inserted captions
Current BBB client is generating invalid indexes when characters are
inserted with an IME - the edit indexes are as if the preedit text was
being removed, but the preedit text was never sent in the first place.

For now, just don't crash if there's an edit that would remove text
which is past the end of known text. The result might be broken, but
it won't prevent the rest of the recording from working.
2017-06-13 16:02:03 -04:00
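
One way to tolerate edits that reach past the end of the known text is to clamp the edit range before applying it. The sketch below assumes an edit is a (start, end, replacement) triple applied to a plain string; `apply_edit` is a hypothetical helper, not the function used in the actual script. Python slices already tolerate an end index past the string, so the clamping is written out mainly to make the intent explicit and to guard against negative indexes.

```python
def apply_edit(text: str, start: int, end: int, replacement: str) -> str:
    """Replace text[start:end] with `replacement`, clamping bad indexes."""
    # Clamp both indexes into the range of text that is actually known.
    start = max(0, min(start, len(text)))
    end = max(start, min(end, len(text)))
    return text[:start] + replacement + text[end:]


# An edit that claims to remove characters 3..10 of a 5-character string
# no longer fails; the result may be wrong, but processing continues.
print(apply_edit("hello", 3, 10, "p!"))
```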
Calvin Walton
11da0b61e0 Update caption track generation based on feedback
I've had a chance to see how this script behaves with actual live
caption tracks now, and there's room for improvement. In particular,
it often generates cues that overlap - the next one appears before the
previous one disappears. The browser player's cue positioning handles
this really poorly, and the result is nearly unreadable.

The solution, if two cues would overlap, is to merge them into a
single multiline cue that displays for the full time. This is a lot
easier to read.

Some extra code is added to de-overlap any remaining cues (e.g. if
there's a third cue that would also overlap). This will reduce the
time that the earlier cue gets shown below my preferred minimum, but
there's not really much we can do about that if people are
talking/typing quickly.

The code can easily be tweaked to set a different number of maximum
lines per cue if desired.
2017-06-06 09:24:14 -04:00
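
The merge-then-de-overlap approach from this commit message could look something like the sketch below. It is an illustration under stated assumptions, not the script's actual code: cues are modelled as (start_ms, end_ms, text) tuples, and the names `merge_overlapping` and `max_lines` are hypothetical.

```python
from typing import List, Tuple

Cue = Tuple[int, int, str]  # (start_ms, end_ms, text); assumed representation


def merge_overlapping(cues: List[Cue], max_lines: int = 2) -> List[Cue]:
    """Merge overlapping cues into multiline cues, then trim what still overlaps."""
    result: List[Cue] = []
    for start, end, text in sorted(cues):
        if result and start < result[-1][1]:
            p_start, p_end, p_text = result[-1]
            lines = p_text.count("\n") + 1 + text.count("\n") + 1
            if lines <= max_lines:
                # Merge: one multiline cue shown for the full combined time.
                result[-1] = (p_start, max(p_end, end), p_text + "\n" + text)
                continue
            # Too many lines to merge: cut the earlier cue short instead.
            result[-1] = (p_start, start, p_text)
        result.append((start, end, text))
    return result


cues = [(0, 3000, "a"), (2000, 5000, "b"), (4000, 7000, "c")]
print(merge_overlapping(cues))
```

Here the first two cues are merged into one two-line cue, and the third cue, which would push the merged cue past `max_lines`, shortens it instead. Changing `max_lines` changes the maximum number of lines per cue.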
Calvin Walton
fb3a913b5b gen_webvtt: Parse entire events file, instead of iterparse
The iterparse mode doesn't correctly handle long multi-byte UTF-8
characters with some versions of the lxml library or libxml.
2016-12-02 14:33:06 -05:00
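
As a rough illustration of parsing the whole events file in one pass rather than incrementally, the sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml so that it runs without extra dependencies. The element name `event`, the `module="CAPTION"` attribute, and the helper `load_caption_events` are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET


def load_caption_events(source):
    """Parse the complete document, then filter for caption events."""
    # The whole file is parsed before any element is inspected, so the
    # parser never hands over a partially decoded multi-byte character.
    tree = ET.parse(source)
    return [ev for ev in tree.getroot().iter("event")
            if ev.get("module") == "CAPTION"]


doc = ('<recording>'
       '<event module="CAPTION"><text>日本語</text></event>'
       '<event module="CHAT"/>'
       '</recording>').encode("utf-8")
events = load_caption_events(io.BytesIO(doc))
print(len(events), events[0].findtext("text"))
```

The trade-off is memory: the full tree is held at once, which is usually acceptable for an events file but worth keeping in mind for very long recordings.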
Calvin Walton
b50a3020b1 Caption: Handle forced line break directly after wrap correctly.
When the next line break after a wrap induced by line length was a
hard line break, the hard line break would override the soft line break,
resulting in over-long lines, and the last word would be repeated on the
last line.

Move the hard break code into an else branch so it's only applied if we
haven't already done a word wrap on this line.
2016-04-04 16:11:48 -04:00
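
The control flow described in this fix, where a hard break is only considered when no soft wrap happened on the current line, can be sketched as below. This is a simplified illustration, not the actual caption code: `wrap_caption` and its `width` parameter are hypothetical names, and `width` is assumed to be at least 1.

```python
from typing import List


def wrap_caption(text: str, width: int = 32) -> List[str]:
    """Split text into lines, soft-wrapping at `width` and honouring newlines."""
    lines: List[str] = []
    while text:
        newline = text.find("\n")
        segment_end = len(text) if newline == -1 else newline
        if segment_end > width:
            # Soft wrap: break at the last space that fits on the line.
            space = text.rfind(" ", 0, width + 1)
            if space <= 0:
                space = width  # no usable space: split the long word
            lines.append(text[:space])
            text = text[space:].lstrip(" ")
        else:
            # Hard break: only reached if no soft wrap happened on this line.
            lines.append(text[:segment_end])
            text = text[segment_end + 1:]
    return lines


print(wrap_caption("aaa bbb ccc\nddd", 7))
```

Because each pass through the loop takes exactly one of the two branches, a hard break directly after a wrap cannot override the wrap, and no word is emitted twice.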
Calvin Walton
0babcc195b Update gen_webvtt to list correct deps for ubuntu 14.04 2016-04-04 16:08:10 -04:00
Calvin Walton
b559b4d9d6 Add a tool to generate webvtt files from caption events 2016-04-04 16:08:10 -04:00