Talk Features: Searching Transcripts and Closed Captions

Slides: index one slide per fixed interval (e.g., every 10 seconds)

Closed captions: convert VTT/SRT files to text (eliminating overlap)

For search to be useful, the timings need to be kept with the text
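
A minimal sketch of the caption conversion step, assuming that "overlap" means rolling captions where a cue repeats words from the end of the previous cue. The function names `parse_cues` and `drop_overlap` are hypothetical, and the parser only handles the basic SRT and WebVTT timing line, not the full formats. Timings are kept next to the text so that a search hit can point back to a moment in the talk.

```python
import re

# Matches SRT ("00:00:01,000 --> 00:00:04,000") and WebVTT
# ("00:00:01.000 --> 00:00:04.000", hours optional) timing lines.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})\s*-->\s*"
    r"(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})"
)


def _seconds(hours, minutes, seconds, millis):
    """Convert the captured timestamp parts to seconds as a float."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_cues(caption_text):
    """Return a list of (start, end, text) tuples from SRT or WebVTT text."""
    cues = []
    normalized = caption_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if not match:
                continue  # cue number, "WEBVTT" header, or cue identifier
            parts = match.groups()
            text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
            # Drop inline markup such as <i> or <c.someClass>.
            text = re.sub(r"<[^>]+>", "", text).strip()
            if text:
                cues.append((_seconds(*parts[:4]), _seconds(*parts[4:]), text))
            break
    return cues


def drop_overlap(cues):
    """Remove leading words that a cue repeats from the end of the previous cue."""
    result = []
    prev_words = []
    for start, end, text in cues:
        words = text.split()
        overlap = 0
        # Look for the longest run where the previous cue's tail equals this cue's head.
        for n in range(min(len(prev_words), len(words)), 0, -1):
            if prev_words[-n:] == words[:n]:
                overlap = n
                break
        new_words = words[overlap:]
        prev_words = words
        if new_words:
            result.append((start, end, " ".join(new_words)))
    return result
```

Calling `drop_overlap(parse_cues(text))` on the contents of a caption file gives one `(start, end, text)` tuple per cue, with repeated words removed and cues that add nothing new dropped.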

Configuring Solr

A closed caption file is a list of timings, each paired with the text that is
displayed during that time.

Solr is not designed to store complex data formats, so the easiest way to
store a caption file is as a string array, with one entry per caption line
that holds both the timing and the text. Adding an analysis step at index
time lets people search the caption text without matching on the times.

This approach won’t pick up matches that span more than one caption line.
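
The string-array layout can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not Solr configuration: the field name `caption_lines`, the "start end text" line layout, and all function names are hypothetical. The regular expression stands in for the analysis step, which in Solr itself would live in the field type's index analyzer (for instance as a pattern-replace character filter). The last function mimics a per-line phrase search to show why a phrase split across two lines is not found.

```python
import re


def format_time(seconds):
    """Render a time in seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_caption_doc(talk_id, cues):
    """Build a document whose caption field is a list of 'start end text' strings.

    `cues` is a list of (start_seconds, end_seconds, text) tuples.
    `caption_lines` is a hypothetical field name for the string array.
    """
    return {
        "id": talk_id,
        "caption_lines": [
            f"{format_time(start)} {format_time(end)} {text}"
            for start, end, text in cues
        ],
    }


# Stand-in for the index-time analysis step: strip the two leading
# timestamps so that only the caption text is searchable.
TIME_PREFIX = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3} \d{2}:\d{2}:\d{2}\.\d{3} ")


def searchable_text(line):
    """Return the caption text of a stored line, without its timings."""
    return TIME_PREFIX.sub("", line)


def find_lines(doc, phrase):
    """Return the stored lines (timings included) whose text contains `phrase`."""
    needle = phrase.lower()
    return [
        line
        for line in doc["caption_lines"]
        if needle in searchable_text(line).lower()
    ]
```

In this simulation a hit comes back as the full stored line, so the timing is available for jumping to that point in the talk, while a query that looks like a timestamp matches nothing because the times are removed before matching.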