Whisper is an automatic speech recognition tool developed by OpenAI. It generates transcripts and caption files for audio and video files. It can produce output in the same language as the media file or translate speech in other languages into English. These caption and transcript files make the content of the media searchable by keyword and help users of all abilities use and understand it. Because Whisper can run locally on a computer without sending data to an outside service, the content remains private.
In 2023, we began testing Whisper as a way to increase the accessibility of digitized audiovisual special collections material. Through a Lyrasis Catalyst Fund grant, we are testing Whisper on a 250-hour sample of AV content to evaluate both its overall performance and how equitably it performs across different types of content and contexts, such as technical and subject-specific vocabularies, regional dialects and vernaculars, multi-speaker content, music and background noise, and a range of production qualities. Student editors working on the project correct the caption and transcript files generated by Whisper, producing a corrected caption file and transcript file for each media file. Each corrected file is then compared with the corresponding Whisper-generated file to calculate the word error rate of Whisper's output.
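
As an illustration of the comparison step, the following is a minimal sketch of a word error rate calculation, not the project's actual tooling. It uses the common definition of word error rate: the number of word substitutions, deletions, and insertions needed to turn the Whisper output into the corrected text, divided by the number of words in the corrected text. The function names and the normalization rule (lowercasing and dropping punctuation) are assumptions made for this example; a real workflow would also need to strip caption timestamps before comparing.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words, dropping punctuation.

    This normalization is an assumption for the example; the project's
    own rules for case and punctuation may differ.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(corrected: str, generated: str) -> float:
    """Return word-level edit distance divided by the corrected word count."""
    ref = tokenize(corrected)   # human-corrected transcript (the reference)
    hyp = tokenize(generated)   # Whisper-generated transcript (the hypothesis)
    if not ref:
        raise ValueError("corrected transcript contains no words")

    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from Whisper output (deletion)
                curr[j - 1] + 1,     # extra word in Whisper output (insertion)
                prev[j - 1] + cost,  # matching word or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    corrected = "The quick brown fox jumps."
    generated = "the quick fox jumps"
    print(f"WER: {word_error_rate(corrected, generated):.2f}")
```

In this example the generated text is missing one of the five reference words, so the sketch reports a rate of 0.20. Because the count of errors is divided by the length of the corrected text, the rate can exceed 1.0 when Whisper inserts many extra words.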