CADET is a caption and description editing tool from the National Center for Accessible Media (NCAM) at WGBH. Visit the CADET page to learn more and find installation instructions.
Using CADET
Click the CADET icon in the dock/taskbar. CADET will open in your browser.
Editing captions
For best caption display
When you're done editing, use Tools > Check CEA-608 compliance to make sure no line is longer than 32 characters. The Status window will show an error message for each line over 32 characters. Click an error message to jump to that caption, then either move the line break or split the content across multiple captions. If you split a caption, remember to also adjust the start and end times of the resulting captions. (Tools > Show ruler displays a ruler above the caption box you're editing, which you can use to check line length.) When you've finished adjusting line lengths, clear the messages in the Status window and run the CEA-608 compliance check again to confirm that no lines over 32 characters remain.
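CADET's built-in check is the one to rely on while editing. As an independent spot check on an SRT caption file, a short Python sketch like the one below can list caption lines longer than 32 characters. It assumes standard SRT layout (a caption number, a timing line, one or more text lines, then a blank line); the function name long_lines is hypothetical and is not part of CADET.

```python
import re

MAX_CHARS = 32  # CEA-608 line-length limit used by the compliance check

# An SRT timing line, e.g. "00:00:01,000 --> 00:00:03,000"
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def long_lines(srt_text: str, max_chars: int = MAX_CHARS) -> list[tuple[int, str]]:
    """Return (caption number, line) for each caption text line over max_chars."""
    problems = []
    # SRT captions are separated by blank lines
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Skip anything that does not look like a numbered, timed caption block
        if len(lines) < 2 or not lines[0].strip().isdigit() or not TIMING.match(lines[1].strip()):
            continue
        number = int(lines[0].strip())
        for line in lines[2:]:
            # Trailing spaces are ignored when measuring the line
            if len(line.rstrip()) > max_chars:
                problems.append((number, line))
    return problems
```

For example, long_lines(open("captions.srt", encoding="utf-8").read()) returns the (caption number, line) pairs that still need a new line break or a split; an empty list means every line fits.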
Editing audio vs video files
Because audio recordings have no visual component, they need transcripts rather than captions. To edit a transcript for an audio file, open CADET, go to File > Open Media, and select your recording. Then go to File > Import, select the import type Plain, and navigate to the Whisper transcript for the recording. CADET will import the entire transcript as one or two very long, center-justified caption blocks. Go to Style > Alignment > Left to left-justify the transcript. Edit the content as usual, combining lines into paragraphs or adding blank lines as desired. (Note that pressing Return twice in a row automatically creates a new caption block in CADET.)
If you regularly work with audio recordings of interviews or other spoken word content, you can use Whisper with Audacity to assist with transcription and translation.
Batch processing: Whisper can output several file types (e.g., txt, srt, json). You will need the SRT files.
Individual files: use the OpenVINO Whisper Transcription plug-in for Audacity.
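If you batch-process with the openai-whisper command-line tool, --output_format srt makes it write only SRT files. If you call it from Python instead, model.transcribe() returns a dictionary whose "segments" list holds each segment's start and end time (in seconds) and its text. The sketch below assumes that result shape and formats the segments as SRT; srt_timestamp and segments_to_srt are hypothetical helper names, not part of Whisper.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text from Whisper-style segments with 'start', 'end', 'text' keys."""
    blocks = []
    for number, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{number}\n{timing}\n{seg['text'].strip()}\n")
    # Joining with "\n" leaves one blank line between caption blocks
    return "\n".join(blocks)


# Assumed usage with the openai-whisper package (not run here):
# import whisper
# model = whisper.load_model("base")
# result = model.transcribe("interview.mp3")
# with open("interview.srt", "w", encoding="utf-8") as f:
#     f.write(segments_to_srt(result["segments"]))
```

Writing the formatter by hand keeps the example independent of Whisper's own writer classes, whose call signatures have changed between releases.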
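Word error rate (WER) is a metric commonly used to assess the accuracy of automatic speech recognition (ASR) systems. It is the number of word-level errors in a transcript (substitutions, deletions, and insertions) divided by the total number of words in the reference transcript, i.e., what was actually spoken. WER can be calculated in various ways, for example with the Python modules werpy or jiwer.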
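In formula form, WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions needed to turn the reference into the hypothesis (the ASR transcript) and N is the number of reference words. A minimal Python sketch computes S + D + I as a word-level edit distance. It does no normalization of case or punctuation, so "The" and "the" count as different words; werpy and jiwer provide their own WER functions and normalization options, and the function name word_error_rate here is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j]: fewest edits turning the reference words processed so far
    # into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" finds one substitution and one deletion, giving 2 / 6, or about 0.33. Because insertions are counted, WER can exceed 1.0 for a very noisy transcript.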
For a script that uses werpy to compare sets of files via Terminal or Command Prompt, see https://github.com/ninarao/wer_calc