Editing and Analyzing Output - Whisper - OpenAI - Research Guides at Emory University Libraries

Editing Transcripts With CADET

CADET is a caption and description editing tool from the National Center for Accessible Media (NCAM) at WGBH. Visit the CADET page to learn more and find installation instructions.

Using CADET

Click the CADET icon in the dock/taskbar. CADET will open in your browser.

Starting on a new recording: Go to File > Open Media and select your recording. Then go to File > Import, select your caption or transcript file type, and navigate to your Whisper caption/transcript file. Wait for CADET to finish importing the captions - you'll see updates in the lower-right "Status" window and eventually it will say "project import complete".
Opening a previously saved work in progress: Go to File > Open Project and select the ".cadet" file you were working on. This will load your media file and in-progress captions.

Editing captions

Make sure you're in Edit mode and Project Type is set to Caption. Begin playing the media and edit the caption text or start/end times as needed. Use Ctrl + Space to play or pause the media. You can toggle the caption display on and off by clicking the CC icon in the media window. See Help > Keyboard Shortcuts for more ways to navigate through the media.
Save the project periodically using File > Save Project As.
Export the completed caption/transcript file in the format of your choice (typically WebVTT or SRT for captions and Plain text for transcripts). Go to File > Export, select the Export Type, give the export a file name, and click Save.
Exit CADET using File > Quit, then close the CADET tab in your browser.

For best caption display

When you're done editing, use Tools > Check CEA-608 compliance to make sure no lines are over 32 characters. The Status window will show an error message for any line over 32 characters. Clicking the error message will take you to the caption so that you can edit the line break or split the content across multiple captions (if you do this, remember to also adjust the caption start/end times). Tools > Show ruler will display a ruler above the caption box you're editing, which you can use to check the line length. When you're done adjusting line lengths, clear the messages in the Status window and run the CEA-608 compliance check again to make sure no lines over 32 characters remain.
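If you want to spot long lines in an exported caption file before or after working in CADET, the 32-character rule described above can be approximated with a short script. This is only a sketch and is not CADET's own compliance check: the function name find_long_lines is hypothetical, and the parsing assumes a simple SRT or WebVTT file with no styling tags or NOTE blocks.

```python
import re

MAX_CHARS = 32  # per-line limit cited above for CEA-608 captions

# Matches SRT ("00:00:01,000 --> ...") and WebVTT ("00:01.000 --> ...") timing lines.
TIMING = re.compile(r"^\s*(?:\d+:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+")


def find_long_lines(caption_text, max_chars=MAX_CHARS):
    """Return (line_number, length, text) for each caption text line over max_chars."""
    problems = []
    for number, line in enumerate(caption_text.splitlines(), start=1):
        text = line.strip()
        # Skip blank lines, numeric cue indexes, the WebVTT header, and timing lines.
        # (Assumption: a line made only of digits is a cue index, not caption text.)
        if not text or text.isdigit() or text.startswith("WEBVTT") or TIMING.match(text):
            continue
        if len(text) > max_chars:
            problems.append((number, len(text), text))
    return problems


if __name__ == "__main__":
    sample = (
        "1\n"
        "00:00:01,000 --> 00:00:04,000\n"
        "Welcome to the oral history project at the library.\n"
    )
    for number, length, text in find_long_lines(sample):
        print(f"line {number}: {length} characters: {text}")
```

To check a real file, read its contents into a string first and pass that string to find_long_lines. Any line it reports still needs to be fixed by hand in CADET as described above.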

Editing audio vs video files

Since audio recordings do not have a visual component, they require transcripts but not captions. To edit transcripts for audio files, open CADET, go to File > Open Media and select your recording. Then go to File > Import, select import type Plain, and navigate to the Whisper transcript for the recording. CADET will import the entire transcript as one or two very long caption blocks that are center-justified. Go to Style > Alignment > Left to change the transcript to left-justified. Edit the file as usual for content, combining the lines into paragraphs or adding blank lines as desired. (Note that hitting return twice in a row will automatically create a new caption block in CADET.)

Editing With Audacity

If you regularly work with audio recordings of interviews or other spoken word content, you can use Whisper with Audacity to assist with transcription and translation.

Batch process: Whisper can output several file types (e.g., txt, srt, json). You will need SRT files.

Use the Python script srt-2-audacity to convert the SRT files to Audacity labels.
In Audacity, import the label file that corresponds to your audio file.
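To illustrate what the conversion step above involves, here is a minimal sketch that turns SRT text into Audacity's label-track text format (one label per line: start time in seconds, a tab, end time in seconds, a tab, then the label text). It is not the srt-2-audacity script itself; the function name srt_to_labels is hypothetical, and the parsing assumes well-formed SRT cues separated by blank lines.

```python
import re

# Captures hours, minutes, seconds, milliseconds for the start and end of a cue.
CUE_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(hours, minutes, seconds, millis):
    """Convert timestamp parts (as strings) to seconds as a float."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def srt_to_labels(srt_text):
    """Convert SRT text to Audacity label text (start<TAB>end<TAB>label per line)."""
    labels = []
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = CUE_TIMING.search(line)
            if match:
                parts = match.groups()
                start, end = _seconds(*parts[:4]), _seconds(*parts[4:])
                text = " ".join(lines[i + 1:])  # join multi-line captions into one label
                labels.append(f"{start:.3f}\t{end:.3f}\t{text}")
                break
    return "\n".join(labels) + "\n" if labels else ""


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,500 --> 00:00:04,000\nHello there.\n\n"
        "2\n00:01:05,250 --> 00:01:07,000\nSecond caption\ncontinues here\n"
    )
    print(srt_to_labels(sample), end="")
```

Saving the returned text to a .txt file gives a label file that can be imported alongside the matching audio in Audacity.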

Individual Files: OpenVINO Whisper Transcription plug-in

The OpenVINO Whisper Transcription plug-in uses whisper.cpp, a lightweight implementation of the model that can be integrated into different platforms and applications. See this page for installation instructions.
OpenVINO has released several other AI plugins for Audacity that may be useful for audio preprocessing.

Calculating Word Error Rate

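Word error rate (WER) is a metric commonly used to assess the accuracy of ASR systems. It is the ratio of errors in a transcript (substituted, deleted, and inserted words) to the total number of words actually spoken, as recorded in a corrected reference transcript. A WER of 0 means the transcript matches the reference exactly, and the value can exceed 1 if the transcript contains many inserted words. WER can be calculated through various means, such as the Python modules werpy or jiwer.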

For a script that uses werpy to compare sets of files via Terminal or Command Prompt, see https://github.com/ninarao/wer_calc
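To make the WER definition above concrete, the following sketch computes it directly with a word-level edit distance, without relying on either library. It is an illustration rather than the method used by werpy, jiwer, or the wer_calc script; the function name word_error_rate is hypothetical, and the only normalization applied is lowercasing and whitespace splitting (punctuation is left in place, which is an assumption that will affect scores).

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion: reference word missing from hypothesis
                current[j - 1] + 1,      # insertion: extra word in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox jumps"   # corrected transcript
    hypothesis = "the quick brown dog jumps"  # e.g. raw Whisper output
    print(word_error_rate(reference, hypothesis))  # one substitution in five words
```

In practice the reference is the hand-corrected transcript and the hypothesis is the raw Whisper output. Dedicated libraries typically add text normalization options, so their scores may differ from this sketch.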