Feimi's Subtitling Notebook (2) - Whisper48

2 minute read

Published:

This page is nolonger up-to-date and is subjected to a major revision in the near future.

Introduction

This is an introduction of my Whisper48 project.

In December 2022, OpenAI published their general-purpose speech recognition model Whisper, which was trained on a large dataset of diverse audio and is able to perform multilingual speech recognition, speech translation, and language identification.

It turns out that this speech recognition model is super-helpful in the context of subtitiling. Buzz for example, an GUI integration of Whisper, has received 3.6k stars on GitHub for now.

Subtitling, especially timestamping can be a tedious work, so that’s where Whisper comes in. Powered by Whisper, Ayanaminn developed N46Whisper for subtitiling Nogizaka 46 videos. This project is deployed on Google Colab, thereby taking advantage of free GPU offered on the cloud.

This project, Whisper48, is a fork from N46Whisper and sticked to the idea of running on Google Colab (simply because I don’t have enough GPU!). Minor modifications were made to incorporate the usage of more accurate Whisper-based models (WhisperX for example) and to adapt for other personal demands.

In principle, usage of this kind of tools like N46Whisper or Whisper48 should not be limited only to Nogizaka 48 or AKB48 videos. Thanks to the multilingual power of Whisper, videos in any common languages (French, German, Korean, …) are supported.

Prerequisites

  1. Install FFmpeg.
  2. A google account.

Step by step

  1. Download the video.

  2. Use FFmpeg to extract the audio from the video.
    • Open the commad line terminal on your computer (Powershell on Windows, terminal on Mac).
    • Enter the working directory.
    • Run FFmpeg with the command: $ ffmpeg -i input.mp4 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
  3. Upload the audio file to your Google Drive. Open the Whisper48 notebook on Google Colab.

  4. The notebook is divided into different parts. Titles and texts are self-explanatory. Run them one by one by clicking the “play button” on the left of each part.

  5. The subtitile file will be dowloaded automatically when the program finishes. You may receive a pop-up window for selecting the folder to download the file.

  6. Upon finishing, open the drop-down menu on the top-right corner of the page. Choose “Disconnect and delete runtime” to disconnect you from Google servers.

Translation

From my own experiences, different translation tools can be consulted: ChatGPT, DeepL, Google Translate, and a very helpful web dictionary of my favorite - Jisho.org.