Whisper48

This page is no longer up-to-date and is subject to a major revision in the near future.

This is an introduction to my Whisper48 project.

AI is changing our lives! Take good notice of it : )

About the project

In September 2022, OpenAI published their general-purpose speech recognition model Whisper, which was trained on a large dataset of diverse audio and is able to perform multilingual speech recognition, speech translation, and language identification.
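
For reference, calling Whisper from Python takes only a few lines with the openai-whisper package. The sketch below is minimal; the model size and file name are placeholders:

```python
import whisper

# Load one of the pretrained checkpoints (tiny / base / small / medium / large)
model = whisper.load_model("medium")

# Transcribe an audio file; Whisper auto-detects the language by default
result = model.transcribe("interview.mp3")  # hypothetical file name

print(result["language"])  # detected language code, e.g. "ja"
for segment in result["segments"]:
    print(f'{segment["start"]:7.2f} -> {segment["end"]:7.2f}  {segment["text"]}')
```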

It turns out that this speech recognition model is super-helpful in the context of subtitling. Buzz, for example, a GUI integration of Whisper, has received 3.6k stars on GitHub so far.

Subtitling, and especially timestamping, can be tedious work, so that’s where Whisper comes in. Powered by Whisper, Ayanaminn developed N46Whisper for subtitling Nogizaka46 videos. The project is deployed on Google Colab, thereby taking advantage of the free GPUs offered on the cloud.
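
To give an idea of what the automatic timestamping produces, here is a rough sketch (not the actual N46Whisper code) that converts Whisper’s segments into SRT entries; the real tools emit files ready for a subtitle editor, but the principle is the same:

```python
def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:01:23,450."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Turn Whisper's `segments` list (from model.transcribe) into SRT text."""
    entries = []
    for i, seg in enumerate(segments, start=1):
        entries.append(
            f'{i}\n'
            f'{to_srt_time(seg["start"])} --> {to_srt_time(seg["end"])}\n'
            f'{seg["text"].strip()}\n'
        )
    return "\n".join(entries)
```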

This project, Whisper48, is a fork of N46Whisper and sticks to the idea of running on Google Colab (simply because I don’t have enough GPU!). Minor modifications were made to incorporate more accurate Whisper-based models (WhisperX, for example) and to adapt the tool to other personal needs.
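
For the curious, the WhisperX workflow looks roughly like the sketch below; it follows the WhisperX README, so the exact function names and arguments may differ between versions:

```python
import whisperx

device = "cuda"           # Colab GPU
audio_file = "video.wav"  # hypothetical file name

# 1. Transcribe with a batched Whisper backend
model = whisperx.load_model("large-v2", device, compute_type="float16")
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=16)

# 2. Align the output against a phoneme model for more accurate timestamps
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

print(result["segments"])  # segments with refined, word-level timing
```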

In principle, the use of tools like N46Whisper or Whisper48 should not be limited to Nogizaka46 or AKB48 videos. Thanks to Whisper’s multilingual capabilities, videos in any common language (French, German, Korean, …) are supported.
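
Concretely, the source language (or even an on-the-fly English translation) can be requested through Whisper’s transcribe options; the file name below is a placeholder:

```python
import whisper

model = whisper.load_model("medium")

# Force the source language instead of relying on auto-detection
result = model.transcribe("talkshow_fr.mp3", language="fr")

# task="translate" asks Whisper to translate the speech into English
translated = model.transcribe("talkshow_fr.mp3", task="translate")
```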

Translation

In my own experience, different translation tools can be consulted: ChatGPT, DeepL, Google Translate, and my favorite web dictionary, Jisho.org.

Further editing and exporting

I edit subtitles with Aegisub, and export videos with hard-coded subtitles using ArcTime Pro.

I use (at least starting this year) the open-source Noto fonts for my subtitles.