I have recently come across an excellent program for video/audio transcription.
This is an open source project that covers many languages and has amazing accuracy right out of the box. It generates text in various formats such as:
- plain text,
- plain text with timestamps,
- Web Video Text Tracks (VTT),
- SubRip Subtitle File (SRT), and others.
It also:
- works with dozens of languages;
- has automatic language detection;
- can be run from the command line;
- punctuates sentences correctly;
- and can even translate while transcribing, e.g. you can feed it a video or audio file in another language, say Japanese, and it will produce an English transcription.
Beyond that:
- it's written in Python;
- it's covered under the MIT open source license;
- it's pretty quick: an hour-long podcast transcribes in under 15 minutes; and
- it comes from the folks who produced ChatGPT.
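To give a feel for what the timestamped subtitle output looks like, here is a minimal Python sketch that turns a list of timed text segments into SRT. It is not the project's own code: the segment layout (dicts with `start`, `end` and `text` keys), the helper names `srt_timestamp` and `to_srt`, and the sample sentences are all assumptions made up for illustration.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the HH:MM:SS,mmm form used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text: a counter, a time range, the caption, then a blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_range}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Hypothetical segments; a real transcription would supply these.
segments = [
    {"start": 0.0, "end": 4.5, "text": "Hypothetical first line."},
    {"start": 4.5, "end": 9.25, "text": "Hypothetical second line."},
]

print(to_srt(segments))
```

VTT is laid out along similar lines, but it starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.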
Here is a screenshot of the text generated from the first 2 minutes of the latest Destination Linux podcast. It looks to be 100% accurate, even skipping the music.
Here is the link to the project's GitHub: