Whisper -- Open Source Transcription Solution

I have recently come across an excellent program for video/audio transcription.

This is an open source project that covers many languages and has amazing accuracy right out of the box. It generates text in various formats such as:

  • plain text,

  • plain text with timestamps,

  • Video Text Track (VTT),

  • SubRip Subtitle File (SRT), and others.

It also:

  • works with dozens of languages;

  • has automatic language detection;

  • can be run from the command line;

  • punctuates sentences correctly;

  • and can even translate a transcription – e.g. you can feed it a video or audio file in another language, say, Japanese, and it will produce an English transcription.

  • it’s written in Python;

  • it’s covered under the MIT open source license;

  • it’s pretty quick: an hour-long podcast transcribes in under 15 minutes; and

  • it comes from OpenAI, the folks who produced ChatGPT.
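
For anyone who wants to try the command-line and translation features listed above, here is a minimal sketch in Python that assembles a `whisper` command line. It assumes the `whisper` command from the openai-whisper package is installed and on the PATH. The file name `episode.mp3` and the helper `build_whisper_command` are made up for illustration, and the model size is only an example.

```python
def build_whisper_command(audio_path, model="base", task="transcribe",
                          output_format="all", output_dir="."):
    """Assemble the argument list for the whisper command-line tool.

    task is "transcribe" (keep the spoken language) or "translate"
    (produce English text). output_format is one of txt, vtt, srt,
    tsv, json or all.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task}")
    return [
        "whisper", audio_path,
        "--model", model,
        "--task", task,
        "--output_format", output_format,
        "--output_dir", output_dir,
    ]


if __name__ == "__main__":
    # Hypothetical input file: translate the speech into English subtitles.
    cmd = build_whisper_command("episode.mp3", task="translate",
                                output_format="srt")
    print(" ".join(cmd))
    # To actually run it (needs whisper installed):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Leaving out a `--language` option lets Whisper detect the spoken language on its own, which matches the automatic detection mentioned above.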

Here is a screenshot of the text generated from the first 2 minutes of the latest Destination Linux podcast. It looks to be 100% accurate – even skipping the music.

Here is the link to the project’s GitHub:

This is awesome, I think we will spotlight this on an upcoming episode. This is something we might be able to utilize for our production as well. Thank you!

I am working on utilizing this to make a searchable database for the Tux Digital Network and podcasts in general. I have already transcribed 171 episodes of Destination Linux and all 74 episodes of Hardware Addicts. I am currently working on the backend of this application.

I envision the final outcome to be a search engine with results displayed as text with the target term shown in context. Then, by utilizing the timestamps produced from Whisper, each “hit” would be a link to the exact segment of audio in the podcast that contains the target term. The user would be able to read the context of the hit or listen to it.
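
One possible way to turn a timestamp into a link to the exact segment is a temporal media fragment (`#t=start,end`, in seconds) appended to the audio URL. This is only a sketch of that idea, not the actual backend: the function name `segment_link` and the example URL are hypothetical, and it assumes the player or browser honours media fragments.

```python
def segment_link(audio_url, start_ms, end_ms=None):
    """Return audio_url with a temporal media fragment appended.

    Whisper's TSV timestamps are in milliseconds, while media
    fragments use seconds, so the values are divided by 1000.
    """
    fragment = f"#t={start_ms / 1000:.3f}"
    if end_ms is not None:
        fragment += f",{end_ms / 1000:.3f}"
    return audio_url + fragment


if __name__ == "__main__":
    # Hypothetical URL; the offsets are taken from a TSV transcript row.
    print(segment_link("https://example.com/episode_018.mp3", 2135520, 2144320))
    # https://example.com/episode_018.mp3#t=2135.520,2144.320
```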

With this built out, one would be able to enter “3070” or “10 nanometer” in a search box and the results would contain every reference in every podcast where the target term was mentioned. Each result would be clickable, taking you directly to that segment of that podcast.
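
A rough sketch of that kind of search, assuming the transcripts are Whisper `.tsv` files with a `start`, `end`, `text` header row and tab-separated columns, all sitting in one folder. The function name `search_transcripts` and the folder name are made up for illustration. A real search engine would use an index rather than scanning every file.

```python
import csv
from pathlib import Path


def search_transcripts(folder, term):
    """Yield (file name, start_ms, end_ms, text) for each matching segment.

    The match is a simple case-insensitive substring test on the
    text column of every .tsv file in folder.
    """
    needle = term.lower()
    for path in sorted(Path(folder).glob("*.tsv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle, delimiter="\t",
                                    quoting=csv.QUOTE_NONE)
            for row in reader:
                text = row.get("text") or ""
                if needle in text.lower():
                    yield path.name, int(row["start"]), int(row["end"]), text


if __name__ == "__main__":
    # Hypothetical folder holding the transcribed episodes.
    for name, start_ms, end_ms, text in search_transcripts("transcripts", "3070"):
        print(f"{name}: {start_ms} {end_ms} {text}")
```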

For the last 7 years or so my side project has been perfecting and producing the concept of breaking large audio files into “sprites” (think CSS sprites but for audio) and mapping that audio to the corresponding text, resulting in clickable text that is read aloud – making text multimedia. The majority of that time I have been working in Hebrew to facilitate study and learning of Biblical Hebrew for students and believers around the world. This project with Whisper and Tux Digital podcasts has been an interesting diversion in my native tongue. I’m not promising anything soon, but this project is getting some of my attention.

Curious about where the 3070 is mentioned in Hardware Addicts? The timestamps are in milliseconds.

* Hardware_Addicts_Episode_018_Green_With_Envy_Over_Nvidias_3000_Serires_Ampere_GPUs.tsv: 2135520 2144320 When RTX 3070 drops faster than 2080 Ti at $499.
* Hardware_Addicts_Episode_018_Green_With_Envy_Over_Nvidias_3000_Serires_Ampere_GPUs.tsv: 2238600 2242120 But the 3070 and 3080 are priced really low.
* Hardware_Addicts_Episode_018_Green_With_Envy_Over_Nvidias_3000_Serires_Ampere_GPUs.tsv: 2287720 2293960 I'm very surprised that 3070 at the lowest level is faster than their last
* Hardware_Addicts_Episode_020_AMDs_Big_Navi_Market_Disruption_Nvidia_Boost_Woes_PC_Specs_For_Photography.tsv: 1497200 1505040 Speaking of rumors, one of the rumors going around is that Nvidia postponed the 3070 launch
* Hardware_Addicts_Episode_020_AMDs_Big_Navi_Market_Disruption_Nvidia_Boost_Woes_PC_Specs_For_Photography.tsv: 1514040 1520640 So if they need to adjust the price on the 3070 at launch, they can to be more competitive
* Hardware_Addicts_Episode_022_AMD_Zen_3_To_Center_Your_Chi_Radeon_6000_Series_-_A_Full_AMD_Extravaganza.tsv: 1274560 1282480 by about 79, 80 bucks. So in this case, I actually think that the 3070 is a better deal.
* Hardware_Addicts_Episode_035_How_To_Get_A_GPU_For_Your_PC_In_2021.tsv: 1857360 1861360 and EVGA 3070 hit Amazon that you could pick up.
* Hardware_Addicts_Episode_038_Computex_2021_Review_Hardware_From_AMD_Intel_Nvidia_More.tsv: 1349880 1353520 and the most, or more affordable RTX 3070 Ti,
* Hardware_Addicts_Episode_053_Intel_Return_Of_The_King_Intel_Comes_Back_Swinging.tsv: 1960900 1964820 It was slightly above the Nvidia RTX 3070 Ti.
* Hardware_Addicts_Episode_053_Intel_Return_Of_The_King_Intel_Comes_Back_Swinging.tsv: 1994380 1997140 but the 3070 Ti is nothing to laugh at.
* Hardware_Addicts_Episode_053_Intel_Return_Of_The_King_Intel_Comes_Back_Swinging.tsv: 1997140 1999380 I would like to have a 3070 Ti as well.
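
To find those spots in a podcast player, the millisecond offsets above can be converted to hours, minutes and seconds. A small sketch (the helper name `format_ms` is just for illustration):

```python
def format_ms(ms):
    """Convert a millisecond offset to an HH:MM:SS.mmm string."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


if __name__ == "__main__":
    # First hit in Episode 018 from the list above.
    print(format_ms(2135520))  # 00:35:35.520
```

So the first mention in Episode 018 is a little over 35 minutes into the show.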

If anyone wants to take a look at the Hebrew Audio Text Map I’ve been working on:

I’m going to have to look into using this as well. Hopefully it works in Fedora.