I think including the word “OpenAI” in the post title is somewhat of a misnomer that implies an encrapification that isn’t really happening to the FFmpeg project.
Yes, it is true that OpenAI originally developed the Whisper model, and I hate OpenAI; however:
- Whisper is actually open source, unlike most OpenAI crap.
- FFmpeg isn’t even directly using the OpenAI version, which is written in Python - they’re using whisper.cpp, a C++ port.
- We’ve been able to use speech recognition for decades, so unlike other AI models, I don’t think a speech recognition model that does it better is a problem.
- You don’t even necessarily have to compile FFmpeg with Whisper support.
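To illustrate the last point: Whisper support is opt-in at build time, so a default build doesn’t pull it in at all. This is just a sketch, and the exact flag name may vary by FFmpeg version - check `./configure --help` on your copy.

```shell
# Default build: no Whisper, no whisper.cpp dependency.
./configure
make

# Only if you explicitly want the Whisper filter compiled in
# (assumes whisper.cpp is already installed on the system;
# flag name is from recent FFmpeg and may differ by version):
./configure --enable-whisper
make
```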
I get the dislike of AI, but the idea of an association with OpenAI is overblown and not really reflective of reality. Now, I can understand not wanting to use open source projects whose developers don’t reflect your principles; however, I think this ethical issue is more indirect than it may initially appear, and it’s not a strong reason to quit using what is still the most effective media conversion tool.
Hopefully the speech recognition is better than whatever the fuck most online video platforms use for automatic subtitles at the moment.
I’ve built an app with Whisper; the level of ‘hit or miss’ depends entirely on the size of the model and the language. Even audio quality is a lesser factor in my experience. So, it depends…
I was messing around with HomeAssistant the other day, which uses the same speech recognition engine, and I found it to be decent.
has anyone compared vulkan av1 to nvenc or vaapi? too new still?
ugh so what’s the alternative package to ffmpeg?
No need to panic in this case. While I hate OpenAI, there are two things to note here:
- Whisper is an open source library for speech recognition rather than generative AI, run entirely locally. It’s just using ML to do something we could already do with computers (speech recognition), but better.
- They aren’t even directly using the OpenAI version - they’re using whisper.cpp, a port of the model.
This is one of the actually decent uses of this model. I have used Whisper to transcribe phone calls, and just the other week I had to export the audio from a video I was working on and run Whisper to get subtitles for it. It’s still not a set-it-and-forget-it solution, but correcting its small mistakes here and there is so much faster than manually transcribing the audio.
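The workflow above can be sketched in two commands. Filenames and the model path are just examples; the CLI binary in whisper.cpp has been renamed across versions (`main` vs. `whisper-cli`), so check your checkout.

```shell
# whisper.cpp expects 16 kHz mono WAV input, so extract the audio first.
# -vn drops the video stream; -ar/-ac set sample rate and channel count.
ffmpeg -i video.mp4 -vn -ar 16000 -ac 1 audio.wav

# Transcribe and write an .srt subtitle file next to the audio.
# Model file is an example; download one via whisper.cpp's scripts.
./main -m models/ggml-base.en.bin -f audio.wav --output-srt
```

The resulting `.srt` is what you then proofread for the occasional small mistake before muxing it back into the video.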
Given how modular FFmpeg is, with the way its switches work, a user never has to interact with that portion of the application. I can technically use FFmpeg to transcode an MP3 without ever using the video components.
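As a minimal example of that kind of audio-only invocation (filenames are placeholders):

```shell
# Audio-only transcode: no video components involved at all.
# -vn explicitly drops any video stream; only the MP3 encoder runs.
ffmpeg -i input.flac -vn -codec:a libmp3lame -b:a 192k output.mp3
```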
Good luck with that… ffmpeg is the de facto standard.
It’s getting to the point where EVERYTHING has freaking AI slop in it, and the only solution is to manually build an entire OS from scratch (LFS), disable any damned AI slop every package has, and put it all on a pre-AI-era computer, because I’m sure the HW and BIOS in every computer will have damned AI soon enough.
To hell with AI slop and to hell with anyone supporting that copyright-infringing, inaccurate, brain-dead, environmentally unfriendly pile of crap technology.
I mean, here it is used optionally to help with accessibility. That is objectively a good use of AI.
Also, it’s running locally. I think the biggest problem with AI is the data harvesting, and this just isn’t that.
This isn’t really GenAI, so I don’t really have a problem with it much like how I don’t have a problem with AI upscaling, for instance. It also can run locally.
I mostly agree with you. However, I think there are some caveats to upscaling; there are so many lazy “4K AI UPSCALE BEST QUALITY” videos online that just don’t look good and were clearly uploaded just to get views.
However, I’ve also found they have their uses; for instance, I wanted to laser cut a TMBG Flood logo once, but there were very few good images online that traced well in Inkscape. I ended up doing an AI upscale of the least terrible one with a white background, and that traced pretty well in Inkscape.