I can vouch for whisper.cpp. It's not 100% perfect, but it's good enough to transcribe a half-hour podcast with numerous speakers, and the result needs only minimal fixing afterwards.
OP, this is the best speech-to-text solution, IMO. I've used Whisper on Windows (link to GitHub) successfully to transcribe graduate-level class recordings with very little manual fixing, mostly just certain last names.