Transcribing Audio With OpenAI's Whisper
Recently, I was listening to a philosophy lecture by Michael Sugrue that included some commentary on T.S. Eliot. I wanted to extract that specific segment because the quotes and ideas were deeply relevant to what I was working on in my Lorde chapter.
In the past, I had transcribed audio to text using various online tools. Most of them offer a few minutes free, then quickly push for a subscription. While that can work in a pinch, this particular case involved rare audio recordings—content not publicly available and accessed through a membership paywall.
Even if these transcription services are trustworthy, I felt it was inappropriate to upload someone else’s protected material. That’s when I turned to a more private, local solution: OpenAI’s Whisper.
Why Whisper?
Having recently begun exploring AI tools, I discovered that OpenAI's Whisper model can be downloaded and run locally. This gave me total control over the data. Whisper comes in several model sizes that trade accuracy for speed, so you can pick one based on your hardware and needs.
Even the fastest model, while less accurate than the larger ones, successfully transcribed the Sugrue lecture segment I needed. The tradeoff was worth it for data privacy and full control.
This experience reinforced a key difference in how I approach AI: I’m not using it to generate content, but to help organize and extract insights from the content I already have. At some point, I’ll write a post about using local AI models to parse and search through PDF files—a method that’s proving useful for organizing research notes and generating ideas.
This is where I feel the conversation around AI often gets muddied. There’s a knee-jerk reaction among some who conflate all AI with auto-generated content. But there’s a huge difference between using AI as a creative crutch and using it as a research tool.
In my case, Whisper became a quiet but powerful assistant—letting me do the intellectual heavy lifting, while it handled the tedious transcription work behind the scenes.
How I Ran Whisper Locally on macOS
After deciding to handle the transcription locally, I visited the Whisper GitHub page and learned that it’s an open-source project built by OpenAI. The documentation is straightforward, and it gave me confidence that I could run it myself without relying on any third-party services.
I opened Terminal on my MacBook Pro and followed the instructions to install Whisper. I used pip to install it, along with ffmpeg for audio processing:
pip install git+https://github.com/openai/whisper.git
brew install ffmpeg
For transcription, I chose the tiny model (the smallest and fastest), since I wanted to process the file quickly and didn't require high-end accuracy for a simple reference segment.
Step 1: Download the Lecture
I used yt-dlp, a command-line utility that downloads audio or video from YouTube and other platforms. This made it easy to save the Michael Sugrue lecture to my computer:
yt-dlp "https://www.youtube.com/watch?v=VIDEO_ID"
Once I had the file, I used MediaHuman Audio Converter (a free GUI tool) to convert it to .mp3 format, which Whisper accepts without issue.
Step 2: Run the Transcription
Back in Terminal, I navigated to the directory containing the audio file and ran the following command to start the transcription process:
whisper segbib.mp3 --model tiny > segbib.txt
This command runs Whisper with the tiny model on the file segbib.mp3 and redirects the transcript into a text file named segbib.txt. The simplicity of this command, and the fact that it worked so smoothly, was impressive.