How to Transcribe Audio Using Faster Whisper in Google Colab: Complete Guide
Introduction
Audio transcription has become increasingly accessible with advanced machine learning models. In this guide, we'll walk through how to perform accurate audio transcription using Faster Whisper in Google Colab, leveraging GPU acceleration for efficient processing.
Prerequisites
Before we begin, ensure you have a Google Colab account with GPU runtime enabled. This tutorial assumes you're familiar with basic Python and Colab environments.
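To confirm that your runtime actually has a GPU attached (Runtime > Change runtime type > GPU), you can query the driver from a cell:
!nvidia-smi
If the command errors out or lists no device, switch the runtime type before continuing.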
Step-by-Step Guide
1. Installation and Setup
First, install the required libraries in your Colab notebook. The ctranslate2 pin comes last so it overrides whatever version faster-whisper pulls in:
!pip install faster-whisper
!apt-get install ffmpeg
!pip install ctranslate2==4.4.0
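To confirm that pip actually resolved the pinned version, you can inspect the installed packages:
!pip show faster-whisper ctranslate2
The ctranslate2 entry should report version 4.4.0; if it doesn't, re-run the pinned install command.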
2. Version Compatibility Note
Important Compatibility Consideration
When working with Faster Whisper and related libraries, version compatibility is crucial. In our example, we pin ctranslate2==4.4.0 because of specific CUDA and cuDNN requirements:
- ctranslate2 version 4.5.0 requires cuDNN 9.1 and is only compatible with CUDA 12.4.
- Torch version 2.5.1+cu121 supports CUDA 12.1, not 12.4.
This subtle version difference can cause significant installation or runtime issues, so pay close attention to library versions.
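If you want to see exactly which CUDA and cuDNN builds your Torch installation carries, both are exposed through the standard PyTorch API:
import torch
print(torch.version.cuda)              # CUDA version Torch was built against, e.g. '12.1'
print(torch.backends.cudnn.version())  # cuDNN version as an integer, e.g. 8902 for 8.9.2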
3. Mounting Google Drive (Optional)
If your audio files are stored in Google Drive:
from google.colab import drive
drive.mount('/content/drive')
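After mounting, it is worth listing the target folder to confirm the path resolves; the Audio folder here is just the example location used later in this guide:
import os
print(os.listdir('/content/drive/My Drive/Audio'))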
4. Verifying System Capabilities
Before transcription, verify your system's capabilities:
import torch
import ctranslate2
print(f"Torch version: {torch.__version__}")
print(f"CTranslate2 version: {ctranslate2.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")
5. Audio Transcription Script
Here's a comprehensive transcription script:
from faster_whisper import WhisperModel
# Initialize the model (choose model size: tiny, base, small, medium, large)
model = WhisperModel("base", device="cuda", compute_type="float16")
# Path to your audio file
audio_path = '/content/drive/My Drive/Audio/output.aac'
# Transcribe the audio
segments, info = model.transcribe(audio_path)
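# Note: segments is a lazy generator; the actual transcription work
# happens as you iterate over it in the loop below.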
def format_time(seconds):
    """Convert seconds to HH:MM:SS.ss format."""
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    seconds = seconds % 60
    return f"{hours:02d}:{minutes:02d}:{seconds:05.2f}"
# Print transcribed segments with timestamps
for segment in segments:
    start_time = format_time(segment.start)
    end_time = format_time(segment.end)
    text = segment.text
    print(f"[{start_time} -> {end_time}] {text}")
Bonus: Extracting Audio from Video
Use FFmpeg to extract the audio track from a video file. In a Colab cell, prefix the command with !; note that -acodec copy avoids re-encoding and only produces a valid .aac file when the source audio stream is already AAC:
!ffmpeg -i input.mp4 -vn -acodec copy output.aac
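If the source stream uses another codec, you can re-encode to AAC instead of copying; the bitrate here is just a reasonable default:
!ffmpeg -i input.mp4 -vn -c:a aac -b:a 192k output.aac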
Tips and Best Practices
- Choose the appropriate Whisper model size based on your computational resources and accuracy needs; if no GPU is available, see the CPU fallback sketch after this list.
- Ensure good audio quality for best transcription results.
- Always check library version compatibility.
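If you cannot get a GPU runtime (or the CUDA stack keeps fighting you), Faster Whisper also runs on CPU with int8 quantization. A minimal sketch:
from faster_whisper import WhisperModel
# CPU fallback: slower than GPU, but sidesteps CUDA/cuDNN version issues
model = WhisperModel("base", device="cpu", compute_type="int8")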
Conclusion
Faster Whisper provides a powerful, GPU-accelerated solution for audio transcription directly in Google Colab. By understanding version dependencies and following this guide, you can efficiently transcribe audio files.
Troubleshooting
- If you encounter CUDA or library compatibility issues, double-check your library versions.
- Ensure your Colab runtime is set to GPU.
- For large audio files, consider splitting the audio into shorter chunks, or enable the built-in voice activity detection filter to skip silence (see the sketch after this list).
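Faster Whisper's vad_filter option (backed by the Silero VAD model) drops long silent stretches before transcription, which can cut processing time noticeably on long recordings:
segments, info = model.transcribe(audio_path, vad_filter=True)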