How to Transcribe Audio Using Faster Whisper in Google Colab: Complete Guide

Introduction

Audio transcription has become increasingly accessible with advanced machine learning models. In this guide, we'll walk through how to perform accurate audio transcription using Faster Whisper in Google Colab, leveraging GPU acceleration for efficient processing.

Prerequisites

Before we begin, make sure you have a Google account and a Colab notebook with a GPU runtime enabled (Runtime > Change runtime type > select a GPU such as T4). This tutorial assumes basic familiarity with Python and the Colab environment.

Step-by-Step Guide

1. Installation and Setup

First, install the required libraries in your Colab notebook:

!pip install faster-whisper
!apt-get install -y ffmpeg
!pip install ctranslate2==4.4.0

2. Version Compatibility Note

Important Compatibility Consideration

When working with Faster Whisper and related libraries, version compatibility is crucial. In this example we pin ctranslate2==4.4.0 because newer releases target a different CUDA/cuDNN stack than the Torch build the Colab runtime ships with:

  • ctranslate2 4.5.0 requires cuDNN 9.1 and is only compatible with CUDA 12.4
  • Torch 2.5.1+cu121 is built against CUDA 12.1

This subtle version difference can cause significant installation or runtime issues, so pay close attention to library versions.
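
Before pinning anything, it can help to check which CUDA and cuDNN builds your runtime's Torch actually ships with. A minimal check using standard PyTorch attributes:

import torch

# CUDA toolkit version this Torch build was compiled against, e.g. "12.1"
print(f"Torch built for CUDA: {torch.version.cuda}")

# Bundled cuDNN version as an integer, e.g. 8902 for 8.9.2
print(f"cuDNN version: {torch.backends.cudnn.version()}")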

3. Mounting Google Drive (Optional)

If your audio files are stored in Google Drive:

from google.colab import drive
drive.mount('/content/drive')
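
To confirm the mount succeeded, you can list the top level of your Drive (the Audio subfolder used later in this guide should appear here; adjust to your own layout):

import os

# Should print your Drive's folders, including the one holding your audio
print(os.listdir('/content/drive/My Drive'))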

4. Verifying System Capabilities

Before transcription, verify your system's capabilities:

import torch
import ctranslate2

print(f"Torch version: {torch.__version__}")
print(f"CTranslate2 version: {ctranslate2.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")

5. Audio Transcription Script

Here's a complete transcription script that prints each segment with timestamps:

from faster_whisper import WhisperModel

# Initialize the model (choose model size: tiny, base, small, medium, large)
model = WhisperModel("base", device="cuda", compute_type="float16")

# Path to your audio file
audio_path = '/content/drive/My Drive/Audio/output.aac'

# Transcribe the audio
segments, info = model.transcribe(audio_path)

def format_time(seconds):
    """Convert seconds to HH:MM:SS.ms format"""
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    seconds = seconds % 60
    return f"{hours:02d}:{minutes:02d}:{seconds:05.2f}"

# Print transcribed segments with timestamps
for segment in segments:
    start_time = format_time(segment.start)
    end_time = format_time(segment.end)
    text = segment.text
    print(f"[{start_time} -> {end_time}] {text}")

Bonus: Extracting Audio from Video

Use FFmpeg to extract the audio track from a video file (in a Colab cell, prefix the command with !):

ffmpeg -i input.mp4 -vn -acodec copy output.aac
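
Keep in mind that -acodec copy only works when the video's audio track is already AAC. If it is not, re-encoding is the safer option (the bitrate here is an arbitrary choice):

ffmpeg -i input.mp4 -vn -c:a aac -b:a 192k output.aac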

Tips and Best Practices

  1. Choose the appropriate Whisper model size based on your computational resources and accuracy needs.
  2. Ensure good audio quality for best transcription results.
  3. Always check library version compatibility.

Conclusion

Faster Whisper provides a powerful, GPU-accelerated solution for audio transcription directly in Google Colab. By understanding version dependencies and following this guide, you can efficiently transcribe audio files.

Troubleshooting

  • If you encounter CUDA or library compatibility issues, double-check your library versions.
  • Ensure your Colab runtime is set to GPU.
  • For long audio files, consider splitting the audio into smaller chunks and transcribing each piece (see the FFmpeg sketch below).
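
FFmpeg's segment muxer can cut a file into fixed-length chunks without re-encoding (600 seconds per chunk is an arbitrary choice):

ffmpeg -i long_audio.aac -f segment -segment_time 600 -c copy chunk_%03d.aac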

 
