
This repository was archived on 2025-09-28. You can view its files and clone it, but you cannot make any changes to its state, such as pushing or creating new issues, pull requests, or comments.

ChaoticByte f83043921a

Switch from whisper.cpp to faster-whisper

2024-08-15 22:22:30 +02:00


audio-summarize

An audio summarizer that glues together faster-whisper and BART.

Dependencies

Python 3 (tested: 3.12)

Setup

Create a Python virtual environment, activate it, and install the required packages:

python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt

Run

In your terminal, make sure your Python venv is activated
Run audio-summarize.py
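It is easy to forget to activate the venv. The minimal sketch below checks whether the current interpreter runs inside one by comparing `sys.prefix` with `sys.base_prefix`, which differ only inside a virtual environment. The helper name `in_virtualenv` is hypothetical and is not part of audio-summarize.

```python
import sys


def in_virtualenv() -> bool:
    """Hypothetical helper: True when running inside a venv such as .venv."""
    # Interpreters created by `python3 -m venv` set sys.prefix to the venv
    # directory, while sys.base_prefix keeps pointing at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"venv active: {sys.prefix}")
    else:
        print("No venv active; run: source .venv/bin/activate")
```

Run it with the same `python3` you would use for audio-summarize.py.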

Usage

./audio-summarize.py -i filepath -o filepath
                     [--summin n] [--summax n] [--segmax n]
                     [--lang lang] [-m name]

options:
  -h, --help   show this help message and exit
  --summin n   The minimum length of a segment summary [10, min: 5]
  --summax n   The maximum length of a segment summary [90, min: 5]
  --segmax n   The maximum number of tokens per segment [375, 5 - 500]
  --lang lang  The language of the audio source ['en']
  -m name      The name of the whisper model to be used ['small.en']
  -i filepath  The path to the media file
  -o filepath  Where to save the output text to

Example:

./audio-summarize.py -i ./tmp/test.webm -o ./tmp/output.txt

How does it work?

To summarize a media file, the program executes the following steps:

Convert and transcribe the media file with faster-whisper, which uses ffmpeg and ctranslate2 under the hood
Split the transcript into semantically coherent segments using semantic-text-splitter and BART's tokenizer
Summarize each segment using BART (facebook/bart-large-cnn)
Write the results to a text file
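The four steps can be sketched in Python as below. This is an illustrative sketch, not the project's actual code. The function names `summarize_media` and `default_components` are hypothetical, and so are the exact library wiring and settings:

- `WhisperModel(..., device="cpu", compute_type="int8")`
- `TextSplitter.from_huggingface_tokenizer(tokenizer, segmax)`
- a transformers `"summarization"` pipeline with `min_length`/`max_length`

The splitter call assumes a recent semantic-text-splitter release, where the capacity is passed to the constructor. The model-backed steps are passed in as functions, so the orchestration can be exercised without downloading any models.

```python
from collections.abc import Callable, Iterable


def summarize_media(
    transcribe: Callable[[str], Iterable[str]],
    split: Callable[[str], list[str]],
    summarize: Callable[[str], str],
    input_path: str,
    output_path: str,
) -> list[str]:
    """Run the four steps; the model-backed callables are injected by the caller."""
    # 1. Transcribe the media file into text pieces and join them into one transcript.
    transcript = " ".join(piece.strip() for piece in transcribe(input_path))
    # 2. Split the transcript into segments that fit the summarizer's input budget.
    segments = split(transcript)
    # 3. Summarize every segment independently.
    summaries = [summarize(segment) for segment in segments]
    # 4. Write the results to a text file (blank line between summaries: an assumed format).
    with open(output_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(summaries) + "\n")
    return summaries


def default_components(model_name="small.en", lang="en", segmax=375, summin=10, summax=90):
    """Assumed wiring of faster-whisper, semantic-text-splitter and BART (not the project's code)."""
    # Imports are deferred so summarize_media() can be used without these packages.
    from faster_whisper import WhisperModel
    from semantic_text_splitter import TextSplitter
    from tokenizers import Tokenizer
    from transformers import pipeline

    # Assumption: CPU inference with int8 quantization.
    whisper = WhisperModel(model_name, device="cpu", compute_type="int8")

    def transcribe(path: str) -> Iterable[str]:
        segments, _info = whisper.transcribe(path, language=lang)  # segments is lazy
        return (segment.text for segment in segments)

    # Split on semantic boundaries, measuring chunk size in BART tokens (at most segmax).
    tokenizer = Tokenizer.from_pretrained("facebook/bart-large-cnn")
    splitter = TextSplitter.from_huggingface_tokenizer(tokenizer, segmax)

    def split(text: str) -> list[str]:
        return splitter.chunks(text)

    bart = pipeline("summarization", model="facebook/bart-large-cnn")

    def summarize(text: str) -> str:
        # Assumption: --summin/--summax map to the pipeline's token-based length limits.
        result = bart(text, min_length=summin, max_length=summax, do_sample=False)
        return result[0]["summary_text"]

    return transcribe, split, summarize
```

With the real components, a call equivalent to the example above would be `summarize_media(*default_components(), "./tmp/test.webm", "./tmp/output.txt")`.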
