An audio summarizer (faster-whisper and BART glued together)
This repository was archived on 2025-09-28. You can view files and clone it, but you cannot make any changes to its state, such as pushing or creating new issues, pull requests, or comments.

audio-summarize

An audio summarizer that glues together ffmpeg, whisper.cpp and BART.

Dependencies

  • Python 3 (tested: 3.12)
  • ffmpeg
  • git
  • make
  • C/C++ compiler (on Ubuntu, installing build-essential is sufficient)

Setup

Create a virtual environment for Python and activate it:

python3 -m venv .venv
source .venv/bin/activate

Run setup.sh:

./setup.sh

Run

  1. Get a whisper.cpp-compatible model file (available at https://huggingface.co/ggerganov/whisper.cpp)
  2. In your terminal, make sure your Python venv is activated
  3. Run audio-summarize.py

Usage

./audio-summarize.py -m filepath -i filepath -o filepath
                   [--summin n] [--summax n] [--segmax n]

options:
  -h, --help   show this help message and exit
  --summin n   The minimum length of a segment summary [10, min: 5]
  --summax n   The maximum length of a segment summary [90, min: 5]
  --segmax n   The maximum number of tokens per segment [375, 5 - 500]
  -m filepath  The path to a whisper.cpp-compatible model file
  -i filepath  The path to the media file
  -o filepath  Where to save the output text to

Example:

./audio-summarize.py -m ./tmp/whisper_ggml-small.en-q5_1.bin -i ./tmp/test.webm -o ./tmp/output.txt
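For readers who want to see how the defaults and bounds listed above could be enforced, here is a minimal argparse sketch. It is not the code from audio-summarize.py: the helper bounded_int, the function build_parser, and the destination names model, input and output are assumptions made for illustration. Only the flags, defaults and limits come from the help text above.

```python
import argparse


def bounded_int(minimum, maximum=None):
    """Return an argparse type that accepts integers within the given bounds.

    Hypothetical helper, not taken from audio-summarize.py.
    """
    def parse(value: str) -> int:
        number = int(value)  # argparse reports a ValueError as a usage error
        if number < minimum or (maximum is not None and number > maximum):
            raise argparse.ArgumentTypeError(f"{value} is out of range")
        return number
    return parse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options."""
    parser = argparse.ArgumentParser(prog="audio-summarize.py")
    parser.add_argument("--summin", metavar="n", type=bounded_int(5), default=10,
                        help="The minimum length of a segment summary [10, min: 5]")
    parser.add_argument("--summax", metavar="n", type=bounded_int(5), default=90,
                        help="The maximum length of a segment summary [90, min: 5]")
    parser.add_argument("--segmax", metavar="n", type=bounded_int(5, 500), default=375,
                        help="The maximum number of tokens per segment [375, 5 - 500]")
    # The dest names below are assumptions; the real script may use others.
    parser.add_argument("-m", metavar="filepath", dest="model", required=True,
                        help="The path to a whisper.cpp-compatible model file")
    parser.add_argument("-i", metavar="filepath", dest="input", required=True,
                        help="The path to the media file")
    parser.add_argument("-o", metavar="filepath", dest="output", required=True,
                        help="Where to save the output text to")
    return parser
```

With this sketch, build_parser().parse_args(["-m", "model.bin", "-i", "in.webm", "-o", "out.txt"]) yields the documented defaults (summin 10, summax 90, segmax 375), while a value such as --segmax 900 is rejected with a usage error.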

How does it work?

To summarize a media file, the program executes the following steps:

  1. Convert the media file with ffmpeg to a mono, 16 kHz, 16-bit PCM WAV file
  2. Transcribe that WAV file using whisper.cpp
  3. Clean up the transcript (remove newlines and leading/trailing whitespace)
  4. Semantically split up the transcript into segments using semantic-text-splitter and the tokenizer for BART
  5. Summarize each segment using BART (facebook/bart-large-cnn)
  6. Write the results to a text file
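The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from audio-summarize.py: all function names are hypothetical, the exact ffmpeg arguments and the output file layout are guesses, and step 2 (whisper.cpp) is left out because how the script invokes it is not documented here. Steps 4 and 5 assume a semantic-text-splitter version whose TextSplitter.from_huggingface_tokenizer takes the capacity as its second argument, and a transformers version that provides the "summarization" pipeline.

```python
import re
from pathlib import Path


def build_ffmpeg_command(media_path: str, wav_path: str) -> list[str]:
    """Step 1: ffmpeg arguments for a mono, 16 kHz, 16-bit PCM WAV file."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", media_path,
        "-ac", "1",            # one audio channel (mono)
        "-ar", "16000",        # 16 kHz sample rate
        "-c:a", "pcm_s16le",   # signed 16-bit little-endian PCM
        wav_path,
    ]


def clean_transcript(raw: str) -> str:
    """Step 3: replace newlines with single spaces and trim both ends."""
    return re.sub(r"\s*\n\s*", " ", raw).strip()


def split_into_segments(text: str, max_tokens: int = 375) -> list[str]:
    """Step 4: split on semantic boundaries, measured in BART tokens."""
    # Imported lazily so the helpers above work without these packages.
    from semantic_text_splitter import TextSplitter
    from tokenizers import Tokenizer

    tokenizer = Tokenizer.from_pretrained("facebook/bart-large-cnn")
    splitter = TextSplitter.from_huggingface_tokenizer(tokenizer, max_tokens)
    return splitter.chunks(text)


def summarize_segments(segments: list[str], min_length: int = 10,
                       max_length: int = 90) -> list[str]:
    """Step 5: summarize each segment with facebook/bart-large-cnn."""
    from transformers import pipeline  # lazy import, see above

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    return [
        summarizer(segment, min_length=min_length, max_length=max_length)[0]["summary_text"]
        for segment in segments
    ]


def write_summaries(summaries: list[str], output_path: str) -> None:
    """Step 6: write one summary per paragraph (the layout is an assumption)."""
    Path(output_path).write_text("\n\n".join(summaries) + "\n", encoding="utf-8")
```

The command list from build_ffmpeg_command can be passed to subprocess.run(..., check=True). The defaults 375, 10 and 90 mirror the documented --segmax, --summin and --summax values.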