audio-summarize

An audio summarizer that glues together ffmpeg, whisper.cpp and BART.

Dependencies

  • Python 3 (tested: 3.12)
  • ffmpeg
  • git
  • make
  • C/C++ compiler (on Ubuntu, installing build-essential does the trick)

Setup

Create a Python virtual environment and activate it:

python3 -m venv .venv
source .venv/bin/activate

Run setup.sh

./setup.sh

Run

  1. You need a whisper.cpp-compatible model file (-> https://huggingface.co/ggerganov/whisper.cpp); one way to download one is shown after this list
  2. In your terminal, make sure your Python venv is activated
  3. Run audio-summarize.py
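
If you do not already have a model file, the snippet below shows one possible way to fetch one in Python via the huggingface_hub package. This package is not a dependency of this project, and the chosen filename is only an example (it matches the usage example further down); downloading the file manually from the link above works just as well.

# One possible way to fetch a whisper.cpp model file (assumes: pip install huggingface_hub)
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="ggerganov/whisper.cpp",    # collection of ggml-format Whisper models
    filename="ggml-small.en-q5_1.bin",  # any model listed in that repository works
    local_dir="./tmp",                  # store it next to the other temporary files
)
print(model_path)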

Usage

./audio-summarize.py -m filepath -i filepath -o filepath
                   [--summin n] [--summax n] [--segmax n]

options:
  -h, --help   show this help message and exit
  --summin n   The minimum length of a segment summary [default: 10, min: 5]
  --summax n   The maximum length of a segment summary [default: 90, min: 5]
  --segmax n   The maximum number of tokens per segment [default: 375, range: 5-500]
  -m filepath  The path to a whisper.cpp-compatible model file
  -i filepath  The path to the media file
  -o filepath  Where to save the output text to

Example:

./audio-summarize.py -m ./tmp/whisper_ggml-small.en-q5_1.bin -i ./tmp/test.webm -o ./tmp/output.txt

How does it work?

To summarize a media file, the program executes the following steps (a rough code sketch follows the list):

  1. Convert the media file with ffmpeg to a mono, 16 kHz, 16-bit PCM WAV file
  2. Transcribe that WAV file using whisper.cpp
  3. Clean up the transcript (remove newlines and leading/trailing whitespace)
  4. Semantically split up the transcript into segments using semantic-text-splitter and the tokenizer for BART
  5. Summarize each segment using BART (facebook/bart-large-cnn)
  6. Write the results to a text file
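
The sketch below shows roughly how these steps could be wired together in Python. It is an illustration, not the project's actual audio-summarize.py: the whisper.cpp binary name and flags depend on how you built it (newer builds install whisper-cli, older ones a binary named main), and the semantic-text-splitter API differs slightly between versions.

# Rough sketch of the pipeline above (illustrative only, hard-coded paths)
import subprocess
from semantic_text_splitter import TextSplitter  # pip install semantic-text-splitter
from tokenizers import Tokenizer                 # pip install tokenizers
from transformers import pipeline                # pip install transformers

model = "./tmp/whisper_ggml-small.en-q5_1.bin"
media = "./tmp/test.webm"
output = "./tmp/output.txt"

# 1. Convert the media file to a mono, 16 kHz, 16-bit PCM WAV file
subprocess.run(["ffmpeg", "-y", "-i", media, "-ac", "1", "-ar", "16000",
                "-c:a", "pcm_s16le", "audio.wav"], check=True)

# 2. Transcribe the WAV file with whisper.cpp (binary name and flags depend on your build)
subprocess.run(["whisper-cli", "-m", model, "-f", "audio.wav",
                "-otxt", "-of", "transcript"], check=True)

# 3. Clean up the transcript (newlines, leading/trailing whitespace)
with open("transcript.txt", encoding="utf-8") as f:
    transcript = " ".join(line.strip() for line in f).strip()

# 4. Split the transcript into segments of at most 375 BART tokens
bart_tokenizer = Tokenizer.from_pretrained("facebook/bart-large-cnn")  # assumes a tokenizer.json on the Hub
splitter = TextSplitter.from_huggingface_tokenizer(bart_tokenizer, 375)
segments = splitter.chunks(transcript)

# 5. Summarize each segment with BART
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summaries = [summarizer(seg, min_length=10, max_length=90, do_sample=False)[0]["summary_text"]
             for seg in segments]

# 6. Write the results to a text file
with open(output, "w", encoding="utf-8") as f:
    f.write("\n\n".join(summaries))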