An audio summarizer (faster-whisper and BART glued together)

ai ai-summarizer audio bart ctranslate2 faster-whisper nlp speech-to-text summarization whisper

This repository was archived on 2025-09-28. You can view files and clone it, but you cannot make any changes to its state, such as pushing or creating new issues, pull requests, or comments.

ChaoticByte 4ab43594de Add more information to the README		2024-08-13 21:37:46 +02:00
.gitignore	Add project files	2024-08-13 20:32:46 +02:00
audio-summarize.py	Tweak log messages	2024-08-13 21:37:18 +02:00
LICENSE	Initial commit	2024-08-13 20:29:07 +02:00
README.md	Add more information to the README	2024-08-13 21:37:46 +02:00
requirements.txt	Add project files	2024-08-13 20:32:46 +02:00
setup.sh	Add project files	2024-08-13 20:32:46 +02:00

README.md

audio-summarize

An audio summarizer that glues together ffmpeg, whisper.cpp and BART.

Dependencies

Python 3 (tested: 3.12)
ffmpeg
git
make
C/C++ compiler (on Ubuntu, installing build-essential does the trick)

Setup

Create a virtual environment for Python and activate it:

python3 -m venv .venv
source .venv/bin/activate

Run setup.sh

./setup.sh

Run

You need a whisper.cpp-compatible model file (see https://huggingface.co/ggerganov/whisper.cpp)
In your terminal, make sure your Python venv is activated
Run audio-summarize.py

Usage

./audio-summarize.py -m filepath -i filepath -o filepath
                   [--summin n] [--summax n] [--segmax n]

options:
  -h, --help   show this help message and exit
  --summin n   The minimum length of a segment summary [10, min: 5]
  --summax n   The maximum length of a segment summary [90, min: 5]
  --segmax n   The maximum number of tokens per segment [375, 5 - 500]
  -m filepath  The path to a whisper.cpp-compatible model file
  -i filepath  The path to the media file
  -o filepath  Where to save the output text to

Example:

./audio-summarize.py -m ./tmp/whisper_ggml-small.en-q5_1.bin -i ./tmp/test.webm -o ./tmp/output.txt
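The option list above can be reproduced with argparse. The following is only a minimal sketch built from the help text, not the code in audio-summarize.py; the helper names bounded_int and build_parser are hypothetical, and the way the real script validates the ranges is an assumption.

```python
import argparse


def bounded_int(minimum, maximum=None):
    """Return an argparse type that accepts integers in [minimum, maximum]."""
    def parse(value):
        number = int(value)  # a ValueError here is reported by argparse as a usage error
        if number < minimum or (maximum is not None and number > maximum):
            raise argparse.ArgumentTypeError(f"{value} is out of range")
        return number
    return parse


def build_parser():
    """Build a parser with the options and defaults listed in the usage text."""
    parser = argparse.ArgumentParser(prog="audio-summarize.py")
    parser.add_argument("--summin", metavar="n", type=bounded_int(5), default=10,
                        help="The minimum length of a segment summary [10, min: 5]")
    parser.add_argument("--summax", metavar="n", type=bounded_int(5), default=90,
                        help="The maximum length of a segment summary [90, min: 5]")
    parser.add_argument("--segmax", metavar="n", type=bounded_int(5, 500), default=375,
                        help="The maximum number of tokens per segment [375, 5 - 500]")
    parser.add_argument("-m", metavar="filepath", required=True,
                        help="The path to a whisper.cpp-compatible model file")
    parser.add_argument("-i", metavar="filepath", required=True,
                        help="The path to the media file")
    parser.add_argument("-o", metavar="filepath", required=True,
                        help="Where to save the output text to")
    return parser


# Parse the example command line from above instead of sys.argv.
example_args = build_parser().parse_args([
    "-m", "./tmp/whisper_ggml-small.en-q5_1.bin",
    "-i", "./tmp/test.webm",
    "-o", "./tmp/output.txt",
])
print(example_args.summin, example_args.summax, example_args.segmax)
```

With this sketch, a value outside the documented range (for example --segmax 501) makes argparse print a usage error and exit.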

How does it work?

To summarize a media file, the program executes the following steps:

Convert the media file with ffmpeg to a mono, 16 kHz, 16-bit PCM WAV file
Transcribe that WAV file using whisper.cpp
Clean up the transcript (remove newlines and leading and trailing whitespace)
Semantically split up the transcript into segments using semantic-text-splitter and the tokenizer for BART
Summarize each segment using BART (facebook/bart-large-cnn)
Write the results to a text file
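The first and third steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in audio-summarize.py: the function names are hypothetical, and the exact ffmpeg flags and cleanup rules the script uses are not given in this README. The flags shown are standard ffmpeg options that yield a mono, 16 kHz, 16-bit PCM WAV file.

```python
import subprocess


def ffmpeg_command(src, dst):
    """Build an ffmpeg command that writes a mono, 16 kHz, 16-bit PCM WAV file."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it already exists
        "-i", src,            # input media file (audio or video)
        "-vn",                # drop any video stream
        "-ac", "1",           # one audio channel (mono)
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # signed 16-bit little-endian PCM
        dst,                  # output path; a .wav suffix selects the WAV container
    ]


def convert_to_wav(src, dst):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_command(src, dst), check=True)


def clean_transcript(text):
    """Join the transcript into one line, dropping blank lines and edge whitespace."""
    stripped = (line.strip() for line in text.splitlines())
    return " ".join(line for line in stripped if line)


print(ffmpeg_command("./tmp/test.webm", "./tmp/test.wav"))
print(clean_transcript("  Hello there.\n This is a test. \n\n"))
```

convert_to_wav is defined but not called here, so the sketch runs without ffmpeg installed. The transcription, splitting, and summarization steps depend on whisper.cpp, semantic-text-splitter, and BART and are left out of this sketch.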