audio-summarize

An audio summarizer that glues together faster-whisper and BART.

Supported Languages

Only English summarization is supported.

Dependencies

  • Python 3 (tested: 3.12)

Setup

Create a virtual environment for Python, activate it, and install the required Python packages:

python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt

Run

  1. In your terminal, make sure your Python venv is activated
  2. Run audio-summarize.py

Usage

./audio-summarize.py -i filepath -o filepath [-m name]
                   [--summin n] [--summax n] [--segmax n]

options:
  -h, --help   show this help message and exit
  --summin n   The minimum length of a segment summary [10] (min: 5)
  --summax n   The maximum length of a segment summary [90] (min: 5)
  --segmax n   The maximum number of tokens per segment [375] (5 - 500)
  -m name      The name of the whisper model to be used [small.en]
  -i filepath  The path to the media file
  -o filepath  Where to save the output text to

Example:

./audio-summarize.py -i ./tmp/test.webm -o ./tmp/output.txt
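
For reference, a minimal argparse setup matching the options above could look like the sketch below (argument names, defaults, and ranges are taken from the help text; the actual wiring in audio-summarize.py may differ):

import argparse

parser = argparse.ArgumentParser(description="Summarize a media file.")
parser.add_argument("-i", dest="input", metavar="filepath", required=True,
                    help="The path to the media file")
parser.add_argument("-o", dest="output", metavar="filepath", required=True,
                    help="Where to save the output text to")
parser.add_argument("-m", dest="model", metavar="name", default="small.en",
                    help="The name of the whisper model to be used")
parser.add_argument("--summin", metavar="n", type=int, default=10,
                    help="The minimum length of a segment summary (min: 5)")
parser.add_argument("--summax", metavar="n", type=int, default=90,
                    help="The maximum length of a segment summary (min: 5)")
parser.add_argument("--segmax", metavar="n", type=int, default=375,
                    help="The maximum number of tokens per segment (5 - 500)")
args = parser.parse_args()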

How does it work?

To summarize a media file, the program executes the following steps (a code sketch follows the list):

  1. Convert and transcribe the media file with faster-whisper, which uses ffmpeg and ctranslate2 under the hood
  2. Semantically split up the transcript into segments using semantic-text-splitter and the tokenizer for BART
  3. Summarize each segment using BART (facebook/bart-large-cnn)
  4. Write the results to a text file
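
A condensed sketch of that pipeline, assuming the current Python APIs of faster-whisper, semantic-text-splitter, tokenizers, and transformers (paths and limits reuse the example and defaults above; the actual script may organize this differently):

from faster_whisper import WhisperModel
from semantic_text_splitter import TextSplitter
from tokenizers import Tokenizer
from transformers import pipeline

# 1. Transcribe the media file (faster-whisper handles decoding and runs
#    the Whisper model through ctranslate2).
model = WhisperModel("small.en")
segments, _info = model.transcribe("./tmp/test.webm")
transcript = " ".join(segment.text.strip() for segment in segments)

# 2. Split the transcript into semantic chunks, counting tokens with the
#    BART tokenizer so no chunk exceeds the segment limit (--segmax).
tokenizer = Tokenizer.from_pretrained("facebook/bart-large-cnn")
splitter = TextSplitter.from_huggingface_tokenizer(tokenizer, 375)
chunks = splitter.chunks(transcript)

# 3. Summarize each chunk with BART (--summin / --summax).
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summaries = [
    summarizer(chunk, min_length=10, max_length=90, do_sample=False)[0]["summary_text"]
    for chunk in chunks
]

# 4. Write the results to the output text file.
with open("./tmp/output.txt", "w") as f:
    f.write("\n\n".join(summaries))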