audio-summarize

An audio summarizer that glues together ffmpeg, whisper.cpp and BART.

Dependencies

  • Python 3 (tested: 3.12)
  • ffmpeg
  • git
  • make
  • C/C++ compiler (on Ubuntu, installing build-essential does the trick)

Setup

Create a Python virtual environment and activate it:

python3 -m venv .venv
source .venv/bin/activate

Run setup.sh

./setup.sh

Run

  1. You need a whisper.cpp-compatible model file (-> https://huggingface.co/ggerganov/whisper.cpp); one way to download one is shown after this list
  2. In your terminal, make sure your Python venv is activated
  3. Run audio-summarize.py
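
If you do not already have a model file, the snippet below shows one possible way to fetch one in Python via the huggingface_hub package. This package is not a dependency of this project, and the chosen filename is only an example (it matches the usage example further down); downloading the file manually from the link above works just as well.

# One possible way to fetch a whisper.cpp model file (assumes: pip install huggingface_hub)
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="ggerganov/whisper.cpp",    # collection of ggml-format Whisper models
    filename="ggml-small.en-q5_1.bin",  # any model listed in that repository works
    local_dir="./tmp",                  # store it next to the other temporary files
)
print(model_path)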

Usage

./audio-summarize.py -m filepath -i filepath -o filepath
                   [--summin n] [--summax n] [--segmax n]

options:
  -h, --help   show this help message and exit
  --summin n   The minimum length of a segment summary [default: 10, min: 5]
  --summax n   The maximum length of a segment summary [default: 90, min: 5]
  --segmax n   The maximum number of tokens per segment [default: 375, range: 5-500]
  -m filepath  The path to a whisper.cpp-compatible model file
  -i filepath  The path to the media file
  -o filepath  Where to save the output text to

Example:

./audio-summarize.py -m ./tmp/whisper_ggml-small.en-q5_1.bin -i ./tmp/test.webm -o ./tmp/output.txt

How does it work?

To summarize a media file, the program executes the following steps (a rough code sketch follows the list):

  1. Convert the media file with ffmpeg to a mono, 16 kHz, 16-bit PCM WAV file
  2. Transcribe that WAV file using whisper.cpp
  3. Clean up the transcript (remove newlines and leading/trailing whitespace)
  4. Semantically split up the transcript into segments using semantic-text-splitter and the tokenizer for BART
  5. Summarize each segment using BART (facebook/bart-large-cnn)
  6. Write the results to a text file
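
The sketch below shows roughly how these steps could be wired together in Python. It is an illustration, not the project's actual audio-summarize.py: the whisper.cpp binary name and flags depend on how you built it (newer builds install whisper-cli, older ones a binary named main), and the semantic-text-splitter API differs slightly between versions.

# Rough sketch of the pipeline above (illustrative only, hard-coded paths)
import subprocess
from semantic_text_splitter import TextSplitter  # pip install semantic-text-splitter
from tokenizers import Tokenizer                 # pip install tokenizers
from transformers import pipeline                # pip install transformers

model = "./tmp/whisper_ggml-small.en-q5_1.bin"
media = "./tmp/test.webm"
output = "./tmp/output.txt"

# 1. Convert the media file to a mono, 16 kHz, 16-bit PCM WAV file
subprocess.run(["ffmpeg", "-y", "-i", media, "-ac", "1", "-ar", "16000",
                "-c:a", "pcm_s16le", "audio.wav"], check=True)

# 2. Transcribe the WAV file with whisper.cpp (binary name and flags depend on your build)
subprocess.run(["whisper-cli", "-m", model, "-f", "audio.wav",
                "-otxt", "-of", "transcript"], check=True)

# 3. Clean up the transcript (newlines, leading/trailing whitespace)
with open("transcript.txt", encoding="utf-8") as f:
    transcript = " ".join(line.strip() for line in f).strip()

# 4. Split the transcript into segments of at most 375 BART tokens
bart_tokenizer = Tokenizer.from_pretrained("facebook/bart-large-cnn")  # assumes a tokenizer.json on the Hub
splitter = TextSplitter.from_huggingface_tokenizer(bart_tokenizer, 375)
segments = splitter.chunks(transcript)

# 5. Summarize each segment with BART
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summaries = [summarizer(seg, min_length=10, max_length=90, do_sample=False)[0]["summary_text"]
             for seg in segments]

# 6. Write the results to a text file
with open(output, "w", encoding="utf-8") as f:
    f.write("\n\n".join(summaries))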