Switch from whisper.cpp to faster-whisper

2024-08-15 22:20:55 +02:00
commit f83043921a
parent 464ede2444
5 changed files with 47 additions and 112 deletions
--- a/README.md
+++ b/README.md
@@ -1,48 +1,40 @@
 # audio-summarize

-An audio summarizer that glues together ffmpeg, whisper.cpp and BART.
+An audio summarizer that glues together [faster-whisper](https://github.com/SYSTRAN/faster-whisper) and [BART](https://huggingface.co/facebook/bart-large-cnn).

 ## Dependencies

 - Python 3 (tested: 3.12)
-- ffmpeg
-- git
-- make
-- c/c++ compiler (on Ubuntu, installing `build-essential` does the trick)

 ## Setup

-Create a virtual environment for python and activate it:
+Create a virtual environment for python, activate it and install the required python packages:

 ```bash
 python3 -m venv .venv
 source .venv/bin/activate
-```
-
-Run setup.sh
-
-```bash
-./setup.sh
+pip3 install -r requirements.txt
 ```

 ## Run

-1. You need a whisper.cpp compatible model file (-> https://huggingface.co/ggerganov/whisper.cpp)
-2. In your terminal, make shure you have your python venv activated
-3. Run audio-summarize.py
+1. In your terminal, make shure you have your python venv activated
+2. Run audio-summarize.py

 ### Usage

 ```
-./audio-summarize.py -m filepath -i filepath -o filepath
-                   [--summin n] [--summax n] [--segmax n]
+./audio-summarize.py -i filepath -o filepath
+                     [--summin n] [--summax n] [--segmax n]
+                     [--lang lang] [-m name]

 options:
  -h, --help   show this help message and exit
  --summin n   The minimum lenght of a segment summary [10, min: 5]
  --summax n   The maximum lenght of a segment summary [90, min: 5]
  --segmax n   The maximum number of tokens per segment [375, 5 - 500]
-  -m filepath  The path to a whisper.cpp-compatible model file
+  --lang lang  The language of the audio source ['en']
+  -m name      The name of the whisper model to be used ['small.en']
  -i filepath  The path to the media file
  -o filepath  Where to save the output text to
 ```
@@ -50,16 +42,14 @@ options:
 Example:

 ```bash
-./audio-summarize.py -m ./tmp/whisper_ggml-small.en-q5_1.bin -i ./tmp/test.webm -o ./tmp/output.txt
+./audio-summarize.py -i ./tmp/test.webm -o ./tmp/output.txt
 ```

 ## How does it work?

 To summarize a media file, the program executes the following steps:

-1. Convert the media file with [ffmpeg](https://www.ffmpeg.org/) to a mono 16kHz 16bit-PCM wav file
-2. Transcribe that wav file using [whisper.cpp](https://github.com/ggerganov/whisper.cpp)
-3. Clean up the transcript (newlines, whitespaces at the beginning and end)
-4. Semantically split up the transcript into segments using [semantic-text-splitter](https://github.com/benbrandt/text-splitter) and the tokenizer for BART
-5. Summarize each segment using BART ([`facebook/bart-large-cnn`](https://huggingface.co/facebook/bart-large-cnn))
-6. Write the results to a text file
+1. Convert and transcribe the media file using [faster-whisper](https://github.com/SYSTRAN/faster-whisper), using [ffmpeg](https://www.ffmpeg.org/) and [ctranslate2](https://github.com/OpenNMT/CTranslate2/) under the hood
+2. Semantically split up the transcript into segments using [semantic-text-splitter](https://github.com/benbrandt/text-splitter) and the tokenizer for BART
+3. Summarize each segment using BART ([`facebook/bart-large-cnn`](https://huggingface.co/facebook/bart-large-cnn))
+4. Write the results to a text file
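
For reviewers, the pipeline described in the updated README could look roughly like the sketch below. It is not the code from this commit. The function names (`build_parser`, `join_segments`, `transcribe`, `summarize`, `main`) and the `device="cpu"` / `compute_type="int8"` settings are assumptions made for illustration, and the option defaults are copied from the usage text above. The semantic splitting step is left out, so the whole transcript is treated as a single segment and truncated to BART's input limit.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the options from the README usage text (defaults taken from it)."""
    p = argparse.ArgumentParser(prog="audio-summarize.py")
    p.add_argument("--summin", type=int, default=10, metavar="n")
    p.add_argument("--summax", type=int, default=90, metavar="n")
    p.add_argument("--segmax", type=int, default=375, metavar="n")
    p.add_argument("--lang", default="en", metavar="lang")
    p.add_argument("-m", dest="model", default="small.en", metavar="name")
    p.add_argument("-i", dest="input", required=True, metavar="filepath")
    p.add_argument("-o", dest="output", required=True, metavar="filepath")
    return p


def join_segments(texts) -> str:
    """Join per-segment transcript texts into one string, dropping empty pieces."""
    stripped = (t.strip() for t in texts)
    return " ".join(t for t in stripped if t)


def transcribe(path: str, model_name: str, lang: str) -> str:
    """Decode and transcribe a media file with faster-whisper."""
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    # device and compute_type are assumed settings, not taken from the commit.
    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, language=lang)
    return join_segments(seg.text for seg in segments)


def summarize(chunks, summin: int, summax: int) -> list:
    """Summarize each chunk with facebook/bart-large-cnn."""
    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    results = summarizer(
        list(chunks), min_length=summin, max_length=summax, truncation=True
    )
    return [r["summary_text"] for r in results]


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    transcript = transcribe(args.input, args.model, args.lang)
    # Semantic splitting (step 2) is omitted here; --segmax is parsed but unused.
    summaries = summarize([transcript], args.summin, args.summax)
    with open(args.output, "w", encoding="utf-8") as f:
        f.write("\n\n".join(summaries))
```

Calling `main(["-i", "./tmp/test.webm", "-o", "./tmp/output.txt"])` would correspond to the README example, assuming `faster-whisper` and `transformers` are installed in the active venv.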