Use streaming HTTP API calls #8

Closed
opened 2023-05-24 18:32:56 +00:00 by ChaoticByte · 1 comment
ChaoticByte commented 2023-05-24 18:32:56 +00:00 (Migrated from github.com)

This allows displaying chunks/tokens as soon as they are generated, instead of waiting for the whole output.
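A rough sketch of what consuming such a stream could look like on the client side. The wire format is an assumption and not taken from this project: server-sent-event style `data: ...` lines, each carrying a JSON object shaped like `{"choices": [{"text": "..."}]}`, and ending with a `data: [DONE]` line. The function name `iter_stream_chunks` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_chunks(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text chunks from an SSE-style response body, line by line."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        # Skip blank keep-alive lines and anything that is not a data line.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # Assumed end-of-stream sentinel.
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        # Assumed payload shape: {"choices": [{"text": "..."}]}
        yield event["choices"][0]["text"]


if __name__ == "__main__":
    # Simulated response body; a real client would iterate over the
    # lines of the HTTP response instead of a list.
    fake_body = [
        b'data: {"choices": [{"text": "Hel"}]}\n',
        b"\n",
        b'data: {"choices": [{"text": "lo"}]}\n',
        b"data: [DONE]\n",
    ]
    for chunk in iter_stream_chunks(fake_body):
        # Print each chunk immediately instead of waiting for the full answer.
        print(chunk, end="", flush=True)
    print()
```

With the `requests` library, the same function could be fed from `response.iter_lines()` after making the request with `stream=True`; the actual endpoint and payload shape depend on the backend in use.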

ChaoticByte commented 2023-05-24 21:43:12 +00:00 (Migrated from github.com)

After a semi-working implementation, I noticed that using HTTP streams degrades performance: answers take longer because of an increased `load time` (as reported in the llama timings).
