Use streaming HTTP API calls #8

Closed

opened 2023-05-24 18:32:56 +00:00 by ChaoticByte · 1 comment

ChaoticByte commented

2023-05-24 18:32:56 +00:00

(Migrated from github.com)

This allows chunks/tokens to be displayed as soon as they are generated, instead of waiting for the whole output.
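A minimal sketch of the client side, to illustrate the idea. It assumes the API streams server-sent events with an OpenAI-style JSON payload (`{"choices": [{"text": ...}]}`) and a `[DONE]` sentinel; neither is confirmed for this project, and `iter_stream_tokens` is a hypothetical helper name. The response body is simulated with a list so the example runs without a server.

```python
import json
from typing import Iterable, Iterator


def iter_stream_tokens(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text chunks from server-sent-event lines as they arrive."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        # Skip blank separator lines and anything that is not a data line.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # Assumed end-of-stream sentinel (an assumption, not confirmed here).
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # Assumed payload shape: {"choices": [{"text": "..."}]}
        text = chunk["choices"][0].get("text", "")
        if text:
            yield text


# Simulated response body. With a real server this could be the line
# iterator of a streaming HTTP response, e.g. requests' iter_lines().
sample = [
    b'data: {"choices": [{"text": "Hello"}]}',
    b"",
    b'data: {"choices": [{"text": ", world"}]}',
    b"",
    b"data: [DONE]",
]

# Print each chunk immediately instead of waiting for the whole output.
for token in iter_stream_tokens(sample):
    print(token, end="", flush=True)
print()
```

With the `requests` library, the same generator could presumably be fed from `requests.post(url, json=..., stream=True).iter_lines()`, where `url` and the request body depend on the actual server.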

ChaoticByte commented

2023-05-24 21:43:12 +00:00

(Migrated from github.com)

After a semi-working implementation, I noticed that using HTTP streams results in degraded performance: answers take longer because of an increased `load time` (as reported in the llama timings).
