forgejo/models/dbfs/dbfs.go
Earl Warren 238ecfdeb8 fix: garbage collect lingering actions logs (#10009)
If, for any reason (e.g. server crash), a task is recorded as done in the database but the logs are still in the database instead of being in storage, they need to be collected.

The log_in_storage field is only set to true after the logs have been transferred to storage, so it can be relied upon to identify which tasks have lingering logs.

A cron job collects lingering logs every day, 3000 at a time, sleeping one second between them. In normal circumstances there will be only a few of them, even on a large instance, and there is no need to collect them as quickly as possible.

When there are a lot of them for some reason, garbage collection must happen at a rate that is not too hard on storage I/O.

Refs https://codeberg.org/forgejo/forgejo/issues/9999

---

Note on backports: the v11 backport is done manually because of minor conflicts. https://codeberg.org/forgejo/forgejo/pulls/10024

## Checklist

The [contributor guide](https://forgejo.org/docs/next/contributor/) contains information that will be helpful to first time contributors. There are also a few [conditions for merging Pull Requests in Forgejo repositories](https://codeberg.org/forgejo/governance/src/branch/main/PullRequestsAgreement.md). You are also welcome to join the [Forgejo development chatroom](https://matrix.to/#/#forgejo-development:matrix.org).

### Tests

- I added test coverage for Go changes...
  - [x] in their respective `*_test.go` for unit tests.
  - [x] in the `tests/integration` directory if it involves interactions with a live Forgejo server.
- I added test coverage for JavaScript changes...
  - [ ] in `web_src/js/*.test.js` if it can be unit tested.
  - [ ] in `tests/e2e/*.test.e2e.js` if it requires interactions with a live Forgejo server (see also the [developer guide for JavaScript testing](https://codeberg.org/forgejo/forgejo/src/branch/forgejo/tests/e2e/README.md#end-to-end-tests)).

### Documentation

- [ ] I created a pull request [to the documentation](https://codeberg.org/forgejo/docs) to explain to Forgejo users how to use this change.
- [x] I did not document these changes and I do not expect someone else to do it.

### Release notes

- [ ] I do not want this change to show in the release notes.
- [x] I want the title to show in the release notes with a link to this pull request.
- [ ] I want the content of the `release-notes/<pull request number>.md` to be used for the release notes instead of the title.

## Release notes

- Bug fixes
  - [PR](https://codeberg.org/forgejo/forgejo/pulls/10009): garbage collect lingering actions logs

Co-authored-by: Mathieu Fenniak <mathieu@fenniak.net>
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/10009
Reviewed-by: Mathieu Fenniak <mfenniak@noreply.codeberg.org>
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Co-authored-by: Earl Warren <contact@earl-warren.org>
Co-committed-by: Earl Warren <contact@earl-warren.org>
2025-11-18 18:59:01 +01:00


// Copyright 2022 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT

package dbfs

import (
	"context"
	"io/fs"
	"os"
	"path"
	"time"

	"forgejo.org/models/db"
)

/*
The reasons behind the DBFS (database-filesystem) package:
When a Gitea action is running, the Gitea action server should collect and store all the logs.

The requirements are:
* The running logs must be stored across the cluster if the Gitea servers are deployed as a cluster.
* The logs will be archived to Object Storage (S3/MinIO, etc.) after a period of time.
* The Gitea action UI should be able to render the running logs and the archived logs.

Some possible solutions for the running logs:
* [Not ideal] Using a local temp file: it cannot be shared across the cluster.
* [Not ideal] Using a shared file in the filesystem of the git repository: although at the moment the Gitea cluster's
	git repositories must be stored in a shared filesystem, in the future Gitea may need a dedicated Git Service Server
	to decouple the shared filesystem. Then the action logs would become a blocker.
* [Not ideal] Recording the logs in a database table line by line: this has a couple of problems:
	- It's difficult to maintain a separate increasing sequence (the log line number) for each log across different databases.
	- The database table would have a huge number of rows and suffer from the big-table performance problem.
	- It's difficult to load logs through the same interface as other storages.
	- It's difficult to calculate the size of the logs.

The DBFS solution:
* It can be used in a cluster.
* It can share the same interface (Read/Write/Seek) as other storages.
* It's very friendly to the database because it stores far fewer rows than the line-by-line solution.
* In the future, when Gitea actions need to limit the log size (other CI/CD services also do so), it's easier to calculate the log file size.
* Even when the UI needs to render the trailing lines, they can be found by seeking from the end of the file and counting "\n".
	Seeking and scanning this way is not the fastest approach, but it is acceptable and won't affect performance too much.
*/

type DbfsMeta struct { //revive:disable-line:exported
	ID              int64  `xorm:"pk autoincr"`
	FullPath        string `xorm:"VARCHAR(500) UNIQUE NOT NULL"`
	BlockSize       int64  `xorm:"BIGINT NOT NULL"`
	FileSize        int64  `xorm:"BIGINT NOT NULL"`
	CreateTimestamp int64  `xorm:"BIGINT NOT NULL"`
	ModifyTimestamp int64  `xorm:"BIGINT NOT NULL"`
}

type DbfsData struct { //revive:disable-line:exported
	ID         int64  `xorm:"pk autoincr"`
	Revision   int64  `xorm:"BIGINT NOT NULL"`
	MetaID     int64  `xorm:"BIGINT index(meta_offset) NOT NULL"`
	BlobOffset int64  `xorm:"BIGINT index(meta_offset) NOT NULL"`
	BlobSize   int64  `xorm:"BIGINT NOT NULL"`
	BlobData   []byte `xorm:"BLOB NOT NULL"`
}

func init() {
	db.RegisterModel(new(DbfsMeta))
	db.RegisterModel(new(DbfsData))
}
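
// OpenFile opens the named DBFS file with the given os.O_* flags.
// If opening fails, the file handle is closed before the error is returned.
func OpenFile(ctx context.Context, name string, flag int) (File, error) {
	f, err := newDbFile(ctx, name)
	if err != nil {
		return nil, err
	}
	err = f.open(flag)
	if err != nil {
		_ = f.Close()
		return nil, err
	}
	return f, nil
}

// Open opens the named DBFS file read-only.
func Open(ctx context.Context, name string) (File, error) {
	return OpenFile(ctx, name, os.O_RDONLY)
}

// Create opens the named DBFS file for reading and writing,
// creating it if it does not exist and truncating it if it does.
func Create(ctx context.Context, name string) (File, error) {
	return OpenFile(ctx, name, os.O_RDWR|os.O_CREATE|os.O_TRUNC)
}

// Rename renames the DBFS file at oldPath to newPath.
func Rename(ctx context.Context, oldPath, newPath string) error {
	f, err := newDbFile(ctx, oldPath)
	if err != nil {
		return err
	}
	defer f.Close()
	return f.renameTo(newPath)
}

// Remove deletes the named DBFS file.
func Remove(ctx context.Context, name string) error {
	f, err := newDbFile(ctx, name)
	if err != nil {
		return err
	}
	defer f.Close()
	return f.delete()
}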

var _ fs.FileInfo = (*DbfsMeta)(nil)

func (m *DbfsMeta) Name() string {
	return path.Base(m.FullPath)
}

func (m *DbfsMeta) Size() int64 {
	return m.FileSize
}

func (m *DbfsMeta) Mode() fs.FileMode {
	return os.ModePerm
}

func (m *DbfsMeta) ModTime() time.Time {
	return fileTimestampToTime(m.ModifyTimestamp)
}

func (m *DbfsMeta) IsDir() bool {
	return false
}

func (m *DbfsMeta) Sys() any {
	return nil
}
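The two models suggest that a file's content is split into blocks. `DbfsMeta` records a `BlockSize` per file, and each `DbfsData` row holds one blob identified by `MetaID` and `BlobOffset`, a pair indexed together as `meta_offset`. The sketch below is a hypothetical in-memory illustration of how an absolute offset could be mapped onto such blocks for reads and writes that cross block boundaries. It assumes fixed-size blocks aligned on multiples of the block size and ignores `Revision`. `blockStore`, `blockFor`, `writeAt` and `readAt` are invented names, not the package's implementation, which is not shown here.

```go
package main

// blockStore is a hypothetical in-memory model of the data rows of one
// DBFS file: each entry maps a block's starting offset (BlobOffset) to the
// bytes stored in that block. blocks must be non-nil before writing.
type blockStore struct {
	blockSize int64
	blocks    map[int64][]byte
}

// blockFor splits an absolute file offset into the starting offset of the
// block that contains it and the position inside that block.
func (s *blockStore) blockFor(offset int64) (blobOffset, inBlock int64) {
	blobOffset = offset / s.blockSize * s.blockSize
	return blobOffset, offset - blobOffset
}

// writeAt copies p into the store starting at offset, splitting the write
// across block boundaries and growing blocks as needed.
func (s *blockStore) writeAt(p []byte, offset int64) {
	for len(p) > 0 {
		blobOffset, inBlock := s.blockFor(offset)
		n := s.blockSize - inBlock // room left in this block
		if int64(len(p)) < n {
			n = int64(len(p))
		}
		block := s.blocks[blobOffset]
		if end := inBlock + n; int64(len(block)) < end {
			block = append(block, make([]byte, end-int64(len(block)))...)
		}
		copy(block[inBlock:inBlock+n], p[:n])
		s.blocks[blobOffset] = block
		p = p[n:]
		offset += n
	}
}

// readAt fills p from offset and returns the number of bytes read; it stops
// early at the end of the stored data.
func (s *blockStore) readAt(p []byte, offset int64) int {
	read := 0
	for read < len(p) {
		blobOffset, inBlock := s.blockFor(offset)
		block, ok := s.blocks[blobOffset]
		if !ok || inBlock >= int64(len(block)) {
			break
		}
		n := copy(p[read:], block[inBlock:])
		read += n
		offset += int64(n)
	}
	return read
}
```

With blocks aligned this way, finding the row for a given position only takes `offset / BlockSize * BlockSize`. That would fit a lookup on the indexed (`MetaID`, `BlobOffset`) pair.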