We sync the spam training data encrypted through our server to make
sure that all clients for a specific user behave the same when
classifying mails. Additionally, this enables the spam classification
in the webApp. We compress the training data vectors
(see clientSpamTrainingDatum) before uploading to our server using
SparseVectorCompressor.ts. When a user has the ClientSpamClassification
enabled, the spam training data sync will happen for every mail
received.
ClientSpamTrainingDatum are not stored in the CacheStorage.
No entityEvents are emitted for this type.
However, we retrieve creations and updates for ClientSpamTrainingData
through the modifiedClientSpamTrainingDataIndex.
We calculate a threshold per classifier based on the dataset ham to spam
ratio, we also subsample our training data to cap the ham to spam ratio
within a certain limit.
Co-authored-by: jomapp <17314077+jomapp@users.noreply.github.com>
Co-authored-by: das <das@tutao.de>
Co-authored-by: abp <abp@tutao.de>
Co-authored-by: Kinan <104761667+kibibytium@users.noreply.github.com>
Co-authored-by: sug <sug@tutao.de>
Co-authored-by: nif <nif@tutao.de>
Co-authored-by: map <mpfau@users.noreply.github.com>
Instead of applying inbox rules based on the unread mail state in the
inbox folder, we introduce the new ProcessingState enum on
the mail type. If a mail has been processed by the leader client, which
is checking for matching inbox rules, the ProcessingState is
updated. If there is a matching rule the flag is updated through the
MoveMailService, if there is no matching rule, the flag is updated
using the ClientClassifierResultService. Both requests are
throttled / debounced. After processing inbox rules, spam prediction
is conducted for mails that have not yet been moved by an inbox rule.
The ProcessingState for not matching ham mails is also updated using
the ClientClassifierResultService.
This new inbox rule handing solves the following two problems:
- when clicking on a notification it could still happen,
that sometimes the inbox rules where not applied
- when the inbox folder had a lot of unread mails, the loading time did
massively increase, since inbox rules were re-applied on every load
Co-authored-by: amm <amm@tutao.de>
Co-authored-by: Nick <nif@tutao.de>
Co-authored-by: das <das@tutao.de>
Co-authored-by: abp <abp@tutao.de>
Co-authored-by: jhm <17314077+jomapp@users.noreply.github.com>
Co-authored-by: map <mpfau@users.noreply.github.com>
Co-authored-by: Kinan <104761667+kibibytium@users.noreply.github.com>
Implement a local machine learning model for client-side spam filtering.
The local model is implemented using tensorflow "LayersModel" to train
separate models in all available mailboxes, resulting in one model
per ownerGroup (i.e. mailbox).
Initially, the training data is aggregated from the last 30 days of
received mails, and the data is stored in a separate offline database
table named spam_classification_training_data. The trained model is
stored in the table spam_classification_model. The initial training
starts after indexing, with periodic training happening
every 30 minutes and on each subsequent login.
The model will predict on incoming mails once we have received the
entity event for said mail, moving it to either inbox or spam folder.
When users move mails, we update the training data labels accordingly,
by adjusting the isSpam classification and isSpamConfidence values in
the offline database. The MoveMailService now contains a moveReason,
which indicates that the mail has been moved by our spam filter.
Client-side spam filtering can be activated using the
SpamClientClassification feature flag, and is for now only
available on the desktop client.
Co-authored-by: sug <sug@tutao.de>
Co-authored-by: kib <104761667+kibibytium@users.noreply.github.com>
Co-authored-by: abp <abp@tutao.de>
Co-authored-by: map <mpfau@users.noreply.github.com>
Co-authored-by: jhm <17314077+jomapp@users.noreply.github.com>
Co-authored-by: frm <frm@tutao.de>
Co-authored-by: das <das@tutao.de>
Co-authored-by: nif <nif@tutao.de>
Co-authored-by: amm <amm@tutao.de>
We do not throw a ProgrammingError when there is a patch mismatch for
type tutanota/File. Additionally, we are not storing instances with
errors to the offline storage and are throwing a ProgrammingError.
Have indexer apply updates before batch is completed
to ensure that we have an up-to-date index (or we will
if updating fails and we retry).
Rename CustomCacheHandler's methods to avoid confusion.
Close#8902
Co-authored-by: paw <paw-hub@users.noreply.github.com>
- Introduce a separate Indexer for SQLite using FTS5
- Split search backends and use the right one based on client (IndexedDB
for Browser, and OfflineStorage everywhere else)
- Split SearchFacade into two implementations
- Adds a table for storing unindexed metadata for mails
- Escape special character for SQLite search
To escape special characters from fts5 syntax. However, simply
surrounding each token in quotes is sufficient to do this.
See section 3.1 "FTS5 Strings" here: https://www.sqlite.org/fts5.html
which states that a string may be specified by surrounding it in
quotes, and that special string requirements only exist for strings
that are not in quotes.
- Add EncryptedDbWrapper
- Simplify out of sync logic in IndexedDbIndexer
- Fix deadlock when initializing IndexedDbIndexer
- Cleanup indexedDb index when migrating to offline storage index
- Pass contactSuggestionFacade to IndexedDbSearchFacade
The only suggestion facade used by IndexedDbSearchFacade was the
contact suggestion facade. So we made it clearer.
- Remove IndexerCore stats
- Split custom cache handlers into separate files
We were already doing this with user, so we should do this with the
other entity types.
- Rewrite IndexedDb tests
- Add OfflineStorage indexer tests
- Add custom cache handlers tests to OfflineStorageTest
- Add tests for custom cache handlers with ephemeral storage
- Use dbStub instead of dbMock in IndexedDbIndexerTest
- Replace spy with testdouble in IndexedDbIndexerTest
Close#8550
Co-authored-by: ivk <ivk@tutao.de>
Co-authored-by: paw <paw-hub@users.noreply.github.com>
Co-authored-by: wrd <wrd@tutao.de>
Co-authored-by: bir <bir@tutao.de>
Co-authored-by: hrb-hub <hrb-hub@users.noreply.github.com>
Passing instances explicitly avoids the situations where some of them
might not be initialized.
We also simplified the entity handling by converting entity updates to
data with resolved types early so that the listening code doesn't have
to deal with it.
We did fix some of the bad test practices, e.g. setting/restoring env
incorrectly. This matters now because accessors for type model
initializers check env.mode.
Co-authored-by: paw <paw-hub@users.noreply.github.com>
We change the usages of resolveServerTypeReference to its client
counterpart in all cache functions except for put, as using the server
type reference only makes sense when putting parsed entities in the
cache storage.
Co-authored-by: jomapp <17314077+jomapp@users.noreply.github.com>
Refactor our instance deserialization/serialization pipeline, both on
TypeScript and on Rust [sdk] to use typeId and attributeIds instead of
typeNames and attributeNames. We furthermore ignore cardinalities
on associations until the instance layer and always
store associations as arrays. This commit introduces **eventual
consistency** on the client, i.e. we are from now on always storing data
in the newest schema format (activeApplicationVersionsForWritingSum)
which ensures that all data is already available on the client after
updating the client to a newer version. This removes the need for
offline migrations on the client and also removes backward migrations
on the server. Furthermore, the server model types are now available
on the client, retrievable through the ApplicationTypesFacade. This is
our first step towards FastSync.
Co-authored-by: nig <nig@tutao.de>
Co-authored-by: abp <abp@tutao.de>
Co-authored-by: jomapp <17314077+jomapp@users.noreply.github.com>
Co-authored-by: map <mpfau@users.noreply.github.com>
Co-authored-by: sug <sug@tutao.de>
Co-authored-by: Kinan <104761667+kibibytium@users.noreply.github.com>
eml and mbox file import in desktop client
pause/resume functionality
chunking for mails and attachments to reduce server requests
testing for imap import source
wisely designed in the alps
Co-authored-by: map <mpfau@users.noreply.github.com>
Co-authored-by: jhm <17314077+jomapp@users.noreply.github.com>
Co-authored-by: Kinan <104761667+kibibytium@users.noreply.github.com>
Co-authored-by: kitsugo <hayashi.jiro@kitsugo.com>
Co-authored-by: nif <nif@tutao.de>
Co-authored-by: sug <sug@tutao.de>
to avoid excessive entity updates and inconsistent offline storages,
we don't send entity updates for each mail set migrated mail.
instead we detect the mail set migration for each folder and drop
its whole mail list from the offline cache.
we could fix up the database when we receive the changed folder,
but that would involve re-doing the migration locally and will
lead to very long entity event handling that might get interrupted,
breaking the offline database anyway.
To sort customIds within the offline database we store customIds
as base64Ext id strings in the offline storage. We want to align
the EphemeralCacheStorage to behave the same, and likewise store
customIds with base64Ext encoding.
we now store custom id entities in the offline storage, which means we need to
make sure storing ranges and comparing ids works for them. in order to achieve
that, we decided to store the normally base64Url-encoded, not lexicographically
sortable ids in the sortable base64Ext format.
the Offline Storage needs to use the "converted" base64Ext ids internally everywhere
for custom id types, but give out ranges and entities in the "raw" base64Url format and
take raw ids as parameters.
to make this easier, we implement the conversion in the public CacheStorage::getRangeForList
implementation and use the private OfflineStorage::getRange method internally.
In order to allow importing of mails we replace legacy MailFolders
(non-static mail listIds) with new MailSets (static mail listIds).
From now on, mails have static mail listIds and static mail elementIds.
To move mails between new MailSets we introduce MailSetEntries
("entries" property on a MailSet), which are index entries sorted by
the received date of the referenced mails (customId). This commit adds
support for new MailSets, while still supporting legacy MailFolders
(mail lists) to support migrating gradually.
* TutanotaModelV74 adds:
* MailSet support
* and defaultAlarmList on GroupSettings
* SystemModelV107 adds model changes for counter (unread mails) updates
* Adapt mail list to show MailSet and legacy mails
The list model is now largely unaware about listIds since it can
display mails from multiple MailBags. MailBags are static mailLists
from which a mail is only removed from when the mail is permanently
deleted.
* Adapt offline storage for mail sets
Offline storage gained the ability to provide cached entities
from a list of ids.