We sync the spam training data, encrypted, through our server to make
sure that all clients of a specific user behave the same when
classifying mails. Additionally, this enables spam classification
in the webApp. We compress the training data vectors
(see ClientSpamTrainingDatum) using SparseVectorCompressor.ts before
uploading them to our server. When a user has ClientSpamClassification
enabled, the spam training data is synced for every received mail.
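The idea behind compressing a training vector before upload can be sketched as follows, assuming the vectors are mostly zeros. The SparseVector layout and the compressSparse/decompressSparse names are hypothetical and are not taken from SparseVectorCompressor.ts.

```typescript
// Hypothetical sparse representation: only non-zero entries are kept.
interface SparseVector {
  length: number // dimension of the original dense vector
  indices: number[] // positions of non-zero entries, ascending
  values: number[] // value at each corresponding index
}

// Hypothetical helper: drops all zero entries from a dense vector.
function compressSparse(dense: readonly number[]): SparseVector {
  const indices: number[] = []
  const values: number[] = []
  dense.forEach((value, index) => {
    if (value !== 0) {
      indices.push(index)
      values.push(value)
    }
  })
  return { length: dense.length, indices, values }
}

// Hypothetical inverse: rebuilds the dense vector on the receiving client.
function decompressSparse(sparse: SparseVector): number[] {
  const dense = new Array<number>(sparse.length).fill(0)
  sparse.indices.forEach((index, i) => {
    dense[index] = sparse.values[i]
  })
  return dense
}
```

Storing only the non-zero entries keeps the payload proportional to the number of features actually present in a mail rather than to the vector's dimension.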
ClientSpamTrainingDatum instances are not stored in the CacheStorage,
and no entityEvents are emitted for this type.
However, we retrieve creations and updates of ClientSpamTrainingData
through the modifiedClientSpamTrainingDataIndex.
We calculate a threshold per classifier based on the ham-to-spam ratio
of the dataset. We also subsample our training data to keep the
ham-to-spam ratio within a certain limit.
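The ratio cap can be sketched as follows, assuming ham is the majority class and that excess ham is dropped uniformly at random. The TrainingExample type, the subsampleHam name and the maxHamToSpamRatio parameter are hypothetical; the actual limit and the per-classifier threshold formula are not specified here, so the threshold is not modelled.

```typescript
// Hypothetical shape of a labelled training example; the real
// ClientSpamTrainingDatum has more fields.
interface TrainingExample {
  isSpam: boolean
  vector: number[]
}

// Keeps every spam example and at most floor(spamCount * maxHamToSpamRatio)
// ham examples, chosen uniformly at random via a Fisher-Yates shuffle.
// `random` is injectable so the selection can be made deterministic.
function subsampleHam(
  data: readonly TrainingExample[],
  maxHamToSpamRatio: number,
  random: () => number = Math.random,
): TrainingExample[] {
  const spam = data.filter((d) => d.isSpam)
  const ham = data.filter((d) => !d.isSpam)
  const maxHam = Math.floor(spam.length * maxHamToSpamRatio)
  if (ham.length <= maxHam) {
    return [...spam, ...ham] // ratio already within the limit
  }
  // Shuffle a copy of the ham examples, then keep the first maxHam.
  const shuffled = [...ham]
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1))
    const tmp = shuffled[i]
    shuffled[i] = shuffled[j]
    shuffled[j] = tmp
  }
  return [...spam, ...shuffled.slice(0, maxHam)]
}
```

Note that with no spam examples at all this sketch keeps no ham either, so a real implementation would have to treat that case separately.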
Co-authored-by: jomapp <17314077+jomapp@users.noreply.github.com>
Co-authored-by: das <das@tutao.de>
Co-authored-by: abp <abp@tutao.de>
Co-authored-by: Kinan <104761667+kibibytium@users.noreply.github.com>
Co-authored-by: sug <sug@tutao.de>
Co-authored-by: nif <nif@tutao.de>
Co-authored-by: map <mpfau@users.noreply.github.com>
Add the header fields (sender, toRecipients, ccRecipients, bccRecipients,
authStatus) to the anti-spam vectors. We also improve some of the
preprocessing steps and add offline migrations that delete the old spam
tables.
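One common way to add header fields to a vector is to prefix each token with its field name and hash it into a fixed number of buckets (feature hashing). The sketch below assumes that technique; the MailHeaders shape, the hashHeaderFeatures name, the FNV-1a hash and the bucket count are illustrative choices, not the project's actual preprocessing.

```typescript
// Hypothetical header input; field names follow the commit message.
interface MailHeaders {
  sender: string
  toRecipients: string[]
  ccRecipients: string[]
  bccRecipients: string[]
  authStatus: string
}

// 32-bit FNV-1a over the UTF-16 code units of a string
// (identical to byte-wise FNV-1a for ASCII input).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

// Prefixes every header token with its field name, so the same address in
// `sender` and in `toRecipients` becomes a different feature, then counts
// the hashed tokens into a dense vector of `dimension` buckets.
function hashHeaderFeatures(headers: MailHeaders, dimension: number): number[] {
  const tokens: string[] = [
    `sender:${headers.sender.toLowerCase()}`,
    ...headers.toRecipients.map((r) => `to:${r.toLowerCase()}`),
    ...headers.ccRecipients.map((r) => `cc:${r.toLowerCase()}`),
    ...headers.bccRecipients.map((r) => `bcc:${r.toLowerCase()}`),
    `auth:${headers.authStatus}`,
  ]
  const vector = new Array<number>(dimension).fill(0)
  for (const token of tokens) {
    vector[fnv1a(token) % dimension] += 1
  }
  return vector
}
```

Hashing keeps the vector size fixed regardless of how many distinct addresses appear, at the cost of occasional bucket collisions.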
Co-authored-by: amm <amm@tutao.de>
Co-authored-by: jhm <17314077+jomapp@users.noreply.github.com>
Instead of applying inbox rules based on the unread state of mails in the
inbox folder, we introduce the new ProcessingState enum on
the mail type. Once a mail has been processed by the leader client, which
checks for matching inbox rules, its ProcessingState is
updated. If there is a matching rule, the state is updated through the
MoveMailService; if there is no matching rule, it is updated
using the ClientClassifierResultService. Both requests are
throttled/debounced. After processing inbox rules, spam prediction
is performed for mails that have not been moved by an inbox rule.
The ProcessingState of ham mails that match no rule is also updated using
the ClientClassifierResultService.
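The flow above can be sketched as a routing decision plus a debounced batching queue. Only the two service names come from the commit message; the routeProcessedMail function, the BatchQueue class and its delay are hypothetical, and the actual service requests are replaced by a callback.

```typescript
type TargetService = "MoveMailService" | "ClientClassifierResultService"

// Hypothetical routing: decides which service updates the ProcessingState
// of a mail that the leader client has processed.
function routeProcessedMail(matchedInboxRule: boolean): TargetService {
  return matchedInboxRule ? "MoveMailService" : "ClientClassifierResultService"
}

// Hypothetical queue: collects mail ids and sends them in a single request
// once no new id has arrived for `delayMs` (a trailing-edge debounce).
class BatchQueue {
  private pending: string[] = []
  private timer: ReturnType<typeof setTimeout> | null = null
  private readonly send: (mailIds: string[]) => void
  private readonly delayMs: number

  constructor(send: (mailIds: string[]) => void, delayMs: number) {
    this.send = send
    this.delayMs = delayMs
  }

  add(mailId: string): void {
    this.pending.push(mailId)
    // Restart the timer so a burst of mails ends up in one request.
    if (this.timer !== null) clearTimeout(this.timer)
    this.timer = setTimeout(() => this.flush(), this.delayMs)
  }

  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer)
      this.timer = null
    }
    if (this.pending.length === 0) return
    const batch = this.pending
    this.pending = []
    this.send(batch)
  }
}
```

A trailing debounce bounds the number of requests when many mails arrive at once; a throttle would additionally cap the wait during a continuous stream of mails.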
This new inbox rule handling solves the following two problems:
- clicking on a notification could sometimes still result in inbox
rules not being applied
- when the inbox folder had a lot of unread mails, the loading time
increased massively, since inbox rules were re-applied on every load
Co-authored-by: amm <amm@tutao.de>
Co-authored-by: Nick <nif@tutao.de>
Co-authored-by: das <das@tutao.de>
Co-authored-by: abp <abp@tutao.de>
Co-authored-by: jhm <17314077+jomapp@users.noreply.github.com>
Co-authored-by: map <mpfau@users.noreply.github.com>
Co-authored-by: Kinan <104761667+kibibytium@users.noreply.github.com>
Implement a local machine learning model for client-side spam filtering.
The local model is implemented using TensorFlow's LayersModel. A
separate model is trained for each available mailbox, resulting in one
model per ownerGroup (i.e. mailbox).
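Keeping one model per ownerGroup amounts to keying models by mailbox. This sketch assumes a minimal hypothetical SpamModel interface and a factory callback; the actual TensorFlow LayersModel and its training are not reproduced here.

```typescript
// Hypothetical minimal interface standing in for the TensorFlow model.
interface SpamModel {
  predict(vector: number[]): number // spam probability in [0, 1]
}

// Hypothetical registry: lazily creates and caches one model per
// ownerGroup (i.e. mailbox).
class ModelRegistry {
  private readonly models = new Map<string, SpamModel>()
  private readonly createModel: (ownerGroup: string) => SpamModel

  constructor(createModel: (ownerGroup: string) => SpamModel) {
    this.createModel = createModel
  }

  get(ownerGroup: string): SpamModel {
    let model = this.models.get(ownerGroup)
    if (model === undefined) {
      model = this.createModel(ownerGroup)
      this.models.set(ownerGroup, model)
    }
    return model
  }

  get size(): number {
    return this.models.size
  }
}
```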
Initially, the training data is aggregated from the last 30 days of
received mails, and the data is stored in a separate offline database
table named spam_classification_training_data. The trained model is
stored in the table spam_classification_model. The initial training
starts after indexing, with periodic training happening
every 30 minutes and on each subsequent login.
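The timing rules above can be sketched as two predicates. The 30-day window and the 30-minute interval come from the commit message; the function names and the indexingComplete/justLoggedIn parameters are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000
const TRAINING_WINDOW_MS = 30 * DAY_MS // last 30 days of received mails
const RETRAIN_INTERVAL_MS = 30 * 60 * 1000 // periodic training every 30 minutes

// True if a mail received at `receivedAt` belongs to the initial training data.
function isInTrainingWindow(receivedAt: Date, now: Date): boolean {
  return now.getTime() - receivedAt.getTime() <= TRAINING_WINDOW_MS
}

// Hypothetical scheduling check: nothing happens before indexing has
// finished; afterwards training runs initially, on login, and whenever
// the interval has elapsed.
function isTrainingDue(
  lastTrainedAt: Date | null,
  now: Date,
  justLoggedIn: boolean,
  indexingComplete: boolean,
): boolean {
  if (!indexingComplete) return false
  if (lastTrainedAt === null || justLoggedIn) return true
  return now.getTime() - lastTrainedAt.getTime() >= RETRAIN_INTERVAL_MS
}
```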
The model predicts on incoming mails once we have received the
entity event for the mail, and moves each mail to either the inbox or the spam folder.
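The prediction step can be sketched as comparing the model's output with the classifier's threshold. The SpamModel interface, the classifyIncomingMail name and the returned confidence field are hypothetical; how the threshold is derived is not modelled.

```typescript
// Hypothetical model interface; predict returns a spam probability in [0, 1].
interface SpamModel {
  predict(vector: number[]): number
}

type TargetFolder = "inbox" | "spam"

// Chooses the target folder for an incoming mail by comparing the
// predicted spam probability with the per-classifier threshold.
function classifyIncomingMail(
  vector: number[],
  model: SpamModel,
  threshold: number,
): { folder: TargetFolder; confidence: number } {
  const probability = model.predict(vector)
  return { folder: probability >= threshold ? "spam" : "inbox", confidence: probability }
}
```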
When users move mails, we update the training data labels accordingly
by adjusting the isSpam classification and isSpamConfidence values in
the offline database. The MoveMailService now contains a moveReason,
which indicates whether the mail has been moved by our spam filter.
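Relabelling on a user move can be sketched as below. The TrainingRow layout, the MoveReason values and applyMoveToTrainingRow are hypothetical. Two further assumptions are labelled in the comments: moves made by the spam filter itself do not produce a new label, and a user move sets the confidence to 1.

```typescript
// Hypothetical row layout; column names follow the commit message.
interface TrainingRow {
  mailId: string
  isSpam: boolean
  isSpamConfidence: number
}

// Hypothetical move reasons.
const MoveReason = { User: "user", SpamFilter: "spamFilter" } as const
type MoveReason = (typeof MoveReason)[keyof typeof MoveReason]

function applyMoveToTrainingRow(row: TrainingRow, movedToSpam: boolean, reason: MoveReason): TrainingRow {
  // Assumption: the filter's own prediction is not treated as a new label.
  if (reason === MoveReason.SpamFilter) return row
  // Assumption: an explicit user move is trusted fully.
  return { ...row, isSpam: movedToSpam, isSpamConfidence: 1 }
}
```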
Client-side spam filtering can be activated using the
SpamClientClassification feature flag, and is for now only
available on the desktop client.
Co-authored-by: sug <sug@tutao.de>
Co-authored-by: kib <104761667+kibibytium@users.noreply.github.com>
Co-authored-by: abp <abp@tutao.de>
Co-authored-by: map <mpfau@users.noreply.github.com>
Co-authored-by: jhm <17314077+jomapp@users.noreply.github.com>
Co-authored-by: frm <frm@tutao.de>
Co-authored-by: das <das@tutao.de>
Co-authored-by: nif <nif@tutao.de>
Co-authored-by: amm <amm@tutao.de>