tutanota

Mirror of https://github.com/tutao/tutanota.git (synced 2025-12-08 06:09:50 +00:00)

Author	SHA1	Message	Date
abp	c33591eaca	instantiate and import spam classifier lazily Co-authored-by: das <das@tutao.de>	2025-11-18 17:10:44 +01:00
map	5293be6a4a	Implement spam training data sync and add TutanotaModelV98. We sync the encrypted spam training data through our server so that all clients of a given user behave the same when classifying mails. Additionally, this enables spam classification in the webApp. We compress the training data vectors (see clientSpamTrainingDatum) with SparseVectorCompressor.ts before uploading them to our server. When a user has ClientSpamClassification enabled, the spam training data sync happens for every received mail. ClientSpamTrainingDatum instances are not stored in the CacheStorage, and no entityEvents are emitted for this type; instead, we retrieve creations and updates of ClientSpamTrainingData through the modifiedClientSpamTrainingDataIndex. We calculate a threshold per classifier based on the ham-to-spam ratio of its dataset, and we subsample the training data to keep the ham-to-spam ratio within a certain limit. Co-authored-by: jomapp <17314077+jomapp@users.noreply.github.com> Co-authored-by: das <das@tutao.de> Co-authored-by: abp <abp@tutao.de> Co-authored-by: Kinan <104761667+kibibytium@users.noreply.github.com> Co-authored-by: sug <sug@tutao.de> Co-authored-by: nif <nif@tutao.de> Co-authored-by: map <mpfau@users.noreply.github.com>	2025-11-18 13:56:19 +01:00
das	f8bbd32695	Include header fields as tokens in the anti-spam. Add the header fields (sender, toRecipients, ccRecipients, bccRecipients, authStatus) to the anti-spam vectors. We also improve some of the preprocessing steps and add offline migrations that delete the old spam tables. Co-authored-by: amm@tutao.de Co-authored-by: jhm <17314077+jomapp@users.noreply.github.com>	2025-11-18 10:37:23 +01:00
das	0739a78691	Fix retraining right after initial training. - The field lastTrainedTime was not set during initial training, which caused the spamClassifier to retrain on the second login.	2025-10-27 17:52:12 +01:00
abp	4e7c0f2fd5	do not try to train if there is no new data Co-authored-by: map <mpfau@users.noreply.github.com>	2025-10-22 16:44:57 +02:00
abp	5124985d4f	remove DynamicTfVectorizer Co-authored-by: map <mpfau@users.noreply.github.com>	2025-10-22 09:40:46 +02:00
sug	f11e59672e	improve inbox rule handling and run spam prediction after inbox rules. Instead of applying inbox rules based on the unread state of mails in the inbox folder, we introduce the new ProcessingState enum on the mail type. Once the leader client has checked a mail for matching inbox rules, it updates the mail's ProcessingState. If a rule matches, the state is updated through the MoveMailService; if no rule matches, it is updated through the ClientClassifierResultService. Both requests are throttled / debounced. After inbox rules have been processed, spam prediction runs on the mails that have not been moved by an inbox rule. The ProcessingState of ham mails that match no rule is also updated through the ClientClassifierResultService. The new inbox rule handling solves two problems: - clicking on a notification could sometimes still leave the inbox rules unapplied - when the inbox folder contained many unread mails, loading time increased massively because inbox rules were re-applied on every load Co-authored-by: amm <amm@tutao.de> Co-authored-by: Nick <nif@tutao.de> Co-authored-by: das <das@tutao.de> Co-authored-by: abp <abp@tutao.de> Co-authored-by: jhm <17314077+jomapp@users.noreply.github.com> Co-authored-by: map <mpfau@users.noreply.github.com> Co-authored-by: Kinan <104761667+kibibytium@users.noreply.github.com>	2025-10-22 09:40:45 +02:00
das	fd22294a18	[antispam] Add client-side local spam filtering. Implement a local machine learning model for client-side spam filtering. The local model uses the TensorFlow "LayersModel" to train a separate model for each available mailbox, resulting in one model per ownerGroup (i.e. mailbox). Initially, the training data is aggregated from mails received in the last 30 days and stored in a separate offline database table named spam_classification_training_data. The trained model is stored in the table spam_classification_model. The initial training starts after indexing; after that, training runs every 30 minutes and on each subsequent login. Once the entity event for an incoming mail has been received, the model classifies the mail and moves it to either the inbox or the spam folder. When users move mails, we update the training data labels accordingly by adjusting the isSpam classification and isSpamConfidence values in the offline database. The MoveMailService now contains a moveReason, which indicates that the mail has been moved by our spam filter. Client-side spam filtering can be activated using the SpamClientClassification feature flag and is currently only available on the desktop client. Co-authored-by: sug <sug@tutao.de> Co-authored-by: kib <104761667+kibibytium@users.noreply.github.com> Co-authored-by: abp <abp@tutao.de> Co-authored-by: map <mpfau@users.noreply.github.com> Co-authored-by: jhm <17314077+jomapp@users.noreply.github.com> Co-authored-by: frm <frm@tutao.de> Co-authored-by: das <das@tutao.de> Co-authored-by: nif <nif@tutao.de> Co-authored-by: amm <amm@tutao.de>	2025-10-22 09:25:20 +02:00

8 commits
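Commit 5293be6a4a states that the training data is subsampled to cap the ham-to-spam ratio, but it gives no details. The block below is a minimal sketch of one way to do that, assuming plain uniform random downsampling of ham examples. The names `TrainingExample`, `subsampleHam`, and `maxHamToSpamRatio` are hypothetical and are not taken from the Tutanota code base.

```typescript
// Hypothetical shape of a labelled training example (NOT the real ClientSpamTrainingDatum type).
interface TrainingExample {
  vector: number[];
  isSpam: boolean;
}

// Hypothetical helper: randomly drops ham examples so that
// ham.length <= floor(spam.length * maxHamToSpamRatio).
// `random` is injectable so the sampling can be made deterministic.
function subsampleHam(
  examples: TrainingExample[],
  maxHamToSpamRatio: number,
  random: () => number = Math.random,
): TrainingExample[] {
  const spam = examples.filter((e) => e.isSpam);
  const ham = examples.filter((e) => !e.isSpam);
  // Note: with no spam examples the cap is 0, so all ham would be dropped;
  // a real implementation might skip training in that case instead.
  const maxHam = Math.floor(spam.length * maxHamToSpamRatio);
  if (ham.length <= maxHam) {
    return examples; // ratio is already within the cap
  }
  // Partial Fisher-Yates shuffle: after the loop, the first maxHam slots
  // hold a uniform random sample of the ham examples without duplicates.
  const pool = ham.slice();
  for (let i = 0; i < maxHam; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return [...spam, ...pool.slice(0, maxHam)];
}
```

For example, with 2 spam and 8 ham examples and `maxHamToSpamRatio = 2`, the sketch keeps both spam examples and 4 randomly chosen ham examples. Downsampling the majority class is only one option; class weighting or per-classifier thresholds (which the commit also mentions) are alternatives or complements.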