7 Commits

Author SHA1 Message Date
Wei S.
d372ef7566 Feat: Tika Integration and Batch Indexing (#132)
* Feat/tika integration (#94)

* feat(Tika) Integration of Tika for text extraction

* feat(Tika) Integration of Apache Tika for text extraction

* feat(Tika): Complete Tika integration with text extraction and docker-compose setup

- Add Tika service to docker-compose.yml
- Implement text sanitization and document validation
- Improve batch processing with concurrency control
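
The commit message does not say how the concurrency control is implemented. As a rough sketch of the general idea only, the hypothetical helper below (`mapWithConcurrency` is an illustrative name, not taken from the repository) processes a list of items while keeping at most `limit` asynchronous calls in flight:

```typescript
// Hypothetical helper for illustration; not the project's actual code.
// Runs `worker` over `items` with at most `limit` calls in flight at once
// and returns the results in the same order as the input.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  // Start at least one runner, and never more runners than there are items.
  const runnerCount = Math.min(Math.max(1, limit), items.length);
  const runners = Array.from({ length: runnerCount }, () => runner());
  await Promise.all(runners);
  return results;
}
```

A caller could pass a function that sends one attachment to the text extraction service as `worker`, so that a large mailbox does not open an unbounded number of requests at once.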

* fix(comments) translated comments into English
fix(docker) removed ports (only used for testing)

* feat(indexing): Implement batch indexing for Meilisearch

This change introduces batch processing for indexing emails into Meilisearch to improve performance and throughput during ingestion. It builds on the batch processing method previously contributed by @axeldunkel.

Previously, each email was indexed individually, resulting in a high number of separate API calls. This approach was inefficient, especially for large mailboxes.

The `processMailbox` queue worker now accumulates emails into a batch before sending them to the `IndexingService`. The service then uses the `addDocuments` Meilisearch API endpoint to index the entire batch in a single request, reducing network overhead and improving indexing speed.

A new environment variable, `MEILI_INDEXING_BATCH`, has been added to make the batch size configurable, with a default of 500.
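
The accumulate-then-flush pattern described above might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the project's implementation: the `PendingEmail` fields, `DocumentIndex`, `BatchIndexer`, and `parseBatchSize` are hypothetical names, and `DocumentIndex` is only a stand-in for an object exposing an `addDocuments` method like the Meilisearch client's index.

```typescript
// Hypothetical shape for illustration; the real PendingEmail type may differ.
interface PendingEmail {
  id: string;
  subject: string;
}

// Stand-in for a Meilisearch index: anything exposing addDocuments().
interface DocumentIndex {
  addDocuments(docs: PendingEmail[]): Promise<unknown>;
}

const DEFAULT_BATCH_SIZE = 500;

// Parse the raw MEILI_INDEXING_BATCH value, falling back to the default
// when it is missing or does not parse to a positive integer.
function parseBatchSize(raw: string | undefined): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_BATCH_SIZE;
}

class BatchIndexer {
  private pending: PendingEmail[] = [];
  private readonly index: DocumentIndex;
  private readonly batchSize: number;

  constructor(index: DocumentIndex, batchSize: number) {
    this.index = index;
    this.batchSize = batchSize;
  }

  // Queue one email; send the batch once it reaches the configured size.
  async add(email: PendingEmail): Promise<void> {
    this.pending.push(email);
    if (this.pending.length >= this.batchSize) {
      await this.flush();
    }
  }

  // Send whatever is queued in a single request (call at the end of a mailbox).
  async flush(): Promise<void> {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    await this.index.addDocuments(batch);
  }
}
```

In a Node.js worker the batch size would presumably come from `parseBatchSize(process.env.MEILI_INDEXING_BATCH)`. The final `flush()` matters: without it, a mailbox whose size is not a multiple of the batch size would leave its last emails unindexed.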

Additionally, this commit includes minor refactoring:
- The `TikaService` has been moved to its own dedicated file.
- The `PendingEmail` type has been moved to the shared `@open-archiver/types` package.

* chore(jobs): make continuous sync job scheduling idempotent

Adds a static `jobId` to the repeatable 'schedule-continuous-sync' job.

This prevents duplicate jobs from being scheduled if the server restarts. By providing a unique ID, the queue will update the existing repeatable job instead of creating a new one, ensuring the sync runs only at the configured frequency.
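
To illustrate why a static ID makes scheduling idempotent, here is a toy in-memory model. It is not the queue library's API: `InMemoryScheduler`, `RepeatableJob`, and the interval value are hypothetical names and numbers chosen for the example, and the map stands in for the queue's persistent store that survives server restarts.

```typescript
// Toy model for illustration only; not the real queue library's API.
interface RepeatableJob {
  name: string;
  everyMs: number;
}

class InMemoryScheduler {
  // Stands in for the queue's persistent store, which outlives server restarts.
  private jobs = new Map<string, RepeatableJob>();
  private counter = 0;

  // With a jobId, registration is an upsert on that key.
  // Without one, a fresh key is generated on every call.
  schedule(job: RepeatableJob, jobId?: string): string {
    const key = jobId ?? `${job.name}:${++this.counter}`;
    this.jobs.set(key, job);
    return key;
  }

  count(): number {
    return this.jobs.size;
  }
}

// Simulate two server starts registering the same repeatable job.
const withoutId = new InMemoryScheduler();
withoutId.schedule({ name: "schedule-continuous-sync", everyMs: 60000 });
withoutId.schedule({ name: "schedule-continuous-sync", everyMs: 60000 });

const withId = new InMemoryScheduler();
withId.schedule({ name: "schedule-continuous-sync", everyMs: 60000 }, "continuous-sync");
withId.schedule({ name: "schedule-continuous-sync", everyMs: 60000 }, "continuous-sync");
```

In this model `withoutId.count()` is 2 after two starts, while `withId.count()` stays at 1, which mirrors the duplicate-job behavior the commit is meant to prevent.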

---------

Co-authored-by: axeldunkel <53174090+axeldunkel@users.noreply.github.com>
Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>
2025-09-26 11:34:32 +02:00
Wei S.
94021eab69 v0.3.0 release (#76)
* Remove extra ports in Docker Compose file

* Allow self-signed cert

* Adding allow insecure cert option

* fix(IMAP): Share connections between each fetch email action

* Update docs: troubleshooting CORS error

---------

Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>
2025-09-01 12:44:22 +03:00
Wayne
8c12cda370 Docker Compose deployment 2025-07-25 15:50:25 +03:00
Wayne
946da7925b Docker deployment 2025-07-24 23:43:38 +03:00
Wayne
3d1feedafb Continuous syncing 2025-07-22 01:51:10 +03:00
Wayne
f4d48a4e5a Job queue management setup 2025-07-12 12:39:41 +03:00
Wayne
f243775ae6 scaffolding 2025-07-10 13:32:54 +03:00