Compare commits

...

14 Commits

Author SHA1 Message Date
Wei S.
3434e8d6ef v0.4.2-fix: improve ingestion error handling and error messages (#312)
* fix(backend): improve ingestion error handling and error messages

This commit introduces a "force delete" mechanism for Ingestion Sources and improves error messages for file-based connectors.

Changes:
- Update `IngestionService.delete` to accept a `force` flag, bypassing the `checkDeletionEnabled` check.
- Use `force` deletion when rolling back failed ingestion source creations (e.g., decryption errors or connection failures) to ensure cleanup even if deletion is globally disabled.
- Enhance error messages in `EMLConnector`, `MboxConnector`, and `PSTConnector` to distinguish between missing local files and failed uploads, providing more specific feedback to the user.
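The `force`-flag pattern described above can be sketched as follows. This is a hypothetical simplification, not the real `IngestionService`: the guard, config object, and error class are stand-ins.

```typescript
// Sketch of the "force delete" pattern: the global deletion guard is
// bypassed only for internal rollback paths, never for user requests.
class DeletionDisabledError extends Error {}

const config = { enableDeletion: false }; // mirrors ENABLE_DELETION

function checkDeletionEnabled(): void {
    if (!config.enableDeletion) {
        throw new DeletionDisabledError('Deletions are disabled on this instance.');
    }
}

function deleteSource(id: string, force = false): string {
    // `force` is only set by rollback code cleaning up a failed creation.
    if (!force) {
        checkDeletionEnabled();
    }
    return `deleted ${id}`;
}
```

With deletion globally disabled, `deleteSource('src-1')` throws, while the rollback path `deleteSource('src-1', true)` still cleans up.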

* feat(ingestion): optimize duplicate handling and fix race conditions in Google Workspace

- Implement fast duplicate check (by Message-ID) to skip full content download for existing emails in Google Workspace and IMAP connectors.
- Fix race condition in Google Workspace initial import by capturing `historyId` before listing messages, ensuring no data loss for incoming mail during import.
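The fast duplicate check might look roughly like this; the names are hypothetical and an in-memory set stands in for the archive database lookup.

```typescript
// Consult the stored Message-IDs first, so already-archived emails are
// skipped without downloading their full content.
const archivedMessageIds = new Set<string>(['<existing@example.com>']);

function shouldDownloadFullContent(messageId: string): boolean {
    // A real connector would query the database; a Set stands in here.
    return !archivedMessageIds.has(messageId);
}
```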
2026-02-24 18:10:32 +01:00
Wei S.
7dac3b2bfd V0.4.2 (#310)
* fix(api): correct API key generation and proxy handling

This commit resolves an issue where generating a new API key would fail. The root cause was improper handling of POST request bodies in the frontend proxy server.

- Refactored `ApiKeyController` methods to use arrow functions to ensure correct `this` binding.
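The binding issue can be illustrated in isolation; the class and field names below are illustrative, not the real controller. A class-field arrow function captures the instance, so `this` survives when a router holds a bare reference to the method.

```typescript
class ApiKeyController {
    private prefix = 'oa';

    // Arrow class field: `this` stays bound even when the method is
    // passed around detached, e.g. router.post('/keys', ctrl.create).
    public create = (): string => `${this.prefix}-key`;
}

const ctrl = new ApiKeyController();
const handler = ctrl.create; // detached, as a route table would hold it
```

A regular method accessed the same way would lose `this` and fail at call time.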

* User profile/account page, change password, API

* docs(api): update ingestion source provider values

Update the `CreateIngestionSourceDto` documentation in `ingestion.md` to reflect the current set of supported providers.

* updating tag

* feat: add REDIS_USER env variable (#172)

* feat: add REDIS_USER env variable

fixes #171

* add proper type for bullmq config

* Bulgarian UI language strings added (backend+frontend) (#194)

* Bulgarian UI Support added

* BG language UI support - Create translation.json

* update redis config logic

* Update Bulgarian language setting, register language

* Allow specifying local file path for mbox/eml/pst (#214)

* Add agents AI doc

* Allow local file path for Mbox file ingestion


---------

Co-authored-by: Wei S. <5291640+wayneshn@users.noreply.github.com>

* feat(ingestion): add local file path support and optimize EML processing

- Frontend: Updated IngestionSourceForm to allow toggling between "Upload File" and "Local File Path" for PST, EML, and Mbox providers.
- Frontend: Added logic to clear irrelevant form data when switching import methods.
- Frontend: Added English translations for new form fields.
- Backend: Refactored EMLConnector to stream ZIP entries using yauzl instead of extracting the full archive to disk, significantly improving efficiency for large archives.
- Docs: Updated API documentation and User Guides (PST, EML, Mbox) to clarify "Local File Path" usage, specifically within Docker environments.

* docs: add meilisearch dumpless upgrade guide and snapshot config

Update `docker-compose.yml` to include the `MEILI_SCHEDULE_SNAPSHOT` environment variable, defaulting to 86400 seconds (24 hours), enabling periodic data snapshots for easier recovery. Shout out to @morph027 for the inspiration!

Additionally, update the Meilisearch upgrade documentation to include an experimental "dumpless" upgrade guide while marking the previous method as the standard recommended process.

* build(coolify): enable daily snapshots for meilisearch

Configure the Meilisearch service in `open-archiver.yml` to create snapshots every 86400 seconds (24 hours) by setting the `MEILI_SCHEDULE_SNAPSHOT` environment variable.
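In compose terms, the setting might look like the fragment below (service name and image tag are assumptions, not copied from the repo):

```yaml
services:
    meilisearch:
        image: getmeili/meilisearch:v1.x
        environment:
            # Take a snapshot every 86400 seconds (24 hours).
            MEILI_SCHEDULE_SNAPSHOT: 86400
```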

---------

Co-authored-by: Antonia Schwennesen <53372671+zophiana@users.noreply.github.com>
Co-authored-by: IT Creativity + Art Team <admin@it-playground.net>
Co-authored-by: Jan Berdajs <mrbrdo@gmail.com>
2026-02-23 21:25:44 +01:00
albanobattistella
cf121989ae Update Italian language (#278) 2026-01-18 15:28:20 +01:00
Wei S.
2df5c9240d V0.4.1 dev (#276)
* fix(api): correct API key generation and proxy handling

This commit resolves an issue where generating a new API key would fail. The root cause was improper handling of POST request bodies in the frontend proxy server.

- Refactored `ApiKeyController` methods to use arrow functions to ensure correct `this` binding.

* User profile/account page, change password, API

* docs(api): update ingestion source provider values

Update the `CreateIngestionSourceDto` documentation in `ingestion.md` to reflect the current set of supported providers.

* updating tag
2026-01-17 13:21:01 +01:00
Wei S.
24afd13858 V0.4.1: API key generation fix, change password, account profile (#273)
* fix(api): correct API key generation and proxy handling

This commit resolves an issue where generating a new API key would fail. The root cause was improper handling of POST request bodies in the frontend proxy server.

- Refactored `ApiKeyController` methods to use arrow functions to ensure correct `this` binding.

* User profile/account page, change password, API

* docs(api): update ingestion source provider values

Update the `CreateIngestionSourceDto` documentation in `ingestion.md` to reflect the current set of supported providers.
2026-01-17 02:46:27 +02:00
Wei S.
c2006dfa94 V0.4 fix 2 (#210)
* formatting code

* Remove uninstalled packages

* fix(imap): Improve IMAP connection stability and error handling

This commit refactors the IMAP connector to enhance connection management, error handling, and overall stability during email ingestion.

The `isConnected` flag has been removed in favor of relying directly on the `client.usable` property from the `imapflow` library. This simplifies the connection logic and avoids state synchronization issues.

The `connect` method now re-creates the client instance if it's not usable, ensuring a fresh connection after errors or disconnects. The retry mechanism (`withRetry`) has been updated to no longer manually reset the connection state, as the `connect` method now handles this automatically on the next attempt.

Additionally, a minor bug in the `sync-cycle-finished` processor has been fixed. The logic for merging sync states from successful jobs has been simplified and correctly typed, preventing potential runtime errors when no successful jobs are present.

---------

Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>
2025-10-29 12:59:19 +01:00
Wei S.
399059a773 V0.4 fix 2 (#207)
* formatting code

* Remove uninstalled packages

---------

Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>
2025-10-28 13:39:09 +01:00
Wei S.
0cff788656 formatting code (#206)
Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>
2025-10-28 13:35:53 +01:00
Wei S.
ddb4d56107 V0.4.0 fix (#205)
* Jobs page responsive fix

* feat(ingestion): Refactor email indexing into a dedicated background job

This commit refactors the email indexing process to improve the performance and reliability of the ingestion pipeline.

Previously, email indexing was performed synchronously within the mailbox processing job. This could lead to timeouts and failed ingestion cycles if the indexing step was slow or encountered errors.

To address this, the indexing logic has been moved into a separate, dedicated background job queue (`indexingQueue`). Now, the mailbox processor simply adds a batch of emails to this queue. A separate worker then processes the indexing job asynchronously.

This decoupling makes the ingestion process more robust:
- It prevents slow indexing from blocking or failing the entire mailbox sync.
- It allows for better resource management and scalability by handling indexing in a dedicated process.
- It improves error handling, as a failed indexing job can be retried independently without affecting the main ingestion flow.

Additionally, this commit includes minor documentation updates and removes a premature timeout in the PDF text extraction helper that was causing issues.

* remove uninstalled packages

---------

Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>
2025-10-28 13:19:56 +01:00
Wei S.
42b0f6e5f1 V0.4.0 fix (#204)
* Jobs page responsive fix

* feat(ingestion): Refactor email indexing into a dedicated background job

This commit refactors the email indexing process to improve the performance and reliability of the ingestion pipeline.

Previously, email indexing was performed synchronously within the mailbox processing job. This could lead to timeouts and failed ingestion cycles if the indexing step was slow or encountered errors.

To address this, the indexing logic has been moved into a separate, dedicated background job queue (`indexingQueue`). Now, the mailbox processor simply adds a batch of emails to this queue. A separate worker then processes the indexing job asynchronously.

This decoupling makes the ingestion process more robust:
- It prevents slow indexing from blocking or failing the entire mailbox sync.
- It allows for better resource management and scalability by handling indexing in a dedicated process.
- It improves error handling, as a failed indexing job can be retried independently without affecting the main ingestion flow.

Additionally, this commit includes minor documentation updates and removes a premature timeout in the PDF text extraction helper that was causing issues.

---------

Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>
2025-10-28 13:14:43 +01:00
Wei S.
6e1ebbbfd7 v0.4 init: File encryption, integrity report, deletion protection, job monitoring (#187)
* open-core setup, adding enterprise package

* enterprise: Audit log API, UI

* Audit-log docs

* feat: Integrity report, allowing users to verify the integrity of archived emails and their attachments.

- When an email is archived, Open Archiver calculates a unique cryptographic signature (a SHA256 hash) for the email's raw `.eml` file and for each of its attachments. These signatures are stored in the database alongside the email's metadata.
- The integrity check feature recalculates these signatures for the stored files and compares them to the original signatures stored in the database. This process allows you to verify that the content of your archived emails has not been altered, corrupted, or tampered with since the moment they were archived.
- Add docs of Integrity report
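The signature scheme described above reduces to hashing the stored bytes and comparing against the recorded value. A minimal sketch with Node's built-in `crypto` module (function names are illustrative):

```typescript
import { createHash } from 'node:crypto';

// Compute the SHA256 signature of a file's contents as a hex string.
function sha256Hex(data: Buffer | string): string {
    return createHash('sha256').update(data).digest('hex');
}

// Recompute the signature of the stored bytes and compare it with the
// signature recorded in the database at archive time.
function verifyIntegrity(storedBytes: Buffer, recordedHash: string): boolean {
    return sha256Hex(storedBytes) === recordedHash;
}
```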

* Update Docker-compose.yml to use bind mount for Open Archiver data.
Fix API rate-limiter warning about trust proxy

* File encryption support

* Scope attachment deduplication to ingestion source

Previously, attachment deduplication was handled globally by enforcing a unique constraint on the content hash (contentHashSha256) in the `attachments` table. This caused an issue where an attachment from one ingestion source would be incorrectly linked if the same attachment was processed by a different source.

This commit refactors the deduplication logic to be scoped on a per-ingestion-source basis.

Changes:
-   **Schema:** The `attachments` table schema has been updated to include a nullable `ingestionSourceId` column. A composite unique index has been added on `(ingestionSourceId, contentHashSha256)` to enforce per-source uniqueness. The `ingestionSourceId` is nullable to ensure backward compatibility with existing databases.
-   **Ingestion Logic:** The `IngestionService` has been updated to provide the `ingestionSourceId` when inserting attachment records. The `onConflictDoUpdate` clause now targets the new composite key, ensuring that attachments are only considered duplicates if they have the same hash and originate from the same ingestion source.
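The per-source uniqueness rule can be sketched as follows; a `Map` keyed by the composite value stands in for the database's composite unique index, and all names are hypothetical.

```typescript
// Deduplication is keyed on (ingestionSourceId, contentHashSha256)
// rather than on the content hash alone.
const attachments = new Map<string, { id: number }>();
let nextId = 1;

function upsertAttachment(sourceId: string, hash: string): number {
    const key = `${sourceId}:${hash}`; // composite unique key
    const existing = attachments.get(key);
    if (existing) return existing.id; // duplicate within the same source
    const record = { id: nextId++ };
    attachments.set(key, record);
    return record.id;
}
```

The same hash from a different ingestion source now gets its own record instead of being linked across sources.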

* Add option to disable deletions

This commit introduces a new feature that allows admins to disable the deletion of emails and ingestion sources for the entire instance. This is a critical feature for compliance and data retention, as it prevents accidental or unauthorized deletions.

Changes:
-   **Configuration**: Added an `ENABLE_DELETION` environment variable. If this variable is not set to `true`, all deletion operations will be disabled.
-   **Deletion Guard**: A centralized `checkDeletionEnabled` guard has been implemented to enforce this setting at both the controller and service levels, ensuring a robust and secure implementation.
-   **Documentation**: The installation guide has been updated to include the new `ENABLE_DELETION` environment variable and its behavior.
-   **Refactor**: The `IngestionService`'s `create` method was refactored to remove unnecessary calls to the `delete` method, simplifying the code and improving its robustness.

* Adding position for menu items

* feat(docker): Fix CORS errors

This commit fixes CORS errors when running the app in Docker by introducing the `APP_URL` environment variable. A CORS policy is set up for the backend to only allow requests whose origin matches `APP_URL`.

Key changes include:
- New `APP_URL` and `ORIGIN` environment variables have been added to properly configure CORS and the SvelteKit adapter, making the application's public URL easily configurable.
- Dockerfiles are updated to copy the entrypoint script, Drizzle config, and migration files into the final image.
- Documentation and example files (`.env.example`, `docker-compose.yml`) have been updated to reflect these changes.

* feat(attachments): De-duplicate attachment content by content hash

This commit refactors attachment handling to allow multiple emails within the same ingestion source to reference attachments with identical content (same hash).

Changes:
- The unique index on the `attachments` table has been changed to a non-unique index to permit duplicate hash/source pairs.
- The ingestion logic is updated to first check for an existing attachment with the same hash and source. If found, it reuses the existing record; otherwise, it creates a new one. This maintains storage de-duplication.
- The email deletion logic is improved to be more robust. It now correctly removes the email-attachment link before checking if the attachment record and its corresponding file can be safely deleted.

* Not filtering out Trash folder

* feat(backend): Add BullMQ dashboard for job monitoring

This commit introduces a web-based UI for monitoring and managing background jobs using BullMQ.

Key changes:
- A new `/api/v1/jobs` endpoint is created, serving the Bull Board dashboard. Access is restricted to authenticated administrators.
- All BullMQ queue definitions (`ingestion`, `indexing`, `sync-scheduler`) have been centralized into a new `packages/backend/src/jobs/queues.ts` file.
- Workers and services now import queue instances from this central file, improving code organization and removing redundant queue instantiations.

* Add `ALL_INCLUSIVE_ARCHIVE` environment variable to disable junk filtering

* Using BSL license

* frontend: Responsive design for menu bar, pagination

* License service/module

* Remove demoMode logic

* Formatting code

* Remove enterprise packages

* Fix package.json in packages

* Search page responsive fix

---------

Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>
2025-10-24 17:11:05 +02:00
Wei S.
1e048fdbc1 Update package.json 2025-09-26 17:06:40 +02:00
Wei S.
b71dd55e25 add OCR docs (#144)
Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>
2025-09-26 12:09:23 +02:00
Wei S.
d372ef7566 Feat: Tika Integration and Batch Indexing (#132)
* Feat/tika integration (#94)

* feat(Tika) Integration of Tika for text extraction

* feat(Tika) Integration of Apache Tika for text extraction

* feat(Tika): Complete Tika integration with text extraction and docker-compose setup

- Add Tika service to docker-compose.yml
- Implement text sanitization and document validation
- Improve batch processing with concurrency control

* fix(comments) translated comments into english
fix(docker) removed ports (only used for testing)

* feat(indexing): Implement batch indexing for Meilisearch

This change introduces batch processing for indexing emails into Meilisearch to significantly improve performance and throughput during ingestion. It is based on the batch processing method previously contributed by @axeldunkel.

Previously, each email was indexed individually, resulting in a high number of separate API calls. This approach was inefficient, especially for large mailboxes.

The `processMailbox` queue worker now accumulates emails into a batch before sending them to the `IndexingService`. The service then uses the `addDocuments` Meilisearch API endpoint to index the entire batch in a single request, reducing network overhead and improving indexing speed.

A new environment variable, `MEILI_INDEXING_BATCH`, has been added to make the batch size configurable, with a default of 500.

Additionally, this commit includes minor refactoring:
- The `TikaService` has been moved to its own dedicated file.
- The `PendingEmail` type has been moved to the shared `@open-archiver/types` package.

* chore(jobs): make continuous sync job scheduling idempotent

Adds a static `jobId` to the repeatable 'schedule-continuous-sync' job.

This prevents duplicate jobs from being scheduled if the server restarts. By providing a unique ID, the queue will update the existing repeatable job instead of creating a new one, ensuring the sync runs only at the configured frequency.
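The idempotency argument can be sketched as follows; a `Map` keyed by the stable job ID stands in for BullMQ's repeat-key registry.

```typescript
// Re-registering the repeatable job with the same stable jobId updates
// the existing entry instead of adding a second scheduled job.
const repeatables = new Map<string, { pattern: string }>();

function scheduleContinuousSync(pattern: string): void {
    repeatables.set('schedule-continuous-sync', { pattern });
}
```

Calling `scheduleContinuousSync` on every server start is now safe: the registry always holds exactly one entry with the most recent cron pattern.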

---------

Co-authored-by: axeldunkel <53174090+axeldunkel@users.noreply.github.com>
Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>
2025-09-26 11:34:32 +02:00
157 changed files with 12443 additions and 3342 deletions

@@ -4,8 +4,15 @@
NODE_ENV=development
PORT_BACKEND=4000
PORT_FRONTEND=3000
# The public-facing URL of your application. This is used by the backend to configure CORS.
APP_URL=http://localhost:3000
# This is used by the SvelteKit Node adapter to determine the server's public-facing URL.
# It should always be set to the value of APP_URL.
ORIGIN=$APP_URL
# The frequency of continuous email syncing. Default is every minute, but you can change it to another value based on your needs.
SYNC_FREQUENCY='* * * * *'
# Set to 'true' to include Junk and Trash folders in the email archive. Defaults to false.
ALL_INCLUSIVE_ARCHIVE=false
# --- Docker Compose Service Configuration ---
# These variables are used by docker-compose.yml to configure the services. Leave them unchanged if you use Docker services for Postgresql, Valkey (Redis) and Meilisearch. If you decide to use your own instances of these services, you can substitute them with your own connection credentials.
@@ -19,7 +26,8 @@ DATABASE_URL="postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/$
# Meilisearch
MEILI_MASTER_KEY=aSampleMasterKey
MEILI_HOST=http://meilisearch:7700
# The number of emails to batch together for indexing. Defaults to 500.
MEILI_INDEXING_BATCH=500
# Redis (We use Valkey, which is Redis-compatible and open source)
@@ -28,6 +36,8 @@ REDIS_PORT=6379
REDIS_PASSWORD=defaultredispassword
# If you run Valkey service from Docker Compose, set the REDIS_TLS_ENABLED variable to false.
REDIS_TLS_ENABLED=false
# Redis username. Only required if not using the default user.
REDIS_USER=notdefaultuser
# --- Storage Settings ---
@@ -39,7 +49,9 @@ BODY_SIZE_LIMIT=100M
# --- Local Storage Settings ---
# The path inside the container where files will be stored.
# This is mapped to a Docker volume for persistence.
# This is only used if STORAGE_TYPE is 'local'.
# This is not an optional variable; it is where the Open Archiver service stores application data. Set this even if you are using S3 storage.
# Make sure the user that runs the Open Archiver service has read and write access to this path.
# Important: It is recommended to create this path manually before installation, otherwise you may face permission and ownership problems.
STORAGE_LOCAL_ROOT_PATH=/var/data/open-archiver
# --- S3-Compatible Storage Settings ---
@@ -52,14 +64,26 @@ STORAGE_S3_REGION=
# Set to 'true' for MinIO and other non-AWS S3 services
STORAGE_S3_FORCE_PATH_STYLE=false
# --- Storage Encryption ---
# IMPORTANT: Generate a secure, random 32-byte hex string for this key.
# You can use `openssl rand -hex 32` to generate a key.
# This key is used for AES-256 encryption of files at rest.
# This is an optional variable, if not set, files will not be encrypted.
STORAGE_ENCRYPTION_KEY=
# --- Security & Authentication ---
# Enable or disable deletion of emails and ingestion sources. Defaults to false.
ENABLE_DELETION=false
# Rate Limiting
# The window in milliseconds for which API requests are checked. Defaults to 60000 (1 minute).
RATE_LIMIT_WINDOW_MS=60000
# The maximum number of API requests allowed from an IP within the window. Defaults to 100.
RATE_LIMIT_MAX_REQUESTS=100
# JWT
# IMPORTANT: Change this to a long, random, and secret string in your .env file
JWT_SECRET=a-very-secret-key-that-you-should-change
@@ -70,3 +94,7 @@ JWT_EXPIRES_IN="7d"
# IMPORTANT: Generate a secure, random 32-byte hex string for this
# You can use `openssl rand -hex 32` to generate a key.
ENCRYPTION_KEY=
# Apache Tika Integration
# ONLY active if TIKA_URL is set
TIKA_URL=http://tika:9998

@@ -4,7 +4,6 @@ about: Create a report to help us improve
title: ''
labels: bug
assignees: ''
---
**Describe the bug**
@@ -12,9 +11,10 @@ A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
5. See error
3. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
@@ -23,7 +23,8 @@ A clear and concise description of what you expected to happen.
If applicable, add screenshots to help explain your problem.
**System:**
- Open Archiver Version:
**Relevant logs:**
Any relevant logs (Redact sensitive information)

@@ -4,11 +4,10 @@ about: Suggest an idea for this project
title: ''
labels: enhancement
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is.
**Describe the solution you'd like**
A clear and concise description of what you want to happen.

@@ -35,7 +35,7 @@ jobs:
uses: docker/build-push-action@v6
with:
context: .
file: ./docker/Dockerfile
file: ./apps/open-archiver/Dockerfile
platforms: linux/amd64,linux/arm64
push: true
tags: logiclabshq/open-archiver:${{ steps.sha.outputs.sha }}

4
.gitignore vendored
@@ -24,3 +24,7 @@ pnpm-debug.log
# Vitepress
docs/.vitepress/dist
docs/.vitepress/cache
# TS
**/tsconfig.tsbuildinfo

140
LICENSE
@@ -200,23 +200,23 @@ You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
- **a)** The work must carry prominent notices stating that you modified
it, and giving a relevant date.
- **b)** The work must carry prominent notices stating that it is
released under this License and any conditions added under section 7.
This requirement modifies the requirement in section 4 to
“keep intact all notices”.
- **c)** You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.
- **d)** If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
@@ -235,42 +235,42 @@ of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:
- **a)** Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
- **b)** Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either **(1)** a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or **(2)** access to copy the
Corresponding Source from a network server at no charge.
- **c)** Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.
- **d)** Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.
- **e)** Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
@@ -344,23 +344,23 @@ Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
- **a)** Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
- **b)** Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
- **c)** Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
- **d)** Limiting the use for publicity purposes of names of licensors or
authors of the material; or
- **e)** Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
- **f)** Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.
All other non-permissive additional terms are considered “further
restrictions” within the meaning of section 10. If the Program as you


@@ -7,7 +7,7 @@
[![Redis](https://img.shields.io/badge/Redis-DC382D?style=for-the-badge&logo=redis&logoColor=white)](https://redis.io)
[![SvelteKit](https://img.shields.io/badge/SvelteKit-FF3E00?style=for-the-badge&logo=svelte&logoColor=white)](https://svelte.dev/)
**A secure, sovereign, and open-source platform for email archiving and eDiscovery.**
**A secure, sovereign, and open-source platform for email archiving.**
Open Archiver provides a robust, self-hosted solution for archiving, storing, indexing, and searching emails from major platforms, including Google Workspace (Gmail), Microsoft 365, PST files, and generic IMAP-enabled email inboxes. Use Open Archiver to keep a permanent, tamper-proof record of your communication history, free from vendor lock-in.
@@ -48,13 +48,14 @@ Password: openarchiver_demo
- Zipped .eml files
- Mbox files
- **Secure & Efficient Storage**: Emails are stored in the standard `.eml` format. The system uses deduplication and compression to minimize storage costs. All data is encrypted at rest.
- **Secure & Efficient Storage**: Emails are stored in the standard `.eml` format. The system uses deduplication and compression to minimize storage costs. All files are encrypted at rest.
- **Pluggable Storage Backends**: Support both local filesystem storage and S3-compatible object storage (like AWS S3 or MinIO).
- **Powerful Search & eDiscovery**: A high-performance search engine indexes the full text of emails and attachments (PDF, DOCX, etc.).
- **Thread discovery**: The ability to discover if an email belongs to a thread/conversation and present the context.
- **Compliance & Retention**: Define granular retention policies to automatically manage the lifecycle of your data. Place legal holds on communications to prevent deletion during litigation (TBD).
- **File Hash and Encryption**: Email and attachment file hash values are stored in the meta database upon ingestion, meaning any attempt to alter the file content will be identified, ensuring legal and regulatory compliance.
- **Comprehensive Auditing**: An immutable audit trail logs all system activities, ensuring you have a clear record of who accessed what and when (TBD).
    - Each archived email comes with an "Integrity Report" feature that indicates whether the files are original.
- **Comprehensive Auditing**: An immutable audit trail logs all system activities, ensuring you have a clear record of who accessed what and when.
## 🛠️ Tech Stack


@@ -1,4 +1,4 @@
# Dockerfile for Open Archiver
# Dockerfile for the OSS version of Open Archiver
ARG BASE_IMAGE=node:22-alpine
@@ -15,12 +15,13 @@ COPY package.json pnpm-workspace.yaml pnpm-lock.yaml* ./
COPY packages/backend/package.json ./packages/backend/
COPY packages/frontend/package.json ./packages/frontend/
COPY packages/types/package.json ./packages/types/
COPY apps/open-archiver/package.json ./apps/open-archiver/
# 1. Build Stage: Install all dependencies and build the project
FROM base AS build
COPY packages/frontend/svelte.config.js ./packages/frontend/
# Install all dependencies. Use --shamefully-hoist to create a flat node_modules structure
# Install all dependencies.
ENV PNPM_HOME="/pnpm"
RUN --mount=type=cache,id=pnpm,target=/pnpm/store \
pnpm install --shamefully-hoist --frozen-lockfile --prod=false
@@ -28,19 +29,19 @@ RUN --mount=type=cache,id=pnpm,target=/pnpm/store \
# Copy the rest of the source code
COPY . .
# Build all packages.
RUN pnpm build
# Build the OSS packages.
RUN pnpm build:oss
# 2. Production Stage: Install only production dependencies and copy built artifacts
FROM base AS production
# Copy built application from build stage
COPY --from=build /app/packages/backend/dist ./packages/backend/dist
COPY --from=build /app/packages/frontend/build ./packages/frontend/build
COPY --from=build /app/packages/types/dist ./packages/types/dist
COPY --from=build /app/packages/backend/drizzle.config.ts ./packages/backend/drizzle.config.ts
COPY --from=build /app/packages/backend/src/database/migrations ./packages/backend/src/database/migrations
COPY --from=build /app/packages/frontend/build ./packages/frontend/build
COPY --from=build /app/packages/types/dist ./packages/types/dist
COPY --from=build /app/apps/open-archiver/dist ./apps/open-archiver/dist
# Copy the entrypoint script and make it executable
COPY docker/docker-entrypoint.sh /usr/local/bin/
@@ -53,4 +54,4 @@ EXPOSE 3000
ENTRYPOINT ["docker-entrypoint.sh"]
# Start the application
CMD ["pnpm", "docker-start"]
CMD ["pnpm", "docker-start:oss"]


@@ -0,0 +1,24 @@
import { createServer, logger } from '@open-archiver/backend';
import * as dotenv from 'dotenv';
dotenv.config();
async function start() {
// --- Environment Variable Validation ---
const { PORT_BACKEND } = process.env;
if (!PORT_BACKEND) {
throw new Error('Missing required environment variables for the backend: PORT_BACKEND.');
}
// Create the server instance (passing no modules for the default OSS version)
const app = await createServer([]);
app.listen(PORT_BACKEND, () => {
logger.info({}, `✅ Open Archiver (OSS) running on port ${PORT_BACKEND}`);
});
}
start().catch((error) => {
logger.error({ error }, 'Failed to start the server.');
process.exit(1);
});


@@ -0,0 +1,18 @@
{
"name": "open-archiver-app",
"version": "1.0.0",
"private": true,
"scripts": {
"dev": "ts-node-dev --respawn --transpile-only index.ts",
"build": "tsc",
"start": "node dist/index.js"
},
"dependencies": {
"@open-archiver/backend": "workspace:*",
"dotenv": "^17.2.0"
},
"devDependencies": {
"@types/dotenv": "^8.2.3",
"ts-node-dev": "^2.0.0"
}
}


@@ -0,0 +1,8 @@
{
"extends": "../../tsconfig.base.json",
"compilerOptions": {
"outDir": "dist"
},
"include": ["./**/*.ts"],
"references": [{ "path": "../../packages/backend" }]
}



@@ -10,7 +10,7 @@ services:
env_file:
- .env
volumes:
- archiver-data:/var/data/open-archiver
- ${STORAGE_LOCAL_ROOT_PATH}:${STORAGE_LOCAL_ROOT_PATH}
depends_on:
- postgres
- valkey
@@ -47,11 +47,19 @@ services:
restart: unless-stopped
environment:
MEILI_MASTER_KEY: ${MEILI_MASTER_KEY:-aSampleMasterKey}
MEILI_SCHEDULE_SNAPSHOT: ${MEILI_SCHEDULE_SNAPSHOT:-86400}
volumes:
- meilidata:/meili_data
networks:
- open-archiver-net
tika:
image: apache/tika:3.2.2.0-full
container_name: tika
restart: always
networks:
- open-archiver-net
volumes:
pgdata:
driver: local
@@ -59,8 +67,6 @@ volumes:
driver: local
meilidata:
driver: local
archiver-data:
driver: local
networks:
open-archiver-net:


@@ -33,6 +33,7 @@ export default defineConfig({
items: [
{ text: 'Get Started', link: '/' },
{ text: 'Installation', link: '/user-guides/installation' },
{ text: 'Email Integrity Check', link: '/user-guides/integrity-check' },
{
text: 'Email Providers',
link: '/user-guides/email-providers/',
@@ -91,8 +92,10 @@ export default defineConfig({
{ text: 'Archived Email', link: '/api/archived-email' },
{ text: 'Dashboard', link: '/api/dashboard' },
{ text: 'Ingestion', link: '/api/ingestion' },
{ text: 'Integrity Check', link: '/api/integrity' },
{ text: 'Search', link: '/api/search' },
{ text: 'Storage', link: '/api/storage' },
{ text: 'Jobs', link: '/api/jobs' },
],
},
{
@@ -100,6 +103,7 @@ export default defineConfig({
items: [
{ text: 'Overview', link: '/services/' },
{ text: 'Storage Service', link: '/services/storage-service' },
{ text: 'OCR Service', link: '/services/ocr-service' },
{
text: 'IAM Service',
items: [{ text: 'IAM Policies', link: '/services/iam-service/iam-policy' }],


@@ -19,11 +19,45 @@ The request body should be a `CreateIngestionSourceDto` object.
```typescript
interface CreateIngestionSourceDto {
name: string;
provider: 'google' | 'microsoft' | 'generic_imap';
provider: 'google_workspace' | 'microsoft_365' | 'generic_imap' | 'pst_import' | 'eml_import' | 'mbox_import';
providerConfig: IngestionCredentials;
}
```
#### Example: Creating an Mbox Import Source with File Upload
```json
{
"name": "My Mbox Import",
"provider": "mbox_import",
"providerConfig": {
"type": "mbox_import",
"uploadedFileName": "emails.mbox",
"uploadedFilePath": "open-archiver/tmp/uuid-emails.mbox"
}
}
```
#### Example: Creating an Mbox Import Source with Local File Path
```json
{
"name": "My Mbox Import",
"provider": "mbox_import",
"providerConfig": {
"type": "mbox_import",
"localFilePath": "/path/to/emails.mbox"
}
}
```
**Note:** When using `localFilePath`, the file will not be deleted after import. When using `uploadedFilePath` (via the upload API), the file will be automatically deleted after import. The same applies to `pst_import` and `eml_import` providers.
**Important regarding `localFilePath`:** When running Open Archiver in a Docker container (which is the standard deployment), `localFilePath` refers to the path **inside the Docker container**, not on the host machine.
To use a local file:
1. **Recommended:** Place your file inside the directory defined by `STORAGE_LOCAL_ROOT_PATH` (e.g., inside a `temp` folder). Since this directory is already mounted as a volume, the file will be accessible at the same path inside the container.
2. **Alternative:** Mount a specific directory containing your files as a volume in `docker-compose.yml`. For example, add `- /path/to/my/files:/imports` to the `volumes` section and use `/imports/myfile.pst` as the `localFilePath`.
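The request bodies above can be constructed with a small typed helper. A minimal sketch: the DTO shape follows the examples above, while the either/or validation on file paths is an assumption for illustration, not a documented constraint.

```typescript
// Sketch of the mbox provider config, mirroring the example request bodies.
interface MboxImportConfig {
    type: 'mbox_import';
    localFilePath?: string;
    uploadedFileName?: string;
    uploadedFilePath?: string;
}

// Build a CreateIngestionSourceDto-shaped body. The exactly-one-path check
// is an illustrative assumption, not part of the documented API contract.
function buildMboxSource(name: string, config: MboxImportConfig) {
    if (Boolean(config.localFilePath) === Boolean(config.uploadedFilePath)) {
        throw new Error('Provide exactly one of localFilePath or uploadedFilePath.');
    }
    return { name, provider: 'mbox_import' as const, providerConfig: config };
}
```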
#### Responses
- **201 Created:** The newly created ingestion source.

docs/api/integrity.md Normal file

@@ -0,0 +1,51 @@
# Integrity Check API
The Integrity Check API provides an endpoint to verify the cryptographic hash of an archived email and its attachments against the stored values in the database. This allows you to ensure that the stored files have not been tampered with or corrupted since they were archived.
## Check Email Integrity
Verifies the integrity of a specific archived email and all of its associated attachments.
- **URL:** `/api/v1/integrity/:id`
- **Method:** `GET`
- **URL Params:**
- `id=[string]` (required) - The UUID of the archived email to check.
- **Permissions:** `read:archive`
- **Success Response:**
- **Code:** 200 OK
- **Content:** `IntegrityCheckResult[]`
### Response Body `IntegrityCheckResult`
An array of objects, each representing the result of an integrity check for a single file (either the email itself or an attachment).
| Field | Type | Description |
| :--------- | :------------------------ | :-------------------------------------------------------------------------- |
| `type` | `'email' \| 'attachment'` | The type of the file being checked. |
| `id` | `string` | The UUID of the email or attachment. |
| `filename` | `string` (optional) | The filename of the attachment. This field is only present for attachments. |
| `isValid` | `boolean` | `true` if the current hash matches the stored hash, otherwise `false`. |
| `reason` | `string` (optional) | A reason for the failure. Only present if `isValid` is `false`. |
### Example Response
```json
[
{
"type": "email",
"id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
"isValid": true
},
{
"type": "attachment",
"id": "b2c3d4e5-f6a7-8901-2345-67890abcdef1",
"filename": "document.pdf",
"isValid": false,
"reason": "Stored hash does not match current hash."
}
]
```
- **Error Response:**
- **Code:** 404 Not Found
- **Content:** `{ "message": "Archived email not found" }`
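The response can be consumed in a typed client. A minimal sketch, where the interface mirrors the fields in the table above and `summarize` is a hypothetical helper, not part of the API:

```typescript
// Shape of each element returned by GET /api/v1/integrity/:id.
interface IntegrityCheckResult {
    type: 'email' | 'attachment';
    id: string;
    filename?: string;
    isValid: boolean;
    reason?: string;
}

// Hypothetical helper: one human-readable line per failed check.
function summarize(results: IntegrityCheckResult[]): string[] {
    return results
        .filter((r) => !r.isValid)
        .map((r) => `${r.type} ${r.filename ?? r.id}: ${r.reason ?? 'unknown reason'}`);
}
```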

docs/api/jobs.md Normal file

@@ -0,0 +1,128 @@
# Jobs API
The Jobs API provides endpoints for monitoring the job queues and the jobs within them.
## Overview
Open Archiver uses a job queue system to handle asynchronous tasks like email ingestion and indexing. The system is built on Redis and BullMQ and uses a producer-consumer pattern.
### Job Statuses
Jobs can have one of the following statuses:
- **active:** The job is currently being processed.
- **completed:** The job has been completed successfully.
- **failed:** The job has failed after all retry attempts.
- **delayed:** The job is delayed and will be processed at a later time.
- **waiting:** The job is waiting to be processed.
- **paused:** The job is paused and will not be processed until it is resumed.
### Errors
When a job fails, the `failedReason` and `stacktrace` fields will contain information about the error. The `error` field will also be populated with the `failedReason` for easier access.
### Job Preservation
Completed and failed jobs are preserved only for a limited window, so the job counts and job lists returned by the API reflect recent activity only:
- **Completed jobs:** The last 1000 completed jobs are preserved.
- **Failed jobs:** The last 5000 failed jobs are preserved.
## Get All Queues
- **Endpoint:** `GET /v1/jobs/queues`
- **Description:** Retrieves a list of all job queues and their job counts.
- **Permissions:** `manage:all`
- **Responses:**
- `200 OK`: Returns a list of queue overviews.
- `401 Unauthorized`: If the user is not authenticated.
- `403 Forbidden`: If the user does not have the required permissions.
### Response Body
```json
{
"queues": [
{
"name": "ingestion",
"counts": {
"active": 0,
"completed": 56,
"failed": 4,
"delayed": 3,
"waiting": 0,
"paused": 0
}
},
{
"name": "indexing",
"counts": {
"active": 0,
"completed": 0,
"failed": 0,
"delayed": 0,
"waiting": 0,
"paused": 0
}
}
]
}
```
## Get Queue Jobs
- **Endpoint:** `GET /v1/jobs/queues/:queueName`
- **Description:** Retrieves a list of jobs within a specific queue, with pagination and filtering by status.
- **Permissions:** `manage:all`
- **URL Parameters:**
- `queueName` (string, required): The name of the queue to retrieve jobs from.
- **Query Parameters:**
- `status` (string, optional): The status of the jobs to retrieve. Can be one of `active`, `completed`, `failed`, `delayed`, `waiting`, `paused`. Defaults to `failed`.
- `page` (number, optional): The page number to retrieve. Defaults to `1`.
- `limit` (number, optional): The number of jobs to retrieve per page. Defaults to `10`.
- **Responses:**
- `200 OK`: Returns a detailed view of the queue, including a paginated list of jobs.
- `401 Unauthorized`: If the user is not authenticated.
- `403 Forbidden`: If the user does not have the required permissions.
- `404 Not Found`: If the specified queue does not exist.
### Response Body
```json
{
"name": "ingestion",
"counts": {
"active": 0,
"completed": 56,
"failed": 4,
"delayed": 3,
"waiting": 0,
"paused": 0
},
"jobs": [
{
"id": "1",
"name": "initial-import",
"data": {
"ingestionSourceId": "clx1y2z3a0000b4d2e5f6g7h8"
},
"state": "failed",
"failedReason": "Error: Connection timed out",
"timestamp": 1678886400000,
"processedOn": 1678886401000,
"finishedOn": 1678886402000,
"attemptsMade": 5,
"stacktrace": ["..."],
"returnValue": null,
"ingestionSourceId": "clx1y2z3a0000b4d2e5f6g7h8",
"error": "Error: Connection timed out"
}
],
"pagination": {
"currentPage": 1,
"totalPages": 1,
"totalJobs": 4,
"limit": 10
}
}
```


@@ -0,0 +1,78 @@
# Audit Log: API Endpoints
The audit log feature exposes two API endpoints for retrieving and verifying audit log data. Both endpoints require authentication and are only accessible to users with the appropriate permissions.
## Get Audit Logs
Retrieves a paginated list of audit log entries, with support for filtering and sorting.
- **Endpoint:** `GET /api/v1/enterprise/audit-logs`
- **Method:** `GET`
- **Authentication:** Required
### Query Parameters
| Parameter | Type | Description |
| ------------ | -------- | --------------------------------------------------------------------------- |
| `page` | `number` | The page number to retrieve. Defaults to `1`. |
| `limit` | `number` | The number of entries to retrieve per page. Defaults to `20`. |
| `startDate` | `date` | The start date for the date range filter. |
| `endDate` | `date` | The end date for the date range filter. |
| `actor` | `string` | The actor identifier to filter by. |
| `actionType` | `string` | The action type to filter by (e.g., `LOGIN`, `CREATE`). |
| `sort` | `string` | The sort order for the results. Can be `asc` or `desc`. Defaults to `desc`. |
### Response Body
```json
{
"data": [
{
"id": 1,
"previousHash": null,
"timestamp": "2025-10-03T00:00:00.000Z",
"actorIdentifier": "e8026a75-b58a-4902-8858-eb8780215f82",
"actorIp": "::1",
"actionType": "LOGIN",
"targetType": "User",
"targetId": "e8026a75-b58a-4902-8858-eb8780215f82",
"details": {},
"currentHash": "..."
}
],
"meta": {
"total": 100,
"page": 1,
"limit": 20
}
}
```
## Verify Audit Log Integrity
Initiates a verification process to check the integrity of the entire audit log chain.
- **Endpoint:** `POST /api/v1/enterprise/audit-logs/verify`
- **Method:** `POST`
- **Authentication:** Required
### Response Body
**Success**
```json
{
"ok": true,
"message": "Audit log integrity verified successfully."
}
```
**Failure**
```json
{
"ok": false,
"message": "Audit log chain is broken!",
"logId": 123
}
```


@@ -0,0 +1,31 @@
# Audit Log: Backend Implementation
The backend implementation of the audit log is handled by the `AuditService`, located in `packages/backend/src/services/AuditService.ts`. This service encapsulates all the logic for creating, retrieving, and verifying audit log entries.
## Hashing and Verification Logic
The core of the audit log's immutability lies in its hashing and verification logic.
### Hash Calculation
The `calculateHash` method is responsible for generating a SHA-256 hash of a log entry. To ensure consistency, it performs the following steps:
1. **Canonical Object Creation:** It constructs a new object with a fixed property order, ensuring that the object's structure is always the same.
2. **Timestamp Normalization:** It converts the `timestamp` to milliseconds since the epoch (`getTime()`) to avoid any precision-related discrepancies between the application and the database.
3. **Canonical Stringification:** It uses a custom `canonicalStringify` function to create a JSON string representation of the object. This function sorts the object keys, ensuring that the output is always the same, regardless of the in-memory property order.
4. **Hash Generation:** It computes a SHA-256 hash of the canonical string.
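The four steps above can be sketched as follows. This is a simplified illustration, not the actual `AuditService` code, and the exact set of canonical fields is an assumption:

```typescript
import { createHash } from 'node:crypto';

// Assumed canonical field set for illustration; the real entry has more fields.
interface AuditLogInput {
    previousHash: string | null;
    timestamp: Date;
    actorIdentifier: string;
    actionType: string;
    details: Record<string, unknown>;
}

// Step 3: stringify with sorted keys so in-memory property order never
// changes the output.
function canonicalStringify(value: unknown): string {
    if (Array.isArray(value)) {
        return `[${value.map(canonicalStringify).join(',')}]`;
    }
    if (value !== null && typeof value === 'object') {
        const keys = Object.keys(value as object).sort();
        return `{${keys
            .map((k) => `${JSON.stringify(k)}:${canonicalStringify((value as any)[k])}`)
            .join(',')}}`;
    }
    return JSON.stringify(value);
}

function calculateHash(entry: AuditLogInput): string {
    // Steps 1-2: fixed property order, timestamp normalized to epoch millis.
    const canonical = {
        previousHash: entry.previousHash,
        timestamp: entry.timestamp.getTime(),
        actorIdentifier: entry.actorIdentifier,
        actionType: entry.actionType,
        details: entry.details,
    };
    // Step 4: SHA-256 over the canonical string.
    return createHash('sha256').update(canonicalStringify(canonical)).digest('hex');
}
```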
### Verification Process
The `verifyAuditLog` method is designed to be highly scalable and efficient, even with millions of log entries. It processes the logs in manageable chunks (e.g., 1000 at a time) to avoid loading the entire table into memory.
The verification process involves the following steps:
1. **Iterative Processing:** It fetches the logs in batches within a `while` loop.
2. **Chain Verification:** For each log entry, it compares the `previousHash` with the `currentHash` of the preceding log. If they do not match, the chain is broken, and the verification fails.
3. **Hash Recalculation:** It recalculates the hash of the current log entry using the same `calculateHash` method used during creation.
4. **Integrity Check:** It compares the recalculated hash with the `currentHash` stored in the database. If they do not match, the log entry has been tampered with, and the verification fails.
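The chain walk can be sketched in memory as follows. This is a simplified illustration: the real `verifyAuditLog` pages through the database in batches, and `payloadHash` here stands in for the hash recalculated from each entry's content.

```typescript
// Minimal stand-in for a persisted log row.
interface StoredLog {
    id: number;
    previousHash: string | null;
    currentHash: string;
    payloadHash: string; // stands in for the recalculated hash of the entry
}

function verifyChain(logs: StoredLog[]): { ok: boolean; logId?: number } {
    let previous: StoredLog | null = null;
    for (const log of logs) {
        // Chain check: previousHash must match the preceding entry's currentHash.
        if (log.previousHash !== (previous ? previous.currentHash : null)) {
            return { ok: false, logId: log.id };
        }
        // Integrity check: the recalculated hash must match the stored hash.
        if (log.payloadHash !== log.currentHash) {
            return { ok: false, logId: log.id };
        }
        previous = log;
    }
    return { ok: true };
}
```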
## Service Integration
The `AuditService` is integrated into the application through the `AuditLogModule` (`packages/enterprise/src/modules/audit-log/audit-log.module.ts`), which registers the API routes for the audit log feature. The service's `createAuditLog` method is called from various other services throughout the application to record significant events.


@@ -0,0 +1,39 @@
# Audit Log: User Interface
The audit log user interface provides a comprehensive view of all significant events that have occurred within the Open Archiver system. It is designed to be intuitive and user-friendly, allowing administrators to easily monitor and review system activity.
## Viewing Audit Logs
The main audit log page displays a table of log entries, with the following columns:
- **Timestamp:** The date and time of the event.
- **Actor:** The identifier of the user or system process that performed the action.
- **IP Address:** The IP address from which the action was initiated.
- **Action:** The type of action performed, displayed as a color-coded badge for easy identification.
- **Target Type:** The type of resource that was affected.
- **Target ID:** The unique identifier of the affected resource.
- **Details:** A truncated preview of the event's details. The full JSON object is displayed in a pop-up card on hover.
## Filtering and Sorting
The table can be sorted by timestamp by clicking the "Timestamp" header. This allows you to view the logs in either chronological or reverse chronological order.
## Pagination
Pagination controls are available below the table, allowing you to navigate through the entire history of audit log entries.
## Verifying Log Integrity
The "Verify Log Integrity" button allows you to initiate a verification process to check the integrity of the entire audit log chain. This process recalculates the hash of each log entry and compares it to the stored hash, ensuring that the cryptographic chain is unbroken and no entries have been tampered with.
### Verification Responses
- **Success:** A success notification is displayed, confirming that the audit log integrity has been verified successfully. This means that the log chain is complete and no entries have been tampered with.
- **Failure:** An error notification is displayed, indicating that the audit log chain is broken or an entry has been tampered with. The notification will include the ID of the log entry where the issue was detected. There are two types of failures:
- **Audit log chain is broken:** This means that the `previousHash` of a log entry does not match the `currentHash` of the preceding entry. This indicates that one or more log entries may have been deleted or inserted into the chain.
- **Audit log entry is tampered:** This means that the recalculated hash of a log entry does not match its stored `currentHash`. This indicates that the data within the log entry has been altered.
## Viewing Log Details
You can view the full details of any log entry by clicking on its row in the table. This will open a dialog containing all the information associated with the log entry, including the previous and current hashes.


@@ -0,0 +1,27 @@
# Audit Log
The Audit Log is an enterprise-grade feature designed to provide a complete, immutable, and verifiable record of every significant action that occurs within the Open Archiver system. Its primary purpose is to ensure compliance with strict regulatory standards, such as the German GoBD, by establishing a tamper-proof chain of evidence for all activities.
## Core Principles
To fulfill its compliance and security functions, the audit log adheres to the following core principles:
### 1. Immutability
Every log entry is cryptographically chained to the previous one. Each new entry contains a SHA-256 hash of the preceding entry's hash, creating a verifiable chain. Any attempt to alter or delete a past entry would break this chain and be immediately detectable through the verification process.
### 2. Completeness
The system is designed to log every significant event without exception. This includes not only user-initiated actions (like logins, searches, and downloads) but also automated system processes, such as data ingestion and policy-based deletions.
### 3. Attribution
Each log entry is unambiguously linked to the actor that initiated the event. This could be a specific authenticated user, an external auditor, or an automated system process. The actor's identifier and source IP address are recorded to ensure full traceability.
### 4. Clarity and Detail
Log entries are structured to be detailed and human-readable, providing sufficient context for an auditor to understand the event without needing specialized system knowledge. This includes the action performed, the target resource affected, and a JSON object with specific, contextual details of the event.
### 5. Verifiability
The integrity of the entire audit log can be verified at any time. A dedicated process iterates through the logs from the beginning, recalculating the hash of each entry and comparing it to the stored hash, ensuring the cryptographic chain is unbroken and no entries have been tampered with.


@@ -1,289 +0,0 @@
# IAM Policies
This document provides a guide to creating and managing IAM policies in Open Archiver. It is intended for developers and administrators who need to configure granular access control for users and roles.
## Policy Structure
IAM policies are defined as an array of JSON objects, where each object represents a single permission rule. The structure of a policy object is as follows:
```json
{
"action": "read" OR ["read", "create"],
"subject": "ingestion" OR ["ingestion", "dashboard"],
"conditions": {
"field_name": "value"
},
"inverted": false OR true
}
```
- `action`: The action(s) to be performed on the subject. Can be a single string or an array of strings.
- `subject`: The resource(s) or entity on which the action is to be performed. Can be a single string or an array of strings.
- `conditions`: (Optional) A set of conditions that must be met for the permission to be granted.
- `inverted`: (Optional) When set to `true`, this inverts the rule, turning it from a "can" rule into a "cannot" rule. This is useful for creating exceptions to broader permissions.
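Expressed as a TypeScript shape (a sketch for reference; the canonical type lives in the codebase and may differ), with a small helper that normalizes the single-value-or-array fields:

```typescript
// `OR` in the schematic above becomes a union of a single value and an array.
type PolicyRule = {
    action: string | string[];
    subject: string | string[];
    conditions?: Record<string, unknown>;
    inverted?: boolean;
};

// Normalize the single-value-or-array fields for easier evaluation.
function toArray<T>(value: T | T[]): T[] {
    return Array.isArray(value) ? value : [value];
}

const rule: PolicyRule = { action: ['read', 'create'], subject: 'ingestion' };
```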
## Actions
The following actions are available for use in IAM policies:
- `manage`: A wildcard action that grants all permissions on a subject (`create`, `read`, `update`, `delete`, `search`, `sync`).
- `create`: Allows the user to create a new resource.
- `read`: Allows the user to view a resource.
- `update`: Allows the user to modify an existing resource.
- `delete`: Allows the user to delete a resource.
- `search`: Allows the user to search for resources.
- `sync`: Allows the user to synchronize a resource.
## Subjects
The following subjects are available for use in IAM policies:
- `all`: A wildcard subject that represents all resources.
- `archive`: Represents archived emails.
- `ingestion`: Represents ingestion sources.
- `settings`: Represents system settings.
- `users`: Represents user accounts.
- `roles`: Represents user roles.
- `dashboard`: Represents the dashboard.
## Advanced Conditions with MongoDB-Style Queries
Conditions are the key to creating fine-grained access control rules. They are defined as a JSON object where each key represents a field on the subject, and the value defines the criteria for that field.
All conditions within a single rule are implicitly joined with an **AND** logic. This means that for a permission to be granted, the resource must satisfy _all_ specified conditions.
The power of this system comes from its use of a subset of [MongoDB's query language](https://www.mongodb.com/docs/manual/), which provides a flexible and expressive way to define complex rules. These rules are translated into native queries for both the PostgreSQL database (via Drizzle ORM) and the Meilisearch engine.
### Supported Operators and Examples
Here is a detailed breakdown of the supported operators with examples.
#### `$eq` (Equal)
This is the default operator. If you provide a simple key-value pair, it is treated as an equality check.
```json
// This rule...
{ "status": "active" }
// ...is equivalent to this:
{ "status": { "$eq": "active" } }
```
**Use Case**: Grant access to an ingestion source only if its status is `active`.
#### `$ne` (Not Equal)
Matches documents where the field value is not equal to the specified value.
```json
{ "provider": { "$ne": "pst_import" } }
```
**Use Case**: Allow a user to see all ingestion sources except for PST imports.
#### `$in` (In Array)
Matches documents where the field value is one of the values in the specified array.
```json
{
"id": {
"$in": ["INGESTION_ID_1", "INGESTION_ID_2"]
}
}
```
**Use Case**: Grant an auditor access to a specific list of ingestion sources.
#### `$nin` (Not In Array)
Matches documents where the field value is not one of the values in the specified array.
```json
{ "provider": { "$nin": ["pst_import", "eml_import"] } }
```
**Use Case**: Hide all manual import sources from a specific user role.
#### `$lt` / `$lte` (Less Than / Less Than or Equal)
Matches documents where the field value is less than (`$lt`) or less than or equal to (`$lte`) the specified value. This is useful for numeric or date-based comparisons.
```json
{ "sentAt": { "$lt": "2024-01-01T00:00:00.000Z" } }
```
#### `$gt` / `$gte` (Greater Than / Greater Than or Equal)
Matches documents where the field value is greater than (`$gt`) or greater than or equal to (`$gte`) the specified value.
```json
{ "sentAt": { "$gt": "2024-01-01T00:00:00.000Z" } }
```
#### `$exists`
Matches documents that have (or do not have) the specified field.
```json
// Grant access only if a 'lastSyncStatusMessage' exists
{ "lastSyncStatusMessage": { "$exists": true } }
```
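To make the operator semantics concrete, here is a hedged sketch of evaluating this operator subset against a plain object in memory. The real system compiles conditions into Drizzle and Meilisearch queries instead of matching in memory.

```typescript
type Condition = Record<string, unknown>;

// Evaluate one field against either a plain value (implicit $eq) or an
// operator object such as { "$in": [...] }.
function matchesField(actual: unknown, expected: unknown): boolean {
    if (expected !== null && typeof expected === 'object' && !Array.isArray(expected)) {
        return Object.entries(expected as Record<string, unknown>).every(([op, v]) => {
            switch (op) {
                case '$eq': return actual === v;
                case '$ne': return actual !== v;
                case '$in': return (v as unknown[]).includes(actual);
                case '$nin': return !(v as unknown[]).includes(actual);
                case '$lt': return (actual as any) < (v as any);
                case '$lte': return (actual as any) <= (v as any);
                case '$gt': return (actual as any) > (v as any);
                case '$gte': return (actual as any) >= (v as any);
                case '$exists': return (actual !== undefined) === v;
                default: return false; // unknown operators deny by default
            }
        });
    }
    return actual === expected;
}

// All fields within a single rule are implicitly joined with AND.
function matchesConditions(doc: Record<string, unknown>, conditions: Condition): boolean {
    return Object.entries(conditions).every(([field, expected]) =>
        matchesField(doc[field], expected)
    );
}
```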
## Inverted Rules: Creating Exceptions with `cannot`
By default, all rules are "can" rules, meaning they grant permissions. However, you can create a "cannot" rule by adding `"inverted": true` to a policy object. This is extremely useful for creating exceptions to broader permissions.
A common pattern is to grant broad access and then use an inverted rule to carve out a specific restriction.
**Use Case**: Grant a user access to all ingestion sources _except_ for one specific source.
This is achieved with two rules:
1. A "can" rule that grants `read` access to the `ingestion` subject.
2. An inverted "cannot" rule that denies `read` access for the specific ingestion `id`.
```json
[
{
"action": "read",
"subject": "ingestion"
},
{
"inverted": true,
"action": "read",
"subject": "ingestion",
"conditions": {
"id": "SPECIFIC_INGESTION_ID_TO_EXCLUDE"
}
}
]
```
## Policy Evaluation Logic
The system evaluates policies by combining all relevant rules for a user. The logic is simple:
- A user has permission if at least one `can` rule allows it.
- A permission is denied if a `cannot` (`"inverted": true`) rule explicitly forbids it, even if a `can` rule allows it. `cannot` rules always take precedence.
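The precedence rule can be sketched as follows. For brevity this matches on action and subject only, ignoring conditions and the `manage`/`all` wildcards:

```typescript
interface Rule {
    action: string | string[];
    subject: string | string[];
    inverted?: boolean;
}

const has = (v: string | string[], x: string): boolean =>
    (Array.isArray(v) ? v : [v]).includes(x);

function isAllowed(rules: Rule[], action: string, subject: string): boolean {
    const matching = rules.filter((r) => has(r.action, action) && has(r.subject, subject));
    // `cannot` (inverted) rules always take precedence over `can` rules.
    if (matching.some((r) => r.inverted)) return false;
    return matching.some((r) => !r.inverted);
}
```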
### Dynamic Policies with Placeholders
To create dynamic policies that are specific to the current user, you can use the `${user.id}` placeholder in the `conditions` object. This placeholder will be replaced with the ID of the current user at runtime.
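A minimal sketch of the substitution (the actual runtime logic may differ); only top-level `${user.id}` values are handled here:

```typescript
// Replace the ${user.id} placeholder with the current user's ID at runtime.
function resolvePlaceholders(
    conditions: Record<string, unknown>,
    user: { id: string }
): Record<string, unknown> {
    const resolved: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(conditions)) {
        resolved[key] = value === '${user.id}' ? user.id : value;
    }
    return resolved;
}
```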
## Special Permissions for User and Role Management
It is important to note that while `read` access to `users` and `roles` can be granted granularly, any actions that modify these resources (`create`, `update`, `delete`) are restricted to Super Admins.
A user must have the `{ "action": "manage", "subject": "all" }` permission (typically a Super Admin role) to manage users and roles. This is a security measure to prevent unauthorized changes to user accounts and permissions.
## Policy Examples
Here are several examples based on the default roles in the system, demonstrating how to combine actions, subjects, and conditions to achieve specific access control scenarios.
### Administrator
This policy grants a user full access to all resources using wildcards.
```json
[
{
"action": "manage",
"subject": "all"
}
]
```
### End-User
This policy allows a user to view the dashboard, create new ingestion sources, and fully manage the ingestion sources they own.
```json
[
{
"action": "read",
"subject": "dashboard"
},
{
"action": "create",
"subject": "ingestion"
},
{
"action": "manage",
"subject": "ingestion",
"conditions": {
"userId": "${user.id}"
}
},
{
"action": "manage",
"subject": "archive",
"conditions": {
"ingestionSource.userId": "${user.id}" // also grants access to archived emails owned by the user
}
}
]
```
### Global Read-Only Auditor
This policy grants read and search access across most of the application's resources, making it suitable for an auditor who needs to view data without modifying it.
```json
[
{
"action": ["read", "search"],
"subject": ["ingestion", "archive", "dashboard", "users", "roles"]
}
]
```
### Ingestion Admin
This policy grants full control over all ingestion sources and archives, but no other resources.
```json
[
{
"action": "manage",
"subject": "ingestion"
}
]
```
### Auditor for Specific Ingestion Sources
This policy demonstrates how to grant access to a specific list of ingestion sources using the `$in` operator.
```json
[
{
"action": ["read", "search"],
"subject": "ingestion",
"conditions": {
"id": {
"$in": ["INGESTION_ID_1", "INGESTION_ID_2"]
}
}
}
]
```
### Limit Access to a Specific Mailbox
This policy grants a user access to a specific ingestion source, but only allows them to see emails belonging to a single user within that source.
In combination with an ingestion-scoped rule such as the one in the previous example, this is achieved with a single additional `can` rule: it grants `read` and `search` access to the `archive` subject, but only where the `userEmail` matches.
```json
[
{
"action": ["read", "search"],
"subject": "archive",
"conditions": {
"userEmail": "user1@example.com"
}
}
]
```


@@ -0,0 +1,96 @@
# OCR Service
The OCR (Optical Character Recognition) and text extraction service is responsible for extracting plain text content from various file formats, such as PDFs, Office documents, and more. This is a crucial component for making email attachments searchable.
## Overview
The system employs a two-pronged approach for text extraction:
1. **Primary Extractor (Apache Tika)**: A powerful and versatile toolkit that can extract text from a wide variety of file formats. It is the recommended method for its superior performance and format support.
2. **Legacy Extractor**: A fallback mechanism that uses a combination of libraries (`pdf2json`, `mammoth`, `xlsx`) for common file types like PDF, DOCX, and XLSX. This is used when Apache Tika is not configured.
The main logic resides in `packages/backend/src/helpers/textExtractor.ts`, which decides which extraction method to use based on the application's configuration.
## Configuration
To enable the primary text extraction method, you must configure the URL of an Apache Tika server instance in your environment variables.
In your `.env` file, set the `TIKA_URL`:
```env
# .env.example
# Apache Tika Integration
# ONLY active if TIKA_URL is set
TIKA_URL=http://tika:9998
```
If `TIKA_URL` is not set, the system will automatically fall back to the legacy extraction methods. The service performs a health check on startup to verify connectivity with the Tika server.
## File Size Limits
To prevent excessive memory usage and processing time, the service imposes a general size limit on files submitted for text extraction. Files larger than the configured limit will be skipped.
- **With Apache Tika**: The maximum file size is **100MB**.
- **With Legacy Fallback**: The maximum file size is **50MB**.
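The gate looks roughly like this (a sketch; constant and function names are hypothetical, not the actual identifiers in `textExtractor.ts`):

```typescript
const TIKA_MAX_BYTES = 100 * 1024 * 1024; // 100MB when Tika is configured
const LEGACY_MAX_BYTES = 50 * 1024 * 1024; // 50MB for the legacy extractors

// Returns true if the file should be submitted for text extraction.
function withinSizeLimit(fileSizeBytes: number, tikaEnabled: boolean): boolean {
	const limit = tikaEnabled ? TIKA_MAX_BYTES : LEGACY_MAX_BYTES;
	return fileSizeBytes <= limit;
}
```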
## Supported File Formats
The service's ability to extract text depends on whether it's using Apache Tika or the legacy fallback methods.
### With Apache Tika
When `TIKA_URL` is configured, the service can process a vast range of file formats. Apache Tika is designed for broad compatibility and supports hundreds of file types, including but not limited to:
- Portable Document Format (PDF)
- Microsoft Office formats (DOC, DOCX, PPT, PPTX, XLS, XLSX)
- OpenDocument Formats (ODT, ODS, ODP)
- Rich Text Format (RTF)
- Plain Text (TXT, CSV, JSON, XML, HTML)
- Image formats with OCR capabilities (PNG, JPEG, TIFF)
- Archive formats (ZIP, TAR, GZ)
- Email formats (EML, MSG)
For a complete and up-to-date list, please refer to the official [Apache Tika documentation](https://tika.apache.org/3.2.3/formats.html).
### With Legacy Fallback
When Tika is not configured, text extraction is limited to the following formats:
- `application/pdf` (PDF)
- `application/vnd.openxmlformats-officedocument.wordprocessingml.document` (DOCX)
- `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` (XLSX)
- Plain text formats such as `text/*`, `application/json`, and `application/xml`.
## Features of the Tika Integration (`OcrService`)
The `OcrService` (`packages/backend/src/services/OcrService.ts`) provides several enhancements to make text extraction efficient and robust.
### Caching
To avoid redundant processing of the same file, the service implements a simple LRU (Least Recently Used) cache.
- **Cache Key**: A SHA-256 hash of the file's buffer is used as the cache key.
- **Functionality**: If a file with the same hash is processed again, the text content is served directly from the cache, saving significant processing time.
- **Statistics**: The service keeps track of cache hits, misses, and the hit rate for performance monitoring.
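The idea can be sketched with a `Map`, which preserves insertion order, keyed by the SHA-256 hash of the file buffer (a hypothetical sketch; the real `OcrService` cache may differ in detail):

```typescript
import { createHash } from 'node:crypto';

// Cache key: SHA-256 hash of the file's contents.
function cacheKey(buffer: Buffer): string {
	return createHash('sha256').update(buffer).digest('hex');
}

class LruCache<V> {
	private map = new Map<string, V>();
	constructor(private readonly maxSize: number) {}

	get(key: string): V | undefined {
		const value = this.map.get(key);
		if (value !== undefined) {
			// Re-insert to mark this entry as most recently used.
			this.map.delete(key);
			this.map.set(key, value);
		}
		return value;
	}

	set(key: string, value: V): void {
		if (this.map.has(key)) this.map.delete(key);
		this.map.set(key, value);
		if (this.map.size > this.maxSize) {
			// The first key in insertion order is the least recently used.
			const oldest = this.map.keys().next().value as string;
			this.map.delete(oldest);
		}
	}
}
```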
### Concurrency Management (Semaphore)
Extracting text from large files can be resource-intensive. To prevent the Tika server from being overwhelmed by multiple requests for the _same file_ simultaneously (e.g., during a large import), a semaphore mechanism is used.
- **Functionality**: If a request for a specific file (identified by its hash) is already in progress, any subsequent requests for the same file will wait for the first one to complete and then use its result.
- **Benefit**: This deduplicates parallel processing efforts and reduces unnecessary load on the Tika server.
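The pattern amounts to sharing a single in-flight Promise per content hash (a simplified sketch with hypothetical names):

```typescript
const inFlight = new Map<string, Promise<string>>();

// Ensures only one extraction runs per content hash; concurrent callers
// for the same hash await the same Promise.
async function extractOnce(hash: string, extract: () => Promise<string>): Promise<string> {
	const pending = inFlight.get(hash);
	if (pending) return pending;
	const task = extract().finally(() => inFlight.delete(hash));
	inFlight.set(hash, task);
	return task;
}
```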
### Health Check and DNS Fallback
- **Availability Check**: The service includes a `checkTikaAvailability` method to verify that the Tika server is reachable and operational. This check is performed on application startup.
- **DNS Fallback**: For convenience in Docker environments, if the Tika URL uses the hostname `tika` (e.g., `http://tika:9998`), the service will automatically attempt a fallback to `localhost` if the initial connection fails.
## Legacy Fallback Methods
When Tika is not available, the `extractTextLegacy` function in `textExtractor.ts` handles extraction for a limited set of MIME types:
- `application/pdf`: Processed using `pdf2json`. Includes a 50MB size limit and a 5-second timeout to prevent memory issues.
- `application/vnd.openxmlformats-officedocument.wordprocessingml.document` (DOCX): Processed using `mammoth`.
- `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` (XLSX): Processed using `xlsx`.
- Plain text formats (`text/*`, `application/json`, `application/xml`): Converted directly from the buffer.


@@ -30,7 +30,14 @@ archive.zip
2. Click the **Create New** button.
3. Select **EML Import** as the provider.
4. Enter a name for the ingestion source.
5. **Choose Import Method:**
* **Upload File:** Click **Choose File** and select the zip archive containing your EML files. (Best for smaller archives)
* **Local Path:** Enter the path to the zip file **inside the container**. (Best for large archives)
> **Note on Local Path:** When using Docker, the "Local Path" is relative to the container's filesystem.
> * **Recommended:** Place your zip file in a `temp` folder inside your configured storage directory (`STORAGE_LOCAL_ROOT_PATH`). This path is already mounted. For example, if your storage path is `/data`, put the file in `/data/temp/emails.zip` and enter `/data/temp/emails.zip` as the path.
> * **Alternative:** Mount a separate volume in `docker-compose.yml` (e.g., `- /host/path:/container/path`) and use the container path.
6. Click the **Submit** button.
OpenArchiver will then start importing the EML files from the zip archive. The ingestion process may take some time, depending on the size of the archive.


@@ -17,7 +17,13 @@ Once you have your `.mbox` file, you can upload it to OpenArchiver through the w
1. Navigate to the **Ingestion** page.
2. Click on the **New Ingestion** button.
3. Select **Mbox** as the source type.
4. **Choose Import Method:**
* **Upload File:** Upload your `.mbox` file.
* **Local Path:** Enter the path to the mbox file **inside the container**.
> **Note on Local Path:** When using Docker, the "Local Path" is relative to the container's filesystem.
> * **Recommended:** Place your mbox file in a `temp` folder inside your configured storage directory (`STORAGE_LOCAL_ROOT_PATH`). This path is already mounted. For example, if your storage path is `/data`, put the file in `/data/temp/emails.mbox` and enter `/data/temp/emails.mbox` as the path.
> * **Alternative:** Mount a separate volume in `docker-compose.yml` (e.g., `- /host/path:/container/path`) and use the container path.
## 3. Folder Structure


@@ -15,7 +15,14 @@ To ensure a successful import, you should prepare your PST file according to the
2. Click the **Create New** button.
3. Select **PST Import** as the provider.
4. Enter a name for the ingestion source.
5. **Choose Import Method:**
* **Upload File:** Click **Choose File** and select the PST file from your computer. (Best for smaller files)
* **Local Path:** Enter the path to the PST file **inside the container**. (Best for large files)
> **Note on Local Path:** When using Docker, the "Local Path" is relative to the container's filesystem.
> * **Recommended:** Place your file in a `temp` folder inside your configured storage directory (`STORAGE_LOCAL_ROOT_PATH`). This path is already mounted. For example, if your storage path is `/data`, put the file in `/data/temp/archive.pst` and enter `/data/temp/archive.pst` as the path.
> * **Alternative:** Mount a separate volume in `docker-compose.yml` (e.g., `- /host/path:/container/path`) and use the container path.
6. Click the **Submit** button.
OpenArchiver will then start importing the emails from the PST file. The ingestion process may take some time, depending on the size of the file.


@@ -17,7 +17,22 @@ git clone https://github.com/LogicLabs-OU/OpenArchiver.git
cd OpenArchiver
```
## 2. Create a Directory for Local Storage (Important)
Before configuring the application, you **must** create a directory on your host machine where Open Archiver will store its data (such as emails and attachments). Manually creating this directory helps prevent potential permission issues.
For example, you can use the path `/var/data/open-archiver`.
Run the following commands to create the directory and set the correct permissions:
```bash
sudo mkdir -p /var/data/open-archiver
sudo chown -R $(id -u):$(id -g) /var/data/open-archiver
```
This ensures the directory is owned by your current user, which is necessary for the application to have write access. You will set this path in your `.env` file in the next step.
## 3. Configure Your Environment
The application is configured using environment variables. You'll need to create a `.env` file to store your configuration.
@@ -29,9 +44,15 @@ cp .env.example.docker .env
Now, open the `.env` file in a text editor and customize the settings.
### Key Configuration Steps
1. **Set the Storage Path**: Find the `STORAGE_LOCAL_ROOT_PATH` variable and set it to the path you just created.
```env
STORAGE_LOCAL_ROOT_PATH=/var/data/open-archiver
```
2. **Secure Your Instance**: You must change the following placeholder values to secure your instance:
- `POSTGRES_PASSWORD`: A strong, unique password for the database.
- `REDIS_PASSWORD`: A strong, unique password for the Valkey/Redis service.
@@ -41,6 +62,10 @@ You must change the following placeholder values to secure your instance:
```bash
openssl rand -hex 32
```
- `STORAGE_ENCRYPTION_KEY`: **(Optional but Recommended)** A 32-byte hex string for encrypting emails and attachments at rest. If this key is not provided, storage encryption will be disabled. You can generate one with:
```bash
openssl rand -hex 32
```
### Storage Configuration
@@ -65,29 +90,34 @@ Here is a complete list of environment variables available for configuration:
#### Application Settings
| Variable | Description | Default Value |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------- |
| `NODE_ENV` | The application environment. | `development` |
| `PORT_BACKEND` | The port for the backend service. | `4000` |
| `PORT_FRONTEND` | The port for the frontend service. | `3000` |
| `APP_URL` | The public-facing URL of your application. This is used by the backend to configure CORS. | `http://localhost:3000` |
| `ORIGIN` | Used by the SvelteKit Node adapter to determine the server's public-facing URL. It should always be set to the value of `APP_URL` (e.g., `ORIGIN=$APP_URL`). | `http://localhost:3000` |
| `SYNC_FREQUENCY` | The frequency of continuous email syncing. See [cron syntax](https://crontab.guru/) for more details. | `* * * * *` |
| `ALL_INCLUSIVE_ARCHIVE` | Set to `true` to include all emails, including Junk and Trash folders, in the email archive. | `false` |
#### Docker Compose Service Configuration
These variables are used by `docker-compose.yml` to configure the services.
| Variable | Description | Default Value |
| ---------------------- | ---------------------------------------------------- | -------------------------------------------------------- |
| `POSTGRES_DB` | The name of the PostgreSQL database. | `open_archive` |
| `POSTGRES_USER` | The username for the PostgreSQL database. | `admin` |
| `POSTGRES_PASSWORD` | The password for the PostgreSQL database. | `password` |
| `DATABASE_URL` | The connection URL for the PostgreSQL database. | `postgresql://admin:password@postgres:5432/open_archive` |
| `MEILI_MASTER_KEY` | The master key for Meilisearch. | `aSampleMasterKey` |
| `MEILI_HOST` | The host for the Meilisearch service. | `http://meilisearch:7700` |
| `MEILI_INDEXING_BATCH` | The number of emails to batch together for indexing. | `500` |
| `REDIS_HOST` | The host for the Valkey (Redis) service. | `valkey` |
| `REDIS_PORT` | The port for the Valkey (Redis) service. | `6379` |
| `REDIS_USER` | Optional Redis username if ACLs are used. | |
| `REDIS_PASSWORD` | The password for the Valkey (Redis) service. | `defaultredispassword` |
| `REDIS_TLS_ENABLED` | Enable or disable TLS for Redis. | `false` |
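For instance, a `.env` fragment for a Valkey instance with ACLs enabled might look like this (hypothetical values):

```env
REDIS_HOST=valkey
REDIS_PORT=6379
REDIS_USER=archiver
REDIS_PASSWORD=a-strong-unique-password
REDIS_TLS_ENABLED=false
```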
#### Storage Settings
@@ -95,26 +125,34 @@ These variables are used by `docker-compose.yml` to configure the services.
| ------------------------------ | ----------------------------------------------------------------------------------------------------------- | ------------------------- |
| `STORAGE_TYPE` | The storage backend to use (`local` or `s3`). | `local` |
| `BODY_SIZE_LIMIT` | The maximum request body size for uploads. Can be a number in bytes or a string with a unit (e.g., `100M`). | `100M` |
| `STORAGE_LOCAL_ROOT_PATH` | The root path for Open Archiver app data. | `/var/data/open-archiver` |
| `STORAGE_S3_ENDPOINT` | The endpoint for S3-compatible storage (required if `STORAGE_TYPE` is `s3`). | |
| `STORAGE_S3_BUCKET` | The bucket name for S3-compatible storage (required if `STORAGE_TYPE` is `s3`). | |
| `STORAGE_S3_ACCESS_KEY_ID` | The access key ID for S3-compatible storage (required if `STORAGE_TYPE` is `s3`). | |
| `STORAGE_S3_SECRET_ACCESS_KEY` | The secret access key for S3-compatible storage (required if `STORAGE_TYPE` is `s3`). | |
| `STORAGE_S3_REGION` | The region for S3-compatible storage (required if `STORAGE_TYPE` is `s3`). | |
| `STORAGE_S3_FORCE_PATH_STYLE` | Force path-style addressing for S3 (optional). | `false` |
| `STORAGE_ENCRYPTION_KEY` | A 32-byte hex string for AES-256 encryption of files at rest. If not set, files will not be encrypted. | |
#### Security & Authentication
| Variable | Description | Default Value |
| -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------ |
| `ENABLE_DELETION` | Enable or disable deletion of emails and ingestion sources. If this option is not set, or is set to any value other than `true`, deletion will be disabled for the entire instance. | `false` |
| `JWT_SECRET` | A secret key for signing JWT tokens. | `a-very-secret-key-that-you-should-change` |
| `JWT_EXPIRES_IN` | The expiration time for JWT tokens. | `7d` |
| ~~`SUPER_API_KEY`~~ (Deprecated) | An API key with super admin privileges. (The SUPER_API_KEY is deprecated since v0.3.0 after we roll out the role-based access control system.) | |
| `RATE_LIMIT_WINDOW_MS` | The window in milliseconds for which API requests are checked. | `900000` (15 minutes) |
| `RATE_LIMIT_MAX_REQUESTS` | The maximum number of API requests allowed from an IP within the window. | `100` |
| `ENCRYPTION_KEY` | A 32-byte hex string for encrypting sensitive data in the database. | |
#### Apache Tika Integration
| Variable | Description | Default Value |
| ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------ |
| `TIKA_URL` | Optional. The URL of an Apache Tika server for advanced text extraction from attachments. If not set, the application falls back to built-in parsers for PDF, Word, and Excel files. | `http://tika:9998` |
## 4. Run the Application
Once you have configured your `.env` file, you can start all the services using Docker Compose:
@@ -134,7 +172,7 @@ You can check the status of the running containers with:
docker compose ps
```
## 5. Access the Application
Once the services are running, you can access the Open Archiver web interface by navigating to `http://localhost:3000` in your web browser.
@@ -142,7 +180,7 @@ Upon first visit, you will be redirected to the `/setup` page where you can set
If you are not redirected to the `/setup` page but instead see the login page, there might be something wrong with the database. Restart the service and try again.
## 6. Next Steps
After successfully deploying and logging into Open Archiver, the next step is to configure your ingestion sources to start archiving emails.
@@ -301,31 +339,3 @@ docker-compose up -d --force-recreate
```
After this, any new data will be saved directly into the `./data/open-archiver` folder in your project directory.
## Troubleshooting
### 403 Cross-Site POST Forbidden Error
If you are running the application behind a reverse proxy or have mapped the application to a different port (e.g., `3005:3000`), you may encounter a `403 Cross-site POST form submissions are forbidden` error when uploading files.
To resolve this, you must set the `ORIGIN` environment variable to the URL of your application. This ensures that the backend can verify the origin of requests and prevent cross-site request forgery (CSRF) attacks.
Add the following line to your `.env` file, replacing `<your_host>` and `<your_port>` with your specific values:
```bash
ORIGIN=http://<your_host>:<your_port>
```
For example, if your application is accessible at `http://localhost:3005`, you would set the variable as follows:
```bash
ORIGIN=http://localhost:3005
```
After adding the `ORIGIN` variable, restart your Docker containers for the changes to take effect:
```bash
docker-compose up -d --force-recreate
```
This will ensure that your file uploads are correctly authorized.


@@ -0,0 +1,37 @@
# Integrity Check
Open Archiver allows you to verify the integrity of your archived emails and their attachments. This guide explains how the integrity check works and what the results mean.
## How It Works
When an email is archived, Open Archiver calculates a unique cryptographic signature (a SHA256 hash) for the email's raw `.eml` file and for each of its attachments. These signatures are stored in the database alongside the email's metadata.
The integrity check feature recalculates these signatures for the stored files and compares them to the original signatures stored in the database. This process allows you to verify that the content of your archived emails has not been altered, corrupted, or tampered with since the moment they were archived.
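Conceptually, the check reduces to recomputing a SHA-256 digest and comparing it to the stored one (a sketch; the function names are illustrative, not the backend's actual API):

```typescript
import { createHash } from 'node:crypto';

function sha256Hex(data: Buffer): string {
	return createHash('sha256').update(data).digest('hex');
}

// Returns true if the stored file still matches its original signature.
function verifyIntegrity(storedHash: string, currentFile: Buffer): boolean {
	return sha256Hex(currentFile) === storedHash;
}
```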
## The Integrity Report
When you view an email in the Open Archiver interface, an integrity report is automatically generated and displayed. This report provides a clear, at-a-glance status for the email file and each of its attachments.
### Statuses
- **Valid (Green Badge):** A "Valid" status means that the current signature of the file matches the original signature stored in the database. This is the expected status and indicates that the file's integrity is intact.
- **Invalid (Red Badge):** An "Invalid" status means that the current signature of the file does _not_ match the original signature. This indicates that the file's content has changed since it was archived.
### Reasons for an "Invalid" Status
If a file is marked as "Invalid," you can hover over the badge to see a reason for the failure. Common reasons include:
- **Stored hash does not match current hash:** This is the most common reason and indicates that the file's content has been modified. This could be due to accidental changes, data corruption, or unauthorized tampering.
- **Could not read attachment file from storage:** This message indicates that the file could not be read from its storage location. This could be due to a storage system issue, a file permission problem, or because the file has been deleted.
## What to Do If an Integrity Check Fails
If you encounter an "Invalid" status for an email or attachment, it is important to investigate the issue. Here are some steps you can take:
1. **Check Storage:** Verify that the file exists in its storage location and that its permissions are correct.
2. **Review Audit Logs:** If you have audit logging enabled, review the logs for any unauthorized access or modifications to the file.
3. **Restore from Backup:** If you suspect data corruption, you may need to restore the affected file from a backup.
The integrity check feature is a crucial tool for ensuring the long-term reliability and trustworthiness of your email archive. By regularly monitoring the integrity of your archived data, you can be confident that your records are accurate and complete.


@@ -0,0 +1,75 @@
# Troubleshooting CORS Errors
Cross-Origin Resource Sharing (CORS) is a security feature that controls how a web application served from one origin can request and interact with resources from another. If not configured correctly, you may encounter errors when performing actions like uploading files.
This guide will help you diagnose and resolve common CORS-related issues.
## Symptoms
You may be experiencing a CORS issue if you see one of the following errors in your browser's developer console or in the application's logs:
- `TypeError: fetch failed`
- `Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource.`
- `Unexpected token 'C', "Cross-site"... is not valid JSON`
- A JSON error response similar to the following:
```json
{
"message": "CORS Error: This origin is not allowed.",
"requiredOrigin": "http://localhost:3000",
"receivedOrigin": "https://localhost:3000"
}
```
## Root Cause
These errors typically occur when the URL you are using to access the application in your browser does not exactly match the `APP_URL` configured in your `.env` file.
This can happen for several reasons:
- You are accessing the application via a different port.
- You are using a reverse proxy that changes the protocol (e.g., from `http` to `https`).
- The SvelteKit server, in a production build, is incorrectly guessing its public-facing URL.
## Solution
The solution is to ensure that the application's frontend and backend are correctly configured with the public-facing URL of your instance. This is done by setting two environment variables: `APP_URL` and `ORIGIN`.
1. **Open your `.env` file** in a text editor.
2. **Set `APP_URL`**: Define the `APP_URL` variable with the exact URL you use to access the application in your browser.
```env
APP_URL=http://your-domain-or-ip:3000
```
3. **Set `ORIGIN`**: The SvelteKit server requires a specific `ORIGIN` variable to correctly identify itself. This should always be set to the value of your `APP_URL`.
```env
ORIGIN=$APP_URL
```
By using `$APP_URL`, you ensure that both variables are always in sync.
### Example Configuration
If you are running the application locally on port `3000`, your configuration should look like this:
```env
APP_URL=http://localhost:3000
ORIGIN=$APP_URL
```
If your application is behind a reverse proxy and is accessible at `https://archive.mycompany.com`, your configuration should be:
```env
APP_URL=https://archive.mycompany.com
ORIGIN=$APP_URL
```
After making these changes to your `.env` file, you must restart the application for them to take effect:
```bash
docker compose up -d --force-recreate
```
This will ensure that the backend's CORS policy and the frontend server's origin are correctly aligned, resolving the errors.


@@ -4,9 +4,57 @@ Meilisearch, the search engine used by Open Archiver, requires a manual data mig
If an Open Archiver upgrade includes a major Meilisearch version change, you will need to migrate your search index by following the process below.
## Experimental: Dumpless Upgrade
> **Warning:** This feature is currently **experimental**. We do not recommend using it for production environments until it is marked as stable. Please use the [standard migration process](#standard-migration-process-recommended) instead. Proceed with caution.
Meilisearch recently introduced an experimental "dumpless" upgrade method. This allows you to migrate the database to a new Meilisearch version without manually creating and importing a dump. However, please note that **dumpless upgrades are not currently atomic**. If the process fails, your database may become corrupted, resulting in data loss.
**Prerequisite: Create a Snapshot**
Before attempting a dumpless upgrade, you **must** take a snapshot of your instance. This ensures you have a recovery point if the upgrade fails. Learn how to create snapshots in the [official Meilisearch documentation](https://www.meilisearch.com/docs/learn/data_backup/snapshots).
### How to Enable
To perform a dumpless upgrade, you need to configure your Meilisearch instance with the experimental flag. You can do this in one of two ways:
**Option 1: Using an Environment Variable**
Add the `MEILI_EXPERIMENTAL_DUMPLESS_UPGRADE` environment variable to your `docker-compose.yml` file for the Meilisearch service.
```yaml
services:
meilisearch:
image: getmeili/meilisearch:v1.x # The new version you want to upgrade to
environment:
- MEILI_MASTER_KEY=${MEILI_MASTER_KEY}
- MEILI_EXPERIMENTAL_DUMPLESS_UPGRADE=true
```
**Option 2: Using a CLI Option**
Alternatively, you can pass the `--experimental-dumpless-upgrade` flag in the command section of your `docker-compose.yml`.
```yaml
services:
meilisearch:
image: getmeili/meilisearch:v1.x # The new version you want to upgrade to
command: meilisearch --experimental-dumpless-upgrade
```
After updating your configuration, restart your container:
```bash
docker compose up -d
```
Meilisearch will attempt to migrate your database to the new version automatically.
---
## Standard Migration Process (Recommended)
For self-hosted instances using Docker Compose, the recommended migration process involves creating a data dump from your current Meilisearch instance, upgrading the Docker image, and then importing that dump into the new version.
### Step 1: Create a Dump


@@ -22,6 +22,7 @@ services:
- MEILI_HOST=http://meilisearch:7700
- REDIS_HOST=valkey
- REDIS_PORT=6379
- REDIS_USER=default
- REDIS_PASSWORD=${SERVICE_PASSWORD_VALKEY}
- REDIS_TLS_ENABLED=false
- STORAGE_TYPE=${STORAGE_TYPE:-local}
@@ -73,5 +74,6 @@ services:
image: getmeili/meilisearch:v1.15
environment:
- MEILI_MASTER_KEY=${SERVICE_PASSWORD_MEILISEARCH}
- MEILI_SCHEDULE_SNAPSHOT=86400
volumes:
- meilidata:/meili_data


@@ -1,17 +1,24 @@
{
"name": "open-archiver",
"version": "0.3.3",
"version": "0.4.2",
"private": true,
"license": "SEE LICENSE IN LICENSE file",
"scripts": {
"dev": "dotenv -- pnpm --filter \"./packages/*\" --parallel dev",
"build": "pnpm --filter \"./packages/*\" build",
"start": "dotenv -- pnpm --filter \"./packages/*\" --parallel start",
"build:oss": "pnpm --filter \"./packages/*\" --filter \"!./packages/enterprise\" --filter \"./apps/open-archiver\" build",
"build:enterprise": "cross-env VITE_ENTERPRISE_MODE=true pnpm build",
"start:oss": "dotenv -- concurrently \"node apps/open-archiver/dist/index.js\" \"pnpm --filter @open-archiver/frontend start\"",
"start:enterprise": "dotenv -- concurrently \"node apps/open-archiver-enterprise/dist/index.js\" \"pnpm --filter @open-archiver/frontend start\"",
"dev:enterprise": "cross-env VITE_ENTERPRISE_MODE=true dotenv -- pnpm --filter \"@open-archiver/*\" --filter \"open-archiver-enterprise-app\" --parallel dev",
"dev:oss": "dotenv -- pnpm --filter \"./packages/*\" --filter \"!./packages/@open-archiver/enterprise\" --filter \"open-archiver-app\" --parallel dev",
"build": "pnpm --filter \"./packages/*\" --filter \"./apps/*\" build",
"start": "dotenv -- pnpm --filter \"open-archiver-app\" --parallel start",
"start:workers": "dotenv -- concurrently \"pnpm --filter @open-archiver/backend start:ingestion-worker\" \"pnpm --filter @open-archiver/backend start:indexing-worker\" \"pnpm --filter @open-archiver/backend start:sync-scheduler\"",
"start:workers:dev": "dotenv -- concurrently \"pnpm --filter @open-archiver/backend start:ingestion-worker:dev\" \"pnpm --filter @open-archiver/backend start:indexing-worker:dev\" \"pnpm --filter @open-archiver/backend start:sync-scheduler:dev\"",
"db:generate": "dotenv -- pnpm --filter @open-archiver/backend db:generate",
"db:migrate": "dotenv -- pnpm --filter @open-archiver/backend db:migrate",
"db:migrate:dev": "dotenv -- pnpm --filter @open-archiver/backend db:migrate:dev",
"docker-start": "concurrently \"pnpm start:workers\" \"pnpm start\"",
"docker-start:oss": "concurrently \"pnpm start:workers\" \"pnpm start:oss\"",
"docker-start:enterprise": "concurrently \"pnpm start:workers\" \"pnpm start:enterprise\"",
"docs:dev": "vitepress dev docs --port 3009",
"docs:build": "vitepress build docs",
"docs:preview": "vitepress preview docs",
@@ -23,6 +30,7 @@
"dotenv-cli": "8.0.0"
},
"devDependencies": {
"cross-env": "^10.0.0",
"prettier": "^3.6.2",
"prettier-plugin-svelte": "^3.4.0",
"prettier-plugin-tailwindcss": "^0.6.14",


@@ -2,12 +2,13 @@
"name": "@open-archiver/backend",
"version": "0.1.0",
"private": true,
"license": "SEE LICENSE IN LICENSE file",
"main": "dist/index.js",
"types": "dist/index.d.ts",
"scripts": {
"dev": "ts-node-dev --respawn --transpile-only src/index.ts ",
"build": "tsc && pnpm copy-assets",
"dev": "tsc --watch",
"copy-assets": "cp -r src/locales dist/locales",
"start": "node dist/index.js",
"start:ingestion-worker": "node dist/workers/ingestion.worker.js",
"start:indexing-worker": "node dist/workers/indexing.worker.js",
"start:sync-scheduler": "node dist/jobs/schedulers/sync-scheduler.js",
@@ -31,6 +32,7 @@
"bcryptjs": "^3.0.2",
"bullmq": "^5.56.3",
"busboy": "^1.6.0",
"cors": "^2.8.5",
"cross-fetch": "^4.1.0",
"deepmerge-ts": "^7.1.5",
"dotenv": "^17.2.0",
@@ -58,16 +60,14 @@
"pst-extractor": "^1.11.0",
"reflect-metadata": "^0.2.2",
"sqlite3": "^5.1.7",
"tsconfig-paths": "^4.2.0",
"xlsx": "https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz",
"yauzl": "^3.2.0",
"zod": "^4.1.5"
},
"devDependencies": {
"@bull-board/api": "^6.11.0",
"@bull-board/express": "^6.11.0",
"@types/archiver": "^6.0.3",
"@types/busboy": "^1.5.4",
"@types/cors": "^2.8.19",
"@types/express": "^5.0.3",
"@types/mailparser": "^3.4.6",
"@types/microsoft-graph": "^2.40.1",
@@ -75,6 +75,7 @@
"@types/node": "^24.0.12",
"@types/yauzl": "^2.10.3",
"ts-node-dev": "^2.0.0",
"tsconfig-paths": "^4.2.0",
"typescript": "^5.8.3"
}
}


@@ -1,7 +1,7 @@
import { Request, Response } from 'express';
import { ApiKeyService } from '../../services/ApiKeyService';
import { z } from 'zod';
import { config } from '../../config';
import { UserService } from '../../services/UserService';
const generateApiKeySchema = z.object({
name: z
@@ -14,20 +14,27 @@ const generateApiKeySchema = z.object({
.positive('Only positive number is allowed')
.max(730, 'The API key must expire within 2 years / 730 days.'),
});
export class ApiKeyController {
public async generateApiKey(req: Request, res: Response) {
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
}
private userService = new UserService();
public generateApiKey = async (req: Request, res: Response) => {
try {
const { name, expiresInDays } = generateApiKeySchema.parse(req.body);
if (!req.user || !req.user.sub) {
return res.status(401).json({ message: 'Unauthorized' });
}
const userId = req.user.sub;
const actor = await this.userService.findById(userId);
if (!actor) {
return res.status(401).json({ message: 'Unauthorized' });
}
const key = await ApiKeyService.generate(userId, name, expiresInDays);
const key = await ApiKeyService.generate(
userId,
name,
expiresInDays,
actor,
req.ip || 'unknown'
);
res.status(201).json({ key });
} catch (error) {
@@ -38,9 +45,9 @@ export class ApiKeyController {
}
res.status(500).json({ message: req.t('errors.internalServerError') });
}
}
};
public async getApiKeys(req: Request, res: Response) {
public getApiKeys = async (req: Request, res: Response) => {
if (!req.user || !req.user.sub) {
return res.status(401).json({ message: 'Unauthorized' });
}
@@ -48,19 +55,20 @@ export class ApiKeyController {
const keys = await ApiKeyService.getKeys(userId);
res.status(200).json(keys);
}
};
public async deleteApiKey(req: Request, res: Response) {
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
}
public deleteApiKey = async (req: Request, res: Response) => {
const { id } = req.params;
if (!req.user || !req.user.sub) {
return res.status(401).json({ message: 'Unauthorized' });
}
const userId = req.user.sub;
await ApiKeyService.deleteKey(id, userId);
const actor = await this.userService.findById(userId);
if (!actor) {
return res.status(401).json({ message: 'Unauthorized' });
}
await ApiKeyService.deleteKey(id, userId, actor, req.ip || 'unknown');
res.status(204).send({ message: req.t('apiKeys.deleteSuccess') });
}
};
}


@@ -1,8 +1,10 @@
import { Request, Response } from 'express';
import { ArchivedEmailService } from '../../services/ArchivedEmailService';
import { config } from '../../config';
import { UserService } from '../../services/UserService';
import { checkDeletionEnabled } from '../../helpers/deletionGuard';
export class ArchivedEmailController {
private userService = new UserService();
public getArchivedEmails = async (req: Request, res: Response): Promise<Response> => {
try {
const { ingestionSourceId } = req.params;
@@ -35,8 +37,17 @@ export class ArchivedEmailController {
if (!userId) {
return res.status(401).json({ message: req.t('errors.unauthorized') });
}
const actor = await this.userService.findById(userId);
if (!actor) {
return res.status(401).json({ message: req.t('errors.unauthorized') });
}
const email = await ArchivedEmailService.getArchivedEmailById(id, userId);
const email = await ArchivedEmailService.getArchivedEmailById(
id,
userId,
actor,
req.ip || 'unknown'
);
if (!email) {
return res.status(404).json({ message: req.t('archivedEmail.notFound') });
}
@@ -48,12 +59,18 @@ export class ArchivedEmailController {
};
public deleteArchivedEmail = async (req: Request, res: Response): Promise<Response> => {
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
}
try {
checkDeletionEnabled();
const { id } = req.params;
await ArchivedEmailService.deleteArchivedEmail(id);
const userId = req.user?.sub;
if (!userId) {
return res.status(401).json({ message: req.t('errors.unauthorized') });
}
const actor = await this.userService.findById(userId);
if (!actor) {
return res.status(401).json({ message: req.t('errors.unauthorized') });
}
await ArchivedEmailService.deleteArchivedEmail(id, actor, req.ip || 'unknown');
return res.status(204).send();
} catch (error) {
console.error(`Delete archived email ${req.params.id} error:`, error);


@@ -44,7 +44,7 @@ export class AuthController {
{ email, password, first_name, last_name },
true
);
const result = await this.#authService.login(email, password);
const result = await this.#authService.login(email, password, req.ip || 'unknown');
return res.status(201).json(result);
} catch (error) {
console.error('Setup error:', error);
@@ -60,7 +60,7 @@ export class AuthController {
}
try {
const result = await this.#authService.login(email, password);
const result = await this.#authService.login(email, password, req.ip || 'unknown');
if (!result) {
return res.status(401).json({ message: req.t('auth.login.invalidCredentials') });


@@ -3,7 +3,6 @@ import { IamService } from '../../services/IamService';
import { PolicyValidator } from '../../iam-policy/policy-validator';
import type { CaslPolicy } from '@open-archiver/types';
import { logger } from '../../config/logger';
import { config } from '../../config';
export class IamController {
#iamService: IamService;
@@ -42,9 +41,6 @@ export class IamController {
};
public createRole = async (req: Request, res: Response) => {
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
}
const { name, policies } = req.body;
if (!name || !policies) {
@@ -69,9 +65,6 @@ export class IamController {
};
public deleteRole = async (req: Request, res: Response) => {
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
}
const { id } = req.params;
try {
@@ -83,9 +76,6 @@ export class IamController {
};
public updateRole = async (req: Request, res: Response) => {
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
}
const { id } = req.params;
const { name, policies } = req.body;


@@ -7,9 +7,11 @@ import {
SafeIngestionSource,
} from '@open-archiver/types';
import { logger } from '../../config/logger';
import { config } from '../../config';
import { UserService } from '../../services/UserService';
import { checkDeletionEnabled } from '../../helpers/deletionGuard';
export class IngestionController {
private userService = new UserService();
/**
* Converts an IngestionSource object to a safe version for client-side consumption
* by removing the credentials.
@@ -22,16 +24,22 @@ export class IngestionController {
}
public create = async (req: Request, res: Response): Promise<Response> => {
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
}
try {
const dto: CreateIngestionSourceDto = req.body;
const userId = req.user?.sub;
if (!userId) {
return res.status(401).json({ message: req.t('errors.unauthorized') });
}
const newSource = await IngestionService.create(dto, userId);
const actor = await this.userService.findById(userId);
if (!actor) {
return res.status(401).json({ message: req.t('errors.unauthorized') });
}
const newSource = await IngestionService.create(
dto,
userId,
actor,
req.ip || 'unknown'
);
const safeSource = this.toSafeIngestionSource(newSource);
return res.status(201).json(safeSource);
} catch (error: any) {
@@ -74,13 +82,23 @@ export class IngestionController {
};
public update = async (req: Request, res: Response): Promise<Response> => {
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
}
try {
const { id } = req.params;
const dto: UpdateIngestionSourceDto = req.body;
const updatedSource = await IngestionService.update(id, dto);
const userId = req.user?.sub;
if (!userId) {
return res.status(401).json({ message: req.t('errors.unauthorized') });
}
const actor = await this.userService.findById(userId);
if (!actor) {
return res.status(401).json({ message: req.t('errors.unauthorized') });
}
const updatedSource = await IngestionService.update(
id,
dto,
actor,
req.ip || 'unknown'
);
const safeSource = this.toSafeIngestionSource(updatedSource);
return res.status(200).json(safeSource);
} catch (error) {
@@ -93,26 +111,31 @@ export class IngestionController {
};
public delete = async (req: Request, res: Response): Promise<Response> => {
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
}
try {
checkDeletionEnabled();
const { id } = req.params;
await IngestionService.delete(id);
const userId = req.user?.sub;
if (!userId) {
return res.status(401).json({ message: req.t('errors.unauthorized') });
}
const actor = await this.userService.findById(userId);
if (!actor) {
return res.status(401).json({ message: req.t('errors.unauthorized') });
}
await IngestionService.delete(id, actor, req.ip || 'unknown');
return res.status(204).send();
} catch (error) {
console.error(`Delete ingestion source ${req.params.id} error:`, error);
if (error instanceof Error && error.message === 'Ingestion source not found') {
return res.status(404).json({ message: req.t('ingestion.notFound') });
} else if (error instanceof Error) {
return res.status(400).json({ message: error.message });
}
return res.status(500).json({ message: req.t('errors.internalServerError') });
}
};
public triggerInitialImport = async (req: Request, res: Response): Promise<Response> => {
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
}
try {
const { id } = req.params;
await IngestionService.triggerInitialImport(id);
@@ -127,12 +150,22 @@ export class IngestionController {
};
public pause = async (req: Request, res: Response): Promise<Response> => {
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
}
try {
const { id } = req.params;
const updatedSource = await IngestionService.update(id, { status: 'paused' });
const userId = req.user?.sub;
if (!userId) {
return res.status(401).json({ message: req.t('errors.unauthorized') });
}
const actor = await this.userService.findById(userId);
if (!actor) {
return res.status(401).json({ message: req.t('errors.unauthorized') });
}
const updatedSource = await IngestionService.update(
id,
{ status: 'paused' },
actor,
req.ip || 'unknown'
);
const safeSource = this.toSafeIngestionSource(updatedSource);
return res.status(200).json(safeSource);
} catch (error) {
@@ -145,12 +178,17 @@ export class IngestionController {
};
public triggerForceSync = async (req: Request, res: Response): Promise<Response> => {
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
}
try {
const { id } = req.params;
await IngestionService.triggerForceSync(id);
const userId = req.user?.sub;
if (!userId) {
return res.status(401).json({ message: req.t('errors.unauthorized') });
}
const actor = await this.userService.findById(userId);
if (!actor) {
return res.status(401).json({ message: req.t('errors.unauthorized') });
}
await IngestionService.triggerForceSync(id, actor, req.ip || 'unknown');
return res.status(202).json({ message: req.t('ingestion.forceSyncTriggered') });
} catch (error) {
console.error(`Trigger force sync for ${req.params.id} error:`, error);


@@ -0,0 +1,29 @@
import { Request, Response } from 'express';
import { IntegrityService } from '../../services/IntegrityService';
import { z } from 'zod';
const checkIntegritySchema = z.object({
id: z.string().uuid(),
});
export class IntegrityController {
private integrityService = new IntegrityService();
public checkIntegrity = async (req: Request, res: Response) => {
try {
const { id } = checkIntegritySchema.parse(req.params);
const results = await this.integrityService.checkEmailIntegrity(id);
res.status(200).json(results);
} catch (error) {
if (error instanceof z.ZodError) {
return res
.status(400)
.json({ message: req.t('api.requestBodyInvalid'), errors: error.message });
}
if (error instanceof Error && error.message === 'Archived email not found') {
return res.status(404).json({ message: req.t('errors.notFound') });
}
res.status(500).json({ message: req.t('errors.internalServerError') });
}
};
}


@@ -0,0 +1,42 @@
import { Request, Response } from 'express';
import { JobsService } from '../../services/JobsService';
import {
IGetQueueJobsRequestParams,
IGetQueueJobsRequestQuery,
JobStatus,
} from '@open-archiver/types';
export class JobsController {
private jobsService: JobsService;
constructor() {
this.jobsService = new JobsService();
}
public getQueues = async (req: Request, res: Response) => {
try {
const queues = await this.jobsService.getQueues();
res.status(200).json({ queues });
} catch (error) {
res.status(500).json({ message: 'Error fetching queues', error });
}
};
public getQueueJobs = async (req: Request, res: Response) => {
try {
const { queueName } = req.params as unknown as IGetQueueJobsRequestParams;
const { status, page, limit } = req.query as unknown as IGetQueueJobsRequestQuery;
const pageNumber = parseInt(page, 10) || 1;
const limitNumber = parseInt(limit, 10) || 10;
const queueDetails = await this.jobsService.getQueueDetails(
queueName,
status,
pageNumber,
limitNumber
);
res.status(200).json(queueDetails);
} catch (error) {
res.status(500).json({ message: 'Error fetching queue jobs', error });
}
};
}


@@ -31,7 +31,8 @@ export class SearchController {
limit: limit ? parseInt(limit as string) : 10,
matchingStrategy: matchingStrategy as MatchingStrategies,
},
userId
userId,
req.ip || 'unknown'
);
res.status(200).json(results);


@@ -1,8 +1,9 @@
import type { Request, Response } from 'express';
import { SettingsService } from '../../services/SettingsService';
import { config } from '../../config';
import { UserService } from '../../services/UserService';
const settingsService = new SettingsService();
const userService = new UserService();
export const getSystemSettings = async (req: Request, res: Response) => {
try {
@@ -17,10 +18,18 @@ export const getSystemSettings = async (req: Request, res: Response) => {
export const updateSystemSettings = async (req: Request, res: Response) => {
try {
// Basic validation can be performed here if necessary
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
if (!req.user || !req.user.sub) {
return res.status(401).json({ message: 'Unauthorized' });
}
const updatedSettings = await settingsService.updateSystemSettings(req.body);
const actor = await userService.findById(req.user.sub);
if (!actor) {
return res.status(401).json({ message: 'Unauthorized' });
}
const updatedSettings = await settingsService.updateSystemSettings(
req.body,
actor,
req.ip || 'unknown'
);
res.status(200).json(updatedSettings);
} catch (error) {
// A more specific error could be logged here


@@ -3,7 +3,6 @@ import { UserService } from '../../services/UserService';
import * as schema from '../../database/schema';
import { sql } from 'drizzle-orm';
import { db } from '../../database';
import { config } from '../../config';
const userService = new UserService();
@@ -21,27 +20,39 @@ export const getUser = async (req: Request, res: Response) => {
};
export const createUser = async (req: Request, res: Response) => {
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
}
const { email, first_name, last_name, password, roleId } = req.body;
if (!req.user || !req.user.sub) {
return res.status(401).json({ message: 'Unauthorized' });
}
const actor = await userService.findById(req.user.sub);
if (!actor) {
return res.status(401).json({ message: 'Unauthorized' });
}
const newUser = await userService.createUser(
{ email, first_name, last_name, password },
roleId
roleId,
actor,
req.ip || 'unknown'
);
res.status(201).json(newUser);
};
export const updateUser = async (req: Request, res: Response) => {
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
}
const { email, first_name, last_name, roleId } = req.body;
if (!req.user || !req.user.sub) {
return res.status(401).json({ message: 'Unauthorized' });
}
const actor = await userService.findById(req.user.sub);
if (!actor) {
return res.status(401).json({ message: 'Unauthorized' });
}
const updatedUser = await userService.updateUser(
req.params.id,
{ email, first_name, last_name },
roleId
roleId,
actor,
req.ip || 'unknown'
);
if (!updatedUser) {
return res.status(404).json({ message: req.t('user.notFound') });
@@ -50,9 +61,6 @@ export const updateUser = async (req: Request, res: Response) => {
};
export const deleteUser = async (req: Request, res: Response) => {
if (config.app.isDemo) {
return res.status(403).json({ message: req.t('errors.demoMode') });
}
const userCountResult = await db.select({ count: sql<number>`count(*)` }).from(schema.users);
const isOnlyUser = Number(userCountResult[0].count) === 1;
@@ -61,6 +69,70 @@ export const deleteUser = async (req: Request, res: Response) => {
message: req.t('user.cannotDeleteOnlyUser'),
});
}
await userService.deleteUser(req.params.id);
if (!req.user || !req.user.sub) {
return res.status(401).json({ message: 'Unauthorized' });
}
const actor = await userService.findById(req.user.sub);
if (!actor) {
return res.status(401).json({ message: 'Unauthorized' });
}
await userService.deleteUser(req.params.id, actor, req.ip || 'unknown');
res.status(204).send();
};
export const getProfile = async (req: Request, res: Response) => {
if (!req.user || !req.user.sub) {
return res.status(401).json({ message: 'Unauthorized' });
}
const user = await userService.findById(req.user.sub);
if (!user) {
return res.status(404).json({ message: req.t('user.notFound') });
}
res.json(user);
};
export const updateProfile = async (req: Request, res: Response) => {
const { email, first_name, last_name } = req.body;
if (!req.user || !req.user.sub) {
return res.status(401).json({ message: 'Unauthorized' });
}
const actor = await userService.findById(req.user.sub);
if (!actor) {
return res.status(401).json({ message: 'Unauthorized' });
}
const updatedUser = await userService.updateUser(
req.user.sub,
{ email, first_name, last_name },
undefined,
actor,
req.ip || 'unknown'
);
res.json(updatedUser);
};
export const updatePassword = async (req: Request, res: Response) => {
const { currentPassword, newPassword } = req.body;
if (!req.user || !req.user.sub) {
return res.status(401).json({ message: 'Unauthorized' });
}
const actor = await userService.findById(req.user.sub);
if (!actor) {
return res.status(401).json({ message: 'Unauthorized' });
}
try {
await userService.updatePassword(
req.user.sub,
currentPassword,
newPassword,
actor,
req.ip || 'unknown'
);
res.status(200).json({ message: 'Password updated successfully' });
} catch (e: any) {
if (e.message === 'Invalid current password') {
return res.status(400).json({ message: e.message });
}
throw e;
}
};


@@ -1,4 +1,4 @@
import rateLimit from 'express-rate-limit';
import { rateLimit, ipKeyGenerator } from 'express-rate-limit';
import { config } from '../../config';
const windowInMinutes = Math.ceil(config.api.rateLimit.windowMs / 60000);
@@ -6,6 +6,11 @@ const windowInMinutes = Math.ceil(config.api.rateLimit.windowMs / 60000);
export const rateLimiter = rateLimit({
windowMs: config.api.rateLimit.windowMs,
max: config.api.rateLimit.max,
keyGenerator: (req, res) => {
// Use the real IP address of the client, even if it's behind a proxy.
// `app.set('trust proxy', true)` in `server.ts`.
return ipKeyGenerator(req.ip || 'unknown');
},
message: {
status: 429,
message: `Too many requests from this IP, please try again after ${windowInMinutes} minutes`,


@@ -3,7 +3,7 @@ import { ApiKeyController } from '../controllers/api-key.controller';
import { requireAuth } from '../middleware/requireAuth';
import { AuthService } from '../../services/AuthService';
export const apiKeyRoutes = (authService: AuthService) => {
export const apiKeyRoutes = (authService: AuthService): Router => {
const router = Router();
const controller = new ApiKeyController();


@@ -0,0 +1,16 @@
import { Router } from 'express';
import { IntegrityController } from '../controllers/integrity.controller';
import { requireAuth } from '../middleware/requireAuth';
import { requirePermission } from '../middleware/requirePermission';
import { AuthService } from '../../services/AuthService';
export const integrityRoutes = (authService: AuthService): Router => {
const router = Router();
const controller = new IntegrityController();
router.use(requireAuth(authService));
router.get('/:id', requirePermission('read', 'archive'), controller.checkIntegrity);
return router;
};


@@ -0,0 +1,25 @@
import { Router } from 'express';
import { JobsController } from '../controllers/jobs.controller';
import { requireAuth } from '../middleware/requireAuth';
import { requirePermission } from '../middleware/requirePermission';
import { AuthService } from '../../services/AuthService';
export const createJobsRouter = (authService: AuthService): Router => {
const router = Router();
const jobsController = new JobsController();
router.use(requireAuth(authService));
router.get(
'/queues',
requirePermission('manage', 'all', 'user.requiresSuperAdminRole'),
jobsController.getQueues
);
router.get(
'/queues/:queueName',
requirePermission('manage', 'all', 'user.requiresSuperAdminRole'),
jobsController.getQueueJobs
);
return router;
};


@@ -11,6 +11,10 @@ export const createUserRouter = (authService: AuthService): Router => {
router.get('/', requirePermission('read', 'users'), userController.getUsers);
router.get('/profile', userController.getProfile);
router.patch('/profile', userController.updateProfile);
router.post('/profile/password', userController.updatePassword);
router.get('/:id', requirePermission('read', 'users'), userController.getUser);
/**


@@ -0,0 +1,170 @@
import express, { Express } from 'express';
import cors from 'cors';
import dotenv from 'dotenv';
import { AuthController } from './controllers/auth.controller';
import { IngestionController } from './controllers/ingestion.controller';
import { ArchivedEmailController } from './controllers/archived-email.controller';
import { StorageController } from './controllers/storage.controller';
import { SearchController } from './controllers/search.controller';
import { IamController } from './controllers/iam.controller';
import { createAuthRouter } from './routes/auth.routes';
import { createIamRouter } from './routes/iam.routes';
import { createIngestionRouter } from './routes/ingestion.routes';
import { createArchivedEmailRouter } from './routes/archived-email.routes';
import { createStorageRouter } from './routes/storage.routes';
import { createSearchRouter } from './routes/search.routes';
import { createDashboardRouter } from './routes/dashboard.routes';
import { createUploadRouter } from './routes/upload.routes';
import { createUserRouter } from './routes/user.routes';
import { createSettingsRouter } from './routes/settings.routes';
import { apiKeyRoutes } from './routes/api-key.routes';
import { integrityRoutes } from './routes/integrity.routes';
import { createJobsRouter } from './routes/jobs.routes';
import { AuthService } from '../services/AuthService';
import { AuditService } from '../services/AuditService';
import { UserService } from '../services/UserService';
import { IamService } from '../services/IamService';
import { StorageService } from '../services/StorageService';
import { SearchService } from '../services/SearchService';
import { SettingsService } from '../services/SettingsService';
import i18next from 'i18next';
import FsBackend from 'i18next-fs-backend';
import i18nextMiddleware from 'i18next-http-middleware';
import path from 'path';
import { logger } from '../config/logger';
import { rateLimiter } from './middleware/rateLimiter';
import { config } from '../config';
import { OpenArchiverFeature } from '@open-archiver/types';
// Define the "plugin" interface
export interface ArchiverModule {
initialize: (app: Express, authService: AuthService) => Promise<void>;
name: OpenArchiverFeature;
}
export let authService: AuthService;
export async function createServer(modules: ArchiverModule[] = []): Promise<Express> {
// Load environment variables
dotenv.config();
// --- Environment Variable Validation ---
const { JWT_SECRET, JWT_EXPIRES_IN } = process.env;
if (!JWT_SECRET || !JWT_EXPIRES_IN) {
throw new Error(
'Missing required environment variables for the backend: JWT_SECRET, JWT_EXPIRES_IN.'
);
}
// --- Dependency Injection Setup ---
const auditService = new AuditService();
const userService = new UserService();
authService = new AuthService(userService, auditService, JWT_SECRET, JWT_EXPIRES_IN);
const authController = new AuthController(authService, userService);
const ingestionController = new IngestionController();
const archivedEmailController = new ArchivedEmailController();
const storageService = new StorageService();
const storageController = new StorageController(storageService);
const searchService = new SearchService();
const searchController = new SearchController();
const iamService = new IamService();
const iamController = new IamController(iamService);
const settingsService = new SettingsService();
// --- i18next Initialization ---
const initializeI18next = async () => {
const systemSettings = await settingsService.getSystemSettings();
const defaultLanguage = systemSettings?.language || 'en';
logger.info({ language: defaultLanguage }, 'Default language');
await i18next.use(FsBackend).init({
lng: defaultLanguage,
fallbackLng: defaultLanguage,
ns: ['translation'],
defaultNS: 'translation',
backend: {
loadPath: path.resolve(__dirname, '../locales/{{lng}}/{{ns}}.json'),
},
});
};
// Initialize i18next
await initializeI18next();
logger.info({}, 'i18next initialized');
// Configure the Meilisearch index on startup
logger.info({}, 'Configuring email index...');
await searchService.configureEmailIndex();
const app = express();
// --- CORS ---
app.use(
cors({
origin: process.env.APP_URL || 'http://localhost:3000',
credentials: true,
})
);
// Trust the proxy to get the real IP address of the client.
// This is important for audit logging and security.
app.set('trust proxy', true);
// --- Routes ---
const authRouter = createAuthRouter(authController);
const ingestionRouter = createIngestionRouter(ingestionController, authService);
const archivedEmailRouter = createArchivedEmailRouter(archivedEmailController, authService);
const storageRouter = createStorageRouter(storageController, authService);
const searchRouter = createSearchRouter(searchController, authService);
const dashboardRouter = createDashboardRouter(authService);
const iamRouter = createIamRouter(iamController, authService);
const uploadRouter = createUploadRouter(authService);
const userRouter = createUserRouter(authService);
const settingsRouter = createSettingsRouter(authService);
const apiKeyRouter = apiKeyRoutes(authService);
const integrityRouter = integrityRoutes(authService);
const jobsRouter = createJobsRouter(authService);
// Middleware for all other routes
app.use((req, res, next) => {
// exclude certain API endpoints from the rate limiter, for example status, system settings
const excludedPatterns = [/^\/v\d+\/auth\/status$/, /^\/v\d+\/settings\/system$/];
for (const pattern of excludedPatterns) {
if (pattern.test(req.path)) {
return next();
}
}
rateLimiter(req, res, next);
});
app.use(express.json());
app.use(express.urlencoded({ extended: true }));
// i18n middleware
app.use(i18nextMiddleware.handle(i18next));
app.use(`/${config.api.version}/auth`, authRouter);
app.use(`/${config.api.version}/iam`, iamRouter);
app.use(`/${config.api.version}/upload`, uploadRouter);
app.use(`/${config.api.version}/ingestion-sources`, ingestionRouter);
app.use(`/${config.api.version}/archived-emails`, archivedEmailRouter);
app.use(`/${config.api.version}/storage`, storageRouter);
app.use(`/${config.api.version}/search`, searchRouter);
app.use(`/${config.api.version}/dashboard`, dashboardRouter);
app.use(`/${config.api.version}/users`, userRouter);
app.use(`/${config.api.version}/settings`, settingsRouter);
app.use(`/${config.api.version}/api-keys`, apiKeyRouter);
app.use(`/${config.api.version}/integrity`, integrityRouter);
app.use(`/${config.api.version}/jobs`, jobsRouter);
// Load all provided extension modules
for (const module of modules) {
await module.initialize(app, authService);
console.log(`🏢 Enterprise module loaded: ${module.name}`);
}
app.get('/', (req, res) => {
res.send('Backend is running!!');
});
console.log('✅ Core OSS modules loaded.');
return app;
}
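The rate-limiter exclusion above matches paths against version-agnostic regexes, so `/v1/auth/status` and a future `/v2/auth/status` both bypass the limiter. The path check in isolation (paths here are illustrative):

```typescript
// Sketch of the version-agnostic exclusion check used by the
// rate-limiter middleware above.
const excludedPatterns = [/^\/v\d+\/auth\/status$/, /^\/v\d+\/settings\/system$/];

function isExcluded(path: string): boolean {
    return excludedPatterns.some((pattern) => pattern.test(path));
}

console.log(isExcluded('/v1/auth/status')); // true
console.log(isExcluded('/v2/settings/system')); // true
console.log(isExcluded('/v1/auth/login')); // false
```

Because the patterns are anchored with `^` and `$`, a path like `/v1/auth/status/extra` still goes through the limiter.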


@@ -9,4 +9,5 @@ export const apiConfig = {
? parseInt(process.env.RATE_LIMIT_MAX_REQUESTS, 10)
: 100, // limit each IP to 100 requests per windowMs
},
version: 'v1',
};


@@ -4,6 +4,7 @@ export const app = {
nodeEnv: process.env.NODE_ENV || 'development',
port: process.env.PORT_BACKEND ? parseInt(process.env.PORT_BACKEND, 10) : 4000,
encryptionKey: process.env.ENCRYPTION_KEY,
isDemo: process.env.IS_DEMO === 'true',
syncFrequency: process.env.SYNC_FREQUENCY || '* * * * *', // defaults to every minute
enableDeletion: process.env.ENABLE_DELETION === 'true',
allInclusiveArchive: process.env.ALL_INCLUSIVE_ARCHIVE === 'true',
};


@@ -1,6 +1,6 @@
import { storage } from './storage';
import { app } from './app';
import { searchConfig } from './search';
import { searchConfig, meiliConfig } from './search';
import { connection as redisConfig } from './redis';
import { apiConfig } from './api';
@@ -8,6 +8,7 @@ export const config = {
storage,
app,
search: searchConfig,
meili: meiliConfig,
redis: redisConfig,
api: apiConfig,
};


@@ -1,15 +1,20 @@
import 'dotenv/config';
import { type ConnectionOptions } from 'bullmq';
/**
* @see https://github.com/taskforcesh/bullmq/blob/master/docs/gitbook/guide/connections.md
*/
const connectionOptions: any = {
const connectionOptions: ConnectionOptions = {
host: process.env.REDIS_HOST || 'localhost',
port: (process.env.REDIS_PORT && parseInt(process.env.REDIS_PORT, 10)) || 6379,
password: process.env.REDIS_PASSWORD,
enableReadyCheck: true,
};
if (process.env.REDIS_USER) {
connectionOptions.username = process.env.REDIS_USER;
}
if (process.env.REDIS_TLS_ENABLED === 'true') {
connectionOptions.tls = {
rejectUnauthorized: false,


@@ -4,3 +4,9 @@ export const searchConfig = {
host: process.env.MEILI_HOST || 'http://127.0.0.1:7700',
apiKey: process.env.MEILI_MASTER_KEY || '',
};
export const meiliConfig = {
indexingBatchSize: process.env.MEILI_INDEXING_BATCH
? parseInt(process.env.MEILI_INDEXING_BATCH, 10)
: 500,
};


@@ -2,9 +2,14 @@ import { StorageConfig } from '@open-archiver/types';
import 'dotenv/config';
const storageType = process.env.STORAGE_TYPE;
const encryptionKey = process.env.STORAGE_ENCRYPTION_KEY;
const openArchiverFolderName = 'open-archiver';
let storageConfig: StorageConfig;
if (encryptionKey && !/^[a-fA-F0-9]{64}$/.test(encryptionKey)) {
throw new Error('STORAGE_ENCRYPTION_KEY must be a 64-character hex string (32 bytes)');
}
if (storageType === 'local') {
if (!process.env.STORAGE_LOCAL_ROOT_PATH) {
throw new Error('STORAGE_LOCAL_ROOT_PATH is not defined in the environment variables');
@@ -13,6 +18,7 @@ if (storageType === 'local') {
type: 'local',
rootPath: process.env.STORAGE_LOCAL_ROOT_PATH,
openArchiverFolderName: openArchiverFolderName,
encryptionKey: encryptionKey,
};
} else if (storageType === 's3') {
if (
@@ -32,6 +38,7 @@ if (storageType === 'local') {
region: process.env.STORAGE_S3_REGION,
forcePathStyle: process.env.STORAGE_S3_FORCE_PATH_STYLE === 'true',
openArchiverFolderName: openArchiverFolderName,
encryptionKey: encryptionKey,
};
} else {
throw new Error(`Invalid STORAGE_TYPE: ${storageType}`);
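The `STORAGE_ENCRYPTION_KEY` validation above requires 32 random bytes encoded as exactly 64 hex characters. One way to generate a conforming key, assuming OpenSSL is installed:

```shell
# Produces 32 random bytes as 64 lowercase hex characters,
# suitable for STORAGE_ENCRYPTION_KEY.
openssl rand -hex 32
```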


@@ -1,4 +1,4 @@
import { drizzle } from 'drizzle-orm/postgres-js';
import { drizzle, PostgresJsDatabase } from 'drizzle-orm/postgres-js';
import postgres from 'postgres';
import 'dotenv/config';
@@ -12,3 +12,4 @@ if (!process.env.DATABASE_URL) {
const connectionString = encodeDatabaseUrl(process.env.DATABASE_URL);
const client = postgres(connectionString);
export const db = drizzle(client, { schema });
export type Database = PostgresJsDatabase<typeof schema>;


@@ -0,0 +1,9 @@
CREATE TYPE "public"."audit_log_action" AS ENUM('CREATE', 'READ', 'UPDATE', 'DELETE', 'LOGIN', 'LOGOUT', 'SETUP', 'IMPORT', 'PAUSE', 'SYNC', 'UPLOAD', 'SEARCH', 'DOWNLOAD', 'GENERATE');--> statement-breakpoint
CREATE TYPE "public"."audit_log_target_type" AS ENUM('ApiKey', 'ArchivedEmail', 'Dashboard', 'IngestionSource', 'Role', 'SystemSettings', 'User', 'File');--> statement-breakpoint
ALTER TABLE "audit_logs" ALTER COLUMN "target_type" SET DATA TYPE "public"."audit_log_target_type" USING "target_type"::"public"."audit_log_target_type";--> statement-breakpoint
ALTER TABLE "audit_logs" ADD COLUMN "previous_hash" varchar(64);--> statement-breakpoint
ALTER TABLE "audit_logs" ADD COLUMN "actor_ip" text;--> statement-breakpoint
ALTER TABLE "audit_logs" ADD COLUMN "action_type" "audit_log_action" NOT NULL;--> statement-breakpoint
ALTER TABLE "audit_logs" ADD COLUMN "current_hash" varchar(64) NOT NULL;--> statement-breakpoint
ALTER TABLE "audit_logs" DROP COLUMN "action";--> statement-breakpoint
ALTER TABLE "audit_logs" DROP COLUMN "is_tamper_evident";


@@ -0,0 +1,4 @@
ALTER TABLE "attachments" DROP CONSTRAINT "attachments_content_hash_sha256_unique";--> statement-breakpoint
ALTER TABLE "attachments" ADD COLUMN "ingestion_source_id" uuid;--> statement-breakpoint
ALTER TABLE "attachments" ADD CONSTRAINT "attachments_ingestion_source_id_ingestion_sources_id_fk" FOREIGN KEY ("ingestion_source_id") REFERENCES "public"."ingestion_sources"("id") ON DELETE cascade ON UPDATE no action;--> statement-breakpoint
CREATE UNIQUE INDEX "source_hash_unique" ON "attachments" USING btree ("ingestion_source_id","content_hash_sha256");


@@ -0,0 +1,2 @@
DROP INDEX "source_hash_unique";--> statement-breakpoint
CREATE INDEX "source_hash_idx" ON "attachments" USING btree ("ingestion_source_id","content_hash_sha256");

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -1,153 +1,174 @@
{
"version": "7",
"dialect": "postgresql",
"entries": [
{
"idx": 0,
"version": "7",
"when": 1752225352591,
"tag": "0000_amusing_namora",
"breakpoints": true
},
{
"idx": 1,
"version": "7",
"when": 1752326803882,
"tag": "0001_odd_night_thrasher",
"breakpoints": true
},
{
"idx": 2,
"version": "7",
"when": 1752332648392,
"tag": "0002_lethal_quentin_quire",
"breakpoints": true
},
{
"idx": 3,
"version": "7",
"when": 1752332967084,
"tag": "0003_petite_wrecker",
"breakpoints": true
},
{
"idx": 4,
"version": "7",
"when": 1752606108876,
"tag": "0004_sleepy_paper_doll",
"breakpoints": true
},
{
"idx": 5,
"version": "7",
"when": 1752606327253,
"tag": "0005_chunky_sue_storm",
"breakpoints": true
},
{
"idx": 6,
"version": "7",
"when": 1753112018514,
"tag": "0006_majestic_caretaker",
"breakpoints": true
},
{
"idx": 7,
"version": "7",
"when": 1753190159356,
"tag": "0007_handy_archangel",
"breakpoints": true
},
{
"idx": 8,
"version": "7",
"when": 1753370737317,
"tag": "0008_eminent_the_spike",
"breakpoints": true
},
{
"idx": 9,
"version": "7",
"when": 1754337938241,
"tag": "0009_late_lenny_balinger",
"breakpoints": true
},
{
"idx": 10,
"version": "7",
"when": 1754420780849,
"tag": "0010_perpetual_lightspeed",
"breakpoints": true
},
{
"idx": 11,
"version": "7",
"when": 1754422064158,
"tag": "0011_tan_blackheart",
"breakpoints": true
},
{
"idx": 12,
"version": "7",
"when": 1754476962901,
"tag": "0012_warm_the_stranger",
"breakpoints": true
},
{
"idx": 13,
"version": "7",
"when": 1754659373517,
"tag": "0013_classy_talkback",
"breakpoints": true
},
{
"idx": 14,
"version": "7",
"when": 1754831765718,
"tag": "0014_foamy_vapor",
"breakpoints": true
},
{
"idx": 15,
"version": "7",
"when": 1755443936046,
"tag": "0015_wakeful_norman_osborn",
"breakpoints": true
},
{
"idx": 16,
"version": "7",
"when": 1755780572342,
"tag": "0016_lonely_mariko_yashida",
"breakpoints": true
},
{
"idx": 17,
"version": "7",
"when": 1755961566627,
"tag": "0017_tranquil_shooting_star",
"breakpoints": true
},
{
"idx": 18,
"version": "7",
"when": 1756911118035,
"tag": "0018_flawless_owl",
"breakpoints": true
},
{
"idx": 19,
"version": "7",
"when": 1756937533843,
"tag": "0019_confused_scream",
"breakpoints": true
},
{
"idx": 20,
"version": "7",
"when": 1757860242528,
"tag": "0020_panoramic_wolverine",
"breakpoints": true
},
{
"idx": 21,
"version": "7",
"when": 1759412986134,
"tag": "0021_nosy_veda",
"breakpoints": true
},
{
"idx": 22,
"version": "7",
"when": 1759701622932,
"tag": "0022_complete_triton",
"breakpoints": true
},
{
"idx": 23,
"version": "7",
"when": 1760354094610,
"tag": "0023_swift_swordsman",
"breakpoints": true
}
]
}


@@ -7,3 +7,5 @@ export * from './schema/ingestion-sources';
export * from './schema/users';
export * from './schema/system-settings';
export * from './schema/api-keys';
export * from './schema/audit-logs';
export * from './schema/enums';


@@ -1,15 +1,23 @@
import { relations } from 'drizzle-orm';
import { pgTable, text, uuid, bigint, primaryKey } from 'drizzle-orm/pg-core';
import { pgTable, text, uuid, bigint, primaryKey, index } from 'drizzle-orm/pg-core';
import { archivedEmails } from './archived-emails';
import { ingestionSources } from './ingestion-sources';
export const attachments = pgTable('attachments', {
id: uuid('id').primaryKey().defaultRandom(),
filename: text('filename').notNull(),
mimeType: text('mime_type'),
sizeBytes: bigint('size_bytes', { mode: 'number' }).notNull(),
contentHashSha256: text('content_hash_sha256').notNull().unique(),
storagePath: text('storage_path').notNull(),
});
export const attachments = pgTable(
'attachments',
{
id: uuid('id').primaryKey().defaultRandom(),
filename: text('filename').notNull(),
mimeType: text('mime_type'),
sizeBytes: bigint('size_bytes', { mode: 'number' }).notNull(),
contentHashSha256: text('content_hash_sha256').notNull(),
storagePath: text('storage_path').notNull(),
ingestionSourceId: uuid('ingestion_source_id').references(() => ingestionSources.id, {
onDelete: 'cascade',
}),
},
(table) => [index('source_hash_idx').on(table.ingestionSourceId, table.contentHashSha256)]
);
export const emailAttachments = pgTable(
'email_attachments',


@@ -1,12 +1,34 @@
import { bigserial, boolean, jsonb, pgTable, text, timestamp } from 'drizzle-orm/pg-core';
import { bigserial, jsonb, pgTable, text, timestamp, varchar } from 'drizzle-orm/pg-core';
import { auditLogActionEnum, auditLogTargetTypeEnum } from './enums';
export const auditLogs = pgTable('audit_logs', {
// A unique, sequential, and gapless primary key for ordering.
id: bigserial('id', { mode: 'number' }).primaryKey(),
// The SHA-256 hash of the preceding log entry's `currentHash`.
previousHash: varchar('previous_hash', { length: 64 }),
// A high-precision, UTC timestamp of when the event occurred.
timestamp: timestamp('timestamp', { withTimezone: true }).notNull().defaultNow(),
// A stable identifier for the actor who performed the action.
actorIdentifier: text('actor_identifier').notNull(),
action: text('action').notNull(),
targetType: text('target_type'),
// The IP address from which the action was initiated.
actorIp: text('actor_ip'),
// A standardized, machine-readable identifier for the event.
actionType: auditLogActionEnum('action_type').notNull(),
// The type of resource that was affected by the action.
targetType: auditLogTargetTypeEnum('target_type'),
// The unique identifier of the affected resource.
targetId: text('target_id'),
// A JSON object containing specific, contextual details of the event.
details: jsonb('details'),
isTamperEvident: boolean('is_tamper_evident').default(false),
// The SHA-256 hash of this entire log entry's contents.
currentHash: varchar('current_hash', { length: 64 }).notNull(),
});
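The `previousHash`/`currentHash` columns form a hash chain: each entry's hash covers its own contents plus the hash of the entry before it, so any retroactive edit invalidates every later hash. A minimal sketch of the idea; the field selection and JSON serialization here are illustrative, not the actual canonical form used by the `AuditService`:

```typescript
import { createHash } from 'crypto';

interface AuditEntry {
    previousHash: string | null;
    actorIdentifier: string;
    actionType: string;
    details: unknown;
}

// Hash an entry together with its predecessor's hash (sketch only).
function computeCurrentHash(entry: AuditEntry): string {
    const payload = JSON.stringify({
        previousHash: entry.previousHash,
        actorIdentifier: entry.actorIdentifier,
        actionType: entry.actionType,
        details: entry.details,
    });
    return createHash('sha256').update(payload).digest('hex');
}

// Verifying the chain means recomputing every hash in order.
function verifyChain(entries: (AuditEntry & { currentHash: string })[]): boolean {
    let prev: string | null = null;
    for (const entry of entries) {
        if (entry.previousHash !== prev) return false;
        if (computeCurrentHash(entry) !== entry.currentHash) return false;
        prev = entry.currentHash;
    }
    return true;
}
```

The `varchar(64)` column width matches a hex-encoded SHA-256 digest; `previous_hash` is nullable because the first entry in the chain has no predecessor.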


@@ -0,0 +1,5 @@
import { pgEnum } from 'drizzle-orm/pg-core';
import { AuditLogActions, AuditLogTargetTypes } from '@open-archiver/types';
export const auditLogActionEnum = pgEnum('audit_log_action', AuditLogActions);
export const auditLogTargetTypeEnum = pgEnum('audit_log_target_type', AuditLogTargetTypes);


@@ -0,0 +1,9 @@
import { config } from '../config';
import i18next from 'i18next';
export function checkDeletionEnabled() {
if (!config.app.enableDeletion) {
const errorMessage = i18next.t('Deletion is disabled for this instance.');
throw new Error(errorMessage);
}
}


@@ -1,7 +1,10 @@
import PDFParser from 'pdf2json';
import mammoth from 'mammoth';
import xlsx from 'xlsx';
import { logger } from '../config/logger';
import { OcrService } from '../services/OcrService';
// Legacy PDF extraction (with improved memory management)
function extractTextFromPdf(buffer: Buffer): Promise<string> {
return new Promise((resolve) => {
const pdfParser = new PDFParser(null, true);
@@ -10,28 +13,57 @@ function extractTextFromPdf(buffer: Buffer): Promise<string> {
const finish = (text: string) => {
if (completed) return;
completed = true;
pdfParser.removeAllListeners();
// explicit cleanup
try {
pdfParser.removeAllListeners();
} catch (e) {
// Ignore cleanup errors
}
resolve(text);
};
pdfParser.on('pdfParser_dataError', () => finish(''));
pdfParser.on('pdfParser_dataReady', () => finish(pdfParser.getRawTextContent()));
pdfParser.on('pdfParser_dataError', (err: any) => {
logger.warn('PDF parsing error:', err?.parserError || 'Unknown error');
finish('');
});
pdfParser.on('pdfParser_dataReady', () => {
try {
const text = pdfParser.getRawTextContent();
finish(text || '');
} catch (err) {
logger.warn('Error getting PDF text content:', err);
finish('');
}
});
try {
pdfParser.parseBuffer(buffer);
} catch (err) {
console.error('Error parsing PDF buffer', err);
logger.error('Error parsing PDF buffer:', err);
finish('');
}
// Prevent hanging if the parser never emits events
setTimeout(() => finish(''), 10000);
// reduced Timeout for better performance
// setTimeout(() => {
// logger.warn('PDF parsing timed out');
// finish('');
// }, 5000);
});
}
export async function extractText(buffer: Buffer, mimeType: string): Promise<string> {
// Legacy text extraction for various formats
async function extractTextLegacy(buffer: Buffer, mimeType: string): Promise<string> {
try {
if (mimeType === 'application/pdf') {
// Check PDF size (memory protection)
if (buffer.length > 50 * 1024 * 1024) {
// 50MB Limit
logger.warn('PDF too large for legacy extraction, skipping');
return '';
}
return await extractTextFromPdf(buffer);
}
@@ -50,7 +82,7 @@ export async function extractText(buffer: Buffer, mimeType: string): Promise<str
const sheetText = xlsx.utils.sheet_to_txt(sheet);
fullText += sheetText + '\n';
}
return fullText;
return fullText.trim();
}
if (
@@ -60,11 +92,56 @@ export async function extractText(buffer: Buffer, mimeType: string): Promise<str
) {
return buffer.toString('utf-8');
}
return '';
} catch (error) {
console.error(`Error extracting text from attachment with MIME type ${mimeType}:`, error);
return ''; // Return empty string on failure
logger.error(`Error extracting text from attachment with MIME type ${mimeType}:`, error);
// Force garbage collection if available
if (global.gc) {
global.gc();
}
return '';
}
}
// Main extraction function
export async function extractText(buffer: Buffer, mimeType: string): Promise<string> {
// Input validation
if (!buffer || buffer.length === 0) {
return '';
}
console.warn(`Unsupported MIME type for text extraction: ${mimeType}`);
return ''; // Return empty string for unsupported types
if (!mimeType) {
logger.warn('No MIME type provided for text extraction');
return '';
}
// General size limit
const maxSize = process.env.TIKA_URL ? 100 * 1024 * 1024 : 50 * 1024 * 1024; // 100MB for Tika, 50MB for Legacy
if (buffer.length > maxSize) {
logger.warn(
`File too large for text extraction: ${buffer.length} bytes (limit: ${maxSize})`
);
return '';
}
// Decide between Tika and legacy
const tikaUrl = process.env.TIKA_URL;
if (tikaUrl) {
// Tika decides what it can parse
logger.debug(`Using Tika for text extraction: ${mimeType}`);
const ocrService = new OcrService();
try {
return await ocrService.extractTextWithTika(buffer, mimeType);
} catch (error) {
logger.error({ error }, 'OCR text extraction failed, returning empty string');
return '';
}
} else {
// extract using legacy mode
return await extractTextLegacy(buffer, mimeType);
}
}
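The size gate above selects a limit based on whether a Tika server is configured. Distilled to a pure function for clarity (names are illustrative, not exports of this module):

```typescript
// Sketch of the size-limit decision in extractText: 100 MB when a
// Tika server is configured, 50 MB for the legacy extractors.
function maxExtractionSize(tikaUrl: string | undefined): number {
    return tikaUrl ? 100 * 1024 * 1024 : 50 * 1024 * 1024;
}

function withinLimit(byteLength: number, tikaUrl: string | undefined): boolean {
    return byteLength <= maxExtractionSize(tikaUrl);
}
```

A 60 MB attachment is therefore extractable only when `TIKA_URL` is set; without it, the function returns an empty string rather than risking the legacy parsers' memory use.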


@@ -1,155 +1,10 @@
import express from 'express';
import dotenv from 'dotenv';
import { AuthController } from './api/controllers/auth.controller';
import { IngestionController } from './api/controllers/ingestion.controller';
import { ArchivedEmailController } from './api/controllers/archived-email.controller';
import { StorageController } from './api/controllers/storage.controller';
import { SearchController } from './api/controllers/search.controller';
import { IamController } from './api/controllers/iam.controller';
import { requireAuth } from './api/middleware/requireAuth';
import { createAuthRouter } from './api/routes/auth.routes';
import { createIamRouter } from './api/routes/iam.routes';
import { createIngestionRouter } from './api/routes/ingestion.routes';
import { createArchivedEmailRouter } from './api/routes/archived-email.routes';
import { createStorageRouter } from './api/routes/storage.routes';
import { createSearchRouter } from './api/routes/search.routes';
import { createDashboardRouter } from './api/routes/dashboard.routes';
import { createUploadRouter } from './api/routes/upload.routes';
import { createUserRouter } from './api/routes/user.routes';
import { createSettingsRouter } from './api/routes/settings.routes';
import { apiKeyRoutes } from './api/routes/api-key.routes';
import { AuthService } from './services/AuthService';
import { UserService } from './services/UserService';
import { IamService } from './services/IamService';
import { StorageService } from './services/StorageService';
import { SearchService } from './services/SearchService';
import { SettingsService } from './services/SettingsService';
import i18next from 'i18next';
import FsBackend from 'i18next-fs-backend';
import i18nextMiddleware from 'i18next-http-middleware';
import path from 'path';
import { logger } from './config/logger';
import { rateLimiter } from './api/middleware/rateLimiter';
// Load environment variables
dotenv.config();
// --- Environment Variable Validation ---
const { PORT_BACKEND, JWT_SECRET, JWT_EXPIRES_IN } = process.env;
if (!PORT_BACKEND || !JWT_SECRET || !JWT_EXPIRES_IN) {
throw new Error(
'Missing required environment variables for the backend: PORT_BACKEND, JWT_SECRET, JWT_EXPIRES_IN.'
);
}
// --- i18next Initialization ---
const initializeI18next = async () => {
const systemSettings = await settingsService.getSystemSettings();
const defaultLanguage = systemSettings?.language || 'en';
logger.info({ language: defaultLanguage }, 'Default language');
await i18next.use(FsBackend).init({
lng: defaultLanguage,
fallbackLng: defaultLanguage,
ns: ['translation'],
defaultNS: 'translation',
backend: {
loadPath: path.resolve(__dirname, './locales/{{lng}}/{{ns}}.json'),
},
});
};
// --- Dependency Injection Setup ---
const userService = new UserService();
const authService = new AuthService(userService, JWT_SECRET, JWT_EXPIRES_IN);
const authController = new AuthController(authService, userService);
const ingestionController = new IngestionController();
const archivedEmailController = new ArchivedEmailController();
const storageService = new StorageService();
const storageController = new StorageController(storageService);
const searchService = new SearchService();
const searchController = new SearchController();
const iamService = new IamService();
const iamController = new IamController(iamService);
const settingsService = new SettingsService();
// --- Express App Initialization ---
const app = express();
// --- Routes ---
const authRouter = createAuthRouter(authController);
const ingestionRouter = createIngestionRouter(ingestionController, authService);
const archivedEmailRouter = createArchivedEmailRouter(archivedEmailController, authService);
const storageRouter = createStorageRouter(storageController, authService);
const searchRouter = createSearchRouter(searchController, authService);
const dashboardRouter = createDashboardRouter(authService);
const iamRouter = createIamRouter(iamController, authService);
const uploadRouter = createUploadRouter(authService);
const userRouter = createUserRouter(authService);
const settingsRouter = createSettingsRouter(authService);
const apiKeyRouter = apiKeyRoutes(authService);
// upload route is added before middleware because it doesn't use the json middleware.
app.use('/v1/upload', uploadRouter);
// Middleware for all other routes
app.use((req, res, next) => {
// exclude certain API endpoints from the rate limiter, for example status, system settings
const excludedPatterns = [/^\/v\d+\/auth\/status$/, /^\/v\d+\/settings\/system$/];
for (const pattern of excludedPatterns) {
if (pattern.test(req.path)) {
return next();
}
}
rateLimiter(req, res, next);
});
app.use(express.json());
app.use(express.urlencoded({ extended: true }));
// i18n middleware
app.use(i18nextMiddleware.handle(i18next));
app.use('/v1/auth', authRouter);
app.use('/v1/iam', iamRouter);
app.use('/v1/ingestion-sources', ingestionRouter);
app.use('/v1/archived-emails', archivedEmailRouter);
app.use('/v1/storage', storageRouter);
app.use('/v1/search', searchRouter);
app.use('/v1/dashboard', dashboardRouter);
app.use('/v1/users', userRouter);
app.use('/v1/settings', settingsRouter);
app.use('/v1/api-keys', apiKeyRouter);
// Example of a protected route
app.get('/v1/protected', requireAuth(authService), (req, res) => {
res.json({
message: 'You have accessed a protected route!',
user: req.user, // The user payload is attached by the requireAuth middleware
});
});
app.get('/', (req, res) => {
res.send('Backend is running!');
});
// --- Server Start ---
const startServer = async () => {
try {
// Initialize i18next
await initializeI18next();
logger.info({}, 'i18next initialized');
// Configure the Meilisearch index on startup
logger.info({}, 'Configuring email index...');
await searchService.configureEmailIndex();
app.listen(PORT_BACKEND, () => {
logger.info({}, `Backend listening at http://localhost:${PORT_BACKEND}`);
});
} catch (error) {
logger.error({ error }, 'Failed to start the server:', error);
process.exit(1);
}
};
startServer();
export { createServer, ArchiverModule } from './api/server';
export { logger } from './config/logger';
export { config } from './config';
export * from './services/AuthService';
export * from './services/AuditService';
export * from './api/middleware/requireAuth';
export * from './api/middleware/requirePermission';
export { db } from './database';
export * as drizzleOrm from 'drizzle-orm';
export * from './database/schema';


@@ -3,14 +3,15 @@ import { IndexingService } from '../../services/IndexingService';
import { SearchService } from '../../services/SearchService';
import { StorageService } from '../../services/StorageService';
import { DatabaseService } from '../../services/DatabaseService';
import { PendingEmail } from '@open-archiver/types';
const searchService = new SearchService();
const storageService = new StorageService();
const databaseService = new DatabaseService();
const indexingService = new IndexingService(databaseService, searchService, storageService);
export default async function (job: Job<{ emailId: string }>) {
const { emailId } = job.data;
console.log(`Indexing email with ID: ${emailId}`);
await indexingService.indexEmailById(emailId);
export default async function (job: Job<{ emails: PendingEmail[] }>) {
const { emails } = job.data;
console.log(`Indexing email batch with ${emails.length} emails`);
await indexingService.indexEmailBatch(emails);
}
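This worker consumes whole batches; on the producer side (the mailbox processor further down), emails are accumulated into an array and flushed to the queue whenever `BATCH_SIZE` is reached, with a final flush for any remainder. The pattern in isolation, with the queue replaced by a plain callback (a sketch, not the actual processor code):

```typescript
// Generic batch-flush sketch of the producer pattern used by the
// mailbox processor (the real code enqueues batches to BullMQ).
async function processInBatches<T>(
    items: AsyncIterable<T> | Iterable<T>,
    batchSize: number,
    flush: (batch: T[]) => Promise<void>
): Promise<void> {
    let batch: T[] = [];
    for await (const item of items) {
        batch.push(item);
        if (batch.length >= batchSize) {
            await flush(batch);
            batch = [];
        }
    }
    // Final flush for a partial batch, mirroring the code after the loop.
    if (batch.length > 0) {
        await flush(batch);
    }
}
```

With `batchSize = 500` (the `MEILI_INDEXING_BATCH` default), 1,250 emails would produce two full batches and one final batch of 250.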


@@ -1,9 +1,19 @@
import { Job } from 'bullmq';
import { IProcessMailboxJob, SyncState, ProcessMailboxError } from '@open-archiver/types';
import {
IProcessMailboxJob,
SyncState,
ProcessMailboxError,
PendingEmail,
} from '@open-archiver/types';
import { IngestionService } from '../../services/IngestionService';
import { logger } from '../../config/logger';
import { EmailProviderFactory } from '../../services/EmailProviderFactory';
import { StorageService } from '../../services/StorageService';
import { IndexingService } from '../../services/IndexingService';
import { SearchService } from '../../services/SearchService';
import { DatabaseService } from '../../services/DatabaseService';
import { config } from '../../config';
import { indexingQueue } from '../queues';
/**
* This processor handles the ingestion of emails for a single user's mailbox.
@@ -15,9 +25,15 @@ import { StorageService } from '../../services/StorageService';
*/
export const processMailboxProcessor = async (job: Job<IProcessMailboxJob, SyncState, string>) => {
const { ingestionSourceId, userEmail } = job.data;
const BATCH_SIZE: number = config.meili.indexingBatchSize;
let emailBatch: PendingEmail[] = [];
logger.info({ ingestionSourceId, userEmail }, `Processing mailbox for user`);
const searchService = new SearchService();
const storageService = new StorageService();
const databaseService = new DatabaseService();
try {
const source = await IngestionService.findById(ingestionSourceId);
if (!source) {
@@ -26,22 +42,48 @@ export const processMailboxProcessor = async (job: Job<IProcessMailboxJob, SyncS
const connector = EmailProviderFactory.createConnector(source);
const ingestionService = new IngestionService();
const storageService = new StorageService();
// Pass the sync state for the entire source, the connector will handle per-user logic if necessary
for await (const email of connector.fetchEmails(userEmail, source.syncState)) {
// Create a callback to check for duplicates without fetching full email content
const checkDuplicate = async (messageId: string) => {
return await IngestionService.doesEmailExist(messageId, ingestionSourceId);
};
for await (const email of connector.fetchEmails(
userEmail,
source.syncState,
checkDuplicate
)) {
if (email) {
await ingestionService.processEmail(email, source, storageService, userEmail);
const processedEmail = await ingestionService.processEmail(
email,
source,
storageService,
userEmail
);
if (processedEmail) {
emailBatch.push(processedEmail);
if (emailBatch.length >= BATCH_SIZE) {
await indexingQueue.add('index-email-batch', { emails: emailBatch });
emailBatch = [];
}
}
}
}
if (emailBatch.length > 0) {
await indexingQueue.add('index-email-batch', { emails: emailBatch });
emailBatch = [];
}
const newSyncState = connector.getUpdatedSyncState(userEmail);
logger.info({ ingestionSourceId, userEmail }, `Finished processing mailbox for user`);
// Return the new sync state to be aggregated by the parent flow
return newSyncState;
} catch (error) {
if (emailBatch.length > 0) {
await indexingQueue.add('index-email-batch', { emails: emailBatch });
emailBatch = [];
}
logger.error({ err: error, ingestionSourceId, userEmail }, 'Error processing mailbox');
const errorMessage = error instanceof Error ? error.message : 'An unknown error occurred';
const processMailboxError: ProcessMailboxError = {


@@ -51,7 +51,7 @@ export default async (job: Job<ISyncCycleFinishedJob, any, string>) => {
const finalSyncState = deepmerge(
...successfulJobs.filter((s) => s && Object.keys(s).length > 0)
);
) as SyncState;
const source = await IngestionService.findById(ingestionSourceId);
let status: IngestionStatus = 'active';
@@ -63,7 +63,9 @@ export default async (job: Job<ISyncCycleFinishedJob, any, string>) => {
let message: string;
// Check for a specific rate-limit message from the successful jobs
const rateLimitMessage = successfulJobs.find((j) => j.statusMessage)?.statusMessage;
const rateLimitMessage = successfulJobs.find(
(j) => j.statusMessage && j.statusMessage.includes('rate limit')
)?.statusMessage;
if (failedJobs.length > 0) {
status = 'error';


@@ -8,6 +8,7 @@ const scheduleContinuousSync = async () => {
'schedule-continuous-sync',
{},
{
jobId: 'schedule-continuous-sync',
repeat: {
pattern: config.app.syncFrequency,
},


@@ -0,0 +1,69 @@
{
"auth": {
"setup": {
"allFieldsRequired": "Изискват се поща, парола и име",
"alreadyCompleted": "Настройката вече е завършена."
},
"login": {
"emailAndPasswordRequired": "Изискват се поща и парола",
"invalidCredentials": "Невалидни идентификационни данни"
}
},
"errors": {
"internalServerError": "Възникна вътрешна грешка в сървъра",
"demoMode": "Тази операция не е разрешена в демо режим",
"unauthorized": "Неоторизирано",
"unknown": "Възникна неизвестна грешка",
"noPermissionToAction": "Нямате разрешение да извършите текущото действие."
},
"user": {
"notFound": "Потребителят не е открит",
"cannotDeleteOnlyUser": "Опитвате се да изтриете единствения потребител в базата данни, това не е позволено.",
"requiresSuperAdminRole": "За управление на потребители е необходима роля на супер администратор."
},
"iam": {
"failedToGetRoles": "Неуспешно получаване на роли.",
"roleNotFound": "Ролята не е намерена.",
"failedToGetRole": "Неуспешно получаване на роля.",
"missingRoleFields": "Липсват задължителни полета: име и политика.",
"invalidPolicy": "Невалидно твърдение за политика:",
"failedToCreateRole": "Създаването на роля неуспешно.",
"failedToDeleteRole": "Изтриването на роля неуспешно.",
"missingUpdateFields": "Липсват полета за актуализиране: име или политики.",
"failedToUpdateRole": "Актуализирането на ролята неуспешно.",
"requiresSuperAdminRole": "За управление на роли е необходима роля на супер администратор."
},
"settings": {
"failedToRetrieve": "Неуспешно извличане на настройките",
"failedToUpdate": "Неуспешно актуализиране на настройките",
"noPermissionToUpdate": "Нямате разрешение да актуализирате системните настройки."
},
"dashboard": {
"permissionRequired": "Необходимо ви е разрешение за четене на таблото, за да видите данните от него."
},
"ingestion": {
"failedToCreate": "Създаването на източник за приемане не бе успешно поради грешка при свързване.",
"notFound": "Източникът за приемане не е намерен",
"initialImportTriggered": "Първоначалният импорт е задействан успешно.",
"forceSyncTriggered": "Принудителното синхронизиране е задействано успешно."
},
"archivedEmail": {
"notFound": "Архивираната поща не е намерена"
},
"search": {
"keywordsRequired": "Ключовите думи са задължителни"
},
"storage": {
"filePathRequired": "Пътят към файла е задължителен",
"invalidFilePath": "Невалиден път към файла",
"fileNotFound": "Файлът не е намерен",
"downloadError": "Грешка при изтегляне на файла"
},
"apiKeys": {
"generateSuccess": "API ключът е генериран успешно.",
"deleteSuccess": "API ключът е успешно изтрит."
},
"api": {
"requestBodyInvalid": "Невалидно съдържание на заявката."
}
}

View File

@@ -14,7 +14,8 @@
"demoMode": "Dieser Vorgang ist im Demo-Modus nicht zulässig.",
"unauthorized": "Unbefugt",
"unknown": "Ein unbekannter Fehler ist aufgetreten",
"noPermissionToAction": "Sie haben keine Berechtigung, die aktuelle Aktion auszuführen."
"noPermissionToAction": "Sie haben keine Berechtigung, die aktuelle Aktion auszuführen.",
"deletion_disabled": "Das Löschen ist für diese Instanz deaktiviert."
},
"user": {
"notFound": "Benutzer nicht gefunden",

View File

@@ -14,7 +14,8 @@
"demoMode": "This operation is not allowed in demo mode.",
"unauthorized": "Unauthorized",
"unknown": "An unknown error occurred",
"noPermissionToAction": "You don't have the permission to perform the current action."
"noPermissionToAction": "You don't have the permission to perform the current action.",
"deletion_disabled": "Deletion is disabled for this instance."
},
"user": {
"notFound": "User not found",

View File

@@ -3,28 +3,47 @@ import { db } from '../database';
import { apiKeys } from '../database/schema/api-keys';
import { CryptoService } from './CryptoService';
import { and, eq } from 'drizzle-orm';
import { ApiKey } from '@open-archiver/types';
import { ApiKey, User } from '@open-archiver/types';
import { AuditService } from './AuditService';
export class ApiKeyService {
private static auditService = new AuditService();
public static async generate(
userId: string,
name: string,
expiresInDays: number
expiresInDays: number,
actor: User,
actorIp: string
): Promise<string> {
const key = randomBytes(32).toString('hex');
const expiresAt = new Date();
expiresAt.setDate(expiresAt.getDate() + expiresInDays);
const keyHash = createHash('sha256').update(key).digest('hex');
await db.insert(apiKeys).values({
userId,
name,
key: CryptoService.encrypt(key),
keyHash,
expiresAt,
try {
await db.insert(apiKeys).values({
userId,
name,
key: CryptoService.encrypt(key),
keyHash,
expiresAt,
});
await this.auditService.createAuditLog({
actorIdentifier: actor.id,
actionType: 'GENERATE',
targetType: 'ApiKey',
targetId: name,
actorIp,
details: {
keyName: name,
},
});
return key;
} catch (error) {
throw error;
}
}
public static async getKeys(userId: string): Promise<ApiKey[]> {
@@ -46,8 +65,19 @@ export class ApiKeyService {
.filter((k): k is NonNullable<typeof k> => k !== null);
}
public static async deleteKey(id: string, userId: string) {
public static async deleteKey(id: string, userId: string, actor: User, actorIp: string) {
const [key] = await db.select().from(apiKeys).where(eq(apiKeys.id, id));
await db.delete(apiKeys).where(and(eq(apiKeys.id, id), eq(apiKeys.userId, userId)));
await this.auditService.createAuditLog({
actorIdentifier: actor.id,
actionType: 'DELETE',
targetType: 'ApiKey',
targetId: id,
actorIp,
details: {
keyName: key?.name,
},
});
}
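The generation path above stores both an encrypted copy of the key and its SHA-256 digest (`keyHash`), so an incoming key can be matched by hashing it rather than decrypting every row. A minimal sketch of that scheme, using an in-memory map as a stand-in for the `api_keys` table (the store and helper names are assumptions, not the service's real API):

```typescript
import { randomBytes, createHash } from 'crypto';

// In-memory stand-in for the api_keys table, keyed by SHA-256 hash.
const keysByHash = new Map<string, { name: string; expiresAt: Date }>();

function generateKey(name: string, expiresInDays: number): string {
    const key = randomBytes(32).toString('hex');
    const expiresAt = new Date();
    expiresAt.setDate(expiresAt.getDate() + expiresInDays);
    // Only the hash is needed for lookups; the plaintext is shown to the user once.
    const keyHash = createHash('sha256').update(key).digest('hex');
    keysByHash.set(keyHash, { name, expiresAt });
    return key;
}

function findKey(presentedKey: string) {
    // Hash the presented key and look it up; no decryption on the hot path.
    const hash = createHash('sha256').update(presentedKey).digest('hex');
    const record = keysByHash.get(hash);
    if (!record || record.expiresAt < new Date()) return null;
    return record;
}

const key = generateKey('ci-token', 30);
const match = findKey(key);
const miss = findKey('not-a-real-key');
```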
/**
*

View File

@@ -17,6 +17,9 @@ import type {
import { StorageService } from './StorageService';
import { SearchService } from './SearchService';
import type { Readable } from 'stream';
import { AuditService } from './AuditService';
import { User } from '@open-archiver/types';
import { checkDeletionEnabled } from '../helpers/deletionGuard';
interface DbRecipients {
to: { name: string; address: string }[];
@@ -34,6 +37,7 @@ async function streamToBuffer(stream: Readable): Promise<Buffer> {
}
export class ArchivedEmailService {
private static auditService = new AuditService();
private static mapRecipients(dbRecipients: unknown): Recipient[] {
const { to = [], cc = [], bcc = [] } = dbRecipients as DbRecipients;
@@ -98,7 +102,9 @@ export class ArchivedEmailService {
public static async getArchivedEmailById(
emailId: string,
userId: string
userId: string,
actor: User,
actorIp: string
): Promise<ArchivedEmail | null> {
const email = await db.query.archivedEmails.findFirst({
where: eq(archivedEmails.id, emailId),
@@ -118,6 +124,15 @@ export class ArchivedEmailService {
return null;
}
await this.auditService.createAuditLog({
actorIdentifier: actor.id,
actionType: 'READ',
targetType: 'ArchivedEmail',
targetId: emailId,
actorIp,
details: {},
});
let threadEmails: ThreadEmail[] = [];
if (email.threadId) {
@@ -179,7 +194,12 @@ export class ArchivedEmailService {
return mappedEmail;
}
public static async deleteArchivedEmail(emailId: string): Promise<void> {
public static async deleteArchivedEmail(
emailId: string,
actor: User,
actorIp: string
): Promise<void> {
checkDeletionEnabled();
const [email] = await db
.select()
.from(archivedEmails)
@@ -193,7 +213,7 @@ export class ArchivedEmailService {
// Load and handle attachments before deleting the email itself
if (email.hasAttachments) {
const emailAttachmentsResult = await db
const attachmentsForEmail = await db
.select({
attachmentId: attachments.id,
storagePath: attachments.storagePath,
@@ -203,37 +223,33 @@ export class ArchivedEmailService {
.where(eq(emailAttachments.emailId, emailId));
try {
for (const attachment of emailAttachmentsResult) {
const [refCount] = await db
.select({ count: count(emailAttachments.emailId) })
for (const attachment of attachmentsForEmail) {
// Delete the link between this email and the attachment record.
await db
.delete(emailAttachments)
.where(
and(
eq(emailAttachments.emailId, emailId),
eq(emailAttachments.attachmentId, attachment.attachmentId)
)
);
// Check if any other emails are linked to this attachment record.
const [recordRefCount] = await db
.select({ count: count() })
.from(emailAttachments)
.where(eq(emailAttachments.attachmentId, attachment.attachmentId));
if (refCount.count === 1) {
// If no other emails are linked to this record, it's safe to delete it and the file.
if (recordRefCount.count === 0) {
await storage.delete(attachment.storagePath);
await db
.delete(emailAttachments)
.where(
and(
eq(emailAttachments.emailId, emailId),
eq(emailAttachments.attachmentId, attachment.attachmentId)
)
);
await db
.delete(attachments)
.where(eq(attachments.id, attachment.attachmentId));
} else {
await db
.delete(emailAttachments)
.where(
and(
eq(emailAttachments.emailId, emailId),
eq(emailAttachments.attachmentId, attachment.attachmentId)
)
);
}
}
} catch {
} catch (error) {
console.error('Failed to delete email attachments', error);
throw new Error('Failed to delete email attachments');
}
}
@@ -245,5 +261,16 @@ export class ArchivedEmailService {
await searchService.deleteDocuments('emails', [emailId]);
await db.delete(archivedEmails).where(eq(archivedEmails.id, emailId));
await this.auditService.createAuditLog({
actorIdentifier: actor.id,
actionType: 'DELETE',
targetType: 'ArchivedEmail',
targetId: emailId,
actorIp,
details: {
reason: 'ManualDeletion',
},
});
}
}
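The rewritten attachment cleanup above changes the order of operations: it removes this email's link row first, then counts the remaining references, and only deletes the attachment record and stored file when the count reaches zero. A sketch of that reference-counting order, with in-memory sets standing in for the `email_attachments` join table and object storage (the set-based store is an illustration, not the real schema):

```typescript
// "emailId:attachmentId" link rows and stored file paths, as plain sets.
const links = new Set<string>();
const storedFiles = new Set<string>();

links.add('email-1:att-1');
links.add('email-2:att-1'); // att-1 is shared by two emails
storedFiles.add('/attachments/att-1');

function deleteEmailAttachment(emailId: string, attachmentId: string): void {
    // 1. Remove this email's link first...
    links.delete(`${emailId}:${attachmentId}`);
    // 2. ...then count the references that remain.
    const remaining = [...links].filter((l) => l.endsWith(`:${attachmentId}`)).length;
    // 3. Only delete the stored file once no email references it.
    if (remaining === 0) {
        storedFiles.delete(`/attachments/${attachmentId}`);
    }
}

deleteEmailAttachment('email-1', 'att-1');
const stillStored = storedFiles.has('/attachments/att-1'); // email-2 still links it
deleteEmailAttachment('email-2', 'att-1');
const goneAfterLastRef = !storedFiles.has('/attachments/att-1');
```

Deleting the link before counting avoids the off-by-one the old code worked around by comparing the count to 1 instead of 0.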

View File

@@ -0,0 +1,199 @@
import { db, Database } from '../database';
import * as schema from '../database/schema';
import {
AuditLogEntry,
CreateAuditLogEntry,
GetAuditLogsOptions,
GetAuditLogsResponse,
} from '@open-archiver/types';
import { desc, sql, asc, and, gte, lte, eq } from 'drizzle-orm';
import { createHash } from 'crypto';
export class AuditService {
private sanitizeObject(obj: any): any {
if (obj === null || typeof obj !== 'object') {
return obj;
}
if (Array.isArray(obj)) {
return obj.map((item) => this.sanitizeObject(item));
}
const sanitizedObj: { [key: string]: any } = {};
for (const key in obj) {
if (Object.prototype.hasOwnProperty.call(obj, key)) {
const value = obj[key];
sanitizedObj[key] = value === undefined ? null : this.sanitizeObject(value);
}
}
return sanitizedObj;
}
public async createAuditLog(entry: CreateAuditLogEntry) {
return db.transaction(async (tx) => {
// Lock the table to prevent race conditions
await tx.execute(sql`LOCK TABLE audit_logs IN EXCLUSIVE MODE`);
const sanitizedEntry = this.sanitizeObject(entry);
const previousHash = await this.getLatestHash(tx);
const newEntry = {
...sanitizedEntry,
previousHash,
timestamp: new Date(),
};
const currentHash = this.calculateHash(newEntry);
const finalEntry = {
...newEntry,
currentHash,
};
await tx.insert(schema.auditLogs).values(finalEntry);
return finalEntry;
});
}
private async getLatestHash(tx: Database): Promise<string | null> {
const [latest] = await tx
.select({
currentHash: schema.auditLogs.currentHash,
})
.from(schema.auditLogs)
.orderBy(desc(schema.auditLogs.id))
.limit(1);
return latest?.currentHash ?? null;
}
private calculateHash(entry: any): string {
// Create a canonical object for hashing to ensure consistency in property order and types.
const objectToHash = {
actorIdentifier: entry.actorIdentifier,
actorIp: entry.actorIp ?? null,
actionType: entry.actionType,
targetType: entry.targetType ?? null,
targetId: entry.targetId ?? null,
details: entry.details ?? null,
previousHash: entry.previousHash ?? null,
// Normalize timestamp to milliseconds since epoch to avoid precision issues.
timestamp: new Date(entry.timestamp).getTime(),
};
const data = this.canonicalStringify(objectToHash);
return createHash('sha256').update(data).digest('hex');
}
private canonicalStringify(obj: any): string {
if (obj === undefined) {
return 'null';
}
if (obj === null || typeof obj !== 'object') {
return JSON.stringify(obj);
}
if (Array.isArray(obj)) {
return `[${obj.map((item) => this.canonicalStringify(item)).join(',')}]`;
}
const keys = Object.keys(obj).sort();
const pairs = keys.map((key) => {
const value = obj[key];
return `${JSON.stringify(key)}:${this.canonicalStringify(value)}`;
});
return `{${pairs.join(',')}}`;
}
public async getAuditLogs(options: GetAuditLogsOptions = {}): Promise<GetAuditLogsResponse> {
const {
page = 1,
limit = 20,
startDate,
endDate,
actor,
actionType,
sort = 'desc',
} = options;
const whereClauses = [];
if (startDate) whereClauses.push(gte(schema.auditLogs.timestamp, startDate));
if (endDate) whereClauses.push(lte(schema.auditLogs.timestamp, endDate));
if (actor) whereClauses.push(eq(schema.auditLogs.actorIdentifier, actor));
if (actionType) whereClauses.push(eq(schema.auditLogs.actionType, actionType));
const where = and(...whereClauses);
const logs = await db.query.auditLogs.findMany({
where,
orderBy: [sort === 'asc' ? asc(schema.auditLogs.id) : desc(schema.auditLogs.id)],
limit,
offset: (page - 1) * limit,
});
const totalResult = await db
.select({
count: sql<number>`count(*)`,
})
.from(schema.auditLogs)
.where(where);
const total = totalResult[0].count;
return {
data: logs as AuditLogEntry[],
meta: {
total,
page,
limit,
},
};
}
public async verifyAuditLog(): Promise<{ ok: boolean; message: string; logId?: number }> {
const chunkSize = 1000;
let offset = 0;
let previousHash: string | null = null;
/**
* TODO: create job for audit log verification, generate audit report (new DB table)
*/
while (true) {
const logs = await db.query.auditLogs.findMany({
orderBy: [asc(schema.auditLogs.id)],
limit: chunkSize,
offset,
});
if (logs.length === 0) {
break;
}
for (const log of logs) {
if (log.previousHash !== previousHash) {
return {
ok: false,
message: 'Audit log chain is broken!',
logId: log.id,
};
}
const calculatedHash = this.calculateHash(log);
if (log.currentHash !== calculatedHash) {
return {
ok: false,
message: 'Audit log entry has been tampered with!',
logId: log.id,
};
}
previousHash = log.currentHash;
}
offset += chunkSize;
}
return {
ok: true,
message:
'Audit log integrity verified successfully. The logs have not been tampered with and the log chain is complete.',
};
}
}
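The hash chain above can be reduced to a small self-contained sketch: each entry's hash covers a canonical serialization of its content plus the previous entry's hash, so editing any entry invalidates every hash after it. The `Entry` shape and helper names below are simplified assumptions, not the service's real schema:

```typescript
import { createHash } from 'crypto';

interface Entry {
    actor: string;
    action: string;
    previousHash: string | null;
    currentHash?: string;
}

function hashEntry(e: Entry): string {
    // Fixed key order gives a canonical serialization (same idea as canonicalStringify).
    const canonical = JSON.stringify({
        action: e.action,
        actor: e.actor,
        previousHash: e.previousHash,
    });
    return createHash('sha256').update(canonical).digest('hex');
}

function append(chain: Entry[], actor: string, action: string): void {
    const previousHash = chain.length ? chain[chain.length - 1].currentHash! : null;
    const entry: Entry = { actor, action, previousHash };
    entry.currentHash = hashEntry(entry);
    chain.push(entry);
}

function verify(chain: Entry[]): boolean {
    let previousHash: string | null = null;
    for (const e of chain) {
        if (e.previousHash !== previousHash) return false; // chain broken
        if (e.currentHash !== hashEntry(e)) return false; // entry tampered
        previousHash = e.currentHash;
    }
    return true;
}

const chain: Entry[] = [];
append(chain, 'user-1', 'LOGIN');
append(chain, 'user-1', 'DELETE');
const okBefore = verify(chain);
chain[0].action = 'READ'; // tamper with the first entry
const okAfter = verify(chain);
```

This is also why `createAuditLog` locks the table inside a transaction: two concurrent writers reading the same latest hash would fork the chain.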

View File

@@ -2,17 +2,25 @@ import { compare } from 'bcryptjs';
import { SignJWT, jwtVerify } from 'jose';
import type { AuthTokenPayload, LoginResponse } from '@open-archiver/types';
import { UserService } from './UserService';
import { AuditService } from './AuditService';
import { db } from '../database';
import * as schema from '../database/schema';
import { eq } from 'drizzle-orm';
export class AuthService {
#userService: UserService;
#auditService: AuditService;
#jwtSecret: Uint8Array;
#jwtExpiresIn: string;
constructor(userService: UserService, jwtSecret: string, jwtExpiresIn: string) {
constructor(
userService: UserService,
auditService: AuditService,
jwtSecret: string,
jwtExpiresIn: string
) {
this.#userService = userService;
this.#auditService = auditService;
this.#jwtSecret = new TextEncoder().encode(jwtSecret);
this.#jwtExpiresIn = jwtExpiresIn;
}
@@ -33,16 +41,36 @@ export class AuthService {
.sign(this.#jwtSecret);
}
public async login(email: string, password: string): Promise<LoginResponse | null> {
public async login(email: string, password: string, ip: string): Promise<LoginResponse | null> {
const user = await this.#userService.findByEmail(email);
if (!user || !user.password) {
await this.#auditService.createAuditLog({
actorIdentifier: email,
actionType: 'LOGIN',
targetType: 'User',
targetId: email,
actorIp: ip,
details: {
error: 'UserNotFound',
},
});
return null; // User not found or password not set
}
const isPasswordValid = await this.verifyPassword(password, user.password);
if (!isPasswordValid) {
await this.#auditService.createAuditLog({
actorIdentifier: user.id,
actionType: 'LOGIN',
targetType: 'User',
targetId: user.id,
actorIp: ip,
details: {
error: 'InvalidPassword',
},
});
return null; // Invalid password
}
@@ -63,6 +91,15 @@ export class AuthService {
roles: roles,
});
await this.#auditService.createAuditLog({
actorIdentifier: user.id,
actionType: 'LOGIN',
targetType: 'User',
targetId: user.id,
actorIp: ip,
details: {},
});
return {
accessToken,
user: {

View File

@@ -22,7 +22,8 @@ export interface IEmailConnector {
testConnection(): Promise<boolean>;
fetchEmails(
userEmail: string,
syncState?: SyncState | null
syncState?: SyncState | null,
checkDuplicate?: (messageId: string) => Promise<boolean>
): AsyncGenerator<EmailObject | null>;
getUpdatedSyncState(userEmail?: string): SyncState;
listAllUsers(): AsyncGenerator<MailboxUser>;
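The new optional `checkDuplicate` parameter lets a connector test a cheap identifier (the Message-ID header) before paying for the full content download. A minimal sketch of how a connector might use it, with fake headers and a fake fetch in place of a real provider API:

```typescript
type EmailObject = { messageId: string; body: string };

// Pretend message headers listed cheaply from the provider.
const headers = [{ messageId: '<a@example.com>' }, { messageId: '<b@example.com>' }];

async function downloadFullMessage(messageId: string): Promise<EmailObject> {
    // Stand-in for the expensive full-content fetch.
    return { messageId, body: `full body of ${messageId}` };
}

async function* fetchEmails(
    checkDuplicate?: (messageId: string) => Promise<boolean>
): AsyncGenerator<EmailObject> {
    for (const h of headers) {
        // Cheap duplicate check on the Message-ID happens before the download.
        if (checkDuplicate && (await checkDuplicate(h.messageId))) continue;
        yield await downloadFullMessage(h.messageId);
    }
}

const archived = new Set(['<a@example.com>']);
const fetched: string[] = [];
for await (const email of fetchEmails(async (id) => archived.has(id))) {
    fetched.push(email.messageId);
}
```

Keeping the parameter optional means connectors that cannot list Message-IDs up front simply ignore it and keep their old behavior.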

View File

@@ -1,4 +1,10 @@
import { Attachment, EmailAddress, EmailDocument, EmailObject } from '@open-archiver/types';
import {
Attachment,
EmailAddress,
EmailDocument,
EmailObject,
PendingEmail,
} from '@open-archiver/types';
import { SearchService } from './SearchService';
import { StorageService } from './StorageService';
import { extractText } from '../helpers/textExtractor';
@@ -7,6 +13,7 @@ import { archivedEmails, attachments, emailAttachments } from '../database/schem
import { eq } from 'drizzle-orm';
import { streamToBuffer } from '../helpers/streamToBuffer';
import { simpleParser } from 'mailparser';
import { logger } from '../config/logger';
interface DbRecipients {
to: { name: string; address: string }[];
@@ -20,14 +27,44 @@ type AttachmentsType = {
mimeType: string;
}[];
/**
* Sanitizes text content by removing invalid characters that could cause JSON serialization issues
*/
function sanitizeText(text: string): string {
if (!text) return '';
// Remove control characters and invalid UTF-8 sequences
return text
.replace(/\uFFFD/g, '') // Replacement character for invalid UTF-8 sequences
.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, '') // Remove control characters
.trim();
}
/**
* Recursively sanitize all string values in an object to prevent JSON issues
*/
function sanitizeObject<T>(obj: T): T {
if (typeof obj === 'string') {
return sanitizeText(obj) as unknown as T;
} else if (Array.isArray(obj)) {
return obj.map(sanitizeObject) as unknown as T;
} else if (obj !== null && typeof obj === 'object') {
const sanitized: any = {};
for (const key in obj) {
if (Object.prototype.hasOwnProperty.call(obj, key)) {
sanitized[key] = sanitizeObject((obj as any)[key]);
}
}
return sanitized;
}
return obj;
}
export class IndexingService {
private dbService: DatabaseService;
private searchService: SearchService;
private storageService: StorageService;
/**
* Initializes the service with its dependencies.
*/
constructor(
dbService: DatabaseService,
searchService: SearchService,
@@ -39,9 +76,129 @@ export class IndexingService {
}
/**
* Fetches an email by its ID from the database, creates a search document, and indexes it.
* Index multiple emails in a single batch operation for better performance
*/
public async indexEmailById(emailId: string): Promise<void> {
public async indexEmailBatch(emails: PendingEmail[]): Promise<void> {
if (emails.length === 0) {
return;
}
logger.info({ batchSize: emails.length }, 'Starting batch indexing of emails');
try {
const CONCURRENCY_LIMIT = 10;
const rawDocuments: EmailDocument[] = [];
for (let i = 0; i < emails.length; i += CONCURRENCY_LIMIT) {
const batch = emails.slice(i, i + CONCURRENCY_LIMIT);
const batchDocuments = await Promise.allSettled(
batch.map(async (pendingEmail) => {
try {
const document = await this.indexEmailById(
pendingEmail.archivedEmailId
);
if (document) {
return document;
}
return null;
} catch (error) {
logger.error(
{
emailId: pendingEmail.archivedEmailId,
error: error instanceof Error ? error.message : String(error),
},
'Failed to create document for email in batch'
);
throw error;
}
})
);
for (const result of batchDocuments) {
if (result.status === 'fulfilled' && result.value) {
rawDocuments.push(result.value);
} else if (result.status === 'rejected') {
logger.error({ error: result.reason }, 'Failed to process email in batch');
} else {
logger.error(
{ result: result },
'Failed to process email in batch, reason unknown.'
);
}
}
}
if (rawDocuments.length === 0) {
logger.warn('No documents created from email batch');
return;
}
// Sanitize all documents
const sanitizedDocuments = rawDocuments.map((doc) => sanitizeObject(doc));
// Ensure all required fields are present
const completeDocuments = sanitizedDocuments.map((doc) =>
this.ensureEmailDocumentFields(doc)
);
// Validate each document and separate valid from invalid ones
const validDocuments: EmailDocument[] = [];
const invalidDocuments: { doc: any; reason: string }[] = [];
for (const doc of completeDocuments) {
if (this.isValidEmailDocument(doc)) {
validDocuments.push(doc);
} else {
invalidDocuments.push({ doc, reason: 'JSON.stringify failed' });
logger.warn({ document: doc }, 'Skipping invalid EmailDocument');
}
}
// Log detailed information for invalid documents
if (invalidDocuments.length > 0) {
for (const { doc } of invalidDocuments) {
logger.error(
{
emailId: doc.id,
document: JSON.stringify(doc, null, 2),
},
'Invalid EmailDocument details'
);
}
}
if (validDocuments.length === 0) {
logger.warn('No valid documents to index in batch.');
return;
}
logger.debug({ documentCount: validDocuments.length }, 'Sending batch to Meilisearch');
await this.searchService.addDocuments('emails', validDocuments, 'id');
logger.info(
{
batchSize: emails.length,
successfulDocuments: validDocuments.length,
failedDocuments: emails.length - validDocuments.length,
invalidDocuments: invalidDocuments.length,
},
'Successfully indexed email batch'
);
} catch (error) {
logger.error(
{
batchSize: emails.length,
error: error instanceof Error ? error.message : String(error),
},
'Failed to index email batch'
);
throw error;
}
}
private async indexEmailById(emailId: string): Promise<EmailDocument | null> {
const email = await this.dbService.db.query.archivedEmails.findFirst({
where: eq(archivedEmails.id, emailId),
});
@@ -71,20 +228,16 @@ export class IndexingService {
emailAttachmentsResult,
email.userEmail
);
await this.searchService.addDocuments('emails', [document], 'id');
return document;
}
/**
* Indexes an email object directly, creates a search document, and indexes it.
* @deprecated
*/
public async indexByEmail(
email: EmailObject,
ingestionSourceId: string,
archivedEmailId: string
): Promise<void> {
/* private async indexByEmail(pendingEmail: PendingEmail): Promise<void> {
const attachments: AttachmentsType = [];
if (email.attachments && email.attachments.length > 0) {
for (const attachment of email.attachments) {
if (pendingEmail.email.attachments && pendingEmail.email.attachments.length > 0) {
for (const attachment of pendingEmail.email.attachments) {
attachments.push({
buffer: attachment.content,
filename: attachment.filename,
@@ -93,19 +246,95 @@ export class IndexingService {
}
}
const document = await this.createEmailDocumentFromRaw(
email,
pendingEmail.email,
attachments,
ingestionSourceId,
archivedEmailId,
email.userEmail || ''
pendingEmail.sourceId,
pendingEmail.archivedId,
pendingEmail.email.userEmail || ''
);
// console.log(document);
await this.searchService.addDocuments('emails', [document], 'id');
}
} */
/**
* Creates a search document from a raw email object and its attachments.
*/
/* private async createEmailDocumentFromRawForBatch(
email: EmailObject,
ingestionSourceId: string,
archivedEmailId: string,
userEmail: string
): Promise<EmailDocument> {
const extractedAttachments: { filename: string; content: string }[] = [];
if (email.attachments && email.attachments.length > 0) {
const ATTACHMENT_CONCURRENCY = 3;
for (let i = 0; i < email.attachments.length; i += ATTACHMENT_CONCURRENCY) {
const attachmentBatch = email.attachments.slice(i, i + ATTACHMENT_CONCURRENCY);
const attachmentResults = await Promise.allSettled(
attachmentBatch.map(async (attachment) => {
try {
if (!this.shouldExtractText(attachment.contentType)) {
return null;
}
const textContent = await extractText(
attachment.content,
attachment.contentType || ''
);
return {
filename: attachment.filename,
content: textContent || '',
};
} catch (error) {
logger.warn(
{
filename: attachment.filename,
mimeType: attachment.contentType,
emailId: archivedEmailId,
error: error instanceof Error ? error.message : String(error),
},
'Failed to extract text from attachment'
);
return null;
}
})
);
for (const result of attachmentResults) {
if (result.status === 'fulfilled' && result.value) {
extractedAttachments.push(result.value);
}
}
}
}
const allAttachmentText = extractedAttachments
.map((att) => sanitizeText(att.content))
.join(' ');
const enhancedBody = [sanitizeText(email.body || email.html || ''), allAttachmentText]
.filter(Boolean)
.join('\n\n--- Attachments ---\n\n');
return {
id: archivedEmailId,
userEmail: userEmail,
from: email.from[0]?.address || '',
to: email.to?.map((addr: EmailAddress) => addr.address) || [],
cc: email.cc?.map((addr: EmailAddress) => addr.address) || [],
bcc: email.bcc?.map((addr: EmailAddress) => addr.address) || [],
subject: email.subject || '',
body: enhancedBody,
attachments: extractedAttachments,
timestamp: new Date(email.receivedAt).getTime(),
ingestionSourceId: ingestionSourceId,
};
} */
private async createEmailDocumentFromRaw(
email: EmailObject,
attachments: AttachmentsType,
@@ -126,7 +355,6 @@ export class IndexingService {
`Failed to extract text from attachment: ${attachment.filename}`,
error
);
// skip attachment or fail the job
}
}
// console.log('email.userEmail', userEmail);
@@ -145,9 +373,6 @@ export class IndexingService {
};
}
/**
* Creates a search document from a database email record and its attachments.
*/
private async createEmailDocument(
email: typeof archivedEmails.$inferSelect,
attachments: Attachment[],
@@ -181,9 +406,6 @@ export class IndexingService {
};
}
/**
* Extracts text content from a list of attachments.
*/
private async extractAttachmentContents(
attachments: Attachment[]
): Promise<{ filename: string; content: string }[]> {
@@ -202,9 +424,91 @@ export class IndexingService {
`Failed to extract text from attachment: ${attachment.filename}`,
error
);
// skip attachment or fail the job
}
}
return extractedAttachments;
}
private shouldExtractText(mimeType: string): boolean {
if (process.env.TIKA_URL) {
return true;
}
if (!mimeType) return false;
// Tika supported mime types: https://tika.apache.org/2.4.1/formats.html
const extractableTypes = [
'application/pdf',
'application/msword',
'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
'application/vnd.ms-excel',
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
'application/vnd.ms-powerpoint',
'application/vnd.openxmlformats-officedocument.presentationml.presentation',
'text/plain',
'text/html',
'application/rss+xml',
'application/xml',
'application/json',
'text/rtf',
'application/rtf',
'text/csv',
'text/tsv',
'application/csv',
'image/bpg',
'image/png',
'image/vnd.wap.wbmp',
'image/x-jbig2',
'image/bmp',
'image/x-xcf',
'image/gif',
'image/x-icon',
'image/jpeg',
'image/x-ms-bmp',
'image/webp',
'image/tiff',
'image/svg+xml',
'application/vnd.apple.pages',
'application/vnd.apple.numbers',
'application/vnd.apple.keynote',
'image/heic',
'image/heif',
];
return extractableTypes.some((type) => mimeType.toLowerCase().includes(type));
}
/**
* Ensures all required fields are present in EmailDocument
*/
private ensureEmailDocumentFields(doc: Partial<EmailDocument>): EmailDocument {
return {
id: doc.id || 'missing-id',
userEmail: doc.userEmail || 'unknown',
from: doc.from || '',
to: Array.isArray(doc.to) ? doc.to : [],
cc: Array.isArray(doc.cc) ? doc.cc : [],
bcc: Array.isArray(doc.bcc) ? doc.bcc : [],
subject: doc.subject || '',
body: doc.body || '',
attachments: Array.isArray(doc.attachments) ? doc.attachments : [],
timestamp: typeof doc.timestamp === 'number' ? doc.timestamp : Date.now(),
ingestionSourceId: doc.ingestionSourceId || 'unknown',
};
}
/**
* Validates if the given object is a valid EmailDocument that can be serialized to JSON
*/
private isValidEmailDocument(doc: any): boolean {
try {
JSON.stringify(doc);
return true;
} catch (error) {
logger.error(
{ doc, error: (error as Error).message },
'Invalid EmailDocument detected'
);
return false;
}
}
}
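The chunked-concurrency pattern `indexEmailBatch` uses can be isolated into a short sketch: process items `CONCURRENCY_LIMIT` at a time with `Promise.allSettled`, so a single failing item is logged and skipped rather than aborting the whole batch. The worker function below is a deliberate fake:

```typescript
const CONCURRENCY_LIMIT = 10;

async function processOne(id: number): Promise<number> {
    if (id === 13) throw new Error(`failed on ${id}`); // simulate one bad item
    return id * 2;
}

async function processBatch(ids: number[]): Promise<{ ok: number[]; failed: number }> {
    const ok: number[] = [];
    let failed = 0;
    for (let i = 0; i < ids.length; i += CONCURRENCY_LIMIT) {
        const chunk = ids.slice(i, i + CONCURRENCY_LIMIT);
        // allSettled never rejects, so one failure cannot take down the chunk.
        const results = await Promise.allSettled(chunk.map(processOne));
        for (const r of results) {
            if (r.status === 'fulfilled') ok.push(r.value);
            else failed += 1; // in the real service this is logged, not rethrown
        }
    }
    return { ok, failed };
}

const result = await processBatch(Array.from({ length: 25 }, (_, i) => i));
```

Capping the chunk size bounds memory and connection use; `Promise.all` would instead reject on the first failed email and lose the rest of the chunk.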

View File

@@ -6,6 +6,7 @@ import type {
IngestionSource,
IngestionCredentials,
IngestionProvider,
PendingEmail,
} from '@open-archiver/types';
import { and, desc, eq } from 'drizzle-orm';
import { CryptoService } from './CryptoService';
@@ -19,16 +20,17 @@ import {
attachments as attachmentsSchema,
emailAttachments,
} from '../database/schema';
import { createHash } from 'crypto';
import { createHash, randomUUID } from 'crypto';
import { logger } from '../config/logger';
import { IndexingService } from './IndexingService';
import { SearchService } from './SearchService';
import { DatabaseService } from './DatabaseService';
import { config } from '../config/index';
import { FilterBuilder } from './FilterBuilder';
import e from 'express';
import { AuditService } from './AuditService';
import { User } from '@open-archiver/types';
import { checkDeletionEnabled } from '../helpers/deletionGuard';
export class IngestionService {
private static auditService = new AuditService();
private static decryptSource(
source: typeof ingestionSources.$inferSelect
): IngestionSource | null {
@@ -53,7 +55,9 @@ export class IngestionService {
public static async create(
dto: CreateIngestionSourceDto,
userId: string
userId: string,
actor: User,
actorIp: string
): Promise<IngestionSource> {
const { providerConfig, ...rest } = dto;
const encryptedCredentials = CryptoService.encryptObject(providerConfig);
@@ -67,9 +71,21 @@ export class IngestionService {
const [newSource] = await db.insert(ingestionSources).values(valuesToInsert).returning();
await this.auditService.createAuditLog({
actorIdentifier: actor.id,
actionType: 'CREATE',
targetType: 'IngestionSource',
targetId: newSource.id,
actorIp,
details: {
sourceName: newSource.name,
sourceType: newSource.provider,
},
});
const decryptedSource = this.decryptSource(newSource);
if (!decryptedSource) {
await this.delete(newSource.id);
await this.delete(newSource.id, actor, actorIp, true);
throw new Error(
'Failed to process newly created ingestion source due to a decryption error.'
);
@@ -80,13 +96,18 @@ export class IngestionService {
const connectionValid = await connector.testConnection();
// If connection succeeds, update status to auth_success, which triggers the initial import.
if (connectionValid) {
return await this.update(decryptedSource.id, { status: 'auth_success' });
return await this.update(
decryptedSource.id,
{ status: 'auth_success' },
actor,
actorIp
);
} else {
throw Error('Ingestion authentication failed.')
throw Error('Ingestion authentication failed.');
}
} catch (error) {
// If connection fails, delete the newly created source and throw the error.
await this.delete(decryptedSource.id);
await this.delete(decryptedSource.id, actor, actorIp, true);
throw error;
}
}
@@ -123,7 +144,9 @@ export class IngestionService {
public static async update(
id: string,
dto: UpdateIngestionSourceDto
dto: UpdateIngestionSourceDto,
actor?: User,
actorIp?: string
): Promise<IngestionSource> {
const { providerConfig, ...rest } = dto;
const valuesToUpdate: Partial<typeof ingestionSources.$inferInsert> = { ...rest };
@@ -158,11 +181,39 @@ export class IngestionService {
if (originalSource.status !== 'auth_success' && decryptedSource.status === 'auth_success') {
await this.triggerInitialImport(decryptedSource.id);
}
if (actor && actorIp) {
const changedFields = Object.keys(dto).filter(
(key) =>
key !== 'providerConfig' &&
originalSource[key as keyof IngestionSource] !==
decryptedSource[key as keyof IngestionSource]
);
if (changedFields.length > 0) {
await this.auditService.createAuditLog({
actorIdentifier: actor.id,
actionType: 'UPDATE',
targetType: 'IngestionSource',
targetId: id,
actorIp,
details: {
changedFields,
},
});
}
}
return decryptedSource;
}
public static async delete(id: string): Promise<IngestionSource> {
public static async delete(
id: string,
actor: User,
actorIp: string,
force: boolean = false
): Promise<IngestionSource> {
if (!force) {
checkDeletionEnabled();
}
const source = await this.findById(id);
if (!source) {
throw new Error('Ingestion source not found');
@@ -175,7 +226,8 @@ export class IngestionService {
if (
(source.credentials.type === 'pst_import' ||
source.credentials.type === 'eml_import') &&
source.credentials.type === 'eml_import' ||
source.credentials.type === 'mbox_import') &&
source.credentials.uploadedFilePath &&
(await storage.exists(source.credentials.uploadedFilePath))
) {
@@ -195,6 +247,17 @@ export class IngestionService {
.where(eq(ingestionSources.id, id))
.returning();
await this.auditService.createAuditLog({
actorIdentifier: actor.id,
actionType: 'DELETE',
targetType: 'IngestionSource',
targetId: id,
actorIp,
details: {
sourceName: deletedSource.name,
},
});
const decryptedSource = this.decryptSource(deletedSource);
if (!decryptedSource) {
// Even if decryption fails, we should confirm deletion.
@@ -215,7 +278,7 @@ export class IngestionService {
await ingestionQueue.add('initial-import', { ingestionSourceId: source.id });
}
public static async triggerForceSync(id: string): Promise<void> {
public static async triggerForceSync(id: string, actor: User, actorIp: string): Promise<void> {
const source = await this.findById(id);
logger.info({ ingestionSourceId: id }, 'Force syncing started.');
if (!source) {
@@ -240,15 +303,35 @@ export class IngestionService {
}
// Reset status to 'active'
await this.update(id, {
status: 'active',
lastSyncStatusMessage: 'Force sync triggered by user.',
await this.update(
id,
{
status: 'active',
lastSyncStatusMessage: 'Force sync triggered by user.',
},
actor,
actorIp
);
await this.auditService.createAuditLog({
actorIdentifier: actor.id,
actionType: 'SYNC',
targetType: 'IngestionSource',
targetId: id,
actorIp,
details: {
sourceName: source.name,
},
});
await ingestionQueue.add('continuous-sync', { ingestionSourceId: source.id });
}
public async performBulkImport(job: IInitialImportJob): Promise<void> {
public static async performBulkImport(
job: IInitialImportJob,
actor: User,
actorIp: string
): Promise<void> {
const { ingestionSourceId } = job;
const source = await IngestionService.findById(ingestionSourceId);
if (!source) {
@@ -256,10 +339,15 @@ export class IngestionService {
}
logger.info(`Starting bulk import for source: ${source.name} (${source.id})`);
await IngestionService.update(ingestionSourceId, {
status: 'importing',
lastSyncStartedAt: new Date(),
});
await IngestionService.update(
ingestionSourceId,
{
status: 'importing',
lastSyncStartedAt: new Date(),
},
actor,
actorIp
);
const connector = EmailProviderFactory.createConnector(source);
@@ -287,22 +375,46 @@ export class IngestionService {
}
} catch (error) {
logger.error(`Bulk import failed for source: ${source.name} (${source.id})`, error);
await IngestionService.update(ingestionSourceId, {
status: 'error',
lastSyncFinishedAt: new Date(),
lastSyncStatusMessage:
error instanceof Error ? error.message : 'An unknown error occurred.',
});
await IngestionService.update(
ingestionSourceId,
{
status: 'error',
lastSyncFinishedAt: new Date(),
lastSyncStatusMessage:
error instanceof Error ? error.message : 'An unknown error occurred.',
},
actor,
actorIp
);
throw error; // Re-throw to allow BullMQ to handle the job failure
}
}
/**
* Quickly checks if an email exists in the database by its Message-ID header.
* This is used to skip downloading duplicate emails during ingestion.
*/
public static async doesEmailExist(
messageId: string,
ingestionSourceId: string
): Promise<boolean> {
const existingEmail = await db.query.archivedEmails.findFirst({
where: and(
eq(archivedEmails.messageIdHeader, messageId),
eq(archivedEmails.ingestionSourceId, ingestionSourceId)
),
columns: { id: true },
});
return !!existingEmail;
}
public async processEmail(
email: EmailObject,
source: IngestionSource,
storage: StorageService,
userEmail: string
): Promise<void> {
): Promise<PendingEmail | null> {
try {
// Generate a unique message ID for the email. If the email already has a message-id header, use that.
// Otherwise, generate a new one based on the email's hash, source ID, and email ID.
@@ -331,7 +443,7 @@ export class IngestionService {
{ messageId, ingestionSourceId: source.id },
'Skipping duplicate email'
);
return;
return null;
}
const emlBuffer = email.eml ?? Buffer.from(email.body, 'utf-8');
@@ -371,50 +483,71 @@ export class IngestionService {
const attachmentHash = createHash('sha256')
.update(attachmentBuffer)
.digest('hex');
const attachmentPath = `${config.storage.openArchiverFolderName}/${source.name.replaceAll(' ', '-')}-${source.id}/attachments/${attachment.filename}`;
await storage.put(attachmentPath, attachmentBuffer);
const [newAttachment] = await db
.insert(attachmentsSchema)
.values({
filename: attachment.filename,
mimeType: attachment.contentType,
sizeBytes: attachment.size,
contentHashSha256: attachmentHash,
storagePath: attachmentPath,
})
.onConflictDoUpdate({
target: attachmentsSchema.contentHashSha256,
set: { filename: attachment.filename },
})
.returning();
// Check if an attachment with the same hash already exists for this source
const existingAttachment = await db.query.attachments.findFirst({
where: and(
eq(attachmentsSchema.contentHashSha256, attachmentHash),
eq(attachmentsSchema.ingestionSourceId, source.id)
),
});
let storagePath: string;
if (existingAttachment) {
// If it exists, reuse the storage path and don't save the file again
storagePath = existingAttachment.storagePath;
logger.info(
{
attachmentHash,
ingestionSourceId: source.id,
reusedPath: storagePath,
},
'Reusing existing attachment file for deduplication.'
);
} else {
// If it's a new attachment, create a unique path and save it
const uniqueId = randomUUID().slice(0, 7);
storagePath = `${config.storage.openArchiverFolderName}/${source.name.replaceAll(' ', '-')}-${source.id}/attachments/${uniqueId}-${attachment.filename}`;
await storage.put(storagePath, attachmentBuffer);
}
let attachmentRecord = existingAttachment;
if (!attachmentRecord) {
// If it's a new attachment, create a unique path and save it
const uniqueId = randomUUID().slice(0, 5);
const storagePath = `${config.storage.openArchiverFolderName}/${source.name.replaceAll(' ', '-')}-${source.id}/attachments/${uniqueId}-${attachment.filename}`;
await storage.put(storagePath, attachmentBuffer);
// Insert a new attachment record
[attachmentRecord] = await db
.insert(attachmentsSchema)
.values({
filename: attachment.filename,
mimeType: attachment.contentType,
sizeBytes: attachment.size,
contentHashSha256: attachmentHash,
storagePath: storagePath,
ingestionSourceId: source.id,
})
.returning();
}
// Link the attachment record (either new or existing) to the email
await db
.insert(emailAttachments)
.values({
emailId: archivedEmail.id,
attachmentId: newAttachment.id,
attachmentId: attachmentRecord.id,
})
.onConflictDoNothing();
}
}
// Index the email directly (raw email object + ingestion source id) instead of
// enqueueing a separate indexing job.
logger.info({ emailId: archivedEmail.id }, 'Indexing email');
// await indexingQueue.add('index-email', {
// emailId: archivedEmail.id,
// });
const searchService = new SearchService();
const storageService = new StorageService();
const databaseService = new DatabaseService();
const indexingService = new IndexingService(
databaseService,
searchService,
storageService
);
// Assign the user email before indexing.
email.userEmail = userEmail;
await indexingService.indexByEmail(email, source.id, archivedEmail.id);
return {
archivedEmailId: archivedEmail.id,
};
} catch (error) {
logger.error({
message: `Failed to process email ${email.id} for source ${source.id}`,
@@ -422,6 +555,7 @@ export class IngestionService {
emailId: email.id,
ingestionSourceId: source.id,
});
return null;
}
}
}
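The fast duplicate check introduced above can be sketched as follows; `ingest`, `download`, and the in-memory `seen` set are illustrative stand-ins for the database-backed `doesEmailExist`, which queries `archived_emails` by Message-ID and ingestion source id:

```typescript
// Hypothetical in-memory stand-in for the database-backed duplicate check.
const seen = new Set<string>();

const dedupeKey = (messageId: string, sourceId: string): string =>
    `${sourceId}::${messageId}`;

async function doesEmailExist(messageId: string, sourceId: string): Promise<boolean> {
    return seen.has(dedupeKey(messageId, sourceId));
}

// A connector checks the Message-ID before fetching the full body, so
// duplicates are skipped without downloading their content again.
async function ingest(
    messageId: string,
    sourceId: string,
    download: () => Promise<string>
): Promise<string | null> {
    if (await doesEmailExist(messageId, sourceId)) {
        return null; // duplicate: skip the full content download
    }
    const body = await download();
    seen.add(dedupeKey(messageId, sourceId));
    return body;
}
```

The key point is that the existence check needs only the Message-ID header, which IMAP and the Gmail API expose without downloading the message body.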



@@ -0,0 +1,93 @@
import { db } from '../database';
import { archivedEmails, emailAttachments } from '../database/schema';
import { eq } from 'drizzle-orm';
import { StorageService } from './StorageService';
import { createHash } from 'crypto';
import { logger } from '../config/logger';
import type { IntegrityCheckResult } from '@open-archiver/types';
import { streamToBuffer } from '../helpers/streamToBuffer';
export class IntegrityService {
private storageService = new StorageService();
public async checkEmailIntegrity(emailId: string): Promise<IntegrityCheckResult[]> {
const results: IntegrityCheckResult[] = [];
// 1. Fetch the archived email
const email = await db.query.archivedEmails.findFirst({
where: eq(archivedEmails.id, emailId),
});
if (!email) {
throw new Error('Archived email not found');
}
// 2. Check the email's integrity
const emailStream = await this.storageService.get(email.storagePath);
const emailBuffer = await streamToBuffer(emailStream);
const currentEmailHash = createHash('sha256').update(emailBuffer).digest('hex');
if (currentEmailHash === email.storageHashSha256) {
results.push({ type: 'email', id: email.id, isValid: true });
} else {
results.push({
type: 'email',
id: email.id,
isValid: false,
reason: 'Stored hash does not match current hash.',
});
}
// 3. If the email has attachments, check them
if (email.hasAttachments) {
const emailAttachmentsRelations = await db.query.emailAttachments.findMany({
where: eq(emailAttachments.emailId, emailId),
with: {
attachment: true,
},
});
for (const relation of emailAttachmentsRelations) {
const attachment = relation.attachment;
try {
const attachmentStream = await this.storageService.get(attachment.storagePath);
const attachmentBuffer = await streamToBuffer(attachmentStream);
const currentAttachmentHash = createHash('sha256')
.update(attachmentBuffer)
.digest('hex');
if (currentAttachmentHash === attachment.contentHashSha256) {
results.push({
type: 'attachment',
id: attachment.id,
filename: attachment.filename,
isValid: true,
});
} else {
results.push({
type: 'attachment',
id: attachment.id,
filename: attachment.filename,
isValid: false,
reason: 'Stored hash does not match current hash.',
});
}
} catch (error) {
logger.error(
{ attachmentId: attachment.id, error },
'Failed to read attachment from storage for integrity check.'
);
results.push({
type: 'attachment',
id: attachment.id,
filename: attachment.filename,
isValid: false,
reason: 'Could not read attachment file from storage.',
});
}
}
}
return results;
}
}
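The hash comparison at the core of `checkEmailIntegrity` can be isolated as a small helper; `verifySha256` is an illustrative name, not part of the service's API:

```typescript
import { createHash } from 'crypto';

// Recompute a SHA-256 over the current bytes and compare it with the hash
// stored at archive time, mirroring the per-email and per-attachment checks.
function verifySha256(
    content: Buffer,
    storedHash: string
): { isValid: boolean; reason?: string } {
    const currentHash = createHash('sha256').update(content).digest('hex');
    return currentHash === storedHash
        ? { isValid: true }
        : { isValid: false, reason: 'Stored hash does not match current hash.' };
}
```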


@@ -0,0 +1,101 @@
import { Job, Queue } from 'bullmq';
import { ingestionQueue, indexingQueue } from '../jobs/queues';
import { IJob, IQueueCounts, IQueueDetails, IQueueOverview, JobStatus } from '@open-archiver/types';
export class JobsService {
private queues: Queue[];
constructor() {
this.queues = [ingestionQueue, indexingQueue];
}
public async getQueues(): Promise<IQueueOverview[]> {
const queueOverviews: IQueueOverview[] = [];
for (const queue of this.queues) {
const counts = await queue.getJobCounts(
'active',
'completed',
'failed',
'delayed',
'waiting',
'paused'
);
queueOverviews.push({
name: queue.name,
counts: {
active: counts.active || 0,
completed: counts.completed || 0,
failed: counts.failed || 0,
delayed: counts.delayed || 0,
waiting: counts.waiting || 0,
paused: counts.paused || 0,
},
});
}
return queueOverviews;
}
public async getQueueDetails(
queueName: string,
status: JobStatus,
page: number,
limit: number
): Promise<IQueueDetails> {
const queue = this.queues.find((q) => q.name === queueName);
if (!queue) {
throw new Error(`Queue ${queueName} not found`);
}
const counts = await queue.getJobCounts(
'active',
'completed',
'failed',
'delayed',
'waiting',
'paused'
);
const start = (page - 1) * limit;
const end = start + limit - 1;
const jobStatus = status === 'waiting' ? 'wait' : status;
const jobs = await queue.getJobs([jobStatus], start, end, true);
const totalJobs = await queue.getJobCountByTypes(jobStatus);
return {
name: queue.name,
counts: {
active: counts.active || 0,
completed: counts.completed || 0,
failed: counts.failed || 0,
delayed: counts.delayed || 0,
waiting: counts.waiting || 0,
paused: counts.paused || 0,
},
jobs: await Promise.all(jobs.map((job) => this.formatJob(job))),
pagination: {
currentPage: page,
totalPages: Math.ceil(totalJobs / limit),
totalJobs,
limit,
},
};
}
private async formatJob(job: Job): Promise<IJob> {
const state = await job.getState();
return {
id: job.id,
name: job.name,
data: job.data,
state: state,
failedReason: job.failedReason,
timestamp: job.timestamp,
processedOn: job.processedOn,
finishedOn: job.finishedOn,
attemptsMade: job.attemptsMade,
stacktrace: job.stacktrace,
returnValue: job.returnvalue,
ingestionSourceId: job.data.ingestionSourceId,
error: state === 'failed' ? job.stacktrace : undefined,
};
}
}
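The page-to-range arithmetic used by `getQueueDetails` can be isolated as small helpers (names here are illustrative): BullMQ's `getJobs` takes inclusive start/end indexes, so page 1 with a limit of 10 maps to indexes 0 through 9.

```typescript
// Translate 1-based page/limit into the inclusive index range BullMQ expects.
function pageToRange(page: number, limit: number): { start: number; end: number } {
    const start = (page - 1) * limit;
    return { start, end: start + limit - 1 };
}

// Total pages for the pagination payload returned to the frontend.
function totalPages(totalJobs: number, limit: number): number {
    return Math.ceil(totalJobs / limit);
}
```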


@@ -0,0 +1,272 @@
import crypto from 'crypto';
import { logger } from '../config/logger';
// Simple LRU cache for Tika results with statistics
class TikaCache {
private cache = new Map<string, string>();
private maxSize = 50;
private hits = 0;
private misses = 0;
get(key: string): string | undefined {
const value = this.cache.get(key);
if (value !== undefined) {
this.hits++;
// LRU: Move element to the end
this.cache.delete(key);
this.cache.set(key, value);
} else {
this.misses++;
}
return value;
}
set(key: string, value: string): void {
// If already exists, delete first
if (this.cache.has(key)) {
this.cache.delete(key);
}
// If cache is full, remove oldest element
else if (this.cache.size >= this.maxSize) {
const firstKey = this.cache.keys().next().value;
if (firstKey !== undefined) {
this.cache.delete(firstKey);
}
}
this.cache.set(key, value);
}
getStats(): { size: number; maxSize: number; hits: number; misses: number; hitRate: number } {
const total = this.hits + this.misses;
const hitRate = total > 0 ? (this.hits / total) * 100 : 0;
return {
size: this.cache.size,
maxSize: this.maxSize,
hits: this.hits,
misses: this.misses,
hitRate: Math.round(hitRate * 100) / 100, // 2 decimal places
};
}
reset(): void {
this.cache.clear();
this.hits = 0;
this.misses = 0;
}
}
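The Map-based LRU trick used by `TikaCache` relies on `Map` preserving insertion order: deleting and re-inserting a key on read moves it to the end, so the first key is always the least recently used. A stripped-down demo (statistics omitted):

```typescript
// Minimal LRU using Map insertion order; `keys()` exposes the order for the demo.
class LruDemo {
    private cache = new Map<string, string>();
    constructor(private maxSize: number) {}

    get(key: string): string | undefined {
        const value = this.cache.get(key);
        if (value !== undefined) {
            // Move the entry to the end: it is now the most recently used.
            this.cache.delete(key);
            this.cache.set(key, value);
        }
        return value;
    }

    set(key: string, value: string): void {
        if (this.cache.has(key)) {
            this.cache.delete(key);
        } else if (this.cache.size >= this.maxSize) {
            // Evict the first (least recently used) key.
            const oldest = this.cache.keys().next().value;
            if (oldest !== undefined) this.cache.delete(oldest);
        }
        this.cache.set(key, value);
    }

    keys(): string[] {
        return [...this.cache.keys()];
    }
}
```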
// Semaphore for running Tika requests
class TikaSemaphore {
private inProgress = new Map<string, Promise<string>>();
private waitCount = 0;
async acquire(key: string, operation: () => Promise<string>): Promise<string> {
// Check if a request for this key is already running
const existingPromise = this.inProgress.get(key);
if (existingPromise) {
this.waitCount++;
logger.debug(`Waiting for in-progress Tika request (${key.slice(0, 8)}...)`);
try {
return await existingPromise;
} finally {
this.waitCount--;
}
}
// Start new request
const promise = this.executeOperation(key, operation);
this.inProgress.set(key, promise);
try {
return await promise;
} finally {
// Remove promise from map when finished
this.inProgress.delete(key);
}
}
private async executeOperation(key: string, operation: () => Promise<string>): Promise<string> {
try {
return await operation();
} catch (error) {
// Remove promise from map even on errors
logger.error(`Tika operation failed for key ${key.slice(0, 8)}...`, error);
throw error;
}
}
getStats(): { inProgress: number; waitCount: number } {
return {
inProgress: this.inProgress.size,
waitCount: this.waitCount,
};
}
clear(): void {
this.inProgress.clear();
this.waitCount = 0;
}
}
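The in-flight deduplication in `TikaSemaphore` can be reduced to its essential shape: concurrent callers with the same key share one promise instead of launching parallel requests. A minimal sketch under that assumption (`dedupe` and `slowOperation` are illustrative):

```typescript
// Concurrent callers with the same key await the same in-flight promise.
const inProgress = new Map<string, Promise<string>>();
let executions = 0;

async function dedupe(key: string, operation: () => Promise<string>): Promise<string> {
    const existing = inProgress.get(key);
    if (existing) return existing;
    // Register the promise before anyone awaits it, then clean up on settle.
    const promise = operation().finally(() => inProgress.delete(key));
    inProgress.set(key, promise);
    return promise;
}

async function slowOperation(): Promise<string> {
    executions++;
    await new Promise((resolve) => setTimeout(resolve, 10));
    return 'extracted text';
}
```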
export class OcrService {
private tikaCache = new TikaCache();
private tikaSemaphore = new TikaSemaphore();
// Tika-based text extraction with cache and semaphore
async extractTextWithTika(buffer: Buffer, mimeType: string): Promise<string> {
const tikaUrl = process.env.TIKA_URL;
if (!tikaUrl) {
throw new Error('TIKA_URL environment variable not set');
}
// Cache key: SHA-256 hash of the buffer
const hash = crypto.createHash('sha256').update(buffer).digest('hex');
// Cache lookup (before semaphore!)
const cachedResult = this.tikaCache.get(hash);
if (cachedResult !== undefined) {
logger.debug(`Tika cache hit for ${mimeType} (${buffer.length} bytes)`);
return cachedResult;
}
// Use semaphore to deduplicate parallel requests
return await this.tikaSemaphore.acquire(hash, async () => {
// Check cache again (might have been filled by parallel request)
const cachedAfterWait = this.tikaCache.get(hash);
if (cachedAfterWait !== undefined) {
logger.debug(`Tika cache hit after wait for ${mimeType} (${buffer.length} bytes)`);
return cachedAfterWait;
}
logger.debug(`Executing Tika request for ${mimeType} (${buffer.length} bytes)`);
// DNS fallback: If "tika" hostname, also try localhost
const urlsToTry = [
`${tikaUrl}/tika`,
// Fallback in case DNS resolution of the "tika" hostname fails
...(tikaUrl.includes('://tika:')
? [`${tikaUrl.replace('://tika:', '://localhost:')}/tika`]
: []),
];
for (const url of urlsToTry) {
try {
logger.debug(`Trying Tika URL: ${url}`);
const response = await fetch(url, {
method: 'PUT',
headers: {
'Content-Type': mimeType || 'application/octet-stream',
Accept: 'text/plain',
Connection: 'close',
},
body: buffer,
signal: AbortSignal.timeout(180000),
});
if (!response.ok) {
logger.warn(
`Tika extraction failed at ${url}: ${response.status} ${response.statusText}`
);
continue; // Try next URL
}
const text = await response.text();
const result = text.trim();
// Cache result (also empty strings to avoid repeated attempts)
this.tikaCache.set(hash, result);
const cacheStats = this.tikaCache.getStats();
const semaphoreStats = this.tikaSemaphore.getStats();
logger.debug(
`Tika extraction successful - Cache: ${cacheStats.hits}H/${cacheStats.misses}M (${cacheStats.hitRate}%) - Semaphore: ${semaphoreStats.inProgress} active, ${semaphoreStats.waitCount} waiting`
);
return result;
} catch (error) {
logger.warn(
`Tika extraction error at ${url}:`,
error instanceof Error ? error.message : 'Unknown error'
);
// Continue to next URL
}
}
// All URLs failed - cache this too (as empty string)
logger.error('All Tika URLs failed');
this.tikaCache.set(hash, '');
return '';
});
}
// Helper function to check Tika availability
async checkTikaAvailability(): Promise<boolean> {
const tikaUrl = process.env.TIKA_URL;
if (!tikaUrl) {
return false;
}
try {
const response = await fetch(`${tikaUrl}/version`, {
method: 'GET',
signal: AbortSignal.timeout(5000), // 5 seconds timeout
});
if (response.ok) {
const version = await response.text();
logger.info(`Tika server available, version: ${version.trim()}`);
return true;
}
return false;
} catch (error) {
logger.warn(
'Tika server not available:',
error instanceof Error ? error.message : 'Unknown error'
);
return false;
}
}
// Optional: Tika health check on startup
async initializeTextExtractor(): Promise<void> {
const tikaUrl = process.env.TIKA_URL;
if (tikaUrl) {
const isAvailable = await this.checkTikaAvailability();
if (!isAvailable) {
logger.error(`Tika server configured but not available at: ${tikaUrl}`);
logger.error('Text extraction will fall back to legacy methods or fail');
}
} else {
logger.info('Using legacy text extraction methods (pdf2json, mammoth, xlsx)');
logger.info(
'Set TIKA_URL environment variable to use Apache Tika for better extraction'
);
}
}
// Get cache statistics
getTikaCacheStats(): {
size: number;
maxSize: number;
hits: number;
misses: number;
hitRate: number;
} {
return this.tikaCache.getStats();
}
// Get semaphore statistics
getTikaSemaphoreStats(): { inProgress: number; waitCount: number } {
return this.tikaSemaphore.getStats();
}
// Clear cache (e.g. for tests or manual reset)
clearTikaCache(): void {
this.tikaCache.reset();
this.tikaSemaphore.clear();
logger.info('Tika cache and semaphore cleared');
}
}
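The DNS-fallback list built inside `extractTextWithTika` can be pulled out as a pure function to show the behavior (`tikaUrlsToTry` is an illustrative name): if the configured host uses the docker-style `tika` hostname, a `localhost` variant is appended as a second candidate.

```typescript
// Build the ordered list of Tika endpoints to try for one extraction request.
function tikaUrlsToTry(tikaUrl: string): string[] {
    return [
        `${tikaUrl}/tika`,
        ...(tikaUrl.includes('://tika:')
            ? [`${tikaUrl.replace('://tika:', '://localhost:')}/tika`]
            : []),
    ];
}
```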


@@ -1,16 +1,25 @@
import { Index, MeiliSearch, SearchParams } from 'meilisearch';
import { config } from '../config';
import type { SearchQuery, SearchResult, EmailDocument, TopSender } from '@open-archiver/types';
import type {
SearchQuery,
SearchResult,
EmailDocument,
TopSender,
User,
} from '@open-archiver/types';
import { FilterBuilder } from './FilterBuilder';
import { AuditService } from './AuditService';
export class SearchService {
private client: MeiliSearch;
private auditService: AuditService;
constructor() {
this.client = new MeiliSearch({
host: config.search.host,
apiKey: config.search.apiKey,
});
this.auditService = new AuditService();
}
public async getIndex<T extends Record<string, any>>(name: string): Promise<Index<T>> {
@@ -48,7 +57,11 @@ export class SearchService {
return index.deleteDocuments({ filter });
}
public async searchEmails(dto: SearchQuery, userId: string): Promise<SearchResult> {
public async searchEmails(
dto: SearchQuery,
userId: string,
actorIp: string
): Promise<SearchResult> {
const { query, filters, page = 1, limit = 10, matchingStrategy = 'last' } = dto;
const index = await this.getIndex<EmailDocument>('emails');
@@ -84,9 +97,24 @@ export class SearchService {
searchParams.filter = searchFilter;
}
}
console.log('searchParams', searchParams);
// console.log('searchParams', searchParams);
const searchResults = await index.search(query, searchParams);
await this.auditService.createAuditLog({
actorIdentifier: userId,
actionType: 'SEARCH',
targetType: 'ArchivedEmail',
targetId: '',
actorIp,
details: {
query,
filters,
page,
limit,
matchingStrategy,
},
});
return {
hits: searchResults.hits,
total: searchResults.estimatedTotalHits ?? searchResults.hits.length,


@@ -1,7 +1,7 @@
import { db } from '../database';
import { systemSettings } from '../database/schema/system-settings';
import type { SystemSettings } from '@open-archiver/types';
import { eq } from 'drizzle-orm';
import type { SystemSettings, User } from '@open-archiver/types';
import { AuditService } from './AuditService';
const DEFAULT_SETTINGS: SystemSettings = {
language: 'en',
@@ -10,6 +10,7 @@ const DEFAULT_SETTINGS: SystemSettings = {
};
export class SettingsService {
private auditService = new AuditService();
/**
* Retrieves the current system settings.
* If no settings exist, it initializes and returns the default settings.
@@ -30,13 +31,36 @@ export class SettingsService {
* @param newConfig - A partial object of the new settings configuration.
* @returns The updated system settings.
*/
public async updateSystemSettings(newConfig: Partial<SystemSettings>): Promise<SystemSettings> {
public async updateSystemSettings(
newConfig: Partial<SystemSettings>,
actor: User,
actorIp: string
): Promise<SystemSettings> {
const currentConfig = await this.getSystemSettings();
const mergedConfig = { ...currentConfig, ...newConfig };
// Since getSettings ensures a record always exists, we can directly update.
const [result] = await db.update(systemSettings).set({ config: mergedConfig }).returning();
const changedFields = Object.keys(newConfig).filter(
(key) =>
currentConfig[key as keyof SystemSettings] !==
newConfig[key as keyof SystemSettings]
);
if (changedFields.length > 0) {
await this.auditService.createAuditLog({
actorIdentifier: actor.id,
actionType: 'UPDATE',
targetType: 'SystemSettings',
targetId: 'system',
actorIp,
details: {
changedFields,
},
});
}
return result.config;
}
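The changed-field detection used for the audit log is a shallow comparison: only keys in the patch whose value actually differs from the current config are recorded. Isolated as a generic helper (the name is illustrative):

```typescript
// Return the keys of `patch` whose shallow value differs from `current`.
function changedFields<T extends Record<string, unknown>>(
    current: T,
    patch: Partial<T>
): string[] {
    return Object.keys(patch).filter((key) => current[key] !== patch[key]);
}
```

Note that a shallow `!==` is enough here because `SystemSettings` values are primitives; nested objects would need a deep comparison.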


@@ -2,11 +2,25 @@ import { IStorageProvider, StorageConfig } from '@open-archiver/types';
import { LocalFileSystemProvider } from './storage/LocalFileSystemProvider';
import { S3StorageProvider } from './storage/S3StorageProvider';
import { config } from '../config/index';
import { createCipheriv, createDecipheriv, randomBytes } from 'crypto';
import { streamToBuffer } from '../helpers/streamToBuffer';
import { Readable } from 'stream';
/**
* A unique identifier for Open Archiver encrypted files. This value SHOULD NOT BE ALTERED in future development to ensure compatibility.
*/
const ENCRYPTION_PREFIX = Buffer.from('oa_enc_idf_v1::');
export class StorageService implements IStorageProvider {
private provider: IStorageProvider;
private encryptionKey: Buffer | null = null;
private readonly algorithm = 'aes-256-cbc';
constructor(storageConfig: StorageConfig = config.storage) {
if (storageConfig.encryptionKey) {
this.encryptionKey = Buffer.from(storageConfig.encryptionKey, 'hex');
}
switch (storageConfig.type) {
case 'local':
this.provider = new LocalFileSystemProvider(storageConfig);
@@ -19,12 +33,125 @@ export class StorageService implements IStorageProvider {
}
}
put(path: string, content: Buffer | NodeJS.ReadableStream): Promise<void> {
return this.provider.put(path, content);
private async encrypt(content: Buffer): Promise<Buffer> {
if (!this.encryptionKey) {
return content;
}
const iv = randomBytes(16);
const cipher = createCipheriv(this.algorithm, this.encryptionKey, iv);
const encrypted = Buffer.concat([cipher.update(content), cipher.final()]);
return Buffer.concat([ENCRYPTION_PREFIX, iv, encrypted]);
}
get(path: string): Promise<NodeJS.ReadableStream> {
return this.provider.get(path);
private async decrypt(content: Buffer): Promise<Buffer> {
if (!this.encryptionKey) {
return content;
}
const prefix = content.subarray(0, ENCRYPTION_PREFIX.length);
if (!prefix.equals(ENCRYPTION_PREFIX)) {
// File is not encrypted, return as is
return content;
}
try {
const iv = content.subarray(ENCRYPTION_PREFIX.length, ENCRYPTION_PREFIX.length + 16);
const encrypted = content.subarray(ENCRYPTION_PREFIX.length + 16);
const decipher = createDecipheriv(this.algorithm, this.encryptionKey, iv);
return Buffer.concat([decipher.update(encrypted), decipher.final()]);
} catch (error) {
// Decryption failed for a file that has the prefix.
// This indicates a corrupted file or a wrong key.
throw new Error('Failed to decrypt file. It may be corrupted or the key is incorrect.');
}
}
async put(path: string, content: Buffer | NodeJS.ReadableStream): Promise<void> {
const buffer =
content instanceof Buffer
? content
: await streamToBuffer(content as NodeJS.ReadableStream);
const encryptedContent = await this.encrypt(buffer);
return this.provider.put(path, encryptedContent);
}
async get(path: string): Promise<NodeJS.ReadableStream> {
const stream = await this.provider.get(path);
const buffer = await streamToBuffer(stream);
const decryptedContent = await this.decrypt(buffer);
return Readable.from(decryptedContent);
}
public async getStream(path: string): Promise<NodeJS.ReadableStream> {
const stream = await this.provider.get(path);
if (!this.encryptionKey) {
return stream;
}
// For encrypted files, we need to read the prefix and IV first.
// This part still buffers a small, fixed amount of data, which is acceptable.
const prefixAndIvBuffer = await new Promise<Buffer>((resolve, reject) => {
const chunks: Buffer[] = [];
let totalLength = 0;
const targetLength = ENCRYPTION_PREFIX.length + 16;
const onData = (chunk: Buffer) => {
chunks.push(chunk);
totalLength += chunk.length;
if (totalLength >= targetLength) {
stream.removeListener('data', onData);
resolve(Buffer.concat(chunks));
}
};
stream.on('data', onData);
stream.on('error', reject);
stream.on('end', () => {
// Handle cases where the file is smaller than the prefix + IV
if (totalLength < targetLength) {
resolve(Buffer.concat(chunks));
}
});
});
const prefix = prefixAndIvBuffer.subarray(0, ENCRYPTION_PREFIX.length);
if (!prefix.equals(ENCRYPTION_PREFIX)) {
// File is not encrypted, return a new stream containing the buffered prefix and the rest of the original stream
const combinedStream = new Readable({
read() {},
});
combinedStream.push(prefixAndIvBuffer);
stream.on('data', (chunk) => {
combinedStream.push(chunk);
});
stream.on('end', () => {
combinedStream.push(null); // No more data
});
stream.on('error', (err) => {
combinedStream.emit('error', err);
});
return combinedStream;
}
try {
const iv = prefixAndIvBuffer.subarray(
ENCRYPTION_PREFIX.length,
ENCRYPTION_PREFIX.length + 16
);
const decipher = createDecipheriv(this.algorithm, this.encryptionKey, iv);
// Push the remaining part of the initial buffer to the decipher
const remainingBuffer = prefixAndIvBuffer.subarray(ENCRYPTION_PREFIX.length + 16);
if (remainingBuffer.length > 0) {
decipher.write(remainingBuffer);
}
// Pipe the rest of the stream
stream.pipe(decipher);
return decipher;
} catch (error) {
throw new Error('Failed to decrypt file. It may be corrupted or the key is incorrect.');
}
}
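The on-disk layout produced by `encrypt()` and consumed by `decrypt()` is `ENCRYPTION_PREFIX || 16-byte IV || AES-256-CBC ciphertext`; the prefix is what lets `decrypt()` pass unencrypted legacy files through unchanged. A self-contained round-trip (random key, simplified stand-ins for the service methods):

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'crypto';

const PREFIX = Buffer.from('oa_enc_idf_v1::');
const key = randomBytes(32); // aes-256-cbc requires a 32-byte key

// Layout: PREFIX || IV (16 bytes) || ciphertext.
function encrypt(content: Buffer): Buffer {
    const iv = randomBytes(16);
    const cipher = createCipheriv('aes-256-cbc', key, iv);
    return Buffer.concat([PREFIX, iv, cipher.update(content), cipher.final()]);
}

function decrypt(content: Buffer): Buffer {
    if (!content.subarray(0, PREFIX.length).equals(PREFIX)) {
        return content; // no prefix: the file was stored unencrypted
    }
    const iv = content.subarray(PREFIX.length, PREFIX.length + 16);
    const decipher = createDecipheriv('aes-256-cbc', key, iv);
    return Buffer.concat([
        decipher.update(content.subarray(PREFIX.length + 16)),
        decipher.final(),
    ]);
}
```

Because the IV is stored alongside the ciphertext, no per-file state beyond the key is needed to decrypt.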
delete(path: string): Promise<void> {


@@ -1,10 +1,12 @@
import { db } from '../database';
import * as schema from '../database/schema';
import { eq, sql } from 'drizzle-orm';
import { hash } from 'bcryptjs';
import { hash, compare } from 'bcryptjs';
import type { CaslPolicy, User } from '@open-archiver/types';
import { AuditService } from './AuditService';
export class UserService {
private static auditService = new AuditService();
/**
* Finds a user by their email address.
* @param email The email address of the user to find.
@@ -60,7 +62,9 @@ export class UserService {
public async createUser(
userDetails: Pick<User, 'email' | 'first_name' | 'last_name'> & { password?: string },
roleId: string
roleId: string,
actor: User,
actorIp: string
): Promise<typeof schema.users.$inferSelect> {
const { email, first_name, last_name, password } = userDetails;
const hashedPassword = password ? await hash(password, 10) : undefined;
@@ -80,33 +84,112 @@ export class UserService {
roleId: roleId,
});
await UserService.auditService.createAuditLog({
actorIdentifier: actor.id,
actionType: 'CREATE',
targetType: 'User',
targetId: newUser[0].id,
actorIp,
details: {
createdUserEmail: newUser[0].email,
},
});
return newUser[0];
}
public async updateUser(
id: string,
userDetails: Partial<Pick<User, 'email' | 'first_name' | 'last_name'>>,
roleId?: string
roleId: string | undefined,
actor: User,
actorIp: string
): Promise<typeof schema.users.$inferSelect | null> {
const originalUser = await this.findById(id);
const updatedUser = await db
.update(schema.users)
.set(userDetails)
.where(eq(schema.users.id, id))
.returning();
if (roleId) {
if (roleId && originalUser?.role?.id !== roleId) {
await db.delete(schema.userRoles).where(eq(schema.userRoles.userId, id));
await db.insert(schema.userRoles).values({
userId: id,
roleId: roleId,
});
await UserService.auditService.createAuditLog({
actorIdentifier: actor.id,
actionType: 'UPDATE',
targetType: 'User',
targetId: id,
actorIp,
details: {
field: 'role',
oldValue: originalUser?.role?.name,
newValue: roleId, // TODO: get role name
},
});
}
// TODO: log other user detail changes
return updatedUser[0] || null;
}
public async deleteUser(id: string): Promise<void> {
public async deleteUser(id: string, actor: User, actorIp: string): Promise<void> {
const userToDelete = await this.findById(id);
await db.delete(schema.users).where(eq(schema.users.id, id));
await UserService.auditService.createAuditLog({
actorIdentifier: actor.id,
actionType: 'DELETE',
targetType: 'User',
targetId: id,
actorIp,
details: {
deletedUserEmail: userToDelete?.email,
},
});
}
public async updatePassword(
id: string,
currentPassword: string,
newPassword: string,
actor: User,
actorIp: string
): Promise<void> {
const user = await db.query.users.findFirst({
where: eq(schema.users.id, id),
});
if (!user || !user.password) {
throw new Error('User not found');
}
const isPasswordValid = await compare(currentPassword, user.password);
if (!isPasswordValid) {
throw new Error('Invalid current password');
}
const hashedPassword = await hash(newPassword, 10);
await db
.update(schema.users)
.set({ password: hashedPassword })
.where(eq(schema.users.id, id));
await UserService.auditService.createAuditLog({
actorIdentifier: actor.id,
actionType: 'UPDATE',
targetType: 'User',
targetId: id,
actorIp,
details: {
field: 'password',
},
});
}
/**
@@ -152,6 +235,17 @@ export class UserService {
roleId: superAdminRole.id,
});
await UserService.auditService.createAuditLog({
actorIdentifier: 'SYSTEM',
actionType: 'SETUP',
targetType: 'User',
targetId: newUser[0].id,
actorIp: '::1', // System action
details: {
setupAdminEmail: newUser[0].email,
},
});
return newUser[0];
}
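The change-password flow above verifies the current password before hashing and storing the new one. A sketch of that flow, with Node's built-in scrypt standing in for bcryptjs so the example has no external dependency (all names are illustrative):

```typescript
import { scryptSync, randomBytes, timingSafeEqual } from 'crypto';

// Store as "saltHex:hashHex"; bcrypt embeds the salt in its own format.
function hashPassword(password: string): string {
    const salt = randomBytes(16);
    return `${salt.toString('hex')}:${scryptSync(password, salt, 32).toString('hex')}`;
}

function verifyPassword(password: string, stored: string): boolean {
    const [saltHex, hashHex] = stored.split(':');
    const candidate = scryptSync(password, Buffer.from(saltHex, 'hex'), 32);
    return timingSafeEqual(candidate, Buffer.from(hashHex, 'hex'));
}

// Mirror of updatePassword(): only rehash after the current password verifies,
// otherwise throw, matching the service's 'Invalid current password' path.
function changePassword(stored: string, currentPassword: string, newPassword: string): string {
    if (!verifyPassword(currentPassword, stored)) {
        throw new Error('Invalid current password');
    }
    return hashPassword(newPassword);
}
```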


@@ -32,29 +32,72 @@ export class EMLConnector implements IEmailConnector {
this.storage = new StorageService();
}
private getFilePath(): string {
return this.credentials.localFilePath || this.credentials.uploadedFilePath || '';
}
private getDisplayName(): string {
if (this.credentials.uploadedFileName) {
return this.credentials.uploadedFileName;
}
if (this.credentials.localFilePath) {
const parts = this.credentials.localFilePath.split('/');
return parts[parts.length - 1].replace('.zip', '');
}
return `eml-import-${new Date().getTime()}`;
}
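The fallback chain in `getDisplayName` can be expressed as a standalone function (illustrative name, extracted from the method above): prefer the uploaded file name, then the basename of a local path with `.zip` stripped, then a timestamped default.

```typescript
// Derive a human-readable name for an EML import source.
function displayName(uploadedFileName?: string, localFilePath?: string): string {
    if (uploadedFileName) {
        return uploadedFileName;
    }
    if (localFilePath) {
        const parts = localFilePath.split('/');
        return parts[parts.length - 1].replace('.zip', '');
    }
    return `eml-import-${Date.now()}`;
}
```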
private async getFileStream(): Promise<NodeJS.ReadableStream> {
if (this.credentials.localFilePath) {
return createReadStream(this.credentials.localFilePath);
}
return this.storage.get(this.getFilePath());
}
public async testConnection(): Promise<boolean> {
try {
if (!this.credentials.uploadedFilePath) {
throw Error('EML file path not provided.');
const filePath = this.getFilePath();
if (!filePath) {
throw Error('EML Zip file path not provided.');
}
if (!this.credentials.uploadedFilePath.includes('.zip')) {
if (!filePath.includes('.zip')) {
throw Error('Provided file is not in the ZIP format.');
}
- const fileExist = await this.storage.exists(this.credentials.uploadedFilePath);
+ let fileExist = false;
+ if (this.credentials.localFilePath) {
+ try {
+ await fs.access(this.credentials.localFilePath);
+ fileExist = true;
+ } catch {
+ fileExist = false;
+ }
+ } else {
+ fileExist = await this.storage.exists(filePath);
+ }
if (!fileExist) {
- throw Error('EML file upload not finished yet, please wait.');
+ if (this.credentials.localFilePath) {
+ throw Error(`EML Zip file not found at path: ${this.credentials.localFilePath}`);
+ } else {
+ throw Error(
+ 'Uploaded EML Zip file not found. The upload may not have finished yet, or it failed.'
+ );
+ }
}
return true;
} catch (error) {
- logger.error({ error, credentials: this.credentials }, 'EML file validation failed.');
+ logger.error(
+ { error, credentials: this.credentials },
+ 'EML Zip file validation failed.'
+ );
throw error;
}
}
public async *listAllUsers(): AsyncGenerator<MailboxUser> {
- const displayName =
- this.credentials.uploadedFileName || `eml-import-${new Date().getTime()}`;
+ const displayName = this.getDisplayName();
logger.info(`Found potential mailbox: ${displayName}`);
const constructedPrimaryEmail = `${displayName.replace(/ /g, '.').toLowerCase()}@eml.local`;
yield {
@@ -68,10 +111,8 @@ export class EMLConnector implements IEmailConnector {
userEmail: string,
syncState?: SyncState | null
): AsyncGenerator<EmailObject | null> {
- const fileStream = await this.storage.get(this.credentials.uploadedFilePath);
+ const fileStream = await this.getFileStream();
const tempDir = await fs.mkdtemp(join('/tmp', `eml-import-${new Date().getTime()}`));
- const unzippedPath = join(tempDir, 'unzipped');
- await fs.mkdir(unzippedPath);
const zipFilePath = join(tempDir, 'eml.zip');
try {
@@ -82,99 +123,150 @@ export class EMLConnector implements IEmailConnector {
dest.on('error', reject);
});
- await this.extract(zipFilePath, unzippedPath);
- const files = await this.getAllFiles(unzippedPath);
- for (const file of files) {
- if (file.endsWith('.eml')) {
- try {
- // logger.info({ file }, 'Processing EML file.');
- const stream = createReadStream(file);
- const content = await streamToBuffer(stream);
- // logger.info({ file, size: content.length }, 'Read file to buffer.');
- let relativePath = file.substring(unzippedPath.length + 1);
- if (dirname(relativePath) === '.') {
- relativePath = '';
- } else {
- relativePath = dirname(relativePath);
- }
- const emailObject = await this.parseMessage(content, relativePath);
- // logger.info({ file, messageId: emailObject.id }, 'Parsed email message.');
- yield emailObject;
- } catch (error) {
- logger.error(
- { error, file },
- 'Failed to process a single EML file. Skipping.'
- );
- }
- }
- }
+ yield* this.processZipEntries(zipFilePath);
} catch (error) {
logger.error({ error }, 'Failed to fetch email.');
throw error;
} finally {
await fs.rm(tempDir, { recursive: true, force: true });
- try {
- await this.storage.delete(this.credentials.uploadedFilePath);
- } catch (error) {
- logger.error(
- { error, file: this.credentials.uploadedFilePath },
- 'Failed to delete EML file after processing.'
- );
+ if (this.credentials.uploadedFilePath && !this.credentials.localFilePath) {
+ try {
+ await this.storage.delete(this.credentials.uploadedFilePath);
+ } catch (error) {
+ logger.error(
+ { error, file: this.credentials.uploadedFilePath },
+ 'Failed to delete EML file after processing.'
+ );
+ }
}
}
}
- private extract(zipFilePath: string, dest: string): Promise<void> {
- return new Promise((resolve, reject) => {
+ private async *processZipEntries(zipFilePath: string): AsyncGenerator<EmailObject | null> {
+ // Open the ZIP file.
+ // Note: yauzl requires random access, so we must use the file on disk.
+ const zipfile = await new Promise<yauzl.ZipFile>((resolve, reject) => {
yauzl.open(zipFilePath, { lazyEntries: true, decodeStrings: false }, (err, zipfile) => {
- if (err) reject(err);
- zipfile.on('error', reject);
- zipfile.readEntry();
- zipfile.on('entry', (entry) => {
- const fileName = entry.fileName.toString('utf8');
- // Ignore macOS-specific metadata files.
- if (fileName.startsWith('__MACOSX/')) {
- zipfile.readEntry();
- return;
- }
- const entryPath = join(dest, fileName);
- if (/\/$/.test(fileName)) {
- fs.mkdir(entryPath, { recursive: true })
- .then(() => zipfile.readEntry())
- .catch(reject);
- } else {
- zipfile.openReadStream(entry, (err, readStream) => {
- if (err) reject(err);
- const writeStream = createWriteStream(entryPath);
- readStream.pipe(writeStream);
- writeStream.on('finish', () => zipfile.readEntry());
- writeStream.on('error', reject);
- });
- }
- });
- zipfile.on('end', () => resolve());
+ if (err || !zipfile) return reject(err);
+ resolve(zipfile);
});
});
- }
- private async getAllFiles(dirPath: string, arrayOfFiles: string[] = []): Promise<string[]> {
- const files = await fs.readdir(dirPath);
+ // Create an async iterator for zip entries
+ const entryIterator = this.zipEntryGenerator(zipfile);
- for (const file of files) {
- const fullPath = join(dirPath, file);
- if ((await fs.stat(fullPath)).isDirectory()) {
- await this.getAllFiles(fullPath, arrayOfFiles);
- } else {
- arrayOfFiles.push(fullPath);
+ for await (const { entry, openReadStream } of entryIterator) {
+ const fileName = entry.fileName.toString();
+ if (fileName.startsWith('__MACOSX/') || /\/$/.test(fileName)) {
+ continue;
+ }
+ if (fileName.endsWith('.eml')) {
+ try {
+ const readStream = await openReadStream();
+ const relativePath = dirname(fileName) === '.' ? '' : dirname(fileName);
+ const emailObject = await this.parseMessage(readStream, relativePath);
+ yield emailObject;
+ } catch (error) {
+ logger.error(
+ { error, file: fileName },
+ 'Failed to process a single EML file from zip. Skipping.'
+ );
+ }
+ }
}
- return arrayOfFiles;
}
- private async parseMessage(emlBuffer: Buffer, path: string): Promise<EmailObject> {
+ private async *zipEntryGenerator(
+ zipfile: yauzl.ZipFile
+ ): AsyncGenerator<{ entry: yauzl.Entry; openReadStream: () => Promise<Readable> }> {
let resolveNext: ((value: any) => void) | null = null;
let rejectNext: ((reason?: any) => void) | null = null;
let finished = false;
const queue: yauzl.Entry[] = [];
zipfile.readEntry();
zipfile.on('entry', (entry) => {
if (resolveNext) {
const resolve = resolveNext;
resolveNext = null;
rejectNext = null;
resolve(entry);
} else {
queue.push(entry);
}
});
zipfile.on('end', () => {
finished = true;
if (resolveNext) {
const resolve = resolveNext;
resolveNext = null;
rejectNext = null;
resolve(null); // Signal end
}
});
zipfile.on('error', (err) => {
finished = true;
if (rejectNext) {
const reject = rejectNext;
resolveNext = null;
rejectNext = null;
reject(err);
}
});
while (!finished || queue.length > 0) {
if (queue.length > 0) {
const entry = queue.shift()!;
yield {
entry,
openReadStream: () =>
new Promise<Readable>((resolve, reject) => {
zipfile.openReadStream(entry, (err, stream) => {
if (err || !stream) return reject(err);
resolve(stream);
});
}),
};
zipfile.readEntry(); // Read next entry only after yielding
} else {
const entry = await new Promise<yauzl.Entry | null>((resolve, reject) => {
resolveNext = resolve;
rejectNext = reject;
});
if (entry) {
yield {
entry,
openReadStream: () =>
new Promise<Readable>((resolve, reject) => {
zipfile.openReadStream(entry, (err, stream) => {
if (err || !stream) return reject(err);
resolve(stream);
});
}),
};
zipfile.readEntry(); // Read next entry only after yielding
} else {
break; // End of zip
}
}
}
}
private async parseMessage(
input: Buffer | Readable,
path: string
): Promise<EmailObject> {
let emlBuffer: Buffer;
if (Buffer.isBuffer(input)) {
emlBuffer = input;
} else {
emlBuffer = await streamToBuffer(input);
}
const parsedEmail: ParsedMail = await simpleParser(emlBuffer);
const attachments = parsedEmail.attachments.map((attachment: Attachment) => ({

View File
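The `processZipEntries`/`zipEntryGenerator` pair above bridges yauzl's pull-based event API (each `readEntry()` call produces one `'entry'` event) into an async generator, requesting the next entry only after the consumer has finished with the current one. A simplified, self-contained sketch of that back-pressure pattern, using a hypothetical `LazyReader` in place of yauzl:

```typescript
import { EventEmitter } from 'events';

// Hypothetical stand-in for yauzl's lazyEntries API: each readEntry() call
// emits exactly one 'entry' event, or 'end' when the archive is exhausted.
class LazyReader extends EventEmitter {
    private i = 0;
    constructor(private items: string[]) {
        super();
    }
    readEntry(): void {
        setImmediate(() => {
            if (this.i < this.items.length) this.emit('entry', this.items[this.i++]);
            else this.emit('end');
        });
    }
}

// Bridge the event API into an async generator. readEntry() is only called
// again after the consumer has awaited the previous value, so entries are
// never produced faster than they are processed.
async function* entries(reader: LazyReader): AsyncGenerator<string> {
    for (;;) {
        const entry = await new Promise<string | null>((resolve) => {
            const onEntry = (e: string) => {
                reader.off('end', onEnd);
                resolve(e);
            };
            const onEnd = () => {
                reader.off('entry', onEntry);
                resolve(null);
            };
            reader.once('entry', onEntry);
            reader.once('end', onEnd);
            reader.readEntry();
        });
        if (entry === null) return;
        yield entry;
    }
}
```

The production generator additionally queues entries that arrive before the consumer asks for them and forwards `'error'` events; the pull-only shape is the same.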

@@ -132,7 +132,8 @@ export class GoogleWorkspaceConnector implements IEmailConnector {
*/
public async *fetchEmails(
userEmail: string,
- syncState?: SyncState | null
+ syncState?: SyncState | null,
+ checkDuplicate?: (messageId: string) => Promise<boolean>
): AsyncGenerator<EmailObject> {
const authClient = this.getAuthClient(userEmail, [
'https://www.googleapis.com/auth/gmail.readonly',
@@ -144,7 +145,7 @@ export class GoogleWorkspaceConnector implements IEmailConnector {
// If no sync state is provided for this user, this is an initial import. Get all messages.
if (!startHistoryId) {
- yield* this.fetchAllMessagesForUser(gmail, userEmail);
+ yield* this.fetchAllMessagesForUser(gmail, userEmail, checkDuplicate);
return;
}
@@ -170,6 +171,16 @@ export class GoogleWorkspaceConnector implements IEmailConnector {
if (messageAdded.message?.id) {
try {
const messageId = messageAdded.message.id;
// Optimization: Check for existence before fetching full content
if (checkDuplicate && (await checkDuplicate(messageId))) {
logger.debug(
{ messageId, userEmail },
'Skipping duplicate email (pre-check)'
);
continue;
}
const metadataResponse = await gmail.users.messages.get({
userId: userEmail,
id: messageId,
@@ -258,8 +269,17 @@ export class GoogleWorkspaceConnector implements IEmailConnector {
private async *fetchAllMessagesForUser(
gmail: gmail_v1.Gmail,
- userEmail: string
+ userEmail: string,
+ checkDuplicate?: (messageId: string) => Promise<boolean>
): AsyncGenerator<EmailObject> {
// Capture the history ID at the start to ensure no emails are missed during the import process.
// Any emails arriving during this import will be covered by the next sync starting from this point.
// Overlaps are handled by the duplicate check.
const profileResponse = await gmail.users.getProfile({ userId: userEmail });
if (profileResponse.data.historyId) {
this.newHistoryId = profileResponse.data.historyId;
}
let pageToken: string | undefined = undefined;
do {
const listResponse: Common.GaxiosResponseWithHTTP2<gmail_v1.Schema$ListMessagesResponse> =
@@ -277,6 +297,16 @@ export class GoogleWorkspaceConnector implements IEmailConnector {
if (message.id) {
try {
const messageId = message.id;
// Optimization: Check for existence before fetching full content
if (checkDuplicate && (await checkDuplicate(messageId))) {
logger.debug(
{ messageId, userEmail },
'Skipping duplicate email (pre-check)'
);
continue;
}
const metadataResponse = await gmail.users.messages.get({
userId: userEmail,
id: messageId,
@@ -352,12 +382,6 @@ export class GoogleWorkspaceConnector implements IEmailConnector {
}
pageToken = listResponse.data.nextPageToken ?? undefined;
} while (pageToken);
- // After fetching all messages, get the latest history ID to use as the starting point for the next sync.
- const profileResponse = await gmail.users.getProfile({ userId: userEmail });
- if (profileResponse.data.historyId) {
- this.newHistoryId = profileResponse.data.historyId;
- }
}
public getUpdatedSyncState(userEmail: string): SyncState {

View File
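The race-condition fix above snapshots the mailbox cursor (`historyId`) *before* the full listing rather than after it, so mail arriving mid-import falls inside the next delta sync; the overlap is then removed by the duplicate pre-check. A minimal sketch of that ordering, with a hypothetical `Mailbox` interface rather than the Gmail API:

```typescript
interface Mailbox {
    cursor(): Promise<number>; // monotonically increasing change cursor
    listAll(): AsyncGenerator<string>; // full scan; new mail may arrive meanwhile
}

// Snapshot the cursor FIRST: anything arriving during the scan is covered by
// the next delta sync starting from `nextSyncFrom`, and any overlap between
// the scan and that sync is filtered by the duplicate check.
async function initialImport(box: Mailbox): Promise<{ imported: string[]; nextSyncFrom: number }> {
    const nextSyncFrom = await box.cursor();
    const imported: string[] = [];
    for await (const id of box.listAll()) imported.push(id);
    return { imported, nextSyncFrom };
}
```

Snapshotting after the scan (the old behavior) silently skips any message that arrived while the scan was running, because its change falls before the recorded cursor.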

@@ -8,13 +8,13 @@ import type {
import type { IEmailConnector } from '../EmailProviderFactory';
import { ImapFlow } from 'imapflow';
import { simpleParser, ParsedMail, Attachment, AddressObject, Headers } from 'mailparser';
+ import { config } from '../../config';
import { logger } from '../../config/logger';
import { getThreadId } from './helpers/utils';
export class ImapConnector implements IEmailConnector {
private client: ImapFlow;
private newMaxUids: { [mailboxPath: string]: number } = {};
- private isConnected = false;
private statusMessage: string | undefined;
constructor(private credentials: GenericImapCredentials) {
@@ -40,7 +40,6 @@ export class ImapConnector implements IEmailConnector {
// Handles client-level errors, like unexpected disconnects, to prevent crashes.
client.on('error', (err) => {
logger.error({ err }, 'IMAP client error');
- this.isConnected = false;
});
return client;
@@ -50,20 +49,17 @@ export class ImapConnector implements IEmailConnector {
* Establishes a connection to the IMAP server if not already connected.
*/
private async connect(): Promise<void> {
- if (this.isConnected && this.client.usable) {
+ // If the client is already connected and usable, do nothing.
+ if (this.client.usable) {
return;
}
- // If the client is not usable (e.g., after a logout), create a new one.
- if (!this.client.usable) {
- this.client = this.createClient();
- }
+ // If the client is not usable (e.g., after a logout or an error), create a new one.
+ this.client = this.createClient();
try {
await this.client.connect();
- this.isConnected = true;
} catch (err: any) {
- this.isConnected = false;
logger.error({ err }, 'IMAP connection failed');
if (err.responseText) {
throw new Error(`IMAP Connection Error: ${err.responseText}`);
@@ -76,9 +72,8 @@ export class ImapConnector implements IEmailConnector {
* Disconnects from the IMAP server if the connection is active.
*/
private async disconnect(): Promise<void> {
- if (this.isConnected && this.client.usable) {
+ if (this.client.usable) {
await this.client.logout();
- this.isConnected = false;
}
}
@@ -129,7 +124,7 @@ export class ImapConnector implements IEmailConnector {
return await action();
} catch (err: any) {
logger.error({ err, attempt }, `IMAP operation failed on attempt ${attempt}`);
- this.isConnected = false; // Force reconnect on next attempt
+ // The client is no longer usable, a new one will be created on the next attempt.
if (attempt === maxRetries) {
logger.error({ err }, 'IMAP operation failed after all retries.');
throw err;
@@ -147,31 +142,30 @@ export class ImapConnector implements IEmailConnector {
public async *fetchEmails(
userEmail: string,
- syncState?: SyncState | null
+ syncState?: SyncState | null,
+ checkDuplicate?: (messageId: string) => Promise<boolean>
): AsyncGenerator<EmailObject | null> {
try {
// list all mailboxes first
const mailboxes = await this.withRetry(async () => await this.client.list());
const processableMailboxes = mailboxes.filter((mailbox) => {
- // filter out trash and all mail emails
+ // Exclude mailboxes that cannot be selected.
+ if (mailbox.flags.has('\\Noselect')) {
+ return false;
+ }
+ if (config.app.allInclusiveArchive) {
+ return true;
+ }
+ // filter out junk/spam mail emails
if (mailbox.specialUse) {
const specialUse = mailbox.specialUse.toLowerCase();
- if (
- specialUse === '\\junk' ||
- specialUse === '\\trash' ||
- specialUse === '\\all'
- ) {
+ if (specialUse === '\\junk' || specialUse === '\\trash') {
return false;
}
}
+ // Fallback to checking flags
- if (
- mailbox.flags.has('\\Noselect') ||
- mailbox.flags.has('\\Trash') ||
- mailbox.flags.has('\\Junk') ||
- mailbox.flags.has('\\All')
- ) {
+ if (mailbox.flags.has('\\Trash') || mailbox.flags.has('\\Junk')) {
return false;
}
@@ -225,6 +219,22 @@ export class ImapConnector implements IEmailConnector {
this.newMaxUids[mailboxPath] = msg.uid;
}
// Optimization: Verify existence using Message-ID from envelope before fetching full body
if (checkDuplicate && msg.envelope?.messageId) {
const isDuplicate = await checkDuplicate(msg.envelope.messageId);
if (isDuplicate) {
logger.debug(
{
mailboxPath,
uid: msg.uid,
messageId: msg.envelope.messageId,
},
'Skipping duplicate email (pre-check)'
);
continue;
}
}
logger.debug({ mailboxPath, uid: msg.uid }, 'Processing message');
if (msg.envelope && msg.source) {

View File
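The `checkDuplicate` hook threaded through `fetchEmails` above lets a connector skip the expensive full-body fetch whenever the archive already holds a Message-ID taken from cheap metadata (the IMAP envelope, or the Gmail message list). The contract can be sketched generically; `download` is a hypothetical stand-in for the per-message body fetch:

```typescript
type Email = { messageId: string; body: string };

// Yield only emails not already archived: the cheap identifier is checked
// before the expensive body download, so duplicates cost one lookup instead
// of one full fetch plus parse.
async function* fetchWithPreCheck(
    messageIds: string[],
    download: (id: string) => Promise<Email>,
    checkDuplicate?: (messageId: string) => Promise<boolean>
): AsyncGenerator<Email> {
    for (const id of messageIds) {
        if (checkDuplicate && (await checkDuplicate(id))) continue; // pre-check hit
        yield await download(id);
    }
}
```

Keeping the parameter optional, as the connectors do, means callers without a duplicate index (e.g. a first-ever import into an empty archive) pay no extra lookups.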

@@ -1,174 +1,240 @@
import type {
- MboxImportCredentials,
- EmailObject,
- EmailAddress,
- SyncState,
- MailboxUser,
+ MboxImportCredentials,
+ EmailObject,
+ EmailAddress,
+ SyncState,
+ MailboxUser,
} from '@open-archiver/types';
import type { IEmailConnector } from '../EmailProviderFactory';
import { simpleParser, ParsedMail, Attachment, AddressObject } from 'mailparser';
import { logger } from '../../config/logger';
import { getThreadId } from './helpers/utils';
import { StorageService } from '../StorageService';
- import { Readable } from 'stream';
+ import { Readable, Transform } from 'stream';
import { createHash } from 'crypto';
import { streamToBuffer } from '../../helpers/streamToBuffer';
import { promises as fs, createReadStream } from 'fs';
class MboxSplitter extends Transform {
private buffer: Buffer = Buffer.alloc(0);
private delimiter: Buffer = Buffer.from('\nFrom ');
private firstChunk: boolean = true;
_transform(chunk: Buffer, encoding: string, callback: Function) {
if (this.firstChunk) {
// Check if the file starts with "From ". If not, prepend it to the first email.
if (chunk.subarray(0, 5).toString() !== 'From ') {
this.push(Buffer.from('From '));
}
this.firstChunk = false;
}
let currentBuffer = Buffer.concat([this.buffer, chunk]);
let position;
while ((position = currentBuffer.indexOf(this.delimiter)) > -1) {
const email = currentBuffer.subarray(0, position);
if (email.length > 0) {
this.push(email);
}
// The next email starts with "From ", which is what the parser expects.
currentBuffer = currentBuffer.subarray(position + 1);
}
this.buffer = currentBuffer;
callback();
}
_flush(callback: Function) {
if (this.buffer.length > 0) {
this.push(this.buffer);
}
callback();
}
}
export class MboxConnector implements IEmailConnector {
- private storage: StorageService;
+ private storage: StorageService;
- constructor(private credentials: MboxImportCredentials) {
- this.storage = new StorageService();
- }
+ constructor(private credentials: MboxImportCredentials) {
+ this.storage = new StorageService();
+ }
- public async testConnection(): Promise<boolean> {
- try {
- if (!this.credentials.uploadedFilePath) {
- throw Error('Mbox file path not provided.');
- }
- if (!this.credentials.uploadedFilePath.includes('.mbox')) {
- throw Error('Provided file is not in the MBOX format.');
- }
- const fileExist = await this.storage.exists(this.credentials.uploadedFilePath);
- if (!fileExist) {
- throw Error('Mbox file upload not finished yet, please wait.');
- }
+ public async testConnection(): Promise<boolean> {
+ try {
+ const filePath = this.getFilePath();
+ if (!filePath) {
+ throw Error('Mbox file path not provided.');
+ }
+ if (!filePath.includes('.mbox')) {
+ throw Error('Provided file is not in the MBOX format.');
+ }
- return true;
- } catch (error) {
- logger.error({ error, credentials: this.credentials }, 'Mbox file validation failed.');
- throw error;
- }
- }
+ let fileExist = false;
+ if (this.credentials.localFilePath) {
+ try {
+ await fs.access(this.credentials.localFilePath);
+ fileExist = true;
+ } catch {
+ fileExist = false;
+ }
+ } else {
+ fileExist = await this.storage.exists(filePath);
+ }
- public async *listAllUsers(): AsyncGenerator<MailboxUser> {
- const displayName =
- this.credentials.uploadedFileName || `mbox-import-${new Date().getTime()}`;
- logger.info(`Found potential mailbox: ${displayName}`);
- const constructedPrimaryEmail = `${displayName.replace(/ /g, '.').toLowerCase()}@mbox.local`;
- yield {
- id: constructedPrimaryEmail,
- primaryEmail: constructedPrimaryEmail,
- displayName: displayName,
- };
- }
+ if (!fileExist) {
+ if (this.credentials.localFilePath) {
+ throw Error(`Mbox file not found at path: ${this.credentials.localFilePath}`);
+ } else {
+ throw Error(
+ 'Uploaded Mbox file not found. The upload may not have finished yet, or it failed.'
+ );
+ }
+ }
- public async *fetchEmails(
- userEmail: string,
- syncState?: SyncState | null
- ): AsyncGenerator<EmailObject | null> {
- try {
- const fileStream = await this.storage.get(this.credentials.uploadedFilePath);
- const fileBuffer = await streamToBuffer(fileStream as Readable);
- const mboxContent = fileBuffer.toString('utf-8');
- const emailDelimiter = '\nFrom ';
- const emails = mboxContent.split(emailDelimiter);
+ return true;
+ } catch (error) {
+ logger.error(
+ { error, credentials: this.credentials },
+ 'Mbox file validation failed.'
+ );
+ throw error;
+ }
}
- // The first split part might be empty or part of the first email's header, so we adjust.
- if (emails.length > 0 && !mboxContent.startsWith('From ')) {
- emails.shift(); // Adjust if the file doesn't start with "From "
- }
+ private getFilePath(): string {
+ return this.credentials.localFilePath || this.credentials.uploadedFilePath || '';
+ }
- logger.info(`Found ${emails.length} potential emails in the mbox file.`);
- let emailCount = 0;
+ private async getFileStream(): Promise<NodeJS.ReadableStream> {
+ if (this.credentials.localFilePath) {
+ return createReadStream(this.credentials.localFilePath);
+ }
+ return this.storage.getStream(this.getFilePath());
+ }
- for (const email of emails) {
- try {
- // Re-add the "From " delimiter for the parser, except for the very first email
- const emailWithDelimiter =
- emailCount > 0 || mboxContent.startsWith('From ') ? `From ${email}` : email;
- const emailBuffer = Buffer.from(emailWithDelimiter, 'utf-8');
- const emailObject = await this.parseMessage(emailBuffer, '');
- yield emailObject;
- emailCount++;
- } catch (error) {
- logger.error(
- { error, file: this.credentials.uploadedFilePath },
- 'Failed to process a single message from mbox file. Skipping.'
- );
- }
- }
- logger.info(`Finished processing mbox file. Total emails processed: ${emailCount}`);
- } finally {
- try {
- await this.storage.delete(this.credentials.uploadedFilePath);
- } catch (error) {
- logger.error(
- { error, file: this.credentials.uploadedFilePath },
- 'Failed to delete mbox file after processing.'
- );
- }
- }
}
+ public async *listAllUsers(): AsyncGenerator<MailboxUser> {
+ const displayName = this.getDisplayName();
+ logger.info(`Found potential mailbox: ${displayName}`);
+ const constructedPrimaryEmail = `${displayName.replace(/ /g, '.').toLowerCase()}@mbox.local`;
+ yield {
+ id: constructedPrimaryEmail,
+ primaryEmail: constructedPrimaryEmail,
+ displayName: displayName,
+ };
+ }
- private async parseMessage(emlBuffer: Buffer, path: string): Promise<EmailObject> {
- const parsedEmail: ParsedMail = await simpleParser(emlBuffer);
+ private getDisplayName(): string {
+ if (this.credentials.uploadedFileName) {
+ return this.credentials.uploadedFileName;
+ }
+ if (this.credentials.localFilePath) {
+ const parts = this.credentials.localFilePath.split('/');
+ return parts[parts.length - 1].replace('.mbox', '');
+ }
+ return `mbox-import-${new Date().getTime()}`;
+ }
- const attachments = parsedEmail.attachments.map((attachment: Attachment) => ({
- filename: attachment.filename || 'untitled',
- contentType: attachment.contentType,
- size: attachment.size,
- content: attachment.content as Buffer,
- }));
+ public async *fetchEmails(
+ userEmail: string,
+ syncState?: SyncState | null
+ ): AsyncGenerator<EmailObject | null> {
+ const filePath = this.getFilePath();
+ const fileStream = await this.getFileStream();
+ const mboxSplitter = new MboxSplitter();
+ const emailStream = fileStream.pipe(mboxSplitter);
- const mapAddresses = (
- addresses: AddressObject | AddressObject[] | undefined
- ): EmailAddress[] => {
- if (!addresses) return [];
- const addressArray = Array.isArray(addresses) ? addresses : [addresses];
- return addressArray.flatMap((a) =>
- a.value.map((v) => ({
- name: v.name,
- address: v.address?.replaceAll(`'`, '') || '',
- }))
- );
- };
+ for await (const emailBuffer of emailStream) {
+ try {
+ const emailObject = await this.parseMessage(emailBuffer as Buffer, '');
+ yield emailObject;
+ } catch (error) {
+ logger.error(
+ { error, file: filePath },
+ 'Failed to process a single message from mbox file. Skipping.'
+ );
+ }
+ }
- const threadId = getThreadId(parsedEmail.headers);
- let messageId = parsedEmail.messageId;
+ if (this.credentials.uploadedFilePath && !this.credentials.localFilePath) {
+ try {
+ await this.storage.delete(filePath);
+ } catch (error) {
+ logger.error(
+ { error, file: filePath },
+ 'Failed to delete mbox file after processing.'
+ );
+ }
+ }
}
- if (!messageId) {
- messageId = `generated-${createHash('sha256').update(emlBuffer).digest('hex')}`;
- }
+ private async parseMessage(emlBuffer: Buffer, path: string): Promise<EmailObject> {
+ const parsedEmail: ParsedMail = await simpleParser(emlBuffer);
- const from = mapAddresses(parsedEmail.from);
- if (from.length === 0) {
- from.push({ name: 'No Sender', address: 'No Sender' });
- }
+ const attachments = parsedEmail.attachments.map((attachment: Attachment) => ({
+ filename: attachment.filename || 'untitled',
+ contentType: attachment.contentType,
+ size: attachment.size,
+ content: attachment.content as Buffer,
+ }));
- // Extract folder path from headers. Mbox files don't have a standard folder structure, so we rely on custom headers added by email clients.
- // Gmail uses 'X-Gmail-Labels', and other clients like Thunderbird may use 'X-Folder'.
- const gmailLabels = parsedEmail.headers.get('x-gmail-labels');
- const folderHeader = parsedEmail.headers.get('x-folder');
- let finalPath = '';
+ const mapAddresses = (
+ addresses: AddressObject | AddressObject[] | undefined
+ ): EmailAddress[] => {
+ if (!addresses) return [];
+ const addressArray = Array.isArray(addresses) ? addresses : [addresses];
+ return addressArray.flatMap((a) =>
+ a.value.map((v) => ({
+ name: v.name,
+ address: v.address?.replaceAll(`'`, '') || '',
+ }))
+ );
+ };
- if (gmailLabels && typeof gmailLabels === 'string') {
- // We take the first label as the primary folder.
- // Gmail labels can be hierarchical, but we'll simplify to the first label.
- finalPath = gmailLabels.split(',')[0];
- } else if (folderHeader && typeof folderHeader === 'string') {
- finalPath = folderHeader;
- }
+ const threadId = getThreadId(parsedEmail.headers);
+ let messageId = parsedEmail.messageId;
- return {
- id: messageId,
- threadId: threadId,
- from,
- to: mapAddresses(parsedEmail.to),
- cc: mapAddresses(parsedEmail.cc),
- bcc: mapAddresses(parsedEmail.bcc),
- subject: parsedEmail.subject || '',
- body: parsedEmail.text || '',
- html: parsedEmail.html || '',
- headers: parsedEmail.headers,
- attachments,
- receivedAt: parsedEmail.date || new Date(),
- eml: emlBuffer,
- path: finalPath,
- };
- }
+ if (!messageId) {
+ messageId = `generated-${createHash('sha256').update(emlBuffer).digest('hex')}`;
+ }
- public getUpdatedSyncState(): SyncState {
- return {};
- }
+ const from = mapAddresses(parsedEmail.from);
+ if (from.length === 0) {
+ from.push({ name: 'No Sender', address: 'No Sender' });
+ }
+ // Extract folder path from headers. Mbox files don't have a standard folder structure, so we rely on custom headers added by email clients.
+ // Gmail uses 'X-Gmail-Labels', and other clients like Thunderbird may use 'X-Folder'.
+ const gmailLabels = parsedEmail.headers.get('x-gmail-labels');
+ const folderHeader = parsedEmail.headers.get('x-folder');
+ let finalPath = '';
+ if (gmailLabels && typeof gmailLabels === 'string') {
+ // We take the first label as the primary folder.
+ // Gmail labels can be hierarchical, but we'll simplify to the first label.
+ finalPath = gmailLabels.split(',')[0];
+ } else if (folderHeader && typeof folderHeader === 'string') {
+ finalPath = folderHeader;
+ }
+ return {
+ id: messageId,
+ threadId: threadId,
+ from,
+ to: mapAddresses(parsedEmail.to),
+ cc: mapAddresses(parsedEmail.cc),
+ bcc: mapAddresses(parsedEmail.bcc),
+ subject: parsedEmail.subject || '',
+ body: parsedEmail.text || '',
+ html: parsedEmail.html || '',
+ headers: parsedEmail.headers,
+ attachments,
+ receivedAt: parsedEmail.date || new Date(),
+ eml: emlBuffer,
+ path: finalPath,
+ };
+ }
+ public getUpdatedSyncState(): SyncState {
+ return {};
+ }
}

View File
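The `MboxSplitter` transform above replaces the old load-everything-then-`split()` approach with a streaming split on the `\nFrom ` boundary, prepending `From ` when the file does not start with it. Its core buffer logic, condensed into a non-streaming helper for illustration:

```typescript
// Simplified, non-streaming version of the MboxSplitter logic: split an mbox
// buffer into per-message buffers on the "\nFrom " boundary.
function splitMbox(mbox: Buffer): Buffer[] {
    const delimiter = Buffer.from('\nFrom ');
    // If the file doesn't start with "From ", prepend it so the first message
    // still parses as an mbox entry (mirrors the firstChunk branch above).
    let buf =
        mbox.subarray(0, 5).toString() !== 'From '
            ? Buffer.concat([Buffer.from('From '), mbox])
            : mbox;
    const parts: Buffer[] = [];
    let pos: number;
    while ((pos = buf.indexOf(delimiter)) > -1) {
        const part = buf.subarray(0, pos);
        if (part.length > 0) parts.push(part);
        // Skip only the "\n": the next message keeps its leading "From ".
        buf = buf.subarray(pos + 1);
    }
    if (buf.length > 0) parts.push(buf); // flush the final message
    return parts;
}
```

The streaming version does the same thing per chunk, carrying the unsplit tail in `this.buffer` across `_transform` calls so a boundary split across two chunks is still found.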

@@ -13,15 +13,8 @@ import { getThreadId } from './helpers/utils';
import { StorageService } from '../StorageService';
import { Readable } from 'stream';
import { createHash } from 'crypto';
- const streamToBuffer = (stream: Readable): Promise<Buffer> => {
- return new Promise((resolve, reject) => {
- const chunks: Buffer[] = [];
- stream.on('data', (chunk) => chunks.push(chunk));
- stream.on('error', reject);
- stream.on('end', () => resolve(Buffer.concat(chunks)));
- });
- };
+ import { join } from 'path';
+ import { createWriteStream, createReadStream, promises as fs } from 'fs';
// We have to hardcode names for deleted and trash folders here as current lib doesn't support looking into PST properties.
const DELETED_FOLDERS = new Set([
@@ -113,38 +106,75 @@ const JUNK_FOLDERS = new Set([
export class PSTConnector implements IEmailConnector {
private storage: StorageService;
private pstFile: PSTFile | null = null;
constructor(private credentials: PSTImportCredentials) {
this.storage = new StorageService();
}
- private async loadPstFile(): Promise<PSTFile> {
- if (this.pstFile) {
- return this.pstFile;
+ private getFilePath(): string {
+ return this.credentials.localFilePath || this.credentials.uploadedFilePath || '';
}
+ private async getFileStream(): Promise<NodeJS.ReadableStream> {
+ if (this.credentials.localFilePath) {
+ return createReadStream(this.credentials.localFilePath);
+ }
- const fileStream = await this.storage.get(this.credentials.uploadedFilePath);
- const buffer = await streamToBuffer(fileStream as Readable);
- this.pstFile = new PSTFile(buffer);
- return this.pstFile;
+ return this.storage.getStream(this.getFilePath());
}
+ private async loadPstFile(): Promise<{ pstFile: PSTFile; tempDir: string }> {
+ const fileStream = await this.getFileStream();
+ const tempDir = await fs.mkdtemp(join('/tmp', `pst-import-${new Date().getTime()}`));
+ const tempFilePath = join(tempDir, 'temp.pst');
+ await new Promise<void>((resolve, reject) => {
+ const dest = createWriteStream(tempFilePath);
+ fileStream.pipe(dest);
+ dest.on('finish', resolve);
+ dest.on('error', reject);
+ });
+ const pstFile = new PSTFile(tempFilePath);
+ return { pstFile, tempDir };
+ }
public async testConnection(): Promise<boolean> {
try {
- if (!this.credentials.uploadedFilePath) {
+ const filePath = this.getFilePath();
+ if (!filePath) {
throw Error('PST file path not provided.');
}
- if (!this.credentials.uploadedFilePath.includes('.pst')) {
+ if (!filePath.includes('.pst')) {
throw Error('Provided file is not in the PST format.');
}
- const fileExist = await this.storage.exists(this.credentials.uploadedFilePath);
- if (!fileExist) {
- throw Error('PST file upload not finished yet, please wait.');
+ let fileExist = false;
+ if (this.credentials.localFilePath) {
+ try {
+ await fs.access(this.credentials.localFilePath);
+ fileExist = true;
+ } catch {
+ fileExist = false;
+ }
+ } else {
+ fileExist = await this.storage.exists(filePath);
+ }
+ if (!fileExist) {
+ if (this.credentials.localFilePath) {
+ throw Error(`PST file not found at path: ${this.credentials.localFilePath}`);
+ } else {
+ throw Error(
+ 'Uploaded PST file not found. The upload may not have finished yet, or it failed.'
+ );
+ }
}
return true;
} catch (error) {
- logger.error({ error, credentials: this.credentials }, 'PST file validation failed.');
+ logger.error(
+ { error, credentials: this.credentials },
+ 'PST file validation failed.'
+ );
throw error;
}
}
@@ -156,8 +186,11 @@ export class PSTConnector implements IEmailConnector {
*/
public async *listAllUsers(): AsyncGenerator<MailboxUser> {
let pstFile: PSTFile | null = null;
+ let tempDir: string | null = null;
try {
- pstFile = await this.loadPstFile();
+ const loadResult = await this.loadPstFile();
+ pstFile = loadResult.pstFile;
+ tempDir = loadResult.tempDir;
const root = pstFile.getRootFolder();
const displayName: string =
root.displayName || pstFile.pstFilename || String(new Date().getTime());
@@ -171,10 +204,12 @@ export class PSTConnector implements IEmailConnector {
};
} catch (error) {
logger.error({ error }, 'Failed to list users from PST file.');
pstFile?.close();
throw error;
} finally {
pstFile?.close();
if (tempDir) {
await fs.rm(tempDir, { recursive: true, force: true });
}
}
}
@@ -183,23 +218,30 @@ export class PSTConnector implements IEmailConnector {
syncState?: SyncState | null
): AsyncGenerator<EmailObject | null> {
let pstFile: PSTFile | null = null;
+ let tempDir: string | null = null;
try {
- pstFile = await this.loadPstFile();
+ const loadResult = await this.loadPstFile();
+ pstFile = loadResult.pstFile;
+ tempDir = loadResult.tempDir;
const root = pstFile.getRootFolder();
yield* this.processFolder(root, '', userEmail);
} catch (error) {
logger.error({ error }, 'Failed to fetch email.');
pstFile?.close();
throw error;
} finally {
pstFile?.close();
- try {
- await this.storage.delete(this.credentials.uploadedFilePath);
- } catch (error) {
- logger.error(
- { error, file: this.credentials.uploadedFilePath },
- 'Failed to delete PST file after processing.'
- );
+ if (tempDir) {
+ await fs.rm(tempDir, { recursive: true, force: true });
+ }
+ if (this.credentials.uploadedFilePath && !this.credentials.localFilePath) {
+ try {
+ await this.storage.delete(this.credentials.uploadedFilePath);
+ } catch (error) {
+ logger.error(
+ { error, file: this.credentials.uploadedFilePath },
+ 'Failed to delete PST file after processing.'
+ );
+ }
}
}
}
@@ -281,8 +323,8 @@ export class PSTConnector implements IEmailConnector {
emlBuffer ?? Buffer.from(parsedEmail.text || parsedEmail.html || '', 'utf-8')
)
.digest('hex')}-${createHash('sha256')
- .update(emlBuffer ?? Buffer.from(msg.subject || '', 'utf-8'))
- .digest('hex')}-${msg.clientSubmitTime?.getTime()}`;
+ .update(emlBuffer ?? Buffer.from(msg.subject || '', 'utf-8'))
+ .digest('hex')}-${msg.clientSubmitTime?.getTime()}`;
}
return {
id: messageId,

View File
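`loadPstFile` above no longer buffers the whole PST into memory; it spools the stream to a temp file because `PSTFile` needs random access to the file on disk. The spool step in isolation (names are illustrative, not the project's API; this sketch uses `os.tmpdir()` rather than the hard-coded `/tmp`):

```typescript
import { createWriteStream, promises as fs } from 'fs';
import { pipeline } from 'stream/promises';
import { join } from 'path';
import { tmpdir } from 'os';

// Write a readable stream into a fresh temp directory and return the path;
// the caller parses the file with a random-access reader, then removes
// tempDir when done (mirroring the finally blocks in the connector above).
async function spoolToTempFile(
    source: NodeJS.ReadableStream,
    fileName: string
): Promise<{ filePath: string; tempDir: string }> {
    const tempDir = await fs.mkdtemp(join(tmpdir(), 'spool-'));
    const filePath = join(tempDir, fileName);
    await pipeline(source, createWriteStream(filePath));
    return { filePath, tempDir };
}
```

Returning `tempDir` alongside the file path keeps cleanup in one place: whoever asked for the spool owns the directory's lifetime.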

@@ -1,11 +1,11 @@
import { Worker } from 'bullmq';
import { connection } from '../config/redis';
import indexEmailProcessor from '../jobs/processors/index-email.processor';
import indexEmailBatchProcessor from '../jobs/processors/index-email-batch.processor';
const processor = async (job: any) => {
switch (job.name) {
case 'index-email':
return indexEmailProcessor(job);
case 'index-email-batch':
return indexEmailBatchProcessor(job);
default:
throw new Error(`Unknown job name: ${job.name}`);
}
@@ -13,12 +13,11 @@ const processor = async (job: any) => {
const worker = new Worker('indexing', processor, {
connection,
concurrency: 5,
removeOnComplete: {
- count: 1000, // keep last 1000 jobs
+ count: 100, // keep last 100 jobs
},
removeOnFail: {
- count: 5000, // keep last 5000 failed jobs
+ count: 500, // keep last 500 failed jobs
},
});

View File

@@ -4,8 +4,14 @@
"outDir": "./dist",
"rootDir": "./src",
"emitDecoratorMetadata": true,
- "experimentalDecorators": true
+ "experimentalDecorators": true,
+ "composite": true
},
"include": ["src/**/*.ts"],
- "exclude": ["node_modules", "dist"]
+ "exclude": ["node_modules", "dist"],
+ "references": [
+ {
+ "path": "../types"
+ }
+ ]
}

Some files were not shown because too many files have changed in this diff Show More