OpenArchiver

mirror of https://github.com/LogicLabs-OU/OpenArchiver.git synced 2026-04-06 00:31:57 +02:00

Author	SHA1	Message	Date
Wei S.	0c42b30c9e	V0.5.1 dev (#341 ) * OpenAPI root url fix * Journaling OSS setup * feat: add preserve-original-file mode for email ingestion for GoBD compliance - Add `preserveOriginalFile` option to ingestion sources and connectors - Stream original EML/MBOX/PST emails to temp files instead of holding full buffers in memory, reducing memory allocation during ingestion - Skip attachment binary extraction and EML re-serialization when preserve mode is enabled; use raw file on disk as source of truth - Update `EmailObject` to use `tempFilePath` instead of in-memory `eml` buffer across all connectors (EML, MBOX, PST) - Add new database migration (0032) for `preserve_original_file` column - Add frontend UI toggle with tooltip (tippy.js) for the new option - Replace console.warn calls with structured pino logger in connectors * add isjournaled property to archived_email * feat(ingestion): add unmerge ingestion source functionality Introduces the ability to detach a child ingestion source from its merge group, making it a standalone root source. Changes include: - Add `unmerge` controller method with auth and error handling - Add POST `/v1/ingestion-sources/{id}/unmerge` route with OpenAPI docs - Implement `IngestionService.unmerge` backend logic - Add unmerge UI action and handler in the frontend ingestion view - Fix bulk delete to also remove children of deleted root sources - Update docs with new API operation and merging sources user guide * code formatting * Database migration file for enum `partially_active` * Error handling improvement	2026-03-30 22:29:03 +02:00
Wei S.	e5e119528f	V0.5.0 release (#335 ) * adding exports to backend package, page icons update * Integrity report PDF generation * Fixed inline attachment images not displaying in the email preview by modifying `EmailPreview.svelte`. The email HTML references embedded images via `cid:` URIs (e.g., `src="cid:ii_19c6d5f8d5eee7bd6d91"`), but the component never resolved those `cid:` references to actual image data, even though `postal-mime` already parses inline attachments with their `contentId` and binary `content`. The `emailHtml` derived value now calls `resolveContentIdReferences()` before rendering, so inline/embedded images display correctly in the iframe preview. * feat: strip non-inline attachments from EML before storage Add nodemailer dependency and emlUtils helper to remove non-inline attachments from .eml buffers during ingestion. This avoids double-storing attachment data since attachments are already stored separately. * upload error handling for file-based ingestion * Use Postgres for sync session management * Google Workspace / MS 365 duplicate check, avoid extra API call when previous ingestion fails * OpenAPI specs for API docs * code formatting * ran duplicate check for IMAP import, optimize message listing * Version update	2026-03-20 13:14:41 +01:00
Wei S.	7dac3b2bfd	V0.4.2 (#310 ) * fix(api): correct API key generation and proxy handling This commit resolves an issue where generating a new API key would fail. The root cause was improper handling of POST request bodies in the frontend proxy server. - Refactored `ApiKeyController` methods to use arrow functions to ensure correct `this` binding. * User profile/account page, change password, API * docs(api): update ingestion source provider values Update the `CreateIngestionSourceDto` documentation in `ingestion.md` to reflect the current set of supported providers. * updating tag * feat: add REDIS_USER env variable (#172) * feat: add REDIS_USER env variable fixes #171 * add proper type for bullmq config * Bulgarian UI language strings added (backend+frontend) (#194) * Bulgarian UI Support added * BG language UI support - Create translation.json * update redis config logic * Update Bulgarian language setting, register language * Allow specifying local file path for mbox/eml/pst (#214) * Add agents AI doc * Allow local file path for Mbox file ingestion --------- Co-authored-by: Wei S. <5291640+wayneshn@users.noreply.github.com> * feat(ingestion): add local file path support and optimize EML processing - Frontend: Updated IngestionSourceForm to allow toggling between "Upload File" and "Local File Path" for PST, EML, and Mbox providers. - Frontend: Added logic to clear irrelevant form data when switching import methods. - Frontend: Added English translations for new form fields. - Backend: Refactored EMLConnector to stream ZIP entries using yauzl instead of extracting the full archive to disk, significantly improving efficiency for large archives. - Docs: Updated API documentation and User Guides (PST, EML, Mbox) to clarify "Local File Path" usage, specifically within Docker environments. 
* docs: add meilisearch dumpless upgrade guide and snapshot config Update `docker-compose.yml` to include the `MEILI_SCHEDULE_SNAPSHOT` environment variable, defaulting to 86400 seconds (24 hours), enabling periodic data snapshots for easier recovery. Shout out to @morph027 for the inspiration! Additionally, update the Meilisearch upgrade documentation to include an experimental "dumpless" upgrade guide while marking the previous method as the standard recommended process. * build(coolify): enable daily snapshots for meilisearch Configure the Meilisearch service in `open-archiver.yml` to create snapshots every 86400 seconds (24 hours) by setting the `MEILI_SCHEDULE_SNAPSHOT` environment variable. --------- Co-authored-by: Antonia Schwennesen <53372671+zophiana@users.noreply.github.com> Co-authored-by: IT Creativity + Art Team <admin@it-playground.net> Co-authored-by: Jan Berdajs <mrbrdo@gmail.com>	2026-02-23 21:25:44 +01:00
Wei S.	6e1ebbbfd7	v0.4 init: File encryption, integrity report, deletion protection, job monitoring (#187 ) * open-core setup, adding enterprise package * enterprise: Audit log API, UI * Audit-log docs * feat: Integrity report, allowing users to verify the integrity of archived emails and their attachments. - When an email is archived, Open Archiver calculates a unique cryptographic signature (a SHA256 hash) for the email's raw `.eml` file and for each of its attachments. These signatures are stored in the database alongside the email's metadata. - The integrity check feature recalculates these signatures for the stored files and compares them to the original signatures stored in the database. This process allows you to verify that the content of your archived emails has not been altered, corrupted, or tampered with since the moment they were archived. - Add docs of Integrity report * Update Docker-compose.yml to use bind mount for Open Archiver data. Fix API rate-limiter warning about trust proxy * File encryption support * Scope attachment deduplication to ingestion source Previously, attachment deduplication was handled globally by enforcing a unique constraint on the content hash (contentHashSha256) in the `attachments` table. This caused an issue where an attachment from one ingestion source would be incorrectly linked if the same attachment was processed by a different source. This commit refactors the deduplication logic to be scoped on a per-ingestion-source basis. Changes: - Schema: The `attachments` table schema has been updated to include a nullable `ingestionSourceId` column. A composite unique index has been added on `(ingestionSourceId, contentHashSha256)` to enforce per-source uniqueness. The `ingestionSourceId` is nullable to ensure backward compatibility with existing databases. - Ingestion Logic: The `IngestionService` has been updated to provide the `ingestionSourceId` when inserting attachment records. 
The `onConflictDoUpdate` clause now targets the new composite key, ensuring that attachments are only considered duplicates if they have the same hash and originate from the same ingestion source. * Add option to disable deletions This commit introduces a new feature that allows admins to disable the deletion of emails and ingestion sources for the entire instance. This is a critical feature for compliance and data retention, as it prevents accidental or unauthorized deletions. Changes: - Configuration: Added an `ENABLE_DELETION` environment variable. If this variable is not set to `true`, all deletion operations will be disabled. - Deletion Guard: A centralized `checkDeletionEnabled` guard has been implemented to enforce this setting at both the controller and service levels, ensuring a robust and secure implementation. 
- Documentation: The installation guide has been updated to include the new `ENABLE_DELETION` environment variable and its behavior. - Refactor: The `IngestionService`'s `create` method was refactored to remove unnecessary calls to the `delete` method, simplifying the code and improving its robustness. * Adding position for menu items * feat(docker): Fix CORS errors This commit fixes CORS errors when running the app in Docker by introducing the `APP_URL` environment variable. A CORS policy is set up for the backend to allow only the `APP_URL` origin. Key changes include: - New `APP_URL` and `ORIGIN` environment variables have been added to properly configure CORS and the SvelteKit adapter, making the application's public URL easily configurable. - Dockerfiles are updated to copy the entrypoint script, Drizzle config, and migration files into the final image. - Documentation and example files (`.env.example`, `docker-compose.yml`) have been updated to reflect these changes. * feat(attachments): De-duplicate attachment content by content hash This commit refactors attachment handling to allow multiple emails within the same ingestion source to reference attachments with identical content (same hash). Changes: - The unique index on the `attachments` table has been changed to a non-unique index to permit duplicate hash/source pairs. - The ingestion logic is updated to first check for an existing attachment with the same hash and source. If found, it reuses the existing record; otherwise, it creates a new one. This maintains storage de-duplication. - The email deletion logic is improved to be more robust. It now correctly removes the email-attachment link before checking if the attachment record and its corresponding file can be safely deleted. * Not filtering out Trash folder * feat(backend): Add BullMQ dashboard for job monitoring This commit introduces a web-based UI for monitoring and managing background jobs using BullMQ. 
Key changes: - A new `/api/v1/jobs` endpoint is created, serving the Bull Board dashboard. Access is restricted to authenticated administrators. - All BullMQ queue definitions (`ingestion`, `indexing`, `sync-scheduler`) have been centralized into a new `packages/backend/src/jobs/queues.ts` file. - Workers and services now import queue instances from this central file, improving code organization and removing redundant queue instantiations. * Add `ALL_INCLUSIVE_ARCHIVE` environment variable to disable junk filtering * Using BSL license * frontend: Responsive design for menu bar, pagination * License service/module * Remove demoMode logic * Formatting code * Remove enterprise packages * Fix package.json in packages * Search page responsive fix --------- Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>	2025-10-24 17:11:05 +02:00
Wei S.	d372ef7566	Feat: Tika Integration and Batch Indexing (#132 ) * Feat/tika integration (#94) * feat(Tika) Integration of Tika for text extraction * feat(Tika) Integration of Apache Tika for text extraction * feat(Tika): Complete Tika integration with text extraction and docker-compose setup - Add Tika service to docker-compose.yml - Implement text sanitization and document validation - Improve batch processing with concurrency control * fix(comments) translated comments into English fix(docker) removed ports (only used for testing) * feat(indexing): Implement batch indexing for Meilisearch This change introduces batch processing for indexing emails into Meilisearch to significantly improve performance and throughput during ingestion. This change is based on the batch processing method previously contributed by @axeldunkel. Previously, each email was indexed individually, resulting in a high number of separate API calls. This approach was inefficient, especially for large mailboxes. The `processMailbox` queue worker now accumulates emails into a batch before sending them to the `IndexingService`. The service then uses the `addDocuments` Meilisearch API endpoint to index the entire batch in a single request, reducing network overhead and improving indexing speed. A new environment variable, `MEILI_INDEXING_BATCH`, has been added to make the batch size configurable, with a default of 500. Additionally, this commit includes minor refactoring: - The `TikaService` has been moved to its own dedicated file. - The `PendingEmail` type has been moved to the shared `@open-archiver/types` package. * chore(jobs): make continuous sync job scheduling idempotent Adds a static `jobId` to the repeatable 'schedule-continuous-sync' job. This prevents duplicate jobs from being scheduled if the server restarts. By providing a unique ID, the queue will update the existing repeatable job instead of creating a new one, ensuring the sync runs only at the configured frequency. 
--------- Co-authored-by: axeldunkel <53174090+axeldunkel@users.noreply.github.com> Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>	2025-09-26 11:34:32 +02:00
Wei S.	4b11cd931a	Docs: update rate limiting docs (#91 ) * Adding rate limiting docs * update rate limiting docs * Resolve conflict --------- Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>	2025-09-06 17:56:34 +03:00
Wei S.	22b173cbe4	Feat: Implement API key authentication (#84 ) * feat(auth): Implement API key authentication This commit enables API access with an API key system. This change provides a better experience for programmatic access and third-party integrations. Key changes include: - API Key Management: Users can now generate, manage, and revoke persistent API keys through a new "API Keys" section in the settings UI. - Authentication Middleware: API requests are now authenticated via an `X-API-KEY` header instead of the previous `Authorization: Bearer` token. - Backend Implementation: Adds a new `api_keys` database table, along with corresponding services, controllers, and routes to manage the key lifecycle securely. - Rate Limiting: The API rate limiter now uses the API key to identify and track requests. - Documentation: The API authentication documentation has been updated to reflect the new method. * Add configurable API rate limiting Two new variables are added to `.env.example`: - `RATE_LIMIT_WINDOW_MS`: The time window in milliseconds for which requests are checked (defaults to 15 minutes). - `RATE_LIMIT_MAX_REQUESTS`: The maximum number of requests allowed from an IP within the window (defaults to 100). The installation documentation has been updated to reflect these new configuration options. --------- Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>	2025-09-04 15:07:53 +03:00
Wei S.	774b0d7a6b	Bug fix: Status API response: needsSetup and Remove SUPER_API_KEY support (#83 ) * Disable system settings for demo mode * Status API response: needsSetup * Remove SUPER_API_KEY support --------- Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>	2025-09-03 16:30:06 +03:00
Wayne	82a83a71e4	BODY_SIZE_LIMIT fix, database url encode	2025-08-13 21:55:22 +03:00
Wayne	b03791d9a6	Add FRONTEND_BODY_SIZE_LIMIT to allow larger file uploads in the frontend. This fixes the PST file upload error.	2025-08-13 19:20:19 +03:00
Wayne	4872ed597f	PST ingestion	2025-08-07 17:03:08 +03:00
Wayne	3201fbfe0b	Email thread improvement, user-defined sync frequency	2025-08-05 21:12:06 +03:00
Wayne	5217d24184	Docker Compose deployment	2025-07-25 16:29:09 +03:00
Wayne	8c12cda370	Docker Compose deployment	2025-07-25 15:50:25 +03:00
Wayne	946da7925b	Docker deployment	2025-07-24 23:43:38 +03:00
Wayne	9b25c8b9d3	Rename: Open Archiver	2025-07-15 00:48:40 +03:00
Wayne	7ed8d78d73	Ingestion service credentials encryption. Ingestion auth handling	2025-07-11 17:39:21 +03:00
Wayne	59e5c5c69b	Pluggable storage service (local + S3 compatible)	2025-07-11 16:21:40 +03:00
Wayne	3eb155ee16	Database migration. Adding ingestion service	2025-07-11 13:36:34 +03:00
Wayne	cc08f35ada	Admin user login support	2025-07-10 22:32:12 +03:00
Wayne	f243775ae6	scaffolding	2025-07-10 13:32:54 +03:00

21 Commits
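
Commit 6e1ebbbfd7 describes the integrity report as storing a SHA256 hash for each archived `.eml` file and attachment, then recalculating the hashes later and comparing them with the stored values. The following is a minimal TypeScript sketch of that comparison step, assuming Node's built-in `node:crypto` module. The `StoredFile` shape and the `sha256Hex` and `verifyIntegrity` names are hypothetical illustrations and are not taken from the Open Archiver codebase.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical record shape: the file bytes plus the hash saved at archive time.
interface StoredFile {
	name: string;
	content: Buffer;
	storedSha256: string;
}

// Hex-encoded SHA-256 digest of a buffer.
function sha256Hex(data: Buffer): string {
	return createHash('sha256').update(data).digest('hex');
}

// Recompute each file's hash and compare it with the stored value.
function verifyIntegrity(files: StoredFile[]): { name: string; ok: boolean }[] {
	return files.map((file) => ({
		name: file.name,
		ok: sha256Hex(file.content) === file.storedSha256,
	}));
}

// Usage: one intact file, and one whose bytes changed after archiving.
const original = Buffer.from('Subject: hello\r\n\r\nbody');
const report = verifyIntegrity([
	{ name: 'intact.eml', content: original, storedSha256: sha256Hex(original) },
	{
		name: 'tampered.eml',
		content: Buffer.from('Subject: hello\r\n\r\nbody!'),
		storedSha256: sha256Hex(original),
	},
]);
console.log(report);
```

In this sketch a mismatch only marks the file as failed. How the real report distinguishes alteration from corruption or a missing file is not stated in the commit message.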