feat: Add Mbox ingestion (#117)

This commit introduces two major features:

1.  **Mbox File Ingestion:**
    Users can now ingest emails from Mbox files (`.mbox`). A new Mbox connector has been implemented on the backend, and the user interface has been updated to support creating Mbox ingestion sources. Documentation for this new provider has also been added.

Additionally, this commit includes new documentation for upgrading and migrating Open Archiver.

Co-authored-by: Wayne <5291640+ringoinca@users.noreply.github.com>
This commit is contained in:
Wei S.
2025-09-16 20:30:22 +03:00
committed by GitHub
parent ce3f379b7a
commit e9a65f9672
27 changed files with 1979 additions and 212 deletions

View File

@@ -46,12 +46,14 @@ Password: openarchiver_demo
- Microsoft 365
- PST files
- Zipped .eml files
- Mbox files
- **Secure & Efficient Storage**: Emails are stored in the standard `.eml` format. The system uses deduplication and compression to minimize storage costs. All data is encrypted at rest.
- **Pluggable Storage Backends**: Support both local filesystem storage and S3-compatible object storage (like AWS S3 or MinIO).
- **Powerful Search & eDiscovery**: A high-performance search engine indexes the full text of emails and attachments (PDF, DOCX, etc.).
- **Thread discovery**: The ability to discover if an email belongs to a thread/conversation and present the context.
- **Compliance & Retention**: Define granular retention policies to automatically manage the lifecycle of your data. Place legal holds on communications to prevent deletion during litigation (TBD).
- **File Hash and Encryption**: Email and attachment file hash values are stored in the meta database upon ingestion, meaning any attempt to alter the file content will be identified, ensuring legal and regulatory compliance.
- **Comprehensive Auditing**: An immutable audit trail logs all system activities, ensuring you have a clear record of who accessed what and when (TBD).
## 🛠️ Tech Stack

View File

@@ -52,6 +52,7 @@ export default defineConfig({
},
{ text: 'EML Import', link: '/user-guides/email-providers/eml' },
{ text: 'PST Import', link: '/user-guides/email-providers/pst' },
{ text: 'Mbox Import', link: '/user-guides/email-providers/mbox' },
],
},
{
@@ -64,6 +65,20 @@ export default defineConfig({
},
],
},
{
text: 'Upgrading and Migration',
collapsed: true,
items: [
{
text: 'Upgrading',
link: '/user-guides/upgrade-and-migration/upgrade',
},
{
text: 'Meilisearch Upgrade',
link: '/user-guides/upgrade-and-migration/meilisearch-upgrade',
},
],
},
],
},
{

View File

@@ -9,3 +9,4 @@ Choose your provider from the list below to get started:
- [Generic IMAP Server](./imap.md)
- [EML Import](./eml.md)
- [PST Import](./pst.md)
- [Mbox Import](./mbox.md)

View File

@@ -0,0 +1,29 @@
# Mbox Ingestion
Mbox is a common format for storing email messages. This guide will walk you through the process of ingesting mbox files into OpenArchiver.
## 1. Exporting from Your Email Client
Most email clients that support mbox exports will allow you to export a folder of emails as a single `.mbox` file. Here are the general steps:
- **Mozilla Thunderbird**: Right-click on a folder, select **ImportExportTools NG**, and then choose **Export folder**.
- **Gmail**: You can use Google Takeout to export your emails in mbox format.
- **Other Clients**: Refer to your email client's documentation for instructions on how to export emails to an mbox file.
## 2. Uploading to OpenArchiver
Once you have your `.mbox` file, you can upload it to OpenArchiver through the web interface.
1. Navigate to the **Ingestion** page.
2. Click on the **New Ingestion** button.
3. Select **Mbox** as the source type.
4. Upload your `.mbox` file.
## 3. Folder Structure
OpenArchiver will attempt to preserve the original folder structure of your emails. This is done by inspecting the following email headers:
- `X-Gmail-Labels`: Used by Gmail to store labels.
- `X-Folder`: A custom header used by some email clients like Thunderbird.
If neither of these headers is present, the emails will be ingested into the root of the archive.

View File

@@ -138,7 +138,9 @@ docker compose ps
Once the services are running, you can access the Open Archiver web interface by navigating to `http://localhost:3000` in your web browser.
You can log in with the `ADMIN_EMAIL` and `ADMIN_PASSWORD` you configured in your `.env` file.
Upon first visit, you will be redirected to the `/setup` page where you can set up your admin account. Make sure you are the first person who accesses the instance.
If you are not redirected to the `/setup` page but instead see the login page, there might be something wrong with the database. Restart the service and try again.
## 5. Next Steps
@@ -212,9 +214,9 @@ If you are using local storage to store your emails, based on your `docker-compo
Run this command to see all the volumes on your system:
```bash
docker volume ls
```
```bash
docker volume ls
```
2. **Identify the correct volume**:
@@ -224,28 +226,28 @@ Look through the list for a volume name that ends with `_archiver-data`. The par
Once you've identified the correct volume name, use it in the `inspect` command. For example:
```bash
docker volume inspect <your_volume_name_here>
```
```bash
docker volume inspect <your_volume_name_here>
```
This will give you the correct `Mountpoint` path where your data is being stored. It will look something like this (the exact path will vary depending on your system):
```json
{
"CreatedAt": "2025-07-25T11:22:19Z",
"Driver": "local",
"Labels": {
"com.docker.compose.config-hash": "---",
"com.docker.compose.project": "---",
"com.docker.compose.version": "2.38.2",
"com.docker.compose.volume": "us8wwos0o4ok4go4gc8cog84_archiver-data"
},
"Mountpoint": "/var/lib/docker/volumes/us8wwos0o4ok4go4gc8cog84_archiver-data/_data",
"Name": "us8wwos0o4ok4go4gc8cog84_archiver-data",
"Options": null,
"Scope": "local"
}
```
```json
{
"CreatedAt": "2025-07-25T11:22:19Z",
"Driver": "local",
"Labels": {
"com.docker.compose.config-hash": "---",
"com.docker.compose.project": "---",
"com.docker.compose.version": "2.38.2",
"com.docker.compose.volume": "us8wwos0o4ok4go4gc8cog84_archiver-data"
},
"Mountpoint": "/var/lib/docker/volumes/us8wwos0o4ok4go4gc8cog84_archiver-data/_data",
"Name": "us8wwos0o4ok4go4gc8cog84_archiver-data",
"Options": null,
"Scope": "local"
}
```
In this example, the data is located at `/var/lib/docker/volumes/us8wwos0o4ok4go4gc8cog84_archiver-data/_data`. You can then `cd` into that directory to see your files.
@@ -259,44 +261,44 @@ Heres how you can do it:
Open the `docker-compose.yml` file and find the `open-archiver` service. You're going to change the `volumes` section.
**Change this:**
**Change this:**
```yaml
services:
open-archiver:
# ... other config
volumes:
- archiver-data:/var/data/open-archiver
```
```yaml
services:
open-archiver:
# ... other config
volumes:
- archiver-data:/var/data/open-archiver
```
**To this:**
**To this:**
```yaml
services:
open-archiver:
# ... other config
volumes:
- ./data/open-archiver:/var/data/open-archiver
```
```yaml
services:
open-archiver:
# ... other config
volumes:
- ./data/open-archiver:/var/data/open-archiver
```
You'll also want to remove the `archiver-data` volume definition at the bottom of the file, since it's no longer needed.
**Remove this whole block:**
**Remove this whole block:**
```yaml
volumes:
# ... other volumes
archiver-data:
driver: local
```
```yaml
volumes:
# ... other volumes
archiver-data:
driver: local
```
2. **Restart your containers**:
After you've saved the changes, run the following command in your terminal to apply them. The `--force-recreate` flag will ensure the container is recreated with the new volume settings.
```bash
docker-compose up -d --force-recreate
```
```bash
docker-compose up -d --force-recreate
```
After this, any new data will be saved directly into the `./data/open-archiver` folder in your project directory.

View File

@@ -0,0 +1,93 @@
# Upgrading Meilisearch
Meilisearch, the search engine used by Open Archiver, requires a manual data migration process when upgrading to a new version. This is because Meilisearch databases are only compatible with the specific version that created them.
If an Open Archiver upgrade includes a major Meilisearch version change, you will need to migrate your search index by following the process below.
## Migration Process Overview
For self-hosted instances using Docker Compose (as recommended), the migration process involves creating a data dump from your current Meilisearch instance, upgrading the Docker image, and then importing that dump into the new version.
### Step 1: Create a Dump
Before upgrading, you must create a dump of your existing Meilisearch data. You can do this by sending a POST request to the `/dumps` endpoint of the Meilisearch API.
1. **Find your Meilisearch container name**:
```bash
docker compose ps
```
Look for the service name that corresponds to Meilisearch, usually `meilisearch`.
2. **Execute the dump command**:
You will need your Meilisearch Admin API key, which can be found in your `.env` file as `MEILI_MASTER_KEY`.
```bash
curl -X POST 'http://localhost:7700/dumps' \
-H "Authorization: Bearer YOUR_MEILI_MASTER_KEY"
```
This will start the dump creation process. The dump file will be created inside the `meili_data` volume used by the Meilisearch container.
3. **Monitor the dump status**:
The dump creation request returns a `taskUid`. You can use this to check the status of the dump.
For more details on dump and import, see the [official Meilisearch documentation](https://www.meilisearch.com/docs/learn/update_and_migration/updating).
### Step 2: Upgrade Your Open Archiver Instance
Once the dump is successfully created, you can proceed with the standard Open Archiver upgrade process.
1. **Pull the latest changes and Docker images**:
```bash
git pull
docker compose pull
```
2. **Stop the running services**:
```bash
docker compose down
```
### Step 3: Import the Dump
Now, you need to restart the services while telling Meilisearch to import from your dump file.
1. **Modify `docker-compose.yml`**:
You need to temporarily add the `--import-dump` flag to the Meilisearch service command. Find the `meilisearch` service in your `docker-compose.yml` and modify the `command` section.
You will need the name of your dump file. It will be a `.dump` file located in the directory mapped to `/meili_data` inside the container.
```yaml
services:
meilisearch:
# ... other service config
command:
[
'--master-key=${MEILI_MASTER_KEY}',
'--env=production',
'--import-dump=/meili_data/dumps/YOUR_DUMP_FILE.dump',
]
```
2. **Restart the services**:
```bash
docker compose up -d
```
Meilisearch will now start and import the data from the dump file. This may take some time depending on the size of your index.
### Step 4: Clean Up
Once the import is complete and you have verified that your search is working correctly, you should remove the `--import-dump` flag from your `docker-compose.yml` to prevent it from running on every startup.
1. **Remove the `--import-dump` line** from the `command` section of the `meilisearch` service in `docker-compose.yml`.
2. **Restart the services** one last time:
```bash
docker compose up -d
```
Your Meilisearch instance is now upgraded and running with your migrated data.
For more advanced scenarios or troubleshooting, please refer to the **[official Meilisearch migration guide](https://www.meilisearch.com/docs/learn/update_and_migration/updating)**.

View File

@@ -0,0 +1,42 @@
# Upgrading Your Instance
This guide provides instructions for upgrading your Open Archiver instance to the latest version.
## Checking for New Versions
Open Archiver automatically checks for new versions and will display a notification in the footer of the web interface when an update is available. You can find a list of all releases and their release notes on the [GitHub Releases](https://github.com/LogicLabs-OU/OpenArchiver/releases) page.
## Upgrading Your Instance
To upgrade your Open Archiver instance, follow these steps:
1. **Pull the latest changes from the repository**:
```bash
git pull
```
2. **Pull the latest Docker images**:
```bash
docker compose pull
```
3. **Restart the services with the new images**:
```bash
docker compose up -d
```
This will restart your Open Archiver instance with the latest version of the application.
## Migrating Data
When you upgrade to a new version, database migrations are applied automatically when the application starts up. This ensures that your database schema is always up-to-date with the latest version of the application.
No manual intervention is required for database migrations.
## Upgrading Meilisearch
When an Open Archiver update includes a major version change for Meilisearch, you will need to manually migrate your search data. This process is not covered by the standard upgrade commands.
For detailed instructions, please see the [Meilisearch Upgrade Guide](./meilisearch-upgrade.md).

77
open-archiver.yml Normal file
View File

@@ -0,0 +1,77 @@
# documentation: https://openarchiver.com
# slogan: A self-hosted, open-source email archiving solution with full-text search capability.
# tags: email archiving,email,compliance,search
# logo: svgs/openarchiver.svg
# port: 3000
services:
open-archiver:
image: logiclabshq/open-archiver:latest
environment:
- SERVICE_URL_3000
- SERVICE_URL=${SERVICE_URL_3000}
- PORT_BACKEND=${PORT_BACKEND:-4000}
- PORT_FRONTEND=${PORT_FRONTEND:-3000}
- NODE_ENV=${NODE_ENV:-production}
- SYNC_FREQUENCY=${SYNC_FREQUENCY:-* * * * *}
- POSTGRES_DB=${POSTGRES_DB:-open_archive}
- POSTGRES_USER=${POSTGRES_USER:-admin}
- POSTGRES_PASSWORD=${SERVICE_PASSWORD_POSTGRES}
- DATABASE_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}
- MEILI_MASTER_KEY=${SERVICE_PASSWORD_MEILISEARCH}
- MEILI_HOST=http://meilisearch:7700
- REDIS_HOST=valkey
- REDIS_PORT=6379
- REDIS_PASSWORD=${SERVICE_PASSWORD_VALKEY}
- REDIS_TLS_ENABLED=false
- STORAGE_TYPE=${STORAGE_TYPE:-local}
- STORAGE_LOCAL_ROOT_PATH=${STORAGE_LOCAL_ROOT_PATH:-/var/data/open-archiver}
- BODY_SIZE_LIMIT=${BODY_SIZE_LIMIT:-100M}
- STORAGE_S3_ENDPOINT=${STORAGE_S3_ENDPOINT}
- STORAGE_S3_BUCKET=${STORAGE_S3_BUCKET}
- STORAGE_S3_ACCESS_KEY_ID=${STORAGE_S3_ACCESS_KEY_ID}
- STORAGE_S3_SECRET_ACCESS_KEY=${STORAGE_S3_SECRET_ACCESS_KEY}
- STORAGE_S3_REGION=${STORAGE_S3_REGION}
- STORAGE_S3_FORCE_PATH_STYLE=${STORAGE_S3_FORCE_PATH_STYLE:-false}
- JWT_SECRET=${SERVICE_BASE64_128_JWT}
- JWT_EXPIRES_IN=${JWT_EXPIRES_IN:-7d}
- ENCRYPTION_KEY=${SERVICE_BASE64_64_ENCRYPTIONKEY}
- RATE_LIMIT_WINDOW_MS=${RATE_LIMIT_WINDOW_MS:-60000}
- RATE_LIMIT_MAX_REQUESTS=${RATE_LIMIT_MAX_REQUESTS:-100}
volumes:
- archiver-data:/var/data/open-archiver
depends_on:
postgres:
condition: service_healthy
valkey:
condition: service_started
meilisearch:
condition: service_started
postgres:
image: postgres:17-alpine
environment:
- POSTGRES_DB=${POSTGRES_DB}
- POSTGRES_USER=${POSTGRES_USER}
- POSTGRES_PASSWORD=${SERVICE_PASSWORD_POSTGRES}
- LC_ALL=C
volumes:
- pgdata:/var/lib/postgresql/data
healthcheck:
test: ['CMD-SHELL', 'pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}']
interval: 10s
timeout: 20s
retries: 10
valkey:
image: valkey/valkey:8-alpine
command: valkey-server --requirepass ${SERVICE_PASSWORD_VALKEY}
volumes:
- valkeydata:/data
meilisearch:
image: getmeili/meilisearch:v1.15
environment:
- MEILI_MASTER_KEY=${SERVICE_PASSWORD_MEILISEARCH}
volumes:
- meilidata:/meili_data

View File

@@ -1,6 +1,6 @@
{
"name": "open-archiver",
"version": "0.3.1",
"version": "0.3.3",
"private": true,
"scripts": {
"dev": "dotenv -- pnpm --filter \"./packages/*\" --parallel dev",

View File

@@ -7,6 +7,7 @@ import { config } from '../../config/index';
export const uploadFile = async (req: Request, res: Response) => {
const storage = new StorageService();
const bb = busboy({ headers: req.headers });
const uploads: Promise<void>[] = [];
let filePath = '';
let originalFilename = '';
@@ -14,10 +15,11 @@ export const uploadFile = async (req: Request, res: Response) => {
originalFilename = filename.filename;
const uuid = randomUUID();
filePath = `${config.storage.openArchiverFolderName}/tmp/${uuid}-${originalFilename}`;
storage.put(filePath, file);
uploads.push(storage.put(filePath, file));
});
bb.on('finish', () => {
bb.on('finish', async () => {
await Promise.all(uploads);
res.json({ filePath });
});

View File

@@ -2,6 +2,7 @@ import pino from 'pino';
export const logger = pino({
level: process.env.LOG_LEVEL || 'info',
redact: ['password'],
transport: {
target: 'pino-pretty',
options: {

View File

@@ -0,0 +1 @@
ALTER TYPE "public"."ingestion_provider" ADD VALUE 'mbox_import';

File diff suppressed because it is too large Load Diff

View File

@@ -1,146 +1,153 @@
{
"version": "7",
"dialect": "postgresql",
"entries": [
{
"idx": 0,
"version": "7",
"when": 1752225352591,
"tag": "0000_amusing_namora",
"breakpoints": true
},
{
"idx": 1,
"version": "7",
"when": 1752326803882,
"tag": "0001_odd_night_thrasher",
"breakpoints": true
},
{
"idx": 2,
"version": "7",
"when": 1752332648392,
"tag": "0002_lethal_quentin_quire",
"breakpoints": true
},
{
"idx": 3,
"version": "7",
"when": 1752332967084,
"tag": "0003_petite_wrecker",
"breakpoints": true
},
{
"idx": 4,
"version": "7",
"when": 1752606108876,
"tag": "0004_sleepy_paper_doll",
"breakpoints": true
},
{
"idx": 5,
"version": "7",
"when": 1752606327253,
"tag": "0005_chunky_sue_storm",
"breakpoints": true
},
{
"idx": 6,
"version": "7",
"when": 1753112018514,
"tag": "0006_majestic_caretaker",
"breakpoints": true
},
{
"idx": 7,
"version": "7",
"when": 1753190159356,
"tag": "0007_handy_archangel",
"breakpoints": true
},
{
"idx": 8,
"version": "7",
"when": 1753370737317,
"tag": "0008_eminent_the_spike",
"breakpoints": true
},
{
"idx": 9,
"version": "7",
"when": 1754337938241,
"tag": "0009_late_lenny_balinger",
"breakpoints": true
},
{
"idx": 10,
"version": "7",
"when": 1754420780849,
"tag": "0010_perpetual_lightspeed",
"breakpoints": true
},
{
"idx": 11,
"version": "7",
"when": 1754422064158,
"tag": "0011_tan_blackheart",
"breakpoints": true
},
{
"idx": 12,
"version": "7",
"when": 1754476962901,
"tag": "0012_warm_the_stranger",
"breakpoints": true
},
{
"idx": 13,
"version": "7",
"when": 1754659373517,
"tag": "0013_classy_talkback",
"breakpoints": true
},
{
"idx": 14,
"version": "7",
"when": 1754831765718,
"tag": "0014_foamy_vapor",
"breakpoints": true
},
{
"idx": 15,
"version": "7",
"when": 1755443936046,
"tag": "0015_wakeful_norman_osborn",
"breakpoints": true
},
{
"idx": 16,
"version": "7",
"when": 1755780572342,
"tag": "0016_lonely_mariko_yashida",
"breakpoints": true
},
{
"idx": 17,
"version": "7",
"when": 1755961566627,
"tag": "0017_tranquil_shooting_star",
"breakpoints": true
},
{
"idx": 18,
"version": "7",
"when": 1756911118035,
"tag": "0018_flawless_owl",
"breakpoints": true
},
{
"idx": 19,
"version": "7",
"when": 1756937533843,
"tag": "0019_confused_scream",
"breakpoints": true
}
]
}
"version": "7",
"dialect": "postgresql",
"entries": [
{
"idx": 0,
"version": "7",
"when": 1752225352591,
"tag": "0000_amusing_namora",
"breakpoints": true
},
{
"idx": 1,
"version": "7",
"when": 1752326803882,
"tag": "0001_odd_night_thrasher",
"breakpoints": true
},
{
"idx": 2,
"version": "7",
"when": 1752332648392,
"tag": "0002_lethal_quentin_quire",
"breakpoints": true
},
{
"idx": 3,
"version": "7",
"when": 1752332967084,
"tag": "0003_petite_wrecker",
"breakpoints": true
},
{
"idx": 4,
"version": "7",
"when": 1752606108876,
"tag": "0004_sleepy_paper_doll",
"breakpoints": true
},
{
"idx": 5,
"version": "7",
"when": 1752606327253,
"tag": "0005_chunky_sue_storm",
"breakpoints": true
},
{
"idx": 6,
"version": "7",
"when": 1753112018514,
"tag": "0006_majestic_caretaker",
"breakpoints": true
},
{
"idx": 7,
"version": "7",
"when": 1753190159356,
"tag": "0007_handy_archangel",
"breakpoints": true
},
{
"idx": 8,
"version": "7",
"when": 1753370737317,
"tag": "0008_eminent_the_spike",
"breakpoints": true
},
{
"idx": 9,
"version": "7",
"when": 1754337938241,
"tag": "0009_late_lenny_balinger",
"breakpoints": true
},
{
"idx": 10,
"version": "7",
"when": 1754420780849,
"tag": "0010_perpetual_lightspeed",
"breakpoints": true
},
{
"idx": 11,
"version": "7",
"when": 1754422064158,
"tag": "0011_tan_blackheart",
"breakpoints": true
},
{
"idx": 12,
"version": "7",
"when": 1754476962901,
"tag": "0012_warm_the_stranger",
"breakpoints": true
},
{
"idx": 13,
"version": "7",
"when": 1754659373517,
"tag": "0013_classy_talkback",
"breakpoints": true
},
{
"idx": 14,
"version": "7",
"when": 1754831765718,
"tag": "0014_foamy_vapor",
"breakpoints": true
},
{
"idx": 15,
"version": "7",
"when": 1755443936046,
"tag": "0015_wakeful_norman_osborn",
"breakpoints": true
},
{
"idx": 16,
"version": "7",
"when": 1755780572342,
"tag": "0016_lonely_mariko_yashida",
"breakpoints": true
},
{
"idx": 17,
"version": "7",
"when": 1755961566627,
"tag": "0017_tranquil_shooting_star",
"breakpoints": true
},
{
"idx": 18,
"version": "7",
"when": 1756911118035,
"tag": "0018_flawless_owl",
"breakpoints": true
},
{
"idx": 19,
"version": "7",
"when": 1756937533843,
"tag": "0019_confused_scream",
"breakpoints": true
},
{
"idx": 20,
"version": "7",
"when": 1757860242528,
"tag": "0020_panoramic_wolverine",
"breakpoints": true
}
]
}

View File

@@ -8,6 +8,7 @@ export const ingestionProviderEnum = pgEnum('ingestion_provider', [
'generic_imap',
'pst_import',
'eml_import',
'mbox_import',
]);
export const ingestionStatusEnum = pgEnum('ingestion_status', [

View File

@@ -5,6 +5,7 @@ import type {
GenericImapCredentials,
PSTImportCredentials,
EMLImportCredentials,
MboxImportCredentials,
EmailObject,
SyncState,
MailboxUser,
@@ -14,6 +15,7 @@ import { MicrosoftConnector } from './ingestion-connectors/MicrosoftConnector';
import { ImapConnector } from './ingestion-connectors/ImapConnector';
import { PSTConnector } from './ingestion-connectors/PSTConnector';
import { EMLConnector } from './ingestion-connectors/EMLConnector';
import { MboxConnector } from './ingestion-connectors/MboxConnector';
// Define a common interface for all connectors
export interface IEmailConnector {
@@ -43,6 +45,8 @@ export class EmailProviderFactory {
return new PSTConnector(credentials as PSTImportCredentials);
case 'eml_import':
return new EMLConnector(credentials as EMLImportCredentials);
case 'mbox_import':
return new MboxConnector(credentials as MboxImportCredentials);
default:
throw new Error(`Unsupported provider: ${source.provider}`);
}

View File

@@ -26,6 +26,7 @@ import { SearchService } from './SearchService';
import { DatabaseService } from './DatabaseService';
import { config } from '../config/index';
import { FilterBuilder } from './FilterBuilder';
import e from 'express';
export class IngestionService {
private static decryptSource(
@@ -47,7 +48,7 @@ export class IngestionService {
}
public static returnFileBasedIngestions(): IngestionProvider[] {
return ['pst_import', 'eml_import'];
return ['pst_import', 'eml_import', 'mbox_import'];
}
public static async create(
@@ -76,9 +77,13 @@ export class IngestionService {
const connector = EmailProviderFactory.createConnector(decryptedSource);
try {
await connector.testConnection();
const connectionValid = await connector.testConnection();
// If connection succeeds, update status to auth_success, which triggers the initial import.
return await this.update(decryptedSource.id, { status: 'auth_success' });
if (connectionValid) {
return await this.update(decryptedSource.id, { status: 'auth_success' });
} else {
throw Error('Ingestion authentication failed.')
}
} catch (error) {
// If connection fails, delete the newly created source and throw the error.
await this.delete(decryptedSource.id);

View File

@@ -69,7 +69,7 @@ export class EMLConnector implements IEmailConnector {
syncState?: SyncState | null
): AsyncGenerator<EmailObject | null> {
const fileStream = await this.storage.get(this.credentials.uploadedFilePath);
const tempDir = await fs.mkdtemp(join('/tmp', 'eml-import-'));
const tempDir = await fs.mkdtemp(join('/tmp', `eml-import-${new Date().getTime()}`));
const unzippedPath = join(tempDir, 'unzipped');
await fs.mkdir(unzippedPath);
const zipFilePath = join(tempDir, 'eml.zip');
@@ -115,6 +115,14 @@ export class EMLConnector implements IEmailConnector {
throw error;
} finally {
await fs.rm(tempDir, { recursive: true, force: true });
try {
await this.storage.delete(this.credentials.uploadedFilePath);
} catch (error) {
logger.error(
{ error, file: this.credentials.uploadedFilePath },
'Failed to delete EML file after processing.'
);
}
}
}

View File

@@ -0,0 +1,174 @@
import type {
MboxImportCredentials,
EmailObject,
EmailAddress,
SyncState,
MailboxUser,
} from '@open-archiver/types';
import type { IEmailConnector } from '../EmailProviderFactory';
import { simpleParser, ParsedMail, Attachment, AddressObject } from 'mailparser';
import { logger } from '../../config/logger';
import { getThreadId } from './helpers/utils';
import { StorageService } from '../StorageService';
import { Readable } from 'stream';
import { createHash } from 'crypto';
import { streamToBuffer } from '../../helpers/streamToBuffer';
export class MboxConnector implements IEmailConnector {
private storage: StorageService;
constructor(private credentials: MboxImportCredentials) {
this.storage = new StorageService();
}
public async testConnection(): Promise<boolean> {
try {
if (!this.credentials.uploadedFilePath) {
throw Error('Mbox file path not provided.');
}
if (!this.credentials.uploadedFilePath.includes('.mbox')) {
throw Error('Provided file is not in the MBOX format.');
}
const fileExist = await this.storage.exists(this.credentials.uploadedFilePath);
if (!fileExist) {
throw Error('Mbox file upload not finished yet, please wait.');
}
return true;
} catch (error) {
logger.error({ error, credentials: this.credentials }, 'Mbox file validation failed.');
throw error;
}
}
public async *listAllUsers(): AsyncGenerator<MailboxUser> {
const displayName =
this.credentials.uploadedFileName || `mbox-import-${new Date().getTime()}`;
logger.info(`Found potential mailbox: ${displayName}`);
const constructedPrimaryEmail = `${displayName.replace(/ /g, '.').toLowerCase()}@mbox.local`;
yield {
id: constructedPrimaryEmail,
primaryEmail: constructedPrimaryEmail,
displayName: displayName,
};
}
public async *fetchEmails(
userEmail: string,
syncState?: SyncState | null
): AsyncGenerator<EmailObject | null> {
try {
const fileStream = await this.storage.get(this.credentials.uploadedFilePath);
const fileBuffer = await streamToBuffer(fileStream as Readable);
const mboxContent = fileBuffer.toString('utf-8');
const emailDelimiter = '\nFrom ';
const emails = mboxContent.split(emailDelimiter);
// The first split part might be empty or part of the first email's header, so we adjust.
if (emails.length > 0 && !mboxContent.startsWith('From ')) {
emails.shift(); // Adjust if the file doesn't start with "From "
}
logger.info(`Found ${emails.length} potential emails in the mbox file.`);
let emailCount = 0;
for (const email of emails) {
try {
// Re-add the "From " delimiter for the parser, except for the very first email
const emailWithDelimiter =
emailCount > 0 || mboxContent.startsWith('From ') ? `From ${email}` : email;
const emailBuffer = Buffer.from(emailWithDelimiter, 'utf-8');
const emailObject = await this.parseMessage(emailBuffer, '');
yield emailObject;
emailCount++;
} catch (error) {
logger.error(
{ error, file: this.credentials.uploadedFilePath },
'Failed to process a single message from mbox file. Skipping.'
);
}
}
logger.info(`Finished processing mbox file. Total emails processed: ${emailCount}`);
} finally {
try {
await this.storage.delete(this.credentials.uploadedFilePath);
} catch (error) {
logger.error(
{ error, file: this.credentials.uploadedFilePath },
'Failed to delete mbox file after processing.'
);
}
}
}
private async parseMessage(emlBuffer: Buffer, path: string): Promise<EmailObject> {
const parsedEmail: ParsedMail = await simpleParser(emlBuffer);
const attachments = parsedEmail.attachments.map((attachment: Attachment) => ({
filename: attachment.filename || 'untitled',
contentType: attachment.contentType,
size: attachment.size,
content: attachment.content as Buffer,
}));
const mapAddresses = (
addresses: AddressObject | AddressObject[] | undefined
): EmailAddress[] => {
if (!addresses) return [];
const addressArray = Array.isArray(addresses) ? addresses : [addresses];
return addressArray.flatMap((a) =>
a.value.map((v) => ({
name: v.name,
address: v.address?.replaceAll(`'`, '') || '',
}))
);
};
const threadId = getThreadId(parsedEmail.headers);
let messageId = parsedEmail.messageId;
if (!messageId) {
messageId = `generated-${createHash('sha256').update(emlBuffer).digest('hex')}`;
}
const from = mapAddresses(parsedEmail.from);
if (from.length === 0) {
from.push({ name: 'No Sender', address: 'No Sender' });
}
// Extract folder path from headers. Mbox files don't have a standard folder structure, so we rely on custom headers added by email clients.
// Gmail uses 'X-Gmail-Labels', and other clients like Thunderbird may use 'X-Folder'.
const gmailLabels = parsedEmail.headers.get('x-gmail-labels');
const folderHeader = parsedEmail.headers.get('x-folder');
let finalPath = '';
if (gmailLabels && typeof gmailLabels === 'string') {
// We take the first label as the primary folder.
// Gmail labels can be hierarchical, but we'll simplify to the first label.
finalPath = gmailLabels.split(',')[0];
} else if (folderHeader && typeof folderHeader === 'string') {
finalPath = folderHeader;
}
return {
id: messageId,
threadId: threadId,
from,
to: mapAddresses(parsedEmail.to),
cc: mapAddresses(parsedEmail.cc),
bcc: mapAddresses(parsedEmail.bcc),
subject: parsedEmail.subject || '',
body: parsedEmail.text || '',
html: parsedEmail.html || '',
headers: parsedEmail.headers,
attachments,
receivedAt: parsedEmail.date || new Date(),
eml: emlBuffer,
path: finalPath,
};
}
public getUpdatedSyncState(): SyncState {
return {};
}
}

View File

@@ -193,6 +193,14 @@ export class PSTConnector implements IEmailConnector {
throw error;
} finally {
pstFile?.close();
try {
await this.storage.delete(this.credentials.uploadedFilePath);
} catch (error) {
logger.error(
{ error, file: this.credentials.uploadedFilePath },
'Failed to delete PST file after processing.'
);
}
}
}
@@ -273,8 +281,8 @@ export class PSTConnector implements IEmailConnector {
emlBuffer ?? Buffer.from(parsedEmail.text || parsedEmail.html || '', 'utf-8')
)
.digest('hex')}-${createHash('sha256')
.update(emlBuffer ?? Buffer.from(msg.subject || '', 'utf-8'))
.digest('hex')}-${msg.clientSubmitTime?.getTime()}`;
.update(emlBuffer ?? Buffer.from(msg.subject || '', 'utf-8'))
.digest('hex')}-${msg.clientSubmitTime?.getTime()}`;
}
return {
id: messageId,

View File

@@ -19,6 +19,7 @@
"bits-ui": "^2.8.10",
"clsx": "^2.1.1",
"d3-shape": "^3.2.0",
"html-entities": "^2.6.0",
"jose": "^6.0.1",
"lucide-svelte": "^0.525.0",
"postal-mime": "^2.4.4",

View File

@@ -2,6 +2,7 @@
import PostalMime, { type Email } from 'postal-mime';
import type { Buffer } from 'buffer';
import { t } from '$lib/translations';
import { encode } from 'html-entities';
let {
raw,
@@ -18,7 +19,9 @@
if (parsedEmail && parsedEmail.html) {
return `<base target="_blank" />${parsedEmail.html}`;
} else if (parsedEmail && parsedEmail.text) {
return `<base target="_blank" />${parsedEmail.text}`;
// display raw text email body in html
const safeHtmlContent: string = encode(parsedEmail.text);
return `<base target="_blank" /><div>${safeHtmlContent.replaceAll('\n', '<br>')}</div>`;
} else if (rawHtml) {
return `<base target="_blank" />${rawHtml}`;
}
@@ -53,7 +56,7 @@
<div class="mt-2 rounded-md border bg-white p-4">
{#if isLoading}
<p>{$t('app.components.email_preview.loading')}</p>
{:else if emailHtml}
{:else if emailHtml()}
<iframe
title={$t('app.archive.email_preview')}
srcdoc={emailHtml()}

View File

@@ -41,6 +41,10 @@
value: 'eml_import',
label: $t('app.components.ingestion_source_form.provider_eml_import'),
},
{
value: 'mbox_import',
label: $t('app.components.ingestion_source_form.provider_mbox_import'),
},
];
let formData: CreateIngestionSourceDto = $state({
@@ -55,7 +59,6 @@
$effect(() => {
formData.providerConfig.type = formData.provider;
console.log(formData);
});
const triggerContent = $derived(
@@ -101,7 +104,6 @@
formData.providerConfig.uploadedFilePath = result.filePath;
formData.providerConfig.uploadedFileName = file.name;
fileUploading = false;
} catch (error) {
fileUploading = false;
@@ -224,10 +226,13 @@
<Checkbox id="secure" bind:checked={formData.providerConfig.secure} />
</div>
<div class="grid grid-cols-4 items-center gap-4">
<Label for="secure" class="text-left"
<Label for="allowInsecureCert" class="text-left"
>{$t('app.components.ingestion_source_form.allow_insecure_cert')}</Label
>
<Checkbox id="secure" bind:checked={formData.providerConfig.allowInsecureCert} />
<Checkbox
id="allowInsecureCert"
bind:checked={formData.providerConfig.allowInsecureCert}
/>
</div>
{:else if formData.provider === 'pst_import'}
<div class="grid grid-cols-4 items-center gap-4">
@@ -265,6 +270,24 @@
{/if}
</div>
</div>
{:else if formData.provider === 'mbox_import'}
<div class="grid grid-cols-4 items-center gap-4">
<Label for="mbox-file" class="text-left"
>{$t('app.components.ingestion_source_form.mbox_file')}</Label
>
<div class="col-span-3 flex flex-row items-center space-x-2">
<Input
id="mbox-file"
type="file"
class=""
accept=".mbox"
onchange={handleFileChange}
/>
{#if fileUploading}
<span class=" text-primary animate-spin"><Loader2 /></span>
{/if}
</div>
</div>
{/if}
{#if formData.provider === 'google_workspace' || formData.provider === 'microsoft_365'}
<Alert.Root>

View File

@@ -172,6 +172,7 @@
"provider_microsoft_365": "Microsoft 365",
"provider_pst_import": "PST Import",
"provider_eml_import": "EML Import",
"provider_mbox_import": "Mbox Import",
"select_provider": "Select a provider",
"service_account_key": "Service Account Key (JSON)",
"service_account_key_placeholder": "Paste your service account key JSON content",
@@ -187,6 +188,7 @@
"allow_insecure_cert": "Allow insecure cert",
"pst_file": "PST File",
"eml_file": "EML File",
"mbox_file": "Mbox File",
"heads_up": "Heads up!",
"org_wide_warning": "Please note that this is an organization-wide operation. This kind of ingestions will import and index <b>all</b> email inboxes in your organization. If you want to import only specific email inboxes, use the IMAP connector.",
"upload_failed": "Upload Failed, please try again"

View File

@@ -435,7 +435,12 @@
</div>
<Dialog.Root bind:open={isDialogOpen}>
<Dialog.Content class="sm:max-w-120 md:max-w-180">
<Dialog.Content
class="sm:max-w-120 md:max-w-180"
onInteractOutside={(e) => {
e.preventDefault();
}}
>
<Dialog.Header>
<Dialog.Title
>{selectedSource ? $t('app.ingestions.edit') : $t('app.ingestions.create')}{' '}

View File

@@ -23,7 +23,8 @@ export type IngestionProvider =
| 'microsoft_365'
| 'generic_imap'
| 'pst_import'
| 'eml_import';
| 'eml_import'
| 'mbox_import';
export type IngestionStatus =
| 'active'
@@ -81,13 +82,20 @@ export interface EMLImportCredentials extends BaseIngestionCredentials {
uploadedFilePath: string;
}
export interface MboxImportCredentials extends BaseIngestionCredentials {
type: 'mbox_import';
uploadedFileName: string;
uploadedFilePath: string;
}
// Discriminated union for all possible credential types
export type IngestionCredentials =
| GenericImapCredentials
| GoogleWorkspaceCredentials
| Microsoft365Credentials
| PSTImportCredentials
| EMLImportCredentials;
| EMLImportCredentials
| MboxImportCredentials;
export interface IngestionSource {
id: string;

8
pnpm-lock.yaml generated
View File

@@ -217,6 +217,9 @@ importers:
d3-shape:
specifier: ^3.2.0
version: 3.2.0
html-entities:
specifier: ^2.6.0
version: 2.6.0
jose:
specifier: ^6.0.1
version: 6.0.11
@@ -2949,6 +2952,9 @@ packages:
hookable@5.5.3:
resolution: {integrity: sha512-Yc+BQe8SvoXH1643Qez1zqLRmbA5rCL+sSmk6TVos0LWVfNIB7PGncdlId77WzLGSIB5KaWgTaNTs2lNVEI6VQ==}
html-entities@2.6.0:
resolution: {integrity: sha512-kig+rMn/QOVRvr7c86gQ8lWXq+Hkv6CbAH1hLu+RG338StTpE8Z0b44SDVaqVu7HGKf27frdmUYEs9hTUX/cLQ==}
html-to-text@9.0.5:
resolution: {integrity: sha512-qY60FjREgVZL03vJU6IfMV4GDjGBIoOyvuFdpBDIX9yTlDw0TjxVBQp+P8NvpdIXNJvfWBTNul7fsAQJq2FNpg==}
engines: {node: '>=14'}
@@ -7601,6 +7607,8 @@ snapshots:
hookable@5.5.3: {}
html-entities@2.6.0: {}
html-to-text@9.0.5:
dependencies:
'@selderee/plugin-htmlparser2': 0.11.0