Attachment OCR #238

Closed
opened 2026-04-05 16:17:10 +02:00 by MrUnknownDE · 0 comments
Owner

Originally created by @wayneshn on 9/7/2025

Implementation of #93

Key changes

  1. Create a centralized OcrService: a singleton service will manage a persistent pool of Tesseract workers for the lifetime of the indexing.worker process.

  2. Update textExtractor.ts to support more file types: the extractText function will handle a wider range of file types that can benefit from OCR.

  3. Integrate the OCR service into the indexing worker: modify packages/backend/src/workers/indexing.worker.ts so that the OcrService is shut down gracefully when the worker exits.

  4. Allow users to choose which languages the OCR service supports.
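
The changes above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this pull request: `OcrWorker`, `WorkerFactory`, `parseOcrLanguages`, and the `init`/`recognize`/`shutdown` method names are hypothetical, and the comma-separated language setting is an assumed format. The Tesseract worker is hidden behind a small interface so the sketch runs without the OCR library installed.

```typescript
// Hypothetical minimal worker shape; a real Tesseract worker would be wrapped to fit it.
interface OcrWorker {
  recognize(image: Uint8Array): Promise<string>;
  terminate(): Promise<void>;
}

// Hypothetical factory that creates one worker for the given languages.
type WorkerFactory = (languages: string[]) => Promise<OcrWorker>;

// Parse a user-supplied language list such as "eng, deu" (assumed format).
function parseOcrLanguages(raw: string | undefined, fallback: string[] = ["eng"]): string[] {
  const langs = (raw ?? "")
    .split(",")
    .map((s) => s.trim().toLowerCase())
    .filter((s) => s.length > 0);
  return langs.length > 0 ? Array.from(new Set(langs)) : fallback;
}

class OcrService {
  private static instance: OcrService | null = null;
  private workers: OcrWorker[] = [];
  private idle: OcrWorker[] = [];
  private waiters: Array<(w: OcrWorker) => void> = [];
  private pending = new Set<Promise<string>>();

  private constructor() {}

  // One shared instance per process, so the worker pool is created only once.
  static getInstance(): OcrService {
    if (!OcrService.instance) OcrService.instance = new OcrService();
    return OcrService.instance;
  }

  async init(factory: WorkerFactory, languages: string[], poolSize: number): Promise<void> {
    if (this.workers.length > 0) return; // already initialized
    const created = await Promise.all(
      Array.from({ length: poolSize }, () => factory(languages)),
    );
    this.workers = created;
    this.idle = created.slice();
  }

  // Queue a job; it waits for a free worker if the whole pool is busy.
  recognize(image: Uint8Array): Promise<string> {
    if (this.workers.length === 0) {
      return Promise.reject(new Error("OcrService is not initialized"));
    }
    const job = this.run(image);
    this.pending.add(job);
    const done = () => { this.pending.delete(job); };
    job.then(done, done);
    return job;
  }

  private async run(image: Uint8Array): Promise<string> {
    const worker = await this.acquire();
    try {
      return await worker.recognize(image);
    } finally {
      this.release(worker);
    }
  }

  private acquire(): Promise<OcrWorker> {
    const worker = this.idle.pop();
    if (worker) return Promise.resolve(worker);
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  private release(worker: OcrWorker): void {
    const next = this.waiters.shift();
    if (next) next(worker);
    else this.idle.push(worker);
  }

  // Graceful shutdown: refuse new jobs, let accepted jobs finish, then terminate workers.
  async shutdown(): Promise<void> {
    const workers = this.workers;
    this.workers = [];
    await Promise.all(Array.from(this.pending, (p) => p.catch(() => undefined)));
    await Promise.all(workers.map((w) => w.terminate()));
    this.idle = [];
  }
}
```

In the indexing worker, `shutdown()` would presumably be awaited from the process's termination handler (for example on `SIGTERM`) before exiting, so that jobs already accepted are not cut off mid-recognition. Injecting the factory keeps the pool logic independent of any particular OCR library version.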


Reference: github/OpenArchiver#238