feat(Logs): add Phase 5 optimizations for ClickHouse storage and query performance

Nawaz Dhandala
2026-03-12 11:26:10 +00:00
parent 981f609b84
commit 46c9f36578


@@ -302,6 +302,260 @@ These changes directly improve the daily log investigation experience for every
10. **Phase 3.3** - Drop Filters (moderate effort, cost optimization)
11. **Phase 4.x** - Patterns, Shortcuts, Data Scrubbing (future)
## Phase 5: ClickHouse Storage & Query Optimizations (P0) — Performance Foundation
These optimizations address fundamental storage and indexing gaps in the `LogItem` table that directly impact search speed, data correctness, and operational cost. They should be prioritized alongside or before Phase 1 since search UX improvements are only as fast as the underlying queries.
### Current Schema
```
Table: LogItem
Engine: MergeTree
Partition: sipHash64(projectId) % 16
Primary/Sort Keys: (projectId, time, serviceId)
```
| Column | ClickHouse Type | Notes |
|--------|----------------|-------|
| projectId | String | Tenant ID |
| serviceId | String | Service ID |
| time | DateTime | **Second precision only** |
| timeUnixNano | Int128 | Nanosecond timestamp (not in sort key) |
| severityText | String | Trace/Debug/Info/Warning/Error/Fatal |
| severityNumber | Int32 | OTEL severity number (0-24) |
| attributes | JSON | Flexible key-value store |
| attributeKeys | Array(String) | Pre-extracted keys for discovery |
| traceId | String | Optional trace correlation |
| spanId | String | Optional span correlation |
| body | String | Log message text |
### 5.1 Add Skip Indexes for Full-Text Search on `body` (Critical)
**Current**: `body ILIKE '%text%'` in `LogAggregationService.ts:244` performs a full scan of every row in the matching partitions. For a table with billions of rows, this is extremely slow.
**Target**: Add a token-based bloom filter index so ClickHouse can skip granules that definitely don't contain the search term.
**Implementation**:
- Add a `tokenbf_v1` index on the `body` column in the `Log` model definition at `Common/Models/AnalyticsModels/Log.ts`
- Extend `AnalyticsBaseModel` and `StatementGenerator` to support skip index definitions in the CREATE TABLE statement
- Migration to add the index to existing tables via `ALTER TABLE LogItem ADD INDEX`
```sql
-- Target DDL addition
INDEX idx_body body TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4
```
**Expected improvement**: 10-100x faster text search on `body` for selective queries (terms that appear in a small fraction of granules). Non-selective queries (very common terms) still benefit from reduced I/O.
**Files to modify**:
- `Common/Models/AnalyticsModels/Log.ts` (add index definition)
- `Common/Models/AnalyticsModels/AnalyticsBaseModel/AnalyticsBaseModel.ts` (support skip index metadata)
- `Common/Server/Utils/AnalyticsDatabase/StatementGenerator.ts` (emit INDEX clause in CREATE TABLE)
- `Worker/DataMigrations/` (new migration to add index to existing table)
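As a rough illustration of the `StatementGenerator` change, the sketch below shows one way skip-index metadata could be carried on the model and rendered into `INDEX` clauses for the CREATE TABLE statement. The `SkipIndex` shape and `toIndexClauses` helper are hypothetical, not the existing OneUptime API.

```typescript
// Hypothetical skip-index metadata a model could declare.
interface SkipIndex {
  name: string; // e.g. "idx_body"
  column: string; // column the index covers
  type: string; // raw ClickHouse index type expression
  granularity: number;
}

// Renders the INDEX clauses appended after the column list in CREATE TABLE.
function toIndexClauses(indexes: ReadonlyArray<SkipIndex>): string {
  return indexes
    .map(
      (i: SkipIndex) =>
        `INDEX ${i.name} ${i.column} TYPE ${i.type} GRANULARITY ${i.granularity}`,
    )
    .join(",\n");
}

const clause: string = toIndexClauses([
  {
    name: "idx_body",
    column: "body",
    type: "tokenbf_v1(10240, 3, 0)",
    granularity: 4,
  },
]);
// clause is exactly the DDL line shown above
```

The same mechanism would serve 5.2's bloom filter and set indexes, since they differ only in the `type` expression.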
### 5.2 Add Skip Indexes for `traceId`, `spanId`, `severityText` (Critical)
**Current**: Filtering by traceId or spanId (the primary way to correlate logs with distributed traces) requires scanning all data within the projectId+time range. Severity filtering has the same problem.
**Target**: Add bloom filter and set indexes so ClickHouse can skip irrelevant granules.
**Implementation**:
```sql
-- Bloom filters for high-cardinality string columns
INDEX idx_trace_id traceId TYPE bloom_filter(0.01) GRANULARITY 1
INDEX idx_span_id spanId TYPE bloom_filter(0.01) GRANULARITY 1
-- Set index for low-cardinality severity (only ~7 distinct values)
INDEX idx_severity severityText TYPE set(10) GRANULARITY 4
```
- Add these index definitions to the `Log` model
- Create a data migration to apply indexes to existing tables
**Expected improvement**:
- traceId/spanId lookups: 50-1000x faster (bloom filter skips nearly all granules for a specific trace)
- severity filtering: 2-5x faster (set index skips granules that don't contain the target severity)
**Files to modify**:
- `Common/Models/AnalyticsModels/Log.ts` (add index definitions)
- `Common/Server/Utils/AnalyticsDatabase/StatementGenerator.ts` (emit INDEX clauses)
- `Worker/DataMigrations/` (new migration)
### 5.3 Upgrade `time` Column to `DateTime64(9)` (High)
**Current**: The `time` column uses ClickHouse `DateTime` which has **1-second granularity**. Logs within the same second from the same service are stored in arbitrary order. The `timeUnixNano` field (Int128) preserves nanosecond precision but is not in the sort key, so it cannot be used for sub-second ordering.
**Target**: Use `DateTime64(9)` (nanosecond precision) so the sort key naturally orders logs at sub-second resolution.
**Implementation**:
- Change the `time` column type from `TableColumnType.Date` to a new `TableColumnType.DateTime64` in the Log model
- Add `DateTime64` support to `StatementGenerator` and the ClickHouse type mapping in `Statement.ts`
- Update ingestion code in `OtelLogsIngestService.ts` to write DateTime64-compatible timestamps
- Migration: `ALTER TABLE LogItem MODIFY COLUMN time DateTime64(9)`. Note that this is not a metadata-only change: `DateTime` is stored as 32-bit seconds and `DateTime64(9)` as 64-bit ticks, so ClickHouse rewrites existing parts as a mutation. Plan for the rewrite cost on large tables.
- Consider whether the `timeUnixNano` column can be deprecated after this change, since `time` would then carry the same precision
**Impact**: Correct sub-second log ordering. Currently, logs from a burst of activity within the same second may appear in the wrong order.
**Files to modify**:
- `Common/Models/AnalyticsModels/Log.ts` (change column type)
- `Common/Types/AnalyticsDatabase/TableColumnType.ts` (add DateTime64 type)
- `Common/Server/Utils/AnalyticsDatabase/Statement.ts` (add DateTime64 mapping)
- `Common/Server/Utils/AnalyticsDatabase/StatementGenerator.ts` (handle DateTime64 in CREATE/SELECT)
- `Telemetry/Services/OtelLogsIngestService.ts` (write DateTime64 timestamps)
- `Worker/DataMigrations/` (new migration)
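For the ingestion side, a minimal sketch of converting an OTEL `timeUnixNano` value into a `DateTime64(9)`-compatible literal is below. The function name and output format are assumptions for illustration, not the existing `OtelLogsIngestService` code.

```typescript
// Convert nanoseconds-since-epoch (bigint) into a "YYYY-MM-DD HH:MM:SS.nnnnnnnnn"
// string that ClickHouse accepts for a DateTime64(9) column.
function toDateTime64String(timeUnixNano: bigint): string {
  const seconds: bigint = timeUnixNano / 1_000_000_000n;
  const nanos: bigint = timeUnixNano % 1_000_000_000n;
  const iso: string = new Date(Number(seconds) * 1000)
    .toISOString()
    .slice(0, 19) // "YYYY-MM-DDTHH:MM:SS"
    .replace("T", " ");
  // Append the nanosecond fraction, zero-padded to 9 digits.
  return `${iso}.${nanos.toString().padStart(9, "0")}`;
}

const ts: string = toDateTime64String(1_773_314_770_000_000_123n);
// ts === "2026-03-12 11:26:10.000000123"
```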
### 5.4 Add TTL for Automatic Data Retention (High)
**Current**: The ingestion service tracks `dataRetentionInDays` per service but there is **no TTL clause** on the ClickHouse table. Data is never automatically deleted, leading to unbounded storage growth.
**Target**: ClickHouse-native TTL so old data is automatically dropped.
**Implementation**:
- Add a TTL clause to the LogItem table definition:
```sql
TTL time + INTERVAL 30 DAY DELETE
```
- For per-tenant or per-service retention, options include:
- A single global TTL (simplest) with a background job that deletes data beyond the service-specific retention
- A `retentionDate` column populated at ingest time (`time + dataRetentionInDays`) with `TTL retentionDate DELETE`
- Extend `AnalyticsBaseModel` to support TTL configuration
- Migration to add TTL to existing table: `ALTER TABLE LogItem MODIFY TTL time + INTERVAL 30 DAY DELETE`
**Impact**: Prevents disk exhaustion. Without TTL, the only way to remove old data is a manual `ALTER TABLE ... DELETE` mutation, which is expensive and must be scheduled externally.
**Files to modify**:
- `Common/Models/AnalyticsModels/Log.ts` (add TTL config)
- `Common/Models/AnalyticsModels/AnalyticsBaseModel/AnalyticsBaseModel.ts` (TTL support)
- `Common/Server/Utils/AnalyticsDatabase/StatementGenerator.ts` (emit TTL clause)
- `Worker/DataMigrations/` (new migration)
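The per-service retention option above amounts to computing a `retentionDate` at ingest time. A minimal sketch, with illustrative names rather than the actual ingestion code:

```typescript
// Populate a retentionDate column at ingest so that
// `TTL retentionDate DELETE` drops rows on each service's own schedule.
function computeRetentionDate(
  ingestTime: Date,
  dataRetentionInDays: number,
): Date {
  const msPerDay: number = 24 * 60 * 60 * 1000;
  return new Date(ingestTime.getTime() + dataRetentionInDays * msPerDay);
}

const retentionDate: Date = computeRetentionDate(
  new Date("2026-03-12T11:26:10Z"),
  30, // the service-specific dataRetentionInDays
);
// retentionDate.toISOString() === "2026-04-11T11:26:10.000Z"
```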
### 5.5 Fix SQL Construction in `LogAggregationService` (High)
**Current**: `LogAggregationService.appendCommonFilters()` at lines 206-249 constructs SQL via string interpolation for serviceIds, severityTexts, traceIds, and spanIds:
```typescript
// Current - string interpolation with manual escaping
const idStrings = request.serviceIds.map(id => `'${id.toString()}'`);
statement.append(` AND serviceId IN (${idStrings.join(",")})`);
```
While `ObjectID.toString()` is likely safe, severity and trace/span values come from user input and are only protected by a simple `escapeSingleQuotes` function. This is fragile and inconsistent with the parameterized approach used elsewhere.
**Target**: Use parameterized queries for all filter values.
**Implementation**:
- Refactor `appendCommonFilters` to use the `Statement` class's parameterized query support for all IN clauses
- Each value should be a named parameter with proper type annotation
- Remove the `escapeSingleQuotes` helper since it would no longer be needed
**Files to modify**:
- `Common/Server/Services/LogAggregationService.ts` (refactor appendCommonFilters)
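A sketch of the refactor is below, assuming a `Statement` class that accepts named parameters (the real OneUptime `Statement` API may differ). The key property is that values travel out-of-band as typed parameters and never enter the SQL text, so no escaping helper is needed. `{name:String}` is ClickHouse's native named-parameter placeholder syntax.

```typescript
// Build a parameterized IN clause: each value becomes a named, typed
// parameter instead of being spliced into the SQL string.
function buildInClause(
  column: string,
  paramPrefix: string,
  values: ReadonlyArray<string>,
  params: Record<string, string>,
): string {
  const placeholders: Array<string> = values.map((v: string, i: number) => {
    const name: string = `${paramPrefix}${i}`;
    params[name] = v; // value travels out-of-band, never into the SQL text
    return `{${name}:String}`;
  });
  return ` AND ${column} IN (${placeholders.join(", ")})`;
}

const params: Record<string, string> = {};
const sql: string = buildInClause(
  "severityText",
  "sev",
  ["Error", "Fatal"],
  params,
);
// sql:    " AND severityText IN ({sev0:String}, {sev1:String})"
// params: { sev0: "Error", sev1: "Fatal" }
```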
### 5.6 Add ZSTD Compression on `body` Column (Medium)
**Current**: The table uses ClickHouse's default LZ4 compression for all columns. The `body` column is typically the largest column (log messages are verbose text) and is highly compressible.
**Target**: Use ZSTD compression on `body` for better compression ratio.
**Implementation**:
- Add codec specification to the `body` column: `body String CODEC(ZSTD(3))`
- Extend `AnalyticsBaseModel`/`StatementGenerator` to support per-column codec specifications
- Migration: `ALTER TABLE LogItem MODIFY COLUMN body String CODEC(ZSTD(3))`
- Consider ZSTD for `attributes` column as well (JSON is also highly compressible)
**Expected improvement**: 30-50% reduction in storage for the `body` column compared to LZ4, with minimal CPU overhead increase on reads.
**Files to modify**:
- `Common/Models/AnalyticsModels/Log.ts` (add codec config)
- `Common/Types/AnalyticsDatabase/TableColumn.ts` (add codec property)
- `Common/Server/Utils/AnalyticsDatabase/StatementGenerator.ts` (emit CODEC clause)
- `Worker/DataMigrations/` (new migration)
### 5.7 Add Projections for Histogram Queries (Medium)
**Current**: `projections: []` is empty. Every histogram query (group by time bucket + severity) and facet query scans raw data and performs the aggregation from scratch.
**Target**: ClickHouse projections that pre-aggregate data for the most common query patterns.
**Implementation**:
- Add a projection for histogram/severity aggregation:
```sql
PROJECTION proj_severity_histogram (
    SELECT
        projectId,
        toStartOfInterval(time, INTERVAL 1 MINUTE) AS minute,
        severityText,
        count() AS cnt
    GROUP BY
        projectId,
        minute,
        severityText
)
```
- Extend the existing `Projection` type at `Common/Types/AnalyticsDatabase/Projection.ts` to support full projection definitions
- Wire projection creation into `StatementGenerator.toTableCreateStatement()`
- Migration to materialize the projection on existing data: `ALTER TABLE LogItem MATERIALIZE PROJECTION proj_severity_histogram`
**Expected improvement**: 5-10x faster histogram queries since ClickHouse reads the pre-aggregated projection instead of scanning raw log rows.
**Files to modify**:
- `Common/Models/AnalyticsModels/Log.ts` (define projections)
- `Common/Types/AnalyticsDatabase/Projection.ts` (extend type)
- `Common/Server/Utils/AnalyticsDatabase/StatementGenerator.ts` (emit PROJECTION clause)
- `Worker/DataMigrations/` (new migration to materialize)
### 5.8 Store Missing OpenTelemetry Log Fields (Low)
**Current**: Several standard OTEL log record fields are dropped during ingestion:
- `observedTimeUnixNano` — when the log was collected by the pipeline (useful for measuring ingestion lag)
- `droppedAttributesCount` — signals data loss during collection
- `flags` — log record flags (e.g., W3C trace flags)
**Target**: Preserve these fields for full OTEL compliance and operational debugging.
**Implementation**:
- Add optional columns to the Log model:
- `observedTimeUnixNano` (LongNumber)
- `droppedAttributesCount` (Number, default 0)
- `flags` (Number, default 0)
- Update `OtelLogsIngestService.processLogsAsync()` to extract and store these fields
- Migration to add columns to existing table
**Files to modify**:
- `Common/Models/AnalyticsModels/Log.ts` (add columns)
- `Telemetry/Services/OtelLogsIngestService.ts` (extract additional fields)
- `Worker/DataMigrations/` (new migration)
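The extraction step can be sketched as below, with the defaults this section proposes. The input shape mirrors the OTLP JSON encoding (which serializes 64-bit integers as strings); the types and function are illustrative, not the actual `processLogsAsync()` code.

```typescript
// Shape of the extra fields on a decoded OTLP JSON LogRecord.
interface OtelLogRecord {
  observedTimeUnixNano?: string; // OTLP JSON encodes 64-bit ints as strings
  droppedAttributesCount?: number;
  flags?: number;
}

// Pull out the extra fields, applying the proposed defaults when absent.
function extractExtraFields(record: OtelLogRecord): {
  observedTimeUnixNano: bigint;
  droppedAttributesCount: number;
  flags: number;
} {
  return {
    observedTimeUnixNano: BigInt(record.observedTimeUnixNano ?? "0"),
    droppedAttributesCount: record.droppedAttributesCount ?? 0,
    flags: record.flags ?? 0,
  };
}

const extra = extractExtraFields({
  observedTimeUnixNano: "1773314770000000123",
});
// droppedAttributesCount and flags fall back to 0
```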
### 5.x Performance Impact Summary
| Optimization | Query Pattern Improved | Expected Speedup | Effort |
|-------------|----------------------|-------------------|--------|
| 5.1 Body text index | Full-text search on body | 10-100x | Medium |
| 5.2 traceId/spanId/severity indexes | Trace correlation, severity filter | 50-1000x (trace), 2-5x (severity) | Small |
| 5.3 DateTime64 time column | Sub-second log ordering | Correctness fix | Medium |
| 5.4 TTL data retention | Storage management | Prevents disk exhaustion | Small |
| 5.5 Parameterized SQL | Security hardening | N/A (security fix) | Small |
| 5.6 ZSTD compression | Storage cost | 30-50% less disk for body | Small |
| 5.7 Histogram projections | Histogram and severity aggregation | 5-10x | Medium |
| 5.8 Missing OTEL fields | OTEL compliance | N/A (completeness) | Small |
### 5.x Recommended Sub-Phase Order
1. **5.2** — Skip indexes for traceId/spanId/severity (smallest effort, largest query speedup)
2. **5.1** — Body text index (biggest impact on user-facing search)
3. **5.5** — Parameterized SQL fix (security, small effort)
4. **5.4** — TTL data retention (operational necessity)
5. **5.3** — DateTime64 upgrade (correctness)
6. **5.6** — ZSTD compression (cost optimization)
7. **5.7** — Projections (performance polish)
8. **5.8** — Missing OTEL fields (completeness)
---
## Verification
For each feature: